02 The Tech in FinTech Flashcards

1
Q

What is Fintech?

A

Fintech is the application of modern technology to financial services, by established financial institutions or by new players

- Massive increase in computing power, storage capacity, and connectivity in the past decade

- Connectivity in particular has become mobile, since mobile phones are available almost everywhere.

–> wearables and smartphones have become widespread

2
Q

What is Cloud computing?

A

Cloud computing is the delivery of on-demand, off-site computing and storage resources.

- provided by third parties via off-site servers hosted in advance

3
Q

What are the benefits of Cloud computing?

A

General benefits: flexible, scalable, accessible
- reduces up-front investments
- allows established institutions to outsource infrastructure and launch new systems and services

–> attenuates legacy technology debt issues

4
Q

What are the factors contributing to TechFin success?

A
  1. They already have a very broad user base
  2. They have the technological expertise

Large technology companies (TechFins) are increasingly entering the finance space

5
Q

Why is the amount of data available to financial institutions (FIs) increasing?

A
  • Digitization: the move from analog to digital makes collecting, transmitting, and analyzing data substantially easier
  • APIs and Open Banking regulation: they force FIs to share data, which makes collecting data easier
  • Computing and storage capacities: advances in network bandwidth and processing power make working with large datasets feasible
6
Q

Why is the amount of data available to financial institutions (FIs) increasing? Last two reasons

A
  • Mobile devices: new mobile devices constantly collect data, e.g. on communication habits and health
  • New “Data Awareness”:
    Increasing awareness of the usefulness of data, combined with cheap storage space, leads institutions to collect rather than discard data
7
Q

What are the three different characteristics of Big Data in Finance?

A
  1. Large in scale
  2. High dimensionality
  3. Complex structure
8
Q

Preview of AI and ML?

What do they provide?

A

They potentially provide the means to analyze big datasets
–> It's all about finding correlations or complex relationships within data

–>(Weak or Narrow) AI: Machines mimicking human behavior when solving specific tasks

9
Q

Drivers of recent advances in AI and ML?

A
  • Computing Power: necessary to tackle the complex problems of machine learning
  • Big Data: many types of ML rely on large datasets for training and evaluation
  • Algorithms: using ML algorithms is much more accessible today than only a few years ago.
10
Q

What is Machine Learning?

A

ML gets better with more experience, i.e. as you feed it more data; AI in general does not need this feature.

–> ML uses tools from computer science and statistics

General use cases of ML methods: classification, clustering, regression, prediction, dimensionality reduction
- Training AI without ML is difficult and requires a lot of work by experts
–> Generally speaking, machine learning performs well when looking at (or for) non-linear relationships in the data.

11
Q

Supervised Machine Learning

General description

A

Supervised learning maps an input to an output
–> it uses labeled data to learn the mapping from input to output

12
Q

Supervised Machine Learning

Some Statistical methods used

A

1. Classification: Predicting labels or classes of observations (discrete)
* Logistic regression
* Naive Bayes classifier
* Support-vector machines
* Decision trees

2. Regression: Predicting continuous variables
* Linear and non-linear regression
* Ridge regression and least absolute shrinkage and selection operator (LASSO)
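
As a rough illustration of how ridge regression differs from plain linear regression, the sketch below fits a single slope through the origin. The data, the function name `fit_slope`, and the penalty value are made up for this example; the formula slope = Sxy / (Sxx + penalty) is the standard one-variable case without an intercept.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Least-squares slope through the origin; penalty > 0 gives the ridge estimate."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

xs = [-1.0, 0.0, 1.0]   # hypothetical, already centered inputs
ys = [-2.0, 0.0, 2.0]   # hypothetical outputs (exactly y = 2x)

ols_slope = fit_slope(xs, ys)                 # plain linear regression
ridge_slope = fit_slope(xs, ys, penalty=2.0)  # same fit, shrunk toward zero
print(ols_slope, ridge_slope)
```

The penalty trades a little bias for a more stable estimate, which is why it tends to help when there are many correlated variables.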

13
Q

Supervised Machine learning

Short Process description

A

A first dataset, the training data, is used to build a first model
* more data is used to validate and adjust the model
–> as more data is added, the algorithm learns by incorporating new information

–> The final model can then be applied to label unlabeled data
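
The process above can be sketched with a very small classifier. This is only an illustration under assumptions: the model is a single cut-off (a decision tree with one split), and the debt-ratio data and all function names are hypothetical.

```python
def fit_threshold(data):
    """Pick the cut-off that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: sum((x > t) != bool(y) for x, y in data))

def predict(threshold, x):
    return int(x > threshold)

def accuracy(threshold, data):
    return sum(predict(threshold, x) == y for x, y in data) / len(data)

# (debt ratio, defaulted?) pairs: hypothetical labeled data
training = [(0.1, 0), (0.2, 0), (0.3, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
validation = [(0.25, 0), (0.5, 1), (0.8, 1)]

threshold = fit_threshold(training)                      # build a first model
validation_accuracy = accuracy(threshold, validation)    # validate it on more data
labels = [predict(threshold, x) for x in [0.15, 0.75]]   # label unlabeled data
print(threshold, validation_accuracy, labels)
```

If the validation accuracy were poor, the model would be adjusted (here: refit with more data or a different model) before being used on unlabeled observations.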

14
Q

Unsupervised Machine Learning

Description

A

Uses unlabeled data, with the aim of discovering hidden, potentially interesting structure within the data
* subgroups and clusters
* Patterns

15
Q

Unsupervised Machine Learning

Advantages and Disadvantages

A

Advantages: (compared to Supervised Learning)
* unlabeled data is much easier to obtain, since no prior classification is needed
* can identify patterns that may not be noticed by experts

Disadvantages: (compared to supervised learning)
* Usually requires even larger datasets
* Errors and anomalies that experts would have spotted might strongly impact the outcome

16
Q

Unsupervised Learning

Some statistical methods used

A
  1. Clustering
    * Find objects that are similar by looking at distances (minimize within-cluster distance, maximize between-cluster distance)
    –> Example tools: Hierarchical methods, K-means clustering

  2. Dimensionality Reduction
    * Reduce a large dataset to a small dataset by focusing on important features
    * Output may be used in other analyses or for visualization
    –> Example tools: Principal component analysis (PCA)
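
The clustering idea in item 1 can be sketched with a one-dimensional K-means. This is a minimal version under assumptions: the data points and the starting centers are made up, and a fixed number of iterations stands in for a proper convergence check.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-centering."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # an empty cluster keeps its old center
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # hypothetical unlabeled data
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

No labels are used anywhere: the two groups emerge only from the distances between the points.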

17
Q

Reinforcement Machine Learning

Description

A

The algorithm has to achieve a goal within a certain environment
* exact rules are not provided
* no labeled data
–> Instead, it receives a reward/penalty signal as feedback and maximizes a reward function

18
Q

Reinforcement Machine Learning

Example use cases

A

A robot that needs to fulfill a task, like carrying a box from A to B
* reward based on how close it is to bringing the box to its destination
* penalty for crashing into a wall or dropping the box

An investment algorithm finding a strategy
* reward based on how much money it makes
* penalty for taking on extreme risk or for investing in sin industries

–> Good when labels are not available but there is a known goal
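
The robot example can be sketched with tabular Q-learning, one common reinforcement learning method (the cards do not name a specific algorithm, so this choice is an assumption). The corridor, the reward values, and all parameter values below are hypothetical.

```python
import random

N_STATES = 5          # positions 0..4 on a corridor; the box must reach position 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)    # move left or right
GAMMA = 0.9           # discount factor for future rewards
ALPHA = 1.0           # full update; acceptable here because the toy world is deterministic

def step(state, action):
    """Return (next_state, reward, done) for one move."""
    nxt = state + action
    if nxt < 0:
        return state, -1.0, False   # penalty: bumped into the wall
    if nxt == GOAL:
        return nxt, 1.0, True       # reward: box delivered
    return nxt, 0.0, False

def train(episodes=50, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)   # explore at random
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = nxt
    return q

def greedy_policy(q):
    """Best known action for every position before the goal."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]

policy = greedy_policy(train())
print(policy)
```

The agent is never told that "right" is correct; it only sees rewards and penalties, and the learned policy ends up moving toward the goal from every position.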

19
Q

(Artificial) Neural Networks

Description

A

Often applied to supervised learning with very good results
–> inspired by biological networks like the human brain
–> Neural networks consist of different nodes/neurons that each
* take an input
* calculate the activation as the weighted sum of inputs (plus a bias term)
* apply an activation function to the activation
* and output the resulting value
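
The four steps of a single node can be sketched as follows. The sigmoid is just one possible activation function, and the inputs, weights, and bias are made-up numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # weighted sum of the inputs plus a bias term
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # apply the activation function and output the result
    return sigmoid(activation)

output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)
```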

20
Q

(Artificial) Neural Networks

Description 2, what are the tasks of the nodes?

A

The nodes can have different tasks depending on their position
1. Input nodes: take the input data
2. (Multiple layers of) hidden nodes: process the data further
3. Output nodes: provide the final output of the network (e.g. classification of a picture, default prediction, stock price prediction)
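
How data flows from input nodes through hidden nodes to an output node can be sketched as a forward pass. The network size, all weights and biases, and the choice of ReLU for the hidden layer are assumptions made for this example.

```python
def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases, activation):
    """Each row of `weights` belongs to one node of the layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs):
    # two hidden nodes process the input data further
    hidden = layer(inputs, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
    # one output node provides the final output (no activation here)
    return layer(hidden, [[2.0, 1.0]], [0.5], lambda z: z)

output = forward([1.0, 2.0])   # the input nodes simply hold the data
print(output)
```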

21
Q

(Artificial) Neural Networks

How does the network learn?

A

The network learns by iteratively adjusting the weights between nodes to reduce the error
* Error = difference between network output and desired output
* A loss function based on the error is minimized by adjusting the weights (e.g. root mean squared error, logarithmic loss function)
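
Iterative weight adjustment can be sketched with gradient descent on a single linear node. This is a simplified stand-in for how a full network is trained: the data, learning rate, and number of steps are made up, and the loss is the mean squared error (minimizing it also minimizes the root mean squared error).

```python
def mean_squared_error(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, learning_rate=0.1, steps=500):
    w, b = 0.0, 0.0
    for _ in range(steps):
        # gradient of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        # move both parameters a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # hypothetical: y = 2x + 1
w, b = train(data)
print(w, b, mean_squared_error(w, b, data))
```

Each step reduces the difference between the node's output and the desired output, so the weight and bias drift toward the values that generated the data.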

22
Q

(Artificial) Neural networks

Issues and last comments

A

Neural networks can become very complex with thousands of input nodes and hundreds of layers

  • requires substantial computing power for learning
  • Prior dimensionality reduction can address this problem
23
Q

Problems and Risks of ML approaches

1

A
  • Overfitting: the algorithm might pick up particular patterns of the training data that do not generalize to new data
  • Bad data, errors, outliers: data errors and outliers intensify the problem of overfitting
    –> the algorithm might not identify implausible outliers, while human experts could
    –> Biases in training data selection
  • Correlation versus causation
  • Discrimination based on protected attributes: existing biases (e.g. gender, age) might be picked up and reproduced by the algorithm
    –> in many jurisdictions, unintended discrimination (“disparate impact”) is still illegal
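
Overfitting and the effect of a bad data point can be sketched by comparing a model that memorizes the training data with a much simpler rule. All data are hypothetical, one training label is deliberately wrong, and the cut-off of the simple rule is chosen by hand.

```python
def nearest_neighbor_predict(train, x):
    """Memorizing model: copy the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def threshold_predict(x, cutoff=5.0):
    """Simple model: one fixed cut-off (chosen by hand for this sketch)."""
    return int(x > cutoff)

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# hypothetical data; (3, 1) is a mislabeled outlier
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
validation = [(2.9, 0), (3.2, 0), (6.5, 1)]

def memorizer(x):
    return nearest_neighbor_predict(train, x)

for name, model in [("memorizer", memorizer), ("threshold", threshold_predict)]:
    print(name, accuracy(model, train), accuracy(model, validation))
```

The memorizing model is perfect on the training data because it also reproduces the outlier, and that is exactly why it does worse on the new data.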
24
Q

Problems and Risks of ML approaches

2: privacy and systematic risk

A
  • Privacy and data protection:
    ML algorithms might use sensitive data at large scales,
    thus the outcome of ML might (even unintentionally) uncover sensitive patterns in the data
  • New sources of correlated and systematic risk:
    ML applications might lead to new, unexpected, and unintended forms of interconnectedness between institutions and risks
    –> Big data and ML applications are costly and require expertise
    –> Economies of scale might lead to concentration within a few players in the industry
25
Q

What does deep learning mean?

(Artificial) Neural Networks

A

Deep learning, or a deep neural network, is an (artificial) neural network that consists of multiple hidden layers of nodes
= a form of machine learning that uses algorithms that work in layers, inspired by the structure and function of the brain. This structure, called an artificial neural network, can be used for supervised, unsupervised, or reinforcement learning

26
Q

Three different characteristics of Big Data

Large in scale

A
  1. Large in Scale
    * one day of trading consists of 1.000 observations
    * alternative data sources: Social Media data like Twitter, Facebook
    * Images, Videos, Audio

–> Data may also be large relative to what is usually available
–> e.g. only a few individuals, but their entire browsing and communication history
–> Data is available at frequencies higher than usual or even in real time

27
Q

Three different characteristics of Big Data

High dimensionality

A
  1. High dimensionality
    * a high-dimensional dataset contains relatively many variables
    * these variables could be collected directly or result from interacting other variables

The curse of dimensionality: the required sample size grows exponentially in the number of dimensions of the model
–> especially severe for machine learning approaches, where we split the data into training, validation, and testing datasets

Dimensionality reduction is sometimes necessary: extract important characteristics of the dataset and thus reduce its size
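
One way to see the exponential growth: if every variable is split into the same number of bins, the number of grid cells that the data would have to cover is bins to the power of dimensions. The function name and the choice of 10 bins are assumptions for this small sketch.

```python
def cells_to_cover(bins_per_variable, dimensions):
    """Number of grid cells when every variable is split into the same number of bins."""
    return bins_per_variable ** dimensions

for d in (1, 2, 5, 10):
    print(d, cells_to_cover(10, d))
```

With 10 bins per variable, a model with 10 variables already has ten billion cells, so even one observation per cell is out of reach for most datasets.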

28
Q

Three different characteristics of Big Data

Complex Structure

A
  1. Complex Structure
    * Structured datasets (= information fits neatly in a table)
    –> Exchange trading data, accounting data, economic data
    –> Census and demographic data
    –> ERP and CRM systems
    –> Consumption and purchase history of individuals
    –> Sensors, logs, metadata, location data
    * Unstructured data
    –> Social media: tweets, Facebook, Reddit
    –> Images and videos: satellite imagery, surveillance cameras
    –> Text and voice: company announcements, earnings calls, web crawling, e-mails, insurance claim documentation, customer feedback
29
Q

Complex structure: structured vs. unstructured data –> challenges

A

Unstructured data is more abundant and less explored, but harder to analyze
Structured data can be extracted from unstructured data using data analysis

–> The lack of data structure brings new challenges: data might be hard to collect, store, transfer, and analyze
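
Extracting structured data from unstructured text can be sketched with a regular expression. The sentence, the pattern, and the field names are all made up for this example; real announcements are far less regular.

```python
import re

# hypothetical sentence from a company announcement
text = "Q3 revenue was 12.5 million USD, up from 10.0 million USD a year earlier."

# pull every "<number> million USD" amount out of the free text
amounts = [float(m) for m in re.findall(r"(\d+(?:\.\d+)?) million USD", text)]

# the result fits neatly into a table row
record = {"revenue_current": amounts[0], "revenue_previous": amounts[1]}
print(record)
```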

30
Q

Three different characteristics of Big Data

Complex structure: Regulatory requirements

A

Regulatory requirements:
* Regulation, especially regarding privacy and data protection, puts limits on what data can be collected and used
* (Financial) Regulation is also a driver of big data by requiring transparency and auditability

31
Q

Challenges in working with Big Data

A

Security of information: sensitive information has to be protected from misuse and theft, which is difficult for complex, potentially distributed systems
Quality of data: more data is not necessarily better data
- assessing the quality of data becomes more difficult for large datasets, especially for unstructured datasets

Infrastructures, silos, and data integration: an institution might have large amounts of data, but it might be spread over IT systems across divisions
Analysis and actionability:
- the data must be analyzed to inform decision-making or to automate processes
- conventional methods of data analysis might not be appropriate for big data

32
Q

Problems and Risks of ML (Black box)

A

Transparency, Auditability, and Explainability
* ML models often behave like a black box, especially deep neural networks
–> The outcome can be evaluated, but the process of getting to the outcome might not be understood
–> There might be unintended risks in not knowing how a conclusion was drawn
–> This might also pose new cyber security risks

33
Q

Explainable AI (XAI)

A

Black box approaches are difficult to understand, hard to audit, and may lead to unexpected results
* as a consequence, regulators are increasingly requiring financial institutions to be able to explain how a model arrived at a decision
* Bank of England: “Explainability means that an interested stakeholder can comprehend the main drivers of a model-driven decision.”
–>There are several ways of adding explainability to AI models

34
Q

Financial Applications of AI and ML

Front office and customer facing

A
  • Credit scoring: take alternative data (social media, location history, online reviews, ...) into account when calculating creditworthiness
  • Insurance claims management: automate the process of claims management for small claims, such as in car insurance
  • Chatbots and voice assistants: use natural language processing (NLP), typically trained on a neural network
    - reduce workload of customer service centers,
    - can provide faster answers to typical customer questions and
    - may solve problems automatically
  • Highly personalized services by leveraging customer data
35
Q

Financial Applications of AI and ML

Middle and back office applications

A
  • Robotic process automation:
    Replace simple and repetitive day-to-day work with trained algorithms that mimic the behavior of human administrative staff
  • Risk management:
    - Validation and back testing of risk models at higher frequencies
    - Stress testing of risk management safeguards
    - Fraud detection by real-time monitoring of payment patterns and additional data
  • Compliance:
    - Detect fraud or terrorism financing by real-time monitoring of payment patterns
    - Monitor communication of traders for misbehavior and illegal activity
36
Q

Financial Applications of AI and ML

Trading and Portfolio management

A
  • Trade execution: estimate and optimize price impact of large trades
  • Portfolio management:
    - Find new and unexploited relationships between firm characteristics and prices
    - Identify new trading signals by utilizing alternative data sources
37
Q

Advantages of Chatbots and voice assistants

A

–>They use natural language processing and are typically trained on a neural network
* reduce workload of customer service centers
* provide faster answers to typical customer questions
* may solve some problems automatically

38
Q

What is unintended discrimination?

in the context of the potential risks of machine learning.

A

Unintended discrimination may occur if the decision based on an algorithm is different for groups with different protected attributes, even when these attributes are not directly considered or used as inputs for the algorithm

39
Q

Example of unintended discrimination?

A

It may occur, for example, because the algorithm picks up biases in the training data, or because the algorithm uncovers a strong (but maybe unknown) correlation between a factor or set of factors and a protected attribute.
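
The second mechanism can be sketched with a decision rule that never looks at the protected attribute but uses a correlated factor. The rule, the postal areas, and the applicants are entirely made up.

```python
from collections import defaultdict

def approve(applicant):
    """Hypothetical rule: the decision only looks at the postal area."""
    return applicant["area"] == "A"

# made-up applicants; "group" is the protected attribute and is never used by approve()
applicants = [
    {"area": "A", "group": "X"}, {"area": "A", "group": "X"},
    {"area": "A", "group": "X"}, {"area": "B", "group": "X"},
    {"area": "A", "group": "Y"}, {"area": "B", "group": "Y"},
    {"area": "B", "group": "Y"}, {"area": "B", "group": "Y"},
]

counts = defaultdict(lambda: [0, 0])   # group -> [approved, total]
for person in applicants:
    counts[person["group"]][0] += approve(person)
    counts[person["group"]][1] += 1

approval_rate = {g: ok / total for g, (ok, total) in counts.items()}
print(approval_rate)
```

Because the postal area is correlated with the group in this toy data, the approval rates differ strongly between groups even though the group is not an input.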

40
Q

What is intended discrimination?

A

Intended discrimination: Explicitly discriminating based on protected attributes, such as having a rule not to provide services to minorities or the elderly.

41
Q

What is the unit cost of intermediation?

A

The unit cost of financial intermediation is the ratio between the income of financial intermediaries and the quantity of intermediated assets.

The unit cost has stayed remarkably constant in many countries. It was thus argued that cost decreases were not passed on to the end users. FinTech is seen as a way to correct this, and it has brought up many new technologies.
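
The ratio can be written as a one-line calculation. The figures below are made up and only show the arithmetic; both inputs are assumed to be in the same currency unit.

```python
def unit_cost(intermediary_income, intermediated_assets):
    """Income of financial intermediaries per unit of intermediated assets."""
    return intermediary_income / intermediated_assets

# made-up figures: income of 200 on intermediated assets of 10,000
example = unit_cost(200.0, 10_000.0)
print(example)
```

A value of 0.02 would mean that intermediation costs end users about 2 percent of the assets intermediated.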