Quantitative Analysis Flashcards

1
Q

Token
tokenization

A

A word (the basic unit of text)
Splitting text into tokens, for example a sentence into words
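
A minimal sketch of tokenization, assuming tokens are lowercase runs of letters, digits, and apostrophes (the function name `tokenize` and the regular expression are illustrative choices, not from the card):

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The central bank raised rates.")
print(tokens)  # ['the', 'central', 'bank', 'raised', 'rates']
```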

2
Q

Document term matrix

A

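A matrix with one row per document and one column per token, where each cell counts how often the token appears in the document. It converts unstructured text into structured data.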
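
A minimal sketch of building a document term matrix, assuming whitespace tokenization and raw counts (the helper name `document_term_matrix` is hypothetical):

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix): one row per document, one column per token."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for tokens that do not occur in this document.
        matrix.append([counts[tok] for tok in vocab])
    return vocab, matrix

vocab, dtm = document_term_matrix(["rates rose", "rates fell and fell"])
print(vocab)  # ['and', 'fell', 'rates', 'rose']
print(dtm)    # [[0, 0, 1, 1], [1, 2, 1, 0]]
```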

3
Q

5 steps of data analysis

A

Conceptualization of modeling task
Data collection
Data preparation and wrangling
Data exploration
Model training

4
Q

Errors reduced by data cleansing

A

Missing, invalid, non-uniform and inaccurate

5
Q

Data Normalization and Standardization

A
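
The card leaves this answer blank. A minimal sketch of the usual definitions, assuming normalization means min-max rescaling to the range 0 to 1 and standardization means subtracting the mean and dividing by the sample standard deviation (function names are illustrative):

```python
from statistics import mean, stdev

def normalize(values):
    """Min-max rescaling: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Center on the mean and scale by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(x - mu) / sigma for x in values]

print(normalize([1, 2, 3]))    # [0.0, 0.5, 1.0]
print(standardize([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```

Normalization is sensitive to outliers because it depends on the minimum and maximum; standardization is less so but assumes a roughly normal distribution.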
6
Q

Parsimonious model

A

Parsimonious models are simple models with strong explanatory and predictive power. They explain the data with a minimum number of parameters (predictor variables).

7
Q

Techniques of feature engineering

A

Numbers - numbers are replaced by a token based on digit length; four-digit numbers are usually associated with years and are assigned number4

N-grams - multiword patterns, e.g. expansionary_monetary_policy

Named entity recognition (NER) - Microsoft > ORG

Parts of speech (POS) - Microsoft > proper noun, 1969 > cardinal number
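
A minimal sketch of the first two techniques, assuming tokens are already split and that a number is replaced by `number` plus its digit count (the helper names `number_token` and `ngrams` are hypothetical):

```python
def number_token(token: str) -> str:
    """Replace an all-digit token with a length-based placeholder."""
    return f"number{len(token)}" if token.isdigit() else token

def ngrams(tokens, n):
    """Join each run of n consecutive tokens with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(number_token("1969"))  # number4
print(ngrams(["expansionary", "monetary", "policy"], 3))
# ['expansionary_monetary_policy']
```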

8
Q

Feature selection methods

A

Frequency - number of documents with that token divided by total number of documents (document frequency DF)

Chi-square - rank tokens by usefulness to a class

Mutual information (MI) - if a token appears in all classes, it is not a useful discriminant and its MI equals 0.
Tokens associated with only one or a few classes have an MI approaching 1.
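
A minimal sketch of the document frequency measure above, assuming each document is already a list of tokens (the function name and sample data are illustrative):

```python
def document_frequency(token, docs):
    """Share of documents that contain the token at least once."""
    containing = sum(1 for doc in docs if token in doc)
    return containing / len(docs)

docs = [["rates", "rose"], ["stocks", "fell"], ["rates", "fell"]]
print(document_frequency("rates", docs))  # 2 of 3 documents
```

Tokens with very high DF (stop words) or very low DF (rare noise) are typical candidates for removal.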

9
Q

Steps of data exploration

A

1 exploratory data analysis

2 feature selection

3 feature engineering
One-hot encoding (OHE) - transforms a categorical feature into binary (0/1) variables for machine processing
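
A minimal sketch of one-hot encoding in plain Python, assuming one binary column per distinct category in sorted order (the function name is hypothetical):

```python
def one_hot_encode(values):
    """Return (categories, rows) with a 1 in the column of each value's category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cats, encoded = one_hot_encode(["bond", "equity", "bond"])
print(cats)     # ['bond', 'equity']
print(encoded)  # [[1, 0], [0, 1], [1, 0]]
```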

10
Q

What is overfitting?

A

An issue in supervised ML that arises when a large number of features (independent variables) are included in the data sample. The model fits the training data too closely, so its forecasts are less accurate on out-of-sample data (it does not generalize well to new data: low out-of-sample R2).

11
Q

What are the 3 tasks of model training?

A

1 Method selection, which depends on:
Supervised or unsupervised learning
Supervised learning - support vector machines (SVMs) and neural networks (NNs)
Unsupervised learning - clustering, dimension reduction, anomaly detection
Type of data
Numerical data - classification and regression trees (CART)
Text data - generalized linear models (GLMs) and SVMs
Image data - NNs and deep learning methods
Size of data - SVMs suit large data sets with many observations and features; NNs work better with a large number of observations and few features

2 Performance evaluation

3 Tuning - implement changes to improve performance

12
Q

How to divide data set for supervised learning in model training process?

A

60% for model training
20% model validation and tuning
20% test out of sample performance
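
A minimal sketch of the 60/20/20 split, assuming the rows can be shuffled randomly (not appropriate for time-series data) and using a hypothetical helper name and seed:

```python
import random

def split_60_20_20(rows, seed=0):
    """Shuffle a copy of rows and cut it into train, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = n * 6 // 10   # first 60% for training
    valid_end = n * 8 // 10   # next 20% for validation and tuning
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]

train, valid, test = split_60_20_20(range(10))
print(len(train), len(valid), len(test))  # 6 2 2
```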

13
Q

Model fitting errors can be caused by:

A

Size of training sample (small data sets)
Number of features (small > underfitting, large > overfitting)

14
Q

The three tasks of model training are:

A

1 method selection
Supervised learning (training data contains the ground truth, or known outcome) or unsupervised learning (no target available)
2 Type of data
Numerical data (CART methods)
Text data (GLMs)
Image (Neural Networks and deep learning)
3 Size of data
Large data sets with many observations and features (SVMs)
Large number of observations and few features (NNs)

15
Q

What are Type I and Type II errors?

A

Type I errors are false positives
Type II errors are false negatives

16
Q

Formula of model accuracy and F1 score

A
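
The card leaves this answer blank. The standard confusion-matrix definitions are given here as a reference, where TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives, P is precision, and R is recall:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}
\qquad
F1 = \frac{2 \times P \times R}{P + R}
```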
17
Q

Formula precision and recall

A
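
The card leaves this answer blank. A minimal sketch of the standard definitions, assuming counts taken from a confusion matrix (the counts in the usage lines are made up for illustration):

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were identified: TP / (TP + FN)."""
    return tp / (tp + fn)

print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=8))     # 0.5
```

Precision matters most when a Type I error (false positive) is costly; recall matters most when a Type II error (false negative) is costly.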
18
Q

AUD/GBP 1.5060 - 1.5067
Convert GBP 1 million and AUD 1 million
Up the bid and multiply
Down the ask and divide

A

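GBP 1 million x 1.5060 = AUD 1,506,000 (up the bid and multiply)
AUD 1 million / 1.5067 = approximately GBP 663,702 (down the ask and divide)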
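
The rule in the question can be sketched as follows, assuming the quote AUD/GBP means AUD per 1 GBP (GBP is the base currency) and that the client sells the base currency at the bid and buys it at the ask. Function names are hypothetical:

```python
BID, ASK = 1.5060, 1.5067  # AUD per 1 GBP

def base_to_price(amount_gbp: float, bid: float) -> float:
    """Sell the base currency (GBP): up the bid and multiply."""
    return amount_gbp * bid

def price_to_base(amount_aud: float, ask: float) -> float:
    """Buy the base currency (GBP) with AUD: down the ask and divide."""
    return amount_aud / ask

print(round(base_to_price(1_000_000, BID), 2))  # 1506000.0 AUD
print(round(price_to_base(1_000_000, ASK), 2))  # 663702.13 GBP
```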

19
Q

Z-statistic critical values for 68%, 90%, 95%, and 99% confidence levels

Are the t-statistic critical values at 90%, 95%, and 99% larger or smaller than the z-statistic values?

A
20
Q

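Which is better, R2 or adjusted R2? Why?

A

R2 never decreases when variables are added, which can encourage overfitting.
Adjusted R2 penalizes additional variables, so it is the better measure for comparing models with different numbers of variables.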
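
A minimal sketch of the standard adjusted R2 formula, assuming n observations and k independent variables (the function name and sample inputs are illustrative):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# With 11 observations and 5 variables, an R2 of 0.80 adjusts to about 0.60.
print(round(adjusted_r2(0.80, n=11, k=5), 4))
```

Adjusted R2 is at most equal to R2, and it falls when an added variable contributes too little explanatory power.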

21
Q

Effect of model misspecification

A
22
Q

Assumptions of multiple regression

A
23
Q

What is heteroskedasticity type 1 and 2?

A
24
Q

What is serial correlation? What are the implications?

A
26
Q

How to detect serial correlation?

A
27
Q

What are the implications of multicollinearity?

A
28
Q

How to detect multicollinearity?

A

Test F or

29
Q

For conditional heteroskedasticity, serial correlation, and multicollinearity:
what is it, what is its effect, how is it detected, and how is it corrected?

A
30
Q

What is an outlier, and what is a high-leverage point?

A
31
Q

What is the RMSE criterion?

A
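
The card leaves this answer blank. As a reference, root mean squared error (RMSE) is the square root of the average squared forecast error, and the usual criterion is to prefer the model with the lowest out-of-sample RMSE. A minimal sketch with made-up numbers:

```python
import math

def rmse(actual, forecast):
    """Square root of the mean squared difference between actual and forecast."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [1.0, 2.0, 3.0]
model_a = [1.0, 2.0, 5.0]   # hypothetical out-of-sample forecasts
model_b = [1.5, 2.5, 3.5]
# Model B has the lower RMSE, so it would be preferred under this criterion.
print(rmse(actual, model_a), rmse(actual, model_b))
```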
32
Q

How to calculate mean reverting level?

A
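
The card leaves this answer blank. Assuming a first-order autoregressive model, AR(1), with intercept b_0 and slope b_1 where |b_1| < 1, the standard result is:

```latex
x_t = b_0 + b_1 x_{t-1} + \varepsilon_t
\quad\Longrightarrow\quad
\text{mean-reverting level} = \frac{b_0}{1 - b_1}
```

When x_t is above this level the model predicts a decline next period, and when it is below the model predicts a rise.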
33
Q

ARCH
What is ARCH, its effect and how to correct it.

A

Autoregressive conditional heteroskedasticity (ARCH) exists when the variance of the residuals in one period depends on the variance of the residuals in the previous period.

34
Q

How to test for serial correlation in an AR model? And how to fix it?

A

The Durbin-Watson (DW) statistic can't be used
Use a t-test on the residual autocorrelations. To fix it, add a lag or a seasonal lag

35
Q

ML - relation between model complexity and bias / variance

A
36
Q

ML - What is generalization

A

An ML model's capacity to make accurate out-of-sample predictions

37
Q

What is bagging? Why is it important?

A
38
Q

How to calculate the accruals ratio and aggregate accruals

A

Aggregate accruals = NI - (CFO + CFI)
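
A minimal sketch of the cash-flow approach above. The accruals ratio is not given on the card; it is assumed here to be aggregate accruals divided by average net operating assets (NOA), and all figures are made up:

```python
def aggregate_accruals(ni: float, cfo: float, cfi: float) -> float:
    """Cash-flow approach: net income minus operating and investing cash flows."""
    return ni - (cfo + cfi)

def accruals_ratio(ni, cfo, cfi, noa_begin, noa_end):
    """Aggregate accruals scaled by average net operating assets (assumed definition)."""
    average_noa = (noa_begin + noa_end) / 2
    return aggregate_accruals(ni, cfo, cfi) / average_noa

print(aggregate_accruals(ni=100, cfo=80, cfi=-30))                          # 50
print(accruals_ratio(ni=100, cfo=80, cfi=-30, noa_begin=400, noa_end=600))  # 0.1
```

A higher accruals ratio generally indicates lower earnings quality.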