QM Flashcards

(80 cards)

1
Q

R^2

A

Measures the proportion of variance explained by the model

2
Q

AIC

A

Prediction or forecasting (preferred when selecting a model for prediction)

3
Q

BIC

A

Goodness of fit (favors parsimonious over more general models)

4
Q

Significance

A

|statistic| > critical value
p-value < alpha
Reject the null

5
Q

Conditional Heteroskedasticity

A

Inflated test statistics (Type I error): SEs and p-values underestimated; the coefficients themselves remain unbiased

6
Q

Serial Correlation

A

Inflated test statistics (Type I error), SEs underestimated; with a lagged dependent variable as a regressor, serial correlation makes the coefficients inconsistent

7
Q

Multicollinearity

A

Inflated SEs; corrected by increasing the sample size, excluding one or more variables, and/or using proxies

8
Q

Logistic Regression

A

Discrete (categorical) outcome

9
Q

Dickey-Fuller Test

A

Tests for a unit root, where H0: g = 0 and Ha: g < 0, with g = b1 - 1; failing to reject implies a unit root (random walk)

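The Dickey-Fuller regression above can be sketched numerically. This is a minimal illustration with hypothetical data and a made-up helper `df_slope` (a no-intercept OLS slope); a real test compares the t-statistic of g against Dickey-Fuller critical values, not ordinary t-tables.

```python
# Sketch of the Dickey-Fuller idea: regress the first difference on the
# lagged level and inspect g = b1 - 1 (estimated here directly as the
# no-intercept OLS slope of dy_t on y_{t-1}). Data are hypothetical.
def df_slope(y):
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    return sum(l * d for l, d in zip(lag, dy)) / sum(l * l for l in lag)

# Random walk: b1 = 1, so g should sit near 0 (fail to reject -> unit root).
rw = [0.0]
for step in [1, -1, 2, -1, 1, 1, -2, 1]:
    rw.append(rw[-1] + step)

# Mean-reverting AR(1) with b1 = 0.2, so g = b1 - 1 = -0.8 (reject -> stationary).
ar = [4.0]
for _ in range(8):
    ar.append(0.2 * ar[-1])

print(df_slope(ar))  # close to -0.8
print(df_slope(rw))  # small-sample estimate; noisy for short series
```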
10
Q

Seasonality

A

Significance at the seasonal lag, or positive autocorrelation in the residuals at that lag

11
Q

PCA

A

Dimension reduction of highly correlated features (continuous, unsupervised)

12
Q

Regression

A

Prediction (continuous, supervised)

13
Q

Regression or classification, complex non-linear data

A

CART, RF, NN

14
Q

Regression, not complex non-linear data

A

Penalized Regression

15
Q

Classification Labeled Data

A

Discrete, supervised

16
Q

Classification, not complex non-linear data

A

KNN, SVM

17
Q

Classification, no labeled data

A

Clustering (continuous, unsupervised)

18
Q

Clustering, not complex non-linear data (number of categories known)

A

K-means

19
Q

Clustering, not complex non-linear data (number of categories unknown)

A

Hierarchical Clustering

20
Q

Clustering, complex non-linear data

A

NN

21
Q

Overfitting

A

Variance error

22
Q

Underfitting

A

Bias Error

23
Q

SVM

A

Outlier detection, target variable binary, no defined hyperparameter

24
Q

KNN

A

Classifies new observations by finding similarities to existing ones (the k nearest points, where k = number of neighbors); sensitive to local outliers

25
Ensemble Learning and RF
Prediction from a combination of a group of models (prevents overfitting)
26
Bootstrap Aggregating
Original training data set is used to generate n new training data sets
27
K-means clustering
Centroid -> Cluster iteration until no observation is reassigned (hyperparameter = # clusters)
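The centroid-update loop on this card can be sketched in one dimension. `k_means_1d`, its data, and the starting centroids are all hypothetical; real implementations also handle empty clusters and random restarts.

```python
# Toy 1-D k-means: assign each point to its nearest centroid, recompute
# each centroid as the mean of its cluster, and stop when the centroids
# no longer change (i.e., no observation is reassigned). The number of
# clusters k = len(centroids) is the hyperparameter.
def k_means_1d(points, centroids, max_iter=100):
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
                  for p in points]
        new = [sum(p for p, l in zip(points, labels) if l == c) /
               max(1, sum(1 for l in labels if l == c))
               for c in range(len(centroids))]
        if new == centroids:  # converged: nothing was reassigned
            break
        centroids = new
    return centroids, labels

centers, labels = k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0])
print(centers, labels)  # centers near 1.0 and 9.0; labels [0, 0, 0, 1, 1, 1]
```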
28
Hierarchical Clustering
Agglomerative clustering (clusters of clusters)
29
What is CH?
Variance of the errors is not constant (it varies with the IVs)
30
Detecting CH
BP test: chi-square statistic = n·R² (from regressing the squared residuals on the IVs); n·R² > chi-square critical value at alpha implies CH
31
What is SC?
Correlation among the error terms
32
Detecting SC
DW < critical value implies SC; BG > critical value implies SC
33
What is MC?
Correlation among the IVs
34
Detecting MC
VIF > 10; or a high R² and a significant F-test (p-value ≈ 0.0000) with insignificant individual t-stats implies MC
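The VIF threshold follows from the auxiliary regression of one IV on the others: VIF_j = 1/(1 − R_j²). A minimal sketch with hypothetical R² values:

```python
# Variance inflation factor from the auxiliary R^2 (hypothetical values).
def vif(r_squared):
    return 1.0 / (1.0 - r_squared)

print(vif(0.50))  # 2.0 -> little collinearity
print(vif(0.95))  # ~20 -> VIF > 10 flags multicollinearity
```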
35
No residual autocorrelation
If no residual autocorrelation coefficient is significant (none differs much from zero), fail to reject the null, which implies no SC
36
Testing joint coefficient hypotheses
H0: b_i = 0 for all tested i; the restricted model omits those variables and is compared with the unrestricted model
37
Unconditional heteroskedasticity
Error variance is not correlated with the IVs (causes no serious inference problems)
38
Autocorrelation t-stat
t = autocorrelation_i / SE; DW is not used (e.g., for AR models)
39
Random Walk
Not covariance stationary; b0 = 0, b1 = 1
40
First differencing
Transforms the series to covariance stationary; the differenced series has b0 = 0 and b1 = 0
41
RW with drift
b0 ≠ 0, b1 = 1
42
Multiple TS (both reject DF)
No unit root
43
Multiple TS (only one rejects)
Error term is not covariance stationary, which implies inconsistent SEs
44
Multiple TS (Both unit root)
Test for cointegration with Engle-Granger
45
No cointegration
Error term is not covariance stationary, which implies inconsistent SEs
46
Cointegration
OK! A long-term economic relationship exists
47
Prevent overfitting
Cross-validation (validation data drawn from the same domain as the training data)
48
K-fold cross-validation
Data shuffled randomly and divided into k subsamples (the larger k, the better; target variable specified)
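The shuffle-and-split step can be sketched as index bookkeeping. `k_fold_indices` and its seed are hypothetical; libraries such as scikit-learn offer the same idea as `KFold`.

```python
import random

# Sketch of k-fold splitting: shuffle the row indices, cut them into k
# folds, and let each fold serve once as the validation set while the
# remaining k-1 folds form the training set. n, k, seed are hypothetical.
def k_fold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
            for i in range(k)]

splits = k_fold_indices(10, 5)
print(len(splits))  # 5 (train, validation) pairs
```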
49
Manage overfitting
Trade-off between cost (error) and complexity
50
Neural Networks (NN)
Input data scaled to (0,1); weights chosen to minimize the loss function
51
Deep learning nets
Neural networks with many hidden layers (at least 3, typically >20)
52
Reinforcement Learning
Unsupervised: learns by testing new actions through millions of trials and errors
53
Data sets
Training, validation (tuning the model), test
54
Characteristics of data sets
Volume (quantity of data), variety (structured and unstructured), velocity, and veracity
55
Process of data sets
Conceptualize the task → collect data (curation) → preparation (wrangling) → exploration → training
56
Invalid data
Outside of a meaningful range
57
Inaccurate data
Values that are not true
58
Inconsistent data
e.g., NY listed in Canada
59
Non-uniform data
e.g., Jan-15 vs. 15/01
60
Extraction of data
A new variable created from a current one
61
Aggregation of data
More than one variable aggregated into one
62
Selection of data
Columns that can be eliminated
63
Conversion of data
Change the type
64
Trimming
Delete the top and bottom 1% of observations
65
winsorization
Replace outliers with min/max values
66
Scaling normalization
(Xi - min)/(max-min)
67
Scaling standardization
(Xi - μ)/σ
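The two scaling formulas above, applied to a hypothetical sample (population σ assumed in the standardization):

```python
# Min-max normalization: (x - min) / (max - min), mapped into [0, 1].
def normalize(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

# Z-score standardization: (x - mean) / sigma (population sigma assumed).
def standardize(xs):
    m = sum(xs) / len(xs)
    sigma = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sigma for x in xs]

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample
print(normalize(data))    # [0.0, 0.333..., 0.666..., 1.0]
print(standardize(data))  # symmetric around 0
```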
68
Text cleansing
Remove unnecessary characters
69
Tokenization
Splitting text into tokens
70
Normalizing data
Lowercasing; removing stop words such as prepositions
71
Stemming
"analyzed" → "analyz" (suffix stripped)
72
Lemmatization
"analyzed" → "analyze" (dictionary form)
73
One hot encoding
Converting a categorical variable into multiple binary (0/1) dummy variables, one per category
74
Exploratory analysis
Appropriate charts (for structured and unstructured data)
75
Feature selection
Eliminate unneeded features
76
Feature engineering
Creating new features by combining or decomposing existing ones
77
Feature selection methods
Frequency ratio (too frequent → underfitting vs. too sparse → overfitting), chi-square test, mutual information (how a token's distribution relates to a class)
78
Model training
Method selection (based on the classification and size of the data)
79
Precision
TP/(TP+FP); use when Type I errors (false positives) are costly
80
Recall
TP/(TP+FN); use when Type II errors (false negatives) are costly
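Both formulas checked on hypothetical confusion-matrix counts:

```python
# Precision penalizes false positives (Type I); recall penalizes false
# negatives (Type II). The counts below are hypothetical.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

print(precision(8, 2))  # 0.8
print(recall(8, 8))     # 0.5
```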