Statistics/ML Flashcards
Bonferroni
Alpha/m
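A tiny Python sketch of the correction (numpy is my choice here, not from the card): test each of m hypotheses at level alpha/m.
import numpy as np

alpha, pvals = 0.05, np.array([0.001, 0.02, 0.04])
m = len(pvals)
reject = pvals <= alpha / m   # only the first p-value survives 0.05/3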
Bagging
Take B bootstrap samples, fit a model to each, and average their predictions
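A minimal sketch of bagging a regression tree (scikit-learn and numpy assumed; illustrative, not a reference implementation).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X, y, X_new, B=100, seed=0):
    """Fit a tree to each of B bootstrap samples and average the predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, len(y), size=len(y))    # bootstrap: sample rows with replacement
        model = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds.append(model.predict(X_new))
    return np.mean(preds, axis=0)                     # average over the B models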
Random forest
Basically bagging with decision trees, but each split only considers a random subset of the covariates (e.g. p/3 of them for regression, or sqrt(p) for classification)
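In scikit-learn (my choice of library), the per-split subset is the max_features argument; a fraction of 1/3 mirrors the p/3 rule of thumb.
from sklearn.ensemble import RandomForestRegressor

# max_features controls how many covariates each split may consider
rf = RandomForestRegressor(n_estimators=500, max_features=1/3)
# rf.fit(X, y); rf.predict(X_new)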
Support vector machines
Maximize the margin M subject to every point being at least M from the decision boundary on the correct side, allowing a slack amount for points that violate this, with the total slack bounded by a budget C
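Written out in ISLR-style notation with slack variables eps_i (the LaTeX is mine, not from the card):
\max_{\beta_0,\, \beta,\, \varepsilon}\; M
\quad \text{s.t.} \quad \|\beta\| = 1,\quad
y_i\,(\beta_0 + x_i^\top \beta) \ge M(1 - \varepsilon_i),\quad
\varepsilon_i \ge 0,\quad \sum_i \varepsilon_i \le C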
Newton Raphson
x_1 = x_0 - f(x_0)/f’(x_0)
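A short Python sketch of the iteration (function names are mine):
def newton(f, fprime, x0, tol=1e-10, max_iter=100):
    """Iterate x <- x - f(x)/f'(x) until f(x) is (nearly) zero."""
    x = x0
    for _ in range(max_iter):
        x = x - f(x) / fprime(x)
        if abs(f(x)) < tol:
            break
    return x

# Example: root of x^2 - 2 starting from 1.0 -> approximately sqrt(2)
newton(lambda x: x**2 - 2, lambda x: 2*x, 1.0)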
Gradient descent
x_1 = x_0 - gamma * gradient of F
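A matching Python sketch with a fixed step size gamma (numpy assumed):
import numpy as np

def gradient_descent(grad_F, x0, gamma=0.1, n_steps=1000):
    """Repeatedly step against the gradient: x <- x - gamma * grad F(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - gamma * grad_F(x)
    return x

# Example: minimize F(x, y) = x^2 + y^2, whose gradient is (2x, 2y)
gradient_descent(lambda x: 2 * x, [3.0, -4.0])   # -> approximately [0, 0]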
Logit
P(y=1) = e^(X beta)/(e^(X beta)+1)
log(p/(1-p)) = X beta
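The two directions of the card as a small numpy sketch (function names are mine):
import numpy as np

def logistic_prob(X, beta):
    """P(y = 1 | X) = exp(X beta) / (1 + exp(X beta)), i.e. the inverse logit."""
    eta = X @ beta                       # linear predictor X beta
    return np.exp(eta) / (1 + np.exp(eta))

def logit(p):
    """Log odds: log(p / (1 - p))."""
    return np.log(p / (1 - p))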
K nearest neighbors
Use plurality vote for classification, or mean for regression
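A minimal sketch for one query point, assuming numpy arrays and Euclidean distance:
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5, classify=True):
    """Find the k nearest training points; plurality vote for classification, mean for regression."""
    dists = np.linalg.norm(X_train - x_new, axis=1)    # distances to every training point
    nearest = y_train[np.argsort(dists)[:k]]           # labels of the k closest points
    if classify:
        return Counter(nearest).most_common(1)[0][0]   # plurality vote
    return nearest.mean()                              # regression: average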
Std error
Std deviation of a statistic's sampling distribution, or an estimate of it. E.g. for the sample mean:
(1/sqrt(n)) * sqrt(sum(xi - xbar)^2 / (n-1)), i.e. s/sqrt(n)
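The same estimate in numpy (toy data, just to show the formula):
import numpy as np

x = np.array([4.1, 5.0, 5.3, 6.2, 4.8])
se_mean = x.std(ddof=1) / np.sqrt(len(x))   # s / sqrt(n), SE of the sample mean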
Normal density
(1/(sigma sqrt(2 pi))) e^(-(1/2)(x - mu)^2/sigma^2)
T test
Sample should be approximately normal, though the test is fine for large samples (CLT).
Tau-hat / se(tau-hat)
E.g. difference in means divided by
sigma-hat * sqrt(1/n1 + 1/n2), with pooled sigma-hat
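A worked two-sample example in Python (scipy used only as a check; toy data):
import numpy as np
from scipy import stats

x1 = np.array([5.1, 4.9, 6.0, 5.5, 5.2])
x2 = np.array([4.2, 4.8, 4.5, 5.0, 4.4, 4.6])
n1, n2 = len(x1), len(x2)

# Pooled estimate of sigma, then t = difference / (sigma_hat * sqrt(1/n1 + 1/n2))
sp = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2))
t = (x1.mean() - x2.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))

# Should match scipy's equal-variance two-sample t test
t_scipy, p = stats.ttest_ind(x1, x2, equal_var=True)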
Covariance of beta hat for regression
Sigma^2 (X'X)^(-1)
Estimate sigma^2 with
(1/(n-p)) * sum of (y - X beta)^2
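The same quantities computed directly in numpy (a sketch, not how you'd fit in practice):
import numpy as np

def ols_cov(X, y):
    """OLS fit plus the estimated covariance sigma^2_hat (X'X)^(-1) of beta-hat."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - p)    # (1/(n-p)) * sum of squared residuals
    return beta_hat, sigma2_hat * XtX_inv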
Law of large numbers
Lim as n -> inf of P(|mean(Y1,...,Yn) - mu| >= eps) = 0, for any eps > 0
Central Limit Theorem
Limit as n -> infinity of
P( (mean(Y1,...,Yn) - mu) / (sigma/sqrt(n)) <= z ) = Phi(z)
Ie (Ybar - mu) / (sigma/sqrt(n)) converges in distribution to a standard normal.
Note that Ybar has std dev sigma / sqrt(n)
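A quick simulation check in numpy (the exponential distribution and sample sizes are arbitrary choices): standardized sample means of a skewed distribution look approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 200, 1.0, 1.0                    # Exp(1) has mean 1 and sd 1
z = (rng.exponential(1.0, size=(10_000, n)).mean(axis=1) - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())                        # both close to 0 and 1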
SUTVA
Stable Unit Treatment Value Assumption
Response of one unit only depends on their treatment not on treatment of others
Eg if some people are assigned to travel by public transportation and others by car, SUTVA wouldn't hold, because each person's assignment affects the traffic everyone else faces
Kmeans
Start with k initial centroids.
Assign each point to its nearest centroid.
Recalculate the centroids as the means of their assigned points.
Repeat until the assignments stop changing.
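A plain numpy sketch of Lloyd's algorithm (ignores the empty-cluster edge case):
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Assign points to the nearest centroid, recompute centroids, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # initial k points
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                          # assignment step
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])  # update step
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels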
SMOTE
For class imbalance:
select a minority-class point at random,
then select one of its minority-class neighbors at random,
draw a line between them, pick a point on that line, and add it as a new synthetic minority-class observation.
You can also undersample the majority class and/or oversample (i.e. repeat) the minority class.
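A sketch of generating one synthetic point (numpy assumed; real SMOTE repeats this many times):
import numpy as np

def smote_sample(X_min, k=5, seed=0):
    """One synthetic point on the line between a random minority-class point
    and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    i = rng.integers(len(X_min))
    dists = np.linalg.norm(X_min - X_min[i], axis=1)
    j = rng.choice(np.argsort(dists)[1:k + 1])     # a random near neighbor (not the point itself)
    lam = rng.random()                             # where on the line the new point lands
    return X_min[i] + lam * (X_min[j] - X_min[i])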
Accuracy
(TP + TN)/
(TP + TN + FP + FN)
Precision
TP/(TP + FP)
PREcision is TP divided by PREdicted positive
Sensitivity
TP/(TP + FN)
Also called recall
SeNsitivity is Positive
Correct positives among all positives
Specificity
TN/(TN + FP)
SPIN: sPecificity is Negative
Correct negatives among all negatives
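All four metrics from confusion-matrix counts, with a worked toy example:
def confusion_metrics(tp, tn, fp, fn):
    """The four rates above, straight from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # a.k.a. recall
        "specificity": tn / (tn + fp),
    }

confusion_metrics(tp=80, tn=90, fp=10, fn=20)
# {'accuracy': 0.85, 'precision': 0.888..., 'sensitivity': 0.8, 'specificity': 0.9}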
Amazon and Netflix recommendation systems
Item-based collaborative filtering algorithms, using cosine distance between items
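A rough sketch of the item-item similarity step only (numpy assumed; scoring and ranking of unseen items is omitted):
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a user-by-item ratings matrix R."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    return (R.T @ R) / (norms.T @ norms + 1e-12)   # sim[i, j] = cosine of angle between items i and j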
Type 1 and 2
Type 1 False Positive, ie mistaken rejection of null hypothesis
Type 2 False Negative
softmax
exp(z_i) / (sum over j of exp(z_j))
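In numpy, with the usual max-subtraction trick for numerical stability (the trick is standard, not from the card):
import numpy as np

def softmax(z):
    """exp(z_i) / sum_j exp(z_j), computed stably."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

softmax([1.0, 2.0, 3.0])   # -> array([0.090..., 0.244..., 0.665...])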
imputing missing values
can just impute with mean or median etc. or some constant
“MICE stands for Multivariate Imputation via Chained Equations, and it's one of the most common packages for R users. It assumes the missing values are missing at random (MAR). The basic idea behind the algorithm is to treat each variable that has missing values as a dependent variable in regression and treat the others as independent (predictors).”
https://www.r-bloggers.com/2023/01/imputation-in-r-top-3-ways-for-imputing-missing-data/
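MICE itself is an R package; in Python, scikit-learn's IterativeImputer follows the same chained-regressions idea. A small sketch of both simple and chained imputation (toy data):
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

X_mean = SimpleImputer(strategy="mean").fit_transform(X)      # fill with column means
X_mice = IterativeImputer(random_state=0).fit_transform(X)    # regress each column on the others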
gbm hyperparameters
shrinkage or learning rate
add only a small multiple of each new tree at each step
bag fraction
“fraction of independent training observations selected to create the next tree in the
expansion. Introduces randomness in the model fit; if bag_fraction < 1 then running the
same model twice will result in similar but different fits.”
num_features: “number of random features/columns to use in training model. “
interaction depth: max depth of each tree (although different in gbm3)
some of this from gbm3 documentation
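An approximate mapping to scikit-learn's GradientBoostingRegressor parameter names (my mapping; R's gbm/gbm3 define some of these slightly differently):
from sklearn.ensemble import GradientBoostingRegressor

gbr = GradientBoostingRegressor(
    learning_rate=0.01,   # shrinkage: add only a small multiple of each new tree
    subsample=0.5,        # bag fraction: random subset of rows used to fit each tree
    max_features=0.8,     # fraction of random features considered per split
    max_depth=3,          # roughly the interaction-depth idea: how deep each tree grows
    n_estimators=1000,
)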
boxplot definitions
Box from the 25th to the 75th percentile (the IQR), line at the median, whiskers out to the most extreme points within 1.5*IQR of the box, points beyond that plotted as outliers
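The whisker limits and outliers computed by hand in numpy (toy data):
import numpy as np

x = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 12])
q1, med, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whiskers reach at most this far
outliers = x[(x < lower) | (x > upper)]          # here, just the 12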