Machine Learning Flashcards

Question 1

Q

logistic regression

Answer

A

Below is an example logistic regression equation:

y = e^(b0 + b1x) / (1 + e^(b0 + b1x))

Where y is the predicted output, b0 is the bias or intercept term and b1 is the coefficient for the single input value (x). Each column in your input data has an associated b coefficient (a constant real value) that must be learned from your training data.
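
A minimal sketch of the equation above for a single input. The function name and the coefficient values are illustrative assumptions, not part of the original card; real coefficients would be learned from training data.

```python
import math

def predict_probability(x, b0, b1):
    """Logistic regression output y for a single input x."""
    z = b0 + b1 * x
    # 1 / (1 + e^(-z)) is algebraically the same as e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen so that b0 + b1 * x = 0.
p = predict_probability(x=2.0, b0=-1.0, b1=0.5)
```

When b0 + b1x is 0 the prediction sits exactly on the 0.5 decision boundary.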

Question 2

Q

random forest

Answer

A

an ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

Random forests are a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance. This comes at the expense of a small increase in the bias and some loss of interpretability, but generally greatly boosts the performance in the final model.

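Bags both rows and features: each tree is trained on a bootstrap sample of the training set (drawn with replacement), and each split considers only a random subset of the features.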
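
The two sources of randomness and the final vote can be sketched as below. This is only an outline under stated assumptions: the helper names are hypothetical, and training the individual decision trees is left out.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at one split."""
    return rng.sample(range(n_features), k)

def majority_vote(tree_predictions):
    """Classification output: the mode of the individual trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample([1, 2, 3, 4], rng)
features = random_feature_subset(n_features=10, k=3, rng=rng)
vote = majority_vote(["cat", "dog", "cat"])
```

For regression the vote would be replaced by the mean of the trees' predictions.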

Question 3

Q

GINI impurity

Answer

A

a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled at random according to the distribution of labels in the set.
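
A short sketch of the usual formula, 1 minus the sum of squared class proportions. The function name is an assumption made for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of (class proportion)^2."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

mixed = gini_impurity(["a", "a", "b", "b"])  # evenly split between two classes
pure = gini_impurity(["a", "a", "a"])        # a single class
```

A pure node has impurity 0, and an even two-class split has impurity 0.5.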

Question 4

Q

LDA

Answer

A

Latent Dirichlet allocation: a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.

Question 5

Q

bias variance tradeoff

Answer

A

the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training set.

The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).

The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).

Question 6

Q

box-cox

Answer

A

power transformation

a data transformation technique used to stabilize variance and make the data follow a normal distribution more closely
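
A minimal sketch of the one-parameter Box-Cox transform for a single positive value. Choosing lambda (typically by maximum likelihood) is not shown, and the function name is an assumption.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform: (x^lam - 1) / lam, or ln(x) when lam == 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

square_root_like = box_cox(4.0, 0.5)  # (sqrt(4) - 1) / 0.5
log_case = box_cox(1.0, 0)            # ln(1)
```

lam = 1 only shifts the data, lam = 0.5 behaves like a square root, and lam = 0 is the log transform.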

Question 7

Q

stochastic gradient descent

Answer

A

a stochastic approximation of gradient descent: an iterative method for minimizing an objective function that is written as a sum of differentiable functions, updating the parameters using the gradient of one term (or a small batch of terms) at a time
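
As an illustration, the sketch below fits a line by least squares, where the objective is a sum of per-example squared errors. The learning rate, epoch count, and toy data are assumptions chosen only so the example converges.

```python
import random

def sgd_fit(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b by minimizing squared error, one example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error^2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1, no noise
w, b = sgd_fit(xs, ys)
```

Each update uses the gradient of a single term of the sum rather than the full gradient.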

Question 8

Q

AIC

Answer

A

Akaike information criterion (AIC)

k = number of estimated parameters
L = maximized value of the likelihood function

AIC = 2k - 2*ln(L)

Leave-one-out cross-validation is asymptotically equivalent to AIC, for ordinary linear regression models
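
A one-line sketch of the formula. It takes ln(L) directly, since fitted models usually report the log-likelihood; the function name and example numbers are made up for illustration.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2*ln(L), where log_likelihood is ln(L) at the optimum."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fits: the larger model gains little likelihood for 3 extra parameters.
small_model = aic(log_likelihood=-100.0, k=3)
large_model = aic(log_likelihood=-99.5, k=6)
```

Lower AIC is preferred, so here the smaller model wins.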

Question 9

Q

BIC

Answer

A

Bayesian information criterion

BIC = ln(n)*k - 2*ln(L)

k = number of estimated parameters
L = maximized value of the likelihood function
n = number of observations
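
A small sketch of the formula, again taking ln(L) as input. The function name and example values are assumptions.

```python
import math

def bic(log_likelihood, k, n):
    """BIC = ln(n)*k - 2*ln(L), where log_likelihood is ln(L) at the optimum."""
    return math.log(n) * k - 2 * log_likelihood

# Hypothetical fit: 3 parameters, 100 observations.
score = bic(log_likelihood=-100.0, k=3, n=100)
```

The per-parameter penalty is ln(n), which exceeds 2 once n is 8 or more, so BIC tends to favor smaller models as the sample grows.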

Question 10

Q

DIC

Answer

A

deviance information criterion

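D(theta) = -2*log(p(y | theta)) + C (C is a constant that cancels when comparing models)

D_bar = posterior mean of D(theta)
theta_bar = posterior mean of theta

p_D = D_bar - D(theta_bar) or
p_D = 1/2 var(D(theta))

DIC = p_D + D_bar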
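
A sketch of both p_D variants, assuming the deviance has already been evaluated at each posterior draw. The function and argument names are hypothetical, and the sample values are made up.

```python
from statistics import fmean, variance

def dic(deviance_samples, deviance_at_posterior_mean=None):
    """Return (DIC, p_D) from deviances evaluated at posterior draws of theta."""
    d_bar = fmean(deviance_samples)  # posterior mean deviance, D_bar
    if deviance_at_posterior_mean is not None:
        # p_D = D_bar - D(theta_bar)
        p_d = d_bar - deviance_at_posterior_mean
    else:
        # Alternative: p_D = 1/2 * var(D(theta)), using the sample variance.
        p_d = 0.5 * variance(deviance_samples)
    return p_d + d_bar, p_d

samples = [10.0, 12.0, 14.0]
dic_a, p_d_a = dic(samples, deviance_at_posterior_mean=11.0)
dic_b, p_d_b = dic(samples)
```

The two definitions of p_D generally give different values, as they do on this toy input.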

Question 11

Q

SVM

Answer

A

a model that constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.
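
To make "distance to the nearest training-data point" concrete, the sketch below measures it for a given hyperplane w·x + b = 0. The function name and the toy points are assumptions, and finding the hyperplane that maximizes this distance (the actual SVM training problem) is not shown.

```python
import math

def margin(w, b, points):
    """Smallest Euclidean distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x in points
    )

# Hypothetical hyperplane and points in 2-D.
m = margin(w=(3.0, 4.0), b=0.0, points=[(1.0, 0.0), (0.0, 2.0)])
```

Taking the absolute value ignores which side each point is on, so this assumes every point is already correctly classified.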

Question 12

Q

kernel trick

Answer

A

The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all x and x_prime in the input space chi, certain functions k(x,x_prime) can be expressed as an inner product in another space V. The function k: chi x chi -> R is often referred to as a kernel or a kernel function.
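
A small worked example: the degree-2 polynomial kernel on 2-D inputs equals an inner product in a 3-D space V. The specific kernel and feature map are chosen here for illustration and are not from the original card.

```python
import math

def poly2_kernel(x, y):
    """k(x, y) = (x . y)^2, computed entirely in the 2-D input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit map into V = R^3 whose inner product reproduces the kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, x_prime = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly2_kernel(x, x_prime)
via_mapping = dot(feature_map(x), feature_map(x_prime))
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors.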

Question 13

Q

Odds ratio logistic regression

Answer

A

ln(p(X) / (1 - p(X))) = b0 + b1 * X

Left side is the log odds (logit); p(X) / (1 - p(X)) is the odds. The odds ratio for a one-unit increase in X is e^b1.
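
A quick numerical check that the ratio of odds at X + 1 and at X is e^b1, whatever X is. The coefficients are hypothetical and the function names are assumptions.

```python
import math

def probability(x, b0, b1):
    """p(X) from the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

b0, b1 = -1.0, 0.7  # hypothetical coefficients
odds_ratio = odds(probability(3.0, b0, b1)) / odds(probability(2.0, b0, b1))
```

Here odds_ratio matches math.exp(b1) up to floating-point error.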

Question 14

Q

Logistic regression assumptions

Answer

A

Binary output variable

No error in the output variable y (remove outliers and mislabeled instances first)

Linear relationship between the inputs and the log odds of the output; inputs with a non-linear relationship must be transformed first (Box-Cox, log, root)

Highly correlated inputs should be removed (check pairwise correlations)

Can fail to converge if there are many collinear inputs or the data is sparse

Observations independent

Large sample

Question 15

Q

Logit

Answer

A

Inverse of logistic (sigmoid)

Log odds when logistic represents a probability
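
A round-trip sketch showing that logit undoes the sigmoid. The function names are assumptions.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds of a probability p, 0 < p < 1."""
    return math.log(p / (1.0 - p))

round_trip = logit(sigmoid(1.5))
even_odds = logit(0.5)
```

A probability of 0.5 corresponds to log odds of 0.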

Question 16

Q

Probit model

Answer

A

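type of regression where the dependent variable can take only two values, for example married or not married. The probability of the outcome is modeled with the cumulative distribution function of the standard normal distribution, rather than the logistic function.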

Question 17

Q

k means

Answer

A

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster
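
One common heuristic for this objective is Lloyd's algorithm, which alternates between assigning points to the nearest mean and recomputing the means. The card does not name an algorithm, so this choice, the 1-D data, and the starting centers are all assumptions.

```python
def k_means_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data: assign to nearest mean, then update means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0.0, 5.0])
```

On this toy input the means settle at the centers of the two obvious groups.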

Question 18

Q

extra trees

Answer

A

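same as a random forest, except that split points are chosen at random: for each feature in the random subset, a threshold is drawn at random from that feature's range, and the best of these random splits is used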
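
The random split step can be sketched as drawing a threshold uniformly from a feature's observed range. The function name is hypothetical, and scoring the candidate splits and building the trees are not shown.

```python
import random

def random_split_threshold(values, rng):
    """Draw a split threshold uniformly between the feature's min and max."""
    return rng.uniform(min(values), max(values))

rng = random.Random(0)
feature_values = [2.0, 3.5, 7.0, 9.0]  # made-up values of one feature
threshold = random_split_threshold(feature_values, rng)
```

A random forest would instead search the candidate thresholds for the best one on that feature.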

(18 cards)