Practical Statistics Flashcards

1
Q

Deviations

A

The difference between the observed values and the estimate of location

errors, residuals

2
Q

Variance

A

The sum of squared deviations from the mean divided by n-1 where n is the number of data values

mean-squared-error

3
Q

Standard Deviation

A

The square root of the variance

4
Q

Mean absolute deviation

A

The mean of the absolute values of the deviations from the mean

l1-norm, Manhattan norm

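These three variability measures can be computed directly from their definitions. A minimal Python sketch (the `data` list is a hypothetical example):

```python
# Variance, standard deviation, and mean absolute deviation, computed
# directly from their definitions; `data` is a hypothetical example.
import statistics

data = [1, 4, 4, 7, 9]
mean = statistics.mean(data)
deviations = [x - mean for x in data]          # observed value minus the mean

variance = sum(d ** 2 for d in deviations) / (len(data) - 1)   # n - 1 denominator
std_dev = variance ** 0.5                                      # square root of the variance
mad = sum(abs(d) for d in deviations) / len(data)              # mean absolute deviation

# Sanity check against the standard library's sample statistics
assert abs(variance - statistics.variance(data)) < 1e-9
assert abs(std_dev - statistics.stdev(data)) < 1e-9
```
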
5
Q

Sample statistic

A

A metric calculated for a sample of data drawn from a larger population

6
Q

Data Distribution

A

The frequency distribution of individual values in a data set

7
Q

Sampling distribution

A

The frequency distribution of a sample statistic over many samples or resamples

8
Q

Central limit theorem

A

The tendency of the sampling distribution to take on a normal shape as sample size rises

9
Q

Standard error

A

The variability (standard deviation) of a sample statistic over many samples (not to be confused with standard deviation, which, by itself, refers to the variability of individual data values)

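Cards 6 through 9 can be illustrated with one small simulation: draw many samples from a skewed population and look at the distribution of the sample mean. A sketch in Python (the population and sizes are hypothetical):

```python
import random
import statistics

random.seed(0)
# A skewed population: hypothetical exponentially distributed values
population = [random.expovariate(1.0) for _ in range(100_000)]

# Sampling distribution: the distribution of a sample statistic (here the
# mean) over many samples. Per the CLT it looks more normal as n grows.
sample_means = [statistics.mean(random.sample(population, 50)) for _ in range(2000)]

print(statistics.mean(sample_means))   # close to the population mean, 1.0
print(statistics.stdev(sample_means))  # the standard error of the mean
```
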
10
Q

Bootstrap sample

A

A sample taken with replacement from an observed data set

powerful tool for assessing the variability of a sample statistic

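A minimal bootstrap sketch in Python (the observed data and the number of resamples are hypothetical). It estimates the standard error of the mean and, anticipating the next two cards, a percentile confidence interval:

```python
import random

random.seed(1)
observed = [12, 15, 9, 22, 17, 14, 11, 19, 16, 13]   # hypothetical sample

def mean(xs):
    return sum(xs) / len(xs)

# Each bootstrap sample is drawn WITH replacement, same size as the original.
boot_means = [mean(random.choices(observed, k=len(observed))) for _ in range(5000)]

# Standard error: the standard deviation of the statistic over the resamples.
grand = mean(boot_means)
se = (sum((m - grand) ** 2 for m in boot_means) / (len(boot_means) - 1)) ** 0.5

# Percentile confidence interval: the interval endpoints are quantiles
# of the bootstrap distribution (here a 90% interval).
boot_means.sort()
lower, upper = boot_means[250], boot_means[4749]
print(f"SE ~ {se:.2f}, 90% CI ~ ({lower:.1f}, {upper:.1f})")
```
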
11
Q

Resampling

A

The process of taking repeated samples from observed data; includes both bootstrap and permutation procedures

12
Q

Confidence level

A

The percentage of confidence intervals, constructed in the same way from the same population, that are expected to contain the statistic of interest

13
Q

Interval endpoints

A

The top and bottom of the confidence interval

14
Q

Error

A

The difference between a data point and a predicted or average value

15
Q

Standardize

A

Subtract the mean and divide by the standard deviation

16
Q

z-score

A

The result of standardizing an individual data point

17
Q

Standard normal

A

A normal distribution with mean = 0 and standard deviation = 1

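The last three cards fit together in a few lines. A sketch in Python with a hypothetical sample:

```python
import statistics

data = [120, 135, 128, 150, 142, 131]   # hypothetical measurements
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# Standardize: subtract the mean and divide by the standard deviation.
# Each result is a z-score.
z_scores = [(x - mu) / sigma for x in data]

# The z-scores have mean ~0 and standard deviation ~1, the scale of the
# standard normal; note that standardizing does not make data normal.
print([round(z, 2) for z in z_scores])
```
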
18
Q

Tail

A

The long narrow portion of a frequency distribution, where relatively extreme values occur at low frequency

19
Q

Skew

A

Where one tail of a distribution is longer than the other

20
Q

Trial

A

An event with a discrete outcome (e.g. a coin flip)

21
Q

Success

A

The outcome of interest for a trial

“1” (as opposed to “0”)

22
Q

Binomial

A

Having two outcomes

yes/no, 0/1, binary

23
Q

Binomial Trial

A

A trial with two outcomes

Bernoulli trial

24
Q

Binomial distribution

A

Distribution of the number of successes in n trials, parameterized by p. Can be approximated by a normal distribution when n is large and p is not too close to 0 or 1

Bernoulli distribution (strictly, the special case with n = 1)

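As one way to explore this distribution, SciPy exposes it directly; the n and p values below are hypothetical:

```python
from scipy import stats

n, p = 200, 0.02   # hypothetical: 200 trials, 2% success probability

print(stats.binom.pmf(0, n, p))    # probability of exactly 0 successes
print(stats.binom.cdf(5, n, p))    # probability of 5 or fewer successes

# Mean n*p and variance n*p*(1 - p) of the distribution
print(stats.binom.mean(n, p), stats.binom.var(n, p))
```
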
25
Q

Lambda

A

The rate (per unit of time or space) at which events occur

26
Q

Poisson distribution

A

The frequency distribution of the number of events in sampled units of time or space

27
Q

Exponential distribution

A

The frequency distribution of the time or distance from one event to the next event

28
Q

Weibull distribution

A

A generalized version of the exponential distribution in which the event rate is allowed to shift over time
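
Cards 25 through 28 are easy to explore by simulation. A minimal NumPy sketch, assuming a hypothetical rate of 2 events per unit of time:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0   # lambda: hypothetical rate of 2 events per unit of time

counts = rng.poisson(lam, size=10_000)        # Poisson: events per unit of time
gaps = rng.exponential(1 / lam, size=10_000)  # exponential: time between events

# Weibull generalizes the exponential: shape = 1 recovers a constant event
# rate; shape > 1 means the rate rises over time, shape < 1 means it falls.
lifetimes = rng.weibull(1.5, size=10_000)

print(counts.mean(), gaps.mean())   # should be near lam and 1/lam respectively
```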

29
Q

Treatment

A

Something (drug, price, web headline) to which a subject is exposed

30
Q

Treatment group

A

A group of subjects exposed to a specific treatment

31
Q

Control group

A

A group of subjects exposed to no (or standard) treatment

32
Q

Subjects

A

The items (web visitors, patients, etc.) that are exposed to treatments

33
Q

Test statistic

A

The metric used to measure the effect of the treatment

34
Q

Null hypothesis

A

The hypothesis that chance is to blame

35
Q

Alternative hypothesis

A

Counterpoint to the null (what you hope to prove)

36
Q

One-way test

A

Hypothesis test that counts chance results only in one direction (e.g. B is better than A)

37
Q

Two-way test

A

Hypothesis test that counts chance results in two directions (e.g. A is different from B; could be bigger or smaller)

38
Q

Permutation test

A

The procedure of combining two or more samples together and randomly (or exhaustively) reallocating the observations to resamples

randomization test, random permutation test, exact test
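
A minimal permutation test in plain Python; the two groups and the choice of the mean difference as test statistic are hypothetical:

```python
import random

random.seed(0)
group_a = [23, 19, 31, 25, 27]   # hypothetical measurements, treatment A
group_b = [30, 33, 28, 41, 35]   # hypothetical measurements, treatment B

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = mean(group_b) - mean(group_a)

# Combine the samples, then repeatedly shuffle and re-split at the
# original group sizes, recording the test statistic each time.
combined = group_a + group_b
perm_diffs = []
for _ in range(10_000):
    random.shuffle(combined)
    perm_diffs.append(mean(combined[len(group_a):]) - mean(combined[:len(group_a)]))

# One-way p-value: the share of permuted differences at least as
# extreme as the observed difference (see card 40).
p_value = sum(d >= observed_diff for d in perm_diffs) / len(perm_diffs)
print(p_value)
```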

39
Q

Resampling

A

Drawing additional samples ("resamples") from an observed data set

40
Q

p-value

A

Given a chance model that embodies the null hypothesis, the p-value is the probability of obtaining results as unusual or extreme as the observed results

It is not the answer to "What is the probability that this happened by chance?"

41
Q

Alpha

A

The probability threshold of "unusualness" that chance results must surpass for actual outcomes to be deemed statistically significant

typically 5% or 1%

42
Q

Type I error

A

Mistakenly concluding an effect is real (when it is due to chance)

43
Q

Type II error

A

Mistakenly concluding an effect is due to chance (when it is real)

44
Q

Multi-arm bandit

A

An imaginary slot machine with multiple arms for the customer to choose from, each with different payoffs; here taken to be an analogy for a multitreatment experiment. It alters the traditional sampling process to incorporate information learned during the experiment and to reduce the frequency of inferior treatments

epsilon-greedy

45
Q

Arm

A

A treatment in an experiment (e.g. "headline A in a web test")

46
Q

Win

A

The experimental analog of a win at the slot machine (e.g. "customer clicks on the link")

47
Q

Effect size

A

The minimum size of the effect that you hope to be able to detect in a statistical test, such as "a 20% improvement in click rates". The bigger the effect size, the fewer samples you generally need to detect it

48
Q

Power

A

The probability of detecting a given effect size with a given sample size

49
Q

Significance level

A

The statistical significance level at which the test will be conducted

alpha
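
Effect size, power, and significance level come together in sample-size calculations. A sketch using statsmodels, with hypothetical targets:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the sample size per group needed to detect a standardized
# effect size of 0.5 with 80% power at alpha = 0.05 (two-sample t-test).
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # a larger effect size would need fewer samples
```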

50
Q

Response

A

The variable we are trying to predict

dependent variable, Y variable, target, outcome

51
Q

Independent variable

A

The variable used to predict the response

X variable, feature, attribute, predictor

52
Q

Record

A

The vector of predictor and outcome values for a specific individual or case

row, case, instance, example

53
Q

Intercept

A

The intercept of the regression line, that is, the predicted value when X = 0

b_0, B_0

54
Q

Regression coefficient

A

The slope of the regression line

slope, b_1, B_1, parameter estimates, weights

55
Q

Fitted values

A

The estimates Y_hat_i obtained from the regression line

predicted values

56
Q

Residuals

A

The difference between the observed values and the fitted values

errors

57
Q

Least squares

A

The method of fitting a regression by minimizing the sum of squared residuals

ordinary least squares, OLS

58
Q

Root mean squared error

A

The square root of the average squared error of the regression (this is the most widely used metric to compare regression models)

RMSE

59
Q

Residual standard error

A

The same as the root mean squared error, but adjusted for degrees of freedom

RSE

60
Q

R-squared

A

The proportion of variance explained by the model, from 0 to 1

coefficient of determination, R^2
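
Cards 50 through 60 can all be read off one small least squares fit. A NumPy sketch with hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])   # hypothetical response

# Least squares fit: slope b_1 and intercept b_0
b_1, b_0 = np.polyfit(x, y, deg=1)

fitted = b_0 + b_1 * x      # fitted (predicted) values, Y_hat
residuals = y - fitted      # observed minus fitted

rmse = np.sqrt(np.mean(residuals ** 2))
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"intercept={b_0:.2f} slope={b_1:.2f} RMSE={rmse:.3f} R^2={r_squared:.3f}")
```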

61
Q

t-statistic

A

The coefficient for a predictor, divided by the standard error of the coefficient, giving a metric to compare the importance of variables in the model

62
Q

Weighted regression

A

Regression with the records having different weights

63
Q

Correlated variables

A

When the predictor variables are highly correlated, it is difficult to interpret the individual coefficients

64
Q

Multicollinearity

A

When the predictor variables have perfect, or near-perfect, correlation, the regression can be unstable or impossible to compute

collinearity

65
Q

Confounding variables

A

An important predictor that, when omitted, leads to spurious relationships in a regression equation

66
Q

Main effects

A

The relationship between a predictor and the outcome variable, independent of other variables

67
Q

Interactions

A

An interdependent relationship between two or more predictors and the response

68
Q

Conditional probability

A

The probability of observing some event (say, X = i) given some other event (say, Y = j), written as P(X_i | Y_j)

69
Q

Posterior probability

A

The probability of an outcome after the predictor information has been incorporated (in contrast to the prior probability of outcomes, not taking predictor information into account)

70
Q

Covariance

A

A measure of the extent to which one variable varies in concert with another (i.e., similar magnitude and direction)

71
Q

Discriminant function

A

The function that, when applied to the predictor variables, maximizes the separation of the classes. Fisher's linear discriminant maximizes the "between" sum of squares relative to the "within" sum of squares

72
Q

Discriminant weights

A

The scores that result from the application of the discriminant function and are used to estimate probabilities of belonging to one class or another

73
Q

Logit

A

The function that maps class membership probability to a range from negative to positive infinity

log odds

74
Q

Odds

A

The ratio of "success" (1) to "not success" (0), i.e., the probability of an event divided by the probability that the event will not occur

75
Q

Log odds

A

The response in the transformed model (now linear), which gets mapped back to a probability

76
Q

Logistic Regression

A

Analogous to multiple linear regression, but the outcome is binary. It is a special instance of a generalized linear model (GLM), fit with maximum likelihood estimation

77
Q

Maximum Likelihood Estimation (MLE)

A

A process that tries to find the model that is most likely to have produced the data we see. It involves a quasi-Newton optimization that iterates between a scoring step, based on the current parameters, and an update to the parameters to improve the fit
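
A minimal logistic regression sketch with scikit-learn, tying together logit, odds, log odds, and maximum likelihood fitting; the data and single predictor are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one predictor, binary outcome
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()   # fit by (regularized) maximum likelihood
model.fit(X, y)

probs = model.predict_proba(X)[:, 1]   # class-membership probabilities
odds = probs / (1 - probs)             # P(event) / P(no event)
log_odds = np.log(odds)                # logit: maps (0, 1) to (-inf, +inf)

# In this GLM the log odds, not the probability, are linear in the predictor.
print(model.intercept_, model.coef_)
```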

78
Q

Recall

A

tp / (tp + fn)

sensitivity, TPR, hit rate

79
Q

Precision

A

tp / (tp + fp)

80
Q

Specificity

A

tn / (tn + fp)

true negative rate

81
Q

F1 Score

A

The harmonic mean of precision and recall: 2 * Recall * Precision / (Recall + Precision)
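
The four metrics above, computed from a hypothetical set of confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
tp, fp, tn, fn = 40, 10, 30, 20

recall = tp / (tp + fn)         # sensitivity / TPR: share of actual positives found
precision = tp / (tp + fp)      # share of predicted positives that are correct
specificity = tn / (tn + fp)    # true negative rate
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(recall, precision, specificity, round(f1, 3))
```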

82
Q

ROC curve

A

The plot of the true positive rate (TPR, recall, y-axis) against the false positive rate (FPR, x-axis) at various threshold settings. Some definitions use specificity (TNR) for the x-axis

83
Q

Bias error

A

Error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting). More tunable parameters -> lower bias -> higher variance

84
Q

Variance error

A

Error from sensitivity to small fluctuations in the training set. High variance may result from an algorithm modeling the random noise in the training data (overfitting). More tunable parameters -> lower bias -> higher variance

85
Q

Convex and non-convex functions

A

Convex: a single minimum; importantly, an optimization algorithm (like gradient descent) won't get stuck in a local minimum. Non-convex: has valleys (local minima) that are not as low as the overall minimum (the global minimum); optimization algorithms can get stuck in a local minimum, and it can be hard to tell when this happens

86
Q

Kullback-Leibler divergence

A

A measure of how one probability distribution diverges from a second, expected probability distribution

KL divergence

87
Q

Kolmogorov-Smirnov test

A

A nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare two samples (two-sample K-S test)
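
Both measures are available in SciPy. A sketch with hypothetical distributions and samples:

```python
import numpy as np
from scipy import stats

# KL divergence between two hypothetical discrete distributions;
# stats.entropy(p, q) computes sum(p * log(p / q)).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(stats.entropy(p, q))

# Two-sample Kolmogorov-Smirnov test on hypothetical samples: a small
# p-value suggests the samples come from different distributions.
rng = np.random.default_rng(0)
sample_a = rng.normal(0.0, 1.0, 500)
sample_b = rng.normal(0.3, 1.0, 500)
res = stats.ks_2samp(sample_a, sample_b)
print(res.statistic, res.pvalue)
```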

88
Q

ANOVA

A

Analysis of variance: a statistical method used to test for differences between the means of two or more groups

89
Q

PCA

A

Principal Component Analysis: an orthogonal transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The transformation is defined so that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors form an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of the original variables
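
A minimal PCA sketch with scikit-learn on hypothetical correlated data, standardizing first because PCA is scale-sensitive:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 200)
x2 = 0.8 * x1 + rng.normal(0.0, 0.3, 200)   # hypothetical correlated variable
X = np.column_stack([x1, x2])

# PCA is sensitive to the relative scaling of the variables, so standardize.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)    # uncorrelated principal components

# The first component captures the largest share of the variance.
print(pca.explained_variance_ratio_)
```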

90
Q

p-value principles

A

1. p-values can indicate how incompatible the data are with a specified statistical model
2. p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold
4. Proper inference requires full reporting and transparency
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis