# Intro Statistics Flashcards

central limit theorem

If x_bar is the mean of a random sample X1, X2, …, Xn of size n from a distribution with a finite mean mu and a finite positive variance sigma^2, then the distribution of W = (x_bar - mu) / (sigma / sqrt(n)) is N(0, 1) in the limit as n approaches infinity.

This means that, for large n, x_bar is approximately distributed N(mu, sigma^2/n), i.e. with standard deviation sigma/sqrt(n).
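A quick simulation sketch of the theorem: draw many samples from a clearly non-normal distribution (uniform on [0, 1], so mu = 0.5 and sigma^2 = 1/12), standardize each sample mean as W, and check that the W values look standard normal. The sample size and repetition counts are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)

# Uniform(0, 1): mu = 0.5, sigma = sqrt(1/12)
n = 50
mu, sigma = 0.5, math.sqrt(1 / 12)

# Standardize each sample mean: W = (x_bar - mu) / (sigma / sqrt(n))
w_values = []
for _ in range(5000):
    sample = [random.random() for _ in range(n)]
    x_bar = sum(sample) / n
    w_values.append((x_bar - mu) / (sigma / math.sqrt(n)))

# By the CLT, W should be approximately N(0, 1): mean near 0, stdev near 1.
print(round(statistics.mean(w_values), 2))
print(round(statistics.stdev(w_values), 2))
```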

binomial distribution

with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)

C(n, k) = n! / (k! (n - k)!)

mu = n*p, sigma^2 = n*p*(1-p)
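The pmf formula above translates directly into code; `binom_pmf` is a hypothetical helper name, and the parameter values are arbitrary. The probabilities over all k should sum to 1, and the weighted mean should come out to n*p.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

# Sanity checks: probabilities sum to 1, and the mean equals n*p = 3.0.
mean = sum(k * pk for k, pk in enumerate(probs))
print(round(sum(probs), 6))  # 1.0
print(round(mean, 6))        # 3.0
```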

Accuracy

the proportion of true results (both true positives and true negatives) among the total number of cases examined.

accuracy = (tp + tn) / (tp + tn + fp + fn)

Precision

precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances

precision = tp / (tp + fp)

Recall

recall (also known as sensitivity) is the fraction of relevant instances that were retrieved, out of the total number of relevant instances

recall = tp / (tp + fn)
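The three metrics above can be computed together from one set of confusion-matrix counts. The counts below are made up for illustration; note that a classifier can have perfect recall while precision stays below 1.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical counts: 40 true positives, 50 true negatives,
# 10 false positives, 0 false negatives.
acc, prec, rec = metrics(tp=40, tn=50, fp=10, fn=0)
print(acc, prec, rec)  # 0.9 0.8 1.0
```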

type I error

a type I error is the rejection of a true null hypothesis (also known as a “false positive” finding)

a type I error is to falsely infer the existence of something that is not there

type II error

a type II error is retaining a false null hypothesis (also known as a “false negative” finding)

a type II error is to falsely infer the absence of something that is there

Kullback–Leibler divergence

a measure of how one probability distribution diverges from a second, expected probability distribution
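For discrete distributions the divergence is D_KL(P || Q) = sum_i p_i * log(p_i / q_i). A minimal sketch (the function name and the two example distributions are arbitrary): a distribution diverges from itself by 0, and the divergence is positive when P and Q differ. Note it is not symmetric.

```python
from math import log

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), skipping terms with p_i = 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))            # 0.0 for identical distributions
print(round(kl_divergence(p, q), 4))  # positive when the distributions differ
```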

Kolmogorov–Smirnov test

is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test)
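The one-sample K–S statistic is just the largest vertical distance between the empirical CDF and the reference CDF. A minimal sketch (helper name and sample sizes are arbitrary; this computes only the statistic, not the p-value): against the matching CDF the statistic should be small, against a mismatched one it should be large.

```python
import random

def ks_statistic(sample, cdf):
    """One-sample K-S statistic: max distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Check the ECDF step both just before and just after each point.
        d = max(d, abs(i / n - cdf(x)), abs((i - 1) / n - cdf(x)))
    return d

random.seed(1)
uniform_sample = [random.random() for _ in range(1000)]

# Against the true Uniform(0, 1) CDF (F(x) = x) the statistic is small;
# against a mismatched CDF (F(x) = x**2) it is large.
print(round(ks_statistic(uniform_sample, lambda x: x), 3))
print(round(ks_statistic(uniform_sample, lambda x: x * x), 3))
```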

Bootstrap

statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient.
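A sketch of the bootstrap for the standard error and a percentile confidence interval of a median (data, seed, and replication count are arbitrary choices): resample with replacement, recompute the estimator each time, and read the spread off the bootstrap replicates.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(10, 2) for _ in range(100)]

# Resample with replacement, recomputing the estimator (the median) each time.
boot_medians = sorted(
    statistics.median([random.choice(data) for _ in data])
    for _ in range(2000)
)

se = statistics.stdev(boot_medians)            # bootstrap standard error
ci_low = boot_medians[49]                      # ~2.5th percentile of 2000
ci_high = boot_medians[1949]                   # ~97.5th percentile of 2000
print(round(se, 2), round(ci_low, 2), round(ci_high, 2))
```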

Jackknife

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the estimate and then finding the average of these calculations.
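A sketch of the leave-one-out procedure (toy data; the estimator here is the mean, for which the jackknife estimate coincides with the full-sample estimate):

```python
import statistics

data = [2.0, 4.0, 6.0, 8.0, 10.0]

# Leave each observation out in turn and recompute the estimate (the mean) ...
loo_means = [statistics.mean(data[:i] + data[i + 1:]) for i in range(len(data))]

# ... then average the leave-one-out estimates.
jackknife_estimate = statistics.mean(loo_means)
print(jackknife_estimate)  # 6.0 — for the mean this equals the full-sample mean
```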

Permutation test

the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points
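An exact two-sample sketch with made-up data: pool the observations, enumerate every way of relabeling four of the eight points as "group A", and compare the observed difference in means against that null distribution.

```python
import statistics
from itertools import combinations

group_a = [12.0, 11.5, 13.2, 12.8]
group_b = [10.1, 10.9, 9.8, 11.0]
observed = statistics.mean(group_a) - statistics.mean(group_b)

# Null distribution: difference in means under every rearrangement of labels.
pooled = group_a + group_b
n_a = len(group_a)
diffs = []
for idx in combinations(range(len(pooled)), n_a):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diffs.append(statistics.mean(a) - statistics.mean(b))

# Two-sided p-value: fraction of rearrangements at least as extreme as observed.
p_value = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
print(round(p_value, 3))
```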

Two tailed test

appropriate if the estimated value may be more than or less than the reference value, for example, whether a test taker may score above or below the historical average

One tailed test

appropriate if the estimated value may depart from the reference value in only one direction, for example, whether a machine produces more than one-percent defective products

Assessing normality

Subtract the mean and divide by the standard deviation, then compare the standardized values to standard normal quantiles (normal scores), as in a Q–Q plot
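A sketch of that check using only the standard library (`statistics.NormalDist` supplies the normal quantiles; the data and plotting-position formula `(i + 0.5) / n` are illustrative choices): for roughly normal data the standardized sample quantiles track the normal scores closely.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(50, 5) for _ in range(500)]

# Standardize: subtract the mean, divide by the standard deviation.
m, s = statistics.mean(data), statistics.stdev(data)
z = sorted((x - m) / s for x in data)

# Normal scores: standard normal quantiles at plotting positions (i + 0.5) / n.
std_normal = statistics.NormalDist()
n = len(z)
nscores = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

# For normal data the paired quantiles stay close (a Q-Q plot hugs the line).
max_gap = max(abs(a - b) for a, b in zip(z, nscores))
print(round(max_gap, 2))
```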

Box plot

Numerical variable, often split by a categorical variable; shows the shape of the distribution, its central value, and variability

Median: the center line

Box top and bottom: the first and third quartiles

Whiskers (vertical lines): extend up to 1.5 times the IQR beyond the box

Points beyond the whiskers are plotted individually

IQR

Interquartile range

Distance between first and third quartiles

Two way table

two-way table presents categorical data by counting the number of observations that fall into each combination of the categories of two categorical variables

Correlation coefficient

r = 1/(n-1) * Sum( ((x_i - x_mean)/std_x) * ((y_i - y_mean)/std_y) )
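The formula computed directly (hypothetical helper name; sample standard deviations, hence the 1/(n-1)): perfectly linear data gives r = ±1.

```python
import statistics

def correlation(xs, ys):
    """r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) / sx * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0]
print(round(correlation(xs, [2 * x for x in xs]), 6))  # 1.0: perfect positive
print(round(correlation(xs, [-x for x in xs]), 6))     # -1.0: perfect negative
```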

ANOVA

Analysis of variance is a statistical method used to test differences between two or more means, by comparing the variance between groups to the variance within groups
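A sketch of the one-way F statistic from scratch (hypothetical helper name; toy groups): identical group means give F = 0, widely separated means give a large F.

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.mean(x for g in groups for x in g)
    # Between-group sum of squares, df = k - 1
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares, df = n - k
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

same = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
shifted = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
print(one_way_f(same))     # 0.0 — identical group means
print(one_way_f(shifted))  # large — group means far apart
```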

Parameter

parameter is a number describing a population, such as a percentage or proportion.

e.g., the true proportion of defective items in the entire population

Statistic

is a number which may be computed from the data observed in a random sample without requiring the use of any unknown parameters, such as a sample mean.

e.g., take a sample of 300 items and observe that 15 of these are defective; the computed statistic p_hat = 15/300 = 0.05 is an estimate of the parameter p

Biased estimator

if a statistic is systematically skewed away from the true value of the parameter, it is considered a biased estimator of that parameter

Unbiased estimator

unbiased estimator will have a sampling distribution whose mean is equal to the true value of the parameter.