terms and definitions Flashcards

1
Q

statistical model

A

a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population)

a statistical model represents, often in considerably idealized form, the data-generating process

a statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables
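One simple example (given here for illustration, not part of the original card): a simple linear regression model, Y = a + bX + ε with ε ~ N(0, σ²), is a statistical model; Y and ε are random variables, X may be treated as non-random, and the assumptions about ε embody an idealized data-generating process.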

2
Q

the problem of multiple comparisons

A

beware whenever someone runs many tests and then reports only the one that looks good

eg we flip 1000 fair coins 100 times each, then select the 10 “best” coins (those that came up heads most often) and claim these coins are “lucky”; in fact chance alone accounts for the top 10’s excess of heads, as the simulation below illustrates
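A minimal simulation sketch of the coin example above (numbers and variable names are illustrative):

    import numpy as np
    rng = np.random.default_rng(0)
    heads = rng.binomial(n=100, p=0.5, size=1000)   # 1000 fair coins, 100 flips each
    top10 = np.sort(heads)[-10:]                    # the 10 "luckiest" coins
    print(top10)                                    # typically ~60+ heads, from chance alone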

3
Q

prior probability

A

in Bayesian statistical inference, prior probability is the probability of an event based on established knowledge, before empirical data is collected; “what is originally believed before new evidence is introduced”

e.g. consider classifying an illness, knowing only that a person has a fever and a headache. These symptoms are indications of both influenza and Ebola virus. But far more people have the flu than Ebola (the prior probability of influenza is much higher than that of Ebola), so based on those symptoms alone you would classify the illness as the flu

4
Q

posterior probability

A

in Bayesian statistics, the posterior probability is the probability of an event after conditioning on new evidence, ie the prior refined via Bayes’ rule once the “new” information has come in, giving an “adjusted guess”

a posterior can in turn become a prior: if still newer information arrives, the posterior-cum-prior is revised again
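A minimal numerical sketch of a prior-to-posterior update, using made-up numbers for the flu/Ebola example from the previous card:

    # assumed priors and assumed P(fever & headache | illness); illustrative only
    prior = {"flu": 0.99, "ebola": 0.01}
    likelihood = {"flu": 0.90, "ebola": 0.95}

    unnormalized = {k: prior[k] * likelihood[k] for k in prior}   # Bayes' rule numerator
    total = sum(unnormalized.values())
    posterior = {k: v / total for k, v in unnormalized.items()}
    print(posterior)   # the flu still dominates, despite the slightly higher Ebola likelihood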

5
Q

margin of error (in context of samples)

A

Since a sample is used to represent a population, the sample’s result is expected to differ from what the result would have been had you surveyed the entire population. The margin of error quantifies how large that difference is expected to be, at a given confidence level. [eg in inferences on population means]
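One common concrete form, assuming a normal approximation for a sample mean (an assumption added here for illustration): margin of error ≈ z* · s / √n, where z* is the critical value for the chosen confidence level (about 1.96 for 95%), s is the sample standard deviation, and n is the sample size.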

6
Q

t-statistic

A

usually in context of hypothesis testing between means (eg between two sample means, or between a sample mean and a population mean)

a general form for t statistics is,
t(ŷ) = (ŷ-y)/s.e.(ŷ),

ie the t statistic for point estimate ŷ is ŷ recentered about the hypothesized value y, divided by the standard error of ŷ
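A minimal one-sample sketch (hypothetical data and hypothesized mean):

    import numpy as np
    from scipy import stats

    x = np.array([5.1, 4.9, 5.6, 5.2, 4.8, 5.4])                   # hypothetical sample
    mu0 = 5.0                                                       # hypothesized population mean
    t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
    t_scipy, p_value = stats.ttest_1samp(x, mu0)
    print(t_manual, t_scipy, p_value)                               # the two t values agree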

7
Q

probability triplet; event space, sample space

A
  • a probability space, (O,A,P)
    • O the sample space (eg the real line)
    • A a sigma algebra of subsets of O (eg Borel sigma algebra)
    • P a measure, normalized as P(O)=1
  • subsets of O are called events; elements of A are called random events (ie can be measured)
  • if O is countable, we generally call A the event space
  • a random variable then maps outcomes in the sample space to some associated state space (ie assigns values to outcomes), in a way that is measurable with respect to A
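A minimal concrete example of such a triple (chosen here for illustration): one roll of a fair die, with O = {1, 2, 3, 4, 5, 6}, A = the power set of O, and P(E) = |E| / 6 for each E in A; the random variable X(o) = o then assigns each outcome its face value.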
8
Q

state space

A

the idea is to “separate” the random outcomes themselves from the values a random variable assigns to them

for each outcome in the sample space (eg a particular result of 10 coin tosses), we can assign a value; a random variable’s state space consists of these values (eg X = the number of heads in 10 tosses has state space {0, 1, …, 10})

9
Q

variance

A

as the second central moment:

var(X) = E( [X-E(X)]^2 ) = E(X^2)-(E(X))^2

10
Q

linear functions of a random variable

A

let f(X) = aX+b; then:
* E[f(X)] = aE(X) + b
* var[f(X)] = a^2 var(X)
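A minimal numerical check of these identities (example values a = 2, b = 3 are assumed):

    import numpy as np
    rng = np.random.default_rng(0)
    X = rng.normal(loc=1.0, scale=2.0, size=100_000)
    a, b = 2.0, 3.0
    print((a * X + b).mean(), a * X.mean() + b)    # approximately equal
    print((a * X + b).var(), a**2 * X.var())       # approximately equal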

11
Q

statistical inference

A

deducing properties of an underlying probability distribution from a dataset

12
Q

statistic

A

a property (numerical summary) computed from a sample of the population, eg the sample mean

13
Q

point estimate

A
  • a point estimate, x_e, of a population parameter, x_a, is a best guess of the value of x_a
  • the bias of a point estimate is, bias = E(x_e)-x_a
    standard unbiased estimates include:
    • sample mean for the mean of any distribution
    • p_e=X/n for binomial B(n,p_a)
    • sum_i (x_i - x̄)^2 / (n-1), with x̄ the sample mean, for the variance of any distribution
  • sampling distribution is the distribution of the point estimate derived from samples
  • relative efficiency for two different point estimates, var(x_e1) / var(x_e2)
  • mean square error for point estimate, E((x_e-x_a)^2) = var(x_e) + bias^2
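A minimal simulation sketch of the bias point above, comparing the /n and /(n-1) variance estimators (example numbers assumed):

    import numpy as np
    rng = np.random.default_rng(0)
    true_var, n, reps = 4.0, 10, 50_000
    samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
    print(samples.var(axis=1, ddof=0).mean())   # divides by n: noticeably below 4 (biased)
    print(samples.var(axis=1, ddof=1).mean())   # divides by n-1: approximately 4 (unbiased)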
14
Q

standard error

A
  • standard error for a point estimate is the standard deviation of the estimate’s sampling distribution
    eg for the s.e. of the mean of n samples from a population with variance sig^2, s.e. = sig / sqrt(n)
  • standard error is often estimated from a sample, and called the standard error estimate (or just standard error)
    eg the estimated s.e. of the mean of n samples is sig_e / sqrt(n), where sig_e is the sample standard deviation (with Bessel correction)
  • typically for point estimates, the s.e. is estimated from the sample, and a reference distribution fitted to the case (such as Student’s t) is used for inference
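A minimal simulation sketch of the s.e. of the mean (example numbers assumed):

    import numpy as np
    rng = np.random.default_rng(0)
    n, reps, sigma = 25, 20_000, 2.0
    samples = rng.normal(0.0, sigma, size=(reps, n))
    print(samples.mean(axis=1).std())              # observed spread of the sample mean
    print(sigma / np.sqrt(n))                      # theoretical s.e. = sig / sqrt(n)
    print(samples[0].std(ddof=1) / np.sqrt(n))     # s.e. estimated from a single sample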
15
Q

error vs residual

A

error–an error term represents the deviation of an observation from the unobservable true population quantity (eg the population mean)

residual–a residual represents the deviation of an observation from an estimate computed from the sample (eg the sample mean), including a fitted model’s prediction

16
Q

law of large numbers

A

for independent, identically distributed samples with a finite mean, the sample mean converges to the population mean as the sample size increases, as the simulation below illustrates
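A minimal simulation sketch (fair die rolls; numbers assumed):

    import numpy as np
    rng = np.random.default_rng(0)
    rolls = rng.integers(1, 7, size=100_000)                      # fair die, mean 3.5
    running_mean = rolls.cumsum() / np.arange(1, len(rolls) + 1)
    print(running_mean[[9, 99, 999, 99_999]])                     # approaches 3.5 as n grows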

17
Q

error types (I and II) and power of a test

A
  • a table with rows H_0 accepted, H_0 rejected, and columns H_0 true, H_0 false (written out below)
  • type I error corresponds to the lower-left cell–H_0 was falsely rejected; can occur with the problem of “too many tests”
  • type II error corresponds to the upper-right cell–H_0 was falsely accepted; can occur when sample sizes are too small
  • power of a hypothesis test = 1 - probability of type II error; the probability of correctly rejecting H_0 when it is false (higher is better)
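The 2x2 table described above, written out:

                    H_0 true                          H_0 false
    H_0 accepted    correct decision                  type II error (false acceptance)
    H_0 rejected    type I error (false rejection)    correct decision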
18
Q

Bonferroni (multiple comparisons)

A
  • a method for reducing type I error (false rejection of the null; eg when comparing population means, falsely declaring a difference significant); arguably very conservative for large numbers of comparisons
  • simply divide the desired alpha level for the p-values (eg 0.05) by the number of tests run
  • see also Holm-Bonferroni, FDR, and FWER
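A minimal sketch of the adjustment (hypothetical p-values):

    # Bonferroni: compare each p-value to alpha / (number of tests)
    p_values = [0.001, 0.008, 0.020, 0.049, 0.300]
    alpha = 0.05
    adjusted_alpha = alpha / len(p_values)
    rejected = [p < adjusted_alpha for p in p_values]
    print(adjusted_alpha, rejected)   # only the smallest p-values remain significant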
19
Q

Simpson’s paradox

A

an association between two random variables appears in the whole population, but the association disappears, or even reverses, when the population is divided into subgroups (or vice versa: a subgroup-level association vanishes or flips when the data are aggregated)

20
Q

confidence interval

A
  • a range of values, computed from the sample, intended to cover the population’s result at the confidence level of the study–commonly reported as sample result +/- the margin of error
  • in most general terms, given point estimate x_e for population parameter x_a, and standard error estimate s
    • the point estimate, normalized by s, will follow some distribution, determined from the population distribution and parameter in question
    • the normalized p.e. distribution is then used to translate one- or two-sided confidence at some level (95%, etc.) into an allowed range for x_e-x_a
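A minimal sketch of a two-sided 95% confidence interval for a mean, using Student’s t (hypothetical data):

    import numpy as np
    from scipy import stats

    x = np.array([12.1, 11.8, 12.6, 12.0, 11.5, 12.4, 12.2])
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)                 # standard error estimate
    t_crit = stats.t.ppf(0.975, df=n - 1)           # two-sided 95% critical value
    print(x.mean() - t_crit * se, x.mean() + t_crit * se)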
21
Q

sample variance

A
  • sig^2 = sum_i (x_i - x̄)^2 / n, with x̄ the sample mean
  • for an unbiased estimate of the population variance (Bessel correction),
    sig^2 = sum_i (x_i - x̄)^2 / (n-1)
  • if X is normally distributed, then
    (x_1 - x̄)^2 + … + (x_n - x̄)^2 = sig_p^2 [N(0,1)^2 + … + N(0,1)^2] = sig_p^2 Chi^2_{n-1}
    where sig_p^2 is the population variance, and Chi^2_{n-1} is a chi-squared distribution with n-1 degrees of freedom (centering at x̄ costs one degree of freedom)
22
Q

moments of a distribution

A
  • the nth raw moment is E[X^n]
  • the nth central moment is E[(X-E(X))^n]
  • link to Fourier series
    • assume continuous random variable X with pdf f(x)
    • then the Fourier transform of f(x), F(s), can be expressed as a Taylor-like series with raw-moment coefficients:
      F(s) = sum_n E[X^n] (-2πis)^n / n! [the terms come from expanding E[exp(-2πisX)] as a power series]
23
Q

moment generating function

A
  • the moment generating function, MGF, is a special function that generates the moments of pdf/pmf f(x)
  • when it exists, the MGF uniquely determines f(x) (and vice versa)
  • MGF of r.v. X = E[e^{tX}] = 1 + tE(X) + t^2 E(X^2) / 2! + …
  • differentiating the MGF k times and evaluating at t = 0 extracts the kth raw moment
  • c.f. characteristic function
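A quick worked example (an exponential(λ) random variable, chosen here for illustration):

    M(t) = E[e^{tX}] = λ / (λ - t), for t < λ
    M'(0) = 1/λ = E(X);  M''(0) = 2/λ^2 = E(X^2), so var(X) = E(X^2) - (E(X))^2 = 1/λ^2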
24
Q

characteristic function

A
  • a special function that is bijectively determined from a pdf/pmf f(x)
  • CF of r.v. X = E[e^{itX}]
  • in the case of a pdf, the CF is the Fourier transform of f(x)
  • the characteristic function of a distribution always exists, even when the probability density function or moment-generating function do not
  • c.f. moment generating function
25
Q

maximum likelihood estimate

A
  • given unknown parameter y, independent observations {xi}_i, and pdf f(x,y), the likelihood of the observed data [x1,…,xn] is
    f(x1,y)…f(xn,y)
  • find y so that the likelihood function is maximized
  • in practice one often maximizes the log likelihood (the log of the likelihood function), which has the same maximizer and turns the product into a sum
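A minimal sketch: numerical MLE of the rate of an exponential distribution, compared against the closed-form answer 1/x̄ (simulated data; parameter values assumed):

    import numpy as np
    from scipy import stats
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1 / 2.5, size=1_000)               # true rate lambda = 2.5

    def neg_log_likelihood(lam):
        return -np.sum(stats.expon.logpdf(x, scale=1 / lam))     # minimize the negative log likelihood

    result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100), method="bounded")
    print(result.x, 1 / x.mean())                                # numerical MLE vs closed form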
26
Q

method of moments

A
  • assume we’ve a population under a specific distribution, with parameters a_1,…,a_k: pdf = f(x,a_1,…,a_k)
  • suppose further, we can express the population moments (mean, variance, skewness, kurtosis, …) as a function of the parameters a_1,…,a_k;
    a simple example would be a uniform distribution with endpoint parameters a,b: ie f=1/(b-a) on the interval [a,b]; then mu = (a+b)/2; var = (b-a)^2 / 12
  • we can then create a system of equations, k equations in k unknowns, by setting the population moments (expressed in terms of the parameters) equal to the respective moment estimates derived from a sample
  • solving this system produces estimates of the parameters, and hence of the distribution (see the worked step below)
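For the uniform example above, a worked step (using the sample mean x̄ and sample variance s² as the moment estimates): from x̄ = (a+b)/2 and s² = (b-a)²/12, we get b - a = √(12·s²), so a ≈ x̄ - √(3·s²) and b ≈ x̄ + √(3·s²).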
27
Q

stochastic process

A
  • eg in the discrete, half-infinite case, we have a family of random variables, (X0,X1,…), indexed over the set N (naturals), defined over a common probability space (O,B,P) (O the sample space, B the event space), and all mapping into the same measurable space (O1,B1) (O1 the state space)
  • a value of the process depends on both the index (eg “time”) and the outcome drawn from O
  • a sample function, aka realization, sample path, trajectory, path function, or path, is a single outcome of a stochastic process–ie one realized value of each random variable; note that though all Xi are defined on the same (O,B,P) space, they are not necessarily independent
  • stationary stochastic process–the joint distribution of subsets is invariant to shifts in the time index
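A minimal sketch: one sample path (realization) of a simple random walk, a basic discrete-index stochastic process (sizes assumed):

    import numpy as np
    rng = np.random.default_rng(0)
    steps = rng.choice([-1, 1], size=1_000)          # i.i.d. +/-1 increments
    path = np.concatenate(([0], steps.cumsum()))     # one realization S_0, S_1, ..., S_1000
    print(path[:10])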
28
Q

Chebyshev inequality

A
  • for random variable X with mean mu and standard deviation sig,
    P(mu - c*sig <= X <= mu + c*sig) >= 1 - 1/c^2
  • note, this inequality is referenced to the true population variance (which we may not know)
  • may be useful as a rough bound for non-normal distributions
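A minimal simulation check of the bound on a skewed (non-normal) distribution (example parameters assumed):

    import numpy as np
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)      # mean 1, standard deviation 1
    mu, sig, c = x.mean(), x.std(), 2.0
    print(np.mean(np.abs(x - mu) <= c * sig))            # observed fraction within mu +/- 2*sig
    print(1 - 1 / c**2)                                  # Chebyshev lower bound: 0.75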
29
Q

effect size

A
  • this is a general term, referring to the strength of the relationship between two variables
  • categories of measurements of effect size include
    • correlation–eg Pearson’s r
    • differences between means–eg Cohen’s d
    • categorical–for effect sizes among categorical variables, eg odds ratio
  • may be considered in context of statistical significance (eg highly significant, but with small effect size)
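A minimal sketch of one common effect-size measure, Cohen’s d with a pooled standard deviation (hypothetical samples):

    import numpy as np
    a = np.array([5.1, 5.4, 4.9, 5.6, 5.2])
    b = np.array([4.6, 4.9, 4.7, 5.0, 4.8])
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    print((a.mean() - b.mean()) / pooled_sd)   # standardized difference between the means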