Exam 1 Flashcards

1
Q

statistics

A

scientific analysis of data in whose generation randomness/chance played some part

necessary for handling data because mathematics alone is not enough: randomness is unavoidable in science

ex. how does the weight of a baby depend on its age? how does BP depend on genetics? Monty Hall Problem.

2
Q

direction: probability is _____, while statistics is ______

A

prob is deductive: starts with an "if" statement, then makes some probability calculation about data. ->

stat is inductive: starts with data and uses relevant probability calculations to make a statement about reality. <-

zig-zag relationship

3
Q

probability notation

A

events: A, B, …

probability that event A will happen = Prob(A)

4
Q

derived events

A

A u B: union, A or B or both

A n B: intersection, both

5
Q

mutually exclusive events

A

F and G are mutually exclusive if they cannot both happen in the same experiment. If F, G, …, H are mutually exclusive, Prob(F u G u … u H) = Prob(F) + Prob(G) + … + Prob(H).

6
Q

independence

A

F and G are independent iff the probability of their intersection satisfies Prob(F n G) = Prob(F) x Prob(G).

We often assume independence.

When F and G are independent, the conditional probability formula simplifies: Prob(F | G) = Prob(F n G) / Prob(G) = Prob(F) x Prob(G) / Prob(G) = Prob(F). AKA F and G are independent if G having occurred doesn't change the probability that F occurs.

If two events that each have nonzero probability are mutually exclusive, they cannot be independent. The opposite may or may not be true.
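The definition can be checked on a tiny example. This is a sketch using a hypothetical two-coin-flip sample space (my illustration, not the course's):

```python
from fractions import Fraction

# sample space: two fair coin flips, each outcome equally likely
space = [(a, b) for a in "HT" for b in "HT"]
F = {s for s in space if s[0] == "H"}   # event: first flip is heads
G = {s for s in space if s[1] == "H"}   # event: second flip is heads

def prob(event):
    return Fraction(len(event), len(space))

# independent: Prob(F n G) = Prob(F) x Prob(G)
assert prob(F & G) == prob(F) * prob(G)

# equivalently, conditioning on G does not change the probability of F
assert prob(F & G) / prob(G) == prob(F)
```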

7
Q

conditional probabilities

A

the probability that F occurs given that G has occurred

Prob (F | G) = Prob(F n G) / Prob(G)
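The formula applied to a single fair-die roll (the events here are illustrative, not from the course):

```python
from fractions import Fraction

outcomes = set(range(1, 7))               # one roll of a fair die
F = {v for v in outcomes if v % 2 == 0}   # event: roll is even
G = {v for v in outcomes if v > 3}        # event: roll is greater than 3

def prob(event):
    return Fraction(len(event), 6)

# Prob(F | G) = Prob(F n G) / Prob(G) = (2/6) / (3/6) = 2/3
p_f_given_g = prob(F & G) / prob(G)
print(p_f_given_g)  # 2/3
```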

8
Q

discrete and continuous random variables

A

"concept of the mind" numbers

discrete RV: a conceptual and numerical quantity which, in some future experiment, will take one or other of a discrete set of values with known or unknown probabilities (usually 0, 1, 2, 3, …)

continuous RV: can take any value in a continuous range of values.

RVs are written with upper-case letters.

9
Q

data

A

the observed values of RVs after the relevant experiment has been carried out. written with lower case letters.

10
Q

presenting probabilities

A

tableau with possible values of X and probabilities of X (probs sometimes known, sometimes unknown)

or graph (known)

11
Q

parameters

A

some usually unknown numerical value, denoted with a greek letter

12
Q

discrete RV notation

A

Prob(X = vi) is shorthand for "the probability that, once the experiment is carried out, the observed value x of X will be vi"

13
Q

conditions for binomial distribution

A

1) we conduct a fixed number of trials, n
2) there are only two possible outcomes on each trial; we call these success and failure
3) the outcomes of the various trials are independent
4) the probability of success, θ, is the same on all trials

The RV of interest is the number of successes, X, in the n trials.
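A minimal simulation of the four conditions (the parameter values are arbitrary illustrations):

```python
import random

random.seed(0)

n, theta = 10, 0.3   # fixed number of trials; same success prob on each

def number_of_successes(n, theta):
    # each trial is independent with two outcomes: success (prob theta)
    # or failure (prob 1 - theta); X counts the successes
    return sum(1 for _ in range(n) if random.random() < theta)

x = number_of_successes(n, theta)
assert 0 <= x <= n   # X can only take the values 0, 1, ..., n
```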

14
Q

third method of presenting a probability distribution

A

(n choose x) is the # of orderings in which there are x successes in the n trials = n! / ((n − x)! x!)

the second part, θ^x (1 − θ)^(n − x), is the probability of getting x successes in n trials in a specified order; multiplied together they give Prob(X = x)
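The two parts can be checked against Python's standard library (n, x, θ chosen arbitrarily):

```python
from math import comb, factorial

n, x, theta = 10, 3, 0.5

# number of orderings with x successes in n trials: n! / ((n - x)! x!)
orderings = factorial(n) // (factorial(n - x) * factorial(x))
assert orderings == comb(n, x)   # matches the built-in binomial coefficient

# times the probability of x successes in one specified order
p = comb(n, x) * theta**x * (1 - theta)**(n - x)
print(round(p, 4))  # 0.1172
```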

15
Q

mean

A

NOT AN AVERAGE: a parameter, denoted μ. for the binomial distribution, μ = nθ

it is the balance point (where the areas under the curve on either side are equal)

the mean is a property of the probability distribution, not of any observed data

MUST KNOW LONG AND SHORT FORMULAS FOR MEAN

16
Q

average

A

calculated from data, estimates the mean, denoted x̄

precision depends on the sample size and on the properties of the RV X̄, the "concept of the mind" average before we roll the die

17
Q

variance

A

always denoted σ²

it is a measure of the spread-outness of the probability distribution of X relative to the mean of X

for the binomial distribution, the variance of X = nθ(1 − θ)

important because the precision of x̄ as an estimate of μ depends on the variance of the RV X̄

MUST KNOW LONG AND SHORT FORMULAS FOR VARIANCE
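The long formulas (summing over the distribution) and the short binomial formulas agree; a sketch with illustrative n and θ:

```python
from math import comb

n, theta = 8, 0.25
pmf = {x: comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)}

# long formulas: straight from the probability distribution of X
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

# short formulas for the binomial case
assert abs(mu - n * theta) < 1e-9                    # mu = n*theta
assert abs(var - n * theta * (1 - theta)) < 1e-9     # sigma^2 = n*theta*(1-theta)
```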

18
Q

complementary events

A

Aᶜ is the event that A did not happen.

Prob(Aᶜ) = 1 − Prob(A)

19
Q

standard deviation

A

σ = sqrt(variance of X)

often more useful in practice than the variance because it is not in squared units

20
Q

deductive and inductive example

A

if a newborn is equally likely to be a boy as a girl, then in a well-conducted, representative, unbiased (WRU) sample of 40,000 newborns the probability of getting between 19,800 and 20,200 boys is 0.95

in a WRU sample of 40,000 newborns we saw 20,288 boys. A reasonable statistical induction is that it is not equally likely that a newborn is a boy as a girl.

21
Q

probability of union equation

A

Prob(F u G) = Prob(F) + Prob(G) − Prob(F n G)

22
Q

3 main activities of stat

A

1) estimating the value of a mean using an average
2) assessing the precision of the estimate
3) testing hypotheses about a mean or multiple means

23
Q

theory for many RVs is necessary because

A

we need it to make proper statistical inferences about data x1…xn

24
Q

n

A

notation for sample size

DIFFERENT from k

for a die, k = 6 (6 possible values when you roll it)

n = # of times you will roll it

25
Q

iid

A

independently and identically distributed

we assume iid unless stated otherwise

26
Q

Tn

A

Tn = X1 + X2 + … + Xn, the sum of the n RVs

mean of Tn = nμ

variance of Tn = nσ²

MUST KNOW ABOVE FORMULAS

27
Q

X̄

A

mean of X̄ = μ

variance of X̄ = σ²/n

MUST KNOW ABOVE FORMULAS

(larger n -> smaller variance. ex. JMP exercise)

relevance to stat: if we estimate the numerical value of a mean μ by a data average x̄, the precision of the estimate depends on the variance σ²/n of X̄. this can help us make statistical inductions.
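The shrinking variance can be seen in a small simulation in the spirit of the JMP exercise (die-rolling details here are my illustration):

```python
import random
import statistics

random.seed(1)

def xbar(n):
    # average of n iid fair-die rolls: one observed value of X-bar
    return sum(random.randint(1, 6) for _ in range(n)) / n

mu, sigma2 = 3.5, 35 / 12   # mean and variance of a single roll
for n in (5, 50):
    draws = [xbar(n) for _ in range(20000)]
    assert abs(statistics.mean(draws) - mu) < 0.05               # mean of X-bar = mu
    assert abs(statistics.pvariance(draws) - sigma2 / n) < 0.05  # variance = sigma^2/n
```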

28
Q

differences

A

D = X1 - X2

an RV

mean of D = 0 (or μ1 − μ2 in general)

variance of D = 2σ² (or σ1² + σ2² in general)

MUST KNOW ABOVE FORMULAS
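Both formulas can be verified by enumerating two independent fair-die rolls (an illustration, not the course's example):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 36)   # each (a, b) pair of independent rolls equally likely

# D = X1 - X2 over the full joint distribution
mean_d = sum((a - b) * p for a in faces for b in faces)
var_d = sum((a - b) ** 2 * p for a in faces for b in faces) - mean_d ** 2

assert mean_d == 0                      # mu1 - mu2 = 0 here
assert var_d == 2 * Fraction(35, 12)    # 2 sigma^2, since the rolls are iid
```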

29
Q

proportion of successes (P)

A

need to use P to make a fair comparison between different sample sizes

If X is the # of successes in n binomial trials, P = X/n

mean of P = θ

variance of P = θ(1 - θ) /n

MUST KNOW MEAN AND VARIANCE OF P

30
Q

density function

A

for continuous RVs we allocate probabilities to ranges of values using the RV's density function

31
Q

normal distribution

A

the continuous RV X has a normal distribution with mean μ and variance σ² if its density function is f(x) = (1 / (σ sqrt(2π))) e^(−(x − μ)² / (2σ²))

creates a bell-shaped curve

not one single distribution but a family of them

sums, averages, and differences of independent normal RVs all have a normal distribution

32
Q

Z

A

the standard normal distribution: the normal distribution with μ = 0 and σ² = 1

probabilities are evaluated using a chart

33
Q

mean and variance of a continuous RV

A

for a continuous RV with density f(x): mean μ = ∫ x f(x) dx and variance σ² = ∫ (x − μ)² f(x) dx, integrating over the range of X (the continuous versions of the long formulas, with the density in place of the probabilities)

34
Q

standardization procedure or Z-ing

A

Z = (X − μ) / σ

35
Q

two standard deviation rule

A

gives you a 95% confidence interval

the 2 approximates 1.96, from Prob(−1.96 < Z < 1.96) ≈ 0.95 (one-sided: Prob(Z < 1.645) ≈ 0.95)

often used in stat to make probability statements about the RVs: the sum, the average, X, and P
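The 1.96 and 1.645 figures can be checked with the standard normal CDF, written here via the error function (a standard identity, not from the course):

```python
from math import erf, sqrt

def phi(z):
    # standard normal CDF: Prob(Z <= z)
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(1.96) - phi(-1.96), 3))   # 0.95 (two-sided)
print(round(phi(1.645), 3))               # 0.95 (one-sided)
```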

36
Q

the central limit theorem

A

the reason we consider the normal distribution in detail

says that if the RVs X1…Xn are iid, then no matter what probability distribution those RVs have, the average X̄ and the sum Tn both have approximately normal distributions, and the approximation gets more accurate as n increases

allows us to use the magic formulas with the many stat procedures that deal with n binomial trials
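A sketch of the theorem with die rolls (which are certainly not normal): the standardized sum lands within ±2 about 95% of the time, just as a normal RV would.

```python
import random

random.seed(2)

mu, var = 3.5, 35 / 12      # mean and variance of one fair-die roll
n, reps = 30, 20000

within = 0
for _ in range(reps):
    tn = sum(random.randint(1, 6) for _ in range(n))   # the sum Tn
    z = (tn - n * mu) / (n * var) ** 0.5               # standardize: mean n*mu, variance n*sigma^2
    if -2 < z < 2:
        within += 1

print(within / reps)   # close to 0.9545, the normal value
```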

37
Q

die graphs

A

show 5 things:

1 + 2) formulas for mean and variance of sum

3 + 4) formulas for mean and variance of average

5) central limit theorem: sums and averages have approx normal distributions

38
Q

estimate of θ

A

we estimate θ by the proportion of successes

p = x/n is the estimate; the RV P = X/n is the estimator

p is an unbiased estimate of θ: it aims at the right target

the properties of this estimate depend on the properties of the RV P (the mean of P = θ)

39
Q

how accurate is the estimate of θ?

A

depends on the variance θ(1 − θ)/n of P

flip the 2SD rule using the 10 yard rule and substitute p for θ (since p is an estimate of θ) to create the 95% confidence interval p ± 2 sqrt(p(1 − p)/n)
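Applied to the deck's newborn example (20,288 boys in a WRU sample of 40,000):

```python
from math import sqrt

x, n = 20288, 40000
p = x / n   # estimate of theta

# 95% confidence interval: p +/- 2*sqrt(p(1-p)/n)
half = 2 * sqrt(p * (1 - p) / n)
lo, hi = p - half, p + half
print(round(lo, 4), round(hi, 4))   # roughly 0.5022 0.5122

# 0.5 lies outside the interval: evidence against "equally likely"
assert not (lo <= 0.5 <= hi)
```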

40
Q

conservative 95% confidence interval for p

A

derived from the fact that p(1 − p) is at most 1/4 (equal to 1/4 when p = 1/2), which gives the interval p ± 1/sqrt(n)

this approximation is always a bit wider than the usual interval

41
Q

margin of error

A

= 1/ sqrt(n)

can be used to determine n (and how precise the poll is)

larger n -> more precise but more expensive

if you want to be twice as accurate, you need four times the sample size
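The sample-size trade-off in code (using the conservative 95% margin above):

```python
from math import sqrt, isclose

def margin_of_error(n):
    return 1 / sqrt(n)   # conservative 95% margin for a proportion

# quadrupling n halves the margin of error
assert isclose(margin_of_error(4000), margin_of_error(1000) / 2)
print(round(margin_of_error(1000), 3))   # 0.032
```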

42
Q

99% confidence interval

A

papers often don’t specify whether their margin of error was determined by the 95% or the 99% confidence interval (the 99% interval uses 2.576 in place of 1.96, so it is wider)

43
Q

JMP

A

proof of principle that the average as an estimator gives a good idea of the mean

44
Q

conservative 99% confidence interval

A

p ± 2.576 / (2 sqrt(n)) ≈ p ± 1.29/sqrt(n): the conservative interval with the 99% multiplier 2.576 in place of 2

45
Q

fair die rolled once

A

the mean of the number to turn up is 3.5 and the variance of the number to turn up is 35/12
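Both values follow from the long formulas; exact arithmetic with fractions confirms them:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)   # each face equally likely

mu = sum(v * p for v in faces)                 # long formula for the mean
var = sum((v - mu) ** 2 * p for v in faces)    # long formula for the variance

assert mu == Fraction(7, 2)     # 3.5
assert var == Fraction(35, 12)
```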

46
Q

probabilities for X when flipping a coin and n = 2, X = # of heads

A

X can take the values 0, 1, 2 with Prob(X = 0) = 1/4, Prob(X = 1) = 1/2, Prob(X = 2) = 1/4 (binomial with n = 2, θ = 1/2)