# Statistics for Dummies Flashcards

1
Q

What % constitutes a high response rate to a survey?

(Response rate = the number of respondents divided by the number of surveys sent.)

A

70%
———-

Some statisticians will settle for nothing less.

Rarely does it get that high, but the lower the response rate, the less credible the results.

A rate of, say, 20% could easily mean that the people who did not respond feel differently from those who did, so the results may not represent the population.
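A minimal sketch of the response-rate calculation. The function names and the 200-out-of-1,000 survey are hypothetical; the 70% cut-off is the one given on this card.

```python
HIGH_RATE = 0.70  # threshold for a "high" response rate, from the card above

def response_rate(respondents: int, surveys_sent: int) -> float:
    """Fraction of sent surveys that came back."""
    if surveys_sent <= 0:
        raise ValueError("surveys_sent must be positive")
    return respondents / surveys_sent

def is_high_response_rate(respondents: int, surveys_sent: int) -> bool:
    """True if the response rate meets the 70% threshold."""
    return response_rate(respondents, surveys_sent) >= HIGH_RATE

# Hypothetical survey: 1,000 sent, 200 returned
rate = response_rate(200, 1000)
print(f"{rate:.0%}", is_high_response_rate(200, 1000))  # prints "20% False"
```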

2
Q

The statistical research process has 6 main steps, starting with:
1) what is the question to be answered?
2) design a study
3) determine the group of people to be studied

What are the next 3?

A

4) collect the data
5) organise, summarise and analyse the data
6) draw conclusions from the summaries and graphs to answer the question

3
Q

What is a ‘Parameter’?

A

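A number that summarises a population, such as the population mean. Data from an entire population is called a ‘census’, and a summary calculated from a census is a ‘parameter’ because it refers to the whole population.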

4
Q

What is a ‘Statistic’?

A

A number that summarises a sample, such as the sample mean. Usually we only have a statistic from a sample, which is then said to ‘estimate the parameter’.
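A small simulation contrasting a parameter with a statistic. It assumes a made-up population of 10,000 normally distributed heights (mean 170 cm, SD 10 cm); in real studies the population values are unknown, which is exactly why the sample mean is used as an estimate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up heights in cm
population = [random.gauss(170, 10) for _ in range(10_000)]
parameter = statistics.mean(population)  # summarises the whole population (a census)

sample = random.sample(population, 100)  # what we usually actually have
statistic = statistics.mean(sample)      # summarises the sample and estimates the parameter

print(f"parameter (population mean): {parameter:.1f}")
print(f"statistic (sample mean):     {statistic:.1f}")
```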

5
Q

What are ‘Categorical Data’?

A

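Data that place individuals into groups or categories (e.g. gender, eye colour, yes/no answers).

Sometimes categories are recorded using numbers, like 1 for male and 2 for female, but those numbers have no numerical meaning.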

6
Q

What are ‘Numerical Data’?

A

Data that record measurements or counts, so the numbers have real numerical meaning. Also referred to as quantitative or measurement data.

Examples: Height, Weight, IQ, Blood pressure

7
Q

What is a ‘standard score’?

A

The number of standard deviations a value lies above (+) or below (-) the mean, like +2 or -1.
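A one-line sketch of the standard score calculation, z = (x - mean) / SD. The exam figures (mean 60, SD 10) are hypothetical.

```python
def standard_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

# Hypothetical exam with mean 60 and SD 10
print(standard_score(80, 60, 10))  # 2.0
print(standard_score(50, 60, 10))  # -1.0
```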

8
Q

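What is the ‘central limit theorem’?

A

If you take large enough random samples (a common rule of thumb is n ≥ 30), the distribution of the sample means is approximately normal, whatever the shape of the population’s distribution.

Called the ‘crown jewel’ of all statistics.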
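A rough simulation of the central limit theorem, assuming a strongly right-skewed exponential population with mean 1 and SD 1, samples of size 30, and 5,000 repetitions (all illustrative choices). The sample means cluster around the population mean with a spread close to SD / √n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def draw_population_value() -> float:
    """One value from a non-normal population: exponential, mean 1, SD 1."""
    return random.expovariate(1.0)

n = 30             # sample size (illustrative)
num_samples = 5_000

sample_means = [
    statistics.fmean(draw_population_value() for _ in range(n))
    for _ in range(num_samples)
]

# Mean of the sample means should be near 1.0; their SD near 1 / sqrt(30), about 0.18
print(round(statistics.fmean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Plotting `sample_means` as a histogram would show a roughly bell-shaped curve even though the population itself has a long right tail.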

9
Q

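What are Z-values?

A

Standard scores: each value minus the mean, divided by the standard deviation, i.e. z = (x - mean) / SD.

If the original data are normally distributed, the distribution of z-values is called a ‘standard normal distribution’ or ‘Z-distribution’.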

10
Q

What is a ‘Standard Normal Distribution’?

A

It is the Z-distribution: a normal distribution with mean 0 and standard deviation 1.

Useful for finding percentiles and the proportion of data that falls between two values.
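A short sketch of those two uses with Python's standard-library `statistics.NormalDist` (Python 3.8+), which models a normal distribution with a given mean and SD.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # the standard normal (Z) distribution

# Percentile: proportion of values below z = 1.0
print(f"{z.cdf(1.0):.4f}")                 # 0.8413

# Proportion of values between z = -1 and z = 1
print(f"{z.cdf(1.0) - z.cdf(-1.0):.4f}")   # 0.6827

# Reverse lookup: the z-value at the 90th percentile
print(f"{z.inv_cdf(0.90):.4f}")            # 1.2816
```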

11
Q

What is a ‘Z Distribution’?

A

It is the ‘standard normal distribution’: a normal distribution with mean 0 and standard deviation 1.

Useful for finding percentiles and the proportion of data that falls between two values.

12
Q

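Describe an ‘experiment’ in a single sentence.

A

A study in which the researcher imposes a treatment on the subjects and controls their conditions, and often their environment.

The purpose is to pinpoint a cause-and-effect relationship between two variables (e.g. drug vs. health, compared with placebo vs. health).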

13
Q

What is a ‘blind experiment’?

A

One in which the subjects do not know whether they are in the control group or the treatment group, so bias on their part is controlled.

14
Q

What is a ‘double blind experiment’?

A

One in which neither the patients nor the researchers know who is in the treatment group and who is in the control group, so bias is controlled on both sides.

15
Q

What is the purpose of sample statistics?

A

To produce an ‘estimate’ of a population parameter

16
Q

Explain a 10% Probability vs. ‘9 to 1 odds’

A

If a horse has a 1 in 10 chance of winning, its probability of winning is 1/10 = 0.10 = 10%.

Bookies instead compare the chance of losing (9/10) with the chance of winning (1/10). The 10s cancel out, leaving 9/1, or ‘9 to 1’ against.
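A minimal sketch of the probability-to-odds conversion above, using `fractions.Fraction` so the cancelling happens exactly. The function name is hypothetical.

```python
from fractions import Fraction

def odds_against(p_win: Fraction) -> tuple[int, int]:
    """Convert a probability of winning into bookie-style odds 'against'."""
    if not 0 < p_win < 1:
        raise ValueError("p_win must be strictly between 0 and 1")
    ratio = (1 - p_win) / p_win  # P(lose) / P(win); the common denominators cancel
    return ratio.numerator, ratio.denominator

print(odds_against(Fraction(1, 10)))  # (9, 1), i.e. '9 to 1'
```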

17
Q

Describe the ‘law of averages’

A

In the long term, results will average out to their expected value. In the short term no one knows what will happen.
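A quick simulation of the law of averages, assuming a fair six-sided die (expected value 3.5). Short runs can land well away from 3.5; long runs settle close to it.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

def average_of_die_rolls(num_rolls: int) -> float:
    """Average of num_rolls simulated fair six-sided die rolls (expected value 3.5)."""
    total = sum(random.randint(1, 6) for _ in range(num_rolls))
    return total / num_rolls

for rolls in (10, 100, 10_000, 100_000):
    print(rolls, round(average_of_die_rolls(rolls), 3))
```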

18
Q

In hypothesis testing, a ‘statistically significant result’ is related to chance in what way?

A

It means a result that would be unlikely to happen by chance alone if the null hypothesis were true.

19
Q

What is the range for a p-value?

A

Between 0 and 1
——

a small p-value indicates strong evidence AGAINST the null hypothesis

20
Q

What does a small p-value indicate?

A

strong evidence AGAINST the null hypothesis
———

p-values are between 0 and 1
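A sketch showing both facts on these cards, assuming a two-sided z-test (an illustrative choice; other tests compute p-values differently). The p-value always falls between 0 and 1 and shrinks as the test statistic moves further from 0.

```python
from statistics import NormalDist

def two_sided_p_value(z_stat: float) -> float:
    """P(|Z| >= |z_stat|) when the null hypothesis is true, for a z-test."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

for z_stat in (0.5, 1.96, 3.0):
    print(z_stat, round(two_sided_p_value(z_stat), 4))
```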

21
Q

On a distribution graph or a histogram, if a long tail of data is on the right, which way is the data skewed?

A

skewed to the right

22
Q

On a distribution graph or a histogram, if a long tail of data is on the left, which way is the data skewed?

A

skewed to the left

23
Q

Does a histogram deal with numerical or categorical data?

A

Numerical data. You need to decide your own groups (ranges) to put the numbers into.
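A minimal sketch of grouping numbers for a histogram, assuming equal-width groups chosen by the user (here 10-year ranges starting at 20). The ages and the function name are hypothetical.

```python
from collections import Counter

def histogram_counts(values: list[float], start: float, width: float) -> dict[float, int]:
    """Count values into equal-width groups; each key is a group's lower edge.

    Group i covers [start + i * width, start + (i + 1) * width).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not values:
        return {}
    counts = Counter(int((v - start) // width) for v in values)
    # Include empty groups between the lowest and highest ones (Counter returns 0)
    return {start + i * width: counts[i] for i in range(min(counts), max(counts) + 1)}

# Hypothetical ages, grouped into 10-year ranges starting at 20
ages = [23, 25, 31, 34, 38, 41, 45, 47, 52, 58]
for lower, count in histogram_counts(ages, start=20, width=10).items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```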

24
Q

What does the empirical rule state?

A

In a normally distributed data set, about 68% of values lie within 1 SD of the mean, about 95% within 2 SDs and about 99.7% within 3 SDs.
——-
By Chebyshev’s inequality, even in non-normal data sets at least 89% (1 - 1/3²) of values lie within 3 SDs of the mean.
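A simulation checking the empirical rule against Chebyshev’s bound (1 - 1/k²). The two data sets are illustrative assumptions: 50,000 standard normal values and 50,000 right-skewed exponential values.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

def shares_within(data: list[float], ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Fraction of values within k standard deviations of the mean, for each k."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data, mean)
    return [sum(abs(x - mean) <= k * sd for x in data) / len(data) for k in ks]

normal_data = [random.gauss(0, 1) for _ in range(50_000)]
skewed_data = [random.expovariate(1.0) for _ in range(50_000)]  # long right tail, not normal

normal_shares = shares_within(normal_data)
skewed_shares = shares_within(skewed_data)

for k, n_share, s_share in zip((1, 2, 3), normal_shares, skewed_shares):
    bound = 1 - 1 / k**2  # Chebyshev: minimum share for ANY data set
    print(f"k={k}: normal {n_share:.3f}, skewed {s_share:.3f}, Chebyshev at least {bound:.3f}")
```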

25
Q

How do you calculate a percentile, e.g. the 90th percentile of n values?

A

Order the data from smallest to largest. Multiply n by k, the desired percentile as a decimal. Round the result UP to the nearest whole number. The value in that position of the ordered data is the kth percentile.
——

E.g. the data are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, so n = 11

k sought is 90% (0.9)

11 × 0.9 = 9.9

Round up = 10

The 10th value in order is 10, so 10 is at the 90th percentile
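A sketch of the round-up rule on this card, taking k as a whole-number percent so the arithmetic stays exact. Some texts instead average the value in that position with the next one when n × k is already a whole number; this sketch follows the card’s rule only.

```python
def kth_percentile(data: list[float], k: int) -> float:
    """kth percentile (k as a whole-number percent, e.g. 90) via the round-up rule."""
    if not data or not 0 < k < 100:
        raise ValueError("need non-empty data and 0 < k < 100")
    ordered = sorted(data)             # order the data ascending
    n = len(ordered)
    position = (n * k + 99) // 100     # n * k / 100 rounded UP, using integer maths
    return ordered[position - 1]       # positions count from 1, list indices from 0

print(kth_percentile(list(range(1, 12)), 90))  # 10
```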

26
Q

Is probability calculation effective in predicting short term behaviour?

A

No, it is effective when predicting long term behaviour

27
Q

If only two outcomes are possible, what are the chances of one of them occurring?

A

It’s not necessarily 50/50.

The chances are based on the probability of the particular event, not on the fact that there are only two outcomes (otherwise every penalty goal attempt would be 50/50).

30
Q

Any probability is a number between what and what..?

A

Between 0 and 1.
0 means impossible
1 means certain

31
Q

All of the probabilities for all possible outcomes must add up to..?

A

1

It follows that the probability that an outcome does NOT happen is 1 minus the probability that it DOES happen.

32
Q

In probability, what is an event a combination of..?

A

Outcomes. The probability of an event is equal to the sum of the probabilities of the individual outcomes that make up the event.
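A small sketch of the probability rules in the last two cards, assuming one roll of a fair six-sided die as the example; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die
outcome_probs = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible outcomes add up to 1
assert sum(outcome_probs.values()) == 1

def event_probability(event: set[int]) -> Fraction:
    """P(event) = sum of the probabilities of the outcomes that make it up."""
    return sum((outcome_probs[o] for o in event), Fraction(0))

p_five_or_six = event_probability({5, 6})
print(p_five_or_six)      # 1/3
print(1 - p_five_or_six)  # 2/3, the probability the event does NOT happen
```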

33
Q

What is a ‘confidence interval’?

A

A statistic, plus or minus a margin of error

34
Q

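How can you roughly work out the sample size needed to achieve a particular desired margin of error?

A

For a sample proportion at about 95% confidence, the margin of error (the + or - amount) is roughly 1 / √n, or 1 / √n × 100 as a percentage. The full width of the interval is twice that.

In reverse, pick your margin of error. For an interval 5% wide, the margin is 5% / 2 = 2.5% = 0.025.

n ≈ (1/0.025)^2 = 1600 is the rough sample size needed for a +/- 2.5% margin.

This rule of thumb assumes a random, unbiased sample; it comes from 1.96 × √(0.5 × 0.5 / n) ≈ 1 / √n, the worst case when the true proportion is near 50%.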
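A minimal sketch of the 1 / √n rule of thumb and its inverse, plus a confidence interval built from it. The function names and the sample proportion of 0.52 are hypothetical.

```python
import math

def rough_margin_of_error(n: int) -> float:
    """Rule-of-thumb +/- margin of error for a sample proportion at ~95% confidence."""
    return 1 / math.sqrt(n)

def rough_sample_size(margin: float) -> int:
    """Invert the rule of thumb: n = (1 / margin)^2, rounded up to a whole person."""
    return math.ceil((1 / margin) ** 2)

print(rough_sample_size(0.025))     # 1600
print(rough_margin_of_error(1600))  # 0.025

# Confidence interval = statistic +/- margin of error
p_hat = 0.52  # hypothetical sample proportion from 1,600 people
moe = rough_margin_of_error(1600)
print(f"{p_hat - moe:.3f} to {p_hat + moe:.3f}")  # 0.495 to 0.545
```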