Statistics Flashcards

(61 cards)

1
Q

What is a population?

A

The whole set of items that are of interest

2
Q

What is a census?

A

Observes or measures every member of the population

3
Q

What is a sample?

A

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole

4
Q

The advantages and disadvantages of a census

A

Advantages:
- It should give a completely accurate result
- Lots of data

Disadvantages:
- Time consuming and expensive
- Cannot be used when the testing process destroys the item
- Hard to process large quantities of data

5
Q

The advantages and disadvantages of a sample

A

Advantages:
- Less time consuming and expensive
- Fewer people have to respond
- Less data to process

Disadvantages:
- The data may not be accurate
- Sample may not be large enough to give information about the whole population or of small subsets of the population

6
Q

What is a random sample?

A

It is a type of sampling where every member of the population has an equal chance of being selected.
The sample should then be representative of the whole population.
Random sampling also helps to remove bias from a sample.

7
Q

How do you carry out simple random sampling?

A

  1. Create a sampling frame (each member of the population is given a number)
  2. Choose numbers at random until you have the required sample size, by:
    - Generating random numbers
    - Lottery sampling

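A minimal Python sketch of the two steps, assuming a hypothetical population of 100 members numbered 1 to 100 and a sample size of 10. Here `random.sample` stands in for "generating random numbers". It picks without replacement, so no member can be chosen twice.

```python
import random

# Step 1: a hypothetical sampling frame, numbering each member 1..100
population = [f"member_{i}" for i in range(1, 101)]
frame = dict(enumerate(population, start=1))

# Step 2: choose 10 different numbers at random (each number equally likely)
chosen_numbers = random.sample(range(1, len(frame) + 1), k=10)
sample = [frame[number] for number in chosen_numbers]
print(sample)
```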
8
Q

What is systematic sampling?
How do you carry out systematic sampling?

A

When the required elements are chosen at regular intervals from an ordered list (the sampling frame).

To carry it out: if you want a sample of x people from a population of y, work out n = y/x and take every nth person. The first person must be chosen at random from the first n people on the list.

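A short Python sketch of systematic sampling. `systematic_sample` is a hypothetical helper. It assumes the population size is a multiple of the sample size; if it is not, the interval is rounded down.

```python
import random

def systematic_sample(frame, sample_size):
    """Take every nth member of an ordered sampling frame, where
    n = population size // sample size, starting at a randomly chosen
    member among the first n."""
    n = len(frame) // sample_size          # sampling interval
    start = random.randrange(n)            # random index in 0..n-1
    return frame[start::n][:sample_size]   # every nth member from the start

frame = list(range(1, 101))                # hypothetical frame of 100 people
print(systematic_sample(frame, 10))        # ten people, each 10 apart
```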
9
Q

What is stratified sampling?

What proportion of each stratum should be sampled?

A

When the population is divided into mutually exclusive strata (e.g. males and females) and a random sample is taken from each.

The same proportion of each stratum should be sampled. To work out how many people to sample from a stratum, use (number in stratum / population size) * total sample size
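The allocation rule can be checked with a small Python sketch. `stratified_allocation` is a hypothetical helper, and the population of 60 males and 40 females is made up for illustration.

```python
def stratified_allocation(strata_sizes, sample_size):
    """How many to sample from each stratum:
    (number in stratum / population size) * sample size,
    rounded to the nearest whole number."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(size * sample_size / population_size)
        for name, size in strata_sizes.items()
    }

# Hypothetical population: 60 males and 40 females, sample of 20
print(stratified_allocation({"male": 60, "female": 40}, 20))
# {'male': 12, 'female': 8}
```

Because each stratum is rounded separately, the allocations may occasionally add up to one more or one less than the sample size. Python's `round` also rounds exact halves to the nearest even number.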

10
Q

Advantages and disadvantages of simple random sampling

A

Advantages:
- Free of bias
- Easy and cheap to implement for small populations and small samples
- Each sample unit has a known and equal chance of being chosen
Disadvantages:
- Not suitable when the population or the sample is large
- A sampling frame is needed

11
Q

Advantages and disadvantages of systematic sampling

A

Advantages:
- Simple and quick to use
- Suitable for large samples and large populations
Disadvantages:
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random

12
Q

Advantages and disadvantages of stratified sampling

A

Advantages:
- Sample accurately reflects the population structure
- Guarantees proportional representation
Disadvantages:
- Population must be cleanly classified into distinct strata
- Selection within stratum suffers from the same disadvantages as simple random sampling

13
Q

What is quota sampling?

A

An interviewer or researcher selects a sample that reflects the characteristics of the whole population. They decide what proportion of the sample should come from each group (the quota), then go out and fill each quota with whoever they can find.

14
Q

What is opportunity sampling?

A

Taking the sample from people who are available at the time of sampling and who fit the criteria

15
Q

Advantages and disadvantages of quota sampling

A

Advantages:
- Allows a small sample to still be representative of the whole population
- No sampling frame needed
- Quick, easy and inexpensive
- Allows for easy comparison between different groups within a population
Disadvantages:
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing the scope of the study increases the number of groups, which adds time and expense
- Non-responses are not recorded as such

16
Q

Advantages and disadvantages of opportunity sampling

A

Advantages:
- Easy to carry out
- Inexpensive
Disadvantages:
- Unlikely to produce a representative sample
- Highly dependent on individual researcher

17
Q

What is quantitative data?

A

It is associated with numerical observations/data.
e.g. height

18
Q

What is qualitative data?

A

It is associated with non-numerical observations.
e.g. eye colour

19
Q

What is continuous data?

A

A variable that can take any value within a range

20
Q

What is discrete data?

A

A variable that can only take on certain values within a range

21
Q

Where is Leuchars?

A

East coast of Scotland (Fife, near St Andrews)

22
Q

Where is Leeming?

A

Northern England

23
Q

Where is Heathrow?

A

London area

24
Q

Where is Hurn?

A

South coast of England, near Bournemouth (south-west of London)

25
Q

Where is Camborne?

A

South-West tip of the British Isles

26
Q

Where is Jacksonville?

A

Florida

27
Q

Where is Beijing?

A

China

28
Q

When are the mode, mean and median used?

A

- Mode: used when the data is qualitative
- Mean: used for quantitative data; it uses all the data values, so it gives a true measure of the data, but it is affected by extreme values
- Median: used for quantitative data; it is not affected by extreme values

29
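Q

For data in a frequency table, how do you calculate the mean?

A

For each class, multiply the midpoint (or the value, for ungrouped data) by its frequency. Add these products for every class, then divide by the total frequency: mean = Σfx / Σf
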
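A minimal Python sketch of this calculation for a grouped table. It assumes each class is stored as a hypothetical `(lower_bound, upper_bound, frequency)` tuple, with midpoint (lower + upper)/2.

```python
def grouped_mean(classes):
    """Estimate of the mean from a grouped frequency table:
    sum of (midpoint * frequency) divided by the total frequency.
    classes: list of (lower_bound, upper_bound, frequency)."""
    total = sum(f for _, _, f in classes)
    weighted = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return weighted / total

print(grouped_mean([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))  # 15.0
```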
30
Q

How do you find the upper and lower quartiles for discrete data?

A

With the n data values in ascending order:

Lower quartile:
- Divide n by 4
- If this is a whole number, the lower quartile is halfway between this data point and the one above
- If it is not a whole number, round up and pick this data point

Upper quartile:
- Find 3/4 of n
- If this is a whole number, the upper quartile is halfway between this data point and the one above
- If it is not a whole number, round up and pick this data point

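The rule can be written as one Python function. `quartile_discrete` is a hypothetical name. It takes k = 1 for the lower quartile and k = 3 for the upper quartile, and uses 1-based data positions as on the card.

```python
import math

def quartile_discrete(data, k):
    """k = 1 gives the lower quartile, k = 3 the upper quartile.
    Position = k*n/4: if whole, average that value and the next one;
    otherwise round the position up and take that value."""
    values = sorted(data)
    n = len(values)
    if (k * n) % 4 == 0:
        pos = k * n // 4                            # 1-based position
        return (values[pos - 1] + values[pos]) / 2  # halfway to the next value
    pos = math.ceil(k * n / 4)                      # round the position up
    return values[pos - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(quartile_discrete(data, 1), quartile_discrete(data, 3))  # 2.5 6.5
```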
31
Q

What is interpolation in a frequency table?

A

When data is presented in a grouped frequency table, you can use interpolation to estimate the median, quartiles and percentiles. When using interpolation, you assume that the data values are evenly distributed within each class.

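A Python sketch of linear interpolation in a grouped table. It assumes hypothetical `(lower_bound, upper_bound, frequency)` tuples. It also assumes the common convention for grouped continuous data of using position n/2 for the median and n/4 for the lower quartile.

```python
def interpolate(classes, position):
    """Estimate the value at a given position (e.g. n/2 for the median)
    by linear interpolation, assuming the values are spread evenly
    within each class. classes: list of (lower, upper, frequency)."""
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            fraction = (position - cumulative) / freq  # how far into this class
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("position is beyond the total frequency")

table = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
n = sum(f for _, _, f in table)
print(interpolate(table, n / 2))  # median estimate: 15.0
```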
32
Q

What are the advantages and disadvantages of the range and the interquartile/interpercentile range?

A

The range takes into account all data values but is affected by extreme values. The interquartile/interpercentile range does not take into account all data values but is not affected by extreme values.

33
Q

What is the variance?

A

A measure of spread. It is the mean of the squares minus the square of the mean: variance = Σx²/n - (Σx/n)²

34
Q

What is the standard deviation?

A

Another measure of spread. It is the square root of the variance.

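The two definitions translate directly into Python. The sketch below divides by n, matching "mean of the squares minus the square of the mean". The data set is made up for illustration.

```python
import math

def variance(data):
    """Variance as the mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    mean_of_squares = sum(x * x for x in data) / n
    return mean_of_squares - mean ** 2

def standard_deviation(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), standard_deviation(data))  # 4.0 2.0
```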
35
Q

What is coding?

A

A way of simplifying statistical calculations. Each value is coded to make a new set of values that are easier to work with.

36
Q

What does a coding formula look like?

A

Usually y = (x - a)/b, where a and b are constants

37
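Q

What is the mean of the coded data? What is the standard deviation of the coded data?

A

Mean of the coded data = ((mean of x) - a)/b
Standard deviation of the coded data = (standard deviation of x)/b
(Subtracting a does not change the spread, so only b affects the standard deviation.)
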
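A quick Python check of these coding rules, assuming the hypothetical constants a = 100 and b = 2 and a made-up data set. The statistics of the coded values match the formulas applied to the original statistics.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation (dividing by n)."""
    m = mean(values)
    return math.sqrt(sum(v * v for v in values) / len(values) - m * m)

x = [102, 104, 104, 104, 105, 105, 107, 109]
a, b = 100, 2                    # hypothetical coding constants
y = [(v - a) / b for v in x]     # coded values y = (x - a)/b

print(mean(y), (mean(x) - a) / b)  # 2.5 2.5
print(sd(y), sd(x) / b)            # 1.0 1.0
```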
38
Q

How do you work out medians and quartiles from a grouped frequency table?

A

Draw a cumulative frequency diagram or use interpolation

39
Q

What kind of data can be shown in a histogram?

A

Grouped continuous data

40
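Q

What is the equation for frequency density? What is the equation for the area of a bar?

A

Frequency density = frequency / class width (the height of each bar is proportional to this)
Area of bar = k * frequency, where k is a constant (k = 1 when the bar heights are exactly the frequency densities)
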
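A small Python sketch of both equations with k = 1. It assumes hypothetical `(lower_bound, upper_bound, frequency)` tuples and made-up data.

```python
def frequency_densities(classes):
    """Frequency density = frequency / class width for each class.
    classes: list of (lower_bound, upper_bound, frequency)."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

table = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]   # hypothetical data
heights = frequency_densities(table)
print(heights)  # [0.5, 2.0, 0.4]

# With k = 1, each bar's area (class width * height) equals its frequency
areas = [(upper - lower) * h for (lower, upper, _), h in zip(table, heights)]
```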
41
Q

What is a frequency polygon?

A

A graph formed by joining the midpoints of the tops of the bars of a histogram with straight lines

42
Q

What should you comment on when comparing data?

A

Measures of location, measures of spread, the shape of the distribution, outliers and context

43
Q

What is correlation? What should you comment on if asked whether two variables have a linear correlation?

A

Correlation describes the nature of the linear relationship between two variables. State whether the correlation is positive or negative, and comment within the context of the question!!

44
Q

What is a causal relationship?

A

Two variables have a causal relationship if a change in one variable causes a change in the other. Correlation does not mean causation.

45
Q

What is interpolation? What is extrapolation?

A

Interpolation is using the line of best fit to estimate the dependent variable for a value of the independent variable INSIDE the range of the data.
Extrapolation is using the line of best fit to make an estimate for a value OUTSIDE the range of the data. This is much less reliable than interpolation.
Note: these estimates can only be made FROM x TO y, i.e. from the independent variable to the dependent variable, not the other way round.

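A Python sketch that makes an estimate from x to y and labels it. It assumes a line of best fit of the form y = a + bx with hypothetical coefficients a = 1.0 and b = 2.0, and made-up x data.

```python
def estimate_y(x, a, b, x_data):
    """Estimate the dependent variable y from x using a line of best fit
    y = a + b*x, and label it interpolation (x inside the data range)
    or extrapolation (x outside it, much less reliable)."""
    y = a + b * x
    inside = min(x_data) <= x <= max(x_data)
    return y, "interpolation" if inside else "extrapolation"

x_data = [1, 3, 4, 7, 10]                 # hypothetical observed x values
print(estimate_y(5, 1.0, 2.0, x_data))    # (11.0, 'interpolation')
print(estimate_y(20, 1.0, 2.0, x_data))   # (41.0, 'extrapolation')
```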
46
Q

What is an experiment? What is an event? What is a sample space?

A

- Experiment: a repeatable process that gives rise to a number of outcomes
- Event: a collection of one or more outcomes
- Sample space: the set of all possible outcomes

47
Q

What does it mean for events to be mutually exclusive? What is the probability of event A OR event B?

A

The two events have no outcomes in common; in a Venn diagram, the circles do not overlap. For mutually exclusive events, P(A or B) = P(A) + P(B)

48
Q

What does it mean for events to be independent? What is the condition for events to be independent?

A

One event has no effect on the probability of the other. Events A and B are independent if P(A and B) = P(A) x P(B)

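Both rules can be checked by counting outcomes in a sample space. This Python sketch assumes the hypothetical experiment of rolling two fair dice, with made-up events A, B and C. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a test on each outcome), by counting."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[1] % 2 == 0      # second die is even
C = lambda o: o[0] == 1          # first die shows 1

both = lambda o: A(o) and B(o)
print(prob(both) == prob(A) * prob(B))    # True: A and B are independent

either = lambda o: A(o) or C(o)
print(prob(either) == prob(A) + prob(C))  # True: A and C are mutually exclusive
```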
49
Q

What is a random variable? How is it represented in terms of X and x?

A

A variable whose result depends on the outcome of a random event. Random variables are written using upper case letters, X. The particular values that a random variable can take are written using the equivalent lower case letters, x. The probability that the random variable X takes a particular value x is written as P(X=x).

50
Q

What are the three different ways that a probability distribution of a discrete random variable can be represented?

A

As a probability mass function, using a table or using a diagram

51
Q

What are the requirements for being able to model with a binomial distribution?

A

- There is a fixed number of trials
- There are only two possible outcomes (success or failure)
- There is a fixed probability of success
- The trials are independent of each other

52
Q

If the random variable X has the binomial distribution B(n, p), what is its probability mass function?

A

P(X = r) = nCr * p^r * (1 - p)^(n - r), for r = 0, 1, 2, ..., n (given in the formula booklet)

53
Q

What is a cumulative probability function?

A

It gives the sum of all the individual probabilities up to and including the given value of x, i.e. the probability that X is less than or equal to x, P(X ≤ x)

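A Python sketch of the binomial probability mass function and the cumulative probability function built from it. It uses `math.comb`, which needs Python 3.8 or later. The function names are hypothetical.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binomial_cdf(x, n, p):
    """P(X <= x): sum of the individual probabilities up to and including x."""
    return sum(binomial_pmf(r, n, p) for r in range(0, x + 1))

print(binomial_pmf(2, 4, 0.5))  # 0.375
print(binomial_cdf(2, 4, 0.5))  # 0.6875
```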
54
Q

What is a hypothesis? What is a test statistic?

A

A hypothesis is a statement made about the value of a population parameter. A test statistic is the result of the experiment, or the statistic that is calculated from the sample.

55
Q

What are the null and alternative hypotheses?

A

The null hypothesis, H0, is the hypothesis you assume to be correct. The alternative hypothesis, H1, tells you about the population parameter if your assumption is shown to be wrong.

56
Q

What are one-tailed and two-tailed tests?

A

A one-tailed test has an alternative hypothesis of the form H1: p < (value in H0) or H1: p > (value in H0). It is used to test whether the probability has gone down, or whether it has gone up.
A two-tailed test has an alternative hypothesis of the form H1: p ≠ (value in H0). It is used when the probability is thought to have changed in either direction. You must split the significance level in half, one half for each tail.

57
Q

What is the critical region? What is the critical value? What is the acceptance region?

A

- Critical region: a region of the probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis
- Critical value: the first value to fall within the critical region
- Acceptance region: the region where you accept the null hypothesis

58
Q

What is the actual significance level?

A

The probability that the test statistic falls within the critical region, assuming the null hypothesis is true. It is also the probability of incorrectly rejecting the null hypothesis.

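The critical region and the actual significance level can be found together. This Python sketch assumes an upper-tailed test (H1: p > p0) and takes the critical value to be the smallest c with P(X ≥ c) ≤ significance level. `upper_critical_region` is a hypothetical helper, and the example B(20, 0.2) at 5% is made up.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); gives 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(0, x + 1))

def upper_critical_region(n, p, alpha):
    """Critical region X >= c for H0: p = p0 against H1: p > p0.
    c is the smallest value with P(X >= c) <= alpha.
    Returns (c, actual significance level), or None if no value works."""
    for c in range(n + 1):
        tail = 1 - binomial_cdf(c - 1, n, p)   # P(X >= c), assuming H0 is true
        if tail <= alpha:
            return c, tail
    return None

c, actual = upper_critical_region(20, 0.2, 0.05)
print(c, round(actual, 4))  # 8 0.0321
```

So the critical region is X ≥ 8, and the actual significance level is about 3.21%. A lower-tailed test works the same way, searching P(X ≤ c) from the bottom.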
59
Q

How do you carry out a one-tailed test?

A

  1. Formulate a model for the test statistic: X ~ B(n, p)
  2. Identify suitable null and alternative hypotheses
  3. Calculate the probability of the test statistic taking the observed value or a more extreme value, assuming the null hypothesis is true
  4. Compare this with the significance level
  5. Write a conclusion in the context of the question

Alternatively, find the critical region and see whether the observed value falls within it.

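Steps 3 and 4 can be sketched in Python for a binomial model. `one_tailed_test` is a hypothetical helper. It rejects H0 when the tail probability is at most the significance level, which matches the critical region rule P(X ≥ c) ≤ significance level. The example numbers are made up.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(0, x + 1))

def one_tailed_test(observed, n, p0, alpha, upper=True):
    """One-tailed test of H0: p = p0 with X ~ B(n, p0).
    upper=True tests H1: p > p0 using P(X >= observed);
    upper=False tests H1: p < p0 using P(X <= observed).
    Returns (tail probability, whether to reject H0)."""
    if upper:
        prob = 1 - binomial_cdf(observed - 1, n, p0)
    else:
        prob = binomial_cdf(observed, n, p0)
    return prob, prob <= alpha

print(one_tailed_test(8, 20, 0.2, 0.05))  # P(X >= 8) is about 0.0321, so reject H0
print(one_tailed_test(7, 20, 0.2, 0.05))  # P(X >= 7) is about 0.0867, so do not reject
```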
60
Q

What is the expected value of a test statistic?

A

For X ~ B(n, p), it is the number of trials x the probability of success: E(X) = np

61
Q

How do you carry out a two-tailed test?

A

Same as a one-tailed test, but:
- Halve the significance level for each tail
- Work out which tail you are testing by seeing whether the observed value of the test statistic is higher or lower than the expected value
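A Python sketch of a two-tailed binomial test. It compares the observed value with the expected value np to choose the tail, then compares that tail's probability with half the significance level. `two_tailed_test` is a hypothetical helper, and the example B(20, 0.2) at 5% is made up.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(0, x + 1))

def two_tailed_test(observed, n, p0, alpha):
    """Two-tailed test of H0: p = p0 against H1: p != p0, X ~ B(n, p0).
    Returns (tail probability, whether to reject H0)."""
    expected = n * p0                                 # E(X) = np
    if observed > expected:
        prob = 1 - binomial_cdf(observed - 1, n, p0)  # upper tail: P(X >= observed)
    else:
        prob = binomial_cdf(observed, n, p0)          # lower tail: P(X <= observed)
    return prob, prob <= alpha / 2                    # half the level in each tail

print(two_tailed_test(8, 20, 0.2, 0.05))  # about 0.0321 > 0.025, so do not reject
print(two_tailed_test(9, 20, 0.2, 0.05))  # about 0.0100 <= 0.025, so reject H0
```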