Statistics Flashcards

(179 cards)

1
Q

Bivariate data

A

Data relating to pairs of variables

2
Q

Variables that are statistically related

A

Correlated

3
Q

How do you identify correlation

A

Scatter graph

4
Q

What goes on x axis

A

Explanatory or independent variable

5
Q

Population

A

The set of things you are interested in

E.g. all people in the UK

6
Q

Census

A

Observes or measures every member of the population

7
Q

Parameter

A

A number that describes the entire population

E.g. the mean or standard deviation

8
Q

Sample

A

Subset of a population

Used to find out information about the whole population

9
Q

Statistic

A

A value calculated from a sample
E.g. the mean or standard deviation of the sample, which can be used to estimate the mean or standard deviation of the population

10
Q

Sampling unit

A

An individual unit from the population

E.g. a particular person living in the UK

11
Q

Sampling frame

A

A list of all the sampling units in the population

E.g. the electoral register for the UK

12
Q

Advantage of using a sample over a census

A

Quicker, fewer people have to respond and less data to process
Less expensive

13
Q

Advantage of using a census over a sample

A

Should be a completely accurate result

A sample may not be large enough to give information about small sub groups of the population

14
Q

Sample disadvantage

A

Data may not be accurate

Sample might not be large enough to give information about small sub groups

15
Q

Census disadvantage

A

Takes a long time and expensive
Hard to process the large quantities of data
Cannot be used if the testing process destroys the items

16
Q

Advantages of sampling

A

Quick and not as expensive
Fewer people have to respond
Less data to process

17
Q

Advantages of a census

A

Should give a completely accurate result

18
Q

If you want to know the mean number of sweets in a packet of sweets, why is it not possible to use a census

A

Destroying all the sweets

Can’t use a census if all the sampling units are being destroyed

19
Q

3 methods of random sampling

A

Simple random sampling
Systematic sampling
Stratified sampling

20
Q

Simple random sampling method

A

Number all the items in the population
Use a random number generator to select sample of desired size
If a number is repeated, generate another number for the item to be sampled
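The steps above can be sketched in Python. This is only an illustration: the population numbered 1 to 50, the sample size of 5 and the seed are all assumptions. `random.sample` draws without replacement, so the "generate another number" step is handled automatically.

```python
import random

# Hypothetical population: items numbered 1 to 50
population = list(range(1, 51))

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = random.Random(42)

# Select a sample of the desired size; sampling is without replacement,
# so no item can be chosen twice
sample = rng.sample(population, 5)
print(sample)
```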

21
Q

Systematic sampling method

A

Number all items in the population
Let n=population size/sample size
Use random number generator from 1 to n to select the first item
Choose every nth item
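A rough sketch of this method in Python, assuming a hypothetical population numbered 1 to 100 and a sample of 10. The function name `systematic_sample` is made up for illustration, and the interval is rounded down when the division is not exact.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Pick every k-th item after a random start."""
    k = len(population) // sample_size      # the interval (n on the card)
    start = rng.randint(1, k)               # random start from 1 to k inclusive
    # Positions are 1-based on the card, so subtract 1 for Python indexing
    return [population[i - 1] for i in range(start, start + k * sample_size, k)]

items = list(range(1, 101))                 # hypothetical population of 100 items
sample = systematic_sample(items, 10, random.Random(0))
print(sample)
```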

22
Q

Stratified sampling method

A

The population is divided into groups (strata)
Decide how many to sample from each group using…
(Number in group/Number in population)×sample size
Use simple random sampling to select the items from each group
So it is proportional and representative
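The proportional calculation above can be sketched in Python. The two year groups and their sizes are invented for illustration, and `stratified_sample` is a hypothetical helper name. Rounding each group's share can make the total differ slightly from the target sample size when the shares are not whole numbers.

```python
import random

def stratified_sample(strata, sample_size, rng=random):
    """strata maps a group name to its list of items."""
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # (number in group / number in population) x sample size, rounded
        quota = round(len(items) / total * sample_size)
        # Simple random sampling within each group
        sample[name] = rng.sample(items, quota)
    return sample

# Hypothetical population: 60 students in Y12 and 40 in Y13
strata = {"Y12": list(range(60)), "Y13": list(range(100, 140))}
result = stratified_sample(strata, 10, random.Random(1))
print({name: len(chosen) for name, chosen in result.items()})
```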

23
Q

2 methods of non random sampling

A

Opportunity sampling/convenience sampling

Quota sampling

24
Q

Opportunity sampling method

A

Sample consists of any items available to be sampled
Can be used to fill each group in quota sampling: sample the required number from each group and, once the requirement is filled, ignore any further items
E.g. whoever walks into the frozen aisle of a supermarket

25
Quota sampling method
The population is divided into groups | Decide how many to sample from each group using (Number in group/Number in population)×sample size | Sample the first "X" for each group and ignore any further items
26
Advantages of simple random sampling
Free of bias | Easy and cheap to implement for small populations and small samples | Each sampling unit has a known and equal chance of selection
27
Disadvantages of simple random sampling
Not suitable for a large population size or sample size | Large samples are expensive, time consuming and disruptive | Need a sampling frame
28
Advantages of systematic sampling
Simple and quick to use | Suitable for a large sample/population
29
Disadvantages of systematic sampling
Sampling frame needed | Can introduce bias if the sampling frame isn't random
30
Advantages of stratified sampling
Sample accurately reflects the population | Guarantees proportional representation of groups within a population
31
Disadvantages of stratified sampling
Population must be clearly classified into discrete groups | Selection within each group suffers same as simple random sampling e.g. need a sampling frame
32
Advantages of opportunity sampling
Easy to carry out | Inexpensive
33
Disadvantages of opportunity sampling
Unlikely to provide a representative sample | Highly dependent on the individual researcher | Not random so may introduce bias
34
Advantages of quota sampling
Allows a small sample to still be representative of the population | No sampling frame needed | Quick, easy and inexpensive | Allows for easy comparison between different groups in a population
35
Disadvantages of quota sampling
Not random so may introduce bias | Population must be divided into groups, which can be costly or inaccurate | Non-responses are not recorded as such | Increasing scope of study increases number of groups, which adds time and expense
36
Mean
A numerical measure | Given by Σx/n
37
What's a median
A numerical measure | The middle value of the ordered data, at position (n+1)/2 for non-grouped data | And n/2 for grouped data
38
Mode
Most common value
39
Range
Difference between the highest and lowest data value
40
Lower Quartile
Q1 | Point that is a quarter of the way along an ordered data set | Given by (n+1)/4 for non-grouped data | And n/4 for grouped data
41
Upper Quartile
Q3 | Point that is three quarters of the way along an ordered data set | Given by 3(n+1)/4 for non-grouped data | And 3n/4 for grouped data
42
IQR
Interquartile range | The difference between the lower and upper quartile | Q3 - Q1
43
Variance
A measure of spread of data | σ² = Σ(x - x̄)²/n | Where x̄ is the mean
44
Standard deviation
A measure of spread of data | σ=sqrt of variance
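As a worked illustration of the mean, variance and standard deviation cards, here is a short Python sketch. The data values are made up, and the variance uses the population form (dividing by n), as on the variance card.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative values only

n = len(data)
mean = sum(data) / n                                  # sum of x divided by n
variance = sum((x - mean) ** 2 for x in data) / n     # mean squared deviation
std_dev = sqrt(variance)                              # square root of the variance

print(mean, variance, std_dev)
```

For this data set the deviations squared sum to 32, so the variance is 32/8 = 4 and the standard deviation is 2.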
45
Can you use your calculator to get the median for linear interpolation
No | Not accurate
46
How do you use a calculator to get the mean, median, standard deviation, variance and quartiles
```
Shift Menu/setup
3. Statistics
Frequency on
Menu/setup
6. Statistics
1. 1-Variable
Input values
AC (sets table)
OPTN 2 (1-Variable calc)
```
47
Discrete data
Can only take certain values and can have gaps | shoe size, money, number of sweets
48
Median for grouped
n/2
49
Median for non-grouped data
(n+1)/2
50
Continuous data
Can take any value in a certain range | height, time, length
51
Linear interpolation assumption
Assuming that the data values are evenly distributed within each class
52
How do you work out standard deviation
Square root of the variance | Or use page 3/4 of the formula book
53
What is coding
A way of simplifying statistical calculations | Each data value is coded to make a new set of data values that are easier to work with
54
Coding formula for mean and standard deviation
Mean: ȳ=(x̄-a)/b | Standard deviation: σy=σx/b
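A small Python check of these coding formulas, assuming the coding y = (x - a)/b with made-up values a = 1000 and b = 10. It compares the formulas against the mean and standard deviation computed directly from the coded values.

```python
from statistics import mean, pstdev

x = [1010, 1020, 1030, 1040, 1050]    # illustrative raw data
a, b = 1000, 10                       # assumed coding: y = (x - a) / b
y = [(xi - a) / b for xi in x]        # coded values: 1, 2, 3, 4, 5

coded_mean = (mean(x) - a) / b        # mean of y from the mean of x
coded_sd = pstdev(x) / b              # sd of y from the sd of x (a has no effect)

print(coded_mean, mean(y))
print(coded_sd, pstdev(y))
```

Subtracting a shifts every value equally, so it changes the mean but not the spread; dividing by b scales both.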
55
What is an outlier
An extreme piece of data which differs significantly from other observed data values | The formula to use will be given in the exam
56
What does it mean to clean data
Remove outliers | But keep the outliers in unless told otherwise | Mark with an x if you are able to identify them
57
Advantage of mode
Useful for non-numerical data | Not usually affected by outliers or omissions | Always an observed data value
58
Disadvantage of mode
Does not use all data values | May not be representative if low frequency | May not be representative if in a small population
59
Advantage of median
Not usually affected by outliers or errors
60
Disadvantage of median
Not always a data value | Does not use all data values
61
Advantage of mean
When data is large a few extreme values have little effect | Uses all data values
62
Disadvantage of mean
May not always be a data value | Affected by outliers and errors if in a small population
63
Advantage of range as a measure of spread
Reflects the full data set
64
Disadvantage of range
Distorted by outliers
65
Advantage of using the IQR as a measure of spread
Not distorted by outliers
66
Disadvantages of using the IQR as a measure of spread
Does not reflect the full data set
67
Advantage of using the standard deviation as a measure of spread
When data is large a few outliers have negligible impact
68
Disadvantage of using the standard deviation as a measure of spread
When a data set is small a few outliers have a large impact
69
What is a box plot
Can be drawn to represent important features of data | AKA five-figure summary, since it displays the lowest and highest values, the quartiles and the median | Can display any outliers with an x or *
70
When can cumulative frequency be used
For grouped data | Can be an alternative way to estimate the median, quartiles or percentiles
71
Do you include outliers in range
Yes | Unless told otherwise
72
How do you construct a cumulative frequency graph
Calculate cumulative frequency | Choose an appropriate scale | Plot points using the max value (upper boundary) of each class, NOT the middle of the class | Find the quartile necessary by using the cumulative frequency to read off the value of the 'variable', like height or time
73
When can a histogram be used
Grouped continuous data | Gives a good picture of how data is distributed and allows you to see the rough location and shape of the data spread
74
Relationship between area and frequency in histogram
Area is proportional to frequency
75
Frequency density
```
Frequency/class width
Assume there is an equal spread so you use the midpoint of each class
Don't join first and last point
```
76
What is a frequency polygon
Obtained by joining the middle of the top of each bar
77
How do you construct a histogram
```
Frequency density: frequency/class width
Frequency density on the y axis
```
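The frequency density calculation can be sketched in Python. The class boundaries and frequencies below are invented for illustration. Multiplying each density by its class width recovers the frequency, which is why bar area is proportional to frequency.

```python
# Illustrative grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

# Frequency density = frequency / class width (this is the bar height)
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Bar area = density x class width, which gives back the frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

print(densities)
print(areas)
```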
78
Assumption for using a frequency polygon
That the data is spread equally in classes
79
What is an experiment
A repeatable process that gives rise to a number of outcomes
80
What is an event
A collection of outcomes from an experiment to which a probability is assigned
81
What is a sample space
A set of all possible outcomes
82
How are probabilities written
Decimals | Fractions
83
What is a random variable
A variable whose value depends on the outcome of an event
84
Sample space in terms of discrete probability distribution
Range of values a random variable can take
85
When is a variable discrete
If it can only take certain numerical values
86
What is a probability distribution
Describes the probability of every outcome in the sample space | Several ways this can be displayed, e.g. a table or a probability mass function
87
How can probability distribution be displayed
Table | Probability mass function
88
What is a discrete uniform distribution
Probability distribution in which the probability of each outcome is the same
89
What is a probability density function
The distribution for a continuous random variable | The area under the graph of this function represents probability
90
Explain the notation X~B(n,p)
Notation for the binomial distribution of X | Where n is the number of trials carried out | And p is the probability of success
91
When can a binomial distribution be applied
Two outcomes only, e.g. win and lose | There are a fixed number of trials, n | The probability of success is the same for each trial, p | All trials are independent
92
How do you find the probability for a binomial that's equal to something
```
Menu 7
4 - Binomial PD
2
Enter values using =
```
93
How do you find the probability for a binomial that's less than or equal to
```
Menu 7
Down
1 - Binomial CD
2
Enter values using =
```
94
How do you find the probability of a binomial that's less than something
Calculator only works out less than or equal to | Change it: P(X<k) = P(X≤k-1)
95
How do you find the probability of a binomial that's greater than or greater than or equal to
Calculator only works out less than or equal to | Change it and use 1 minus: P(X>k) = 1 - P(X≤k) and P(X≥k) = 1 - P(X≤k-1)
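The conversions on the last few cards can be checked with a short Python sketch. The helper names `binom_pd` and `binom_cd` are chosen here to mirror the calculator menus, and X~B(10, 0.5) is only an example.

```python
from math import comb

def binom_pd(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cd(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pd(i, n, p) for i in range(k + 1))

n, p = 10, 0.5                       # illustrative X ~ B(10, 0.5)

p_eq_3 = binom_pd(3, n, p)           # P(X = 3)
p_le_3 = binom_cd(3, n, p)           # P(X <= 3)
p_lt_3 = binom_cd(2, n, p)           # P(X < 3)  = P(X <= 2)
p_gt_3 = 1 - binom_cd(3, n, p)       # P(X > 3)  = 1 - P(X <= 3)
p_ge_3 = 1 - binom_cd(2, n, p)       # P(X >= 3) = 1 - P(X <= 2)

print(p_eq_3, p_le_3, p_lt_3, p_gt_3, p_ge_3)
```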
96
How do you calculate the expected value of a binomial distribution
np
97
What is the expectation of a distribution
The long-term average | If the experiment was repeated many times the expected value would be the average of the outcomes | E[X] = μ = np | Where n is the number of trials and p is the probability of success
98
What is a hypothesis
A statement made about the value of a population parameter | Testing a hypothesis is done by carrying out an experiment or taking a sample from the population
99
What is a test statistic in terms of hypothesis testing
The result of the experiment or the statistic that is calculated from the sample | In order to carry out the test there must be two hypotheses
100
What is a null hypothesis
H0 | The default position | What is expected
101
What is an alternative hypothesis
H1 | Describes an alternative possibility | More, less, different
102
What is a one tailed test
Describes when you are testing whether a parameter is more or less than some number
103
What is a two tailed test
When you are testing whether a parameter is not equal to some number
104
2 methods to conduct a hypothesis test
Find critical region and compare to test statistic | Find the probability of being at least as extreme as the test statistic and compare to significance level
105
How do you construct a hypothesis testing conclusion
Compare the probability to the significance level, or the test statistic to the critical region | Accept/Reject H0 | State the outcome: there is/is not enough evidence to suggest...
106
Method for hypothesis testing with probabilities
State hypotheses | Assume H0 true and state the distribution being used | Expected value and diagram | Use the calculator to find the probability of interest | Compare to the given significance level, being careful about the tail of interest | If the probability is greater than the given significance level then accept H0 | Conclusion
107
When do you accept or reject H0 in hypothesis testing with probabilities
When the calculated probability is greater than the significance level you accept H0, so there is insufficient evidence to suggest... | When the calculated probability is less than the significance level you reject H0, so there is sufficient evidence to suggest...
108
What is a critical region
The region of a probability distribution which, if the test statistic falls within it, would cause the null hypothesis to be rejected | The critical value is the first value to fall inside the critical region
109
What is the critical value
The first value to fall inside the critical region
110
Acceptance region
Region of a probability distribution which, if the test statistic falls within it, would cause the null hypothesis to be accepted (not rejected)
111
What is the actual significance level of a hypothesis test
The probability of the test statistic falling in the critical region assuming the null hypothesis is correct
112
What does the location of the critical region depend on
The type of alternative hypothesis
113
What is the significance level
The probability of incorrectly rejecting the null hypothesis
114
How do you find the critical region
State hypotheses | Assume H0 true and state binomial distribution | Calculate the expected value | Determine whether the critical region is before or after this | Ask 'What numbers lie in the "significance level" percent?' | Menu 7, Down, Binomial CD, 1 List | Input estimate numbers until you get a suitable value | At the bottom, the cumulative probability must be lower than the significance level; at the top, it must be greater than 1 - significance level
115
How do you find the actual significance level
Once you've found the critical region | It is the probability of the test statistic falling in that region, assuming H0 is true
116
What do you have to be careful of when finding the critical region above the expected value
Binomial CD gives P(X≤k) | Find the first k whose cumulative probability is above 1 - significance level, then add one: the critical region starts at k+1
117
How do you test hypotheses with the critical region
State hypotheses | Assume H0 true and state binomial distribution | E[X] and graph to determine location of critical region | Find critical region: Menu 7, Down, Binomial CD, 1 List, input approximate values | If test value is in the critical region then reject H0 | If test value is not in the critical region it is in the acceptance region for H0, so accept H0 | Conclusion
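The trial-and-error search for a critical region can be sketched in Python. This is an illustration only: the function names are hypothetical, X~B(20, 0.5) with a 5% one-tailed test is an assumed example, and the convention used is that the probability of the critical region must be strictly below the significance level.

```python
from math import comb

def binom_cd(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lower_critical_value(n, p, alpha):
    """Largest c with P(X <= c) < alpha, or None if there is none."""
    c = None
    for k in range(n + 1):
        if binom_cd(k, n, p) < alpha:
            c = k
        else:
            break
    return c

def upper_critical_value(n, p, alpha):
    """Smallest c with P(X >= c) < alpha, or None if there is none."""
    for k in range(n + 1):
        # P(X >= k) = 1 - P(X <= k - 1)
        if 1 - binom_cd(k - 1, n, p) < alpha:
            return k
    return None

n, p, alpha = 20, 0.5, 0.05          # assumed example under H0
lower = lower_critical_value(n, p, alpha)
upper = upper_critical_value(n, p, alpha)
actual_lower = binom_cd(lower, n, p)         # actual significance level, lower tail
actual_upper = 1 - binom_cd(upper - 1, n, p) # actual significance level, upper tail

print(lower, upper, actual_lower, actual_upper)
```

For this example the lower-tail critical region is X ≤ 5 and the upper-tail critical region is X ≥ 15, each with an actual significance level of about 0.0207. A test statistic inside the chosen region leads to rejecting H0.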
118
Ø
The empty set | No intersections
119
Definition and formula for mutually exclusive events
When the events have no outcomes in common | P(A∩B) = 0 and P(A∪B) = P(A) + P(B)
120
Definition for independent events and formula
When one event has no effect on another | P(A∩B) = P(A) × P(B) | Formula used to prove and test if independent
121
What can tree diagrams be used for
To show two or more events happening in succession
122
Explain the notation P(B|A)
The probability that B occurs given A has already occurred
123
What is conditional probability
A way of modelling situations in which the probability of an event can change depending on the outcome of a previous event
124
Formula for conditional probability
P(B|A) = P(B∩A)/P(A)
125
Rule for independent events in conditional probability
```
P(A|B) = P(A|B') = P(A)
P(B|A) = P(B|A') = P(B)
```
126
Addition formula for the events A and B
P(A∪B) = P(A) + P(B) - P(A∩B)
127
Multiplication rule for conditional probability
P(B|A) = P(B∩A)/P(A) | So P(A∩B) = P(B|A) × P(A)
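The probability formulas on the preceding cards can be tried out in Python. The three probabilities below are made up for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Illustrative probabilities (assumed, not from the cards)
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)

p_b_given_a = p_a_and_b / p_a            # conditional probability P(B|A)
p_a_or_b = p_a + p_b - p_a_and_b         # addition formula
independent = p_a_and_b == p_a * p_b     # independence test
mutually_exclusive = p_a_and_b == 0      # no outcomes in common

print(p_b_given_a, p_a_or_b, independent, mutually_exclusive)
```

Here P(B|A) equals P(B), which agrees with the independence test returning True.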
128
Binomial vs normal distribution
Binomial is for discrete data | Normal is for continuous
129
What is a continuous random variable
A variable that can take any one of infinitely many values
130
What is the normal distribution
A continuous probability distribution that can model naturally occurring characteristics
131
Notation for normal distribution
X~N(μ,σ²) | X is a normally distributed random variable with population mean μ and variance σ²
132
What are the conditions for normal distribution
Symmetrical, mean=median=mode | Has a bell-shaped curve with an asymptote at each end | Has a total area under the curve of 1 | Has points of inflection at μ+σ and μ-σ
133
What is a point of inflection
Where a curve changes from convex to concave or vice versa
134
Rules for a normally distributed variable
Approximately 68% of data lies within one standard deviation of the mean (μ±σ) | 95% of the data lies within two standard deviations of the mean (μ±2σ) | Nearly all data (99.7%) lies within three standard deviations of the mean (μ±3σ)
135
How do you find probabilities using the normal distribution
```
Menu 7: Distribution
2: Normal CD
Enter μ and σ using =
Fill in the upper and lower boundaries
If only one boundary to use...
Lower = -99999
Upper = 99999
```
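The same Normal CD calculation can be sketched with Python's standard library, as a way of checking calculator answers. X~N(100, 15²) and the boundaries are assumed values. `NormalDist.cdf` gives the area to the left of a value, so no large dummy boundary is needed.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)       # illustrative X ~ N(100, 15²)

p_between = X.cdf(115) - X.cdf(85)     # P(85 < X < 115), both boundaries
p_below = X.cdf(85)                    # P(X < 85), upper boundary only
p_above = 1 - X.cdf(115)               # P(X > 115), lower boundary only

print(round(p_between, 4), round(p_below, 4), round(p_above, 4))
```

The first probability is the area within one standard deviation of the mean, so it comes out at roughly 68%.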
136
Explain P(X=a)=0
For a continuous random variable, the probability of any single exact value, a, is zero | A single point has no width, so there is no area under the curve above it
137
When do you use the inverse normal
When given a probability, to calculate a value that satisfies an inequality | Calculator only calculates less than
138
How do you calculate inverse normal
Menu 7: Distribution | 3: Inverse normal | Area is the area less than the value that satisfies the inequality, since the calculator only works out less than
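A minimal Python sketch of the inverse normal idea, assuming X~N(50, 4²) and made-up probabilities. `NormalDist.inv_cdf` takes the area to the left of the value, so a "greater than" probability is converted with 1 minus first.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=4)     # illustrative X ~ N(50, 4²)

# P(X < a) = 0.95: the area to the left is given directly
a = X.inv_cdf(0.95)

# P(X > b) = 0.10: the area to the left is 1 - 0.10 = 0.90
b = X.inv_cdf(1 - 0.10)

print(round(a, 2), round(b, 2))
```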
139
Why can't you use PD for P(X<=1)
Must be CD | Since X can also be zero
140
How do you standardise a normally distributed variable
By coding the data | So that it is modelled by the standard normal distribution
141
Why is the standard normal distribution useful
To standardise a normally distributed variable | By coding it
142
Rules for the standard normal distribution
Z~N(0,1) | Has mean 0 and standard deviation 1
143
What can the standard normal distribution be used to find
μ or σ if they are unknown | Z=(x-μ)/σ
144
How do you use the standard normal distribution to find μ or σ
Z~N(0,1) | Draw both graphs with equivalent areas | Find the value of z for which P(Z > z) (or <, etc.) = area | Use Z=(x-μ)/σ to get the value
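As a sketch of this method in Python, suppose (purely for illustration) that σ = 5 is known, μ is unknown, and P(X < 20) = 0.9. The z value comes from the standard normal distribution, then Z = (x - μ)/σ is rearranged for μ.

```python
from statistics import NormalDist

sigma = 5                            # assumed known standard deviation
x = 20                               # assumed value with P(X < 20) = 0.9

z = NormalDist().inv_cdf(0.9)        # z with P(Z < z) = 0.9 for Z ~ N(0, 1)
mu = x - z * sigma                   # rearranged from z = (x - mu) / sigma

print(round(z, 4), round(mu, 4))
```

Substituting the result back in, a normal distribution with this μ and σ = 5 gives P(X < 20) = 0.9, which confirms the rearrangement.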
145
How can you test hypotheses about the mean of a normally distributed random variable
By looking at the mean of a sample called the sample mean
146
Formulas to use for hypothesis testing with the normal distribution
For a random sample of size n taken from a random variable X~N(μ,σ²), the sample mean distribution is given by X̄~N(μ,σ²/n) | The mean is the same but the variance is different
147
What must be used when completing a hypothesis test with the normal distribution
The sample mean | Because you are using a sample of a given size and extrapolating that to give conclusions about the whole population
148
Method for using hypothesis testing with normal distribution
```
State hypotheses
Assume H0 true and state the sample mean distribution
Sketch the graph
Find the probability
Compare
Conclusion
```
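The method can be sketched in Python for a one-tailed test. Every number here is an assumption for illustration: H0: μ = 50, H1: μ > 50, σ = 4, a sample of 16 with sample mean 52, and a 5% significance level.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 50, 4, 16            # assumed H0 mean, population sd, sample size
sample_mean = 52                     # assumed observed sample mean
alpha = 0.05                         # assumed significance level, H1: mu > 50

# Under H0 the sample mean has mean mu0 and standard deviation sigma / sqrt(n)
xbar_dist = NormalDist(mu0, sigma / sqrt(n))

# Probability of a sample mean at least as extreme as the one observed
p_value = 1 - xbar_dist.cdf(sample_mean)

reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```

The probability is about 0.0228, which is below 0.05, so in this made-up example H0 would be rejected.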
149
What goes on y axis
Response/dependent variable | Expected to change in response to the other variable
150
What is a regression line
A line which fits as well as possible to the points on the scatter graph | Useful to identify a trend
151
PMCC
Product Moment Correlation Coefficient | Provides information on the type and strength of the correlation between two variables | Denoted by 'r'
152
PMCC of 1
Perfect positive correlation
153
PMCC of -1
Perfect negative correlation
154
PMCC of 0
No correlation
155
PMCC of -0.2 to 0.2
Weak/poor correlation
156
PMCC of 0.75 to 1
Strong positive correlation
157
PMCC of -0.75 to -1
Strong negative correlation
158
Type and strength of correlation for a town's annual income and the crime rate
Moderate negative correlation
159
Give the type and strength of correlation for the heights of fathers and their sons
Positive correlation
160
Give the type and strength of correlation between the cooking time for a chicken and the weight of the chicken
Strong positive correlation
161
Give the type and strength of the correlation between shoe size and salary
No correlation
162
When is a prediction made using a regression line unreliable
When the prediction is made in different conditions than those for the original sample data
163
Interpolation in terms of regression line
Using a regression line to make predictions which fall within the range of observed data | Stronger correlation means more reliable prediction
164
Extrapolation in terms of regression line
Making predictions outside of the range of observed data | Unreliable since no evidence that the pattern extends beyond the observed range
165
How do you find the regression line and PMCC (r)
```
Frequency off
Menu 6: Statistics
2: y=ax+b
Enter data items x and y in table
OPTN 4: Regression calc
Displays a and b for the regression line of form y=ax+b and 'r'/PMCC
```
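For comparison with the calculator output, here is a Python sketch that computes a, b and r from the standard least-squares formulas. The five data points are made up, and the names `s_xy`, `s_xx` and `s_yy` are just labels for the sums of products and squares.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]                  # illustrative explanatory values
y = [2, 4, 5, 4, 5]                  # illustrative response values

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

a = s_xy / s_xx                      # gradient of y = ax + b
b = mean_y - a * mean_x              # intercept of y = ax + b
r = s_xy / sqrt(s_xx * s_yy)         # product moment correlation coefficient

print(round(a, 4), round(b, 4), round(r, 4))
```

For this data the line is y = 0.6x + 2.2 with r of about 0.77.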
166
How do you put frequency on
Shift Setup 3 1 or 2
167
What is causal correlation
When a change in one variable does affect the other
168
What is spurious correlation
Correlation without causal connection
169
What is a regression line
Line of best fit | y=c+mx | c: when "x" is zero "units", c is the predicted number of "y" | m: every increase in "x" by "1 unit" corresponds to an increase/decrease in "y" by "m units"
170
What is curve fitting used for
To model polynomial and exponential relationships
171
Polynomial curve fitting equations
```
If y=ax^n
Then log(y)=log(a)+nlog(x)
Where Y=log(y) and X=log(x)
```
172
Exponential curve fitting equations
```
If y=kb^x
Then log(y)=log(k)+xlog(b)
Where Y=log(y)
```
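The exponential case can be sketched in Python: take logs of y, fit a straight line of log(y) against x, then undo the logs. The data below is invented so that it follows y = 3 × 2^x exactly. The power-law case works the same way with log(x) in place of x, where the gradient gives n directly.

```python
from math import log10

x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]               # illustrative data following y = 3 * 2**x

Y = [log10(v) for v in y]            # Y = log(y), so Y = log(k) + x*log(b)

# Least-squares straight line of Y against x
n = len(x)
mean_x, mean_Y = sum(x) / n, sum(Y) / n
gradient = (sum((xi - mean_x) * (Yi - mean_Y) for xi, Yi in zip(x, Y))
            / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_Y - gradient * mean_x

k = 10 ** intercept                  # intercept is log(k)
b = 10 ** gradient                   # gradient is log(b)

print(round(k, 4), round(b, 4))
```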
173
a and b in Y=ab^t
a=initial number of variable y | b=proportional increase or decrease as t increases by 1
174
Why can you use hypothesis testing with correlation coefficient
To determine whether the product moment correlation coefficient, r, for a particular sample indicates that there is likely to be a linear relationship within the whole population
175
r vs ρ for correlation hypothesis testing
r is PMCC for a sample | ρ is PMCC for the population
176
Explain the hypotheses for correlation hypothesis testing
H0: ρ=0 | H1: ρ>0 or ρ<0 or ρ≠0 | Positive correlation, negative correlation, correlation
177
Method to find the critical region with PMCC then test hypothesis
Page 37 of formula book: read off to find the critical region for r using the significance level and sample size | Sketch number line to determine if r is negative or positive | Assume no correlation to test alternative hypothesis | If r lies in the critical region then reject H0 | Conclusion
178
What is the large data set
Contains the weather data | For 5 UK weather stations | And 3 weather stations overseas
179
Why can't you predict x for a value of y for the regression line y=mx+c
Regression line for y on x | Can only reliably be used to predict the y value