AS Stats Flashcards

1
Q

define population

A

the whole set of items that are of interest

2
Q

define census

A

observes or measures every member of a population

3
Q

what is the advantage of using the census?

A

it should give a completely accurate result

4
Q

what are the disadvantages of using the census?

A

time consuming & expensive
cannot be used if the testing process destroys the item
difficult to process a large quantity of data

5
Q

define sample

A

a selection of observations taken from a subset of the population, which is used to find out information about the whole population

6
Q

what are the advantages of using a sample?

A

less time consuming & expensive than the census
fewer people have to respond
less data to process than a census

7
Q

what are the disadvantages of using a sample?

A

data might not be as accurate
sample might not be large enough to give info about small subsets of the population

8
Q

define sampling units

A

individual units of a population

9
Q

define sampling frame

A

a list of individually named or numbered sampling units of a population

10
Q

(how does sample size affect the validity of the conclusions?)

A

sample size depends on required accuracy & resources
larger sample sizes are more accurate
a varied population requires a larger sample than a uniform population
different samples produce differing results due to natural variation within populations

11
Q

what are the 3 types of random sampling?

A

simple random
systematic
stratified

12
Q

define simple random sampling

A

every sample of size n has an equal chance of being selected
need a sampling frame

13
Q

what are advantages of simple random sampling?

A

no bias
easy & cheap for small sample
each sampling unit has a known & equal chance of selection

14
Q

what are disadvantages of simple random sampling?

A

not suitable for large samples bc time consuming, disruptive & expensive
need sampling frame

15
Q

define systematic (random) sampling

A

the required elements are chosen at regular intervals from an ordered list
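
A minimal Python sketch of the idea, assuming the sampling frame is a plain list, the interval k is the population size divided by the sample size, and the starting point is chosen at random from the first k units. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Take every k-th unit from an ordered sampling frame, from a random start."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first k units
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))      # made-up frame: units numbered 1 to 100
sample = systematic_sample(frame, 10)
print(sample)                    # 10 units, each 10 places apart in the list
```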

16
Q

what are advantages of systematic sampling?

A

simple & quick to use
suitable for large samples/populations

17
Q

what are disadvantages of systematic sampling?

A

need sampling frame
can be biased if sampling frame is not random

18
Q

define stratified (random) sampling

A

population is divided into mutually exclusive strata & a random sample is taken from each
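
A rough Python sketch of stratified sampling, assuming proportional allocation (number taken from a stratum = stratum size / population size × sample size, rounded). The strata, their sizes and the function name `stratified_sample` are made up for illustration.

```python
import random

def stratified_sample(strata, n, rng=random):
    """strata: dict mapping stratum name -> list of sampling units."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # proportional allocation, rounded to the nearest whole unit
        size = round(len(units) / total * n)
        # simple random sample within the stratum
        sample[name] = rng.sample(units, size)
    return sample

strata = {"year 12": list(range(120)), "year 13": list(range(80))}
picked = stratified_sample(strata, 20)
print({name: len(units) for name, units in picked.items()})
```

With these sizes the split is 12 and 8. For other sizes, rounding can make the total differ slightly from n.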

19
Q

what are advantages of stratified sampling?

A

sample accurately reflects the population structure
guarantees proportional representation of groups within the population

20
Q

what are disadvantages of stratified sampling?

A

population must be clearly classified into distinct strata
selection within each stratum is random so same disadvantages as random

21
Q

what are the 2 types of non-random sampling?

A

quota
opportunity

22
Q

define quota sampling

A

researcher selects a sample that reflects the characteristics of the whole population

23
Q

what are advantages of quota sampling?

A

allows a small sample to be representative of the population
no sampling frame needed
quick, easy & cheap
easy comparison b/w different groups within population

24
Q

what are disadvantages of quota sampling?

A

non-random sampling can introduce bias
population must be divided into groups - expensive or inaccurate
increasing the scope of the study increases the # of groups, which increases time & cost
non-responses not recorded

25
define opportunity/convenience sampling
take sample from people available at the time of study & who fit the criteria
26
what are advantages of opportunity sampling?
easy
cheap
27
what are disadvantages of opportunity sampling?
unlikely to be representative of the population
dependent on the individual researcher
28
define quantitative variables
variables/data associated with numerical observations
29
define qualitative variables
variables/data associated with non-numerical observations
30
define continuous variable
can take any value within a given range
31
define discrete variable
can take only specific values within a given range
32
define measure of location
single value that describes the position in a data set
33
define measure of central tendency
when the single value (of the measure of location) describes the centre of the data
34
what are the features of a grouped frequency table?
data is grouped into classes
class boundaries show max. & min. values in each class
midpoint is the average of the class boundaries
class width is the difference b/w the upper & lower class boundaries
35
when is it best to use mean, median or mode?
mean: quantitative data with no extreme values
median: quantitative data with extreme values
mode: qualitative or quantitative data with 1 or 2 modes
36
what is the formula for the mean & for mean of data in frequency table?
mean: x̄ = Σx / n
frequency table: x̄ = Σxf / Σf, where n = Σf
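
A short Python check of the frequency-table formula, using made-up values and frequencies:

```python
xs = [0, 1, 2, 3]    # values (made-up data)
fs = [4, 7, 6, 3]    # frequencies

n = sum(fs)                                       # n = Σf
mean = sum(x * f for x, f in zip(xs, fs)) / n     # Σxf / Σf
print(n, mean)                                    # Σxf = 28, so mean = 28 / 20
```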
37
how do you calculate median from frequency table?
arrange data points in ascending order
add 1 to the # of data points, then divide by 2
the median is the (n + 1) / 2 th data point, where n = Σf (same as Σf / 2 + 0.5)
38
how do you calculate the mode from frequency table?
x value with the highest frequency
value that appears the most
39
how do you calculate mean, median & mode from grouped frequency table?
mean: Σ(midpoint × f) / Σf
median: linear interpolation, or Σf / 2 is the number of the value & see what class it is in
mode: class with the highest f
linear interpolation if specified
40
what are the other measures of location?
Q1 - lower quartile (first 25% of data)
Q2 - median (first 50% of data)
Q3 - upper quartile (first 75% of data)
P10 - 10th percentile (first 10% of data)
41
how do you calculate the location of Q1, Q2 & Q3 for discrete data?
Q2: (Σf + 1) / 2
Q1: 1/4 × Σf
Q3: 3/4 × Σf
if whole number, Q1/Q3 is halfway b/w this data point & the one above
if not whole number, round up & Q1/Q3 is this data point
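
The whole-number / round-up rule for Q1 and Q3 can be sketched in Python. The function name `quartile` and the data are hypothetical, and the list is assumed to be sorted already.

```python
import math

def quartile(sorted_data, fraction):
    """Q1 (fraction=0.25) or Q3 (fraction=0.75) for discrete data."""
    n = len(sorted_data)
    pos = fraction * n
    if pos == int(pos):
        # whole number: halfway between this data point and the one above
        i = int(pos)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    # not a whole number: round up and take that data point
    return sorted_data[math.ceil(pos) - 1]

data = [2, 3, 5, 6, 8, 9, 11, 12]    # 8 values, already in order
print(quartile(data, 0.25), quartile(data, 0.75))
```

Here 1/4 × 8 = 2 is a whole number, so Q1 is halfway between the 2nd and 3rd values.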
42
what is the assumption made by using linear interpolation?
that the data is evenly distributed within each class
43
what is the formula for linear interpolation?
GLB + (PV / GF × CW)
lower bound of class + (place value / group frequency × class width)
place value - how much you have to count up to get into that class
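
A worked Python example of the formula with made-up grouped data. The function name `interpolate` and the classes are assumptions for illustration only.

```python
def interpolate(glb, place_value, group_freq, class_width):
    """Lower bound of class + (place value / group frequency) x class width."""
    return glb + place_value / group_freq * class_width

# Made-up table: classes 0-10 (f=8), 10-20 (f=12), 20-30 (f=20), 30-40 (f=10).
# Σf = 50, so the median is the 25th value.
# 20 values lie below 20, so the 25th value is 5 places into the 20-30 class.
median = interpolate(glb=20, place_value=5, group_freq=20, class_width=10)
print(median)    # 20 + (5 / 20) x 10
```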
44
what are 3 ways of measuring spread of data & define them?
range - difference b/w largest & smallest values in the data set
interquartile range (IQR) - Q3 - Q1, the difference b/w Q3 & Q1
interpercentile range (IPR) - difference b/w the values for 2 given percentiles
45
what are the other ways of measuring spread, define & formulae?
variance - each point deviates from the mean by x - x̄; variance = Sxx / n (Sxx is in FB)
standard deviation - square root of variance (see FB)
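
A small Python sketch of variance and standard deviation, taking Sxx as Σ(x - x̄)² and using made-up data:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data
n = len(data)
mean = sum(data) / n
sxx = sum((x - mean) ** 2 for x in data)     # Sxx = Σ(x - x̄)²
variance = sxx / n
sd = math.sqrt(variance)
print(mean, variance, sd)
```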
46
what is the formula for coded data?
y = (x - a) / b
47
what is the formula for the mean of coded data?
ȳ = (x̄ - a) / b
48
what is the formula for standard deviation of coded data?
σy = σx / b
49
how does coding affect the mean & sd?
the code is applied directly to the mean
sd is only impacted by b
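
A quick numerical check in Python, assuming the coding y = (x - a) / b with made-up data and hypothetical values a = 100, b = 10:

```python
import math
import statistics

x = [110, 120, 130, 140, 150]      # made-up data
a, b = 100, 10                     # hypothetical coding constants
y = [(xi - a) / b for xi in x]     # coded data

mean_x, sd_x = statistics.mean(x), statistics.pstdev(x)
mean_y, sd_y = statistics.mean(y), statistics.pstdev(y)

# the whole code is applied to the mean; only b affects the sd
print(math.isclose(mean_y, (mean_x - a) / b))
print(math.isclose(sd_y, sd_x / b))
```

`statistics.pstdev` is the population standard deviation (divides by n).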
50
how do you draw a box plot?
see notes sheet
needs scale
x = outlier
51
how are box plots interpreted?
comparison of position of median
52
what are the formulae for an outlier?
outlier < Q1 - k × IQR
outlier > Q3 + k × IQR
or: outside mean ± 2σ
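
A minimal Python sketch of the quartile rule. The value k = 1.5 is only an assumed default (the question normally states k), and the function name `find_outliers` and the data are made up.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

data = [3, 12, 14, 15, 16, 18, 19, 21, 40]     # made-up data, Q1 = 14, Q3 = 19
print(find_outliers(data, q1=14, q3=19))       # limits are 6.5 and 26.5
```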
53
how do you compare measures of location & spread?
location:
1. compare the means or medians
2. e.g. so people in set A have to travel further than set B on average
spread:
1. compare the standard deviations, variance, range or IQR
2. so there is more/less variability in data set A than data set B
54
define outlier
an extreme value that lies outside of the pattern of data
it is mathematically defined
55
define anomalies
result caused by error
it is removed from the data set (= cleaning)
56
what are the key aspects of a cumulative frequency graph?
start at frequency 0
continuous & CW doesn't need to be equal
join w smooth curve through all points
points plotted at the upper boundary of each class
57
why is a CF graph better than linear interpolation when estimating quartiles & percentiles?
it doesn't assume even distribution within class
58
what are the key aspects of a histogram?
area of the bar is proportional to the frequency
x: class width (may not be =), continuous variable
y: f density
59
what is a frequency polygon?
joining the middle of the top of each bar on a histogram with equal class widths
60
mean & sd are compared together, median & IQR are compared together; they cannot be mixed up bc...
mean & sd are more affected by outliers than median & IQR
mixed-up measures are not comparable
61
what is the difference b/w correlation & causation?
correlation: pattern/trend b/w data sets
causation: one variable is directly impacted by the other variable
62
define bivariate data
data that has pairs of values for 2 variables
63
what relationship does correlation assume?
linear
always say 'linear correlation'
64
what are the types of correlation?
+ve
-ve
strong
weak
none
65
describe regression line
least squares regression line b/w bivariate data = straight line that minimises the sum of the squares of the distances of each point from the line
y = a + bx
gradient of line will be +ve for +ve linear correlation & -ve for -ve linear correlation
can only be used to find y from x, not x from y
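
A rough Python sketch of fitting y = a + bx by least squares, using the standard results b = Sxy / Sxx and a = ȳ - b x̄. The data is made up.

```python
x = [1, 2, 3, 4, 5]                   # made-up bivariate data
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

b = sxy / sxx                 # gradient
a = y_bar - b * x_bar         # intercept
print(f"y = {a:.2f} + {b:.2f}x")

# predict y from an x value inside the range of the data (interpolation)
predicted = a + b * 3.5
```

The gradient comes out positive here, matching the positive linear correlation in the data.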
66
how can you interpret correlation of the data?
r (product moment correlation coefficient) informs how close the data is to a straight line
-1 ≤ r ≤ 1
r = 0: no linear correlation
r closer to -1: stronger -ve linear correlation
r closer to +1: stronger +ve linear correlation
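
A small Python sketch of calculating r, using the standard formula r = Sxy / √(Sxx × Syy). The function name `pmcc` and the data sets are made up for illustration.

```python
import math

def pmcc(x, y):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))    # points on a line with +ve gradient
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))    # points on a line with -ve gradient
```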
67
interpolation vs extrapolation of linear regression line
interpolation - extracting/predicting value from inside range of data
extrapolation - predicting value from outside the range of data = do not do bc less reliable
68
what variable can you predict using linear regression line?
dependent only
69
how would you predict IV from linear regression line?
use regression line of x on y = map it the other way round
70
define experiment
repeatable process that gives rise to a number of outcomes (results)
71
define event
collection of one or more outcomes
72
define sample space
set of all possible outcomes
can be shown with: venn diagram, table, tree diagram
73
define equally likely
outcomes that all have the same probability
probability = # outcomes in the event / total # possible outcomes
74
what 2 ways can probability be calculated?
sample space, e.g. venn diagram, table, tree diagram
linear interpolation - for continuous data/grouped frequency table
75
rules for venn diagrams
fill from middle outwards
assign a value to central intersection - if unknown, put x
76
shade intersection, union & complement on venn diagram; what are the notations?
intersection: A ∩ B (A and B)
union: A ∪ B (A or B or both)
complement: A' (not A)
see notes
77
define mutually exclusive & what is the formula?
events that cannot happen at the same time
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)
78
define independent & what is the formula?
the outcome of one event does not affect the outcome of the others
the probability of one event is not impacted by the probability of another event (probability of A happening is the same whether or not B happens)
P(A ∩ B) = P(A) × P(B)
79
what is the assumption for tree diagrams?
an object is not replaced
80
define random variable
variable whose value depends on the outcome of a random event
81
notation for random variable & outcome
X - random variable
x - random outcome
82
sum of the probabilities of all possible outcomes
1
83
what are the types of probability distribution?
probability mass function, e.g. P(X = x) = 1/6, x = 1, 2, 3, 4, 5, 6
table
diagram
84
what is a uniform discrete probability distribution?
every outcome has the same probability
fixed numerical values
85
ΣP(X=x) =
1
86
what is a cumulative probability function?
tells you the sum of all individual probabilities up to & including x in the calculation for P(X≤x)
87
binomial distribution
X ~ B(n, p)
B - 2 possible outcomes (success & failure)
n - fixed number of trials
p - fixed probability of success
outcomes/trials are independent
88
what is the probability mass function of random variable X, which has binomial distribution?
P(X = r) = nCr × p^r × (1 - p)^(n - r)
n = index
p = parameter
see notes booklet
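
The probability mass function can be checked with a short Python sketch. The function name `binomial_pmf` and the values n = 5, p = 0.5 are assumptions for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

print(binomial_pmf(2, 5, 0.5))                           # 10 x 0.25 x 0.125
print(sum(binomial_pmf(r, 5, 0.5) for r in range(6)))    # probabilities sum to 1
```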
89
how do you find constant k in random variable probability Qs?
use the formula they give & sum all the probabilities to 1 then solve for k
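
A worked Python example, assuming a made-up probability function P(X = x) = kx for x = 1, 2, 3, 4. Summing gives k(1 + 2 + 3 + 4) = 1, so k = 1/10.

```python
from fractions import Fraction

values = [1, 2, 3, 4]
# hypothetical pmf: P(X = x) = k * x, so k * (1 + 2 + 3 + 4) = 1
k = 1 / Fraction(sum(values))
probs = {x: k * x for x in values}
print(k, sum(probs.values()))
```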
90
define population parameter
condition of the distribution that is being tested
91
define test statistic
the actual result of doing the experiment
92
define null & alternative hypothesis
null: H0, the hypothesis that you assume to be correct
alternative: H1, tells you your assumption about the population parameter is wrong
93
one-tailed vs two-tailed tests
one-tailed - one direction, H1: p > ... or H1: p < ...
two-tailed - 2 directions, H1: p ≠ ...
94
define significance level
boundary decided before experiment to decide whether the test fulfils H0 or H1
95
what is the critical region & what is the critical value?
critical region: the region of the probability distribution which, if the test statistic falls inside it, would cause you to reject the null hypothesis
critical value: the first value to be inside the critical region
96
define actual significance level
probability of incorrectly rejecting H0
the actual probability of the critical region: P(X ≤ CV) or P(X ≥ CV)
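
A Python sketch of finding a critical value and the actual significance level, assuming a made-up one-tailed binomial test: X ~ B(20, 0.5), H0: p = 0.5, H1: p < 0.5, 5% significance level. The function name `binomial_cdf` is hypothetical.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

n, p, alpha = 20, 0.5, 0.05     # hypothetical test, lower tail

# critical value: largest c with P(X <= c) <= significance level
critical_value = max(c for c in range(n + 1) if binomial_cdf(c, n, p) <= alpha)
actual_level = binomial_cdf(critical_value, n, p)
print(critical_value, round(actual_level, 4))
```

Here P(X ≤ 5) ≈ 0.0207 and P(X ≤ 6) ≈ 0.0577, so the critical region is X ≤ 5 and the actual significance level is about 2.07%.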
97
what is the critical region for a two-tailed test?
2 parts - half at each end of the distribution
98
what are the 2 methods for conducting a hypothesis test?
1. probability of test statistic
2. calculate critical region & compare test statistic
99
structure
see notes sheet