4 statistics and probability Flashcards

(133 cards)

1
Q

discrete data

A

something you can count

3
Q

continuous data

A

something you measure

4
Q

a hypothesis

A

a statement you test to see if it is true or false

5
Q

raw data

A

data before it has been analysed or processed

6
Q

primary data

A

data you collect yourself

7
Q

secondary data

A

data you use which someone else has collected

8
Q

categorical data

A

data given as words, not numbers

9
Q

numerical data

A

data given as numbers

10
Q

types of numerical data

A

continuous or discrete

11
Q

ordinal data

A

data that is ordered in some way

12
Q

adv of secondary data

A

available
cheaper
easy

13
Q

adv of primary data

A

reliable

aware of bias

14
Q

ways of collecting data

A

measurement or experiment
survey or questionnaire
modelling or simulation

15
Q

mistakes to avoid when doing surveys or questionnaires

A
asking the wrong people or a biased sample
asking leading questions
asking confusing questions
asking personal questions
asking questions that are too open-ended
16
Q

random

A

every member of the population has the same probability of being included
the members of a genuinely random sample have to be selected independently.

17
Q

ways of collecting a sample

A

convenience
systematic
genuinely random

18
Q

convenience sample

A

asking whoever is easiest to get hold of

19
Q

systematic sample

A

asking every 3rd person

20
Q

genuinely random sample

A

picking out of a hat or using a random number generator

21
Q

quota sampling

A

Choosing a sample made up only of members of the population that fit certain characteristics.

22
Q

stratified sampling

A

Choosing a random sample in a way that the proportion of certain characteristics matches the proportion of those characteristics in the population.
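
A minimal sketch of stratified sampling with proportional allocation. The function name `stratified_sample`, the year-group data and the rounding rule are assumptions made for illustration; with awkward proportions the rounded counts may not add up exactly to the requested sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Pick a random sample from each stratum, in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the sample matches the stratum's share of the population.
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: 60 pupils in year 7 and 40 pupils in year 8.
strata = {
    "year 7": list(range(0, 60)),
    "year 8": list(range(60, 100)),
}
sample = stratified_sample(strata, sample_size=10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Here 60% of the population is in year 7, so 6 of the 10 sampled pupils come from year 7 and 4 from year 8.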

24
Q

hypothesis

A

a statement you test to see if it is true or false

25
raw data
data before analysis or processing
26
primary data
data you collect yourself
27
secondary data
data you use which someone else has collected
28
categorical data
data given as words, not numbers
29
numerical data
number data
30
ordinal data
ordered in some way
31
adv of secondary data
available | cheaper | easy
32
adv of primary data
reliability | aware of bias
33
ways of collecting primary data
measurement or experiment | survey or questionnaire | modelling or simulation
34
random sample
every member of the population has the same probability of being included. members are selected independently.
35
what is the opposite of a census
a sample, e.g. a random sample (a census surveys the whole population, a sample only part of it)
36
convenience sampling
asking friends or those easy to ask
37
systematic sampling
e.g. asking every 3rd person
38
genuinely random sampling
pick out of hat or use random number generator on calculator.
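A short sketch contrasting systematic sampling with a genuinely random sample. The population of 30 numbered people and the fixed seed are assumptions for illustration; in practice a systematic sample often starts from a randomly chosen person.

```python
import random

# Hypothetical population: 30 people numbered 1 to 30.
population = list(range(1, 31))

# Systematic sample: every 3rd person (persons 3, 6, 9, ...).
systematic = population[2::3]

# Genuinely random sample: a random number generator picks 10 different people,
# each with the same chance of being chosen.
rng = random.Random(0)  # fixed seed only so the run is repeatable
simple_random = rng.sample(population, 10)

print(systematic)
print(sorted(simple_random))
```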
39
quota sampling
the population is divided into groups. a given number is surveyed from each group.
40
cluster sampling
the population is divided into groups or clusters. a random sample of clusters is chosen and every item in it is surveyed. a large number of small clusters minimises the chances of being unrepresentative.
41
opinion polls
large scale opinion polls often use a combination of cluster and quota sampling. large sample size based on small proportion of population. (geographical area, age). but opinions change over time
42
what is a uniform distribution
flat/even
43
what is a normal distribution
peaked in the middle. mean, median and mode are in the same place. also called the gaussian distribution
44
what is negatively skewed
rising up to the right: the peak is on the right and the long tail is on the left
45
what is positively skewed
rising up to the left, then decreasing: the peak is on the left and the long tail is on the right
46
box plot left skewed
box on the right with the median line towards the right
47
box plot right skewed
box on the left with the median line towards the left
48
normal distribution and standard deviations
about 70% of the data (more precisely 68%) lies within 1 standard deviation of the mean (the highlighted region). the remaining 30% or so lies further out, about 15% in each tail
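The rough 70% / 15% split can be checked against the standard normal distribution. This is a sketch using Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of data within 1 standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# The rest is split equally between the two tails.
each_tail = (1 - within_one_sd) / 2

print(round(within_one_sd, 3), round(each_tail, 3))
```

The exact figures are roughly 68% in the middle and 16% in each tail, which the card rounds to 70% and 15%.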
49
box plot name
box and whisker diagram
50
the ends of the box in a box plot are the
lower and upper quartiles (Q1 and Q3). the length of the box is the interquartile range
51
outlier definition
an item of data that is at least 2 standard deviations away from the mean (histogram) OR at least 1.5 x IQR beyond the nearer quartile (box and whisker)
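A sketch of the 1.5 x IQR rule. The quartile method is an assumption: `statistics.quantiles` with `method="inclusive"` is used here, and calculators or textbooks may compute quartiles slightly differently. The function name and the marks are made up for illustration.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Return values at least 1.5 x IQR beyond the nearer quartile."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x <= lower_fence or x >= upper_fence]


# Hypothetical data set with one suspicious value.
marks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = iqr_outliers(marks)
print(outliers)
```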
52
benefits of a curve in a cumulative frequency diagram
they use the data to show a gradient, so if the frequency decreases slightly then the gradient will show it by flattening a little. straight lines only show the data and not the link between them
53
datum
singular piece of data
54
why do bars not touch with discrete data
because there is no continuity between columns
55
what graph do you use for continuous data
histograms
56
what graph do you use for discrete data
bar graph
57
what graph do you use for cumulative frequency
line
58
how do you plot for cf graphs
to the upper bound
59
For data grouped into intervals or classes, we may identify the following:
mid-interval values
interval width (though it is not common to have a varying interval width)
lower interval boundary
upper interval boundary
modal class (the class with the highest frequency or the tallest class in the diagram; be aware, use the tallest class in the frequency diagram, not in the cumulative frequency diagram)
60
what is the 5 number summary
minimum
Q1
median
Q3
maximum
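A sketch of computing a five-number summary. The helper name `five_number_summary` and the data are illustrative, and the quartiles use `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

These five values are exactly what a box and whisker diagram draws: the whisker ends, the box ends and the median line.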
61
when is a box and whisker plot a normal distribution
when you can recognise symmetry (median in the middle of the box, whiskers of similar length)
62
cumulative frequency polygon
The data points are connected by straight lines, implying a linear distribution of the data points within an interval.
63
cumulative frequency curve
All the data points are connected by a smooth curve
64
no correlation is
a bunch of dots
65
strong positive correlation is
a line going up to the right with all the dots very close on that line
66
perfect negative correlation
a line going down to the right with all the dots on it
67
moderate negative correlation is
going gently down to the right with dots around it
68
weak positive correlation is
a line faintly going up to the right with dots all around it
69
what is the r of no correlation
0
70
what is the r of strong positive correlation
0.9
71
what is the r of perfect negative correlation
-1
72
what is the r of moderate negative correlation
-0.5
73
what is the r of weak positive correlation
0.3
74
what is the r of a curved relationship or no correlation
don't fit a straight line, so r is not meaningful
75
what are residuals
the vertical displacements of the points from the line
76
which residuals are positive and negative
above the line - positive | below the line - negative
77
what is the sum of all residuals
0
78
why would we square the residuals
so they are all positive
79
what does the sum of squared residuals show
how well the line fits the points
80
would a good line have a low or high sum of squared residuals
low. the line with the lowest possible sum of squared residuals is called the least squares regression line of y on x
81
if you want to calculate the y values from the x values how would you plot the line of best fit
vertical residuals to be as small as possible.
82
if you want to calculate the x values from the y values how would you plot the line of best fit
horizontal residuals to be as small as possible.
83
what is the line called that has the lowest possible sum of squared residuals
the least squares regression line of y on x
84
what are the two separate least squares regression lines
one for y on x | one for x on y
85
what extra 3 columns should you have if you're calculating regression and correlation
x squared | y squared | xy
86
what are the sections of the graph called
quadrants
87
positive correlation to the quadrants
in the 1st and 3rd (top right and bottom left)
88
negative correlation to the quadrants
in the 2nd and 4th quadrant (top left and bottom right)
89
product moment correlation coefficient
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
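A sketch of this formula in code, built from the column totals of x, y, x squared, y squared and xy. The deck does not define Sxx, Syy and Sxy, so the usual definitions are assumed here (for example Sxx = sum of x squared minus (sum of x) squared over n). The function name `pmcc` and the data are made up.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)               # x squared column
    sum_y2 = sum(y * y for y in ys)               # y squared column
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # xy column
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: hours revised against test score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pmcc(hours, scores)

# Multiplying every x value by 10 leaves r unchanged.
r_scaled = pmcc([10 * x for x in hours], scores)
print(round(r, 3), round(r_scaled, 3))
```

For this data Sxx = 10, Syy = 6 and Sxy = 6, so r is 6 divided by the square root of 60, about 0.775.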
90
what is a in stats
gradient
91
what is b in stats
y intercept
92
gradient of y on x line
$a = \dfrac{S_{xy}}{S_{xx}}$
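A sketch of the least squares regression line of y on x, written as y = ax + b to match the deck. The intercept formula b = (mean of y) minus a times (mean of x) is the standard one and is assumed here; the function name and data are illustrative.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = s_xy / s_xx            # gradient
    b = mean_y - a * mean_x    # y intercept: line passes through the mean point
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)

# Residuals: vertical distance of each point from the line.
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
print(round(a, 3), round(b, 3), round(sum(residuals), 9))
```

Here a is 0.6 and b is 2.2, and the residuals add up to 0 apart from rounding error in the arithmetic.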
93
what is the product moment correlation coefficient
r
94
if measurements multiplied by 10 what effect would that have on the correlation
no effect
95
y = ax + b | whats a
gradient
96
y = ax + b | whats b
y intercept
97
interpolation
within data range
98
extrapolation
outside data range
99
r squared or variance is used to
show how close the points are to the line. squaring removes the sign, so you lose knowledge of whether the data is trending up or down.
100
small sd or variance means
the data is all close together
101
high sd and variance means
the data is spread out
102
if the values when calculating sd were all multiplied by 10 what would happen to sd
it would also be multiplied by 10
103
if the values when calculating variance were all multiplied by 10, what would happen to variance
it would get multiplied by 10^2
104
if the values when calculating mean were all multiplied by 10, what would happen to the mean
it would also be multiplied by 10
105
if the values when calculating r/correlation were all multiplied by 10, what would happen to the r/correlation
there would be no change
106
if 10 was added to all the values when calculating sd, what would happen
there would be no change
107
if 10 was added to all the values when calculating variance, what would happen
there would be no change
108
if 10 was added to all the values when calculating the mean, what would happen
10 would be added to the mean
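The effects of multiplying by 10 and adding 10 can be checked on a small data set. This is a sketch using the population formulas from Python's `statistics` module; the data values are made up.

```python
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = [10 * x for x in data]    # every value multiplied by 10
shifted = [x + 10 for x in data]   # 10 added to every value

for label, values in (("original", data), ("x 10", scaled), ("+ 10", shifted)):
    print(label, mean(values), pstdev(values), pvariance(values))
```

The original data has mean 5, sd 2 and variance 4. Multiplying by 10 gives mean 50, sd 20 and variance 400. Adding 10 gives mean 15 and leaves sd and variance unchanged.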
109
what is variance essentially
standard deviation squared
110
which type of standard deviation is the notation of sigma used for
population
111
which type of standard deviation is the notation of Sx used for
sample
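A sketch of the difference between the two standard deviations, assuming the usual convention that the population version divides by n and the sample version divides by n - 1. Python's `statistics.pstdev` and `statistics.stdev` follow that convention; the data is made up.

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(data)  # sigma: divides by n
sample_sd = stdev(data)       # Sx: divides by n - 1

print(round(population_sd, 3), round(sample_sd, 3))
```

The sample value is always slightly larger than the population value for the same data.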
112
to find x on y, give the order of the columns that you would enter into the calculator
y column and then the x column
113
to find y on x, give the order of the columns that you would enter into the calculator
x column and then the y column
114
when you are calculating x on y, what value are you finding
x
115
when you are calculating y on x, what value are you finding
y
116
what is the mean point
the point (x bar, y bar). the regression line of x on y and the regression line of y on x both pass through it.
117
what is relative frequency
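the proportion of trials in which an outcome occurred (frequency divided by the number of trials). it estimates the probability and can be given as a decimal or a percentage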
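A sketch of relative frequency as an experimental estimate of probability, taking it as frequency divided by the number of trials. The simulated fair coin, the seed and the number of trials are assumptions for illustration.

```python
import random

rng = random.Random(42)  # fixed seed only so the run is repeatable
trials = 10_000

# Count how many simulated coin flips come up heads.
heads = sum(1 for _ in range(trials) if rng.random() < 0.5)

relative_frequency = heads / trials
print(relative_frequency, f"{relative_frequency * 100:.1f}%")
```

With many trials the relative frequency settles close to the theoretical probability of 0.5.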
118
0 on the probability scale is
never
119
1 on the probability scale is
absolutely certain
120
when might you use r squared
to plot a curve
121
probability P(A) =
$\dfrac{n(A)}{n(U)}$
122
complementary events are represented by an
apostrophe
123
when do you multiply probabilities
when they are independent events
124
when do you add probabilities
when they are mutually exclusive
125
what are independent events
when one event does not affect the other
126
what is relative frequency
the number of times an outcome occurred divided by the number of trials. multiply by 100 to give it as a percentage
127
do you multiply or add AND
multiply
128
do you multiply or add OR
add
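A sketch that checks both rules by listing every outcome of two fair dice. The dice example and the helper name `probability` are assumptions for illustration. Multiplying works here because the two dice are independent, and adding works because a total of 2 and a total of 3 are mutually exclusive.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))


def probability(predicate):
    """Fraction of outcomes for which the predicate is true."""
    favourable = sum(1 for outcome in outcomes if predicate(outcome))
    return Fraction(favourable, len(outcomes))


# AND with independent events: multiply.
p_first_six = probability(lambda o: o[0] == 6)
p_second_six = probability(lambda o: o[1] == 6)
p_both_six = probability(lambda o: o[0] == 6 and o[1] == 6)

# OR with mutually exclusive events: add.
p_total_2 = probability(lambda o: sum(o) == 2)
p_total_3 = probability(lambda o: sum(o) == 3)
p_total_2_or_3 = probability(lambda o: sum(o) in (2, 3))

print(p_both_six, p_total_2_or_3)
```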
129
what are mutually exclusive events
when the events cannot both happen at the same time. there is no intersection
130
combined events | P(A∪B) =
P(A) + P(B) - P(A∩B)
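A sketch that checks the combined events formula with sets, using P(A) = n(A) / n(U). The fair die and the two events are made up for illustration.

```python
from fractions import Fraction

# Universal set U: the outcomes of one fair die.
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event):
    """P(A) = n(A) / n(U) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}            # event A
greater_than_3 = {4, 5, 6}  # event B

p_union = probability(even | greater_than_3)
p_formula = (
    probability(even)
    + probability(greater_than_3)
    - probability(even & greater_than_3)
)
print(p_union, p_formula)
```

The intersection {4, 6} is subtracted once because adding P(A) and P(B) counts it twice.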
131
what is a random variable
an outcome of a random experiment which can be represented as a number
132
what is a probability distribution
a table showing all the possible outcomes and their probabilities.
133
probabilities add up to
1
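A sketch of a probability distribution for a random variable, taking the total shown by two fair dice as an assumed example. The table lists each possible total with its probability, and the probabilities add up to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random variable: the total of two fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution: each possible total and its probability.
distribution = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

for total, p in distribution.items():
    print(total, p)

print("sum of probabilities:", sum(distribution.values()))
```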