Stats Flashcards

(75 cards)

1
Q

What does qualitative data mean

A

non-numerical data e.g. hair colour

2
Q

What does quantitative data mean

A

numerical data e.g. number of children

3
Q

discrete data meaning

A

data that can be counted e.g. number of children

4
Q

continuous data meaning

A

data that can be measured e.g. height

5
Q

how do you find the class width of a set of data that uses short hand e.g.
length | 1-20 | 21-30 |…

A

rewrite each class using its boundaries, in the form ‘5 < x < 9’
e.g. 1 - 20 -> 0.5 < x < 20.5
then subtract the lower boundary from the upper one: 20.5 - 0.5 = 20
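
The same calculation as a minimal Python sketch. The function name `class_width` and the `gap` parameter are made up for illustration; `gap` assumes the data is recorded to the nearest whole number, so each boundary sits half a unit beyond the written limit.

```python
def class_width(lower, upper, gap=1):
    """Width of a class written in shorthand, e.g. 1-20.

    gap is the assumed distance between the end of one class and the
    start of the next (1 when data is recorded to the nearest whole number).
    """
    lower_boundary = lower - gap / 2   # 1  -> 0.5
    upper_boundary = upper + gap / 2   # 20 -> 20.5
    return upper_boundary - lower_boundary


width = class_width(1, 20)   # 20.5 - 0.5
```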

6
Q

what is descriptive stats?

A

stats that collect, organise and summarise data

7
Q

what is inferential stats

A

stats where sample data is analysed so that conclusions can be inferred about the population

8
Q

what is a population

A

the whole set of items that are of interest

9
Q

what is a sample

A

a selection of items taken as a subset from the population

10
Q

what is a parameter + example

A

a numerical characteristic of a population, e.g. the population mean

11
Q

what is a stat?

A

a numerical characteristic of a sample that can help to estimate a parameter

12
Q

what tool can you use to remember what a stat and a parameter are used for

A

stat begins with ‘s’ -> sample
parameter begins with ‘p’ -> population

13
Q

what is a census

A

a study that observes and measures every item within the population

14
Q

what is an adv and a dis.adv of a census

A

adv -
representative of the whole pop

d.adv -
expensive, time-consuming, sometimes impossible

15
Q

what can the size of a sample affect

A

the validity of any conclusions made. The larger and more varied the sample, the more accurate the results

16
Q

what is a sampling frame?

A

a list with all items of the population individually named

17
Q

what is a sampling unit

A

an individual unit of the population

18
Q

how do you carry out a simple random sample

A
  1. form a sampling frame
  2. allocate each item a unique number
  3. generate a random number, e.g. using a calculator, as many times as needed for your sample size (if you need a sample size of 30, generate 30 random numbers, ignoring any repeats)
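
The three steps above can be sketched in Python. This is only an illustration: the function name, the `seed` parameter and the made-up frame of 100 students are assumptions, and `random.Random.sample` is used so that no number is picked twice (the same effect as regenerating on a repeat).

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size items from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    # Step 2: give every item in the frame its own number, 1..N.
    numbered = dict(enumerate(sampling_frame, start=1))
    # Step 3: draw sample_size different random numbers from 1..N.
    chosen = rng.sample(range(1, len(numbered) + 1), sample_size)
    return [numbered[number] for number in chosen]


# Step 1: a made-up sampling frame of 100 named items.
frame = [f"student {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 30, seed=1)
```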
19
Q

how do you carry out systematic sampling

A
  1. form a sampling frame
  2. allocate each item a unique number
  3. using a calc, generate a random number within your population size (this is your starting unit)
  4. calculate the integer component (population size/ sample size = x)
  5. select every xth item after the first to be included in the sample
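
A rough Python sketch of these steps. The function name and the made-up frame are hypothetical, and one detail is an assumption not stated on the card: if counting on from the random start reaches the end of the list, the sketch wraps around to the beginning.

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Take every x-th item, starting from a randomly chosen unit."""
    rng = random.Random(seed)
    population_size = len(sampling_frame)
    step = population_size // sample_size      # integer component x
    start = rng.randrange(population_size)     # random starting unit (0-based index)
    # Assumption: wrap around to the start of the list if the end is reached.
    return [sampling_frame[(start + i * step) % population_size]
            for i in range(sample_size)]


frame = list(range(1, 101))                    # items numbered 1..100
sample = systematic_sample(frame, 10, seed=3)  # x = 100 // 10 = 10
```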
20
Q

how do you carry out stratified sampling

A
  1. divide the data into groups (e.g. year groups, age, sex)
  2. calculate the sample size for each stratum (xi) -> xi = (sample size/population size) x stratum size
  3. make sure the sum of all the xi’s equals the sample size (you may have to round if appropriate)
  4. conduct a simple random sample for each stratum
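
Step 2 can be checked with a short Python sketch. The function name and the year-group sizes are invented for illustration. Python's `round` is used for the rounding, so in awkward cases the total may still need adjusting by hand, as step 3 says.

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """xi = (sample size / population size) x stratum size, rounded."""
    population_size = sum(strata_sizes.values())
    return {name: round(sample_size * size / population_size)
            for name, size in strata_sizes.items()}


# Made-up school of 300 students, sample of 30.
year_groups = {"Year 7": 120, "Year 8": 100, "Year 9": 80}
sizes = stratum_sample_sizes(year_groups, 30)
total = sum(sizes.values())   # step 3: should equal the sample size
```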
21
Q

adv and dis adv of
- simple random

A

adv:
- everyone has an equal chance of being selected - removes bias
- easy to conduct

d.adv:
- time consuming

22
Q

adv and dis adv of
- systematic

A

adv:
- covers a wider study area
- less likely to introduce bias because the starting point is randomly generated

d.adv:
- need a randomly generated starting point
- need to know the total population size for it to work

23
Q

adv and dis adv of
- stratified

A

adv:
- each group receives representation within the sample, as it is proportional to the group size = increased accuracy

d.adv:
- not all members of the pop may belong to a specific group

24
Q

what are the two non-random techniques

A

quota sampling
convenience/opportunity sampling

25
what is quota sampling
1. the population is split into groups (year, age, race)
2. individuals are chosen who best fit the requirements
26
what is convenience sampling
a sample is taken from people who are available for the study (e.g. the first 20 people I see)
27
adv and dis adv of quota sampling
adv:
- representative
d.adv:
- non-random sampling may introduce bias
28
adv and dis adv of convenience/ opportunity sampling
adv:
- quick -> no need for a sampling frame
d.adv:
- not representative -> the first people seen may not effectively represent the whole pop
29
which are affected by extreme values and which aren't: mean, median, and mode
mode is not, median is not, mean is
30
how do you use your GDC to find mean, median, mode, etc
stats -> enter data -> F2 (calc) -> F6 (SET) -> make sure ‘List1’ is the first line and ‘List2’ is the second -> EXIT -> F1 (1-VAR)
x̄ = mean
n = sample size
Med = median
Mod = mode
31
what is the formula for the integer component
population size / sample size = P/n
32
what is the formula to find the sample size for strata
sample size for strata = (sample size / population size) x strata size
xi = (n / P) x si
33
what is the formula for the mean from a raw list of data
x̄ = (sum of the items) / (number of items)
34
what is the formula for the mean from grouped discrete distribution
x̄ = (sum of each item x its frequency) / (sum of the frequencies)
35
what is the formula for the mean from grouped continuous distribution
x̄ = (sum of the midpoint of each class x its frequency) / (sum of the frequencies)
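
The two grouped-mean formulas differ only in what gets multiplied by the frequency (the item itself, or the class midpoint). A small Python sketch with invented data; the function names are hypothetical.

```python
def grouped_mean(values, frequencies):
    """Mean = sum of (value x frequency) / sum of the frequencies."""
    total = sum(v * f for v, f in zip(values, frequencies))
    return total / sum(frequencies)


def midpoints(classes):
    """Midpoint of each (lower, upper) class, used for continuous data."""
    return [(lower + upper) / 2 for lower, upper in classes]


# Made-up discrete data: number of children, and how many families have that many.
children = [0, 1, 2, 3]
families = [4, 7, 6, 3]
discrete_mean = grouped_mean(children, families)                  # 28 / 20

# Made-up continuous data: classes 0-10, 10-20, 20-30 with their frequencies.
classes = [(0, 10), (10, 20), (20, 30)]
class_freqs = [2, 5, 3]
continuous_mean = grouped_mean(midpoints(classes), class_freqs)   # 160 / 10
```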
36
what is the formula for the median from very large or grouped distribution
the median is the (n/2)th item (n = sum of the frequencies)
36
what is the formula for the median of a raw list of data
(n+1)/2 = the position of the median item in the ordered list
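
A minimal Python sketch of the (n+1)/2 rule for a raw list. The function name is made up, and averaging the two middle items when the position ends in .5 is the usual convention, assumed here.

```python
def median(data):
    """Median of a raw list using the (n + 1) / 2 position rule."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2                 # 1-based position of the median item
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: average the items either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_case = median([3, 1, 2])       # position 2 -> second item of 1, 2, 3
even_case = median([4, 1, 3, 2])   # position 2.5 -> halfway between 2 and 3
```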
37
what is a quartile
a data point that lies 1/4, 1/2, or 3/4 of the way through the ordered data
38
define range
the difference between the largest and smallest values -> easily affected by extreme values
39
define interquartile range
the difference between the upper and lower quartiles. shows the spread of the central 50% of data so is unaffected by outliers
40
define standard deviation
a measure of how spread out the data is around the mean
41
what is the formula for variance
variance = (standard deviation)^2
42
if you add a constant (c) to every data value, how will the mean and standard deviation be affected
new mean = x̄ + c
no change to standard deviation
43
if you multiply every data value by a constant (b), how will the mean and standard deviation be affected
new mean = bx̄
new standard deviation = |b|σ (bσ when b is positive)
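
Both rules can be checked numerically. A short sketch using Python's `statistics` module on an invented data set; `pstdev` (population standard deviation) is an assumption about which standard deviation is meant, but the rules hold for either kind.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: mean 5, standard deviation 2
c, b = 10, 3                      # constant to add, constant to multiply by


def mean_and_sd(values):
    return statistics.mean(values), statistics.pstdev(values)


original = mean_and_sd(data)
shifted = mean_and_sd([x + c for x in data])   # mean goes up by c, sd unchanged
scaled = mean_and_sd([b * x for x in data])    # mean and sd both multiplied by b
```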
44
what is relative frequency?
the proportion of the total frequency that lies within a class
45
what is the formula for relative frequency
class frequency / sample size
this will be a decimal between 0 and 1
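
As a quick illustration in Python, with invented height classes (the function name is hypothetical):

```python
def relative_frequencies(class_frequencies):
    """Relative frequency of each class = class frequency / sample size."""
    sample_size = sum(class_frequencies.values())
    return {label: freq / sample_size
            for label, freq in class_frequencies.items()}


# Made-up heights (cm) for a sample of 20 people.
heights = {"150-160": 5, "160-170": 10, "170-180": 5}
rel = relative_frequencies(heights)
```

Each value is between 0 and 1, and together they add up to 1.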
46
what does a bell-shaped/ symmetrical distribution suggest
that data is grouped around the mean or median
47
what does a left/negatively skewed distribution suggest
data is largely grouped to the top of the range. median is usually higher than mean
48
what does a right/positively skewed distribution suggest
the data is largely grouped towards the bottom of the range. median is usually lower than mean
49
what does a left skewed histogram look like
the highest bars are found on the right hand side
50
what does a right skewed histogram look like
the highest bars are found on the left hand side
51
what does unimodal mean
having one peak
52
what does bimodal mean
having two peaks
53
what does uniform mean
where the frequency for all classes is approximately equal
54
how do you graph a histogram on the GDC
STATS -> enter data -> GRAPH -> SET -> GRAPH 2 = HIST (F6 -> F1) -> set width and starting point -> GRAPH
55
what values do you use when drawing a cumulative frequency graph
use the lower bound value of the first coordinate i.e. if 1.40
56
if a question asks you to find the 80th percentile on a cumulative frequency graph, and it goes up to 160, what do you do?
80/100 x 160 = 128, then read off the data value at a cumulative frequency of 128
57
what are the 3 equations to calculate outliers
any value which is:
- greater than the upper quartile + 1.5 x interquartile range
- less than the lower quartile - 1.5 x interquartile range
- outside the mean + or - 2 x standard deviation
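
The three rules can be sketched in Python. The function names and data are invented. Quartile conventions differ between textbooks, calculators and software, so `statistics.quantiles` (default method) is only one possible choice and may not match a GDC exactly. `pstdev` is likewise an assumption about which standard deviation is meant.

```python
import statistics


def outliers_by_iqr(data):
    """Values beyond Q1 - 1.5 x IQR or Q3 + 1.5 x IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


def outliers_by_sd(data):
    """Values more than 2 standard deviations away from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > 2 * sd]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # made-up data with one extreme value
iqr_outliers = outliers_by_iqr(data)
sd_outliers = outliers_by_sd(data)
```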
58
what does the box in a box plot show
the middle 50% of data
59
what do the whiskers show
the minimum and maximum values
60
What is bivariate data
data that tracks two characteristics of a population (x and y), called variables
61
what is an explanatory or independent variable
a variable set and controlled by the observer in a study
62
what is the response or dependent variable
a variable recorded to measure the outcome of a study
63
what does association mean
when the explanatory and response variables demonstrate a relationship
64
what does causation mean
when a change in the explanatory/independent variable is the reason for the change in the response variable
65
what is correlation
how well two variables are related
66
what do points closest to a best-fit line mean
closer the points = stronger the correlation
67
what measure can we use to find a numerical value to represent linear correlation
Pearson’s Product Moment Correlation Coefficient (PMCC)
68
what will the value of the PMCC (r) always be between
-1 ≤ r ≤ 1
69
if the value of the PMCC is closer to -1 or 1, is it stronger or weaker?
stronger
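
For reference, a Python sketch of how r could be computed by hand. The cards do not give the formula; the standard one, r = Sxy / sqrt(Sxx x Syy), is assumed here, and the function name and data are invented.

```python
import math


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


perfect_positive = pmcc([1, 2, 3, 4], [2, 4, 6, 8])   # points lie exactly on a rising line
perfect_negative = pmcc([1, 2, 3, 4], [8, 6, 4, 2])   # points lie exactly on a falling line
scattered = pmcc([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])    # weaker, but still positive
```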
70
what does y on x mean and x on y mean
y on x => y = ax + b
x on y => x = cy + d
71
how do you switch from y on x, to x on y
SET -> swap the 3rd and 4th list around
72
what is interpolation
making an estimate of the response variable for a given explanatory variable (estimating y, given the x variable), WITHIN the range of data
73
what is extrapolation
making an estimate of the response variable for a given explanatory variable (estimating y, given the x variable), OUTSIDE the range of data
74
is interpolation or extrapolation more reliable and why
interpolation is more reliable, because extrapolation estimates values outside of the given data, so there is no guarantee that the linear trend will continue
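
A Python sketch tying the last few cards together: fit a y on x line, estimate y for a new x, and label the estimate as interpolation or extrapolation. The least-squares formulas (a = Sxy / Sxx, b = mean of y - a x mean of x) are the standard ones and are not given on the cards. The function names and data are made up.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line y = ax + b for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimate_y(xs, ys, x_new):
    """Estimate y at x_new and say which kind of estimate it is."""
    a, b = fit_y_on_x(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return a * x_new + b, kind


xs = [1, 2, 3, 4, 5]     # made-up data lying on y = 2x + 1
ys = [3, 5, 7, 9, 11]
inside_estimate = estimate_y(xs, ys, 2.5)    # x within 1..5
outside_estimate = estimate_y(xs, ys, 10)    # x beyond the data, less reliable
```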