Vocab Flashcards

Chapter 1 - Collection of data. (108 cards)

1
Q

Population

A

Everyone/everything involved in an investigation.

2
Q

Census

A

An investigation with data taken from every member of a population.

3
Q

Sample

A

An investigation with data taken from only part of the population.

4
Q

Bias

A

Anything that distorts the data.

5
Q

Strata

A

Subgroups/subcategories within a population (singular: stratum).

6
Q

Sampling frame

A

A list of all the items/people forming a population.

7
Q

Sampling unit

A

One item from a sampling frame.

8
Q

Observation

A

You record something happening.

9
Q

Experiment

A

You record data from something you make happen.

10
Q

Qualitative data

A

Describes certain qualities.

11
Q

Quantitative data

A

Describes certain quantities, can be discrete or continuous.

12
Q

Continuous data.

A

Data we can measure.

13
Q

Discrete data.

A

Data we can count.

14
Q

Primary data.

A

Collected by the user.

15
Q

Secondary data.

A

You obtain the data from somebody else.

16
Q

Questionnaire.

A

A set of questions, completed by respondents, used to obtain data. Can be anonymous.

17
Q

Interview/Survey.

A

Data collection methods. Ask people their opinions, can be anonymous.

18
Q

Pilot survey.

A

Testing a questionnaire on a small group of people first.

-identifies likely responses
-checks response rate
-see if questions are understood
-checks how long it will take

-unexpected outcomes(refine hypothesis/change something)
-problems easier and less costly to fix before full study
-check methods of distribution/collection work
-estimate time/costs of full study

19
Q

Open questions.

A

No suggested answers. Differently worded answers can make data analysis difficult.

20
Q

Closed questions.

A

Suggested answers to choose from. Includes opinion scales, where people tend to answer in the middle as they do not wish to appear extreme.

21
Q

Capture recapture

A

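A method for estimating the size of a population: capture and mark a sample, release it, then capture a second sample and count how many are marked.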
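
The estimate can be sketched as below. This is a minimal sketch assuming the standard capture-recapture proportion (marked / population = marked in second sample / second sample size); the function name and the numbers are illustrative and not taken from the cards.

```python
def estimate_population(marked_first, second_sample, marked_in_second):
    """Estimate population size N from M/N = m/n, so N = M * n / m."""
    if marked_in_second == 0:
        raise ValueError("need at least one marked item in the second sample")
    return marked_first * second_sample / marked_in_second


# Illustrative numbers: 40 fish marked, 50 caught later, 8 of those are marked.
print(estimate_population(40, 50, 8))  # 250.0
```

The estimate relies on the marked items mixing back into the population before the second capture.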

22
Q

Judgement sampling.

A

Use judgement to select a sample representative of the population.

23
Q

Opportunity sampling.

A

Use available people/objects at the time.

24
Q

Systematic sampling.

A

Choose a starting point from your sampling frame at random, then choose items at regular intervals. (e.g. for an interval of 32, use a random number generator to pick a number in the first 32, then go up the sampling frame in intervals of 32, asking every person selected.)
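
A minimal sketch of the procedure, assuming a hypothetical sampling frame of 320 numbered people and the interval of 32 from the example; the function name `systematic_sample` is made up for illustration.

```python
import random


def systematic_sample(frame, interval, rng=None):
    """Pick a random start within the first `interval` items, then take every `interval`-th item."""
    if rng is None:
        rng = random.Random()
    start = rng.randrange(interval)      # index 0 .. interval - 1
    return frame[start::interval]


frame = list(range(1, 321))              # hypothetical frame: people numbered 1 to 320
sample = systematic_sample(frame, 32, random.Random(1))
print(len(sample))                       # 10
```

Because 320 / 32 = 10, the sample always has 10 people whichever starting point is chosen.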

25
Random sampling.
Everyone in the population has an equal chance of being selected (unbiased).
26
Quota sampling.
Group by characteristics, and interview a number from each group
27
Cluster sampling.
The population naturally splits into groups (clusters). The list of clusters is the sampling frame. Randomly select clusters to form the sample.
28
Stratified sampling.
The number of people asked from each stratum is proportional to the stratum's size, so larger strata contribute more people. (e.g. 60/1000 x 250 = 15 Year 7s in sample.)
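
One reading of the example, sketched in Python. The assumption here (not stated on the card) is that the population is 1000, of whom 250 are in Year 7, and the sample size is 60; the other year-group sizes are made up for illustration.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum = stratum size / population size * sample size."""
    total = sum(strata_sizes.values())
    # round() is used because you cannot sample a fraction of a person;
    # with awkward numbers the rounded values may not add up exactly to sample_size.
    return {name: round(size * sample_size / total) for name, size in strata_sizes.items()}


# Hypothetical school of 1000 students (only the Year 7 figure matches the card's example).
school = {"Year 7": 250, "Year 8": 350, "Year 9": 400}
print(stratified_allocation(school, 60))  # {'Year 7': 15, 'Year 8': 21, 'Year 9': 24}
```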
29
Random response method.
For sensitive questions which people are likely to answer dishonestly. (e.g. each person flips a coin: if heads, tick yes; if tails, answer honestly.)
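
A sketch of how the honest proportion could be estimated from the results, assuming a fair coin so that about half of the respondents tick yes automatically. The function name and the figures are illustrative.

```python
def estimate_true_yes_proportion(total_responses, yes_responses):
    """Remove the expected 'forced' yes answers (heads) and look at the rest (tails)."""
    forced_yes = total_responses / 2           # expected number of heads
    honest = total_responses - forced_yes      # expected number of tails (honest answers)
    return (yes_responses - forced_yes) / honest


# Illustrative figures: 200 people asked, 130 ticked yes.
print(estimate_true_yes_proportion(200, 130))  # 0.3
```

Here about 100 of the 130 yes answers are expected to come from heads, leaving 30 honest yes answers out of roughly 100 honest responses.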
30
Primary data advantages
-gather data that directly relates to your hypothesis
-you know its reliability
31
primary data disadvantages
-expensive
-time consuming
-can be difficult/impossible to collect
32
secondary data advantages
-easier to get hold of
-can gather data quickly and cheaply
-large data sets
33
secondary data disadvantages
-may be in the wrong format/rounded
-difficult to find data that matches your hypothesis exactly (out of date, no relevant data available)
-don't know its accuracy; may be biased or unreliable
34
census pros
-representative of the entire population
-unbiased
35
census cons
-hard/impossible for big populations
-expensive
-impractical
-might be tricky to define the entire population or access all its members
-not an option when items are used up/damaged by the investigation
36
sample pros
-quicker
-cheaper
-more practical than a census
37
sample cons
-less accurate
-may not be fully representative
-may be biased
-variability between samples
38
random sampling pros
-unbiased
-should be representative
39
random sampling cons
-not always practical/convenient (if the population is spread over a large area, travel is needed)
-may be impossible to list the entire population or access everyone
40
stratified sampling pros
-likely gives a representative sample if you have easy to define categories (e.g. gender)
-can compare results from different groups
41
stratified sample cons
-not useful when there are no obvious categories or they are hard to define
-can be expensive
42
systematic sampling pros
-unbiased sample
-can be done by machine
43
systematic sample cons
nth item might coincide with a pattern (e.g. fault) so biased
44
cluster sampling pros
convenient (saves travel time when pop. spread over large area)
45
cluster sampling cons
biased if similar clusters sampled, e.g. with similar incomes per region.
46
quota sampling pros
-quick
-representation of all different groups (genders etc.)
-can be done with no sampling frame
-a member is easily replaced by one with the same characteristics
47
quota sampling cons
-biased: interviewer bias
-refusal to take part (those who refuse might have similar views)
-not all may have an equal chance of being selected
48
opportunity sampling pros
convenient
49
opportunity sample cons
-not representative of the population
-very biased
-selecting at a particular time and place, so not all students have an equal chance of being selected
50
judgement sampling pros
-quick
-sometimes may be the only suitable method to use
51
judgement sampling cons
-researcher bias
-researcher may be unreliable (though should have good knowledge of the population)
-not random
-very biased
52
categorical scale
gives names or numbers to classes of qualitative data so it can be more easily processed. (numbers don't have meaning).
53
ordinal scale
(rank scale) gives numbers to the classes of data which can be ordered in a meaningful way.
54
multivariate data
made up of two or more variables
55
bivariate data
data made up of two variables (numerical)
56
questionnaire pros
-quick and cheap
-well written ones shouldn't be biased
-respondents aren't under pressure, so their answers are likely truthful
-can distribute to large numbers of people
57
questionnaire cons
-distribution can lead to bias
-non-responses, particularly on sensitive questions (discard, but this might remove certain parts of the population)
-questions might not be understood by the respondent
58
methods to distribute questionnaires (pros and cons)
-hand it out: the target population gets it, but time consuming
-put it online: data recorded and collected easily, but people without internet access are excluded
-post/email: wide reaching, but not sure who is responding
-ask people to collect it: easy, but people with strong views are more likely to take one
59
interview pros
-ask more complex questions
-can explain questions if someone doesn't understand, and ask follow up questions
-higher response rate
-you know the right person answered the questions
60
interview cons
-time consuming: one person at a time
-expensive: employ interviewers, and travel if the sample is geographically spread out
-more likely to lie if questions are sensitive, as they may be embarrassed
-answers could be recorded in a biased way (accidental if untrained, deliberate if strong views)
61
statistical enquiry cycle
1. planning (hypothesis, what data and how to use it)
2. collecting data (primary/secondary, constraints)
3. processing and presenting data (diagrams/measures, technology)
4. interpreting results (plan analysis, conclusions, predict)
5. communicating results clearly and evaluating methods (aware of target audience, clear visual representation of results)
62
collecting data
-primary data by experiment: reliable, as data is recorded accurately/fairly
-secondary data from a website: more reliable in some cases, e.g. for sensitive topics (income, money spent, weight, age)
63
processing and presenting
Distribution?
-averages
-measures of spread
-box plots
-(pie charts)
-(histograms)
-(bar graphs)
Correlation
-scatter graph
-line of best fit
-SRCC
-PMCC
Over time
-time series graph
64
Interpreting data
-compare averages or find correlation
-do the results prove/disprove the hypothesis?
-do I need to repeat to find more results? (c+e)
65
Closed vs open questions
Closed questions have a fixed number of possible answers whereas open questions can be answered in any way.
66
Questionnaire questions, think: (SABCURL)
-Is it understandable and clear?
-Is it relevant?
-Is it leading?
-Is it biased?
-Is it ambiguous?
-Is it sensitive?
67
How can we reduce the problem of non-responses?
-Follow up people who did not respond
-Provide an incentive for people to answer (prize)
-Use clear questions that are easy to answer
68
Remember to:
-Answer the question in a statement
-Look at how many parts there are to the question and how many marks
69
How to use technology
Can use technology to...
-order data (e.g. by age)
-identify missing data
-remove irrelevant columns/data
-remove extraneous symbols
-remove outliers
-automate the calculation of summary statistics (using a computer), e.g. mean point, line of best fit
-set up a computer to visually represent data
70
Advantages of using technology
-can reduce human error
-uses all data so unbiased
-more visually appealing
-saves time
71
constraints when planning an investigation:
-time: under pressure?
-costs: budget? minimise spending? longer investigation = more expensive; costs of travel and equipment
-ethical issues: no harm/distress
-confidentiality: sensitive information, e.g. income? could be hard to get accurate data, as people may lie or refuse to answer
-convenience: hypothesis could be difficult/impossible to test; think about the most convenient way to access the data you need
72
observation
involves counting or measuring
73
reference sources
secondary sources of information:
-acknowledge the source
-consider reliability (biased?)
-out of date? wrong format? data incomplete/missing?
74
explanatory variable
the variable you are in control of/the variable that has an effect on the other variable
75
response variable
the variable you measure, which changes as a result of changing the explanatory variable.
76
when considering a lab, field, or natural experiment, think:
how far can I control the explanatory variable?
77
How can we clean raw data?
-Remove outliers
-Put data in the same format
-Remove extraneous symbols
-Identify missing values
-Remove irrelevant columns
78
Why would we repeat a simulation/experiment a number of times?
-Find the mean average
-Compare results/see patterns
-Spot anomalous results
-Results will vary
79
Steps for a simulation
-Choose a suitable method for getting random numbers
-Assign numbers to the data
-Generate random numbers
-Match the random numbers to the data
-Count how many trials (e.g. rolls) it took
-Repeat a number of times and find the mean average
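
The steps above can be sketched in Python. The scenario is an assumption chosen for illustration (how many rolls of a fair die it takes to see all six faces), and the function name is hypothetical; Python's `random` module stands in for dice or random number tables.

```python
import random
from statistics import mean


def trials_to_collect_all(n_items, rng):
    """Count the random draws needed until every item number 1..n_items has appeared."""
    seen = set()
    draws = 0
    while len(seen) < n_items:
        seen.add(rng.randint(1, n_items))   # generate a random number and match it to an item
        draws += 1
    return draws


rng = random.Random(0)                       # seeded so the run is repeatable
results = [trials_to_collect_all(6, rng) for _ in range(1000)]
print(mean(results))                         # the theoretical mean is 14.7; a simulation will vary around it
```

Repeating the simulation many times matters because a single run can be far from the mean.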
80
Frequency polygon
Use midpoints
81
Cumulative frequency chart
Use endpoints/the highest value.
82
Why would you expect a smaller sample to have a greater standard deviation?
More variation between samples.
83
Why may it be appropriate to remove outliers?
-May be an error in data
-Doesn't fit trend
84
What should you look for in tables?
Patterns in the data e.g. is distribution symmetric?
85
why might the mean be appropriate?
-takes into account all the data
-can be used to calculate standard deviation
86
why might the mean not be appropriate?
may be significantly affected by extreme values or outliers
87
why might the median be appropriate?
-useful when data is skewed or contains outliers, as it is not distorted by extreme values
-easy to find in ordered data
-can be used alongside range and IQR
88
why might the median not be appropriate?
-isn't always a data value
-not always a good representation of the data
89
why might the mode be appropriate?
-always a data value
-can be used with non-numerical data
-easy to find in tallied data
90
why might the mode not be appropriate?
-doesn't always exist
-may be more than one
-may be a misleading value far from the mean
-may not be a good representation of the data
91
What does PMCC tell you?
It measures how close the points on a scatter diagram are to a straight line (how linear the correlation is)
92
What does SRCC tell you?
It measures correlation between ranks. (this can be strong even if the data values themselves have a non-linear relationship so SRCC can detect both a linear and non-linear association).
93
How will SRCC and PMCC compare if there's a non-linear association between two variables?
Both will be positive or negative but SRCC will be stronger (closer to 1 or -1).
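
A small check of this comparison, as a sketch only. It assumes made-up data where y = x cubed (non-linear but always increasing) and uses the no-ties formula for SRCC; the helper names are illustrative.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def srcc(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n(n^2 - 1)), valid with no ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]    # non-linear, but y always increases with x
print(srcc(xs, ys))          # 1.0
print(pmcc(xs, ys) < 1)      # True
```

The ranks agree perfectly, so SRCC is 1, while the points do not lie on a straight line, so PMCC is below 1.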
94
If the mean is lower than the median then...
at least 50% of data values must be above the mean.
95
If the mean is higher than the median then...
at least 50% of data values must be below the mean.
96
Why should a control group be used?
Allows for comparisons (between control and test group).
97
how could matched pairs be used? (2)
Will aim to pair people based on similar characteristics (e.g. age, gender) and place one in each group.
98
What can you do when given a pie chart?(or comparative pie charts)
Measure the radius! With a ruler!
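
Why the radius matters, as a sketch: in comparative pie charts the areas are proportional to the total frequencies, so the radii are in the ratio of the square roots of the totals. The function name and the numbers below are illustrative, not from the cards.

```python
from math import sqrt


def second_radius(r1, total1, total2):
    """Areas proportional to totals, so r2 = r1 * sqrt(total2 / total1)."""
    return r1 * sqrt(total2 / total1)


# Illustrative: first chart has radius 3 cm for a total of 100; second chart represents 400.
print(second_radius(3, 100, 400))  # 6.0
```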
99
index numbers
talk about rate
100
for probability tree diagrams...
multiply the probabilities along the branches to find the probability at the end of each path.
101
for comparing regression lines...
-talk about gradient
-plug the values given into the equation, or imagine x as 0
-interpret each correlation
102
Cumulative frequency step polygons
-go along and then up
-the height of each step is the same as the frequency for its corresponding value, e.g. 5 boxes (vertical) have 48 matches (horizontal)
103
why might the mean increase?
if you add a value greater than the mean, or take away a value less than the mean, the mean increases
104
Why is combining results (e.g. into one grouped frequency table) an advantage?
Only need to calculate one mean
105
Why is combining results (e.g. into one grouped frequency table) a disadvantage?
Can’t compare classes
106
What do we do for a systematic sample?
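-number the sampling frame
-divide the population size by the sample size to find the interval
-choose a random starting point
-go in intervals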
107
What do we do for a systematic sample?
number, divide, choose, go
108
Are population or sample means more consistent? 😡
Sample means are more consistent, so the standard deviation of the population is bigger than the standard deviation of the sample means.
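
A quick simulation illustrating this, as a sketch only: the population here is made up (10,000 random whole numbers from 1 to 100), and the sample size of 25 and the 500 repeats are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)                                       # seeded so the run is repeatable
population = [rng.randint(1, 100) for _ in range(10_000)]     # made-up population
sample_means = [mean(rng.sample(population, 25)) for _ in range(500)]

# Individual values are spread out far more than the means of samples are.
print(pstdev(population) > pstdev(sample_means))              # True
```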