Vocab Flashcards

Question

Random sampling.

Answer 1

Everyone in the population has an equal chance of being selected (unbiased).

Answer 2

Group by characteristics, and interview a number from each group

Answer 3

Data naturally splits. List of clusters = sampling frame. Randomly select clusters to form sample.

Answer 4

The number of people asked from each stratum is proportional to the size of that stratum (e.g. 60/1000 x 250 = 15 year 7s in sample).
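
A minimal sketch of the stratum calculation in Python, using the card's own figures (60 year 7s, population 1000, sample 250). The function name is made up for illustration.

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """Number to sample from one stratum, in proportion to its share of the population."""
    return round(stratum_size * sample_size / population_size)

# Figures from the card: 60 year 7s out of 1000 students, sample of 250.
print(stratum_sample_size(60, 1000, 250))  # 15
```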

Answer 5

For sensitive questions which people are likely to answer dishonestly (e.g. flip a coin: if heads, tick yes; if tails, answer honestly).
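
A rough sketch of how the coin-flip results could be turned into an estimate, assuming a fair coin so that about half the respondents tick yes automatically. The function name and the figures (100 respondents, 60 ticks) are invented for illustration.

```python
def estimate_true_yes_proportion(yes_count, respondents):
    """Estimate the proportion of honest 'yes' answers under the coin-flip scheme."""
    forced_yes = respondents / 2   # expected number who got heads and ticked yes
    honest = respondents / 2       # expected number who got tails and answered honestly
    return (yes_count - forced_yes) / honest

# Hypothetical survey: 100 respondents, 60 ticked yes.
# About 50 ticks are forced, leaving 10 honest yeses out of about 50 honest answers.
print(estimate_true_yes_proportion(60, 100))  # 0.2
```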

Answer 6

-gather data that directly relates to hypothesis -you know its reliability

Answer 7

-expensive -time consuming -difficult/impossible

Answer 8

-easier to get hold of -can gather data quickly and cheaply -large data sets

Answer 9

-wrong format/rounded -difficult to find data that matches your hypothesis exactly (out of date, no relevant data available) -don't know accuracy, may be biased, unreliable

Answer 10

-representative of entire pop. -unbiased

Answer 11

-hard/impossible for big pops. -expensive -impractical -might be tricky to define entire pop./access all members -not an option when items are used up/damaged by the investigation

Answer 12

-quicker -cheaper -more practical than a census

Answer 13

-less accurate -not fully representative -biased -variability between samples

Answer 14

-unbiased -(should be) representative

Answer 15

-not always practical/convenient: if pop. is spread over a large area, travel is needed -may be impossible to list entire pop. or access everyone

Answer 16

-likely gives a representative sample if you have easy-to-define categories (e.g. gender) -can compare results from different groups

Answer 17

-not useful when there are no obvious categories or they are hard to define -can be expensive

Answer 18

-unbiased sample -can be done by machine

Answer 19

nth item might coincide with a pattern (e.g. fault) so biased
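
A small sketch of this pattern problem, assuming the card is about taking every nth item (systematic sampling). The function name and the production-line data are hypothetical.

```python
import random

def systematic_sample(items, sample_size, rng=random):
    """Take every nth item, starting from a random position in the first interval."""
    interval = len(items) // sample_size  # n: gap between chosen items
    if interval < 1:
        raise ValueError("sample_size is larger than the population")
    start = rng.randrange(interval)       # random start in 0..interval-1
    return items[start::interval][:sample_size]

# Hypothetical production line where every 10th item is faulty.
faulty = [i % 10 == 0 for i in range(100)]
sample = systematic_sample(faulty, 10)

# The interval (10) matches the fault pattern, so the sample is either
# all faulty or all fine: a biased picture either way.
print(sum(sample), "faulty out of", len(sample))
```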

Answer 20

convenient (saves travel time when pop. spread over large area)

Answer 21

biased if similar clusters sampled, e.g. with similar incomes per region.

Answer 22

-quick -representation of all different groups (genders etc.) -can be done with no sampling frame -member easily replaced by one with the same characteristics

Answer 23

-biased: interviewer bias -refusal to take part (those who refuse might have similar views) -not all may have an equal chance of being selected

Answer 24

convenient

Answer 25

-not representative of pop. -very biased. -selecting at a particular time and place so not all students have an equal chance of being selected.

Answer 26

-quick -sometimes may be the only suitable method to use

Answer 27

-researcher bias -researcher may be unreliable (though should have good knowledge of pop.) -not random, so very biased

Answer 28

gives names or numbers to classes of qualitative data so it can be more easily processed. (numbers don't have meaning).

Answer 29

(rank scale) gives numbers to the classes of data which can be ordered in a meaningful way.

Answer 30

made up of two or more variables

Answer 31

data made up of two variables (numerical)

Answer 32

-quick and cheap -well-written ones shouldn't be biased -respondents aren't under pressure, so their answers are likely truthful -can distribute to large numbers of people

Answer 33

-distribution can lead to bias -non-responses, particularly on sensitive Qs (discard, but this might remove certain parts of pop.) -questions might not be understood by respondent

Answer 34

-hand it out: target pop. gets it, but time consuming -put it online: data recorded and collected easily, but people without internet access are excluded -post/email: wide reaching, but not sure who is responding -ask people to collect it: easy, but people with strong views are more likely to take one

Answer 35

-can ask more complex questions -can explain Qs if someone doesn't understand/ask follow-up questions -higher response rate -you know the right person answered the questions

Answer 36

-time consuming: one person at a time -expensive: employ interviewers/travel if sample is geographically spread out -more likely to lie if questions are sensitive, as they may be embarrassed -answers could be recorded in a biased way (accidental if untrained, deliberate if strong views)

Answer 37

1. planning (hyp, what data and how use) 2. collecting data (prim/sec, constraints) 3. processing and presenting data (diagrams/measures, tech) 4. interpreting results (plan analysis, conclusions, predict) 5. communicating results clearly and evaluating methods (aware of target audience, clear visual representation of results)

Answer 38

-primary data by experiment: reliable, data recorded accurately/fairly -secondary data from a website: more reliable in some cases, e.g. for sensitive topics (income, money spent, weight, age)

Answer 39

Distribution: -averages -measures of spread -box plots -(pie charts) -(histograms) -(bar graphs). Correlation: -scatter graph -line of best fit -SRCC -PMCC. Over time: -time series graph.

Answer 40

-compare averages or find correlation -do the results prove/disprove the hypothesis? -do I need to repeat to find more results? (c+e)

Answer 41

Closed questions have a fixed number of possible answers whereas open questions can be answered in any way.

Answer 42

-Is it understandable and clear? -Is it relevant? -Is it leading? -Is it biased? -Is it ambiguous? -Is it sensitive?

Answer 43

-Follow up people who did not respond -Provide an incentive for people to answer (prize) -Use clear questions that are easy to answer

Answer 44

-Answer the question in a statement -Look at how many parts there are to the question and how many marks

Answer 45

Can use technology to... -order data (e.g. by age) -identify missing data -remove irrelevant columns/data -remove extraneous symbols -remove outliers -automate the calculation of summary statistics (using a computer) e.g. mean point, line of best fit. -set up a computer to visually represent data

Answer 46

-can reduce human error -uses all data so unbiased -more visually appealing -saves time

Answer 47

-time: under pressure? -costs: budget? minimise spending? longer investigation = more expensive, costs of travel and equipment -ethical issues: no harm/distress -confidentiality: sensitive information, e.g. income? could be hard to get accurate data as people may lie or refuse to answer -convenience: hypothesis could be difficult/impossible to test, think about the most convenient way to access the data you need

Answer 48

involves counting or measuring

Answer 49

secondary sources of information: -acknowledge its source -consider reliability(biased?) -out of date? wrong format? data incomplete/missing?

Answer 50

the variable you are in control of/the variable that has an effect on the other variable

Answer 51

the variable you measure/ changes as a result of changing the explanatory variable.

Answer 52

how far can I control the explanatory variable?

Answer 53

-Remove outliers -Put data in the same format -Remove extraneous symbols -Identify missing values -Remove irrelevant columns

Answer 54

-Find the mean average -Compare results/see patterns -Spot anomalous results -Results will vary

Answer 55

-Choose a suitable method for getting random numbers -Assign numbers to the data -Generate random numbers -Match the random numbers to the outcomes -Count how many rolls (or other trials) it took -Repeat a number of times and find the mean average
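
The steps above can be sketched as a short simulation. This is only an illustration: the fair six-sided die, the event "rolling a six", and the function names are all assumptions, not part of the card.

```python
import random

def rolls_until_six(rng):
    """Generate random numbers 1-6 and count how many rolls it takes to get a six."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:  # match the random number to the outcome "six"
            return rolls

def mean_rolls(trials, seed=0):
    """Repeat the experiment a number of times and find the mean average."""
    rng = random.Random(seed)  # seeded so the run can be repeated
    return sum(rolls_until_six(rng) for _ in range(trials)) / trials

# With many repeats the mean should settle near 6 for a fair die.
print(mean_rolls(10_000))
```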

Answer 56

Use midpoints
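
A minimal sketch, assuming this card is about estimating the mean from a grouped frequency table. The function name and the table are made up for illustration.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data; classes is a list of (lower, upper, frequency)."""
    total = sum(freq for _, _, freq in classes)
    # Each class is represented by its midpoint, weighted by its frequency.
    weighted = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return weighted / total

# Hypothetical table: midpoints 5, 15, 25 with frequencies 4, 6, 10.
table = [(0, 10, 4), (10, 20, 6), (20, 30, 10)]
print(estimated_mean(table))  # 18.0
```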

Answer 57

Use endpoints/the highest value.

Answer 58

More variation between samples.

Answer 59

-May be an error in data -Doesn't fit trend

Answer 60

Patterns in the data e.g. is distribution symmetric?

Answer 61

-takes into account all the data -can be used to calculate standard deviation

Answer 62

may be significantly affected by extreme values or outliers

Answer 63

-useful when data is skewed or contains outliers as not distorted by extreme values -easy to find in ordered data -can be used alongside range and IQR

Answer 64

-isn't always a data value -not always a good representation of the data

Answer 65

-always a data value -can be used with non-numerical data -easy to find in tallied data

Answer 66

-doesn't always exist -may be more than one -may be a misleading value far from the mean -may not be a good representation of the data.

Answer 67

It measures how close the points on a scatter diagram are to a straight line (how linear the correlation is)

Answer 68

It measures correlation between ranks. (this can be strong even if the data values themselves have a non-linear relationship so SRCC can detect both a linear and non-linear association).
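
A small sketch comparing the two coefficients on invented data where y = x cubed (non-linear but always increasing). The helper names are hypothetical, and the rank function assumes there are no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient: how close the points are to a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def srcc(xs, ys):
    """Spearman's rank correlation coefficient: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # non-linear, but the ranks agree exactly

print(srcc(xs, ys))  # 1.0, because the ranks match perfectly
print(pmcc(xs, ys))  # positive but below 1, because the points are not on a straight line
```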

Answer 69

Both will be positive or negative but SRCC will be stronger (closer to 1 or -1).

Answer 70

more than 50% of data values must be above the mean.

Answer 71

More than 50% of data values must be below the mean.

Answer 72

Allows for comparisons (between control and test group).

Answer 73

Will aim to pair people based on similar characteristics (e.g. age, gender) and place one in each group.

Answer 74

Measure the radius! With a ruler!

Answer 75

talk about rate

Answer 76

multiply all the branches out to find values at end of each branch.

Answer 77

-talk about gradient. -plug the values given into the equation or imagine x as 0. -interpret each correlation.

Answer 78

Along and then up. The height of each step is the same as the frequency for its corresponding value, e.g. 5 boxes (vertical) have 48 matches (horizontal).

Answer 79

If you add a value greater than the mean, or take away a value less than the mean, the mean increases.

Answer 80

Only need to calculate one mean

Answer 81

Can’t compare classes

Answer 82

number divide choose go in intervals

Answer 83

number divide choose go

Answer 84

The data are sample means, so the standard deviation of the pop. is bigger.


Chapter 1 - Collection of data. (108 cards)