statistics definitions Flashcards
(18 cards)
raw data
data as it is first collected in a statistical investigation before it has been sorted or ordered
what is quantitative data
numerical data such as measures of height and weight
what is qualitative data
non-numerical data such as type of car or colour of hair
what is categorical data
data that can be sorted into non-overlapping categories
what is continuous data
numerical data that can take any value within a range, such as temperature
what is discrete data
numerical data that can only take particular values, such as shoe size
what is ordinal data
data that can be placed in order, such as position in a race or rank in a class test
what is bivariate data
pairs of related data values such as exam results and time spent on study
what is multivariate data
involves sets of three or more related data values like age, height and weight
what is primary data
data that you collect yourself
what is secondary data
data that has already been collected by someone else, such as a published source
what are the advantages and disadvantages of primary data
advantages:
- collection method known
- accuracy known
- questionnaire or survey can be designed properly to find answers to specific questions
disadvantages:
- collection of data can be expensive and time-consuming
what are the advantages and disadvantages of secondary data
advantages:
- easy and cheap to obtain
- data from known organisations, such as the UK Office for National Statistics, is usually reliable
disadvantages:
- data source may not be reliable
- data might contain errors
- data might not be suitable to find answers to specific questions
- collection method unknown
- data might be out of date
what needs to be controlled when collecting data
- extraneous variables, which are any variables that the researcher is not interested in but that could affect the results of the experiment
- explanatory variable, which is like the independent variable in science
- response variable, which is like the dependent variable in science
what are field experiments
carried out in an everyday environment where the researcher sets up the situation and controls the explanatory variable but has less control over extraneous variables
what is a natural experiment
carried out in an everyday (uncontrolled) environment but the researcher has no control over any variables
how must you clean data
- identify and correct or remove inaccurate data values or extreme values
- check units are consistent
- remove units and other symbols so that values are recorded as plain numbers
- decide what to do about missing data
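The cleaning steps above can be sketched in Python. This is only an illustration: the function name `clean_heights`, the sample heights, the plausible range of 50 to 250 cm, and the choice to mark missing or rejected values as `None` are all assumptions, not part of the cards.

```python
import re

def clean_heights(raw_values, low=50.0, high=250.0):
    """Return heights in centimetres, using None for missing or rejected values.

    Hypothetical helper: the 50-250 cm range is an assumed sanity check.
    """
    cleaned = []
    for raw in raw_values:
        text = "" if raw is None else str(raw).strip().lower()
        # Separate the number from an optional unit symbol (cm or m).
        match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(cm|m)?", text)
        if not match:
            cleaned.append(None)  # missing or unreadable value
            continue
        value = float(match.group(1))
        if match.group(2) == "m":
            value *= 100  # make units consistent: metres to centimetres
        if not (low <= value <= high):
            cleaned.append(None)  # extreme value, probably a recording error
            continue
        cleaned.append(round(value, 1))
    return cleaned

raw = ["172 cm", "1.65 m", "", "1720 cm", "158", None]
print(clean_heights(raw))  # [172.0, 165.0, None, None, 158.0, None]
```

Here a value with no unit is assumed to be in centimetres, and missing values are kept as `None` rather than deleted, so the decision about what to do with them can be made later.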
why must you check and clean data
- to ensure it is consistent and accurate before you process it, otherwise your results may be invalid
- collected data may contain outliers or anomalous values that do not fit the pattern of the rest of the data and may skew your results
- outliers can be ignored if they are due to measuring or recording errors
- you need to check that your collection plan hasn’t affected the reliability of your results
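One common way to spot possible outliers is the 1.5 × IQR rule, sketched below in Python. The rule itself, the function name `flag_outliers`, and the sample times are assumptions for illustration and are not stated in the cards. Quartile conventions also differ between textbooks, so the boundaries may not match a hand calculation exactly.

```python
import statistics

def flag_outliers(values):
    """Return the values lying more than 1.5 x IQR outside the quartiles.

    Hypothetical helper using the 'inclusive' quartile method from the
    standard library; other quartile conventions give slightly different bounds.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

times = [12, 13, 13, 14, 15, 15, 16, 48]
print(flag_outliers(times))  # [48]
```

A flagged value is only a candidate: it should be ignored only if checking shows it came from a measuring or recording error.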