Lecture 2 Flashcards
(11 cards)
Quantitative data
Quantitative data is “any type of data that is numeric in form” or is assigned a numeric value
Observational data
- subtype of quantitative data
- Observational data is “collected without researchers interacting with their subjects or their environment”
Non-observational data
- subtype of quantitative data
- is collected through researchers either interacting with their subjects or intervening in their subjects’ environments
- e.g., an experiment or survey
Observational data - pros and cons
- Pro: often more widely available and easier to collect since it does not require ethics clearance
- Pro: not subject to observer bias (researcher influences response of subjects through their actions or words)
- Con: sometimes very “messy” (lacks structure) and takes significant effort to re-structure into a usable format (e.g., social media data, data scraped from the web)
- Con: over time, some observational data may be subject to guinea pig or measurement effects (people change their behaviours because they are aware that they are being measured, tracked, etc.)
Non-observational data - pros and cons
- Con: may not represent the real world well
- Con: subject to potential observer bias
- Con: guinea pig or measurement effects, e.g. social desirability bias of participants: tendency for participants to over-report socially “desirable” behaviour and under-report “undesirable” behaviour
- Con: challenges with gathering a large, representative sample
- Pro: can be used to study outcomes associated with policies, institutions, or practices that do not exist in the real world (yet)
- Pro: tends to exist in cleaner format because it was collected and documented by researcher(s) for a specific purpose
5 Criteria to evaluate Quality of Data
- Accuracy
- Data Validity
- Precision
- Completeness
- Consistency
Accuracy
Is the data reflective of real-world values? Is it correct? E.g., if a dataset recorded only by-elections for 2024 and it recorded 4 by-elections, it would be accurate.
(By-elections were held in the ridings of Durham, Ontario (March 4), Toronto-St. Paul’s, Ontario (June 24), Elmwood-Transcona, Manitoba (September 16), and LaSalle-Émard-Verdun, Quebec (September 16)).
Data Validity
Do the scores of the variable accurately capture what the variable is said to represent or indicate? E.g., if studying prison overcrowding, prisoners per square foot would be a more valid measure than the number of prisoners.
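A minimal sketch of the overcrowding example, using two made-up prisons. The names, counts, and floor areas are hypothetical and exist only to show that the two measures can rank the same cases differently.

```python
# Hypothetical figures, invented for illustration only.
prisons = [
    {"name": "A", "prisoners": 500, "floor_area_sqft": 100_000},
    {"name": "B", "prisoners": 300, "floor_area_sqft": 30_000},
]


def prisoners_per_sqft(prison):
    """Density measure: prisoners divided by floor area in square feet."""
    return prison["prisoners"] / prison["floor_area_sqft"]


# Rank the prisons by each candidate measure of "overcrowding".
most_by_count = max(prisons, key=lambda p: p["prisoners"])["name"]
most_by_density = max(prisons, key=prisoners_per_sqft)["name"]

print("Most prisoners:", most_by_count)
print("Most prisoners per sq ft:", most_by_density)
```

With these numbers, prison A holds more prisoners, but prison B is more crowded per square foot, so the raw count would point to the wrong case.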
Precision
- increases as we measure data in smaller units or intervals.
- We should be measuring our data as precisely as feasibly possible without sacrificing accuracy or validity.
- A more precise variable may actually be inaccurate, especially in the case of sensitive survey questions (for example, people might not report their income very accurately if the intervals are small).
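The trade-off between precision and accuracy can be sketched with income brackets. The helper `to_bracket`, the income figures, and the bracket widths below are all assumptions chosen for illustration.

```python
def to_bracket(income, width):
    """Return the (lower, upper) bounds of the bracket containing income."""
    lower = (income // width) * width
    return (lower, lower + width)


true_income = 47_300      # hypothetical true value
reported_income = 49_300  # hypothetical answer, off by 2,000

# Narrow brackets are more precise, but the misreport lands in the wrong one.
narrow_true = to_bracket(true_income, 1_000)
narrow_reported = to_bracket(reported_income, 1_000)

# Wide brackets are less precise, but the misreport still lands in the right one.
wide_true = to_bracket(true_income, 25_000)
wide_reported = to_bracket(reported_income, 25_000)

print("Narrow brackets agree:", narrow_true == narrow_reported)
print("Wide brackets agree:", wide_true == wide_reported)
```

In this sketch the same reporting error makes the narrow-bracket variable inaccurate while the wide-bracket variable stays correct.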
Completeness
A dataset is complete if it (1) includes values for the whole universe of relevant cases and (2) includes observations for all of the relevant measures or variables in the data.
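Both conditions can be checked mechanically. The sketch below assumes a tiny hypothetical dataset: the case labels, variable names, and values are invented, and `None` stands in for a missing observation.

```python
# Hypothetical universe of cases and required variables.
universe = {"ON", "QC", "MB"}
required_vars = ["turnout", "winner"]

# Hypothetical dataset; None marks a missing observation.
rows = [
    {"case": "ON", "turnout": 0.41, "winner": "X"},
    {"case": "QC", "turnout": None, "winner": "Y"},
]

# Condition (1): every case in the universe appears in the data.
missing_cases = universe - {row["case"] for row in rows}

# Condition (2): every included case has a value for every required variable.
missing_values = [
    (row["case"], var)
    for row in rows
    for var in required_vars
    if row.get(var) is None
]

is_complete = not missing_cases and not missing_values

print("Missing cases:", missing_cases)
print("Missing values:", missing_values)
print("Complete:", is_complete)
```

Here the dataset fails both conditions: one case is absent entirely and another lacks a value for one variable.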
Consistency
- Data consistency “refers to the absence of contradictions in the data” (Brancati 2016, p.238). For example, data is consistent when cases are coded according to the same rules and the data are collected using the same types of sources.
- Inconsistent data lacks validity, but consistency does not guarantee validity.
- Consistency cannot make up for low levels of validity.
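One simple kind of contradiction is the same case being coded two different ways. The sketch below looks for that pattern in a hypothetical set of records; the field names and codes are assumptions, and passing this check says nothing about whether the codes are valid.

```python
from collections import defaultdict

# Hypothetical coded records; the same case-year should get one code.
records = [
    {"case": "A", "year": 2020, "code": "democracy"},
    {"case": "A", "year": 2020, "code": "hybrid"},
    {"case": "B", "year": 2020, "code": "democracy"},
    {"case": "B", "year": 2020, "code": "democracy"},
]

# Collect the distinct codes assigned to each (case, year) pair.
codes_by_key = defaultdict(set)
for record in records:
    codes_by_key[(record["case"], record["year"])].add(record["code"])

# A pair with more than one distinct code is a contradiction.
contradictions = {
    key: sorted(codes)
    for key, codes in codes_by_key.items()
    if len(codes) > 1
}

print(contradictions)
```

Case A in 2020 is flagged because it carries two different codes, while the duplicated but identical entries for case B are not.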