Missing data Flashcards
(26 cards)
what is missing data?
We often don’t get information for every measure from every participant
The variables we study may contain missing values or information
This includes errors in the data
why is missing data important?
Missing data points could have been meaningful for the analysis had they been observed
Missingness can affect our analysis because it may conceal values that matter for understanding the problem
We need to identify missing data and determine its nature
what are the different reasons why missing data may occur?
non-response
data entry errors
instrumentation issues
privacy concerns
natural causes
what is non-response?
a reason why missing data may occur if participants choose not to answer certain questions or fail to provide information
what are data entry errors?
a reason why missing data may occur if researchers make mistakes during the data collection or entry process
what are instrumentation issues?
a reason why missing data may occur if there are problems with the tools or instruments used to collect data
what are privacy concerns?
a reason why missing data may occur if sensitive information is omitted from a dataset to protect participants and to maintain ethical standards
what are natural causes?
a reason why missing data may occur if events beyond the researchers' control (e.g. technical issues, power outages, environmental factors) prevent data from being collected
what are the different types of missing data?
non-systematic
systematic
what are the types of non-systematic missing data?
missing completely at random (MCAR)
missing at random (MAR)
what are the different types of systematic missing data?
missing not at random (MNAR)
explain what MCAR is
The fact that data is missing is independent of both the observed and the unobserved values
The missing data reduces the analysable population of the study
This reduces statistical power but does not introduce bias
The observed data can be considered a simple random sample of the full data set
explain what MAR is
The missingness is systematically related to the observed values but not to the unobserved values
e.g. the probability of completing a survey is related to participants' sex (which is fully observed) but not to the severity of their depression
explain what MNAR is
The missingness is systematically related to the unobserved values
Analysis of a dataset containing MNAR data is likely to result in biased estimates
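The three mechanisms (MCAR, MAR, MNAR) can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population, the score distributions, and every missingness probability below are hypothetical and chosen purely for illustration. Sex is treated as fully observed and the depression score as the variable that may go missing.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: (sex, depression score) for each person.
people = []
for _ in range(20000):
    sex = random.choice(["F", "M"])
    # Assumption for illustration only: the two groups differ in mean score.
    score = random.gauss(55 if sex == "F" else 45, 10)
    people.append((sex, score))

def apply_missingness(p_missing):
    """Keep only the rows whose score is still observed.

    p_missing(sex, score) is the probability that this person's score is missing.
    """
    return [(sex, score) for sex, score in people
            if random.random() >= p_missing(sex, score)]

def mean_score(rows):
    return mean(score for _, score in rows)

# MCAR: every score has the same chance of being missing.
mcar = apply_missingness(lambda sex, score: 0.3)
# MAR: missingness depends only on sex, which is fully observed.
mar = apply_missingness(lambda sex, score: 0.6 if sex == "M" else 0.1)
# MNAR: missingness depends on the score itself, which is unobserved when missing.
mnar = apply_missingness(lambda sex, score: 0.8 if score > 60 else 0.1)

true_mean = mean_score(people)
true_male_mean = mean_score([r for r in people if r[0] == "M"])
mar_male_mean = mean_score([r for r in mar if r[0] == "M"])

print(f"true mean {true_mean:.2f} (n={len(people)})")
print(f"MCAR mean {mean_score(mcar):.2f} (n={len(mcar)})")
print(f"MAR  mean {mean_score(mar):.2f} (n={len(mar)})")
print(f"MNAR mean {mean_score(mnar):.2f} (n={len(mnar)})")
print(f"men only: true {true_male_mean:.2f}, observed under MAR {mar_male_mean:.2f}")
```

Under these assumptions the MCAR sample is smaller but its mean stays close to the true mean, while the MNAR sample loses mostly high scores and its mean is pulled down. Under MAR the overall observed mean shifts because one sex is under-represented, but the mean within each sex is still close to the truth, which is why MAR can be handled using the observed variables.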
why is missing data a problem?
Can have significant effects on statistical analyses
May lead to biased estimates
Can affect the story we try to tell about our participants and who they represent in the wider population
why do we report missing data?
- Scientific research relies on clear reporting of approach and analysis, so we need to show researchers and readers our procedure
- Aid transparency and reproducibility
- Help readers understand potential limitations with our statistical outputs
- Help readers draw reasonable conclusions about our study
what missing data do we report?
- The number of participants who had no data on either measure
- The number of participants who had no data on each measure
- Also specify the reasons for the missing data
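The counts to report can be tallied in a few lines. The sketch below uses a hypothetical dataset with two made-up measures (`anxiety` and `depression`), with `None` marking a missing value. It reads "no data on either measure" as missing on both measures, which is an interpretation rather than something the card states.

```python
# Hypothetical dataset: one dict per participant, None marks a missing value.
participants = [
    {"id": 1, "anxiety": 12, "depression": 8},
    {"id": 2, "anxiety": None, "depression": 15},
    {"id": 3, "anxiety": 9, "depression": None},
    {"id": 4, "anxiety": None, "depression": None},
    {"id": 5, "anxiety": 14, "depression": 11},
]
measures = ["anxiety", "depression"]

# Number of participants with no data on each measure.
missing_per_measure = {
    m: sum(1 for p in participants if p[m] is None) for m in measures
}

# Number of participants with no data on any of the measures.
missing_all = sum(
    1 for p in participants if all(p[m] is None for m in measures)
)

print(missing_per_measure)  # {'anxiety': 2, 'depression': 2}
print(missing_all)          # 1
```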
why do we specify the reasons for the missing data?
to provide insights into whether the missingness is random or systematic
what are some ways in which we can identify missing data?
- Scan the database for missing or erroneous values
- Missing value analysis in SPSS
- Visualise the data and check for missing data
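Scanning for missing or erroneous values can also be done programmatically. The following is a minimal sketch, assuming a hypothetical 1 to 5 rating scale and assuming that `None`, an empty string, and `-99` are the codes used for missing values. Real datasets will use their own codes and valid ranges.

```python
# Assumed conventions for this sketch (not taken from any particular dataset).
MISSING_CODES = {None, "", -99}
VALID_RANGE = range(1, 6)  # ratings 1 to 5

responses = [3, 5, None, 2, -99, 7, "", 4]  # hypothetical ratings

def classify(value):
    """Label a single value as 'missing', 'erroneous' or 'ok'."""
    if value in MISSING_CODES:
        return "missing"
    if value not in VALID_RANGE:
        return "erroneous"
    return "ok"

flags = [classify(v) for v in responses]

# List the position, value and label of every problem entry.
problem_rows = [
    (i, v, f) for i, (v, f) in enumerate(zip(responses, flags)) if f != "ok"
]
for row in problem_rows:
    print(row)
```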
what are some other approaches to missing data?
complete case analysis (CCA)
mean/median/mode imputation
multiple imputation
weighted estimation
model-based methods
what is complete case analysis?
- Involves excluding cases with missing data from the analysis
- Can lead to biased results if the data is not missing completely at random
- Makes it difficult to draw inferences about the entire target population or subpopulations
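Complete case analysis (also called listwise deletion) can be sketched as a simple filter. The records and variable names below are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 21, "score": 30},
    {"age": 34, "score": None},
    {"age": None, "score": 25},
    {"age": 45, "score": 40},
]

# Keep only the cases with a value on every variable.
complete_cases = [r for r in records if all(v is not None for v in r.values())]
n_excluded = len(records) - len(complete_cases)

# Any analysis is then run on the complete cases only.
mean_score = mean(r["score"] for r in complete_cases)

print(len(complete_cases), n_excluded, mean_score)  # 2 2 35
```

Half of this small sample is discarded, which shows how quickly the analysable sample can shrink.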
what is mean/median/mode imputation?
- Filling missing values with the mean, median or mode of the observed values for that variable
- Generally inappropriate as it can reduce the variability of your data
- Can affect measurements of covariances and correlations
- No straightforward way to estimate standard errors
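The loss of variability is easy to see on a tiny example. The values below are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean, variance

values = [2, 4, 6, 8, None, None]  # hypothetical scores, two missing

observed = [v for v in values if v is not None]
fill = mean(observed)  # mean of the observed values

# Replace every missing value with the observed mean.
imputed = [v if v is not None else fill for v in values]

print(variance(observed))  # sample variance before imputation, about 6.67
print(variance(imputed))   # sample variance after imputation, 4
```

The mean is unchanged, but every imputed value sits exactly on it, so the sample variance drops from about 6.67 to 4.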
what is multiple imputation?
creating, analysing and combining multiple complete datasets with imputed values
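The create, analyse and combine steps can be sketched as follows. This is a simplified illustration, not a recommended workflow: it assumes a single variable with values missing completely at random, uses the approximate Bayesian bootstrap as the imputation method (one possible choice among many), and pools the results with Rubin's rules. The data and the number of imputations are hypothetical.

```python
import random
from statistics import mean, variance

random.seed(1)  # fixed seed so the sketch is repeatable

data = [4.0, 7.0, None, 5.0, 9.0, None, 6.0, 8.0]  # hypothetical scores
observed = [x for x in data if x is not None]
m = 20  # number of imputed datasets (an arbitrary choice here)

estimates, within_vars = [], []
for _ in range(m):
    # 1. Create: resample the observed values with replacement, then draw
    #    each imputation from that resample (approximate Bayesian bootstrap).
    donor_pool = random.choices(observed, k=len(observed))
    completed = [x if x is not None else random.choice(donor_pool) for x in data]

    # 2. Analyse: estimate the mean and its sampling variance in this dataset.
    estimates.append(mean(completed))
    within_vars.append(variance(completed) / len(completed))

# 3. Combine with Rubin's rules.
pooled_estimate = mean(estimates)          # average of the m estimates
within = mean(within_vars)                 # average within-imputation variance
between = variance(estimates)              # variance between the m estimates
total_variance = within + (1 + 1 / m) * between

print(f"pooled mean {pooled_estimate:.2f}, standard error {total_variance ** 0.5:.2f}")
```

The between-imputation term is what carries the extra uncertainty caused by the missing values, which single imputation methods ignore.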
what is weighted estimation?
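weighting the observed cases (e.g. by the inverse of their probability of being observed) so that the analysed sample better represents the full sample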
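One common form of weighted estimation is inverse probability weighting, where each complete case is weighted by the inverse of its estimated probability of being observed. The sketch below is a minimal illustration on hypothetical records. It assumes the probability of being observed depends only on sex, which is fully observed, and estimates that probability as each group's response rate.

```python
# Hypothetical records: sex is fully observed, score may be missing (None).
records = [
    ("F", 50), ("F", 60), ("F", 54), ("F", 56),
    ("M", 40), ("M", 50), ("M", None), ("M", None),
]

# Estimate each group's probability of being observed (its response rate).
response_rate = {}
for group in {sex for sex, _ in records}:
    scores = [score for sex, score in records if sex == group]
    response_rate[group] = sum(s is not None for s in scores) / len(scores)

# Weight each complete case by the inverse of its group's response rate.
complete = [(sex, score) for sex, score in records if score is not None]
weights = [1 / response_rate[sex] for sex, _ in complete]

unweighted_mean = sum(score for _, score in complete) / len(complete)
weighted_mean = (
    sum(w * score for w, (_, score) in zip(weights, complete)) / sum(weights)
)

print(round(unweighted_mean, 2))  # 51.67
print(weighted_mean)              # 50.0
```

Each observed man counts twice because only half of the men responded, so the weighted mean gives both sexes the same influence they have in the full sample. A group with no observed cases would have a response rate of zero and could not be weighted this way.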