DS Foundations Part 4 Flashcards
(23 cards)
What is the law of large numbers?
As the sample size increases, the sample mean converges to the population mean (the expected value).
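A quick simulation can make this concrete. The sketch below treats a fair six-sided die as the "population" (true mean 3.5) and tracks the running sample mean as rolls accumulate. The function name `running_means`, the seed, and the checkpoints are illustrative choices, not part of the original card.

```python
import random


def running_means(n_max: int, seed: int = 0) -> list[float]:
    """Return the sample mean after 1, 2, ..., n_max rolls of a fair die."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total = 0
    means = []
    for n in range(1, n_max + 1):
        total += rng.randint(1, 6)  # one roll: an integer from 1 to 6
        means.append(total / n)
    return means


TRUE_MEAN = 3.5  # population mean of a fair die: (1 + 2 + ... + 6) / 6
means = running_means(100_000)
for n in (10, 1_000, 100_000):
    error = abs(means[n - 1] - TRUE_MEAN)
    print(f"n={n:>7}: sample mean={means[n - 1]:.3f}, error={error:.3f}")
```

The error tends to shrink as n grows, although any single run can still fluctuate noticeably at small n.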
What is a population in statistics?
The entire group that you want to draw conclusions about.
What is a sample in statistics?
A subset of the population used to make inferences.
Why do we use samples instead of populations?
Because it is often impractical or impossible to measure the whole population.
What is confirmation bias?
The tendency to seek or interpret data in a way that confirms existing beliefs.
What is survivorship bias?
Focusing only on ‘successful’ cases and ignoring those that didn’t survive.
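A small simulation can show how survivorship bias inflates results. This sketch assumes hypothetical investment funds whose yearly returns are pure noise around 0%, plus a hypothetical rule that funds with a negative average return close and vanish from the data. Averaging only the survivors makes the whole group look profitable even though no fund has any real edge.

```python
import random
import statistics


def survivorship_demo(n_funds: int = 10_000, years: int = 5, seed: int = 1) -> tuple[float, float]:
    """Compare the average return of all funds with that of surviving funds only."""
    rng = random.Random(seed)
    all_funds = []
    survivors = []
    for _ in range(n_funds):
        # Hypothetical: yearly returns are pure noise with mean 0% and std 10%.
        avg_return = statistics.mean(rng.gauss(0.0, 0.10) for _ in range(years))
        all_funds.append(avg_return)
        # Hypothetical closure rule: funds with a negative average return shut down
        # and disappear from the dataset an analyst would later see.
        if avg_return >= 0:
            survivors.append(avg_return)
    return statistics.mean(all_funds), statistics.mean(survivors)


everyone, survivors_only = survivorship_demo()
print(f"all funds:      {everyone:+.2%}")
print(f"survivors only: {survivors_only:+.2%}")
```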
What is anchoring bias?
Relying too heavily on the first piece of information encountered.
What is the Dunning-Kruger effect?
When people with low ability overestimate their competence.
Why are consistent units important in data analysis?
Inconsistent units can lead to incorrect comparisons or aggregation.
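A minimal sketch of the problem and the fix, assuming hypothetical height measurements recorded partly in centimeters and partly in meters. The conversion table `TO_METERS` and the helper `mean_in_meters` are illustrative names; the point is to convert every value to one unit before aggregating, and to fail on units you cannot convert.

```python
# Conversion factors to a common unit (meters); extend as needed.
TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def mean_in_meters(measurements: list[tuple[float, str]]) -> float:
    """Convert every (value, unit) pair to meters before averaging."""
    converted = []
    for value, unit in measurements:
        if unit not in TO_METERS:
            raise ValueError(f"unknown unit: {unit!r}")  # fail loudly instead of guessing
        converted.append(value * TO_METERS[unit])
    return sum(converted) / len(converted)


# Hypothetical heights recorded by two teams using different units.
heights = [(180, "cm"), (1.75, "m"), (165, "cm"), (1.82, "m")]

naive = sum(value for value, _ in heights) / len(heights)  # mixes cm and m
print(f"naive mean:      {naive:.2f} (meaningless)")
print(f"consistent mean: {mean_in_meters(heights):.3f} m")
```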
What is derived data?
Data calculated from raw measurements, such as ratios or rates.
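For example, a rate can be derived from two raw counts. The regions and numbers below are hypothetical, and `rate_per_100k` is an illustrative helper that turns raw case counts and population sizes into cases per 100,000 people.

```python
# Hypothetical raw counts per region.
raw = {
    "north": {"cases": 120, "population": 400_000},
    "south": {"cases": 90, "population": 150_000},
}


def rate_per_100k(cases: int, population: int) -> float:
    """Derived value: cases per 100,000 people."""
    return cases / population * 100_000


derived = {region: rate_per_100k(r["cases"], r["population"]) for region, r in raw.items()}
for region, rate in derived.items():
    print(f"{region}: {raw[region]['cases']} cases, {rate:.1f} per 100k")
```

In this made-up data the region with more raw cases (north) has the lower rate, which is the kind of comparison a derived value makes possible.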
Why convert all values to a common scale before analysis?
To ensure fair comparisons and model compatibility.
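Two common ways to put features on a common scale are z-score standardization and min-max scaling. The sketch below applies both to hypothetical income and age values; it assumes each feature has at least two distinct values, since both formulas divide by a spread that would otherwise be zero.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumes values are not all equal
    return [(v - mu) / sigma for v in values]


def min_max(values: list[float]) -> list[float]:
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical features on very different scales.
incomes = [32_000, 45_000, 58_000, 120_000]
ages = [23, 35, 47, 61]

print([round(z, 2) for z in z_scores(incomes)])
print([round(z, 2) for z in z_scores(ages)])
print([round(x, 2) for x in min_max(incomes)])
```

After either transformation, incomes and ages occupy comparable numeric ranges, so neither dominates a distance-based model simply because of its units.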
What is framing in data analysis?
The way a question or result is worded or presented, which can influence how it is interpreted.
Why does reframing a question matter?
Different framing can lead to different insights or decisions.
What is cherry-picking in data communication?
Selecting only results that support a desired narrative.
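One way to see why cherry-picking misleads: run many experiments in which there is no real effect, then report only the most favorable one. This sketch assumes 50 hypothetical A/B tests where both groups are drawn from the same normal distribution; `null_experiment` and the parameters are illustrative.

```python
import random
import statistics


def null_experiment(rng: random.Random, n: int = 200) -> float:
    """One hypothetical A/B test where both groups come from the same distribution."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(b) - statistics.mean(a)  # the true difference is 0


rng = random.Random(42)
diffs = [null_experiment(rng) for _ in range(50)]

print(f"average of all 50 results: {statistics.mean(diffs):+.3f}")
print(f"best single result:        {max(diffs):+.3f}  (the cherry-picked one)")
```

Reporting only the maximum suggests an effect that the full set of results does not support.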
What is data accessibility?
The ease with which data can be retrieved and used by those who need it.
Why is documentation essential for usability?
It helps users understand structure, purpose, and limitations.
What is data discoverability?
How easily data can be found and identified for a purpose.
What is data governance?
Policies and processes for managing data availability, usability, integrity, and security.
What is anonymization?
Removing identifying information to protect privacy in a dataset.
What is the difference between anonymization and pseudonymization?
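Anonymization is irreversible; pseudonymization replaces identifiers with pseudonyms that can be linked back to individuals using separately held information, such as a key or lookup table.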
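A minimal sketch of the difference, assuming hypothetical patient records. Pseudonymization here swaps names and emails for random tokens and keeps a separate lookup table (the "key") that allows re-identification; the anonymization step drops identifiers and coarsens age, and keeps no key. All function names are illustrative, and real anonymization usually needs more than this, because remaining attributes (quasi-identifiers) can still allow re-identification.

```python
import secrets

# Hypothetical records with direct identifiers (name, email).
records = [
    {"name": "Alice", "email": "alice@example.com", "age": 34, "diagnosis": "A"},
    {"name": "Bob", "email": "bob@example.com", "age": 58, "diagnosis": "B"},
]


def pseudonymize(rows):
    """Replace identifiers with random pseudonyms; return the data and the re-linking key."""
    key_table = {}  # pseudonym -> identifiers; must be stored separately and protected
    out = []
    for row in rows:
        pid = secrets.token_hex(8)  # random token, so it reveals nothing about the person
        key_table[pid] = {"name": row["name"], "email": row["email"]}
        out.append({"pid": pid, "age": row["age"], "diagnosis": row["diagnosis"]})
    return out, key_table


def reidentify(row, key_table):
    """Reverse pseudonymization for one row; only possible with the key table."""
    fields = {k: v for k, v in row.items() if k != "pid"}
    return {**key_table[row["pid"]], **fields}


def anonymize(rows):
    """Drop direct identifiers and coarsen age to a decade band; no key is kept."""
    return [{"age_band": f"{r['age'] // 10 * 10}s", "diagnosis": r["diagnosis"]} for r in rows]


pseudo, key = pseudonymize(records)
print(pseudo[0])
print(reidentify(pseudo[0], key))
print(anonymize(records))
```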
What is exploratory thinking?
Asking open-ended questions to investigate data and generate hypotheses.
What is explanatory thinking?
Explaining observed phenomena using tested relationships and models.
What is iteration in data work?
Revisiting steps (e.g., cleaning, modeling) based on new findings or feedback.