DS Foundations Part 2 Flashcards
(23 cards)
What is the purpose of data visualization?
To communicate patterns, relationships, and insights effectively.
When should you use a bar chart?
To compare quantities across categories.
When is a pie chart misleading?
When there are many categories or the values differ only slightly, because slice angles and areas are hard to compare by eye.
What is the best chart for time series data?
A line chart.
What is a heatmap useful for?
Visualizing correlation matrices or high-dimensional categorical data.
What are the four levels of measurement?
Nominal, ordinal, interval, and ratio.
What distinguishes interval from ratio data?
Ratio data has a true zero; interval data does not.
Is temperature in Celsius interval or ratio?
Interval, because 0 °C is an arbitrary reference point and does not represent the absence of heat.
Is weight a ratio scale?
Yes — it has a true zero and equal intervals.
What is sampling?
Selecting a subset of a population for analysis.
What is selection bias?
A systematic error that occurs when the way the sample is chosen makes it unrepresentative of the population.
What is a random sample?
A sample in which each individual in the population has an equal chance of being selected.
What is stratified sampling?
Dividing the population into subgroups (strata) and sampling from each.
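A minimal sketch of stratified sampling, assuming proportional allocation (the same fraction is drawn from every subgroup). The function name stratified_sample, the population, and the 10% fraction are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)      # group records by stratum label
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, k))       # simple random draw within the stratum
    return sample

# Hypothetical population: 80 undergraduates and 20 graduates.
population = [("undergrad", i) for i in range(80)] + [("grad", i) for i in range(20)]
sample = stratified_sample(population, key=lambda rec: rec[0], fraction=0.1)
counts = {level: sum(1 for rec in sample if rec[0] == level)
          for level in ("undergrad", "grad")}
print(counts)  # {'undergrad': 8, 'grad': 2}
```

Because each subgroup is sampled separately, the 80/20 split of the population is preserved in the sample, which a single simple random draw of 10 would not guarantee.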
What is a convenience sample?
A non-random sample taken from a readily available group.
What is Simpson’s paradox?
A trend appears in groups but disappears or reverses when groups are combined.
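A small worked example of the reversal, using illustrative success counts for two treatments (A and B) split into two subgroups. The counts and labels are assumptions made for this sketch, not data from the deck.

```python
# (successes, trials) per treatment within each subgroup; counts are illustrative.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Within every subgroup, A has the higher success rate.
a_wins_in_each_group = all(
    rate(*arms["A"]) > rate(*arms["B"]) for arms in groups.values()
)

# Pool the subgroups by summing successes and trials per treatment.
pooled = {
    arm: tuple(sum(groups[g][arm][i] for g in groups) for i in range(2))
    for arm in ("A", "B")
}

# After pooling, B has the higher success rate: the trend reverses.
b_wins_overall = rate(*pooled["B"]) > rate(*pooled["A"])

print(a_wins_in_each_group, b_wins_overall)  # True True
```

The reversal happens because the two treatments are spread unevenly across the subgroups: A is used mostly on the harder "large" cases, which pulls its pooled rate down.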
Why is domain knowledge important in data analysis?
It helps interpret results and choose relevant features.
What is confounding?
When a third variable affects both the independent and dependent variables, distorting the apparent relationship between them.
Why does correlation not imply causation?
Because the relationship may be coincidental or due to a third variable.
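A simulated sketch of a third variable producing correlation without causation. Here z is a hypothetical confounder that drives both x and y, and y is never computed from x. The variable names, noise levels, and seed are assumptions for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # confounder
x = [zi + rng.gauss(0, 0.5) for zi in z]       # depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]       # depends on z only, never on x

raw = pearson(x, y)
# Remove z's contribution (known here because we built the data) and re-check.
adjusted = pearson([xi - zi for xi, zi in zip(x, z)],
                   [yi - zi for yi, zi in zip(y, z)])
print(round(raw, 2), round(adjusted, 2))
```

With these settings the raw correlation should come out strongly positive, while the correlation after removing z should be close to zero, even though x has no effect on y in either case.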
What does ‘garbage in, garbage out’ mean?
Poor quality input data leads to unreliable analysis or results.
What is metadata?
Data that describes other data, like column types or source.
Why is metadata important?
It provides context, improves transparency, and aids reproducibility.
What is data provenance?
A record of the origin, lineage, and transformation of data.
Why is tracking data provenance valuable?
It supports auditing, trust, and reproducibility.
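One possible way to record provenance alongside a dataset, sketched with the standard library only. The file name hypothetical_survey.csv, the add_step helper, and the record layout are assumptions for illustration, not an established format.

```python
import hashlib
import json

def add_step(provenance, description, data):
    """Append a transformation step with a content hash of the resulting data."""
    digest = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
    provenance["steps"].append({"description": description, "sha256": digest})
    return provenance

# Hypothetical raw data and its origin.
raw = [{"id": 1, "weight_kg": 70.5}, {"id": 2, "weight_kg": None}]
provenance = {"source": "hypothetical_survey.csv", "steps": []}
add_step(provenance, "loaded raw file", raw)

# Each transformation is logged with a hash of the data it produced.
cleaned = [row for row in raw if row["weight_kg"] is not None]
add_step(provenance, "dropped rows with missing weight_kg", cleaned)

for step in provenance["steps"]:
    print(step["description"], step["sha256"][:12])
```

Storing a hash with each step lets an auditor confirm that a given version of the data matches the recorded transformation history.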