PAD-L7-intro to Stats-2025-LJ Flashcards
(24 cards)
What is the main question addressed in the bullet hole pattern analysis during WWII?
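How to reduce bomber losses by identifying the areas of the aircraft that most needed extra armor
The key insight was to reinforce the areas that showed no bullet holes on returning planes, because planes hit in those areas were the ones that did not make it back.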
What is survivorship bias?
The error of focusing on the survivors or successful cases while ignoring those that did not survive or succeed
This can lead to incorrect conclusions, because the cases that survived are not representative of all cases.
What are the three key take-home messages from the introduction?
- Understand what question you want to address
- Understand what data you need to answer a question
- Understand what data you have and how reliable it is
What is a variable?
A characteristic that can take different values across observations and can be measured, counted, or categorized
What are the two main categories of variables?
- Numerical (quantitative)
- Categorical (qualitative)
What are the five types of variables?
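- Binary (categorical: two categories, e.g. yes/no)
- Nominal (categorical: unordered categories, e.g. blood group)
- Ordinal (categorical: ordered categories, e.g. pain rated mild/moderate/severe)
- Discrete (numerical: only certain values, usually counts, e.g. number of visits)
- Continuous (numerical: any value within a range, e.g. height)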
What is descriptive statistics?
Describing and summarizing data
How is the mean calculated?
Add the values of a set of observations together and divide by the number of observations
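A minimal Python sketch of this calculation, using a small hypothetical list of observations (not course data). It computes the mean by hand and checks it against the standard library's `statistics.mean`.

```python
from statistics import mean

# Hypothetical observations, for illustration only.
observations = [4, 8, 6, 5, 3, 7]

# Mean: add all the values, then divide by the number of observations.
manual_mean = sum(observations) / len(observations)

print(manual_mean)         # 5.5
print(mean(observations))  # 5.5
```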
What is the median?
The middle value when the observations are sorted in order; with an even number of observations, it is the mean of the two middle values
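A sketch of the median in Python, assuming the usual convention for an even number of observations (average the two middle values). The helper `median_of` is hypothetical and is checked against the standard library's `statistics.median`.

```python
from statistics import median

def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data, for illustration only.
odd = [7, 1, 5, 3, 9]   # sorted: 1 3 5 7 9 -> middle value is 5
even = [7, 1, 5, 3]     # sorted: 1 3 5 7 -> (3 + 5) / 2 = 4.0

print(median_of(odd), median(odd))    # 5 5
print(median_of(even), median(even))  # 4.0 4.0
```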
When is the median preferred over the mean?
When dealing with skewed distributions or data with outliers, because extreme values pull the mean toward them but barely affect the median
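To illustrate, the Python sketch below uses hypothetical, right-skewed income data (in thousands) with one extreme value. The single outlier pulls the mean far above most of the observations, while the median stays near the center of the data.

```python
from statistics import mean, median

# Hypothetical incomes in thousands; 500 is an extreme outlier.
incomes = [30, 32, 35, 38, 40, 500]

print(mean(incomes))    # 112.5, larger than every value except the outlier
print(median(incomes))  # 36.5, the average of the two middle values (35 and 38)

# Dropping the outlier changes the mean a lot but the median only a little.
without_outlier = incomes[:-1]
print(mean(without_outlier))    # 35
print(median(without_outlier))  # 35
```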
What is the mode?
The value that occurs most frequently in a dataset
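A short Python sketch using `statistics.mode` and `statistics.multimode` (the latter needs Python 3.8 or later) on hypothetical data. Unlike the mean and median, the mode also works for categorical data, and a dataset can have more than one mode.

```python
from statistics import mode, multimode

# Hypothetical categorical data: blood groups of seven patients.
blood_groups = ["O", "A", "O", "B", "A", "O", "AB"]
print(mode(blood_groups))  # O (appears 3 times)

# Hypothetical numerical data with two equally common values.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```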
What are the main measures of variability?
- Variance
- Standard deviation
- Range
- Interquartile range (IQR)
What does variance measure?
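How spread out the observations are around the mean: the average of the squared deviations of the observations from the mean (dividing by n - 1 instead of n when estimating from a sample)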
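A minimal Python sketch of variance on hypothetical data. Raw deviations from the mean always sum to zero, which is why they are squared before averaging. The sketch shows both common versions: the population variance (divide by n, `statistics.pvariance`) and the sample variance (divide by n - 1, `statistics.variance`); which one a given course or tool uses is a convention worth checking.

```python
import math
from statistics import pvariance, variance

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
m = sum(data) / n  # mean = 5.0

deviations = [x - m for x in data]
print(sum(deviations))  # 0.0: positive and negative deviations cancel out

squared_deviations = [d ** 2 for d in deviations]
pop_var = sum(squared_deviations) / n           # divide by n
sample_var = sum(squared_deviations) / (n - 1)  # divide by n - 1

print(pop_var)               # 4.0
print(round(sample_var, 3))  # 4.571
print(math.isclose(pop_var, pvariance(data)), math.isclose(sample_var, variance(data)))  # True True
```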
What is the standard deviation?
The square root of the variance; it summarizes the typical distance of observations from the mean, in the same units as the data
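Using a small hypothetical dataset, this Python sketch shows that the standard deviation is the square root of the variance, in either its population form (`statistics.pstdev`) or its sample form (`statistics.stdev`).

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation = square root of the variance (same units as the data).
pop_sd = math.sqrt(pvariance(data))   # population version (divide by n)
sample_sd = math.sqrt(variance(data))  # sample version (divide by n - 1)

print(pop_sd, pstdev(data))                        # 2.0 2.0
print(round(sample_sd, 3), round(stdev(data), 3))  # 2.138 2.138
```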
How is the range defined?
The difference between the largest and smallest observation
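In Python the range is a one-liner; the data below are hypothetical. Because it depends only on the two most extreme values, a single outlier can change it substantially.

```python
# Hypothetical observations, for illustration only.
data = [12, 3, 25, 7, 18]

# Range = largest observation minus smallest observation.
data_range = max(data) - min(data)
print(data_range)  # 22 (25 - 3)
```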
What is a percentile?
A value at or below which a given percentage of the observations fall (e.g. 25% of observations are at or below the 25th percentile)
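A minimal Python sketch of the idea on this card: the percentage of observations that are at or below a given value. The function name `percent_at_or_below` and the scores are hypothetical, and statistical software usually estimates percentiles with interpolation rules, so its results can differ slightly from this simple count.

```python
def percent_at_or_below(values, x):
    """Percentage of observations that are less than or equal to x."""
    if not values:
        raise ValueError("empty data")
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

# Hypothetical exam scores, for illustration only.
scores = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]

# 6 of the 10 scores are <= 75, so 75 sits at the 60th percentile here.
print(percent_at_or_below(scores, 75))  # 60.0
```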
What is the Interquartile Range (IQR)?
The difference between the 75th percentile (third quartile, Q3) and the 25th percentile (first quartile, Q1); it covers the middle 50% of the observations
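A Python sketch of the IQR using `statistics.quantiles` (Python 3.8+) with `n=4`, which returns the three quartile cut points. The data are hypothetical, and `method="inclusive"` is an assumed choice: the function's default is `"exclusive"`, and other software may use different interpolation rules, which can shift the quartiles slightly. The middle cut point (Q2) equals the median.

```python
from statistics import median, quantiles

# Hypothetical observations (listed in sorted order for readability).
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# Q1 (25th percentile), Q2 (50th percentile), Q3 (75th percentile).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)          # 4.0 7.0 10.0
print(iqr)                 # 6.0
print(q2 == median(data))  # True
```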
What is the purpose of visualizing data with plots?
To provide summary pictures that make it easier to spot patterns, trends, and anomalies in the data
What is a histogram used for?
To plot the distribution of a numeric variable
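A histogram groups observations into intervals (bins) and shows how many fall in each. The Python sketch below builds a text-only histogram from hypothetical reaction times; the 25 ms bin width and the 200 ms starting value are arbitrary choices, and in practice a plotting library would draw the bars.

```python
from collections import Counter

# Hypothetical reaction times in milliseconds.
times = [212, 238, 245, 251, 256, 263, 267, 274, 281, 299, 305, 342]
start = 200      # lower edge of the first bin (arbitrary choice)
bin_width = 25   # width of each bin in ms (arbitrary choice)

# Count how many observations fall into each bin (bin index -> count).
counts = Counter((t - start) // bin_width for t in times)

# One row per bin, with one '#' per observation.
for b in range(max(counts) + 1):
    lo = start + b * bin_width
    print(f"{lo}-{lo + bin_width - 1} ms: {'#' * counts[b]}")
```

The tallest row (250-274 ms) marks where the observations cluster, which is the kind of pattern a histogram is meant to reveal.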
What characterizes a normal distribution?
It is bell-shaped and symmetric about the mean, so the mean, median, and mode are equal
What are boxplots used for?
To summarize the distribution of continuous data (median, quartiles, and outliers) and compare it across groups
What do scatter diagrams illustrate?
The relationship between two continuous variables
What is the significance of the second quartile in a boxplot?
It represents the median of the data
What is the next topic to be covered after descriptive statistics?
Formulating a hypothesis