Modules 1-2 Flashcards
(39 cards)
What should we identify before gathering and analyzing data?
The question we wish to answer
What type of graph is useful for examining a data set to reveal patterns and trends?
Histogram
In a histogram, what does the x-axis represent?
Bins corresponding to ranges of data
In a histogram, what does the y-axis indicate?
The frequency of observations falling into each bin
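As a rough illustration of the two histogram cards above, the sketch below groups values into bins and prints one bar per bin. The data values and the bin width of 5 are made up for the example, not taken from the course.

```python
from collections import Counter

# Hypothetical sample data and bin width, chosen only for illustration.
data = [2, 3, 5, 7, 8, 11, 12, 12, 14, 19]
bin_width = 5

# Map each value to the lower edge of its bin, e.g. 7 -> 5 and 12 -> 10.
counts = Counter((x // bin_width) * bin_width for x in data)

for lower in sorted(counts):
    upper = lower + bin_width
    # x-axis: the bin's range; y-axis: the frequency, drawn as a bar of '#'.
    print(f"[{lower:2d}, {upper:2d}) {'#' * counts[lower]}")
```

Each printed row corresponds to one bin on the x-axis, and the bar length is the frequency that the y-axis would show.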
What is an outlier?
A value that falls far from the rest of the data
What should we do before deciding how to handle an outlier?
Carefully investigate it
What can graphing two variables on a scatter plot reveal?
Relationships between two variables (two data sets)
What is a key point regarding correlation and causation?
Correlation does not imply causation
What should we be alert to when examining relationships between two data sets?
The possibility of hidden variables
What are descriptive statistics also known as?
Summary statistics
What three values describe the center of a data set?
- Mean
- Median
- Mode
How is the mean calculated?
Sum of all data points divided by the number of data points
What is the median?
The middle value of the data set when the values are sorted (with an even number of values, the average of the two middle values)
What does the mode represent in a data set?
The value that occurs most frequently
Can a data set have multiple modes?
Yes
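A minimal sketch of the three measures of center, using Python's standard `statistics` module on a made-up data set. The values are hypothetical and were picked so that the data has two modes.

```python
import statistics

# Hypothetical data set with two modes (2 and 4) and one large value (10).
data = [1, 2, 2, 3, 4, 4, 10]

# Mean: sum of all data points divided by the number of data points.
mean = sum(data) / len(data)

# Median: the middle value once the data is sorted.
median = statistics.median(data)

# Mode(s): every value that occurs most frequently.
modes = statistics.multimode(data)

print(mean, median, modes)
```

Here the large value 10 pulls the mean (about 3.71) above the median (3), and `multimode` returns both 2 and 4.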
What measures the spread of the data?
- Range
- Variance
- Standard deviation
How is the standard deviation calculated?
The square root of the variance
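The three measures of spread can be sketched as follows. The data is made up, and the variance here divides by n (the population form); that choice is an assumption, since the cards do not say which form the course uses. The sample form divides by n - 1 instead.

```python
import math

# Hypothetical data set, chosen so the results come out as round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Variance (population form, an assumption): average squared distance from the mean.
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(data_range, variance, std_dev)
```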
What is a conditional mean?
A conditional mean is the mean of a subset of the data that includes all values satisfying a certain condition.
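For instance, a conditional mean can be sketched as a filter followed by an ordinary mean. The region names and sales figures below are hypothetical.

```python
# Hypothetical (region, sales) records, made up for illustration.
records = [("east", 10), ("west", 20), ("east", 30), ("west", 60), ("east", 50)]

# Overall mean uses every record.
all_sales = [sales for _, sales in records]
overall_mean = sum(all_sales) / len(all_sales)

# Conditional mean: keep only the values satisfying the condition, then average.
east_sales = [sales for region, sales in records if region == "east"]
east_mean = sum(east_sales) / len(east_sales)

print(overall_mean, east_mean)
```

The condition here is `region == "east"`; any other test on the records would work the same way.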
What is a percentile?
A percentile is a value at or below which a certain percentage of observations fall. For example, 60% of the observations are less than or equal to the 60th percentile.
What is the median in terms of percentiles?
The median is by definition the 50th percentile of a data set.
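One way to compute a percentile is the nearest-rank rule, sketched below on made-up data. This rule is an assumption for illustration: several percentile conventions exist, and software packages often interpolate between values instead. The helper name `percentile_nearest_rank` is hypothetical.

```python
import math
import statistics

def percentile_nearest_rank(values, p):
    """Return the smallest value with at least p percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position in sorted data
    return ordered[rank - 1]

# Hypothetical data set with an odd number of values.
data = [15, 20, 35, 40, 50]

p60 = percentile_nearest_rank(data, 60)
share_at_or_below = sum(x <= p60 for x in data) / len(data)

p50 = percentile_nearest_rank(data, 50)

print(p60, share_at_or_below, p50, statistics.median(data))
```

With these five values, the 60th percentile has exactly 60% of the observations at or below it, and the 50th percentile matches the median.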
What is the coefficient of variation?
The coefficient of variation measures the size of the standard deviation relative to the size of the mean. It is calculated as the standard deviation divided by the mean.
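As a small sketch, the coefficient of variation can be compared across two made-up data sets that share the same standard deviation but have very different means. The population standard deviation (`statistics.pstdev`) is used here as an assumption, and the helper name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical data: 'shifted' is 'small' with 100 added to every value,
# so both have the same standard deviation (2) but different means.
small = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 100 for x in small]

cv_small = coefficient_of_variation(small)
cv_shifted = coefficient_of_variation(shifted)

print(cv_small, cv_shifted)
```

The same spread of 2 is large relative to a mean of 5 but small relative to a mean of 105, which is what the ratio captures.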
What does the correlation coefficient measure?
The correlation coefficient quantifies the strength and direction of a linear relationship between two variables.
What is the range of the correlation coefficient?
The value of the correlation coefficient ranges between -1 and +1.
What does a correlation coefficient near zero indicate?
A correlation coefficient near zero indicates a weak or nonexistent linear relationship.
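The three correlation cards above can be illustrated with a hand-rolled Pearson correlation coefficient. This is a minimal sketch on made-up data; the function name `pearson_r` is hypothetical, and it assumes both inputs have the same length and are not constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # perfect increasing line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # perfect decreasing line
r_curve = pearson_r(xs, [4, 1, 0, 1, 4])  # y = (x - 3)^2, a U-shaped curve

print(r_up, r_down, r_curve)
```

The U-shaped case gives a coefficient of zero even though y is fully determined by x, which shows that a value near zero rules out only a linear relationship.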