Exploring Data Flashcards
What is a categorical variable?
variables that take on values that are names or labels, such as color, or breed of dog.
What is a quantitative variable?
variables that are numerical and represent a measurable quantity, like salary or height.
How do we represent categorical variables?
With bar charts or pie charts
How do we represent quantitative variables?
With histograms, stem and leaf plots, or boxplots
When do we use the mean to describe a distribution?
When the distribution is unimodal and symmetric
When do we use the median to describe a distribution?
When the distribution is not unimodal and symmetric, for example when it is skewed or has outliers.
What is standard deviation?
Roughly the typical (average) distance of the data values from the mean
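"Average distance" is an intuition rather than the exact formula: the usual sample standard deviation squares each deviation, divides by n - 1, and takes the square root. A minimal Python sketch, assuming a small made-up data set, that computes it by hand and compares it with the literal average distance (the mean absolute deviation):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

mean = statistics.mean(data)

# Sample standard deviation: sum the squared deviations from the mean,
# divide by n - 1, then take the square root.
squared_devs = [(x - mean) ** 2 for x in data]
s = math.sqrt(sum(squared_devs) / (len(data) - 1))

# Mean absolute deviation: the literal "average distance from the mean".
mad = sum(abs(x - mean) for x in data) / len(data)

print(mean, round(s, 3), mad)  # 5 2.138 1.5
```

The two numbers are close in spirit but not equal, which is why "average distance" is best treated as a memory aid.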
What is a z score?
The number of standard deviations a value lies above or below the mean: z = (value - mean) / s.d.
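As a quick illustration, a z score can be computed with one line of arithmetic. The function name and the height numbers below are hypothetical, chosen only for the example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: heights with mean 170 cm and s.d. 10 cm.
print(z_score(185, 170, 10))  # 1.5
print(z_score(160, 170, 10))  # -1.0
```

A positive z score means the value is above the mean, a negative one means it is below.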
What is percentile?
The percent of the data at or below a given value (the percent to the left)
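A small sketch of "percent to the left", assuming the "at or below" convention (some courses count only values strictly below). The function name and the scores are made up for illustration:

```python
def percentile_of(value, data):
    """Percent of data values at or below the given value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up test scores
print(percentile_of(80, scores))  # 60.0
```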
What is the five number summary?
min, Q1, median, Q3, max
What is IQR?
interquartile range: Q3 - Q1
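The five number summary and IQR can be sketched in Python as follows. This assumes the "median of each half" rule for quartiles, which is one common classroom convention; calculators and software sometimes use slightly different rules and give slightly different Q1 and Q3. The function name and data are hypothetical:

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [1, 3, 5, 7, 9, 11, 13]  # made-up values
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1
print(low, q1, med, q3, high, iqr)  # 1 3 7 11 13 8
```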
What is the empirical rule?
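For a unimodal, symmetric, bell-shaped (approximately normal) distribution, about 68% of the data is within 1 s.d. of the mean, about 95% within 2, and about 99.7% within 3. Mean-68-95-99.7, yes!!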
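The 68-95-99.7 figures can be checked against a normal model. A minimal sketch using Python's standard library, assuming the data follow a normal distribution exactly:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} s.d.: {proportion:.1%}")
```

This prints roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.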
What percent of data lies above the median?
50%
How do you determine “outliers”?
Values more than 1.5 IQRs above Q3 or below Q1
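The 1.5 IQR rule amounts to building two "fences" and flagging anything outside them. A sketch in Python, assuming Q1 and Q3 have already been computed; the function names and numbers are hypothetical:

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

# Suppose Q1 = 10 and Q3 = 20 were already found, so IQR = 10.
candidates = [-6, -5, 12, 35, 40]  # made-up values to check
print(outlier_fences(10, 20))             # (-5.0, 35.0)
print(find_outliers(candidates, 10, 20))  # [-6, 40]
```

Values sitting exactly on a fence (-5 and 35 here) are not flagged, since the rule asks for more than 1.5 IQRs.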
If a distribution is skewed right, which is higher, the median or mean?
the mean. The mean chases the tail!
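To see the mean chase the tail, here is a tiny sketch with made-up incomes (in thousands), where one large value creates a right skew:

```python
import statistics

incomes = [30, 32, 35, 38, 40, 45, 200]  # made-up; 200 forms the right tail

mean = statistics.mean(incomes)
median = statistics.median(incomes)

print(mean, median)  # 60 38
```

The single large value pulls the mean up to 60 while the median stays at 38.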
How do you know whether to use the mean and s.d. to describe data, or median and IQR?
If the data is unimodal and symmetric, use the mean and s.d., otherwise use median and IQR
What should you remember when making graphs?
Label your axes, give a key if needed, and give the graph a name!
How much data is between Q1 and Q3?
50%
What is a contingency table?
A table that shows the distribution of counts across 2 categorical variables, like gender and music preference. AKA a 2-way table
How can you tell if variables in a contingency table are independent?
If the conditional distributions of one variable are the same across every category of the other variable. Then it doesn’t DEPEND, so INDEPENDENT
marginal distribution
The overall distribution of a single variable in a contingency table (out in the margins)
conditional distribution
A distribution within the table, along only one row or one column. NOT IN THE MARGINS
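Marginal and conditional distributions can be sketched from a small 2-way table. The counts below are made up for illustration, with gender as the rows and music preference as the columns:

```python
# Made-up counts: rows are gender, columns are music preference.
table = {
    "female": {"pop": 30, "rock": 20},
    "male": {"pop": 15, "rock": 35},
}
columns = ["pop", "rock"]

# Grand total of all counts in the table.
total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of music preference: column totals / grand total.
marginal_music = {
    c: sum(row[c] for row in table.values()) / total for c in columns
}

# Conditional distribution of music preference within each row:
# row counts / row total.
conditional = {
    group: {c: row[c] / sum(row.values()) for c in columns}
    for group, row in table.items()
}

print(marginal_music)  # {'pop': 0.45, 'rock': 0.55}
print(conditional)
```

Here the conditional distributions differ by row (0.6 pop in one row, 0.3 in the other), so in this made-up table music preference does appear to depend on gender.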
How do you describe distributions (histograms)?
Shape, Center, Spread, and STRANGE (Outliers and Gaps). Some say GSOCS: Gaps, Shape, Outliers, Center, Spread. Where’s yo GSOCS?
If asked to compare distributions, what should you write about?
Compare Shapes, Centers, Spreads, and Stranges: the GSOCS