1-3 exam terms and concepts Flashcards
(33 cards)
What is the definition of data?
Numbers in context
What is data analysis?
It tries to summarize and explain the data
What are variables in statistics?
They are the characteristics being measured, which differ from one person or thing to the next
What is a data set/sample?
It’s a collection of data
What is a sample?
It is the part of the whole population that is actually observed or measured
What are the 2 main types of variables?
Numerical and categorical
What is numerical data?
Data that involves numbers such as heights or ages
What is categorical data?
Data that involves qualities such as color or location
What is the difference between stacked and unstacked data?
Stacked means each row is one observation, with one column for the values and another column for the category, often coded as a number like 0 or 1 (ex male 1, female 0). Unstacked means each category gets its own column of values
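A minimal sketch of turning unstacked data into stacked data. The heights, the `codes` mapping, and the `stack` helper are all made up for illustration.

```python
# Unstacked: each group has its own column of values (hypothetical heights in inches).
unstacked = {
    "male": [70, 68, 72],
    "female": [64, 66],
}

# Hypothetical numeric coding of the categories.
codes = {"male": 1, "female": 0}


def stack(columns, codes):
    """Turn one-column-per-group data into (value, group_code) rows."""
    rows = []
    for group, values in columns.items():
        for value in values:
            rows.append((value, codes[group]))
    return rows


# Stacked: one row per observation, value first, category code second.
stacked = stack(unstacked, codes)
for row in stacked:
    print(row)
```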
What type of experiment is least likely to prove a correlation?
A) Anecdotes
B) Observational
C) Controlled
A
What is a distribution?
A way of displaying data that shows each value along with its frequency (how often it occurs)
What’s the difference between a histogram and a relative frequency histogram?
A histogram shows the counts as they are, while a relative frequency histogram divides each count by the total number of entries, ex 4 divided by 11 is about 0.36, so that is what would be displayed
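A small sketch of the relative frequency arithmetic. The data set is invented so that one value occurs 4 times out of 11 entries, matching the example in the card.

```python
from collections import Counter

# Made-up data set with 11 entries; the value 3 appears 4 times.
data = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5]

counts = Counter(data)   # frequency of each value
total = len(data)        # total number of entries

# Relative frequency = count divided by total number of entries.
relative = {value: count / total for value, count in counts.items()}

for value in sorted(counts):
    print(value, counts[value], round(relative[value], 2))
```

The relative frequencies always add up to 1, which is a quick way to check the work.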
What is the formal definition of an outlier?
Haha psych, there isn’t one
How can you tell if a data set is left skewed?
If the “tail” goes to the left
How can you tell if a data set is right skewed?
If the “tail” goes to the right
What is the difference between a bar chart and a histogram?
A bar chart has spaces between the bars and a histogram does not
What makes a bar chart “pareto”?
If it goes from largest to smallest aka goes downhill
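A rough sketch of putting categories in Pareto order, using invented color data. The text "bars" are only a stand-in for a real chart.

```python
from collections import Counter

# Made-up categorical data (hypothetical favorite colors).
colors = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)

# most_common() returns (category, count) pairs from largest to smallest count.
pareto_order = counts.most_common()

for category, count in pareto_order:
    print(f"{category:<6}{'#' * count}")
```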
What does a z-score measure?
How many standard deviations a data point is away from the mean
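A minimal sketch of the z-score calculation. It assumes the sample standard deviation (`statistics.stdev`); the data values and the `z_score` helper are hypothetical.

```python
import statistics

# Made-up data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def z_score(x, values):
    """Distance of x from the mean, in units of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD, not population SD
    return (x - mean) / sd


print(z_score(9, data))  # positive: above the mean
print(z_score(2, data))  # negative: below the mean
print(z_score(5, data))  # zero: exactly at the mean
```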
What are the best ways to show data if it is categorical?
With a bar chart, pareto chart, or pie chart
What are the best ways to show data if it is numerical?
With a dotplot, histogram, or stemplot
What does standard deviation measure?
It measures the spread of data as the typical distance of the data points from the mean
What does the IQR measure?
It measures the spread of the middle 50% of the data (third quartile minus first quartile)
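Both measures of spread can be computed with Python's `statistics` module, sketched here on made-up data. Textbooks and software locate quartiles in slightly different ways, so the IQR from `statistics.quantiles` (default method) may not match a hand calculation exactly on every data set.

```python
import statistics

# Made-up data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Sample standard deviation.
sd = statistics.stdev(data)

# quantiles(..., n=4) returns the three quartile cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("standard deviation:", round(sd, 2))
print("IQR:", iqr)
```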
Is standard deviation or IQR a better measure of spread for a skewed data set?
IQR
Is standard deviation or IQR a better measure of spread for a symmetrical data set?
Standard deviation
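One way to see why the IQR is preferred for skewed data is to stretch the right tail of a symmetric data set and compare. This is only an illustration with invented numbers, and `iqr` is a hypothetical helper built on `statistics.quantiles`.

```python
import statistics


def iqr(values):
    """Third quartile minus first quartile, using statistics.quantiles defaults."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Made-up data: symmetric, then the same data with the largest value pulled far right.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

print("SD: ", round(statistics.stdev(symmetric), 2), "->", round(statistics.stdev(skewed), 2))
print("IQR:", iqr(symmetric), "->", iqr(skewed))
```

In this example the long right tail inflates the standard deviation while the IQR does not move, because the IQR only looks at the middle half of the data.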