Module 5- Descriptive Statistics Flashcards
1
Q
Descriptive Statistics
A
- summarizing our data set to better understand and communicate important information
- helps researchers identify and communicate important characteristics about the empirical data
2
Q
Raw Scores
A
- data resulting from our measurement procedures
- not informative
- ex. listing all the scores from the quiz
instead, using descriptive statistics we could communicate performance on the quiz with the class average
3
Q
Frequency Distribution
A
- vital for describing data
- quick way to summarize how many scores were observed at each data point
- type of frequency distribution used depends on the level of measurement
- x axis; observations of the variable in question
- y axis; frequency of each observation
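A minimal sketch of a frequency distribution, assuming a small set of made-up quiz scores (the data and variable names are hypothetical, not from the module). It counts how often each score was observed and prints a rough text version of the graph: score on the x axis, frequency as a row of marks.

```python
from collections import Counter

# Hypothetical quiz scores (out of 10), invented for illustration.
scores = [7, 8, 8, 9, 7, 10, 8, 6, 7, 8]

# Count how many times each score was observed.
frequencies = Counter(scores)

# One row per observed score: the score, then one "#" per observation.
for score in sorted(frequencies):
    print(score, "#" * frequencies[score])
```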
4
Q
Bar Graphs
A
- used for data representing discrete categories (distinct/ non overlapping categories)
- summarizes nominal or categorical data
- can also be used for interval and ratio data but not often
5
Q
Frequency Polygon
A
- graph continuous data
- interval and ratio data
- not used for nominal data bc there is no assumption of equal intervals, therefore cannot connect data points using a continuous line
- line to connect points represents equal intervals bw each data point
6
Q
Grouped Bar Graph
A
- taking ratio data and grouping it into categories
- grouping continuous data into categories
ex. scores of quiz, group people scores of 70-79% together into a bar
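A rough sketch of grouping continuous scores into categories, assuming 10-point-wide groups like the 70-79% example. The percentages and the helper name `group_label` are hypothetical.

```python
from collections import Counter

# Hypothetical quiz percentages, invented for illustration.
percentages = [72, 85, 91, 78, 64, 88, 79, 95, 70, 83]

def group_label(score, width=10):
    """Return the label of the group a score falls in, e.g. 75 -> '70-79'."""
    low = (score // width) * width
    return f"{low}-{low + width - 1}"

# Each group becomes one bar; the count is the height of that bar.
grouped = Counter(group_label(p) for p in percentages)

for label in sorted(grouped):
    print(label, grouped[label])
```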
7
Q
Frequency distribution tells us
A
- number of observations at each data point
- normal vs skewed data
8
Q
Normal Distribution
A
- symmetrical bell curve
- IQ, Height, Weight
- majority of scores are in the middle with fewer observations at the ends/ extremes
- most observations around the mean
9
Q
Skewed Distribution
A
- scores are bunched at one end and a few extreme scores pull the tail out toward the other end
10
Q
Positive Skew
A
- mean greater than the median (mean is pulled by higher scores)
- more values are clustered to the left (lower end of the scale)
- right end of the distribution (high end of the scale) gets pulled to the right and has a longer tail
- this happens when we have a few extremely high observations
11
Q
Negative Skew
A
- mean less than the median ( mean is pulled by low scores)
- more values clustered to the right (higher end of the scale)
- left end of the distribution is pulled (lower end of the scale) and has a longer tail
- this happens when we have a few extremely low observations
12
Q
Measures of central tendency
A
- Mean
- Median
- Mode
convey info about the typical observation of our data set
13
Q
Mean
A
- most used MCT
- mathematical average of our data set
- mean= sum of scores/ number of scores
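The formula on this card can be sketched in a couple of lines. The scores are hypothetical.

```python
# Hypothetical quiz scores, invented for illustration.
scores = [70, 80, 90, 85, 75]

# mean = sum of scores / number of scores
mean = sum(scores) / len(scores)
print(mean)
```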
14
Q
Mode
A
- Most frequent score/ observation in the data set
- peak of the frequency distribution
15
Q
Bimodal Distribution
A
- when there are 2 peaks in the distribution or 2 scores tied for the most frequent
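A small sketch of finding the mode, including the tied (bimodal) case. It uses `statistics.multimode` from the Python standard library (Python 3.8+), which returns every score tied for most frequent. The data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical data: one clear mode.
one_peak = [1, 2, 2, 3]
# Hypothetical data: 3 and 5 are tied for the most frequent score.
two_peaks = [2, 3, 3, 4, 5, 5, 6]

unimodal = multimode(one_peak)
bimodal = multimode(two_peaks)

print(unimodal)
print(bimodal)
```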
16
Q
Median
A
- middle point of the distribution
- to find; list all the scores in order of magnitude and the score that is in the middle= median
- cuts distribution in half; 50% of observations fall above and 50% fall below
- not used often
- use the median when data is skewed: it is more informative bc the mean is heavily impacted by extreme scores
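The "list in order and take the middle" procedure can be sketched as below. The handling of an even number of scores (average of the two middle scores) is the usual convention and an addition to this card, not something stated on it. The function name is hypothetical.

```python
def find_median(scores):
    """Sort the scores and return the middle one."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: a single middle score exists.
        return ordered[mid]
    # Even number of scores: assumed convention, average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([5, 1, 3]))
print(find_median([4, 1, 3, 2]))
```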
17
Q
Mean and Median can only be calculated for…
A
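Interval and Ratio Data
- note: the median only needs scores that can be ranked, so it can also be reported for ordinal data; the mean cannot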
18
Q
MCT and normal distribution
A
- Mean, median and mode are all equal
19
Q
MCT and Skewed distribution
A
- Mean is very much impacted by extreme scores/ outliers
- Median is more informative and representative of the distribution
- positive skew; Mean > Median bc higher scores pull the mean
- negative skew; Mean < Median bc lower scores pull the mean
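A quick check of how one extreme score moves the mean but not the median. Both data sets are hypothetical and chosen only to show the direction of the pull.

```python
import statistics

# Hypothetical positively skewed data: one extremely high score.
positive_skew = [30, 32, 35, 36, 38, 40, 200]
# Hypothetical negatively skewed data: one extremely low score.
negative_skew = [1, 60, 62, 65, 66, 68, 70]

pos_mean = statistics.mean(positive_skew)
pos_median = statistics.median(positive_skew)
neg_mean = statistics.mean(negative_skew)
neg_median = statistics.median(negative_skew)

# Positive skew: the high score pulls the mean above the median.
print(pos_mean > pos_median)
# Negative skew: the low score pulls the mean below the median.
print(neg_mean < neg_median)
```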
20
Q
Variability
A
- provides us with an index of how spread out the scores are around the MCT
21
Q
measures of variability
A
- range
- variance
- standard deviation
22
Q
Range
A
- most basic way to represent dispersion of scores
- difference bw the largest and smallest score
- not always informative
- sensitive to outliers; one extreme score can drastically have an impact on the range of the data set
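A short sketch of the range and its sensitivity to a single extreme score. The scores and the function name are hypothetical.

```python
def score_range(scores):
    """Difference between the largest and smallest score."""
    return max(scores) - min(scores)

# Hypothetical quiz scores.
scores = [70, 72, 75, 78, 80]
# Same scores plus one extreme low score.
with_outlier = scores + [5]

range_without = score_range(scores)
range_with = score_range(with_outlier)

print(range_without, range_with)
```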
23
Q
Variance
A
- how much each score in the distribution varies from the mean of the distribution
- average squared deviation from the mean
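A minimal sketch of "average squared deviation from the mean", assuming the sum of squared deviations is divided by the number of scores N (the population form). Courses often divide by N - 1 for a sample instead; that choice is not stated on this card. The data are hypothetical.

```python
# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# How far each score is from the mean, squared.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Average of the squared deviations (dividing by N is an assumption).
variance = sum(squared_deviations) / len(scores)
print(variance)
```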
24
Q
Problem with Variance
A
- is based on squared deviations, therefore it is in a different unit of measurement than the observations
- makes it hard to interpret
- ex. if looking at quiz grades the variance would be in % squared
25
Standard Deviation
- Measures the dispersion of the data set relative to the mean.
- with a normal distribution, lets us determine the percentage of scores that fall within a given distance of the mean
- solves the problem of variance
- is the square root of the variance
- therefore converts the scores back into the same scale as the observations
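A short sketch of the square-root step, using the standard library's `statistics.pvariance` and `statistics.pstdev` (both divide by N, which is an assumption about the form used in this module). The data are hypothetical.

```python
import math
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(scores)  # in squared units
sd_from_variance = math.sqrt(variance)   # back in the original units
sd_direct = statistics.pstdev(scores)    # same value, computed directly

print(variance, sd_from_variance, sd_direct)
```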
26
Properties of Normal Distribution
- about 68% of all observations/ scores will fall w/in (+/-) 1 SD of the mean
- about 95% of all observations will fall w/in (+/-) 2 SD of the mean
- about 99.7% (often rounded to 99%) of observations will fall w/in (+/-) 3 SD of the mean
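These percentages can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is only a sketch; the helper name `share_within` is hypothetical. The results come out at roughly 68.3%, 95.4% and 99.7%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```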
27
Smaller the Standard Deviation...
- the smaller the interval and scores vary less around the mean
28
Larger the Standard Deviation...
- the larger the interval and scores vary more around the mean
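A small comparison of two hypothetical data sets with the same mean but different spread, to show what a smaller or larger SD looks like in numbers. `statistics.pstdev` divides by N, which is an assumption about the form used here.

```python
import statistics

# Hypothetical data: same mean, scores close to the mean.
tight = [48, 49, 50, 51, 52]
# Hypothetical data: same mean, scores far from the mean.
spread = [30, 40, 50, 60, 70]

tight_mean = statistics.mean(tight)
spread_mean = statistics.mean(spread)
tight_sd = statistics.pstdev(tight)
spread_sd = statistics.pstdev(spread)

print(tight_mean, round(tight_sd, 2))
print(spread_mean, round(spread_sd, 2))
```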
29
If you know the mean and standard deviation, can calculate
- the interval in which 68, 95 or 99% of the scores will fall
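A sketch of that calculation, assuming a normal distribution. The mean of 100 and SD of 15 are illustrative values only, and the function name is hypothetical.

```python
def interval(mean, sd, k):
    """Lower and upper bound of the interval within k SDs of the mean."""
    return (mean - k * sd, mean + k * sd)

# Illustrative values, not taken from the module.
mean, sd = 100, 15

one_sd = interval(mean, sd, 1)    # about 68% of scores
two_sd = interval(mean, sd, 2)    # about 95% of scores
three_sd = interval(mean, sd, 3)  # about 99% of scores

print(one_sd, two_sd, three_sd)
```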
30
Data Transformation
- transform data from its original state to compare it to data that was measured on a different scale
- cannot compare different measures therefore have to transform the data into the same units
31
Z scores
- most common transformation of data
- expresses each of the scores or observations in the data set in relation to the mean and standard deviation of the entire distribution
- measures exactly how many standard deviations above or below the mean a data point is
32
when z scores are used
- data did not form a normal distribution and have to do inferential stats
- want to compare 2 data sets of diff measures
33
Z score mean
Mean= 0
34
Z score Standard Deviation
SD=1
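A quick check that converting a whole data set to Z scores gives a mean of 0 and an SD of 1. The data are hypothetical, and `statistics.pstdev` (dividing by N) is an assumption about which SD formula is used.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)

# Convert every score to a Z score.
z_scores = [(x - mean) / sd for x in scores]

z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)
print(z_mean, z_sd)
```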
35
When can Z score not be used?
- Nominal or Ordinal Data
- bc they do not have a meaningful mean
36
equation for z score
z = (score - mean) / standard deviation
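A minimal sketch of the equation, used to compare one student's results on two tests with different scales. All numbers and names are hypothetical.

```python
def z_score(score, mean, sd):
    """How many SDs a score is above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical quiz: class mean 70, SD 10; the student scored 80.
quiz_z = z_score(80, mean=70, sd=10)
# Hypothetical exam: class mean 50, SD 5; the student scored 60.
exam_z = z_score(60, mean=50, sd=5)

# The raw scores are not comparable, but the Z scores are.
print(quiz_z, exam_z)
```

Here the raw quiz score is higher, but the exam score sits further above its class mean.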
37
Z scores tell us
- Valence (sign)
- Size
38
Z score Valence
+Z; observed score is larger than the mean
-Z; observed score is smaller than the mean
ex. if Z=-1.8, we know the student fell below the class average
39
Z score size
- tells us with more precision where on the distribution the score fell
about 68% of all Z scores fall between -1 and +1
about 95% of all Z scores fall between -2 and +2
about 99.7% (often rounded to 99%) of all Z scores fall between -3 and +3
- ex. Z= -1.8, close to -2 so we know the score fell more towards the left of the distribution
40
Pearson Product Moment Correlation Coefficient (r)
- type of descriptive statistic
- describes the relationship bw 2 variables based on how much they vary together
- used for interval or ratio data
- can use the analogy of 2 overlapping circles. the amount 2 circles overlap is how much variance the 2 variables share
- small overlap; correlation coefficient is small
- big overlap; correlation coefficient is big
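A rough sketch of how r can be computed by hand: how much the two variables vary together, divided by how much each varies on its own. The function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each score from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # How much the two variables vary together.
    shared = sum(a * b for a, b in zip(dx, dy))
    # How much each variable varies on its own.
    total = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return shared / total

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 2))
```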
41
Coefficient of Determination
- r squared (r^2)
- proportion of variance accounted for in one variable by knowing the other variable
- allows us to make predictions: if 2 variables are highly correlated, knowing one lets us predict the other
- the higher the r^2, the better our predictions will be
- ex. r^2 = 0.45; proportion of variance accounted for is 45%
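A tiny sketch of going from r to the proportion of variance accounted for. The value of r is hypothetical, picked so that r squared rounds to the 0.45 in the example.

```python
# Hypothetical correlation coefficient.
r = 0.67

# Coefficient of determination: square the correlation.
r_squared = r ** 2

# Express it as a percentage of variance accounted for.
print(f"{r_squared:.0%}")
```

Because r is squared, a correlation of -0.67 would account for the same proportion of variance.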
42
if SD is small
- tall and skinny graph
- little dispersion of scores around the mean
- what we want
43
If SD is large
- flat and wide graph
- large dispersion of scores around the mean