2.4 - A Statistical Primer Flashcards
Descriptive Statistics
a set of techniques used to organize, summarize, and interpret data
Statistics used to describe and understand the data:
Frequency, central tendency, variability
Data Distribution
1) whether some scores occurred more often than others
2) whether all the scores were clumped in the middle or more evenly spaced across the whole range
Histogram
Bar graph
*vertical axis shows the frequency
Frequency
the number of observations that fall within a certain category or range of scores
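A minimal sketch of how frequencies feed a histogram, assuming a made-up list of quiz scores (the data and variable names are illustrative, not from the course material):

```python
from collections import Counter

# Hypothetical quiz scores (made-up data for illustration only)
scores = [7, 8, 8, 9, 6, 8, 7, 10, 8, 7]

# Frequency: how many observations fall on each score
frequency = Counter(scores)

# Text histogram: each row is a score, bar length is its frequency
for score in sorted(frequency):
    print(f"{score:>2} | {'#' * frequency[score]}")
```

Here the histogram is printed sideways, so the bar length plays the role of the vertical (frequency) axis.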
Normal Distribution (Bell Curve)
a symmetrical distribution with values clustered around a central, mean value
Negatively Skewed Distribution
a distribution in which the curve has an extended tail to the left of the cluster
Positively Skewed Distribution
a distribution in which the curve has an extended tail to the right of the cluster
Skews occur because?
there is an upper or lower limit to the data
(ex. a person cannot take less than 0 mins on a quiz, so the curve of quiz times cannot continue indefinitely to the left, beyond the 0 point)
Central Tendency
a measure of the central point of a distribution
*measured usually by the mean, but there are exceptions
Three different measures of Central Tendency
mean, median, mode
Mean
the arithmetic average of a set of numbers
ex. class averages
Median
the 50th percentile - the point on the horizontal axis at which 50% of all observations are lower, and 50% of all observations are higher
Mode
the category with the highest frequency (category w/ most observations)
*measure that is used least
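The three measures can be compared on one small data set. This is a sketch using Python's standard `statistics` module; the scores are hypothetical and chosen only so the three values differ:

```python
import statistics

# Hypothetical test scores (made-up data for illustration only)
scores = [70, 75, 75, 80, 85, 90, 99]

mean = statistics.mean(scores)      # arithmetic average: sum / count
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring score

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```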
Which measure of central tendency to use? (mean, median, and mode are equal in a normal distribution)
Normally distributed data - Mean
Mode = measure that is used least, provides less info than other two, used when dealing w/ categories of data (ex. when you vote for a candidate, the mode = candidate w/ most votes)
Skewed Data (Positively/Negatively) - median (extreme values have a large effect on the mean but will not affect the median)
*the longer the tail, the more the mean is pulled away from the centre of the curve
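To illustrate why the median is preferred for skewed data, here is a small sketch with hypothetical salaries (in thousands; the numbers are invented). One extreme value creates a long right tail and pulls the mean upward, while the median barely moves:

```python
import statistics

# Hypothetical salaries in thousands (made-up data); 300 is an extreme value
salaries = [40, 42, 45, 47, 50, 52, 300]
without_extreme = salaries[:-1]  # same data with the extreme value dropped

mean_all = statistics.mean(salaries)
median_all = statistics.median(salaries)
mean_trimmed = statistics.mean(without_extreme)
median_trimmed = statistics.median(without_extreme)

# The mean shifts a lot when the extreme value is present; the median shifts little
print("with extreme value:    mean =", round(mean_all, 1), "median =", median_all)
print("without extreme value: mean =", mean_trimmed, "median =", median_trimmed)
```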
Variability
the degree to which scores are dispersed in a distribution
(some are spread out, some are clustered)
Higher Variability = larger # of cases that are closer to the extreme ends of the continuum for that set of data
(ex. lots of excellent AND poor students in one class)
Lower Variability = most scores are similar
(ex. class filled with all “B” students)
*can be caused by measurement errors, imperfect measurement tools, differences between participants in the study, characteristics of participants on that given day (ex. mood, fatigue levels)
Standard Deviation
a measure of variability around the mean (estimate of the average distance from the mean)
*links central tendency and variability
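A sketch of how standard deviation separates two classes that share the same mean, assuming invented grades and using `statistics.pstdev` (the population form; `statistics.stdev` would give the sample form instead):

```python
import statistics

# Hypothetical grades (made-up data): both classes have the same mean of 75
mixed_class = [50, 55, 60, 90, 95, 100]    # excellent AND poor students
similar_class = [73, 74, 75, 75, 76, 77]   # most scores are alike

for name, grades in [("mixed", mixed_class), ("similar", similar_class)]:
    mean = statistics.mean(grades)
    # pstdev: square root of the average squared distance from the mean
    sd = statistics.pstdev(grades)
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

The means match, so only the standard deviation reveals that the first class is far more spread out.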
The ______ always marks the 50th percentile of the distribution.
median
The ______ is a measure of variability around the mean of a distribution.
standard deviation
A histogram is created that presents data on the number of mistakes made on a memory test by participants in a research study. The vertical axis indicates?
the frequency of errors made
In a survey of recent graduates, your university reports that the salaries of the former students are positively skewed. What are the consequences of choosing the mean rather than the median or the mode in this case?
The mean is likely to provide a number that is higher than the largest cluster of scores
Hypothesis Test
a statistical method of evaluating whether differences among groups are meaningful, or could have been arrived at by chance alone
Statistical Significance
the means of the groups are farther apart than you would expect them to be by random chance alone
- proposed by Ronald Fisher
- not used for limited numbers of potential participants
Null Hypothesis & Experimental Hypothesis
Null = any differences between groups (or conditions) are due to chance
Experimental = assumes that any differences are due to a variable controlled by the experimenter
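One way to make the null hypothesis concrete is a permutation test, sketched below. This is a swapped-in illustrative technique, not necessarily the test the course uses; the function name `permutation_p_value`, its parameters, and the scores are all hypothetical. If the null hypothesis holds, group labels are arbitrary, so shuffling them shows how large a difference in means chance alone tends to produce:

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Share of random relabelings whose mean difference is at least
    as large as the observed one (hypothetical helper for illustration)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # under the null, group labels are interchangeable
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical scores (made-up data for illustration only)
control = [70, 72, 68, 75, 71, 69]
treatment = [80, 83, 79, 85, 82, 81]

p = permutation_p_value(control, treatment)
print("proportion of shuffles at least as extreme:", p)
```

A small proportion means the observed gap between group means rarely arises from shuffling alone, which is the sense in which a difference is called statistically significant.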