Ch. 12 Flashcards
(42 cards)
Descriptive statistics
Refers to a set of techniques for summarizing and displaying data.
distribution
The way scores are distributed across levels of a variable.
Every variable has a distribution.
Frequency Tables
A display of each value of a variable and the number of participants with that value.
The first column lists the values of the variable—the possible scores on the Rosenberg scale—and the second column lists the frequency of each score.
There are a few other points worth noting about frequency tables.
First, the levels listed in the first column usually go from the highest at the top to the lowest at the bottom, and they usually do not extend beyond the highest and lowest scores in the data.
Second, when there are many different scores across a wide range of values, it is often better to create a grouped frequency table, in which the first column lists ranges of values and the second column lists the frequency of scores in each range.
In a grouped frequency table, the ranges must all be of equal width, and there are usually between five and 15 of them.
Finally, frequency tables can also be used for categorical variables, in which case the levels are category labels.
The order of the category labels is somewhat arbitrary, but they are often listed from the most frequent at the top to the least frequent at the bottom.
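The conventions above (one column of values, one column of frequencies, listed from highest to lowest) can be sketched in a few lines of Python; the scores here are hypothetical, not from the text:

```python
from collections import Counter

# Hypothetical self-esteem scores (illustrative data only).
scores = [22, 25, 25, 27, 22, 30, 27, 25, 20, 27]

# Count how many participants have each value.
counts = Counter(scores)

# Print the table with values from highest to lowest, per the convention above.
print("Value  Frequency")
for value in sorted(counts, reverse=True):
    print(f"{value:>5}  {counts[value]:>9}")
```

For a grouped frequency table, the same idea applies after binning each score into equal-width ranges before counting.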
Histograms
A graphical display of a frequency distribution.
It presents the same information as a frequency table but in a way that is even quicker and easier to grasp.
The x-axis of the histogram represents the variable and the y-axis represents frequency.
Above each level of the variable on the x-axis is a vertical bar that represents the number of individuals with that score.
When the variable is quantitative, there is usually no gap between the bars.
When the variable is categorical, however, there is usually a small gap between them.
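A rough text-only histogram, built from hypothetical scores, illustrates the idea: each value of the variable gets a bar whose length is its frequency (here the axis runs down the left rather than along the bottom):

```python
from collections import Counter

# Hypothetical quantitative scores (illustrative data only).
scores = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]
counts = Counter(scores)

# One row per value; the run of '#' marks represents the bar height.
for value in sorted(counts):
    print(f"{value:>2} | {'#' * counts[value]}")
```

This particular distribution peaks near the middle and tapers symmetrically in both tails, matching the shape described above.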
Distribution Shapes
When the distribution of a quantitative variable is displayed in a histogram, it has a shape.
There is a peak somewhere near the middle of the distribution and “tails” that taper in either direction from the peak.
Another characteristic of the shape of a distribution is whether it is symmetrical or skewed.
symmetrical: When a histogram’s left and right halves are mirror images of each other.
skewed: When a histogram’s peak is shifted toward the upper end of its range with a relatively long negative tail (Negatively Skewed), or shifted toward the lower end of its range with a relatively long positive tail (Positively Skewed).
outlier
An extreme score that is much higher or lower than the rest of the scores in the distribution.
Measures of Central Tendency and Variability
It is also useful to be able to describe the characteristics of a distribution more precisely. Here we look at how to do this in terms of two important characteristics: its central tendency and its variability.
Central Tendency
Is the middle of a distribution—the point around which the scores in the distribution tend to cluster. (Another term for central tendency is average.)
mean
The average of a distribution of scores (symbolized M), where the sum of the scores is divided by the number of scores.
median
The midpoint of a distribution of scores in the sense that half the scores in the distribution are less than it and half are greater than it.
mode
The most frequently occurring score in a distribution.
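The three measures of central tendency just defined are all available in Python's standard statistics module; a minimal sketch on hypothetical scores:

```python
import statistics

# Hypothetical distribution of scores (illustrative data only).
scores = [2, 3, 3, 4, 4, 4, 5, 6, 9]

M = statistics.mean(scores)      # sum of scores divided by the number of scores
med = statistics.median(scores)  # midpoint: half the scores below it, half above
mo = statistics.mode(scores)     # most frequently occurring score

print(M, med, mo)
```

For these scores the mean, median, and mode happen to be close together, as they are in roughly symmetrical distributions; in a skewed distribution the mean is pulled toward the long tail.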
variability
The extent to which the scores vary around their central tendency in a distribution.
range
A measure of dispersion equal to the difference between the highest and lowest scores in a distribution.
Standard deviation
Is the average distance between the scores and the mean in a distribution.
variance
The average of the squared distances of the scores from the mean (the square of the standard deviation).
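The three measures of variability can be sketched with hypothetical scores; this assumes the population formulas (dividing by N), which match the descriptive use here:

```python
import statistics

# Hypothetical distribution of scores (illustrative data only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

rng = max(scores) - min(scores)     # range: highest score minus lowest score
var = statistics.pvariance(scores)  # variance: average squared distance from the mean
sd = statistics.pstdev(scores)      # standard deviation: square root of the variance

print(rng, var, sd)  # 7 4.0 2.0
```

Note that the standard deviation is in the same units as the scores themselves, which is why it, rather than the variance, is usually reported descriptively.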
percentile rank
For any given score, the percentage of scores in the distribution that are lower than that score.
z score
For an individual score, the difference between that score and the mean of the distribution, divided by the standard deviation of the distribution. It represents the number of standard deviations the score is from the mean.
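Both definitions translate directly into code; a sketch on the same kind of hypothetical scores, using the population standard deviation:

```python
import statistics

# Hypothetical distribution of scores (illustrative data only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
M = statistics.mean(scores)     # 5.0
SD = statistics.pstdev(scores)  # 2.0

def z_score(x):
    # Number of standard deviations x lies from the mean.
    return (x - M) / SD

def percentile_rank(x):
    # Percentage of scores in the distribution that are lower than x.
    return 100 * sum(s < x for s in scores) / len(scores)

print(z_score(9))          # 2.0: two standard deviations above the mean
print(percentile_rank(5))  # 50.0: half the scores are lower than 5
```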
Differences Between Groups or Conditions
Differences between groups or conditions are usually described in terms of the mean and standard deviation of each group or condition.
It is also important to be able to describe the strength of a statistical relationship, which is often referred to as the effect size.
The most widely used measure of effect size for differences between group or condition means is called Cohen’s d, which is the difference between the two means divided by the standard deviation:
Cohen’s d
d = (M1 − M2) / SD
In this formula, it does not really matter which mean is M1 and which is M2.
If there is a treatment group and a control group, the treatment group mean is usually M1 and the control group mean is M2.
Otherwise, the larger mean is usually M1 and the smaller mean M2 so that Cohen’s d turns out to be positive.
Indeed, Cohen’s d values should always be positive, so it is the absolute difference between the means that is considered in the numerator.
The standard deviation in this formula is usually a kind of average of the two group standard deviations called the pooled-within groups standard deviation.
To compute the pooled within-groups standard deviation, add the sum of the squared differences for Group 1 to the sum of squared differences for Group 2, divide this by the sum of the two sample sizes, and then take the square root of that.
Informally, however, the standard deviation of either group can be used instead.
Conceptually, Cohen’s d is the difference between the two means expressed in standard deviation units.
A Cohen’s d of 0.50 means that the two group means differ by 0.50 standard deviations (half a standard deviation).
A Cohen’s d of 1.20 means that they differ by 1.20 standard deviations.
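The formula and the pooled within-groups standard deviation described above can be sketched as follows; the treatment and control scores are hypothetical:

```python
import statistics

# Hypothetical treatment and control scores (illustrative data only).
treatment = [14, 16, 15, 17, 18]
control = [12, 13, 14, 12, 14]

def pooled_sd(g1, g2):
    # Sum the squared deviations within each group, divide by the
    # combined sample size, and take the square root (per the text).
    m1, m2 = statistics.mean(g1), statistics.mean(g2)
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    return ((ss1 + ss2) / (len(g1) + len(g2))) ** 0.5

def cohens_d(g1, g2):
    # Absolute difference between the means, in pooled-SD units.
    return abs(statistics.mean(g1) - statistics.mean(g2)) / pooled_sd(g1, g2)

print(round(cohens_d(treatment, control), 2))
```

With these made-up scores the means differ by 3 points and the pooled standard deviation is about 1.18, so d is well above 0.80 and would count as a large difference.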
Cohen’s d
How should we interpret these values in terms of the strength of the relationship or the size of the difference between the means?
Values near 0.20 are considered small, values near 0.50 are considered medium, and values near 0.80 are considered large.
Thus a Cohen’s d value of 0.50 represents a medium-sized difference between two means, and a Cohen’s d value of 1.20 represents a very large difference in the context of psychological research.
Cohen’s d is useful because it has the same meaning regardless of the variable being compared or the scale it was measured on.
A Cohen’s d of 0.20 means that the two group means differ by 0.20 standard deviations whether we are talking about scores on the Rosenberg Self-Esteem scale, reaction time measured in milliseconds, number of siblings, or diastolic blood pressure measured in millimeters of mercury.
Not only does this make it easier for researchers to communicate with each other about their results, it also makes it possible to combine and compare results across different studies using different measures.
Correlations Between Quantitative Variables
linear relationships
Relationships between two variables whereby the points on a scatterplot fall close to a single straight line.
Nonlinear relationships
Relationships between two variables in which the points on a scatterplot do not fall close to a single straight line, but often fall along a curved line.
restriction of range
When one or both variables have a limited range in the sample relative to the population, making the value of the correlation coefficient misleading.
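A small sketch with hypothetical paired scores shows both a correlation coefficient for a linear relationship (the standard Pearson formula, which these cards do not define explicitly) and how restricting the range of one variable makes the coefficient misleadingly small:

```python
def pearson_r(xs, ys):
    # Pearson correlation: average product of paired deviations,
    # scaled by the two (population) standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

# Hypothetical paired scores with a strong linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

r_full = pearson_r(x, y)

# Restrict the sample to the middle of x's range, as might happen when
# a sample excludes extreme scorers on one variable.
pairs = [(a, b) for a, b in zip(x, y) if 3 <= a <= 6]
r_restricted = pearson_r([a for a, _ in pairs], [b for _, b in pairs])

print(r_full, r_restricted)  # the restricted-range correlation is smaller
```

Here the full-sample correlation is about .90, but after restricting x's range it drops to .60 even though the underlying relationship is unchanged.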
Presenting Descriptive Statistics in Writing
Use words only for numbers less than 10 that do not represent precise statistical results, and use numerals for numbers 10 and higher.
However, statistical results are always presented in the form of numerals rather than words and are usually rounded to two decimal places.