Descriptive Statistics Flashcards

1
Q

describe/summarize the data a researcher has

A

descriptive statistics

2
Q

helps researchers understand the data they have, whereas descriptive statistics help them explain to other people what is happening in the data

A

Exploratory data analysis (EDA)

3
Q

The first thing to describe is the distribution of the data,
to show the kinds of numbers that we have.

A

describing data

4
Q
  • Different ways of describing the distribution
  • Each is used to present the pattern in the data.
A
  • Frequency Table
  • Charts (e.g., histograms, bar charts, etc.)
5
Q

frequency distributions of nominal or ordinal data are customarily plotted using a ______

A

bar graph

6
Q

A ____ is drawn for each category, where the height of the
bar represents the frequency or number of members of
that category.

A

Bar

7
Q

used to represent frequency distributions
composed of interval or ratio data. A bar is drawn for each
class interval.

  • Class intervals are plotted on the horizontal axis such
    that each class bar begins and terminates at the real
    limits of the interval.
A

histogram

8
Q

also used to represent interval or
ratio data.

Instead of using bars, a point is plotted over the midpoint
of each interval at a height corresponding to the
frequency of the interval. Points are joined by a straight
line.

A

frequency polygon

9
Q

Don’t draw a bar chart for ___

A

Continuous measures

10
Q

presents the score values and
their frequency of occurrence.

When presented in a table, the score values are listed in
rank order, with the lowest score value usually at the
bottom of the table.

A

Frequency distribution

11
Q

in grouping data

A

how wide should interval be?

12
Q

When data are grouped

A

some information is lost

13
Q

The wider the interval,

A

the more information is lost.

14
Q

Constructing a frequency distribution of grouped scores

A
  1. Find the range of the scores.
  2. Determine the width of each class interval (i).
  3. List the limits of each class interval, placing the interval
    containing the lowest score value at the bottom.
  4. Tally the raw scores into the appropriate class intervals.
  5. Add the tallies for each interval to obtain the interval
    frequency.
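
The five steps above can be sketched in Python. This is only an illustration: the scores, the interval width of 10, and the helper name grouped_frequency are hypothetical, and the sketch assumes integer scores with interval limits that start at a multiple of the width.

```python
from collections import Counter

def grouped_frequency(scores, width):
    """Tally integer scores into class intervals of the given width."""
    lowest, highest = min(scores), max(scores)
    score_range = highest - lowest                  # step 1: range of the scores
    # Assumption: the bottom interval starts at a multiple of the width.
    start = (lowest // width) * width
    # Steps 4 and 5: tally each score by interval index, then total the tallies.
    tallies = Counter((s - start) // width for s in scores)
    n_intervals = (highest - start) // width + 1
    table = []
    for k in range(n_intervals):                    # step 3: list the interval limits
        lower = start + k * width
        upper = lower + width - 1
        table.append((lower, upper, tallies.get(k, 0)))
    return score_range, table

# Hypothetical scores; step 2 (choosing the width) is done by hand here.
scores = [52, 57, 61, 63, 64, 68, 70, 71, 73, 75, 78, 79, 82, 85, 91]
score_range, table = grouped_frequency(scores, width=10)

# Print with the lowest interval at the bottom, as the card describes.
for lower, upper, f in reversed(table):
    print(f"{lower}-{upper}: {f}")
```

A wider interval would give fewer rows and hide more detail about the individual scores.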
15
Q

indicates the
proportion of the total number of scores in each interval.

A

Relative Frequency Distribution

16
Q

indicates the
number of scores that fall below the upper limit of each
interval.

A

Cumulative Frequency Distribution

16
Q

indicates the
percentage of scores that fall below the upper limit of
each interval.

A

Cumulative Percentage Distribution

17
Q

what is this symbol?

f/N

A

Relative Frequency

18
Q

frequency of interval + frequencies of all class intervals below it.

A

Cumulative Frequency

19
Q

what is this formula?

(cum f / N) × 100

A

cumulative percentage
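
The formulas f/N, cum f, and (cum f / N) × 100 can be tried on a small table. The interval frequencies below are hypothetical and are listed from the lowest interval to the highest; this is a sketch, not a prescribed procedure.

```python
from itertools import accumulate

# Hypothetical interval frequencies, lowest interval first.
frequencies = [2, 4, 6, 2, 1]
N = sum(frequencies)                             # total number of scores

relative = [f / N for f in frequencies]          # relative frequency: f / N
cum_f = list(accumulate(frequencies))            # cumulative frequency: f plus all f below
cum_pct = [c / N * 100 for c in cum_f]           # cumulative percentage: (cum f / N) x 100

for f, rel, cf, cp in zip(frequencies, relative, cum_f, cum_pct):
    print(f"f={f}  f/N={rel:.3f}  cum f={cf}  cum %={cp:.1f}")
```

The relative frequencies sum to 1, and the cumulative percentage of the top interval is 100.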

20
Q

_____are very important in data analysis, because
they allow us to examine the shape of the distribution of
a variable.

The shape is a pattern that forms when a _____ is
plotted and is known as the distribution.

A

histogram

21
Q

the normal distribution is also known as the

A

Gaussian Distribution

22
Q

_____ is symmetrical and bell shaped. It
curves outwards at the top and then inwards nearer the
bottom, the tails getting thinner and thinner.

A

normal distribution

23
Q

Do data ever form a perfect normal distribution?

A

Never, but as long as the distribution is close to a normal
distribution, it will not matter too much.

24
A very ___ of naturally occurring variables are normally distributed. A _____ of statistical tests make the assumption that the data form a normal distribution.
large number
25
Don’t refer to the normal distribution as any of the following:
usual, regular, standard, or even distribution.
25
Wrong shape: distributions can be the wrong shape for two reasons. First, because they are not symmetrical; second, because they are not the characteristic bell shape.
skew; kurtosis
26
A non-symmetrical distribution is said to be _____.
skewed
26
the curve rises rapidly and then drops off slowly.
positive skew
26
the curve rises slowly and then decreases rapidly.
negative skew
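
The direction of skew can be checked numerically. The sketch below uses the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is one common convention and is swapped in here for illustration; the helper name skewness and both data sets are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (one common convention)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most scores are low, with one long tail to the right.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
# Mirror image of the same data: long tail to the left.
left_tailed = [11 - x for x in right_tailed]

print(skewness(right_tailed) > 0)   # positive skew
print(skewness(left_tailed) < 0)    # negative skew
```

A perfectly symmetrical set of scores gives a coefficient of zero under this convention.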
27
Skewness has some serious implications for some types of data analysis. Skew often happens because of ____ or _____
floor effect or ceiling effect
28
occurs when only a few of the subjects are strong enough to get off the floor.
floor effect
29
Causes negative skew and is much less common in psychology. Occurs most commonly when we ask questions to measure the range of some variable and the questions are all too easy, or too low down the scale.
ceiling effect
29
Much trickier than Skew but is usually less of a problem. Occurs when there are either too many people at the extremes of the scale, or not enough people at the extremes.
kurtosis
30
when there are too many people, too far away, in the tails of the distribution.
positive kurtosis
31
when there are insufficient people in the tails (ends) of the scores to make the distribution normal.
negative kurtosis
32
_____ is just a “posh” way of saying average. In some way refers to the most central value of a data set with different interpretations of the sense of “central”. Loosely known as the average. In statistical description, though, we have to be more precise about just what sort of average we mean.
central tendency
32
Small number of data points that lie outside the distribution when the distribution is approximately normal. Usually easily spotted in histograms. ______ are easy to spot but deciding what to do with them can be much trickier.
outliers
33
The mean is very sensitive to _____
extreme scores
33
Called the arithmetic mean. Calculated by adding up all the scores and dividing by the number of individual scores. Equation: x̄ = ∑x / N
Mean
33
Under most circumstances, of the measures used for central tendency, the mean is least subject to ______
sampling variation
33
For statistics to be correct, we need to make some _____
assumptions
34
The sum of the squared deviations of all the scores about their mean is a ______
minimum
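
A quick numerical check of this property, using hypothetical scores: the sum of squared deviations is computed about the mean and about a few other candidate values. This illustrates the property on one data set; it is not a proof.

```python
def sum_squared_deviations(scores, center):
    """Sum of squared deviations of the scores about an arbitrary value."""
    return sum((x - center) ** 2 for x in scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores
mean = sum(scores) / len(scores)           # 5.0

ss_about_mean = sum_squared_deviations(scores, mean)
# Compare against other candidate centers on either side of the mean.
others = {c: sum_squared_deviations(scores, c) for c in (3, 4, 4.5, 5.5, 6, 7)}

print(ss_about_mean)
print(others)
```

Every other center gives a larger total than the one taken about the mean.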
35
the _____ is equal to the sum of the mean of each group times the number of scores in the group, divided by the sum of the number of scores in each group.
overall mean
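
As a sketch of the overall mean, each group mean is weighted by its group size. The two groups below are hypothetical, and overall_mean is an illustrative helper name.

```python
def overall_mean(groups):
    """groups: list of (group_mean, n_scores) pairs."""
    weighted_sum = sum(mean * n for mean, n in groups)   # sum of mean x n per group
    total_n = sum(n for _, n in groups)                  # sum of the group sizes
    return weighted_sum / total_n

# Hypothetical groups: 10 scores averaging 80, and 30 scores averaging 70.
groups = [(80, 10), (70, 30)]
print(overall_mean(groups))   # closer to 70 because that group is larger
```

Simply averaging the two group means would give 75, which ignores the unequal group sizes.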
36
Second most common measure of central tendency. It is the middle score in a set of scores. Used when the mean is not valid, which might be because the data are not symmetrically or normally distributed, or because the data are measured in an ordinal level.
Median
37
The median is _____ than the mean to extreme scores.
less sensitive
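
The difference in sensitivity can be seen by adding one extreme score to a small, hypothetical data set and recomputing both measures with Python's standard statistics module.

```python
import statistics

scores = [4, 5, 5, 6, 7]                 # hypothetical scores
with_extreme = scores + [100]            # same scores plus one extreme score

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(scores)
median_after = statistics.median(with_extreme)

print(f"mean:   {mean_before} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

The single extreme score moves the mean by many points, while the median shifts only slightly.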
38
The most frequent score in the distribution or the most common observation among a group of scores. Best measure of central tendency for categorical data (although it is not even very useful for that). Rarely used in research.
mode
39
In a frequency distribution it is very easy to see because it is the _______ of the distribution. The problem with it is it does not tell us very much.
highest point
40
The _____ is the simplest measure of dispersion. It is the distance between the highest score and the lowest score. It can be expressed as a single number, or sometimes it is expressed as the highest and lowest scores.
range
41
To find the range we find the lowest value (2) and the highest value (17). Sometimes the range is expressed as a single figure, calculated as:
Range = Highest Value – Lowest Value
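
A minimal sketch of the range calculation. The data set is hypothetical and was chosen only so that its lowest value is 2 and its highest is 17, matching the values quoted on the card.

```python
scores = [2, 4, 5, 8, 11, 14, 17]        # hypothetical scores

lowest, highest = min(scores), max(scores)
value_range = highest - lowest           # Range = Highest Value - Lowest Value

print(f"range as two scores: {lowest} to {highest}")
print(f"range as a single figure: {value_range}")
```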
42
Used with ordinal data or with non-normal distributions. If median is used as a measure of central tendency, the ___ is probably used as a measure of dispersion. It is the distance between the upper and lower quartiles.
interquartile range
43
There are ____ quartiles in a variable; they are the ____ values that divide the variable into four groups.
three
44
The ____ quartile happens one quarter of the way up the data, which is also the 25th centile.
first
45
The _____ quartile is the half-way point, which is the median, and is also the 50th centile.
second
46
The ____ quartile is the three-quarter-way point or the 75th centile.
third quartile
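
The three quartiles and the interquartile range can be computed with statistics.quantiles from Python's standard library (Python 3.8+). The scores are hypothetical. Textbooks and software differ on how quartiles are interpolated, so other tools may return slightly different cut points; this sketch uses the function's default method.

```python
import statistics

scores = [2, 4, 5, 8, 11, 14, 17]                 # hypothetical scores

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                                     # distance between upper and lower quartiles

print(f"Q1 (25th centile) = {q1}")
print(f"Q2 (50th centile, the median) = {q2}")
print(f"Q3 (75th centile) = {q3}")
print(f"interquartile range = {iqr}")
```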
47
symbol s
sample standard deviation
47
______ is like the mean, in that it takes all of the values in the dataset into account when it is calculated. It is also like the mean in that it needs to make some assumptions about the shape of the distribution. To calculate the _____, we must assume that we have a normal distribution.
Standard Deviation
48
symbol σ
population standard deviation
49
the _____ of a set of scores is just the square of the standard deviation
variance
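
The relationship between s, σ, and the variance can be checked with Python's standard statistics module, where stdev and variance use the sample formulas (dividing by N − 1) and pstdev and pvariance use the population formulas (dividing by N). The scores are hypothetical.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical scores

s = statistics.stdev(scores)                     # sample standard deviation (s)
sigma = statistics.pstdev(scores)                # population standard deviation (sigma)
sample_variance = statistics.variance(scores)
population_variance = statistics.pvariance(scores)

# In both cases the variance is the square of the standard deviation.
print(math.isclose(s ** 2, sample_variance))
print(math.isclose(sigma ** 2, population_variance))
print(f"s = {s:.3f}, sigma = {sigma:.3f}")
```

For the same scores, s is slightly larger than σ because it divides by N − 1 rather than N.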
50
the variance is not used much in descriptive statistics because it gives us squared units of measurement. however, it is used quite frequently in ___________
inferential statistics
51
  1. The SD gives us a measure of dispersion relative to the mean.
  2. The SD is sensitive to each score in the distribution.
  3. Like the mean, the SD is stable with regard to sampling fluctuations.
properties of the standard deviation
52
population standard deviation
boxplot or box and whisker plot