Basic Descriptive Data Analysis Flashcards

(39 cards)

1
Q

frequency distribution table

A

scores listed in ranked order, showing the number of times each value occurred
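
Below is a minimal Python sketch of building such a table. It assumes the scores are a plain list of integers. The function name `frequency_table` and the sample scores are hypothetical.

```python
from collections import Counter

def frequency_table(scores):
    """Return (value, count) pairs, ranked from lowest to highest value."""
    counts = Counter(scores)        # tally how many times each value occurs
    return sorted(counts.items())   # rank the distinct values in ascending order

scores = [3, 1, 4, 1, 5, 3, 3, 2]   # hypothetical sample data
for value, count in frequency_table(scores):
    print(value, count)
```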

2
Q

categorical data contains what in a frequency distribution table

A

raw and relative frequency

3
Q

continuous data contains what in a frequency distribution table

A

raw, relative, and cumulative frequency

4
Q

raw frequency

A

the count of data points that fall into a given category or interval; always a whole number

example: 5 of 27 people were aged 0-10

5
Q

relative frequency

A

the number of data points in a category or interval relative to the entire sample, expressed as a %

example: 5/27 = 18.5 %
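
The arithmetic above can be written as a tiny Python sketch. `relative_frequency` is a hypothetical helper name and not a library function.

```python
def relative_frequency(count, total):
    """Share of the whole sample that falls in one category or interval, as a %."""
    return count / total * 100

pct = relative_frequency(5, 27)   # 5 of 27 people were aged 0-10
print(f"{pct:.1f} %")             # → 18.5 %
```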

6
Q

cumulative frequency

A

running total of frequencies up to and including the indicated interval, often expressed as a cumulative %
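
Here is a minimal Python sketch of cumulative relative frequency. It assumes the raw counts are already listed in ranked interval order. Only the first count (5 of 27 people aged 0-10) comes from the earlier example. The other three counts are hypothetical and were chosen so the total is 27.

```python
from itertools import accumulate

def cumulative_percent(raw_counts):
    """Running total of the raw counts, expressed as a % of the whole sample."""
    total = sum(raw_counts)
    return [running / total * 100 for running in accumulate(raw_counts)]

raw = [5, 8, 10, 4]   # hypothetical counts for ages 0-10, 11-20, 21-30, 31-40
print([round(p, 1) for p in cumulative_percent(raw)])   # → [18.5, 48.1, 85.2, 100.0]
```

The last interval always reaches 100 %, because by then every data point has been counted.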

7
Q

class intervals (bins)

A

defined range limits in which data is grouped
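
This is a sketch of grouping values into equal-width class intervals. It assumes numeric data and intervals that include their lower limit but exclude their upper limit. `bin_counts` and the ages are hypothetical. Intervals that contain no values are left out of the result.

```python
def bin_counts(values, width, start=0):
    """Count values per class interval [lower, lower + width), keyed by (lower, upper)."""
    counts = {}
    for v in values:
        k = (v - start) // width          # index of the interval that v falls into
        lower = start + k * width
        key = (lower, lower + width)      # upper limit is excluded from the interval
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))   # list intervals in ranked order

ages = [4, 9, 10, 15, 22, 27, 29, 31]     # hypothetical data
print(bin_counts(ages, width=10))
# → {(0, 10): 2, (10, 20): 2, (20, 30): 3, (30, 40): 1}
```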

8
Q

categorical variables

A

values fall into separate categories that have no numerical relationship to one another; ordinal categories can still have a ranking order

9
Q

what are two major statistical outputs of categorical data

A

frequency and percentage

10
Q

what constitutes a normal distribution curve

A

bell shaped curve

symmetrical around the mean

11
Q

what is the statistical significance of normal distribution?

A

many datasets follow this bell-shaped curve that is symmetrical around the mean

12
Q

what does bimodal mean

A

a distribution with two peaks (“two humps”); a warning sign

suggests the data may come from 2 different populations

13
Q

left skewed

A

negatively skewed

tail is to the left

14
Q

right skewed

A

positively skewed

tail is to the right

15
Q

stem and leaf plot

A

used with continuous data

good for showing individual data
bad for large amounts of data
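
Below is a minimal stem-and-leaf sketch. It assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf. `stem_and_leaf` and the data are hypothetical. Every original value can be read back from the display. That is why the plot shows individual data well but becomes unwieldy for large data sets. For brevity, this sketch skips stems that have no leaves, although a full display would list them.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # e.g. 23 -> stem 2, leaf 3
        plot.setdefault(stem, []).append(leaf)
    return plot

data = [12, 15, 21, 23, 23, 38, 41]         # hypothetical data
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 1 | 2 5
# 2 | 1 3 3
# 3 | 8
# 4 | 1
```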

16
Q

histogram

A

continuous data

we can get distribution curves with a histogram
good for showing midpoint of data and large amounts of data
bad for showing individual data

17
Q

mean

A

equals sum of all values / total number of values
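
The formula translates directly into Python. The sample values here are hypothetical. `statistics.mean` is the standard-library equivalent.

```python
import statistics

values = [2, 4, 4, 5, 10]                  # hypothetical data
mean_by_hand = sum(values) / len(values)   # sum of all values / number of values
print(mean_by_hand)                        # → 5.0
print(statistics.mean(values))             # standard-library equivalent
```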

18
Q

when is mean commonly used

A

the most commonly used measure of central tendency

19
Q

when is mean less helpful

A

when outliers are present or the distribution is skewed

20
Q

median

A

equals the middle value of the ranked data (with an even number of values, the mean of the two middle values)
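
Here is a minimal Python sketch of the median that handles both odd and even counts. `median` is a hypothetical helper, written out by hand for illustration. The library's `statistics.median` gives the same results.

```python
import statistics

def median(values):
    """Middle value of the ranked data; mean of the two middle values when n is even."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                          # odd n: single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2      # even n: average the two middle values

print(median([7, 1, 3]))                    # → 3
print(median([7, 1, 3, 10]))                # → 5.0
print(statistics.median([7, 1, 3, 10]))     # → 5.0
```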

21
Q

when is median most helpful

A

more helpful than the mean when outliers are present or the distribution is skewed
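
This small Python illustration shows why the median holds up better when an outlier is present. The salary figures are hypothetical. Adding a single extreme value moves the mean far more than the median.

```python
import statistics

values = [40, 42, 45, 47, 50]      # hypothetical salaries in $1000s
with_outlier = values + [400]      # add one extreme value

for data in (values, with_outlier):
    print(f"mean={statistics.mean(data):.1f}, median={statistics.median(data):.1f}")
# mean=44.8, median=45.0
# mean=104.0, median=46.0
```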

22
Q

mode

A

equals value that occurs most often

less commonly used compared to mean and median
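
The mode can be found with Python's standard library. The data are hypothetical. `statistics.multimode` (Python 3.8+) returns every most-frequent value, which is useful for bimodal data.

```python
import statistics

data = [1, 2, 2, 3, 3, 3, 4]                    # hypothetical data
print(statistics.mode(data))                    # → 3
print(statistics.multimode([1, 1, 2, 2, 3]))    # → [1, 2]  (two modes: bimodal)
```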

23
Q

what is the relationship between mean, median and mode with symmetrical data

A

mean = median = mode

24
Q

what is the relationship between mean, median and mode with right skewed data

A

mode < median < mean
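
This Python check uses a small right-skewed data set and its mirror image, which is left-skewed. Both data sets are hypothetical, and `center_measures` is a hypothetical helper. The ordering shown is the usual pattern for skewed data, but it is not guaranteed for every data set.

```python
import statistics

def center_measures(values):
    """Return (mode, median, mean) for a data set."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]     # hypothetical; tail toward high values
left_skewed = [10 - v for v in right_skewed]   # mirror image; tail toward low values

mode, median, mean = center_measures(right_skewed)
print(mode, median, round(mean, 2))            # → 2 3 3.44  (mode < median < mean)
mode, median, mean = center_measures(left_skewed)
print(mode, median, round(mean, 2))            # → 8 7 6.56  (mode > median > mean)
```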

25
Q

what is the relationship between mean, median and mode with left skewed data

A

mode > median > mean
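
26
Q

range

A

the difference between the 2 extreme scores (max - min); can also be listed as an interval (min to max)

needs ordered values, so it applies to ordinal data but not to nominal data

misleading value if outliers are present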
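
Below is a minimal Python sketch of the range. It uses hypothetical scores and a hypothetical helper `value_range`, and it shows how a single outlier inflates the result.

```python
def value_range(values):
    """Difference between the two extreme scores."""
    return max(values) - min(values)

scores = [55, 60, 62, 70, 71]                     # hypothetical scores
print(value_range(scores), (min(scores), max(scores)))   # → 16 (55, 71)
print(value_range(scores + [150]))                # → 95  (one outlier)
```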

27
Q

percentiles

A

describes a score's position within a distribution (e.g., the 90th percentile is the value below which about 90% of the scores fall)
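
Here is a minimal Python sketch of a percentile rank. It uses one common definition: the percentage of scores at or below a given score. Textbooks and software differ on the exact definition. `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentage of scores at or below x (one common definition)."""
    at_or_below = sum(1 for s in scores if s <= x)
    return at_or_below / len(scores) * 100

scores = [50, 60, 70, 80, 90]          # hypothetical exam scores
print(percentile_rank(scores, 80))     # → 80.0
```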

28
Q

interquartile range

A

Q3 - Q1

more helpful than range when outliers are present

29
Q

Q1

A

value that occurs at the first quarter mark (25th percentile)

30
Q

Q2

A

value that occurs at the second quarter mark (50th percentile) = median

31
Q

Q3

A

value that occurs at the third quarter mark (75th percentile)
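
This sketch computes the quartiles and IQR with Python's `statistics.quantiles` (Python 3.8+), using its default "exclusive" method. Other software may use a different quartile method and return slightly different values. The data are hypothetical, with 100 as an outlier. The outlier barely affects the IQR but inflates the range.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]     # hypothetical; 100 is an outlier
q1, q2, q3 = statistics.quantiles(data, n=4)    # Q1, Q2 (median), Q3
print(q1, q2, q3)                               # → 3.0 6.0 9.0
print("IQR:", q3 - q1)                          # → IQR: 6.0
print("range:", max(data) - min(data))          # → range: 99
```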

32
Q

box plot

A

summary in 5 numbers: min, Q1, Q2/median, Q3, max
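
Below is a minimal Python sketch of the five-number summary behind a box plot. It assumes the default quartile method of `statistics.quantiles`. `five_number_summary` and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """(min, Q1, Q2/median, Q3, max) for a box plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([2, 4, 4, 5, 7, 8, 9]))   # → (2, 4.0, 5.0, 8.0, 9)
```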

33
Q

pros and cons of box plots

A

pro: good for summarizing a large data set

con: does not keep exact values, so information within the data is lost
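
34
Q

standard deviation

A

typical distance of data points from the mean: the square root of the average squared deviation from the mean (the variance)

the sample SD divides the sum of squared deviations by n - 1 instead of n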
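
This Python sketch computes the population SD by hand and contrasts it with the mean absolute deviation. The two are different measures. The data set and both helper names are hypothetical. `statistics.stdev` is the library's sample SD, which uses n - 1.

```python
import math
import statistics

def population_sd(values):
    """Square root of the average squared deviation from the mean."""
    mu = sum(values) / len(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

def mean_absolute_deviation(values):
    """Average absolute distance from the mean (not the same as the SD)."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical data, mean = 5
print(population_sd(data))                    # → 2.0
print(mean_absolute_deviation(data))          # → 1.5
print(round(statistics.stdev(data), 3))       # sample SD (n - 1) → 2.138
```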

35
Q

what is standard deviation helpful for

A

quantifying how spread out data are around the mean, which helps distinguish unusual data points from random fluctuations

36
Q

proportional area under a normal curve: 68 %

A

within 1 standard deviation from the mean

37
Q

proportional area under a normal curve: 95 %

A

within 2 standard deviations from the mean

38
Q

proportional area under a normal curve: 99.7 %

A

within 3 standard deviations from the mean
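
The 68 / 95 / 99.7 % areas can be checked with Python's `statistics.NormalDist` (Python 3.8+). The area within k SDs is cdf(k) - cdf(-k) on the standard normal curve. The exact value for 2 SDs is about 95.4 %. An area of exactly 95 % corresponds to about 1.96 SDs.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal curve: mean 0, SD 1
for k in (1, 2, 3):
    area = z.cdf(k) - z.cdf(-k)          # area within k SDs of the mean
    print(f"within {k} SD: {area:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```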

39
Q

coefficient of variation

A

unitless measure that depicts the size of the SD relative to the mean: CV = SD / mean, often expressed as a % (× 100)

especially helpful when comparing the variation of 2 or more variables measured in different units
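
Below is a minimal Python sketch of the CV. It assumes the sample SD (`statistics.stdev`) and positive data. Some sources use the population SD instead. `coefficient_of_variation` and the heights are hypothetical. Expressing the same data in centimetres or in metres gives the same CV, which shows that the measure is unitless.

```python
import statistics

def coefficient_of_variation(values):
    """Sample SD divided by the mean, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]        # hypothetical data
heights_m = [h / 100 for h in heights_cm]     # the same data in different units

print(round(coefficient_of_variation(heights_cm), 1))   # → 9.3
print(round(coefficient_of_variation(heights_m), 1))    # → 9.3
```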