BA 1 - Describing and Summarizing Data Flashcards

1
Q

Axes of a histogram?

A

X-axis - bins corresponding to ranges of data;

Y-axis - frequency of observations falling into each bin.
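As a rough illustration of how the two axes relate, the sketch below groups a handful of made-up observations into equal-width bins and counts how many fall into each. The data and the bin width of 5 are assumptions chosen for the example.

```python
from collections import Counter

data = [2, 3, 7, 8, 12, 13, 14, 18]  # made-up observations
bin_width = 5                        # assumed bin width

# Map each value to the lower edge of its bin (x-axis),
# then count the observations per bin (y-axis).
counts = Counter((x // bin_width) * bin_width for x in data)

for lower in sorted(counts):
    print(f"[{lower}, {lower + bin_width}): {counts[lower]}")
```

Each printed row corresponds to one bar: the interval is the bin and the count is the bar's height.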

2
Q

What’s an outlier?

A

An outlier is a value that falls far from the rest of the data.

3
Q

How do you examine the validity of an outlier?

A

i. Check if it’s valid, though unusual;
ii. Check for a data entry error; and
iii. Check if it was collected under different circumstances than the rest of the data.

4
Q

What do you do about an outlier?

A

Leave it; change it to its corrected value; or in extreme cases, delete it.

5
Q

Skewness

A

Skewness measures the degree of asymmetry in a distribution.

6
Q

What are descriptive statistics?

A

Summary measures that provide an overview of the data set without showing every data point.

7
Q

Mean

A

Sum of all data points divided by the number of data points

The mean is affected by outliers.
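A small sketch of the outlier effect, using Python's standard `statistics` module. The values are made up for illustration.

```python
import statistics

base = [10, 12, 11, 13, 12]   # made-up values
with_outlier = base + [100]   # same data plus one extreme value

mean_base = statistics.mean(base)             # 58 / 5
mean_outlier = statistics.mean(with_outlier)  # 158 / 6

print(mean_base)
print(round(mean_outlier, 2))
```

A single extreme value more than doubles the mean here, even though the other five points are unchanged.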

8
Q

Median

A

Middle value of the data set; i.e. 50th percentile.

When the number of values is even, it’s the average of the middle two values.
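The odd and even cases can be checked with a minimal sketch like the following, again with made-up values and the standard `statistics` module.

```python
import statistics

odd = [7, 1, 5]       # sorted: 1, 5, 7
even = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7

median_odd = statistics.median(odd)    # the single middle value
median_even = statistics.median(even)  # average of the two middle values, 3 and 5

print(median_odd, median_even)
```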

9
Q

Mode

A

Value that occurs most frequently

10
Q

Conditional mean

A

The mean of a subset of the data that includes all values satisfying a certain condition.
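A minimal sketch of a conditional mean, assuming a hypothetical list of (region, sales) records. The condition here is "region is East".

```python
import statistics

# Hypothetical (region, sales) records, made up for illustration.
records = [("East", 100), ("West", 80), ("East", 120), ("West", 60)]

# Keep only the values that satisfy the condition, then average them.
east_sales = [amount for region, amount in records if region == "East"]
east_mean = statistics.mean(east_sales)

overall_mean = statistics.mean([amount for _, amount in records])

print(east_mean, overall_mean)
```

The conditional mean (East only) differs from the overall mean because it averages just the matching subset.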

11
Q

Percentile value

A

Value beneath which a certain percentage of the data lie

e.g. the 25th percentile is the smallest value that is greater than or equal to 25% of the data points.
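The definition above can be sketched directly. The helper `percentile_value` is a hypothetical name, and the data are made up. Note that this follows the card's wording literally; Excel's PERCENTILE.INC interpolates between data points, so its results can differ.

```python
import math

def percentile_value(values, pct):
    """Smallest value that is >= pct percent of the data points."""
    ordered = sorted(values)
    # Number of points that must be at or below the answer (at least 1).
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

data = [15, 20, 35, 40, 50]  # made-up values
p25 = percentile_value(data, 25)
p50 = percentile_value(data, 50)

print(p25, p50)
```

With five points, 15 covers only 20% of the data, so the 25th percentile under this definition is the next value, 20.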

12
Q

Range

A

Maximum value - Minimum value

13
Q

Relationship between standard deviation and variance?

A

SD = square root (Variance)

14
Q

What does variance measure?

A

Variance measures how spread out the data are, based on the average squared distance of the data points from the mean.

15
Q

Difference between population and sample variance/SD?

A

For population, denominator is N; for sample, denominator is ‘n-1’
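A short sketch of the two denominators, using made-up data and Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n-1. It also checks that the standard deviation is the square root of the variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values; mean is 5

# Sum of squared deviations from the mean: 32
pop_var = statistics.pvariance(data)   # 32 / 8
samp_var = statistics.variance(data)   # 32 / 7

pop_sd = statistics.pstdev(data)
samp_sd = statistics.stdev(data)

print(pop_var, round(samp_var, 3))
print(pop_sd, round(samp_sd, 3))
print(math.isclose(samp_sd, math.sqrt(samp_var)))
```

The sample figures are always slightly larger than the population figures for the same data, because the same sum is divided by a smaller number.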

16
Q

Coefficient of Variation

A

SD/mean
Measures the standard deviation relative to the size of the mean.
It’s useful for comparing variation in different data sets.
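To illustrate why the ratio helps when comparing data sets, the sketch below uses two made-up data sets with the same standard deviation but very different means. The helper name `coefficient_of_variation` is hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

small = [10, 12, 14]      # mean 12, SD 2
large = [100, 102, 104]   # mean 102, SD 2

cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)

print(round(cv_small, 3), round(cv_large, 3))
```

Both sets vary by the same absolute amount, but relative to its mean the first set varies far more.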

17
Q

Kurtosis

A

Measure of flatness or sharpness of a distribution.

Low kurtosis => flat distribution

18
Q

[EXCEL] Mean

A

=AVERAGE(range)

19
Q

[EXCEL] Median

A

=MEDIAN(range)

20
Q

[EXCEL] Mode

A

=MODE.SNGL(range)

21
Q

[EXCEL] Conditional Mean

A

=AVERAGEIF(range,criteria,[average_range])

22
Q

[EXCEL] Percentile

A

=PERCENTILE.INC(range,k)

33rd percentile => k = 0.33

23
Q

[EXCEL] Variance

A

=VAR.S(range)

24
Q

[EXCEL] Standard deviation

A

=STDEV.S(range)

25
Q

[EXCEL] Square root

A

=SQRT(number)

26
Q

[EXCEL] Number of values

A

=COUNT(range)

27
Q

[EXCEL] Range

A

=MAX(range)-MIN(range)

28
Q

[EXCEL] Total

A

=SUM(range)

29
Q

[EXCEL] Correlation

A

=CORREL(range1,range2)

30
Q

Scatter plot

A

Plots one variable against another to visualize their relationship; a visible pattern alone does not establish a causal link.

31
Q

Correlation

A

- Measure that quantifies the strength of a linear relationship between two variables.
- Ranges from -1 to +1; 0 = no linear relationship.
- Does not imply causation.
- Strongly influenced by outliers.

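The properties listed above can be checked with a small sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and all data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfectly increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfectly decreasing line

# Same increasing data plus one extreme point.
r_outlier = pearson_r(x + [6], [2, 4, 6, 8, 10, -20])

print(r_up, r_down, round(r_outlier, 2))
```

The first two results sit at the ends of the range, and the single added outlier is enough to flip the sign of an otherwise perfect positive correlation.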
32
Q

Hidden variables

A

A variable that is correlated with each of two other variables that are not fundamentally related to each other.

33
Q

Mediating variable

A

Variable which is affected by one variable, and in turn affects another.

34
Q

Time series

A

When one of the variables is time, the relationship is known as a time series.

35
Q

Cross-sectional data

A

Cross-sectional data provide a snapshot of data across multiple groups at a given point in time.