BA 1 - Describing and Summarizing Data Flashcards

1
Q

Axes of a histogram?

A

X-axis - bins corresponding to ranges of data;

Y-axis - frequency of observations falling into each bin.

2
Q

What’s an outlier?

A

An outlier is a value that falls far from the rest of the data.

3
Q

How do you examine the validity of an outlier?

A

i. Check if it’s valid, though unusual;
ii. Check for a data entry error; and
iii. Check if it was collected under different circumstances than the rest of the data.

4
Q

What do you do about an outlier?

A

Leave it; change it to its corrected value; or in extreme cases, delete it.

5
Q

Skewness

A

Skewness measures the degree of asymmetry in a distribution.

6
Q

What are descriptive statistics?

A

Summary measures that provide an overview of the data set without showing every data point.

7
Q

Mean

A

Sum of all data points divided by the number of data points

The mean is affected by outliers.
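
To illustrate how an outlier pulls the mean, here is a minimal Python sketch. The values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical values, invented for illustration only.
typical = [10, 12, 11, 13, 12]
with_outlier = typical + [100]  # same data plus one extreme value

mean_typical = mean(typical)            # 58 / 5 = 11.6
mean_with_outlier = mean(with_outlier)  # 158 / 6, roughly 26.3

print(mean_typical, mean_with_outlier)
```

With the single extreme value included, the mean ends up larger than every one of the typical values.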

8
Q

Median

A

Middle value of the sorted data set, i.e. the 50th percentile.

When the number of values is even, it’s the average of the middle two values.

9
Q

Mode

A

Value that occurs most frequently

10
Q

Conditional mean

A

The mean of a subset of the data that includes all values satisfying a certain condition.
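
As a rough sketch of the idea in Python: filter the data by the condition, then average what remains. The records and the condition (region equals "East") are hypothetical.

```python
from statistics import mean

# Hypothetical (region, sales) records, invented for illustration only.
records = [
    ("East", 100),
    ("West", 250),
    ("East", 140),
    ("West", 150),
    ("East", 120),
]

# Keep only the values that satisfy the condition, then take their mean.
east_sales = [sales for region, sales in records if region == "East"]
conditional_mean = mean(east_sales)                   # (100 + 140 + 120) / 3 = 120
overall_mean = mean([sales for _, sales in records])  # 760 / 5 = 152

print(conditional_mean, overall_mean)
```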

11
Q

Percentile value

A

Value beneath which a certain percentage of the data lie

e.g. the 25th percentile is the smallest value that is greater than or equal to 25% of the data points.
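
The definition above can be sketched in Python as a "nearest-rank" percentile. The function name `percentile_nearest_rank` and the scores are hypothetical. Note that Excel's PERCENTILE.INC interpolates between data points, so it may return a different number for the same data.

```python
import math


def percentile_nearest_rank(data, pct):
    """Return the smallest value that is >= pct percent of the data points."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the interval (0, 100]")
    ordered = sorted(data)
    # 1-based rank of the first value that covers pct percent of the points.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical scores, invented for illustration only.
scores = [15, 20, 35, 40, 50, 55, 60, 70]
p25 = percentile_nearest_rank(scores, 25)  # rank 2 of 8 -> 20
p50 = percentile_nearest_rank(scores, 50)  # rank 4 of 8 -> 40

print(p25, p50)
```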

12
Q

Range

A

Maximum value - Minimum value

13
Q

Relationship between standard deviation and variance?

A

SD = square root (Variance)

14
Q

What does variance measure?

A

Variance measures how spread out the data are around the mean: it is the average squared distance of the data points from the mean.

15
Q

Difference between population and sample variance/SD?

A

For a population, the denominator is N; for a sample, the denominator is n - 1.
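
A small Python sketch of the two denominators, using hypothetical data. The same sum of squared deviations is divided by N for the population variance and by n - 1 for the sample variance; the SD is the square root in each case.

```python
import math

# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
data_mean = sum(data) / n                              # 40 / 8 = 5.0
squared_devs = sum((x - data_mean) ** 2 for x in data)  # 32.0

population_variance = squared_devs / n       # 32 / 8 = 4.0
sample_variance = squared_devs / (n - 1)     # 32 / 7, roughly 4.57

population_sd = math.sqrt(population_variance)  # 2.0
sample_sd = math.sqrt(sample_variance)          # roughly 2.14

print(population_variance, sample_variance)
```

Dividing by n - 1 always gives a slightly larger value than dividing by N, and the gap shrinks as the data set grows.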

16
Q

Coefficient of Variation

A

SD/mean
Measures the standard deviation relative to the size of the mean.
It’s useful for comparing variation in different data sets.
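
A minimal Python sketch of why the ratio helps, assuming two hypothetical data sets that have the same sample SD but very different means. The helper name `coefficient_of_variation` is made up for this example.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)


# Hypothetical data sets, invented for illustration only.
small_scale = [2, 4, 4, 4, 5, 5, 7, 9]         # mean 5
large_scale = [x + 100 for x in small_scale]   # mean 105, same spread

cv_small = coefficient_of_variation(small_scale)  # about 0.43
cv_large = coefficient_of_variation(large_scale)  # about 0.02

print(cv_small, cv_large)
```

Both sets have the same SD, but relative to its mean the second set varies far less, which is what the coefficient of variation captures.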

17
Q

Kurtosis

A

Measure of flatness or sharpness of a distribution.

Low kurtosis => flat distribution

18
Q

[EXCEL] Mean

A

=AVERAGE(range)

19
Q

[EXCEL] Median

A

=MEDIAN(range)

20
Q

[EXCEL] Mode

A

=MODE.SNGL(range)

21
Q

[EXCEL] Conditional Mean

A

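=AVERAGEIF(range, criteria, [average_range])

range is the set of cells tested against criteria; average_range (optional) is the set of cells to average. If it is omitted, range itself is averaged.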

22
Q

[EXCEL] Percentile

A

=PERCENTILE.INC(range,k)

33rd percentile => k = 0.33

23
Q

[EXCEL] Variance

A

=VAR.S(range)

24
Q

[EXCEL] Standard deviation

A

=STDEV.S(range)

25
Q

[EXCEL] Square root

A

=SQRT(number)

26
Q

[EXCEL] Number of values

A

=COUNT(range)

27
Q

[EXCEL] Range

A

=MAX(range) - MIN(range)

28
Q

[EXCEL] Total

A

=SUM(range)

29
Q

[EXCEL] Correlation

A

=CORREL(range1,range2)

30
Q

Scatter plot

A

Visualizes the relationship between two variables. A visible pattern does not by itself establish a causal link.

31
Q

Correlation

A
  • Measure that quantifies the strength of a linear relationship between two variables.
  • Ranges from -1 to +1; 0 = no linear relationship
  • Does not imply causation
  • Strongly influenced by outliers
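
The points above can be sketched in Python. This is a hand-written Pearson correlation (the function name `pearson_r` and the data are hypothetical), showing how one outlier can change the result.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, then the two root sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: y is exactly 2 * x, a perfect linear relationship.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)  # perfectly linear, so r is 1 (up to rounding)

# Add a single hypothetical outlier far below the line.
r_with_outlier = pearson_r(x + [6], y + [-20])  # turns negative

print(r_clean, r_with_outlier)
```

In this example, one outlying point is enough to flip the sign of the correlation.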
32
Q

Hidden variables

A

A variable that is correlated with each of two other variables, producing a correlation between them even though they are not fundamentally related to each other.

33
Q

Mediating variable

A

A variable that is affected by one variable and, in turn, affects another.

34
Q

Time series

A

Data in which one of the variables is time, i.e. observations of a variable recorded at successive points in time.

35
Q

Cross-sectional data

A

Cross-sectional data provide a snapshot of data across multiple groups at a given point in time.