Topic 1 Descriptive Statistics Flashcards

1
Q

Data are time-consuming to collect, ____, and of varying quality.

A

commercially sensitive

2
Q

A data set of size __ is denoted as: {xᵢ}, i = 1, …, n

A

n

3
Q

What’s the difference between a dot plot and a histogram?

A

Dot plots show individual data points (multi-frequency data set) while histograms group data into bins and show frequencies.

4
Q

What affects the appearance of a histogram the most?
A. Sample size
B. Axis label
C. Bin size
D. Title

A

C. Bin size

5
Q

Relative frequency = frequency / ____

A

total number of data points (n)
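A minimal sketch of this ratio, assuming made-up sample values; the function name `relative_frequencies` is illustrative only.

```python
from collections import Counter


def relative_frequencies(data):
    """Map each distinct value to (absolute frequency) / n."""
    n = len(data)
    counts = Counter(data)  # absolute frequency of each value
    return {value: count / n for value, count in counts.items()}


sample = [2, 3, 3, 5, 3, 2, 5, 3]  # illustrative values, n = 8
rel = relative_frequencies(sample)
print(rel)
```

The relative frequencies always add up to 1, which makes histograms of data sets with different n comparable.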

6
Q

What is plotted on the vertical axis of a histogram?

A

The absolute (or relative) frequency is plotted on the vertical axis.

7
Q

What does F(x) in a CDF plot represent?

A

The relative frequency of data ≤ x

8
Q

What is the downside of using histograms?

A

The appearance of a histogram depends on the chosen bin size, so the same data can look very different.
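To see the bin-size effect in numbers, the sketch below bins the same made-up sample with two bin widths. The helper `bin_counts`, the widths, and the data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bin_counts(data, width, start=0.0):
    """Count values per bin; bin k covers [start + k*width, start + (k+1)*width)."""
    counts = Counter(math.floor((x - start) / width) for x in data)
    return dict(sorted(counts.items()))


sample = [1.0, 1.5, 2.5, 3.5, 4.5, 4.75, 5.5, 7.5]  # illustrative values

coarse = bin_counts(sample, width=4.0)  # two wide bins
fine = bin_counts(sample, width=1.0)    # many narrow bins
print(coarse)
print(fine)
```

With the wide bins the sample looks flat; with the narrow bins the same values show two small peaks and a gap.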

9
Q

F(x) =

A

  • 0 if x < x₍₁₎
  • j/n if x₍ⱼ₎ ≤ x < x₍ⱼ₊₁₎
  • 1 if x ≥ x₍ₙ₎
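The piecewise definition can be sketched as a step function over the sorted data. The name `ecdf` and the sample values are illustrative assumptions.

```python
from bisect import bisect_right


def ecdf(data):
    """Return F, where F(x) is the fraction of data values <= x."""
    ordered = sorted(data)  # x_(1) <= ... <= x_(n)
    n = len(ordered)

    def F(x):
        # bisect_right gives j, the number of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return F


F = ecdf([3, 1, 4, 2])
print(F(0), F(1), F(2.5), F(4), F(10))
```

Below the smallest value F is 0, at or above the largest value it is 1, and in between it jumps by 1/n at each data point.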
10
Q

Formula for Arithmetic Mean

A

x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ

11
Q

Which measure is affected most by outliers?

A

Mean
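A small sketch of why the mean is the sensitive one, using made-up numbers in which a single value is replaced by an outlier.

```python
from statistics import mean, median

clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # last value replaced by an outlier

mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)
print(mean_shift, median_shift)
```

The mean moves by roughly 25 while the median does not move at all, because the middle value of the sorted data is unchanged.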

12
Q

Which is not a measure of central tendency?
A. Median
B. Mode
C. Range
D. Mean

A

C. Range

13
Q

The geometric mean is only used for ____ data.

A

non-zero (strictly positive)

14
Q

What does interquartile range (IQR) measure?

A

The spread between Q3 and Q1 (middle 50% of data)

15
Q

Formula for Sample variance (unbiased)

A

s² = (1/(n−1)) ∑ᵢ₌₁ⁿ (xᵢ − x̄)²

16
Q

Formula for Sample Standard Deviation (unbiased)

A

s = √(s²) = √[(1/(n−1)) ∑ᵢ₌₁ⁿ (xᵢ − x̄)²]
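The variance and standard deviation formulas can be sketched directly from their definitions. The function names and the sample are illustrative assumptions.

```python
import math


def sample_variance(data):
    """Unbiased sample variance: squared deviations divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(data))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
print(sample_variance(sample), sample_std(sample))
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same n − 1 divisor.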

17
Q

What measures asymmetry?

A

Skewness

18
Q

The coefficient of variation is given by: vx = ____ / x̄

A

standard deviation (sx)
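A minimal sketch of the coefficient of variation, assuming the sample standard deviation (n − 1 divisor) in the numerator. The height values are made up.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation divided by the arithmetic mean."""
    return stdev(data) / mean(data)


heights_cm = [150.0, 160.0, 170.0]  # illustrative values
heights_m = [1.5, 1.6, 1.7]         # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)
```

Both calls give the same ratio, since the units cancel between numerator and denominator.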

19
Q

Why is the mean absolute deviation more robust than standard deviation?

A

It is less influenced by outliers
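The deck does not spell out the mean absolute deviation; the sketch below assumes the form (1/n) ∑|xᵢ − x̄| and compares how much each measure grows when one made-up value becomes an outlier.

```python
import math


def mean_absolute_deviation(data):
    """Assumed definition: average absolute distance from the mean."""
    n = len(data)
    x_bar = sum(data) / n
    return sum(abs(x - x_bar) for x in data) / n


def sample_std(data):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # last value replaced by an outlier

mad_growth = mean_absolute_deviation(with_outlier) / mean_absolute_deviation(clean)
std_growth = sample_std(with_outlier) / sample_std(clean)
print(mad_growth, std_growth)
```

Squaring the deviations gives the outlier extra weight, so the standard deviation grows by a larger factor than the mean absolute deviation.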

20
Q

What does a positive skew indicate about the data?

A

Typically mean > median > mode (the long tail is on the right)

21
Q

Formula for biased skewness:

A

g₁(x) = [(1/n) ∑(xᵢ − x̄)³] / sₓ³, where sₓ is the standard deviation of x
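A hedged sketch of the skewness formula. It reads the denominator as the cube of the standard deviation and assumes the 1/n form of that standard deviation, which is a common convention for the biased estimator; the card itself does not say which form is meant.

```python
def biased_skewness(data):
    """Third central moment divided by the cubed standard deviation (1/n form)."""
    n = len(data)
    x_bar = sum(data) / n
    m2 = sum((x - x_bar) ** 2 for x in data) / n  # assumed: 1/n variance
    m3 = sum((x - x_bar) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 3, 4, 10]  # illustrative values with a long right tail
symmetric = [1, 2, 3, 4, 5]

print(biased_skewness(right_tailed), biased_skewness(symmetric))
```

The right-tailed sample gives a positive value and the symmetric one gives zero; the sign does not depend on which standard deviation convention is used.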

22
Q

Skewness is a ____ quantity (unit-less).

A

non-dimensional

23
Q

What is the median if n is odd?

A

median(x) = x₍ₖ₎ with k = (n + 1)/2, i.e. the middle value of the sorted data
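For odd n the median is the middle order statistic, which can be sketched as follows. The function name `median_odd` is hypothetical, and the even case is deliberately left out because this card does not cover it.

```python
def median_odd(data):
    """Middle value of the sorted data; only defined here for odd n."""
    n = len(data)
    if n % 2 == 0:
        raise ValueError("this sketch only covers odd n")
    ordered = sorted(data)
    k = (n + 1) // 2       # 1-based position of the middle value
    return ordered[k - 1]  # Python lists are 0-based


print(median_odd([9, 2, 4, 8, 6]))
```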

24
Q

What is the mode?

A

The most common value

25
Q

Formula for geometric mean:

A

x* = (∏ᵢ₌₁ⁿ xᵢ)^(1/n)
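A minimal sketch of the geometric mean. It assumes strictly positive values, a stricter reading of the "non-zero" requirement, so that the n-th root is always a real number; the function name is illustrative.

```python
import math


def geometric_mean(data):
    """n-th root of the product of the values (assumes all values > 0)."""
    if any(x <= 0 for x in data):
        raise ValueError("this sketch needs strictly positive values")
    n = len(data)
    return math.prod(data) ** (1 / n)


print(geometric_mean([2, 8]))
print(geometric_mean([1, 3, 9]))
```

For long data sets the product can overflow or underflow; averaging the logarithms and exponentiating is a common alternative.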

26
Q

What is used to test if two datasets have a linear relationship?

A

Covariance and correlation coefficient

27
Q

Formula for sample covariance:

A

cov(x, y) = (1/(n−1)) ∑(xᵢ − x̄)(yᵢ − ȳ)

28
Q

Formula for correlation coefficient:

A

cₓᵧ = cov(x, y) / (sₓ · sᵧ) = [(1/(n−1)) ∑(xᵢ − x̄)(yᵢ − ȳ)] / (sₓ · sᵧ)
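The covariance and correlation formulas can be sketched together. The function names and the perfectly linear sample data are illustrative assumptions.

```python
import math


def sample_covariance(x, y):
    """Sum of products of deviations, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the variance of x
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


x = [1, 2, 3, 4]
y_up = [2, 4, 6, 8]    # rises exactly linearly with x
y_down = [8, 6, 4, 2]  # falls exactly linearly with x

print(sample_covariance(x, y_up), correlation(x, y_up), correlation(x, y_down))
```

Exactly linear data reach the two ends of the range, +1 and −1; real data fall somewhere in between.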

29
Q

A correlation coefficient of 0 means ____ correlation.

A

no (linear)

30
Q

What is the range of the correlation coefficient cₓᵧ?
A. [−2, 2]
B. [0, 1]
C. [−1, 1]
D. (−∞, ∞)

A

C. [−1, 1]

31
Q

What does a negative skew indicate about the data?

A

Typically mode > median > mean (the long tail is on the left)

32
Q

What does zero skew (a symmetric distribution) indicate about the data?

A

mode = median = mean

33
Q

The quantile value for the i-th data point is given by:

A

yᵢ = (i − 0.5)/n
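A short sketch of these quantile values for a data set of size n; the function name is hypothetical.

```python
def quantile_values(n):
    """Quantile value (i - 0.5) / n for each position i = 1, ..., n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


positions = quantile_values(4)
print(positions)
```

The values sit at the centres of n equal-width intervals on [0, 1], so neither 0 nor 1 is ever assigned to a data point.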

34
Q

To determine the percentile qₚ(x), which condition must j satisfy?
A. j < p
B. (j − 0.5)/n < p/100 ≤ (j + 0.5)/n
C. j/n > p
D. j = p + 0.5

A

B. (j − 0.5)/n < p/100 ≤ (j + 0.5)/n

35
Q

To estimate qₚ(x), take the mean of the sorted data values x₍ⱼ₎ and ____.

A

x₍ⱼ₊₁₎
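Cards 34 and 35 together can be sketched as follows, reading the condition as (j − 0.5)/n < p/100 ≤ (j + 0.5)/n and taking the plain mean of the two neighbouring sorted values. The function name and data are illustrative, and the handling of p outside the covered range is an assumption.

```python
def percentile(data, p):
    """Estimate the p-th percentile as the mean of x_(j) and x_(j+1)."""
    ordered = sorted(data)
    n = len(ordered)
    target = p / 100
    for j in range(1, n):  # 1-based j, stopping at n - 1 so x_(j+1) exists
        if (j - 0.5) / n < target <= (j + 0.5) / n:
            return (ordered[j - 1] + ordered[j]) / 2
    # Assumption: the rule is undefined for very small or very large p
    raise ValueError("p lies outside the range covered by this rule")


data = [10, 20, 30, 40]  # illustrative values
print(percentile(data, 50), percentile(data, 30))
```

Other percentile conventions interpolate between the two neighbours instead of averaging them, so results from statistical libraries may differ slightly.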