Organizing, Visualizing, and Describing Data Flashcards

1
Q

Continuous data

A
  • can take on any numerical value in a specified range of values
  • ex. future value
2
Q

Discrete data

A
  • can take on only a limited (countable) number of values

- ex. number of compounding periods per year: monthly = 12, quarterly = 4, etc.

3
Q

Numerical data (2)

A

AKA quantitative data

  • continuous
  • discrete
4
Q

Categorical data (2)

A

aka qualitative data
- describe a quality or characteristic of a group of observations

  • nominal data
  • ordinal data
5
Q

Nominal data

A
  • grouping names
  • cannot be organized in a logical order
  • ex. classifying stocks into different sectors, such as energy, information tech, etc
6
Q

Ordinal data

A
  • can be organized in a logical order or ranked
    ex. rating mutual funds by performance, from worst to best
  • there is an order, but the differences between ranks have no meaningful magnitude
7
Q

Time-series data

A
  • observations of one subject taken at specific, equally spaced intervals of time
    ex. quarterly returns of Apple, 2019-2020
8
Q

Cross-sectional data

A
  • observations of multiple subjects taken at a specific point in time
    ex. 2019 Q1 quarterly returns of a group of similar stocks
9
Q

Panel data

A
  • presented as a table
  • groups observations through time on one or more variables for multiple subjects
  • ex. quarterly returns for MSFT, Oracle, and HP from 2019 to 2020
10
Q

One-Dimensional array

A
  • one row (or column) of data

- holds a single variable, ex. the closing price of a stock on day x

11
Q

Two-dimensional array

A
  • consists of columns and rows to hold multiple variables and multiple observations
  • a firm’s quarterly revenue, EPS, and DPS for past two years
12
Q

Tree-map

A
  • graphical tool to display categorical data

13
Q

Arithmetic Mean

A
  • simple mean
  • the center of gravity of a data set
  • sensitive to extreme values (outliers)
  • appropriate for forecasting single period returns and expected returns
14
Q

Sample mean

A
  • arithmetic mean of a sample
  • $\bar{x}$ = sample mean
  • $\mu$ (mu) = population mean
15
Q

Winsorized mean

A
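  • a way of dealing with outliers

- a 95% winsorized mean replaces the bottom 2.5% of values with the 2.5th-percentile value and the top 2.5% with the 97.5th-percentile value (a trimmed mean would drop them instead)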
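
A minimal Python sketch of the difference between winsorizing and trimming. The function names, the `tail_fraction` parameter, and the sample values are hypothetical, and replacing a whole number of observations in each tail is an assumed simplification of the percentile-based definition.

```python
def winsorized_mean(values, tail_fraction):
    """Replace the k most extreme values in each tail with the nearest kept value."""
    data = sorted(values)
    n = len(data)
    k = int(n * tail_fraction)  # observations replaced in each tail
    low, high = data[k], data[n - k - 1]
    clipped = [min(max(x, low), high) for x in data]
    return sum(clipped) / n


def trimmed_mean(values, tail_fraction):
    """Drop the k most extreme values in each tail, then average the rest."""
    data = sorted(values)
    n = len(data)
    k = int(n * tail_fraction)
    kept = data[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical returns (in %) with one outlier in each tail
returns = [-50, 1, 2, 3, 4, 5, 6, 7, 10, 60]
print(winsorized_mean(returns, 0.10))  # 4.9  (an 80% winsorized mean)
print(trimmed_mean(returns, 0.10))     # 4.75 (an 80% trimmed mean)
```

Both versions limit the pull of -50 and 60, but the winsorized mean keeps all 10 observations while the trimmed mean averages only 8.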

16
Q

Median of even number of observations

A
  • n = 4
  • 3, 9, 10, 20: add the 2nd and 3rd (middle) values, then divide by 2
  • (9 + 10) / 2 = 9.5
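
The same rule as a short Python sketch. The function name `median` is only illustrative; it handles both odd and even n.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


print(median([3, 9, 10, 20]))  # 9.5
```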
17
Q

Geometric Mean

A
  • used to calculate the average return of an investment
  • represents the growth rate of an investment
  • represents the compound rate of return of an investment
  • appropriate to measure past performance over multiple periods
    $R_G = \left[(1 + r_1)(1 + r_2) \cdots (1 + r_n)\right]^{1/n} - 1$
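
A small Python sketch of the geometric mean return, assuming returns are given as decimals (0.10 = 10%). The function name and the two-period example are hypothetical.

```python
import math


def geometric_mean_return(returns):
    """Compound the (1 + r) growth factors, then take the n-th root and subtract 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# +100% followed by -50% leaves the investment where it started
returns = [1.0, -0.5]
print(geometric_mean_return(returns))  # 0.0
print(sum(returns) / len(returns))     # 0.25 (arithmetic mean overstates the growth)
```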
18
Q

Harmonic mean

A
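  • used to find average purchase price for equal periodic investments

$\bar{X}_H = \dfrac{n}{\sum_{i=1}^{n} (1 / x_i)}$

ex. equal investments over 3 years at prices of 10, 15, and 20 dollars: $3 / (1/10 + 1/15 + 1/20) \approx 13.85$ dollars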
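
The card's example can be checked with a few lines of Python. The function name is illustrative, and the prices are the ones from the card.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


prices = [10, 15, 20]  # purchase price in each of the 3 years
print(round(harmonic_mean(prices), 2))  # 13.85
```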

19
Q

Relationship of Geometric mean to the arithmetic mean

A

- the geometric mean is always less than or equal to the arithmetic mean; the two are equal only when all observations are identical
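
A quick check of the ordering using Python's standard `statistics` module. The sample values are hypothetical, and the harmonic mean is included only for comparison, since for positive values it is never larger than the geometric mean.

```python
import statistics

prices = [10, 15, 20]  # hypothetical positive values

arithmetic = statistics.mean(prices)
geometric = statistics.geometric_mean(prices)
harmonic = statistics.harmonic_mean(prices)

# Expected ordering for positive, non-identical values
print(harmonic <= geometric <= arithmetic)  # True
```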

20
Q

Quantiles:
quartiles, quintiles, deciles, percentiles
formula for the position of a percentile in a data set

A

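4 quartiles, 5 quintiles, 10 deciles, 100 percentiles

  • arrange data in ascending order (low to high)
    $L_y = (n + 1) \times \dfrac{y}{100}$, where $L_y$ is the position of the y-th percentile and n is the number of observations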
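
A sketch of the position formula in Python. Linear interpolation between neighbouring values when the position is not a whole number is an assumption here, as are the function name and the sample data.

```python
def percentile(values, y):
    """Value at position L_y = (n + 1) * y / 100 in the sorted data (1-based)."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * y / 100
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part, used for interpolation
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


data = [3, 9, 10, 20, 25, 30, 41]  # hypothetical observations, n = 7
q1 = percentile(data, 25)  # position 2 -> 9
q3 = percentile(data, 75)  # position 6 -> 30
print(q1, q3, q3 - q1)     # the last value is the interquartile range, 21
```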
21
Q

When to use each mean:

a. Arithmetic mean
b. Geometric mean
c. Weighted mean
d. Harmonic mean
e. Trimmed mean
f. Winsorized mean

A

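a. with single-period or cross-sectional data
b. with time-series data (compounded returns over multiple periods)
c. when different observations have different weights
d. to find the average purchase price for equal periodic investments
e. when the data has extreme outliers (the extremes are dropped)
f. when the data has extreme outliers (the extremes are replaced with less extreme values)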

22
Q

Interquartile range:

A
  • the difference between the third and first quartiles
23
Q

List the Measures of Dispersion

A
  • range
  • Mean Absolute Deviation
  • Variance (population, sample)
  • Standard Deviation (population, sample)
24
Q

Range formula

A

= max value - min value

25
Q

Mean Absolute Deviation formula

A

$MAD = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$

  • calculate the mean, subtract it from each value, total the absolute deviations, and divide by n
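
The steps above as a short Python sketch. The function name and the sample values are illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# mean = 10.5; absolute deviations = 7.5, 1.5, 0.5, 9.5
print(mean_absolute_deviation([3, 9, 10, 20]))  # 4.75
```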
26
Q

Variance

A
  • population: $\sigma^2$ (divide by n)
  • sample: $s^2$ (divide by n - 1)
  • “the average of the squared deviations around the mean”
  • use calculator function: 2nd, DATA, to solve
  • need to square either the sample or the population standard deviation
27
Q

Standard deviation

A
  • “the positive square root of the variance”
  • population: $\sigma$
  • sample: $s$
  • use calculator function: 2nd, DATA, to solve
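
As an alternative to the calculator, a minimal Python sketch of variance and standard deviation. The function names, the `sample` flag, and the data are hypothetical; the only difference between the two versions is the divisor (n - 1 for a sample, n for a population).

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def standard_deviation(values, sample=True):
    """Positive square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5, sum of squared deviations = 32
print(variance(data, sample=False))            # 4.0 (population)
print(standard_deviation(data, sample=False))  # 2.0 (population)
print(variance(data))                          # 32 / 7, the sample variance
```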
28
Q

Target deviation / target semideviation def

A
  • the risk of being below a given target

- only includes the observations below the target (B)

29
Q

Target deviation / target semideviation formula

A

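= square root of (sum of squared deviations below the target / (n - 1))

  • $s_{target} = \sqrt{\dfrac{\sum_{x_i \le B} (x_i - B)^2}{n - 1}}$, where n is the total number of observations
  • ex. if the target is 4%: find all observations below 4%, subtract the target from each and square the result, sum the squared deviations, divide by n - 1, then take the square root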
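
A Python sketch of the same calculation. The function name and the returns are hypothetical; note that n counts all observations, not just those below the target.

```python
import math


def target_semideviation(values, target):
    """Square root of the squared shortfalls below the target, divided by n - 1."""
    n = len(values)  # total number of observations
    squared_shortfalls = [(x - target) ** 2 for x in values if x <= target]
    return math.sqrt(sum(squared_shortfalls) / (n - 1))


returns = [2, 6, 1, 8, 3]  # hypothetical returns in %
# Below the 4% target: 2, 1, 3 -> squared shortfalls 4, 9, 1 -> sum 14
print(target_semideviation(returns, 4))  # sqrt(14 / 4), about 1.87
```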
30
Q

Coefficient of Variation def

A
  • expresses how much dispersion exists relative to the mean
  • used in investment analysis to compare relative risk
  • lower value is less risky
31
Q

Coefficient of Variation formula

A

$CV = \dfrac{s}{\bar{x}}$

  • sample standard deviation / sample mean
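
A short Python sketch using the standard `statistics` module, where `stdev` is the sample standard deviation. The function name and the two return series are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


fund_a = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical returns, mean 5
fund_b = [4, 5, 5, 5, 5, 5, 5, 6]  # hypothetical returns, mean 5, less spread

# The fund with the lower CV has less dispersion per unit of mean return
print(coefficient_of_variation(fund_a) > coefficient_of_variation(fund_b))  # True
```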
32
Q

Properties of a Normal Distribution

A
  • “symmetrical distribution”
  • mean = median = mode
  • completely described by the mean and variance
  • skewness = 0
  • Kurtosis = 3 (excess kurtosis = 0)
33
Q

Properties of a Positively skewed distribution

A
  • has a long tail on the right side, peak to the left
  • limited but frequent downside returns and unlimited but less frequent upside returns
  • ex. buying calls
  • mean > median > mode
  • positive skewness (>0)
  • visually, the Mode is at the peak, the median is to its right and down the slope, the mean is further down the slope to the right
34
Q

Properties of a Negatively skewed distribution

A
  • has a long tail on the left side, peak to the right
  • limited but frequent upside returns and unlimited but less frequent downside returns
  • ex. selling puts
  • mean < median < mode
  • negative skewness (<0)
  • visually, the Mode is at the peak, the median is to its left and down the slope, the mean is further down the slope to the left
35
Q

List the different types of kurtosis

A
  • Leptokurtic
  • Platykurtic
  • Mesokurtic
36
Q

Properties of Leptokurtic

A
  • fatter tails
  • more peaked
  • excess kurtosis > 0
  • k > 3
  • probability of extreme outcomes, including large losses, is higher
  • visually, the peak is higher, the sides come down more steeply, and the tails extend further out, so there is more data in the tails (fatter)
37
Q

Properties of Platykurtic

A
  • thinner tails
  • less peaked
  • excess kurtosis < 0
  • K < 3
  • visually the peak is lower
38
Q

Properties of Mesokurtic

A
  • same kurtosis as a normal distribution

- K = 3 (excess kurtosis = 0)
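
A rough Python sketch for classifying a data set by skewness and excess kurtosis. It assumes the simple moment-based (population) formulas; sample-adjusted versions used in practice differ slightly. The function name and data are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (kurtosis minus 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


skew, excess = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
# Symmetric data: skewness is 0; excess kurtosis is negative, so platykurtic
print(skew, excess)
```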

39
Q

Covariance def

A
  • a measure of how two variables move together
  • if positive, the 2 variables move up/down together
  • if negative, the 2 variables move in opposite directions
40
Q

Sample Covariance formula

A

$Cov_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

41
Q

Correlation def

A
  • a standardized measure of the linear relationship between two variables, with values ranging between -1 and +1
  • ie the strength of the relationship
42
Q

Sample Correlation formula

A

$r_{xy} = \dfrac{Cov_{xy}}{s_x \, s_y}$

cov of xy / (sample stdv of x * sample stdv of y)
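
A minimal Python sketch of sample covariance and sample correlation. The function names and the two series are hypothetical; the second series is exactly twice the first, so the correlation should come out at +1.

```python
import math


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(sample_covariance(xs, ys))   # 5.0
print(sample_correlation(xs, ys))  # 1.0, up to floating-point rounding
```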

43
Q

Spurious correlation

A
  • the correlation between two variables arising from their relation to a third variable
  • ex. shoe size and vocabulary of school children
  • the third variable is age