Chp2 Stats Flashcards

(51 cards)

1
Q

What is a random variable

A

It is a variable whose possible values are the numerical outcomes of a random phenomenon.

2
Q

Random variable examples

A

The outcome of tossing a coin; the number shown when rolling a die

3
Q

Two types of R.V.

A

Discrete, Continuous

4
Q

What do we assume about the observed data

A

It is a random sample drawn from X, where the observations xi are independent and identically distributed (i.i.d.).

5
Q

Discrete

A

Takes on a countable number of possible values

6
Q

Continuous

A

Takes on an uncountably infinite number of possible values: any value within a given range

7
Q

Probability Mass Function

A

For discrete variables, gives the probability that X takes each specific value: f(x) = P(X = x)

8
Q

Probability Density Function

A

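For continuous variables, gives the density at each value; the probability of an interval is the area under the curve over that interval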

9
Q

Kernel density estimation

A

A nonparametric technique that estimates a probability density function by smoothing the data points with a kernel
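
A minimal sketch of the idea, assuming a Gaussian kernel and a hand-picked bandwidth (the card names neither; `make_kde`, the sample values, and the bandwidth are hypothetical). Each data point contributes one small bump, and the estimate is the average of the bumps.

```python
import math

def make_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian-kernel density estimate."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Each sample contributes one Gaussian bump centered on itself.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

samples = [1.0, 1.5, 2.0, 4.0, 4.5]  # made-up data for illustration
density = make_kde(samples, bandwidth=0.5)

# A density should integrate to about 1; check with a simple Riemann sum
# over [-5, 10), which comfortably covers all the bumps.
step = 0.01
area = sum(density(-5 + i * step) for i in range(1500)) * step
print(round(area, 3))
```

A smaller bandwidth gives a bumpier estimate and a larger one gives a smoother estimate.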

10
Q

Measures of central tendency

A

Mean
Median
Mode

11
Q

Mean

A

Average of all data points

12
Q

Robustness

A

The tendency to not be affected by extreme values

13
Q

Is the mean robust?

A

No

14
Q

How to obtain a robust mean

A

Use a trimmed mean: the mean computed after the extreme values on either side are discarded
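
As a rough illustration, the sketch below drops a fixed fraction of points from each end before averaging. The function name `trimmed_mean`, the 10% default, and the data are assumptions made for the example.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the points from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # points dropped per side
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 3, 3, 4, 4, 5, 5, 6, 6, 1000]  # one extreme value
plain = sum(data) / len(data)             # pulled up by the 1000
trimmed = trimmed_mean(data, 0.1)         # drops the 2 and the 1000
print(plain, trimmed)
```

The single extreme value dominates the plain mean, while the trimmed mean stays near the bulk of the data.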

15
Q

Median

A

The middle value when the data points are arranged in order

16
Q

Is the median robust?

A

Yes

17
Q

Mode

A

The most frequently occurring value in the dataset

18
Q

Is mode a useful measure of central tendency?

A

Not always; the most frequent value is not necessarily near the center of the data

19
Q

When is robustness important?

A

When your data might contain anomalies or extreme values that could distort the overall analysis

20
Q

Measures of dispersion

A

Variance
Standard Deviation

21
Q

Variance

A

A measure of dispersion: how much the values of X deviate from the expected (mean) value of X, computed as the average squared deviation from the mean

22
Q

Sample standard deviation

A

The square root of the sample variance

23
Q

What does standard deviation tell you

A

It tells you, roughly, how far a typical data point lies from the mean. Taking the square root of the variance brings the number back to the same units as the data.
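
A small sketch of both quantities, assuming the usual n - 1 denominator for the sample variance (the cards do not state the formula). The function names and data are made up for illustration.

```python
import math

def sample_variance(xs):
    """Average squared deviation from the mean, with an n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Square root of the sample variance; same units as the data."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
var = sample_variance(data)      # squared deviations sum to 32, so 32 / 7
std = sample_std(data)
print(var, std)
```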

24
Q

bi-variate/multi-variate analysis

A

Considers multiple variables (attribute vectors) at once, as opposed to the single variable summarized by variance/std

25
What does bi-variate analysis try to understand
The association or dependence between X1 and X2
26
How to calculate mean and variance (first and second moment) in multivariate?
Same as in the univariate case, computed per attribute, so the result is a vector instead of a single value
27
How to get total variance for multivariates
Sum all individual variances in the output vector
28
Covariance
Measure of the association or linear dependence between two variables
29
How to summarize covariance information for n attributes
nxn covariance matrix
30
Main diagonal of the matrix
Holds the variance of each attribute (the covariance of a column with itself)
31
Is covariance matrix symmetric?
Yes
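
The covariance-matrix cards above can be sketched in plain Python as follows. This assumes the sample (n - 1) form of covariance; the helper names and the three made-up columns are illustrative only.

```python
def covariance(xs, ys):
    """Sample covariance between two equally long columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def covariance_matrix(columns):
    """n x n matrix whose (i, j) entry is cov(column i, column j)."""
    return [[covariance(a, b) for b in columns] for a in columns]

columns = [
    [1.0, 2.0, 3.0, 4.0],  # attribute 1
    [2.0, 4.0, 6.0, 9.0],  # attribute 2, rises with attribute 1
    [4.0, 3.0, 2.0, 1.0],  # attribute 3, falls as attribute 1 rises
]
cov = covariance_matrix(columns)

# Main diagonal: each attribute's variance. Their sum is the total variance.
total_variance = sum(cov[i][i] for i in range(len(cov)))

for row in cov:
    print([round(v, 3) for v in row])
print(round(total_variance, 3))
```

Entry (i, j) equals entry (j, i) because cov(X1, X2) = cov(X2, X1), which is why the matrix is symmetric.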
32
Correlation between two variables
The standardized covariance, obtained by dividing the covariance by the product of the two variables' standard deviations
33
Which is dimensionless, and which is in units obtained by multiplying the two variables?
Correlation is dimensionless. Covariance is in units obtained by multiplying the units of the two variables.
34
Range of covariance
(-inf, +inf)
35
Range of correlation
[-1, 1]
36
What does a correlation of 1 mean?
A perfect positive linear relationship: as one variable increases, the other increases proportionally
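
To make the correlation cards concrete, here is a minimal sketch of the standardized-covariance calculation. The function name and data are assumptions chosen for the example.

```python
import math

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # The 1 / (n - 1) factors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]                         # exact increasing line
r_up = correlation(xs, ys)                           # perfect positive
r_down = correlation(xs, [-y for y in ys])           # perfect negative
r_scaled = correlation([100 * x for x in xs], ys)    # change of units in x
print(r_up, r_down, r_scaled)
```

Rescaling x (for example meters to centimeters) changes the covariance but leaves the correlation unchanged, which is what "dimensionless" means in practice.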
37
Collinearity
Occurs when two variables are so highly correlated that one can be used to predict the other; that is, one variable is (nearly) a linear function of the other
38
Normal/Gaussian Distribution
Parameterized by mean and std. Mean = median = mode.
39
When std decreases, what happens to the normal/Gaussian distribution?
The curve becomes narrower and taller: steeper and more tightly concentrated around the mean
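
A quick numerical check of how the standard deviation shapes the curve, using the standard normal density formula. The function name and the two std values are arbitrary choices for the example.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Height at the mean: a smaller std gives a taller peak.
peak_wide = normal_pdf(0.0, mean=0.0, std=2.0)
peak_narrow = normal_pdf(0.0, mean=0.0, std=0.5)

# Height far from the mean: a smaller std gives thinner tails.
tail_wide = normal_pdf(3.0, mean=0.0, std=2.0)
tail_narrow = normal_pdf(3.0, mean=0.0, std=0.5)

print(peak_wide, peak_narrow)
print(tail_wide, tail_narrow)
```

The total area under the curve stays at 1, so squeezing the width pushes the peak upward.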
40
Binomial distribution
Parameterized by n (number of trials) and p (probability of success in each trial). Mean: np. Median: ⌊np⌋ or ⌈np⌉. Variance: np(1-p).
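
The mean and variance formulas can be checked directly from the probability mass function. This is a sketch with arbitrarily chosen n and p; `binomial_pmf` is an illustrative helper, and `math.comb` needs Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the PMF itself.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(sum(pmf), 6))                        # probabilities sum to 1
print(round(mean, 6), n * p)                     # compare with np
print(round(variance, 6), n * p * (1 - p))       # compare with np(1-p)
```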
41
Power-law distribution
Long-tailed distributions. Relationships where one quantity varies as a power of another. Hard to define.
42
Power law distribution example
The area of a square quadruples when its side length is doubled (area = length^2)
43
Visualization is
Important
44
XY plots
Scatter plots; a bird's-eye view of how your data is distributed
45
Boxplots
Box-and-whisker plots. Show the maximum, third quartile, median, first quartile, and minimum; points beyond the whiskers are plotted as outliers.
46
Short rectangle in box plot means
The middle 50% of the data (between the first and third quartiles) is tightly clustered
47
Long whiskers
High std and variance
48
Empirical cumulative distribution function
CDF(y) of a dataset X at a value y is the ratio of samples that are less than or equal to the value y.
49
What is CDF(X, 15) for X = [2, 7, 8, 9, 10, 15, 16, 20]?
CDF(X, 15) = 6/8 = 0.75
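
The worked example above can be reproduced with a few lines of Python. The function name `ecdf` is an assumption; the data is the card's own.

```python
def ecdf(data, y):
    """Fraction of samples in data that are less than or equal to y."""
    return sum(1 for x in data if x <= y) / len(data)

X = [2, 7, 8, 9, 10, 15, 16, 20]
result = ecdf(X, 15)  # six of the eight values are <= 15
print(result)         # 0.75
```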
50
CDF PDF relation
The PDF is the derivative of the CDF (equivalently, the CDF is the integral of the PDF)
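
A numerical sanity check of this relation, assuming a standard normal distribution as the example (the card does not name one). The helper names, the evaluation point, and the step size are arbitrary.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

def normal_pdf(x, mean=0.0, std=1.0):
    """PDF of a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def numeric_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
slope = numeric_derivative(normal_cdf, x)  # slope of the CDF at x
pdf_value = normal_pdf(x)                  # PDF evaluated at x
print(slope, pdf_value)
```

The two printed values agree up to the error of the finite-difference approximation.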
51