Statistics and Distributions Flashcards

(34 cards)

1
Q

Distributions

A
  • Representation of the way values tend to vary across a single attribute
  • Usually presented as a histogram
  • Answers questions such as: Where is the data concentrated? Which values are most likely? Which are less likely?
2
Q

Which single value best represents the data?

A

Central Tendency
The best measure (mean, median, or mode) depends on context
- On a histogram: determines where the data sits along the x-axis

3
Q

Mean

A

Arithmetic mean:
sum of values / number of values

4
Q

Median

A

Middle value of sorted data
- Resistant to outliers and skew
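
To see what "resistant" means in practice, here is a minimal Python sketch using the standard library's statistics module. The data values are made up for illustration. One value is replaced by an extreme one, and the mean and median are compared before and after.

```python
from statistics import mean, median

# Made-up data for illustration only.
values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # same data, but 5 replaced by an extreme value

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean moves from 3 to 22, while the median stays at 3.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```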

5
Q

Variability

A

How far does the data spread away from the mean?
Affects the width of the histogram

6
Q

Standard Deviation

A

Roughly the typical distance of a value from the mean
If we pick a random value from the data, how far should we expect it to be from the mean?
Precisely: the square root of the average squared deviation from the mean

sd = sqrt( sum((x - mu)^2) / N )
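
The formula above (dividing by N, the population form) can be checked with a short Python sketch. The helper name population_sd and the data are assumptions made for illustration. The result is compared against the standard library's statistics.pstdev.

```python
from math import sqrt
from statistics import pstdev

def population_sd(values):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data; its mean is 5
sd = population_sd(data)

# Squared deviations sum to 32, so sd = sqrt(32 / 8) = 2.
print(sd, pstdev(data))
```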

7
Q

Percentiles and Quartiles

A

25th Percentile : 1st Quartile
50th Percentile : 2nd Quartile
75th Percentile : 3rd Quartile

8
Q

IQR and Outliers

A

Interquartile Range : Q3-Q1
Lower/Upper Fences: [Q1 - (3/2) * IQR, Q3 + (3/2) * IQR]
Outlier: A value that falls outside of the fences.
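
A minimal Python sketch of the fence calculation, with the lower fence at Q1 - 1.5 * IQR and the upper fence at Q3 + 1.5 * IQR. The function name iqr_outliers and the data are made up for illustration. Quartile conventions differ between tools; this sketch assumes the "inclusive" method of statistics.quantiles, so other software may give slightly different quartiles.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

data = [2, 4, 5, 6, 7, 8, 9, 10, 40]  # made-up data
lower, upper, outliers = iqr_outliers(data)

# Here Q1 = 5 and Q3 = 9, so IQR = 4 and the fences are -1 and 15.
print(lower, upper, outliers)
```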

9
Q

Boxplots

A

Excellent tool to display and compare measures of variability

They display:
- Median
- IQR
- Fences
- Outliers
- Range

10
Q

Normal Distribution

A
  • Also called the Gaussian distribution or bell curve
  • Fundamental to statistics
  • Countless occurrences in nature
  • Has a number of useful properties
11
Q

Normal Distribution Properties

A
  1. Symmetric
    Mean = Median = Mode
  2. 68-95-99.7 Rule
    About 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
  3. Foundation of the Central Limit Theorem
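
The proportions of a normal distribution that lie within 1, 2, and 3 standard deviations of the mean can be checked with the standard library's statistics.NormalDist (Python 3.8+). This is a small sketch; the helper name share_within is an assumption for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Prints values close to 0.68, 0.95, and 0.997.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```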
12
Q

Random Experiment

A

A repeatable process whose outcome cannot be predicted with certainty in advance

13
Q

Outcome

A

The result of a single run (trial) of an experiment

14
Q

Sample Space

A

The set of all possible outcomes for an experiment

15
Q

Event

A

A subset of the sample space

16
Q

Probability

A

A number between 0 and 1 that measures how likely an event is to occur (0 = impossible, 1 = certain)

17
Q

Sample Space

A

A sample space of an experiment is the set of all possible outcomes
Ex: Sample Space of a single die roll is: {1, 2, 3, 4, 5, 6}

18
Q

Event

A

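An event, usually denoted by a single capital letter, is a subset of the sample space.
Ex: If you roll two dice, each outcome is a pair such as (1,1), (1,2), (2,1), (1,6), or (6,6). Any set of such pairs is an event:
- A = "the sum is 7" = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}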

19
Q

Probability

A

For a single event A, when all outcomes are equally likely, the probability of A occurring, P(A), is defined as:

P(A) = number of outcomes in which A occurs / total number of possible outcomes
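
As a small worked example of this counting definition, the sketch below enumerates the sample space for two fair six-sided dice and counts the outcomes in one event. The choice of event ("the sum is 7") and the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two six-sided dice: all 36 ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]

# P(A) = outcomes in A / total outcomes, assuming every pair is equally likely.
p_a = Fraction(len(event_a), len(sample_space))
print(p_a)  # 1/6
```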

20
Q

Addition Rule

A

The addition rule states:
P(A or B) = P(A) + P(B) - P(A and B)
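
The rule P(A or B) = P(A) + P(B) - P(A and B) can be verified by direct counting on a single fair die. In this sketch the events A (even) and B (greater than 3) and the helper prob are made up for illustration. Subtracting P(A and B) avoids double-counting the outcomes 4 and 6, which belong to both events.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one fair die

def prob(event):
    """Probability of an event (a set of outcomes), assuming equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

a = {2, 4, 6}  # A: the roll is even
b = {4, 5, 6}  # B: the roll is greater than 3

direct = prob(a | b)                          # count outcomes in A or B
by_rule = prob(a) + prob(b) - prob(a & b)     # addition rule
print(direct, by_rule)  # 2/3 2/3
```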

21
Q

Multiplication Rule

A

Two events are said to be independent if the outcome of one does not depend on the outcome of the other. Otherwise, they are dependent.

The multiplication rule states:
P(A and B) = P(A) * P(B, given that A occurred) = P(A) * P(B|A)

For independent events, this is simply:
P(A and B) = P(A) * P(B)
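
Two short worked examples, sketched in Python with exact fractions. The card and coin scenarios are assumptions chosen for illustration: drawing without replacement makes the second draw depend on the first, while separate coin flips are independent.

```python
from fractions import Fraction

# Dependent events: two aces in a row, drawn without replacement from 52 cards.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221

# Independent events: heads on each of two fair coin flips.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_two_heads)  # 1/4
```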

22
Q

Complements

A

P(A) + P(not A) = 1
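
Complements are handy when "not A" is easier to compute than A. A minimal sketch, assuming a fair die rolled four times with independent rolls: the chance of at least one six is 1 minus the chance of no sixes at all.

```python
from fractions import Fraction

p_no_six_one_roll = Fraction(5, 6)
p_no_six_four_rolls = p_no_six_one_roll ** 4   # independent rolls multiply

# P(at least one six) = 1 - P(no six in four rolls)
p_at_least_one_six = 1 - p_no_six_four_rolls
print(p_at_least_one_six)  # 671/1296
```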

23
Q

Deterministic Sampling

A

Rather than randomizing, you choose the sample by a fixed rule, e.g. taking the first people who walk by. This risks a biased sample.

24
Q

Uniform Random Sampling

A

Every member of the population has the same chance of being chosen, e.g. use software to number everyone and pick n of them at random

25
Random Sampling
Select members of the population by chance, so that the sampler's choices do not decide who is included
26
From random sampling, what do we know about the sample mean?
The sample mean is the mean of the data sampled, and approximates the true (population) mean.
27
Probability Distribution
The theoretical likelihood of each possible outcome, calculated without simulating or conducting the experiment
28
Empirical Distribution
The proportion of times each value is observed in a simulation or experiment, relative to the total number of observations
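
A minimal sketch contrasting an empirical distribution with the theoretical one for a fair die. The seed, the number of rolls, and the variable names are arbitrary choices made for illustration. Each empirical proportion is a count divided by the total number of rolls.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rolls = 6000
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]

counts = Counter(rolls)
# Empirical distribution: observed count / total number of observations.
empirical = {face: counts[face] / n_rolls for face in range(1, 7)}
# Probability distribution: calculated, no experiment needed.
theoretical = {face: 1 / 6 for face in range(1, 7)}

for face in range(1, 7):
    print(face, round(empirical[face], 3), round(theoretical[face], 3))
```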
29
Law of Large Numbers
As the sample size grows larger, the sample represents the population more accurately: the sample mean tends toward the population mean
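
The effect can be seen in a small simulation. This is a sketch assuming a fair six-sided die, whose true mean is 3.5. The seed, sample sizes, and function name are arbitrary illustrative choices.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def mean_of_rolls(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# Larger samples tend to land closer to the true mean of 3.5.
for n in (10, 1_000, 100_000):
    print(n, mean_of_rolls(n))
```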
30
Statistic
A number calculated from a sample that describes a characteristic of that sample
31
Parameter
A value that describes a characteristic of a population; it is usually unknown and is estimated by a statistic
32
Statistical Inference
A conclusion about a population drawn from the data in one or more random samples.
33
Central Limit Theorem
This theorem states: For sufficiently large samples, the distribution of the sample means will approximate a normal distribution, regardless of the shape of the distribution sampled from (provided it has finite variance).
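
A simulation sketch of the theorem, assuming a strongly skewed population: an exponential distribution with mean 1 and standard deviation 1. The seed, sample size, and number of samples are arbitrary illustrative choices. If the sample means are roughly normal, they should center near 1, have a spread near 1 / sqrt(50), and about 68% of them should fall within one standard deviation of their center.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable
sample_size = 50
n_samples = 2000

def sample_mean():
    """Mean of one sample drawn from a skewed (exponential) population."""
    return mean([rng.expovariate(1.0) for _ in range(sample_size)])

sample_means = [sample_mean() for _ in range(n_samples)]

center = mean(sample_means)
spread = stdev(sample_means)
# Share of sample means within one standard deviation of their center.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / n_samples

print(center, spread, within_one_sd)
```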
34