Stats for Data Science Flashcards

(60 cards)

1
Q

Descriptive Analytics

A

Leveraging historical data to determine “What” happened.

2
Q

Predictive Analytics

A

Leveraging historical data to determine “What will” happen.

3
Q

Prescriptive Analytics

A

Using the insights gained from predictive analytics to determine “What will we do”.

4
Q

Probability

A

The measure of the likelihood that an event will occur based on a random experiment.

5
Q

Complement

A

P(A) + P(A’) = 1

6
Q

Intersection

A

The set of all elements that are members of both A and B. P(A∩B) = P(A)P(B) holds only when A and B are independent; in general, P(A∩B) = P(A)P(B|A).

7
Q

Union

A

P(A∪B) = P(A) + P(B) − P(A∩B). The set of all elements that are members of A, of B, or of both.
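
The complement, intersection and union rules can be checked by counting outcomes. The sketch below assumes a single roll of a fair six-sided die; the events `A` and `B` and the helper `prob` are illustrative choices, not part of the original cards. For these two events P(A∩B) differs from P(A)P(B), because the product rule applies only to independent events.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is greater than 3

def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event), len(omega))

p_complement = prob(omega - A)  # P(A'), everything in omega that is not in A
p_intersection = prob(A & B)    # P(A ∩ B), outcomes in both A and B
p_union = prob(A | B)           # P(A ∪ B), outcomes in A, in B, or in both

# Complement rule: P(A) + P(A') = 1
assert prob(A) + p_complement == 1
# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert p_union == prob(A) + prob(B) - p_intersection
```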

8
Q

Conditional Probability

A

P(A|B) is the probability that event A occurs given that event B has occurred: P(A|B) = P(A∩B) / P(B), for P(B) > 0.

9
Q

Independent Events

A

Two events are independent if the occurrence of one does not affect the probability of occurrence of the other.
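
One way to test independence is to compare P(A|B) with P(A), or equivalently P(A∩B) with P(A)P(B). The sketch below again assumes a fair six-sided die; the events and the helper names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative assumption)
A = {2, 4, 6}               # event: the roll is even
B = {1, 2, 3, 4}            # event: the roll is at most 4

def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event), len(omega))

def conditional(a, b):
    """P(a | b) = P(a ∩ b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)

p_a_given_b = conditional(A, B)

# A and B are independent when knowing B does not change the probability of A.
independent = p_a_given_b == prob(A)
```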

10
Q

Mutually Exclusive Events

A

Two events are mutually exclusive if they cannot both occur at the same time.

11
Q

Bayes’ Theorem

A

Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.
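
In symbols, P(H|E) = P(E|H)P(H) / P(E). A minimal sketch of the calculation follows, using made-up numbers for a diagnostic test; the function name `bayes_posterior` and all three input values are assumptions chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: 1% prior, 95% true-positive rate, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

With these inputs the posterior stays well below the 95% true-positive rate, because the low prior dominates.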

12
Q

Mean

A

The arithmetic average of the dataset: the sum of the values divided by the number of values.

13
Q

Median

A

The middle value of an ordered dataset.

14
Q

Mode

A

The most frequent value in the dataset.
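
The three measures of central tendency can be compared on one small dataset. This sketch uses Python's standard `statistics` module; the values in `data` are invented for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # illustrative values, already sorted

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # average of the two middle values (even count)
mode = statistics.mode(data)      # the value that occurs most often
```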

15
Q

Skewness

A

A measure of the asymmetry of a distribution around its mean.

16
Q

Kurtosis

A

A measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.

17
Q

Range

A

The difference between the highest and lowest value in the dataset.

18
Q

Interquartile Range

A

IQR = Q3−Q1

19
Q

Variance

A

The average squared difference of the values from the mean; it measures how spread out the data are relative to the mean.

20
Q

Standard Deviation

A

The square root of the variance; it measures the typical distance between each data point and the mean, in the same units as the data.

21
Q

Standard Error

A

An estimate of the standard deviation of the sampling distribution of a statistic, most often the sample mean.
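
The measures of spread (range, interquartile range, variance, standard deviation and standard error) can be computed side by side. This is a sketch using the standard `statistics` module on invented data; quartile values depend on the interpolation method, and `statistics.quantiles` uses its default "exclusive" method here.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

data_range = max(data) - min(data)            # highest minus lowest value

q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1                                 # IQR = Q3 − Q1

pop_variance = statistics.pvariance(data)     # divides by n (population)
pop_std_dev = statistics.pstdev(data)         # square root of pop_variance
sample_std_dev = statistics.stdev(data)       # divides by n − 1 (sample)

# Standard error of the mean: sample standard deviation over sqrt(n).
std_error = sample_std_dev / math.sqrt(len(data))
```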

22
Q

Causality

A

A relationship between two events where one event causes or directly affects the other.

23
Q

Covariance

A

A quantitative measure of the joint variability of two variables.

24
Q

Correlation

A

A measure of the strength and direction of the linear relationship between two variables; it ranges from -1 to 1 and is the normalized version of covariance.
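
The link between covariance and correlation can be made concrete: correlation is covariance divided by the product of the two standard deviations. The sketch below writes both from scratch using the sample (n − 1) convention; the function names and data are illustrative assumptions.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n − 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    # covariance(x, x) is the sample variance of x.
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # y is exactly 2 * x, a perfect positive linear relationship

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
```

Rescaling `y` changes the covariance but leaves the correlation unchanged, which is why correlation is easier to compare across datasets.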

25
Probability Mass Function
A function that gives the probability that a discrete random variable is exactly equal to some value.
26
Probability Density Function
A function for continuous data where the value at any given sample can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
27
Cumulative Distribution Function
A function that gives the probability that a random variable is less than or equal to a certain value.
28
Uniform Distribution
Also called a rectangular distribution; a probability distribution in which all outcomes are equally likely.
29
Normal/Gaussian Distribution
A bell-shaped, symmetrical distribution. It is related to the Central Limit Theorem, which states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger.
30
Central Limit Theorem
The sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution (provided it has finite variance).
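
A small simulation can illustrate the theorem. The sketch below draws repeated samples from a skewed Exponential(rate = 1) population, which has mean 1 and standard deviation 1, and looks at the distribution of the sample means. The sample size, trial count, seed and function name are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, trials=2000):
    """Means of `trials` samples of size n from an Exponential(rate=1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

means = sample_means(n=50)
center = statistics.fmean(means)  # expected to be close to the population mean, 1
spread = statistics.stdev(means)  # expected to be close to 1 / sqrt(50)
```

The spread of the sample means shrinks roughly as 1 / sqrt(n), which is the standard error of the mean.
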
31
Exponential Distribution
A probability distribution of the time between the events in a Poisson point process.
32
Chi-Squared Distribution
The distribution of the sum of the squares of k independent standard normal random variables, where k is the degrees of freedom.
33
Bernoulli Distribution
The distribution of a random variable from a single trial with only 2 possible outcomes, namely 1 (success) with probability p and 0 (failure) with probability (1-p).
34
Binomial Distribution
The distribution of the number of successes in a sequence of n independent trials, each with only 2 possible outcomes, namely 1 (success) with probability p and 0 (failure) with probability (1-p).
35
Poisson Distribution
The distribution that expresses the probability of a given number of events k occurring in a fixed interval of time if these events occur with a known constant average rate λ and independently of the time since the last event.
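
The Bernoulli, Binomial and Poisson probability mass functions can be written directly from their formulas. This is a sketch with illustrative function names and parameter values; the Bernoulli case is simply the Binomial with n = 1.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for events arriving at a constant average rate lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_five_heads = binomial_pmf(5, 10, 0.5)  # exactly 5 heads in 10 fair coin flips
p_bernoulli = binomial_pmf(1, 1, 0.3)    # Bernoulli success with p = 0.3
p_no_events = poisson_pmf(0, 2.0)        # no events when the average rate is 2
```
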
36
Null Hypothesis
A general statement that there is no relationship between two measured phenomena or no association among groups.
37
Alternative Hypothesis
A statement that contradicts the null hypothesis, typically that there is a relationship or an effect.
38
Type 1 Error
The rejection of a true null hypothesis (a false positive).
39
Type 2 Error
The non-rejection of a false null hypothesis (a false negative).
40
P-Value
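The probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained. When p-value > α, we fail to reject the null hypothesis; when p-value ≤ α, we reject the null hypothesis and conclude that the result is statistically significant.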
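
The decision rule can be sketched for a two-sided one-sample z-test, assuming the population standard deviation is known. The function name and every input value below are hypothetical; `statistics.NormalDist` comes from the Python standard library.

```python
import math
from statistics import NormalDist

def two_sided_p_value(sample_mean, mu0, sigma, n):
    """z statistic and two-sided p-value for H0: population mean equals mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a result at least this extreme in either tail under H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical study: n = 100, sample mean 103, H0 mean 100, known sigma 15.
z, p = two_sided_p_value(sample_mean=103, mu0=100, sigma=15, n=100)

alpha = 0.05
reject_null = p <= alpha  # reject H0 only when the p-value is at most alpha
```
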
41
Critical Value
A point on the scale of the test statistic beyond which we reject the null hypothesis; it is derived from the significance level α of the test.
42
Significance Level & Rejection Region
The rejection region depends on the significance level. The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is true.
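
For a two-sided z-test, the critical value follows from α through the inverse of the standard normal CDF, and the rejection region is everything beyond it in either tail. The sketch below assumes α = 0.05; the helper `in_rejection_region` is an illustrative name.

```python
from statistics import NormalDist

alpha = 0.05  # significance level (a common but arbitrary choice)

# Two-sided test: alpha is split evenly between the two tails.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z, critical):
    """True when the test statistic lies beyond the critical value in either tail."""
    return abs(z) > critical
```
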
43
Z-Score
The distance from the sample’s mean to an individual data point, expressed in units of standard deviation. (Used with a large sample size.)
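
As a formula, z = (x − mean) / standard deviation. A minimal sketch on invented data follows; the function name `z_score` is an assumption, and the population standard deviation is used for simplicity.

```python
import statistics

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (−) the mean."""
    return (x - mean) / std_dev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
mu = statistics.fmean(data)      # 5.0
sigma = statistics.pstdev(data)  # population standard deviation, 2.0

z = z_score(9, mu, sigma)        # the largest value sits 2 standard deviations above the mean
```
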
44
T-Score
The test statistic of a T-test, the statistical test used when the population variance is unknown and the sample size is not large (n < 30).
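
For a one-sample test, the statistic is t = (sample mean − μ0) / (s / sqrt(n)) with n − 1 degrees of freedom, where s is the sample standard deviation. The sketch below uses invented data and a hypothetical null value `mu0`; the function name is illustrative.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic and its degrees of freedom for H0: mean equals mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n − 1)
    return (mean - mu0) / (s / math.sqrt(n)), n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # small illustrative sample, n < 30
t, dof = t_statistic(sample, mu0=4)
```

The resulting `t` would then be compared with a critical value from the t distribution with `dof` degrees of freedom.
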
45
Paired Sample
Data collected twice from the same group, person, item or thing.
46
Independent Sample
Two samples whose observations are unrelated to each other, typically drawn from two different groups.
47
1-Way ANOVA
Compares the means of two or more independent groups using only one independent variable.
48
2-Way ANOVA
The extension of one-way ANOVA that uses two independent variables to calculate main effects and the interaction effect.
49
Chi-Square Goodness of Fit Test
Determines whether a sample matches a population by fitting one categorical variable to a distribution.
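
The test statistic is the sum over categories of (observed − expected)² / expected. The sketch below tests invented die-roll counts against a fair-die expectation; the counts and the function name are assumptions, and only the statistic and degrees of freedom are computed, not the p-value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed − expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die, one count per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # fair die: 10 per face

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # categories minus one
```

A small statistic relative to the chi-squared distribution with `dof` degrees of freedom means the sample is consistent with the expected distribution.
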
50
Chi-Square Test for Independence
Compares two categorical variables to see if there is a relationship between them.
51
Linear Regression
A linear approach to modeling the relationship between a dependent variable and one independent variable.
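
With one independent variable, the least-squares line y = intercept + slope · x has a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x. The sketch below uses illustrative data that lie exactly on a line; `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept

x = [1, 2, 3, 4, 5]   # independent variable
y = [3, 5, 7, 9, 11]  # dependent variable, here exactly y = 2x + 1

slope, intercept = fit_line(x, y)  # slope 2.0, intercept 1.0
```
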
52
Independent Variable
The variable that is controlled in a scientific experiment to test the effects on the dependent variable.
53
Dependent Variable
The variable being measured in a scientific experiment.
54
Multiple Linear Regression
A linear approach to modeling the relationship between a dependent variable and two or more independent variables.
55
Linear Regression: Step#1
Understand the model description, causality and directionality
56
Linear Regression: Step#2
Check the data: categorical data, missing data and outliers.
57
Linear Regression: Step#3
Simple Analysis: check the effect of each independent variable on the dependent variable, and of the independent variables on each other.
58
Linear Regression: Step#4
Multiple Linear Regression: check the model and confirm that the correct variables are included.
59
Linear Regression: Step#5
Residual Analysis: check that the residuals are normally distributed.
60
Linear Regression: Step#6
Interpretation of Regression Output: R-Squared is a statistical measure of fit that indicates how much of the variation in the dependent variable is explained by the independent variables; a higher R-Squared value means smaller differences between the observed data and the fitted values. Also review the P-Values and the Regression Equation.
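
R-Squared can be computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. The sketch below uses invented observations and a hypothetical helper `r_squared`; it shows the two reference points, a perfect fit and a model that always predicts the mean.

```python
def r_squared(observed, fitted):
    """Share of the variation in the observed values explained by the fitted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # unexplained
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)           # total
    return 1 - ss_res / ss_tot

observed = [3, 5, 7, 9, 11]  # illustrative observations with mean 7

perfect = r_squared(observed, observed)               # 1.0: fitted values match exactly
baseline = r_squared(observed, [7] * len(observed))   # 0.0: always predicting the mean
```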