Probability Basics Flashcards
(45 cards)
What is Independent
each event is not affected by other events
Probability of an event happening =
Number of ways it can happen / Total number of outcomes (assuming all outcomes are equally likely)
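Below is a minimal Python sketch of this card. It assumes every outcome is equally likely. The helper name `classical_probability` and the die example are illustrative, not from the card.

```python
from fractions import Fraction


def classical_probability(favorable: int, total: int) -> Fraction:
    """Number of ways the event can happen / total number of equally likely outcomes."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= favorable <= total:
        raise ValueError("favorable must be between 0 and total")
    return Fraction(favorable, total)


# Rolling an even number on a fair six-sided die
die = range(1, 7)
evens = [face for face in die if face % 2 == 0]  # 2, 4, 6
print(classical_probability(len(evens), len(die)))  # 1/2
```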
What is Dependent
also called “Conditional”, where an event is affected by other events
P(A and B) = P(A) × P(B|A)
“The probability of event A and event B equals the probability of event A times the probability of event B given event A”
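To check the multiplication rule for dependent events numerically, here is a small sketch using a hypothetical bag of 2 blue and 3 red marbles drawn without replacement. It multiplies P(A) by P(B|A) and compares the result with a brute-force count over every ordered pair of draws.

```python
from fractions import Fraction
from itertools import permutations

bag = ["blue", "blue", "red", "red", "red"]

# Rule: P(A and B) = P(A) * P(B|A), drawing without replacement
p_a = Fraction(2, 5)          # first marble is blue
p_b_given_a = Fraction(1, 4)  # second is blue, given one blue is already gone
p_rule = p_a * p_b_given_a

# Brute force: every ordered pair of distinct marbles is equally likely
pairs = list(permutations(range(len(bag)), 2))
both_blue = sum(1 for i, j in pairs if bag[i] == bag[j] == "blue")
p_count = Fraction(both_blue, len(pairs))

print(p_rule, p_count)  # 1/10 1/10
```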
What is Mutually Exclusive
events can’t happen at the same time
Permutations with Repetition
n^r
where n is the number of things to choose from,
and we choose r of them,
repetition is allowed,
and order matters.
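A quick sketch checking n^r against a brute-force enumeration with `itertools.product`. The 3-wheel lock (10 digits per wheel) and the helper name `permutations_with_repetition` are illustrative assumptions.

```python
from itertools import product


def permutations_with_repetition(n: int, r: int) -> int:
    """Ordered selections of r items from n things, repetition allowed: n^r."""
    return n ** r


# A lock with 3 wheels, each showing a digit 0-9: n = 10, r = 3
n, r = 10, 3
brute_force = sum(1 for _ in product(range(n), repeat=r))
print(permutations_with_repetition(n, r), brute_force)  # 1000 1000
```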
Combination
When the order doesn’t matter
Permutation
When the order does matter
Permutations without Repetition
n! / (n − r)!
where n is the number of things to choose from,
and we choose r of them,
no repetitions,
order matters.
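The formula can be checked in Python: a direct factorial version, the standard library's `math.perm` (Python 3.8+), and a brute-force count should all agree. The "first three of 16 pool balls" example and the helper name are illustrative.

```python
import math
from itertools import permutations


def permutations_without_repetition(n: int, r: int) -> int:
    """n! / (n - r)!: ordered selections of r distinct items from n."""
    return math.factorial(n) // math.factorial(n - r)


n, r = 16, 3  # e.g. the first three of 16 pool balls, order matters
print(permutations_without_repetition(n, r))      # 3360
print(math.perm(n, r))                            # 3360
print(sum(1 for _ in permutations(range(n), r)))  # 3360
```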
Combinations without Repetition
n!/(r!(n-r)!)
It is often called "n choose r" where n is the number of things to choose from, and we choose r of them, no repetition, order doesn't matter.
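A sketch of "n choose r" three ways: from the factorial formula, with `math.comb` (Python 3.8+), and by counting `itertools.combinations`. The helper name `n_choose_r` is hypothetical. Note that the result is the ordered count n!/(n − r)! divided by r!, because each unordered group can be arranged in r! orders.

```python
import math
from itertools import combinations


def n_choose_r(n: int, r: int) -> int:
    """n! / (r! (n - r)!): unordered selections of r distinct items from n."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


n, r = 16, 3
print(n_choose_r(n, r))                           # 560
print(math.comb(n, r))                            # 560
print(sum(1 for _ in combinations(range(n), r)))  # 560
```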
Combinations with Repetition
(r+n-1)!/ (r!(n-1)!)
where n is the number of things to choose from,
and we choose r of them,
repetition allowed,
order doesn’t matter.
This is the same as a combination without repetition of r items chosen from r + n − 1 things, i.e. “(r + n − 1) choose r”.
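A short sketch confirming that (r + n − 1)!/(r!(n − 1)!) equals "(r + n − 1) choose r" and matches a brute-force count with `itertools.combinations_with_replacement`. The ice-cream example (3 scoops from 5 flavors) and the helper name are illustrative.

```python
import math
from itertools import combinations_with_replacement


def combinations_with_repetition(n: int, r: int) -> int:
    """(r + n - 1)! / (r! (n - 1)!): unordered selections of r items from n, repeats allowed."""
    return math.factorial(r + n - 1) // (math.factorial(r) * math.factorial(n - 1))


n, r = 5, 3  # e.g. 3 scoops from 5 ice-cream flavors, repeats allowed
print(combinations_with_repetition(n, r))  # 35
print(math.comb(r + n - 1, r))             # 35, i.e. "(r + n - 1) choose r"
print(sum(1 for _ in combinations_with_replacement(range(n), r)))  # 35
```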
Standard Deviation definition (not formula!!)
The formula is easy: it is the square root of the Variance. So now you ask, “What is the Variance?”
The average of the squared differences from the Mean.
Var(X) = Σx^2p − μ^2
To calculate the Variance:
square each value and multiply by its probability
sum them up and we get Σx^2p
then subtract the square of the Expected Value μ^2
μ = expected value = Σxp
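The steps above translate directly into code. This is a minimal sketch for a discrete distribution given as value/probability pairs; it uses `Fraction` so the fair-die result stays exact. The function name `discrete_mean_variance` is hypothetical, and the square root at the end is the standard deviation.

```python
import math
from fractions import Fraction


def discrete_mean_variance(values, probs):
    """Return (mu, Var(X)) for a discrete distribution, using Var(X) = sum(x^2 p) - mu^2."""
    values, probs = list(values), list(probs)
    if len(values) != len(probs):
        raise ValueError("need one probability per value")
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))           # mu = sum(x p)
    sum_x2p = sum(x * x * p for x, p in zip(values, probs))  # sum(x^2 p)
    return mu, sum_x2p - mu * mu


# A fair six-sided die
mu, var = discrete_mean_variance(range(1, 7), [Fraction(1, 6)] * 6)
print(mu, var)         # 7/2 35/12
print(math.sqrt(var))  # standard deviation, about 1.708
```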
Formulas for the “Population Standard Deviation” and the “Sample Standard Deviation”:
Population Standard Deviation
σ = square root of [ (1/N) Σ (x − μ)^2 ]
Sample Standard Deviation
s = square root of [ (1/(N − 1)) Σ (x − x̄)^2 ]
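A sketch of both formulas side by side, checked against Python's standard-library `statistics.pstdev` (population, divides by N) and `statistics.stdev` (sample, divides by N − 1). The data set and the helper names `population_sd` and `sample_sd` are illustrative.

```python
import math
import statistics


def population_sd(data):
    """Square root of (1/N) * sum((x - mu)^2)."""
    n = len(data)
    mu = sum(data) / n
    return math.sqrt(sum((x - mu) ** 2 for x in data) / n)


def sample_sd(data):
    """Square root of (1/(N - 1)) * sum((x - x_bar)^2)."""
    n = len(data)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least 2 values")
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), statistics.pstdev(data))  # 2.0 2.0
print(sample_sd(data), statistics.stdev(data))       # both about 2.138
```

The sample version is slightly larger because dividing by N − 1 corrects for measuring spread around the sample mean instead of the true mean.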
Weighted mean standard deviation
σ = square root of [ Σx^2p − μ^2 ]
μ = expected value = Σxp
What is the “Bell Curve”, or Normal Distribution?
The Normal Distribution has:
1. mean = median = mode
2. symmetry about the center
3. 50% of values less than the mean and 50% greater than the mean
4. Standard deviations:
about 68% of values are within 1 standard deviation of the mean
about 95% of values are within 2 standard deviations of the mean
about 99.7% of values are within 3 standard deviations of the mean
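The 68-95-99.7 rule can be reproduced with the standard library's `statistics.NormalDist` (Python 3.8+). This sketch takes the area under the standard normal curve between −k and +k standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # Share of values within k standard deviations of the mean
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```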
“Standard Score”, “sigma” or “z-score”.
The number of standard deviations from the mean
z = (x − μ) / σ
z is the “z-score” (Standard Score)
x is the value to be standardized
μ (“mu”) is the mean
σ (“sigma”) is the standard deviation
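A minimal z-score sketch with hypothetical numbers (mean 20, standard deviation 4). The helper `z_score` is illustrative. The last line uses the standard library's `NormalDist.zscore`, which exists from Python 3.9 and computes the same (x − μ)/σ.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


print(z_score(26, 20, 4))            # 1.5
print(z_score(14, 20, 4))            # -1.5
print(NormalDist(20, 4).zscore(26))  # 1.5
```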
Correlation
When two sets of data are strongly linked together we say they have a High Correlation.
Correlation can have a value between −1 and 1:
1 is a perfect positive correlation
0 is no correlation (the values don’t seem linked at all)
-1 is a perfect negative correlation
“Correlation Is Not Causation” - 4 reasons why
What it really means is that a correlation does not prove one thing causes the other:
One thing might cause the other
The other might cause the first to happen (reverse causation)
They may be linked by a different thing (a hidden third variable)
Or it could be random chance (a spurious correlation)
Pearson’s Correlation formula
Step 1: Find the mean of x, and the mean of y
Step 2: Subtract the mean of x from every x value (call them “a”), and subtract the mean of y from every y value (call them “b”)
Step 3: Calculate ab, a^2 and b^2 for every value
Step 4: Sum up ab, sum up a^2 and sum up b^2
Step 5: Divide the sum of ab by the square root of [(sum of a^2) × (sum of b^2)]
r = ( nΣxy − ΣxΣy ) / square root of [ (nΣx^2 − (Σx)^2) × (nΣy^2 − (Σy)^2) ]
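A sketch implementing both versions of Pearson's r: the five steps (deviations a and b from the means) and the single formula built from raw sums. They should agree. The data and the function names `pearson_steps` and `pearson_sums` are illustrative, and the sketch assumes neither list is constant (otherwise the denominator is zero). Python 3.10+ also offers `statistics.correlation` for the same quantity.

```python
import math


def pearson_steps(xs, ys):
    """Pearson's r via Steps 1-5: sum(ab) / sqrt(sum(a^2) * sum(b^2))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n     # Step 1
    a = [x - mean_x for x in xs]                   # Step 2
    b = [y - mean_y for y in ys]
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))  # Steps 3 and 4
    sum_a2 = sum(ai * ai for ai in a)
    sum_b2 = sum(bi * bi for bi in b)
    return sum_ab / math.sqrt(sum_a2 * sum_b2)     # Step 5


def pearson_sums(xs, ys):
    """Pearson's r from raw sums: (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2)(n*Sy2 - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson_steps(xs, ys))  # about 0.775
print(pearson_sums(xs, ys))   # same value
```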
Bayes Theorem
Mnemonic: write “AB AB AB”, then group it as “AB = A * BA / B”
P(A|B) = P(A) * P(B|A) / P(B)
Which tells us: how often A happens given that B happens, written P(A|B),
When we know: how often B happens given that A happens, written P(B|A)
and how likely A is on its own, written P(A)
and how likely B is on its own, written P(B)
Bayes Theorem applied to false positives and false negatives
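P(A|B) = P(A)P(B|A) / ( P(A)P(B|A) + P(not A)P(B|not A) )
With A = “has the condition” and B = “tests positive”, this equals TP / (TP + FP): true positives divided by all positives (true positives plus false positives).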
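A worked sketch of the false-positive case, using hypothetical test figures (1% prevalence, 90% of sick people test positive, 5% of healthy people test positive). It evaluates the formula exactly with `Fraction`, then confirms it equals TP / (TP + FP) by counting a hypothetical population of 10,000. The function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(A)P(B|A) / (P(A)P(B|A) + P(not A)P(B|not A))."""
    numerator = p_a * p_b_given_a
    return numerator / (numerator + (1 - p_a) * p_b_given_not_a)


# Hypothetical test: 1% prevalence, 90% detection rate, 5% false-positive rate
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(p, round(float(p), 4))  # 2/13 0.1538

# Same answer by counting a population of 10,000 people
population = 10_000
sick = population * Fraction(1, 100)               # 100
true_pos = sick * Fraction(9, 10)                  # 90
false_pos = (population - sick) * Fraction(1, 20)  # 495
print(true_pos / (true_pos + false_pos))           # 2/13
```

Even with a fairly accurate test, only about 15% of positives are true positives here, because healthy people vastly outnumber sick ones.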
Birthday statistics formula
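Chance that at least two of n people (or picks) share the same one of r equally likely values (for birthdays, r = 365):
first compute the chance of no match: r! / (r^n (r − n)!)
Then the chance of at least one match = 1 − (the result of this equation)
The closer n comes to r, the closer the probability of a match comes to 100%; once n > r, a match is certain.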
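A sketch of the birthday formula in Python. It uses `math.perm(r, n)` for r!/(r − n)! so the huge factorials never have to be written out, and returns 1.0 when n > r. The function name `birthday_match_probability` is hypothetical.

```python
import math


def birthday_match_probability(n: int, r: int = 365) -> float:
    """P(at least two of n picks share one of r equally likely values) = 1 - r!/(r^n (r-n)!)."""
    if n < 0 or r <= 0:
        raise ValueError("need n >= 0 and r > 0")
    if n > r:
        return 1.0  # more picks than values: a match is guaranteed
    no_match = math.perm(r, n) / r ** n  # r!/(r-n)! divided by r^n
    return 1 - no_match


for people in (10, 23, 50):
    print(people, round(birthday_match_probability(people), 3))
# 10 0.117
# 23 0.507
# 50 0.97
```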
Z-values for common confidence levels (the central area under the standard normal curve)
Confidence Level   Z
68%                1.0
80%                1.282
85%                1.440
90%                1.645
95%                1.960
99%                2.576
99.5%              2.807
99.9%              3.291
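The Z column can be regenerated with `statistics.NormalDist.inv_cdf`: for a two-sided interval, (1 − level)/2 is left in each tail, so Z is the quantile at (1 + level)/2. This is a sketch, not the source's method. Note that exactly 68% gives about 0.994; the table's 1.0 corresponds to the 68.27% of the 68-95-99.7 rule.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
for level in (0.68, 0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999):
    # Two-sided: leave (1 - level) / 2 in each tail
    z = std_normal.inv_cdf((1 + level) / 2)
    print(f"{level:.1%}  {z:.3f}")
```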
How to calculate the Confidence Interval if we know our chosen Z-value (the ± part, Z * s/√n, is the Margin of Error)
Use that Z-value in this formula for the Confidence Interval:
X ± Z * (s/√n)
Where:
• X is the mean
• Z is the chosen Z-value from the table in the previous card
• s is the standard deviation
• n is the number of observations
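A minimal sketch of X ± Z * (s/√n). The sample figures (mean 175, s = 20, n = 40) and the function name `confidence_interval` are hypothetical, and Z defaults to 1.960 for 95%.

```python
import math


def confidence_interval(mean: float, s: float, n: int, z: float = 1.960):
    """Return (low, high) for X +/- Z * (s / sqrt(n)); the +/- part is the margin of error."""
    if n <= 0 or s < 0:
        raise ValueError("need n > 0 and s >= 0")
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical sample: mean 175, standard deviation 20, 40 observations, 95% level
low, high = confidence_interval(175, 20, 40)
print(f"{low:.2f} to {high:.2f}")  # 168.80 to 181.20
```

Because the margin shrinks with √n, quadrupling the number of observations halves the width of the interval.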
What is the formula for the confidence interval
The Confidence Interval is based on Mean and Standard Deviation. Its formula is:
X ± Z * (s/√n)
Where:
X is the mean
Z is the chosen Z-value from the confidence-level table
s is the standard deviation
n is the number of observations
Formula for Chi-Square:
χ^2 = Σ { (O − E)^2 / E }
Σ means to sum up
O = each Observed (actual) value
E = each Expected value
So we calculate (O − E)^2 / E for each pair of observed and expected values, then sum them all up.
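A sketch of the chi-square statistic as described. The die counts are hypothetical (60 rolls, so a fair die expects 10 of each face), and the function name `chi_square` is illustrative. Turning the statistic into a p-value also needs the degrees of freedom, which this card does not cover.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: a die rolled 60 times; a fair die expects 10 of each face
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
print(round(chi_square(observed, expected), 4))  # 4.2
```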