Probability Flashcards

(175 cards)

1
Q

What is a sample space and how do you write it?

A

The set of all possible outcomes, e.g.
throwing two dice: Ω = {(i, j) : 1 ≤ i, j ≤ 6}
tossing a coin: Ω = {H, T}

2
Q

What is a subset of Ω (sample space) called?

A

An event

3
Q

When are two events disjoint?

A

A ∩ B = ∅

When they cannot both occur

4
Q

What is Stirling’s formula for the approximation of n!?

A

n! ∼ √(2π) n^(n+1/2) e^(−n)
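
A quick numerical check of the approximation (a Python sketch; the values of n are arbitrary): the ratio of n! to the Stirling expression tends to 1.

```
import math

# Compare n! with Stirling's approximation sqrt(2*pi) * n^(n + 1/2) * exp(-n),
# working with logarithms to avoid overflow.
for n in (5, 10, 50, 100):
    log_fact = math.lgamma(n + 1)  # log(n!)
    log_stirling = 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n
    print(n, math.exp(log_fact - log_stirling))  # ratio n!/approximation, tends to 1
```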

5
Q

What is the formula for the number of arrangements of n objects, with repeats?
E.g.,
a₁, …, a₁, a₂,…, a₂,…aₖ, …, aₖ
where a₁ is repeated m₁ times etc.

A

n!/(m₁!m₂!…mₖ!)

6
Q

What is the multinomial coefficient?

A

The coefficient of a₁ᵐ¹…aₖᵐᵏ
in (a₁ + … + aₖ)ⁿ where m₁ + … + mₖ = n
nC(m₁m₂…mₖ)

7
Q
How many distinct non-negative integer-valued solutions of the equation
x₁ + x₂ + · · · + xₘ = n
are there?
A

(n+m-1)Cn
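
A brute-force check of this count for small values (a Python sketch with illustrative m = 3, n = 5):

```
import math
from itertools import product

m, n = 3, 5  # illustrative values
# Count non-negative integer solutions of x1 + ... + xm = n by brute force.
brute_force = sum(1 for xs in product(range(n + 1), repeat=m) if sum(xs) == n)
print(brute_force, math.comb(n + m - 1, n))  # both are 21
```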

8
Q

What is Vandermonde’s identity?

A

For k, m, n ≥ 0
(m+n)Ck = ᵏΣⱼ₌₀(mCj)(nC(k-j))

mCj = 0 for j>m
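
The identity is easy to check numerically for small values (a Python sketch; m = 5, n = 7, k = 6 are arbitrary, and math.comb already returns 0 when j > m):

```
import math

m, n, k = 5, 7, 6  # arbitrary small values
lhs = math.comb(m + n, k)
rhs = sum(math.comb(m, j) * math.comb(n, k - j) for j in range(k + 1))
print(lhs, rhs)  # both are 924; math.comb(m, j) is 0 for j > m, matching the convention
```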

9
Q

Prove Vandermonde’s identity

A

Suppose we choose a committee consisting of k people from a group of m men and n women.
There are (m+n)Ck
ways of doing this which is the left-hand side.
Now the number of men in the committee is some j ∈ {0, 1, . . . , k} and then it contains k − j women.
The number of ways of choosing the j men is mCj
and for each such choice there are nC(k-j)
choices for
the women who make up the rest of the committee. So there are mCj * nC(k-j)
committees with exactly j
men and summing over j we get that the total number of committees is given by the right-hand side

10
Q

A probability space is a triple (Ω, F, P)
(Fancy F and P).
What do these symbols mean?

A
  1. Ω is the sample space
  2. F is a collection of subsets of Ω, called events, satisfying axioms F1–F3
  3. P is a probability measure, which is a function P : F → [0, 1] satisfying axioms P1–P3
11
Q

What is the probability of the union of two disjoint events?

eg, P(A ∪ B)

A

P(A ∪ B) = P (A) + P (B)

12
Q

What are the axioms on F (a collection of subsets of Ω)?

A

F1: ∅ ∈ F.
F2: If A ∈ F, then also Aᶜ ∈ F.
F3: If {Aᵢ, i ∈ I} is a finite or countably infinite collection of members of F, then ∪ᵢ∈I Aᵢ ∈ F

13
Q

What are the axioms of P, where P is a function from F to R?

A
P1: For all A ∈ F, P(A) ≥ 0.
P2: P(Ω) = 1
P3: If {Aᵢ, i ∈ I} is a finite or countably infinite collection of members of F, and Aᵢ ∩ Aⱼ = ∅ for
i ≠ j, then P(∪ᵢ∈I Aᵢ) = Σᵢ∈I P(Aᵢ)
14
Q

When Ω is finite or countably infinite, what do we usually take F to be?

A

We normally take F to be the set of all subsets of Ω (the power set of Ω)

15
Q
Suppose that (Ω, F, P) is a probability space and that A, B ∈ F.
If A ⊆ B then P(A) ≤ ?

A

If A ⊆ B then P(A) ≤ P(B)

16
Q

Prove that P (A’) = 1 − P (A) using the probability axioms

A

Since A ∪ A’ = Ω and A ∩ A’ = ∅, by P3, P (Ω) = P (A) + P (A’). By P2, P (Ω) = 1 and so P(A) + P (A’) = 1, which entails the required result

17
Q

Prove that if A ⊆ B then P(A) ≤ P(B), using the probability axioms

A

Since A ⊆ B, we have B = A ∪ (B ∩ A’). Since B ∩ A’ ⊆ A’, it must be disjoint from A. So by P3, P(B) = P(A) + P(B ∩ A’). Since by P1, P(B ∩ A’) ≥ 0, we thus have P(B) ≥ P(A)

18
Q

Conditional Probability

What is the probability of A given B?

A

P(A|B) = P(A ∩ B)/P(B), provided P(B) > 0

19
Q
Let (Ω, F, P) be a probability space and let B ∈ F satisfy P(B) > 0. Define a new
function Q : F → R by Q(A) = P(A|B)

Is (Ω, F, Q) a probability space?
Prove your result

A

Yes

Proof pg 12

20
Q

When are events A and B independent?

A

Events A and B are independent if P(A ∩ B) = P(A)P(B)

21
Q

More generally, a family of events A = {Aᵢ : i ∈ I} is independent if…

A

P(∩ᵢ∈ⱼ Aᵢ) = Πᵢ∈ⱼ P(Aᵢ)

for all finite subsets J of I

22
Q

When is a family of events pairwise independent?

A

A family A of events is pairwise independent if P(Aᵢ ∩ Aⱼ ) = P(Aᵢ)P(Aⱼ ) whenever i ≠ j.

23
Q

Does Pairwise Independence imply independence?

A

NO!!!!

24
Q

Given A and B are independent, are A and B’, and A’ and B’ independent?

A
Both A and B', and A' and B', are independent
25
Prove that A and B' are independent given A and B are independent
We have A = (A ∩ B) ∪ (A ∩ B'), where A ∩ B and A ∩ B' are disjoint, so using the independence of A and B, P(A ∩ B') = P (A) − P(A ∩ B) = P(A) − P(A) P(B) = P (A) (1 − P(B)) = P(A)P(B')
26
When is a family of events {B1, B2, . . .} a partition of Ω?
if 1. Ω = ∪ᵢ≥₁ Bᵢ (so that at least one Bi must happen), and 2. Bᵢ ∩ Bⱼ = ∅ whenever i ≠ j (so that no two can happen together)
27
What is the law of total probability/partition theorem?
Suppose {B1, B2, . . .} is a partition of Ω by sets from F, such that P (Bᵢ) > 0 for all i ≥ 1. Then for any A ∈ F P(A) = ᵢ≥₁ΣP(A|Bᵢ)P(Bᵢ)
28
Prove the partition theorem
P(A) = P(A ∩ (∪ᵢ≥₁Bᵢ)), since ∪ᵢ≥₁Bᵢ = Ω = P(∪ᵢ≥₁(A ∩ Bᵢ)) = ᵢ≥₁Σ P (A ∩ Bᵢ) by axiom P3, since A ∩ Bᵢ, i ≥ 1 are disjoint = ᵢ≥₁Σ P (A|Bᵢ)P(Bᵢ)
29
What is Bayes' Theorem?
Suppose that {B1, B2, . . .} is a partition of Ω by sets from F such that P (Bi) > 0 for all i ≥ 1. Then for any A ∈ F such that P (A) > 0 P(Bₖ|A) = P(A|Bₖ)P(Bₖ)/(ᵢ≥₁Σ P (A|Bᵢ)P(Bᵢ))
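As a worked illustration with invented numbers (prevalence 0.01, detection probability 0.95, false-positive probability 0.02) and the two-set partition {condition present, condition absent}, Bayes' theorem gives the probability of the condition given a positive test; a small Python sketch:
```
# Partition: B1 = condition present, B2 = condition absent; A = positive test.
p_b1, p_b2 = 0.01, 0.99                    # illustrative prior probabilities
p_a_given_b1, p_a_given_b2 = 0.95, 0.02    # illustrative conditional probabilities

p_a = p_a_given_b1 * p_b1 + p_a_given_b2 * p_b2   # law of total probability
p_b1_given_a = p_a_given_b1 * p_b1 / p_a          # Bayes' theorem
print(round(p_b1_given_a, 3))                     # about 0.324
```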
30
Prove Bayes' theorem
We have P(Bₖ|A) = P(Bₖ ∩ A)/P(A) = P(A|Bₖ)P(Bₖ)/P(A) Now substitute for P(A) using the law of total probability
31
What is Simpson's paradox?
it consists of the fact that for events E, F, G, we can have P(E|F ∩ G) > P(E|F' ∩ G) P(E|F ∩ G') > P(E|F' ∩ G') and yet P(E|F) < P(E|F').
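A concrete illustration with made-up counts (a Python sketch; E is "success", F and F' are two treatments, G and G' are two patient groups): F does better within each group, yet worse overall.
```
# Illustrative counts (successes, trials): E = success, F/F' = two treatments, G/G' = two groups.
counts = {
    ("F", "G"): (81, 87),
    ("F'", "G"): (234, 270),
    ("F", "G'"): (192, 263),
    ("F'", "G'"): (55, 80),
}

def rate(successes, trials):
    return successes / trials

# Within each group, treatment F has the higher success rate ...
print(rate(*counts[("F", "G")]) > rate(*counts[("F'", "G")]))    # True
print(rate(*counts[("F", "G'")]) > rate(*counts[("F'", "G'")]))  # True

# ... yet after aggregating over the groups the comparison is reversed.
f_total = [a + b for a, b in zip(counts[("F", "G")], counts[("F", "G'")])]
fp_total = [a + b for a, b in zip(counts[("F'", "G")], counts[("F'", "G'")])]
print(rate(*f_total) < rate(*fp_total))                          # True
```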
32
What is the multiplication rule? | Eg, P(A ∩ B) = ...
P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
33
What is the generalisation of the multiplication rule for n events?
P (A1 ∩ A2 ∩ . . . ∩ An) = P(A1) P(A2|A1) . . . P(An|A1 ∩ A2 ∩ . . . ∩ An−1)
34
inclusion-exclusion formula | P (A1 ∪ A2 ∪ . . . ∪ An) = ⁿΣᵢ₌₁ P(Aᵢ) - ....
P (A1 ∪ A2 ∪ . . . ∪ An) = ⁿΣᵢ₌₁ P(Aᵢ) - Σᵢ>ⱼ P(Ai ∩ Aj) + ... + (-1)ⁿ⁺¹P(A1 ∩ A2 ∩ . . . ∩ An)
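A brute-force check of the formula on a small example (a Python sketch; Ω = {0, …, 9} with equally likely outcomes and three arbitrarily chosen events):
```
from itertools import combinations

omega = set(range(10))                                  # equally likely outcomes
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {0, 5, 6, 7}]     # arbitrary events A1, A2, A3

def prob(event):
    return len(event) / len(omega)

lhs = prob(set().union(*events))                        # P(A1 ∪ A2 ∪ A3) computed directly

rhs = 0.0                                               # alternating inclusion-exclusion sum
for r in range(1, len(events) + 1):
    for combo in combinations(events, r):
        rhs += (-1) ** (r + 1) * prob(set.intersection(*combo))

print(lhs, rhs)  # both are 0.8
```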
35
What is a discrete random variable?
``` A discrete random variable X on a probability space (Ω, F, P) is a function X : Ω → R such that (a) {ω ∈ Ω : X(ω) = x} ∈ F for each x ∈ R, (b) ImX := {X(ω) : ω ∈ Ω} is a finite or countable subset of R ```
36
What is the more common/shorter way of writing P({ω ∈ Ω : X(ω) = x})?
P(X = x)
37
How is the probability mass function defined?
The probability mass function (p.m.f.) of X is the function pₓ : R → [0, 1] defined by pₓ(x) = P(X = x)
38
What is the pmf when x ∉ ImX?
If x ∉ ImX (that is, X(ω) never equals x) then pₓ(x) = P ({ω : X(ω) = x}) = P (∅) = 0.
39
What does Σₓ∈ᵢₘₓ pₓ(x) = ? | why?
ₓ∈ᵢₘₓΣ pₓ(x) = ₓ∈ᵢₘₓΣ P ({ω : X(ω) = x}) = P(∪ₓ∈ImX {ω : X(ω) = x}) since the events are disjoint = P (Ω) since every ω ∈ Ω gets mapped somewhere in ImX = 1
40
X has the Bernoulli distribution with parameter p (where 0 ≤ p ≤ 1) if...
P(X = 0) = 1 − p, P(X = 1) = p
41
X has a binomial distribution with parameters n and p (where n is a positive integer and p ∈ [0, 1]) if...
P (X = k) = nCk pᵏ (1-p)ⁿ⁻ᵏ, k = 0, 1, . . . , n
42
If X has the Bernoulli distribution, how do we write this?
X ∼ Ber(p)
43
If X has the binomial distribution, how do we write this?
X ∼ Bin(n, p)
44
If X has the geometric distribution, how do we write this?
X ∼ Geom(p)
45
If X has the Poisson distribution, how do we write this?
X ∼ Po(λ)
46
X has a geometric distribution with parameter p if....
``` P(X = k) = p(1 − p)^(k-1), k = 1, 2, .... ```
47
What can the geometric distribution model?
We can use X to model the number of independent trials needed until we see the first success, where p is the probability of success on a single trial
48
If you want to use the geometric distribution to model the number of failures before the first success, which formula do you use?
P (Y = k) = p(1 − p)^k, | k = 0, 1, ...
49
X has the Poisson distribution with parameter λ ≥ 0 if...
P (X = k) = ( λ^k e^-λ) /k!, k = 0, 1, ...
50
Define the expectation of X
``` The expectation (or expected value or mean) of X is E[X] = ₓ∈ᵢₘₓΣ xP(X=x) provided that ₓ∈ᵢₘₓΣ |x|P(X=x) < ∞ ```
51
What is the expectation of the Poisson distribution?
λ
52
What is the expectation of the Geometric distribution?
1/p
53
What is the expectation of the Binomial distribution?
np
54
What is the expectation of the Bernoulli distribution?
p
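The four stated means can be checked numerically by summing k·P(X = k) over the support (a Python sketch; the parameters p = 0.3, n = 10, λ = 2.5 and the truncation point are arbitrary):
```
import math

p, n, lam = 0.3, 10, 2.5   # arbitrary parameters
K = 60                     # truncation point for the infinite sums (the tails are negligible)

poisson = sum(k * lam ** k * math.exp(-lam) / math.factorial(k) for k in range(K))
geometric = sum(k * p * (1 - p) ** (k - 1) for k in range(1, K))
binomial = sum(k * math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))
bernoulli = 0 * (1 - p) + 1 * p

print(poisson, lam)      # ≈ 2.5
print(geometric, 1 / p)  # ≈ 3.333...
print(binomial, n * p)   # 3.0 (up to rounding)
print(bernoulli, p)      # 0.3
```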
55
Let h : R → R | If X is a discrete random variable, is Y = h(X) also a discrete random variable?
Yes
56
If h : R → R, then | E [h(X)] = ....
E [h(X)] = ₓ∈ᵢₘₓΣ h(x)P (X = x) | provided that ₓ∈ᵢₘₓΣ |h(x)|P (X = x) < ∞.
57
Prove the theorem that | E [h(X)] = ₓ∈ᵢₘₓΣ h(x)P (X = x)
Let A = {y : y = h(x) for some x ∈ ImX}. Start from the right-hand side and write it as a double sum: first over y ∈ A, then over x ∈ ImX with h(x) = y. Proof pg 22
58
``` Take h(x) = x^k What is E[X^k] called? ```
The kth moment of X, when it exists
59
Let X be a discrete random variable such that E [X] exists. Describe the expectation when X is non-negative Prove it
If X is non-negative then E [X] ≥ 0 We have ImX ⊆ [0, ∞) and so E [X] = ₓ∈ᵢₘₓΣ xP (X = x) is a sum whose terms are all non-negative and so must itself be non-negative.
60
Let X be a discrete random variable such that E [X] exists. If a, b ∈ R then E [aX + b] = ... Prove it
E [aX + b] = aE [X] + b
61
For a discrete random variable X, define the variance
``` For a discrete random variable X, the variance of X is defined by var (X) = E[(X − E[X])² ] = E[X²] - (E[X])² provided that this quantity exists. ```
62
What is variance a measure of?
The variance is a measure of how much the distribution of X is spread out about its mean: the more the distribution is spread out, the larger the variance.
63
Is var(X) ≥ 0 always? Why?
Yes: since (X − E[X])² is a non-negative random variable, var (X) ≥ 0
64
How are standard deviation and variance related?
The standard deviation is the square root of the variance: (standard deviation)² = var (X)
65
Suppose that X is a discrete random variable whose variance exists. Then if a and b are (finite) fixed real numbers, then the variance of the discrete random variable Y = aX + b is given by .... Prove it
var (Y ) = var (aX + b) = a² var (X)
66
Suppose that B is an event such that P (B) > 0. Then the conditional distribution of X given B is... P(X = x|B) =
P(X = x|B) = P({X = x} ∩ B) / P(B), for x ∈ R
67
Suppose that B is an event such that P (B) > 0, | The conditional expectation of X given B is...
ₓΣxP(X = x|B), whenever the sum converges absolutely We write pₓ|ᵦ(x) = P(X=x|B)
68
What is the Partition theorem for expectations?
If {B1, B2, . . .} is a partition of Ω such that P (Bi) > 0 for all i ≥ 1 then E [X] = ᵢ≥₁ΣE [X | Bᵢ] P(Bᵢ), whenever E [X] exists.
69
Prove the Partition theorem for expectations
Use the total law of probability to split into two sums, one over x, one over i. pg24
70
Given two random variables X and Y their joint distribution (or joint probability mass function) is pₓ,ᵧ (x, y) =
pₓ,ᵧ (x, y) = P ({X = x} ∩ {Y = y}) = P(X = x, Y = y) x, y ∈ R
71
Is pₓ,ᵧ (x, y) always non-negative?
Yes, pₓ,ᵧ (x, y) ≥ 0 for all x, y ∈ R (though it may equal 0)
72
What does ₓΣᵧΣpₓ,ᵧ (x, y) = ??
ₓΣᵧΣpₓ,ᵧ (x, y) = 1
73
Joint distributions: | What is the marginal distribution of X?
pₓ(x) = ᵧΣpₓ,ᵧ (x, y)
74
Joint distributions: | marginal distribution of Y?
pᵧ(y) = ₓΣpₓ,ᵧ (x, y)
75
``` Whenever pX(x) > 0 for some x ∈ R, we can also write down the conditional distribution of Y given that X = x: pᵧ|ₓ₌ₓ(y) = ```
pᵧ|ₓ₌ₓ(y) = P (Y = y|X = x) | = pₓ,ᵧ(x,y)/pₓ(x) for y ∈ R
76
The conditional expectation of Y given that X = x is | E [Y |X = x] = ...
E [Y |X = x] = ᵧΣypᵧ|ₓ₌ₓ(y) | whenever the sum converges absolutely
77
When are Discrete random variables X and Y independent?
P(X = x, Y = y) = P(X = x)P(Y = y) for all x, y ∈ R. In other words, X and Y are independent if and only if the events {X = x} and {Y = y} are independent for all choices of x and y. We can also write this as pₓ,ᵧ (x, y) = pₓ(x)pᵧ(y) for all x, y ∈ R
78
In the same way as we defined expectation for a single discrete random variable, so in the bivariate case we can define expectation of any function of the random variables X and Y . Let h : R² → R. Then h(X, Y ) is itself a random variable, and E[h(X, Y )] =
E[h(X, Y )] = ₓΣᵧΣ h(x, y)P(X = x, Y = y) = ₓΣᵧΣ h(x, y)pₓ,ᵧ (x, y) provided the sum converges absolutely.
79
Suppose X and Y are discrete random variables and a, b ∈ R are constants. Then E[aX + bY ] = Prove it
E[aX + bY ] = aE[X] + bE[Y ] provided that both E [X] and E [Y ] exist. Prove it pg28
80
What does E[aX + bY ] = aE[X] + bE[Y ] tell us about expectation?
expectation is linear
81
E[a₁X₁ + · · · + aₙXₙ] =
E[a₁X₁ + · · · + aₙXₙ] = a₁E[X₁] + · · · + aₙE[Xₙ]
82
If X and Y are independent discrete random variables whose expectations exist, then E[XY ] = Prove it
E[XY] = E[X]E[Y ] Proof pg28
83
What is the covariance of X and Y?
cov (X, Y ) = E[(X − E [X])(Y − E [Y ])]
84
What is cov(X,X) = ?
cov (X, X) = var (X)
85
Does cov (X, Y ) = 0 imply that X and Y are independent?
NO!!!!
86
multivariate distributions: pX₁,X₂,...,Xₙ (x₁, x₂, . . . , xₙ) =
pX₁,X₂,...,Xₙ (x₁, x₂, . . . , xₙ) = P(X₁ = x₁, X₂ = x₂, ..., Xₙ = xₙ) for x₁, x₂, ...,xₙ ∈ R
87
A family {Xᵢ | : i ∈ I} of discrete random variables are independent if ....
A family {Xᵢ : i ∈ I} of discrete random variables are independent if for all finite sets J ⊆ I and all collections {Aᵢ : i ∈ J} of subsets of R, P(ᵢ∈ⱼ∩{Xᵢ ∈ Aᵢ}) = ᵢ∈ⱼΠP(Xᵢ ∈ Aᵢ)
88
Suppose that X1, X2, . . . are independent random variables which all have the same distribution, what do we call them?
Independent and identically distributed (i.i.d)
89
A kth order linear recurrence relation (or difference equation) has the form....
ᵏΣⱼ₌₀ aⱼ uₙ₊ⱼ = f(n) with a₀ ≠ 0 and aₖ ≠ 0, where a₀, ..., aₖ are constants independent of n. A solution to such a difference equation is a sequence (uₙ)ₙ ≥ ₀ satisfying the equation for all n ≥ 0.
90
The general solution (uₙ)ₙ ≥ ₀ (i.e. if the boundary conditions are not specified) of ᵏΣⱼ₌₀ aⱼ uₙ₊ⱼ = f(n) can be written as ... Prove this
uₙ = vₙ +wₙ where (vₙ)ₙ ≥ ₀ is a particular solution to the equation and (wₙ)ₙ ≥ ₀ solves the homogeneous equation ᵏΣⱼ₌₀ aⱼ wₙ₊ⱼ = 0 proof pg31
91
How would you solve the second order linear difference equation: uₙ₊₁ + auₙ + buₙ₋₁ = f(n) ?
Substitute wₙ = Aλⁿ in wₙ₊₁ + awₙ + bwₙ₋₁ = 0, then divide by Aλⁿ⁻¹ to get the quadratic λ² + aλ + b = 0 (the auxiliary equation). The general solution of the homogeneous equation is wₙ = A₁λ₁ⁿ + A₂λ₂ⁿ, or wₙ = (A + Bn)λⁿ if λ₁ = λ₂ = λ. Then add a particular solution vₙ of the full equation, so that uₙ = vₙ + wₙ, and fix the constants from the boundary conditions.
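A small Python sketch of the auxiliary-equation method on an illustrative homogeneous case (a = −1, b = −1, i.e. wₙ₊₁ = wₙ + wₙ₋₁, with boundary values w₀ = 0, w₁ = 1): the fitted closed form matches direct iteration.
```
import math

# Homogeneous case w_{n+1} + a*w_n + b*w_{n-1} = 0 with illustrative a = -1, b = -1,
# i.e. w_{n+1} = w_n + w_{n-1}, and boundary values w_0 = 0, w_1 = 1.
a, b = -1.0, -1.0
disc = math.sqrt(a * a - 4 * b)
lam1, lam2 = (-a + disc) / 2, (-a - disc) / 2   # roots of the auxiliary equation x^2 + a*x + b = 0

# Fit w_n = A1*lam1^n + A2*lam2^n to the boundary conditions w_0 = 0, w_1 = 1.
A1 = 1 / (lam1 - lam2)
A2 = -A1

closed_form = [A1 * lam1 ** n + A2 * lam2 ** n for n in range(10)]

w = [0.0, 1.0]                     # direct iteration of the recurrence for comparison
for n in range(1, 9):
    w.append(w[n] + w[n - 1])

print([round(x, 6) for x in closed_form])
print(w)  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```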
92
Consider a random walk on the integers Z, started from some n > 0, which at each step increases by 1 with probability p, and decreases by 1 with probability q = 1 − p. Then the probability uₙ that the walk ever hits 0 is given by..... Prove it
uₙ = (q/p)ⁿ if p > q, and uₙ = 1 if p ≤ q. Proof pg 38
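A Monte Carlo check of this hitting probability (a Python sketch with illustrative values p = 0.6 and starting point 3; walks are truncated after a fixed number of steps, which is harmless here because the walk drifts upwards when p > q):
```
import random

random.seed(0)
p, start, trials, max_steps = 0.6, 3, 5000, 1000   # illustrative values
q = 1 - p

hits = 0
for _ in range(trials):
    position = start
    for _ in range(max_steps):
        position += 1 if random.random() < p else -1
        if position == 0:
            hits += 1
            break

print(hits / trials, (q / p) ** start)  # both ≈ (2/3)^3 ≈ 0.296
```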
93
Let X be a non-negative integer-valued random variable. Let S := { s ∈ R : ∞Σₖ₌₀ |s|ᵏ P(X = k) < ∞ } Then the probability generating function (p.g.f.) of X is Gₓ : S → R defined by ....
Gₓ(s) = E[sˣ] = ∞Σₖ₌₀ sᵏP(X=k)
94
pₓ(k) = pₖ = ...
pₓ(k) = pₖ = P(X=k)
95
Is the distribution of X uniquely determined by its probability generating function, Gₓ?
Yes
96
What is the probability generating function of the Bernoulli distribution?
Gₓ(s) = ₖΣpₖsᵏ = qs⁰ + ps¹ = q + ps | for all s ∈ R
97
What is the probability generating function of the Binomial distribution?
Gₓ(s) = ⁿΣₖ₌₀ sᵏ ⁿCₖ pᵏ (1-p)ⁿ⁻ᵏ = ⁿΣₖ₌₀ ⁿCₖ (ps)ᵏ (1-p)ⁿ⁻ᵏ = (1 - p + ps)ⁿ by the binomial theorem. This is valid for all s ∈ R
98
What is the probability generating function of the Poisson distribution?
Gₓ(s) = ∞Σₖ₌₀ sᵏ λᵏe^-λ/k! = e^-λ ∞Σₖ₌₀ (sλ)ᵏ/k! = e^λ(s-1) | for all s ∈ R
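The closed form can be checked against a truncation of the defining series (a Python sketch; λ = 2 and s = 0.7 are arbitrary):
```
import math

lam, s = 2.0, 0.7   # arbitrary parameter and evaluation point
series = sum(s ** k * lam ** k * math.exp(-lam) / math.factorial(k) for k in range(60))
closed_form = math.exp(lam * (s - 1))
print(series, closed_form)  # both ≈ exp(-0.6) ≈ 0.5488
```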
99
What is the probability generating function of the Geometric distribution with parameter p?
Gₓ(s) = ps/(1-(1-p)s) | provided that |s| < 1/(1−p)
100
If X and Y are independent, then Gₓ₊ᵧ(s) = ...
Gₓ₊ᵧ(s) = Gₓ(s)Gᵧ(s)
101
Prove that Gₓ₊ᵧ(s) = Gₓ(s)Gᵧ(s) if X and Y are independent
Gₓ₊ᵧ(s) = E[sˣ⁺ʸ] = E[sˣsʸ] Since X and Y are independent, sˣ and sʸ are independent. So this equals E[sˣ]E[sʸ] = Gₓ(s)Gᵧ(s)
102
Suppose that X₁, X₂, ..., Xₙ are independent Ber(p) random variables and let Y = X₁ + ... + Xₙ. How is Y distributed?
Y ∼ Bin(n, p)
103
Prove that Y ∼ Bin(n, p), if Y = X₁ + ... + Xₙ and X₁, X₂, ..., Xₙ are independent Ber(p) random variables
Gᵧ(s) = E[sʸ] = E[s^(X₁ + ... + Xₙ)] = E[s^X₁] ... E[s^Xₙ] = (1 - p + ps)ⁿ As Y has the same p.g.f. as a Bin(n, p) random variable, we deduce that Y ∼ Bin(n, p).
104
Suppose that X₁, X₂, ..., Xₙ are independent random variables such that Xᵢ ∼ Po(λᵢ) Then ⁿΣᵢ₌₁ Xᵢ ∼ .... In particular, what happens when λᵢ = λ for all 1 ≤ i ≤ n Prove all of this
ⁿΣᵢ₌₁ Xᵢ ∼ Po(ⁿΣᵢ₌₁ λᵢ) λᵢ = λ for all 1 ≤ i ≤ n: ⁿΣᵢ₌₁ Xᵢ ∼ Po(nλ) Proof pg41
105
Show that G'ₓ(1) = E[X]
``` G'ₓ(s) = d/ds E[sˣ] = d/ds ∞Σₖ₌₀ sᵏ P(X=k) = ∞Σₖ₌₀ d/ds sᵏ P(X=k) = ∞Σₖ₌₀ ksᵏ⁻¹P(X=k) = E[Xsˣ⁻¹] G'ₓ(1) = E[X] ```
106
G''ₓ(1) = ...
G''ₓ(1) = E[X(X − 1)] = E[X²] − E[X],
107
Write the variance of X in terms of Gₓ(1) and its derivatives
var(X) = G''ₓ(1) + G'ₓ(1) - (G'ₓ(1))²
108
dᵏ/dsᵏ Gₓ(s) |ₛ₌₁ = ...
dᵏ/dsᵏ Gₓ(s) |ₛ₌₁ = E[X(X-1) ... (X - k + 1)]
109
Let X₁, X₂, . . . be i.i.d. non-negative integer-valued random variables with p.g.f. Gₓ(s). Let N be another non-negative integer-valued random variable, independent of X₁, X₂, . . . and with p.g.f. Gₙ(s). Then the p.g.f. of ᵢ₌₁Σᴺ Xᵢ is ...... Prove it
The pgf of ᵢ₌₁Σᴺ Xᵢ is Gₙ(Gₓ(s)) Note that the sum ᵢ₌₁Σᴺ Xᵢ has a random number of terms. We interpret it as 0 if N = 0. Proof pg 44
110
Suppose that X₁, X₂, ... are independent and identically distributed Ber(p) random variables and that N ∼ Po(λ), independently of X₁, X₂, ... Then ᵢ₌₁Σᴺ Xᵢ ∼
ᵢ₌₁Σᴺ Xᵢ ∼ Po(λp)
111
Prove that: Suppose that X₁, X₂, ... are independent and identically distributed Ber(p) random variables and that N ∼ Po(λ), independently of X₁, X₂, ... Then ᵢ₌₁Σᴺ Xᵢ ∼ Po(λp)
Gₓ(s) = 1 - p + ps and Gₙ(s) = exp(λ(s − 1)) and so E[s^( ᵢ₌₁Σᴺ Xᵢ)] = Gₙ(Gₓ(s)) = exp(λ(1 - p + ps - 1)) = exp(λp(s-1)) Since this is the p.g.f. of Po(λp) and p.g.f.’s uniquely determine distributions, the result follows
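A simulation consistent with this thinning result (a Python sketch using numpy, with illustrative λ = 4 and p = 0.3; given N, the sum of N Bernoulli(p) variables is drawn as a Binomial(N, p) variable, and its empirical mean and variance are compared with λp, which is both the mean and the variance of a Po(λp) variable):
```
import numpy as np

rng = np.random.default_rng(0)
lam, p, trials = 4.0, 0.3, 200_000            # illustrative values

n = rng.poisson(lam, size=trials)             # N ~ Po(lambda), one value per trial
total = rng.binomial(n, p)                    # given N, the sum of N Bernoulli(p) variables
print(total.mean(), total.var(), lam * p)     # mean and variance both ≈ 1.2
```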
112
What is the offspring distribution?
Suppose we have a population (say of bacteria). Each individual in the population lives a unit time and, just before dying, gives birth to a random number of children in the next generation. This number of children has probability mass function p(i), i ≥ 0, called the offspring distribution
113
Let Xₙ be the size of the population in generation n, so that X₀ = 1. Let Cᵢ⁽ⁿ⁾ be the number of children of the ith individual in generation n ≥ 0, so that we may write Xₙ₊₁ = ...
Xₙ₊₁ = C₁⁽ⁿ⁾ + C₂⁽ⁿ⁾ + ... + Cₓₙ⁽ⁿ⁾ We interpret this sum as 0 if Xₙ = 0 Note that C₁⁽ⁿ⁾, C₂⁽ⁿ⁾, .... are independent and identically distributed.
114
Let Xₙ be the size of the population in generation n, so that X₀ = 1. Let Cᵢ⁽ⁿ⁾ be the number of children of the ith individual in generation n ≥ 0, so that we may write Xₙ₊₁ = C₁⁽ⁿ⁾ + C₂⁽ⁿ⁾ + ... + Cₓₙ⁽ⁿ⁾ What is G(s)? and Gₙ(s)
``` G(s) = ∞Σᵢ₌₀ p(i)sᶦ Gₙ(s) = E[sˣⁿ] (That's X subscript n) ```
115
For n ≥ 0 Gₙ₊₁(s) = ... Prove it
Gₙ₊₁(s) = Gₙ(G(s)) = G(G(...G(s)...)) (G composed n+1 times) = G(Gₙ(s)) Proof pg 45
116
Suppose that the mean number of children of a single individual is µ i.e. ∞Σᵢ₌₁ ip(i) = µ E[Xₙ] = .... Prove it
E[Xₙ] = µⁿ Proof pg 46
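A simulation consistent with E[Xₙ] = µⁿ (a Python sketch assuming, for illustration, a Poisson(µ) offspring distribution with µ = 1.1; given Xₙ, the next generation is drawn as Poisson(µXₙ), using the fact that a sum of independent Poissons is Poisson):
```
import numpy as np

rng = np.random.default_rng(1)
mu, generations, trials = 1.1, 8, 100_000     # illustrative offspring mean, horizon, sample size

population = np.ones(trials, dtype=np.int64)  # X_0 = 1 in every trial
for _ in range(generations):
    # Given X_n, the total number of children is Poisson(mu * X_n),
    # since a sum of independent Poisson(mu) variables is Poisson.
    population = rng.poisson(mu * population)

print(population.mean(), mu ** generations)   # both ≈ 1.1^8 ≈ 2.14
```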
117
Branching processes, what is the probability that the population dies out?
P(population dies out) = P(∞∪ₙ₌₀ {Xₙ = 0}) ≥ P (X₁ = 0) = p(0) > 0
118
``` Extinction Probability (non-examinable) pg 47-48 ```
Extinction Probability (non-examinable)
119
A random variable X defined on a probability space (Ω, F, P) is a function X: [ ] such that { w: [ ]} ∈ F for each x ∈ R.
A random variable X defined on a probability space (Ω, F, P) is a function X : Ω → R such that {ω : X(ω) ≤ x} ∈ F for each x ∈ R.
120
What is the cumulative distribution function of a random variable X?
is the function Fₓ : R → [0, 1] defined by Fₓ(x) = P (X ≤ x)
121
Continuous distributions The cdf = Fₓ(x) Is Fₓ decreasing? Prove
No, it's non-decreasing | Proof pg 51
122
Continuous distributions The cdf = Fₓ(x) P (a < X ≤ b) = ??? Prove
P (a < X ≤ b) = Fₓ(b) − Fₓ(a) for a < b | Proof pg 51
123
Continuous distributions The cdf = Fₓ(x) As x → −∞, Fₓ(x) → ??? Prove
x → −∞, Fₓ(x) → 0 | Proof pg 51/52
124
Continuous distributions The cdf = Fₓ(x) As x → ∞, Fₓ(x) → ??? Prove
As x → ∞, Fₓ(x) → 1 | Proof pg 51/52
125
``` Continuous distributions Any function satisfying: Fₓ is non-decreasing; P (a < X ≤ b) = Fₓ(b) − Fₓ(a) for a < b; Fₓ(x) → 0 as x → −∞; Fₓ(x) → 1 as x → ∞; and [ ] is the cumulative distribution function of some random variable defined on some probability space ```
Right Continuity
126
``` A continuous random variable X is a random variable whose c.d.f. satisfies Fₓ(x) = P[ ] = ∫ [ ] where fₓ : R → R is a function such that a) fₓ(u) [ ] 0 for all u ∈ R b) −∞ ∫ ∞ fₓ(u) du = ```
Fₓ(x) = P (X ≤ x) = −∞∫ˣ fₓ(u) du (the integral runs from −∞ to x), where fₓ : R → R is a function such that a) fₓ(u) ≥ 0 for all u ∈ R, b) −∞∫∞ fₓ(u) du = 1
127
Continuous distributions | What is fₓ called?
fₓ is called the probability density function (p.d.f.) of X or, sometimes, just its density.
128
The Fundamental Theorem of Calculus tells us that Fₓ of the form given in the definition is differentiable with dFₓ(x)/dx = [ ]
dFₓ(x)/dx = fₓ(x) | at any point x such that fₓ(x) is continuous.
129
Is fₓ(x) a probability??
No!!!!! | Therefore it can exceed 1
130
If X is a continuous random variable with p.d.f fₓ then P(X=x) = [ ] P(a ≤ X ≤ b) = [ ]
P(X=x) = 0 for all x ∈ R | P(a ≤ X ≤ b) = ₐ∫ᵇ fₓ(x) dx
131
What is the p.d.f. of the Uniform distribution?
fₓ(x) = 1/(b−a) for a ≤ x ≤ b, and 0 otherwise
132
What's the notation for X is distributed uniformly?
X ∼ U[a, b]
133
What is the p.d.f. of the exponential distribution?
fₓ(x) = λe^(-λx), x ≥ 0
134
What is the p.d.f. of the gamma distribution?
α > 0 and λ > 0 fₓ(x) = ((λ^α)/Γ(α)) x^(α-1)e^(-λx), x ≥ 0 Here, Γ(α) is the so-called gamma function, which is defined by Γ(α) = ∞∫₀ u^(α-1)e⁻ᵘ du for α > 0 For most values of α this integral does not have a closed form. However, for a strictly positive integer n, we have Γ(n) = (n − 1)!.
135
What is the p.d.f. of the normal (or Gaussian) distribution?
µ ∈ R and σ² > 0 fₓ(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)), x ∈ R
136
What's the notation for when X is gamma distributed?
X ∼ Gamma(α, λ)
137
What's the notation for X is distributed normally?
X ∼ N(µ, σ²)
138
What's the notation for X is distributed normally?
X ∼ N(µ, σ²)
139
What is the standard normal distribution?
N(0, 1)
140
P (x ≤ X ≤ x + δ) ≈ [ ]
P (x ≤ X ≤ x + δ) ≈ fₓ(x) δ
141
P (nδ ≤ X ≤ (n + 1)δ) ≈ [ ]
P (nδ ≤ X ≤ (n + 1)δ) ≈ fₓ(nδ)δ
142
Let X be a continuous random variable with probability density function fₓ. The expectation or mean of X is defined to be ...
E [X] = −∞ ∫ ∞ xfₓ(x) dx | whenever −∞ ∫ ∞ |x|fₓ(x) dx < ∞
143
Let X be a continuous random variable with probability density function fₓ and let h be a function from R to R. Then E [h(X)] = ???
E [h(X)] = −∞ ∫ ∞ h(x)fₓ(x) dx whenever −∞ ∫ ∞ |h(x)|fₓ(x) dx < ∞
144
Suppose X is a continuous random variable with p.d.f. fₓ. Then if a, b ∈ R then E [aX + b] = ??? and var (aX + b) Prove it
E [aX + b] = aE [X] + b var (aX + b) = a²var (X) Proof pg 58
145
Does E[1/X] = 1/E[X]?
No!!!!
146
Suppose that X is a continuous random variable with density fₓ and that h : R → R is a differentiable function which is strictly increasing. Then Y = h(X) is a continuous random variable with p.d.f. fᵧ(y) = Prove
fᵧ(y) = fₓ(h⁻¹(y))d/dy h⁻¹(y) where h⁻¹ is the inverse function of h Proof pg60
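A numerical illustration of the formula (a Python sketch with X ∼ Exp(1) and h(x) = x², so h⁻¹(y) = √y and the formula gives f_Y(y) = e^(−√y)/(2√y); empirical interval probabilities divided by the interval length are compared with this density):
```
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=500_000)   # X ~ Exp(1), so f_X(x) = exp(-x) for x >= 0
y = x ** 2                                     # Y = h(X) with h(x) = x^2, increasing on [0, inf)

def f_y(v):
    # f_Y(y) = f_X(h^{-1}(y)) * d/dy h^{-1}(y) = exp(-sqrt(y)) / (2 * sqrt(y))
    return np.exp(-np.sqrt(v)) / (2 * np.sqrt(v))

delta = 0.05
for a in (0.5, 1.0, 2.0, 4.0):
    empirical = np.mean((y >= a) & (y < a + delta)) / delta   # ≈ density of Y near a
    print(a, round(float(empirical), 4), round(float(f_y(a + delta / 2)), 4))
```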
147
joint cumulative distribution function, Fₓ,ᵧ : R 2 → [0, 1], given by Fₓ,ᵧ (x, y) =
Fₓ,ᵧ (x, y) = P (X ≤ x, Y ≤ y)
148
joint cumulative distribution | Is Fₓ,ᵧ non-decreasing?
Yes
149
joint cumulative distribution | What does Fₓ,ᵧ(x, y) tend to as x and y → ∞?
Fₓ,ᵧ(x, y) → 1
150
joint cumulative distribution | What does Fₓ,ᵧ(x, y) tend to as x and y → −∞?
Fₓ,ᵧ(x, y) → 0
151
Let X and Y be random variables such that Fₓ,ᵧ(x, y) = −∞∫ʸ −∞∫ˣ fₓ,ᵧ(u, v) dudv for some function fₓ,ᵧ : R² → R such that a) fₓ,ᵧ(u, v) [ ] 0 for all u, v ∈ R b) −∞∫∞ −∞∫∞ fₓ,ᵧ(u, v) dudv = [ ]
a) fₓ,ᵧ(u, v) ≥ 0 for all u, v ∈ R | b) −∞∫∞ −∞∫∞ fₓ,ᵧ(u, v) dudv = 1
152
If X and Y are jointly continuous, what is fₓ,ᵧ ??
their joint density function.
153
What is fₓ,ᵧ in terms of Fₓ,ᵧ(x,y)?
fₓ,ᵧ(x, y) = ∂²/∂x∂y Fₓ,ᵧ(x,y)
154
For a single continuous random variable X, it turns out that the probability that it lies in some nice set A ⊆ R can be obtained by integrating its density over A P (X ∈ A) = ???
P (X ∈ A) = ₐ∫ fₓ(x) dx
155
For a pair of jointly continuous random variables X and Y, for nice sets B ⊆ R² we obtain the probability that the pair (X, Y ) lies in B by integrating the joint density over the set B P ((X, Y ) ∈ B) = ??
P ((X, Y ) ∈ B) = ∫∫₍ₓ,ᵧ₎∈ᵦ fₓ,ᵧ(x, y)) dxdy
156
For a pair of jointly continuous random variables X and Y , we have P (a < X ≤ b, c < Y ≤ d) = ... Prove
P (a < X ≤ b, c < Y ≤ d) = 𝒸∫ᵈ ₐ∫ᵇ fₓ,ᵧ(x, y)) dxdy for a < b and c < d Proof pg62
157
Suppose X and Y are jointly continuous with joint density fₓ,ᵧ. Then X is a continuous random variable with density fₓ(x) =
-∞∫∞ fₓ,ᵧ(x, y)) dy
158
Suppose X and Y are jointly continuous with joint density fₓ,ᵧ. Then Y is a continuous random variable with density fᵧ(y) = Prove
-∞∫∞ fₓ,ᵧ(x, y)) dx Proof pg 63
159
the one-dimensional densities fₓ and fᵧ of the joint | distribution with density fₓ,ᵧ, are called what?
The marginal densities (marginal distributions) of X and Y
160
When are Jointly continuous random variables X and Y with joint density fₓ,ᵧ independent?
fₓ,ᵧ(x, y) = fₓ(x) fᵧ(y) | for all x, y ∈ R
161
jointly continuous random variables X₁, X₂, . . . , Xₙ with joint density fₓ₁,ₓ₂,...,ₓₙ are independent if...
fₓ₁,ₓ₂,...,ₓₙ(x₁, x₂, . . . , xₙ) = fₓ₁(x₁)fₓ₂(x₂) ... fₓₙ(xₙ) for all x₁, x₂, . . . , xₙ∈ R
162
if X and Y are independent then it follows easily that Fₓ,ᵧ (x, y) = ...
Fₓ,ᵧ (x, y) = Fₓ(x)Fᵧ(y) | for all x, y ∈ R.
163
Write E [h(X, Y )] in terms of a double integral
E [h(X, Y )] = -∞∫∞ -∞∫∞ h(x, y) fₓ,ᵧ(x, y)) dxdy
164
What is cov(X, Y)?
cov (X, Y ) = E [(X − E [X])(Y − E [Y ])] = E [XY ] − E [X] E [Y ]
165
Let X₁, X₂, . . . , Xₙ denote i.i.d. random variables. Then these random variables are said to constitute a [ ] from the distribution
random sample of size n
166
What is the sample mean defined to be?
X̄ₙ = (1/n) ᵢ₌₁Σⁿ Xᵢ
167
What is var(X+Y)?? For random variables X and Y
var (X + Y ) = var (X) + var (Y ) + 2cov (X, Y )
168
What is var(ᵢ₌₁Σⁿ Xᵢ)? For random variables X₁, . . . , Xₙ
var(ᵢ₌₁Σⁿ Xᵢ) = ᵢ₌₁Σⁿ var(Xᵢ) + ᵢ≠ⱼΣ cov(Xᵢ, Xⱼ) = ᵢ₌₁Σⁿ var(Xᵢ) + 2 ᵢ<ⱼΣ cov(Xᵢ, Xⱼ)
169
Suppose that X₁, X₂, . . . , Xₙ form a random sample from a distribution with mean µ and variance σ². Then the expectation and variance of the sample mean are ... Prove it
E[X̄ₙ] = µ and var(X̄ₙ) = σ²/n Proof pg 67
170
Let X₁, X₂, . . . , Xₙ be a random sample from a Bernoulli distribution with parameter p. What do E[Xᵢ], var(Xᵢ), E[X̄ₙ] and var(X̄ₙ) equal?
``` E[Xᵢ] = p and var(Xᵢ) = p(1-p) for all 1 ≤ i ≤ n. Hence, E[X̄ₙ] = p and var(X̄ₙ) = p(1-p)/n ```
171
Suppose that A is an event with probability P (A) and write p = P (A). Let X be the indicator function of the event A i.e. the random variable defined by X(ω) = 1ₐ(ω) = {1 if ω ∈ A {0 if ω ∉ A Then X ∼ [ ] and E[X] = [ ]
X ∼ Ber(p) and E [X] = p
172
State the weak law of large numbers .... | Prove it
Suppose that X₁, X₂, . . . . are independent and identically distributed random variables with mean µ. Then for any fixed ε > 0 As n → ∞ P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| > ε)→0 Proof pg 68
173
Weak law of large numbers: P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| ≤ ε)→??? As n → ∞
P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| ≤ ε)→1
174
What is Markov’s inequality? | Prove it
Suppose that Y is a non-negative random variable whose expectation exists. Then P(Y ≥ t) ≤ E[Y]/t for all t > 0. Proof pg68
175
What is Chebyshev’s inequality? | Prove it
Suppose that Z is a random variable with a finite variance. Then for any t > 0, P (|Z − E [Z] | ≥ t) ≤ var (Z)/t² Proof: Note that P (|Z − E [Z]| ≥ t) = P((Z − E [Z])² ≥ t²) and then apply Markov’s inequality to the non-negative random variable Y = (Z − E [Z])²
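A simulation in the spirit of the weak law and Chebyshev's inequality (a Python sketch with i.i.d. Uniform[0, 1] samples, so µ = 1/2 and σ² = 1/12, and an arbitrary ε = 0.05): the empirical value of P(|X̄ₙ − µ| > ε) shrinks with n and stays below the bound σ²/(nε²).
```
import numpy as np

rng = np.random.default_rng(3)
mu, var = 0.5, 1 / 12          # mean and variance of Uniform[0, 1]
eps, trials = 0.05, 5000       # illustrative tolerance and number of repetitions

for n in (10, 100, 1000):
    sample_means = rng.random((trials, n)).mean(axis=1)   # sample mean for each repetition
    empirical = np.mean(np.abs(sample_means - mu) > eps)  # estimate of P(|sample mean - mu| > eps)
    chebyshev = var / (n * eps ** 2)                      # Chebyshev bound applied to the sample mean
    print(n, round(float(empirical), 4), round(chebyshev, 4))
```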