Chapter 3: Probability Flashcards
What is meant by a random variable?
In probability theory, we describe the behaviour of random variables. This is a statistical term for variables that associate different numeric values with each of the possible outcomes of some random process.
What is meant by the term random in random variable?
By random here we do not mean the colloquial use of this term to mean something that is entirely unpredictable. A random process is simply a process whose outcome cannot be perfectly known ahead of time (it may nonetheless be quite predictable).
Imagine that we enter a lottery, where we select a number from 1 to 100, to have a chance of winning $1000. We suppose that in the lottery only one ball is drawn and it is fair, meaning that all numbers are equally likely to win.
Describe what the probability distribution for this lottery would look like
A discrete probability distribution, since the variable we measure – the winning number – is confined to a finite set of values. It would therefore look like a set of 100 bars of equal height and width, since all numbers are equally likely to win.
Compare the function of the probability of drawing the lottery number with one depicting the probability of: Before test driving a second-hand car, we are uncertain about its value. From seeing pictures of the car, we might think that it is worth anywhere from $2000 to $4000, with all values being equally likely.
Since the range of possible values is continuous, the graph would depict a probability density instead: a single rectangular box from $2000 to $4000, with a constant height of 1/2000. Note that this height is a density, not a probability.
The aforementioned cases are both examples of valid probability distributions. So what are their defining properties?
- All values of the distribution must be real and non-negative.
- The sum (for discrete random variables) or integral (for continuous random variables) across all possible values of the random variable must be 1.
How is this satisfied in the discrete lottery case?
∑ (i = 1 to 100) 1/100 = 1
i.e. the sum of one hundred 1/100s equals 1.
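As a quick sanity check, here is a minimal Python sketch (not from the book) that represents the lottery's 100 equally likely outcomes exactly and verifies both validity conditions:

```python
from fractions import Fraction

# The 100 lottery outcomes, each with probability exactly 1/100.
probs = [Fraction(1, 100)] * 100

# Condition 1: all values real and non-negative.
assert all(p >= 0 for p in probs)

# Condition 2: the probabilities sum to exactly 1.
print(sum(probs))  # 1
```

Using Fraction avoids floating-point rounding, so the sum really is exactly 1.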
How is this satisfied for the continuous case of the second-hand car example?
All values of the distribution must be real and non-negative: The graph indicates that p(v) = 1/2000 ≥ 0 for 2000 ≤ v ≤ 4000
integral (for continuous random variables) across all possible values of the random variable must be 1: Fortunately, since integration is essentially just working out an area underneath a curve, we can calculate the integral by appealing to the geometry of the graph. Since this is just a rectangular shape, we calculate the integral by multiplying the base by its height:
area = 1/2000 x 2000 = 1
This definition may seem arbitrary or, for some readers, well-trodden territory. Why is it important to note?
It is of central importance to Bayesian statistics, because Bayesians work with and produce valid probability distributions: only valid probability distributions can be used to describe uncertainty. The pursuit of this ideal underlies the majority of methods in applied Bayesian statistics, analytic and computational alike.
How would you calculate probability that the winning number, X , is 3 in the discrete probability distribution for the lottery? How would you calculate 10 or less?
Easy!
Pr(X = 3) = 1 / 100
To calculate the probability that the winning number is 10 or less, we just sum the probabilities of it being {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}: 10 × 1/100 = 1/10
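Both calculations can be sketched in a few lines of Python (the variable names here are illustrative, not from the book):

```python
from fractions import Fraction

# Pr(X = 3): a single outcome of the fair lottery.
pr_equals_3 = Fraction(1, 100)

# Pr(X <= 10): sum the probabilities of the outcomes 1 through 10.
pr_at_most_10 = sum(Fraction(1, 100) for _ in range(10))
print(pr_at_most_10)  # 1/10
```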
How would you calculate the probability that the value of the second-hand car is $2500?
We could conclude that Pr(value = $2500) = 1/2000. However, by the same logic, the probabilities of the value being $2500, $2500.01, $2500.001, and so on, would all be 1/2000. Since there are infinitely many such values, summing their probabilities would yield infinity rather than 1. To avoid this, for a continuous random variable we always have Pr(θ = number) = 0.
What is the solution to this problem regarding infinite sums in continuous distributions?
When we consider p(θ) for a continuous random variable, it turns out we should interpret its values as probability densities, not probabilities. We can use a continuous probability distribution to calculate the probability that a random variable lies within an interval of possible values.
What is the equivalent of a sum when calculating a probability from a continuous distribution?
To do this, we use the continuous analogue of a sum, an integral. Calculating an integral is equivalent to calculating the area under a probability density curve. For the car example, we can calculate the probability that the car’s value lies between $2500 and $3000 by determining the rectangular area underneath the graph shown:
1 / 2000 (height) x 500 (base) = 1/4
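The same interval probability can be checked numerically. This is a sketch (not from the book) using a simple midpoint-rule integration of the flat density p(v) = 1/2000:

```python
# Flat density for the car's value, supported on [2000, 4000].
def density(v):
    return 1 / 2000 if 2000 <= v <= 4000 else 0.0

# Midpoint-rule approximation of the integral over [2500, 3000].
a, b, n = 2500, 3000, 100_000
width = (b - a) / n
prob = sum(density(a + (k + 0.5) * width) * width for k in range(n))
print(round(prob, 6))  # 0.25
```

The numerical answer matches the geometric one: 500 × 1/2000 = 1/4.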
What is the difference between p(…) and Pr(…)?
We use Pr to explicitly state that the result is a probability, whereas p(value) is a probability density.
What is meant by the base in the previous calculation?
Recalling the book's example of crossing ice that you are certain to fall through:
For densities we must supply a volume, which provides the exchange rate to convert it into a probability. Note that the word volume is used for its analogy with three-dimensional solids, where we calculate the mass of an object by multiplying the density by its volume. Analogously, here we calculate the probability mass of an infinitesimal volume:
probability mass = probability density x volume
However, here a volume need not correspond to an actual three- dimensional volume in space, but to a unit of measurement across a parameter range of interest. In the above examples we use a length then an area as our volume unit, but in other cases it might be a volume, a percentage or even a probability.
How can we hope to obtain a sample of numbers from our distribution, since they are all individually impossible?
An event that is impossible has a probability of zero, but the converse does not hold. When we use the word impossible we mean that the event is not within our space of potential outcomes.
Imagine a sample of numbers from a standard normal distribution. Here the purely imaginary number i does not belong to the set of possible outcomes and hence has zero probability. Conversely, consider attempting to guess exactly the number that we sample from a standard normal distribution. Clearly, obtaining the number 3.142 here is possible – it does not lie outside of the range of the distribution – so it belongs to our potential outcomes. However, if we multiply our probability density by the volume corresponding to this single value, then we get zero because the volume element is of zero width. So we see that events that have a probability of zero can still be possible.
How do we use Bayes’ rule differently for probability distributions and probability densities?
While it is important to understand that probabilities and probability densities are not the same types of entity, the good news for us is that Bayes’ rule is the same for each.
Pr(θ = 1 | X = 1) in the discrete case simply becomes p(θ = 1 | X = 1) in the continuous case.
When the data, X, and the parameter θ are discrete, Pr denotes a probability; when the data and parameter are continuous, p denotes a probability density.
What is the mean of a distribution?
A mean, or expected value, of a distribution is the long-run average value that would be obtained if we sampled from it an infinite number of times.
How does the method of calculating a mean depend on the distribution?
The method to calculate the mean of a distribution depends on whether it is discrete or continuous in nature. However, the concept is essentially the same in both cases. The mean is calculated as a weighted sum (for discrete random variables) or integral (for continuous variables) across all potential values of the random variable where the weights are provided by the probability distribution.
Give both the equation for calculating the mean of discrete distributions and continuous distributions
Discrete: E(X) = ∑ (all a) a × Pr(X = a)
Continuous: E(X) = ∫ (all a) a × p(a) da
In the two expressions, a is any one of the discrete set, or continuum, of possible values for the random variable X, respectively. We use Pr in the first expression in (3.9) and p in the second, to indicate probabilities and probability densities, respectively.
Therefore how would you calculate the mean winning number of the lottery example
(1 x 1/100) + (2 x 1/100) + … + (99 x 1/100) + (100 x 1/100)
= 50.5
You can also demonstrate the long-run nature of the mean by computationally simulating many plays of the lottery. As the number of games played increases, the running mean gets closer to this value.
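A minimal simulation sketch of this idea (the seed and number of games are arbitrary choices, not from the book):

```python
import random

# Play the fair 1-to-100 lottery many times and compute the running mean,
# which should approach the theoretical mean of 50.5.
random.seed(1)  # arbitrary seed, for reproducibility
n_games = 100_000
total = sum(random.randint(1, 100) for _ in range(n_games))
running_mean = total / n_games
print(running_mean)  # close to 50.5
```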
How would you calculate the expected (or mean) value of the second-hand car?
This amounts to integrating the curve v × 1/2000 between $2000 and $4000. The region bounded by this curve and the axis can be broken up into a rectangular region (A) and a triangular region (B), so we calculate the total area by summing the individual areas (see figures):
area = {2000 x 1} + {0.5 x 2000 x 1} = 3000
We got this by finding the area under the graph of the car’s value times its probability density. A corresponds to the rectangle and B to the triangle; 2000 is the width of the region along the x axis ($4000 − $2000), while 1 is the height of the rectangle (the value of v × 1/2000 at v = $2000). The factor 0.5 appears in B because a triangle’s area is half that of the enclosing rectangle.
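The geometric answer can be checked numerically. This is a sketch (not from the book) that approximates E(V) = ∫ v × (1/2000) dv over [2000, 4000] with a midpoint rule:

```python
# Midpoint-rule approximation of the expected car value,
# integrating v * (1/2000) between 2000 and 4000.
a, b, n = 2000, 4000, 100_000
width = (b - a) / n
expected = sum((a + (k + 0.5) * width) * (1 / 2000) * width
               for k in range(n))
print(round(expected, 2))  # 3000.0
```

The midpoint rule is exact for a linear integrand, so this recovers the geometric result of $3000.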
Comment on the generalisability of these examples
Life is often more complex than the examples encountered thus far. We often must reason about the outcomes of a number of processes, whose results may be interdependent. The next few examples involve considering the outcome of two measurements to introduce the mechanics of two-dimensional probability distributions. Fortunately, these rules do not become more complex when generalising to higher dimensional problems. This means that if the reader is comfortable with the following examples, then they should understand the majority of calculations involving probability distributions.
Imagine that you are a horse racing aficionado and want to quantify the uncertainty in the outcome of two separate races. In each race there are two horses from a particular stable, called A and B. From their historical performance over 100 races, you notice that both horses often react the same way to the racing conditions. When horse A wins, it is more likely that, later in the day, B will also win, and vice versa, with similar interrelations for the losses; when A finds conditions tough, so does B. Wanting to flex your statistical muscle, you represent the historical race results by the two-dimensional probability distribution (see figures)
              XB = 0 (lose)   XB = 1 (win)
XA = 0 (lose)    30/100          10/100
XA = 1 (win)     10/100          50/100
Does this distribution satisfy the requirements for a valid probability distribution? Show why or why not
Since all the values of the distribution are real and non-negative, it satisfies our first requirement. Since our distribution is composed of two discrete random variables, we must sum over the possible values of both to test whether it is normalised:
∑ (over i, j) Pr(XA = i, XB = j) = 3/10 + 1/10 + 1/10 + 5/10 = 1
XA and XB are random variables which represent the race for horses A and B, respectively. Notice that since our situation considers the outcome of two random variables, we must index the probability, Pr(XA,XB), by both.
How can we interpret the probability distribution shown in this table? Specifically how do we figure out the probability that both horses lose?
The probability that both horses lose (and hence both their random variables equal 0) is just read off from the top-left entry in the table, meaning:
Pr(XA = 0, XB = 0) = 30/100
This is similar for looking at any of these outcomes:
Pr(XA = 1, XB = 1) = 50/100
Pr(XA = 1, XB = 0) = 10/100
Pr(XA = 0, XB = 1) = 10/100
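The mechanics of the two-dimensional table can be sketched in Python (the dict layout is an illustrative choice, not from the book): store the joint distribution keyed by (XA, XB), with 0 = lose and 1 = win, then validity checks and lookups are one-liners.

```python
from fractions import Fraction

# Joint distribution of the two races, keyed by (XA, XB).
joint = {
    (0, 0): Fraction(30, 100),
    (0, 1): Fraction(10, 100),
    (1, 0): Fraction(10, 100),
    (1, 1): Fraction(50, 100),
}

# Validity: non-negative entries that sum to exactly 1.
assert all(p >= 0 for p in joint.values())
assert sum(joint.values()) == 1

# Reading off the probability that both horses lose:
print(joint[(0, 0)])  # 3/10
```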