Chapter 3: Probability Flashcards
What is meant by a random variable?
In probability theory, we describe the behaviour of random variables. This is a statistical term for variables that associate different numeric values with each of the possible outcomes of some random process.
What is meant by the term random in random variable?
By random here we do not mean the colloquial use of this term to mean something that is entirely unpredictable. A random process is simply a process whose outcome cannot be perfectly known ahead of time (it may nonetheless be quite predictable).
Imagine that we enter a lottery, where we select a number from 1 to 100, to have a chance of winning $1000. We suppose that in the lottery only one ball is drawn and it is fair, meaning that all numbers are equally likely to win.
Describe what the probability distribution for this lottery would look like
A discrete probability distribution, since the variable we measure – the winning number – is confined to a finite set of values. It would therefore look like a set of 100 bars of equal height and width, since all numbers are equally likely to win.
Compare the function of the probability of drawing the lottery number with one depicting the probability of: Before test driving a second-hand car, we are uncertain about its value. From seeing pictures of the car, we might think that it is worth anywhere from $2000 to $4000, with all values being equally likely.
Since the range of possible values is continuous, the graph would depict a probability density instead: a single rectangular box from $2000 to $4000, with a constant height of 1/2000. Note that this height is a density, not a probability.
The aforementioned cases are both examples of valid probability distributions. So what are their defining properties?
- All values of the distribution must be real and non-negative.
- The sum (for discrete random variables) or integral (for continuous random variables) across all possible values of the random variable must be 1.
How is this satisfied in the discrete lottery case?
∑ (i = 1 to 100) 1/100 = 1
i.e. the sum of one hundred 1/100s equals 1.
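As a quick sanity check, here is a minimal Python sketch (not from the book) that represents the lottery's 100 equally likely outcomes exactly and verifies both validity conditions:

```python
from fractions import Fraction

# The 100 lottery outcomes, each with probability exactly 1/100.
probs = [Fraction(1, 100)] * 100

# Condition 1: all values real and non-negative.
assert all(p >= 0 for p in probs)

# Condition 2: the probabilities sum to exactly 1.
print(sum(probs))  # 1
```

Using Fraction avoids floating-point rounding, so the sum really is exactly 1.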
How is this satisfied for the continuous case of the second-hand car example?
All values of the distribution must be real and non-negative: The graph indicates that p(v) = 1/2000 ≥ 0 for 2000 ≤ v ≤ 4000
integral (for continuous random variables) across all possible values of the random variable must be 1: Fortunately, since integration is essentially just working out an area underneath a curve, we can calculate the integral by appealing to the geometry of the graph. Since this is just a rectangular shape, we calculate the integral by multiplying the base by its height:
area = 1/2000 x 2000 = 1
This definition may seem arbitrary or, for some readers, well-trodden territory. Why is it important to note?
It is of central importance to Bayesian statistics, because Bayesians work with and produce valid probability distributions: only valid probability distributions can be used to describe uncertainty. The pursuit of this ideal underlies the majority of methods in applied Bayesian statistics, analytic and computational alike.
How would you calculate probability that the winning number, X , is 3 in the discrete probability distribution for the lottery? How would you calculate 10 or less?
Easy!
Pr(X = 3) = 1 / 100
To calculate the probability that the winning number is 10 or less, we just sum the probabilities of it being {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}: 10 × 1/100 = 1/10
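Both calculations can be sketched in a few lines of Python (the variable names here are illustrative, not from the book):

```python
from fractions import Fraction

# Pr(X = 3): a single outcome of the fair lottery.
pr_equals_3 = Fraction(1, 100)

# Pr(X <= 10): sum the probabilities of the outcomes 1 through 10.
pr_at_most_10 = sum(Fraction(1, 100) for _ in range(10))
print(pr_at_most_10)  # 1/10
```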
How would you calculate the probability that the value of the second-hand car is $2500?
We could conclude that Pr(value = $2500) = 1/2000. However, by the same logic, the probabilities of the value being $2500, $2500.01, $2500.001, and so on, would all be 1/2000. Since there are infinitely many such values, summing their probabilities would yield infinity rather than 1. To avoid this, for a continuous random variable we always have Pr(θ = number) = 0.
What is the solution to this problem regarding infinite sums in continuous distributions?
When we consider p(θ) for a continuous random variable, it turns out we should interpret its values as probability densities, not probabilities. We can use a continuous probability distribution to calculate the probability that a random variable lies within an interval of possible values.
What is the equivalent of a sum when calculating a probability from a continuous distribution?
To do this, we use the continuous analogue of a sum, an integral. Calculating an integral is equivalent to calculating the area under a probability density curve. For the car example, we can calculate the probability that the car’s value lies between $2500 and $3000 by determining the rectangular area underneath the graph shown:
1 / 2000 (height) x 500 (base) = 1/4
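The same interval probability can be checked numerically. This is a sketch (not from the book) using a simple midpoint-rule integration of the flat density p(v) = 1/2000:

```python
# Flat density for the car's value, supported on [2000, 4000].
def density(v):
    return 1 / 2000 if 2000 <= v <= 4000 else 0.0

# Midpoint-rule approximation of the integral over [2500, 3000].
a, b, n = 2500, 3000, 100_000
width = (b - a) / n
prob = sum(density(a + (k + 0.5) * width) * width for k in range(n))
print(round(prob, 6))  # 0.25
```

The numerical answer matches the geometric one: 500 × 1/2000 = 1/4.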
What is the difference between p(…) and Pr(…)?
We use Pr to explicitly state that the result is a probability, whereas p(value) is a probability density.
What is meant by the base in the previous calculation?
Recalling the book's example of crossing ice that you are certain to fall through:
For densities we must supply a volume, which provides the exchange rate to convert it into a probability. Note that the word volume is used for its analogy with three-dimensional solids, where we calculate the mass of an object by multiplying the density by its volume. Analogously, here we calculate the probability mass of an infinitesimal volume:
probability mass = probability density x volume
However, here a volume need not correspond to an actual three- dimensional volume in space, but to a unit of measurement across a parameter range of interest. In the above examples we use a length then an area as our volume unit, but in other cases it might be a volume, a percentage or even a probability.
How can we hope to obtain a sample of numbers from our distribution, since they are all individually impossible?
An event that is impossible has a probability of zero, but the converse does not hold. When we use the word impossible we mean that the event is not within our space of potential outcomes.
Imagine a sample of numbers from a standard normal distribution. Here the purely imaginary number i does not belong to the set of possible outcomes and hence has zero probability. Conversely, consider attempting to guess exactly the number that we sample from a standard normal distribution. Clearly, obtaining the number 3.142 here is possible – it does not lie outside of the range of the distribution – so it belongs to our potential outcomes. However, if we multiply our probability density by the volume corresponding to this single value, then we get zero because the volume element is of zero width. So we see that events that have a probability of zero can still be possible.
How do we use Bayes’ rule differently for probability distributions and probability densities?
While it is important to understand that probabilities and probability densities are not the same types of entity, the good news for us is that Bayes’ rule is the same for each.
Pr(θ = 1 | X = 1) in the discrete case simply becomes p(θ = 1 | X = 1) in the continuous case.
When the data, X, and the parameter θ are discrete, Pr denotes a probability; when the data and parameter are continuous, p denotes a probability density.
What is the mean of a distribution?
A mean, or expected value, of a distribution is the long-run average value that would be obtained if we sampled from it an infinite number of times.
How does the method of calculating a mean depend on the distribution?
The method to calculate the mean of a distribution depends on whether it is discrete or continuous in nature. However, the concept is essentially the same in both cases. The mean is calculated as a weighted sum (for discrete random variables) or integral (for continuous variables) across all potential values of the random variable where the weights are provided by the probability distribution.
Give both the equation for calculating the mean of discrete distributions and continuous distributions
Discrete: E(X) = ∑ (all a) a × Pr(X = a)
Continuous: E(X) = ∫ (all a) a × p(a) da
In the two expressions, a is any one of the discrete set, or continuum, of possible values for the random variable X, respectively. We use Pr in the first expression in (3.9) and p in the second, to indicate probabilities and probability densities, respectively.
Therefore how would you calculate the mean winning number of the lottery example
(1 x 1/100) + (2 x 1/100) + … + (99 x 1/100) + (100 x 1/100)
= 50.5
You can also demonstrate the long-run nature of the mean by computationally simulating many plays of the lottery. As the number of games played increases, the running mean gets closer to this value.
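A minimal simulation sketch of this idea (the seed and number of games are arbitrary choices, not from the book):

```python
import random

# Play the fair 1-to-100 lottery many times and compute the running mean,
# which should approach the theoretical mean of 50.5.
random.seed(1)  # arbitrary seed, for reproducibility
n_games = 100_000
total = sum(random.randint(1, 100) for _ in range(n_games))
running_mean = total / n_games
print(running_mean)  # close to 50.5
```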
How would you calculate the expected (or mean) value of the second-hand car?
This amounts to integrating the curve v × 1/2000 between $2000 and $4000. The region bounded by this curve and the axis can be broken up into a rectangular region (A) and a triangular region (B), so we calculate the total area by summing the individual areas (see figures):
area = {2000 x 1} + {0.5 x 2000 x 1} = 3000
We got this by finding the area under the graph of the car’s value times its probability density. A corresponds to the rectangle and B to the triangle; 2000 is the width of the region along the x axis ($4000 − $2000), while 1 is the height of the rectangle (the value of v × 1/2000 at v = $2000). The factor 0.5 appears in B because a triangle’s area is half that of the enclosing rectangle.
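The geometric answer can be checked numerically. This is a sketch (not from the book) that approximates E(V) = ∫ v × (1/2000) dv over [2000, 4000] with a midpoint rule:

```python
# Midpoint-rule approximation of the expected car value,
# integrating v * (1/2000) between 2000 and 4000.
a, b, n = 2000, 4000, 100_000
width = (b - a) / n
expected = sum((a + (k + 0.5) * width) * (1 / 2000) * width
               for k in range(n))
print(round(expected, 2))  # 3000.0
```

The midpoint rule is exact for a linear integrand, so this recovers the geometric result of $3000.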
Comment on the generalisability of these examples
Life is often more complex than the examples encountered thus far. We often must reason about the outcomes of a number of processes, whose results may be interdependent. The next few examples involve considering the outcome of two measurements to introduce the mechanics of two-dimensional probability distributions. Fortunately, these rules do not become more complex when generalising to higher dimensional problems. This means that if the reader is comfortable with the following examples, then they should understand the majority of calculations involving probability distributions.
Imagine that you are a horse racing aficionado and want to quantify the uncertainty in the outcome of two separate races. In each race there are two horses from a particular stable, called A and B. From their historical performance over 100 races, you notice that both horses often react the same way to the racing conditions. When horse A wins, it is more likely that, later in the day, B will also win, and vice versa, with similar interrelations for the losses; when A finds conditions tough, so does B. Wanting to flex your statistical muscle, you represent the historical race results by the two-dimensional probability distribution (see figures)
              XB = 0 (lose)   XB = 1 (win)
XA = 0 (lose)    30/100          10/100
XA = 1 (win)     10/100          50/100
Does this distribution satisfy the requirements for a valid probability distribution? Show why or why not
Since all the values of the distribution are real and non-negative, it satisfies our first requirement. Since our distribution is composed of two discrete random variables, we must sum over the possible values of both to test whether it is normalised:
∑ (over i, j) Pr(XA = i, XB = j) = 3/10 + 1/10 + 1/10 + 5/10 = 1
XA and XB are random variables which represent the race for horses A and B, respectively. Notice that since our situation considers the outcome of two random variables, we must index the probability, Pr(XA,XB), by both.
How can we interpret the probability distribution shown in this table? Specifically how do we figure out the probability that both horses lose?
The probability that both horses lose (and hence both their random variables equal 0) is just read off from the top-left entry in the table, meaning:
Pr(XA = 0, XB = 0) = 30/100
This is similar for looking at any of these outcomes:
Pr(XA = 1, XB = 1) = 50/100
Pr(XA = 1, XB = 0) = 10/100
Pr(XA = 0, XB = 1) = 10/100
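The mechanics of the two-dimensional table can be sketched in Python (the dict layout is an illustrative choice, not from the book): store the joint distribution keyed by (XA, XB), with 0 = lose and 1 = win, then validity checks and lookups are one-liners.

```python
from fractions import Fraction

# Joint distribution of the two races, keyed by (XA, XB).
joint = {
    (0, 0): Fraction(30, 100),
    (0, 1): Fraction(10, 100),
    (1, 0): Fraction(10, 100),
    (1, 1): Fraction(50, 100),
}

# Validity: non-negative entries that sum to exactly 1.
assert all(p >= 0 for p in joint.values())
assert sum(joint.values()) == 1

# Reading off the probability that both horses lose:
print(joint[(0, 0)])  # 3/10
```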