Q1. What is the difference between a random variable and a probability distribution? Provide an example.

Answer: A random variable is a variable that can take values from its domain, and the agent is uncertain about which value it takes. A probability distribution is a function that assigns a probability to each proposition X=x, where X is a random variable and x is a value from dom(X). For example, Coin is a random variable with dom(Coin) = {heads, tails}, and a probability distribution over it might assign P(Coin=heads) = 0.5 and P(Coin=tails) = 0.5.

Q2. What is the smallest possible domain for a (non-deterministic) random variable, and why?

Answer: The smallest possible domain for a random variable contains two values, because that is the smallest domain over which the variable can actually 'vary'; with only one value, the variable would be deterministic.

Q3. In probability theory, what is a possible world? Provide an example.

Answer: A possible world in probability theory is an assignment of a value to every random variable.

The possible worlds are mutually exclusive and jointly exhaustive. For example, two random variables X and Y with dom(X) = dom(Y) = {T, F} result in 4 possible worlds:

X=T, Y=T

X=T, Y = F

X = F, Y = T

X = F, Y = F

Q4. In words, explain how we can compute the probability of a proposition using a measure over possible worlds.

Answer: We add up the probability measures of all possible worlds in which the proposition is true.
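A minimal sketch of this idea, reusing the four possible worlds from Q3 with made-up probabilities (the measure just has to sum to 1):

```python
# Measure over the possible worlds (X, Y) from Q3; numbers are assumed
# purely for illustration and must sum to 1.
worlds = {
    (True, True): 0.4,
    (True, False): 0.1,
    (False, True): 0.2,
    (False, False): 0.3,
}

def prob(proposition):
    """P(proposition) = sum of the measures of worlds where it is true."""
    return sum(p for world, p in worlds.items() if proposition(world))

# P(X = T): sum over worlds where X is true.
p_x = prob(lambda w: w[0])
print(p_x)  # 0.4 + 0.1 = 0.5
```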

Q5. What is marginalization?

Answer: Marginalization is the process of forming a marginal distribution by "marginalizing out" (summing out) some random variables - building a probability distribution over the remaining variables in which the values of the summed-out variables are ignored.
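A small sketch of marginalizing Y out of a joint distribution P(X, Y), with assumed numbers:

```python
# Joint distribution P(X, Y) as a table; values are assumed for illustration.
joint = {
    ("T", "T"): 0.4, ("T", "F"): 0.1,
    ("F", "T"): 0.2, ("F", "F"): 0.3,
}

def marginalize_out_y(joint):
    """Sum over the values of Y to get the marginal P(X)."""
    marginal = {}
    for (x, y), p in joint.items():
        marginal[x] = marginal.get(x, 0.0) + p
    return marginal

p_x = marginalize_out_y(joint)
print(p_x)  # {'T': 0.5, 'F': 0.5}
```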

Q6. In words, what does it mean to condition a joint probability distribution on one variable taking a certain value? Which variables is the resulting distribution defined over?

Answer: It means revising our beliefs given new information (one variable taking a certain value). The probability distribution is recomputed taking this new information into account. The resulting distribution is defined over all random variables from the previous distribution except the one that took the certain value.
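A sketch of conditioning the same assumed joint P(X, Y) on Y = T: keep only the matching worlds and renormalize by P(Y = T), leaving a distribution over X alone:

```python
# Joint distribution P(X, Y); values are assumed for illustration.
joint = {
    ("T", "T"): 0.4, ("T", "F"): 0.1,
    ("F", "T"): 0.2, ("F", "F"): 0.3,
}

def condition_on_y(joint, y_val):
    """Return P(X | Y = y_val): filter worlds, then renormalize."""
    p_evidence = sum(p for (x, y), p in joint.items() if y == y_val)
    return {x: p / p_evidence
            for (x, y), p in joint.items() if y == y_val}

p_x_given_y = condition_on_y(joint, "T")
print(p_x_given_y)  # P(X|Y=T): {'T': 0.666..., 'F': 0.333...}
```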

Q7. Explain, both in words and using math, the difference between p(ℎ|e) and p(ℎ⋀e).

Answer: p(ℎ|e) is the probability of h given that the value of the evidence e is known, while p(ℎ⋀e) is the probability that h and e both hold; in the second case the value of e is not given. They are related by p(ℎ|e) = p(ℎ⋀e)/p(e).

Q8. Derive Bayes' rule; explain how this derivation depends on the product rule.

Answer: By the product rule, P(h⋀e) = P(h|e)*P(e), and expanding in the other order, P(h⋀e) = P(e|h)*P(h). Setting the two right-hand sides equal gives P(h|e)*P(e) = P(e|h)*P(h), and dividing both sides by P(e) (assuming P(e) > 0) yields Bayes' rule: P(h|e) = P(e|h)*P(h)/P(e). The derivation depends on the product rule because both expansions of the joint probability P(h⋀e) come from it.

Q9. Explain why Bayes' rule is important for AI.

Answer: We often want to do evidential reasoning: we typically know how likely the evidence is given a hypothesis (e.g., the probability of a symptom given a disease), but what we actually want is the probability of the hypothesis given the observed evidence. Bayes' rule gives a way to invert the conditional in exactly this way.

Q10. In words, explain the difference between conditional independence and marginal independence.

Answer: For variables X and Y, if P(X|Y) = P(X) then X is marginally independent of Y. That is, knowing the value of Y doesn't change the probability of X. For conditional independence, if P(X|Y,Z) = P(X|Z) then X is conditionally independent of Y given Z. In other words, if you know the value of Z, knowing the value of Y doesn't tell you anything more about X.

Alternative: For marginal independence: learning that Y=y does not change your belief in X, and this is true for all values y that Y could take. For conditional independence: learning that Y=y does not change your belief in X when we already know Z=z, and this is true for all values y that Y could take and all values z that Z could take.
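A numeric sketch of checking marginal independence, using a joint that is independent by construction (the marginals P(X) and P(Y) are assumed numbers):

```python
# Assumed marginals; the joint is built as their product, so X and Y
# are marginally independent by construction.
px = {"T": 0.5, "F": 0.5}
py = {"T": 0.6, "F": 0.4}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

# Check P(X|Y=y) == P(X) for every value y of Y.
for y in py:
    for x in px:
        p_x_given_y = joint[(x, y)] / py[y]
        assert abs(p_x_given_y - px[x]) < 1e-9  # knowing Y changes nothing
print("X is marginally independent of Y")
```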

Q11. Give an example that was not given in class or the textbook, where random variables x and y are conditionally dependent given z, but marginally independent, and explain why these properties hold. Draw a Bayesian network that corresponds to your example.

Q12. Give an example that was not given in class or the textbook, where random variables x and y are conditionally independent given z, but marginally dependent, and explain why these properties hold. Draw a Bayesian network that corresponds to your example.

Q13. Give an example that was not given in class or the textbook, where random variables x and y are conditionally dependent given z, and also marginally dependent, and explain why these properties hold. Draw a Bayesian network that corresponds to your example.

Q16. Name an operation that could be performed on a factor f that could decrease f’s dimensionality, and explain how much smaller the new factor would be.

Answer: We can decrease a factor's dimensionality by marginalizing (summing) out one of its variables. For example, if we have a factor f(A,B,C) and we marginalize out C, we get a factor f'(A,B); the new factor's table is smaller by a factor of |dom(C)|.
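A sketch of summing C out of a factor over binary variables (the table entries are assumed values):

```python
from itertools import product

# A factor f(A, B, C) over binary variables, stored as a table;
# the uniform entries are assumed purely for illustration.
f = {(a, b, c): 0.125 for a, b, c in product("TF", repeat=3)}

def marginalize_out_c(f):
    """Sum out C: the new factor has |dom(C)| times fewer entries."""
    g = {}
    for (a, b, c), v in f.items():
        g[(a, b)] = g.get((a, b), 0.0) + v
    return g

g = marginalize_out_c(f)
print(len(f), len(g))  # 8 4 -- shrinks by a factor of |dom(C)| = 2
```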

Q17. Name an operation that could be performed on two factors f and f0 that could lead to an increase in dimensionality, and explain how much bigger the new factor could be.

Answer: We can multiply the two factors, which leads to a factor of higher dimensionality. The number of variables in the product is the sum of the numbers of variables in the two factors minus the number of shared variables. For example, multiplying f0(A,B,C)*f1(C,D) gives the resulting factor f2(A,B,C,D), whose table has up to |dom(A)|*|dom(B)|*|dom(C)|*|dom(D)| entries.
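A sketch of this multiplication for binary variables, matching entries on the shared variable C (the factor values are assumed):

```python
from itertools import product

# f0(A, B, C) and f1(C, D) over binary variables; values are assumed.
f0 = {(a, b, c): 0.5 for a, b, c in product("TF", repeat=3)}
f1 = {(c, d): 0.25 for c, d in product("TF", repeat=2)}

def multiply(f0, f1):
    """Pointwise product matched on shared C -> factor f2(A, B, C, D)."""
    return {(a, b, c, d): f0[(a, b, c)] * f1[(cc, d)]
            for (a, b, c) in f0 for (cc, d) in f1 if cc == c}

f2 = multiply(f0, f1)
print(len(f2))  # 16 = 2^4 entries over (A, B, C, D)
```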

Q18. How are the parents of a node k in a Bayesian network determined? What do they have to do with conditional independence?

Answer: In a Bayesian network, the variables are placed in a total ordering, and the parents of a node k are a minimal set of k's predecessors such that k is conditionally independent of all its other predecessors given its parents. So the conditional independencies among the variables determine exactly which nodes become parents of k.

Q19. Consider a node N in a Bnet. N’s domain has size d. N has k parents, each with also domain size d. What is the size of the CPT for P(N|parentsOfN)? Is it true that the sum of all the entries in this CPT is equal to d? why or why not?

Answer: The CPT for P(N|parentsOfN) has d*d^k = d^(k+1) entries: one entry for each of the d^k combinations of parent values times the d values of N. No, the sum of all entries is d^k, not d, since each distribution P(N|P1 = p1, ..., Pk = pk) sums to 1 and there are d^k combinations of parent values.
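A small sketch that counts the entries and checks the sum, for assumed values d = 2 and k = 3:

```python
from itertools import product

d, k = 2, 3  # domain size and number of parents, assumed for illustration

# CPT for P(N | P1..Pk): one distribution over N (here uniform, assumed)
# per combination of parent values.
cpt = {parents: {n: 1.0 / d for n in range(d)}
       for parents in product(range(d), repeat=k)}

num_entries = sum(len(dist) for dist in cpt.values())
total = sum(p for dist in cpt.values() for p in dist.values())
print(num_entries)  # d**(k+1) = 16
print(total)        # d**k = 8.0 -- one 1 per parent assignment, not d
```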

Q20. What is the space complexity for storing a Bayesian network, where n is the number of random variables, d is the size of the biggest variable domain, and k is the maximal number of parents a variable has? Use O() notation and explain your answer in words.

Answer: O(n*d*d^k) = O(n*d^(k+1)). Each variable has at most d^k possible combinations of parent values and d values in its own domain, so each variable's CPT requires at most d*d^k entries. With n variables in the network, the space complexity is O(n*d^(k+1)).

Q21. In words, define the role of the elimination ordering in the variable elimination algorithm. How does this relate to the chain rule expansion ordering used in the construction of the underlying Bayesian network?

Answer: The elimination ordering influences the treewidth - the maximum number of variables appearing in any factor created during the execution of the VE algorithm. A good ordering results in a small treewidth and hence less work. The chain-rule expansion ordering used in constructing the network also affects this, as it determines the network's structure and therefore which factors are produced.

Q22. Give an example of a variable that would not need to be summed out in variable elimination.

Answer: If we are computing P(G|H=T), neither the query variable G nor the evidence variable H needs to be summed out.