Final: Reasoning under Uncertainty Flashcards

1
Q

(L23) What is a random variable, what is its domain, and what are its characteristics?

A
  • A variable whose value the agent can be uncertain about
  • dom(X): the set of values that X can take
  • The values in dom(X) are mutually exclusive and exhaustive
2
Q

(L23) What is the probability distribution of a random variable?

A

A probability distribution P on a random variable X is a function dom(X) → [0, 1] that maps each value x \in dom(X) to P(X = x).

3
Q

(L23) Give examples of random variables?

A

(Discrete) Roll of a die:
dom = {1, …, 6}
Probability distribution: uniform across all values of the die
(Boolean) Whether it is raining outside: dom = {true, false}

4
Q

(L23) w ⊧ X = x

A

In world w, the variable X takes the value x (world w satisfies X = x).

5
Q

(L23) What are the semantics of probability?

A
  • µ(w): probability of world w
  • \sum_{w \in W} µ(w) = 1
  • P(f) = \sum_{w ⊧ f} µ(w) (sum the probabilities of the worlds where f is true)
6
Q

(L23) Calculate the probability of a proposition f given µ(w) for the set of possible worlds

A

P(f) = \sum_{w ⊧ f} µ(w) (sum the probabilities of the worlds where f is true)
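A minimal sketch of this sum in Python, assuming a hypothetical set of worlds over two boolean variables (the values are made up for illustration):

```python
# Hypothetical worlds over two boolean variables, Rain and Wind.
# mu maps each world to its probability; the values sum to 1.
mu = {
    ("rain", "wind"): 0.2,
    ("rain", "no_wind"): 0.3,
    ("no_rain", "wind"): 0.1,
    ("no_rain", "no_wind"): 0.4,
}

# P(f) for the proposition f = "it is raining":
# sum mu(w) over the worlds w in which f is true.
p_rain = sum(p for (rain, _), p in mu.items() if rain == "rain")
print(p_rain)  # 0.5
```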

7
Q

(L23) Define a joint probability distribution

A
  • The joint distribution of a set of random variables is a probability distribution over the Cartesian product of their domains
  • The entries of the whole table sum to 1
8
Q

(L24) Can you compute the probability distribution for each variable?

A

YES, by marginalization
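A minimal sketch of marginalization, assuming a hypothetical joint table P(Cavity, Toothache) with made-up numbers:

```python
from collections import defaultdict

# Hypothetical joint distribution P(Cavity, Toothache).
joint = {
    ("cavity", "toothache"): 0.12,
    ("cavity", "no_toothache"): 0.08,
    ("no_cavity", "toothache"): 0.08,
    ("no_cavity", "no_toothache"): 0.72,
}

# Marginalize out Toothache: P(Cavity = c) = sum over t of P(c, t).
marginal = defaultdict(float)
for (cavity, _), p in joint.items():
    marginal[cavity] += p
print(dict(marginal))  # approximately {'cavity': 0.2, 'no_cavity': 0.8}
```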

9
Q

(L24) Can you compute the probability distribution for any combination of variables?

A

YES, by marginalization we can compute the distribution over any subset of variables

10
Q

(L24) Can you update these probabilities if you know something about the variables?

A

YES

11
Q

(L24) Downside of the joint probability distribution (JPD)

A

Size is exponential in the number of variables

12
Q

(L24) What is normalization?

A

Rule out the possible worlds that are inconsistent with the new evidence e, then divide the remaining probabilities by P(e) so that they sum to 1.
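A minimal sketch of conditioning by ruling out worlds and renormalizing, using the same hypothetical joint table as above:

```python
# Hypothetical joint distribution P(Cavity, Toothache).
joint = {
    ("cavity", "toothache"): 0.12,
    ("cavity", "no_toothache"): 0.08,
    ("no_cavity", "toothache"): 0.08,
    ("no_cavity", "no_toothache"): 0.72,
}

# Condition on the evidence e: Toothache = toothache.
# Keep only the consistent worlds, then divide by their total probability P(e).
consistent = {w: p for w, p in joint.items() if w[1] == "toothache"}
p_e = sum(consistent.values())                       # P(e) = 0.2
posterior = {w: p / p_e for w, p in consistent.items()}
print(posterior)  # P(cavity | toothache) ≈ 0.6, P(no_cavity | toothache) ≈ 0.4
```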

13
Q

(L24) Prove the formula to compute conditional probability P(h | e) = P(h, e)/ P(e)

A

p.16
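One standard way the derivation can go (a sketch; the exact steps on p.16 may differ in notation): conditioning on e rules out every world where e is false and rescales the rest by 1/P(e) so they still sum to 1, i.e. µ_e(w) = µ(w) / P(e) if w ⊧ e, and 0 otherwise. Then

P(h | e) = \sum_{w ⊧ h} µ_e(w) = (\sum_{w ⊧ h ∧ e} µ(w)) / P(e) = P(h ∧ e) / P(e) = P(h, e) / P(e)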

14
Q

(L24) Derive the Product Rule, Chain Rule and Bayes’ Rule

A

p.17
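A sketch of the standard derivations from the definition of conditional probability (the slide version on p.17 may differ in notation):

Product Rule: rearranging P(h | e) = P(h, e) / P(e) gives P(h, e) = P(h | e) * P(e).

Chain Rule: apply the Product Rule repeatedly:
P(X_1, …, X_n) = P(X_n | X_1, …, X_{n-1}) * P(X_1, …, X_{n-1}) = … = \prod_{i=1}^{n} P(X_i | X_1, …, X_{i-1})

Bayes' Rule: by the Product Rule, P(h, e) = P(h | e) * P(e) and also P(h, e) = P(e | h) * P(h); equating the two and dividing by P(e) gives P(h | e) = P(e | h) * P(h) / P(e).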

15
Q

(L24) Two ways to use conditional probability for inference

A
  • Causal knowledge (from cause to evidence): P(evidence | hypothesis)
  • Evidential reasoning (from evidence to cause): P(hypothesis | evidence)
16
Q

(L25) Define Marginal Independence + interpretation

A

For all x_i \in dom(X) and all y_k \in dom(Y): P(X = x_i | Y = y_k) = P(X = x_i), equivalently P(X, Y) = P(X) * P(Y)

  • New evidence about Y (or X) does not affect the current belief in X (or Y)

A joint probability distribution (JPD) requiring 2^4 = 16 entries is reduced to two smaller tables (2^3 = 8 and 2 entries).

17
Q

(L25) Define Conditional Independence + interpretation

A

For all x_i \in dom(X), y_k \in dom(Y), z_m \in dom(Z),
P(X = x_i | Y = y_k, Z = z_m) = P(X = x_i | Z = z_m)

Value of Y doesn’t affect the belief in the value of X, given the value of Z.

18
Q

(L25) Space complexity of P(X1, …, Xn), with n independent random variables

A

By independence, P(X1, …, Xn) = P(X1) * … * P(Xn)

Space complexity: O(d^n) → O(n*d)

19
Q

(L25) Two equivalent statements:

Given catch is conditionally independent of toothache given cavity.

A
  1. P(tooth | catch, cavity) = P(tooth | cavity)
  2. P(tooth, catch | cavity) = P(tooth | cavity) * P(catch | cavity)

20
Q

(L25) How many values do we store in JPD, CPT?

A

Joint Probability Distribution (JPD): has O(d^n) values.
Since all of them sum to 1, we store d^n - 1 values.

Conditional Probability Table (CPT): has O(d^n) values.
Each row sums to 1, so we store all but one value per row.

21
Q

(L25) Uses of conditional independence

A
  • Reduce the size of the joint distribution from exponential in n to linear in n, where n is the number of variables.
  • Most basic and robust form of knowledge about uncertain environments.
22
Q

(L26) What is a Belief Network?

A

Directed Acyclic Graph that effectively expresses independence assertions among random variables

23
Q

(L26) What is constant and change for a system that collects some evidence?

A
Constant (the environment):
  • Values of the evidence
  • True causes of the evidence
Changing:
  • Set of evidence collected so far
  • Own belief in the cause of the fault
24
Q

(L26) Steps to build a Belief Network for a simple domain

A
  1. Apply Chain Rule
  2. Simplify according to marginal & conditional independence
  3. Express dependencies as a network
  4. It can be derived directly from the causal knowledge
25
Q

(L26) State and draw the types of inferences (Burglary, Alarm, JohnCalls, Earthquake)

A

Diagnostic (from effect to cause): observe JohnCalls, infer Burglary (JohnCalls → Alarm → Burglary, against the arcs)
Predictive (from cause to effect): observe Burglary, infer JohnCalls (Burglary → Alarm → JohnCalls, along the arcs)
Inter-causal (between causes of a common effect, "explaining away"): observe Alarm and Earthquake, belief in Burglary changes (Burglary → Alarm ← Earthquake)

26
Q

(L26) Compute the representational saving in terms of number of probabilities required

A
  1. Compute the total number of joint distribution entries: the product of the domain sizes, minus 1 (the entries sum to 1).
  2. Compute the number of values in each node's CPT, e.g. for P(C_1 | L_1, N): |dom(L_1)| * |dom(N)| * (|dom(C_1)| - 1).
  3. Repeat for every node and sum all the CPT sizes.
  4. The representational saving is the difference between the two counts (a worked sketch follows below).
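A worked sketch of the counting, using a hypothetical boolean chain network Fire → Alarm → Leaving (variable names and structure assumed, not from the slides):

```python
# Hypothetical chain network Fire -> Alarm -> Leaving, all boolean.

# Joint distribution: product of the domain sizes, minus 1 (entries sum to 1).
joint_entries = 2 * 2 * 2 - 1                            # 7

# CPTs: (product of parent domain sizes) * (own domain size - 1) per node.
cpt_fire = 1 * (2 - 1)                                   # P(Fire): 1 value
cpt_alarm = 2 * (2 - 1)                                  # P(Alarm | Fire): 2 values
cpt_leaving = 2 * (2 - 1)                                # P(Leaving | Alarm): 2 values
network_entries = cpt_fire + cpt_alarm + cpt_leaving     # 5

print(joint_entries - network_entries)                   # representational saving: 2
```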
27
Q

(L26) How many rows are in a Conditional Probability Table for a boolean variable X with k parents?

A

2^k rows

28
Q

(L26) Given n boolean variables, each with no more than k parents, how many values need to be stored?

A

O(n*2^k) values to be stored

29
Q

(L26) For n variables, how many values in the JPD need to be stored?

A

O(2^n)

30
Q

(L27) Two assumptions of Noisy-OR distributions.

A

TWO ASSUMPTIONS

  1. All causes are listed.
  2. For each cause, whatever inhibits it from generating the effect is independent of the inhibitors of the other causes.
31
Q

(L27) Benefits of Noisy-OR distributions

A
  • Far fewer probabilities need to be specified.
  • The number of probabilities is linear in the number of parents.

32
Q

(L27) Define Noisy-OR distributions.

A

A model for one CPT of a Bayesian network that describes the relation between an effect and its causes; it models non-interacting causes.
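A minimal sketch of how a noisy-OR CPT entry can be computed; the cause names and inhibition probabilities below are assumed for illustration:

```python
# Noisy-OR: each true cause j independently fails to produce the effect with
# probability q_j, so P(effect = false | causes) = product of q_j over the true causes.
def noisy_or(q, true_causes):
    p_no_effect = 1.0
    for cause in true_causes:
        p_no_effect *= q[cause]
    return 1.0 - p_no_effect  # P(effect = true | causes)

# Hypothetical inhibition probabilities for Fever given Cold, Flu, Malaria.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}
print(noisy_or(q, ["cold", "flu"]))  # 1 - 0.6 * 0.2 = 0.88
```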

33
Q

(L27) Assumptions of the Naïve Bayesian classifier

A

TWO ASSUMPTIONS

  • The value of each attribute depends on the classification
  • (Naïve) Attributes are independent of each other given the classification
34
Q

(L27) Benefits of the Naïve Bayesian classifier

A
  • Fewer parameters
  • Easy to acquire (e.g., if you have a large collection of emails for which you know whether or not they are spam)

35
Q

(L27) How do you classify using the Naïve Bayesian classifier?

A
Choose the most likely class given the set of observations:
P(Spam = T| …) > P(Spam = F | …)
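A minimal sketch of that comparison, assuming a tiny hypothetical spam model (the attribute names and probabilities are made up):

```python
import math

# Hypothetical Naive Bayes spam model: class prior and per-attribute
# conditional probabilities P(attribute = true | class).
prior = {"spam": 0.4, "not_spam": 0.6}
p_attr_given_class = {
    "spam": {"contains_offer": 0.7, "contains_meeting": 0.1},
    "not_spam": {"contains_offer": 0.1, "contains_meeting": 0.5},
}

def classify(observed):
    """Return the class c maximizing P(c) * prod_i P(a_i | c) over the observed attributes."""
    scores = {}
    for c in prior:
        log_score = math.log(prior[c])
        for attr, value in observed.items():
            p = p_attr_given_class[c][attr]
            log_score += math.log(p if value else 1 - p)
        scores[c] = log_score
    return max(scores, key=scores.get)

print(classify({"contains_offer": True, "contains_meeting": False}))  # spam
```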
36
Q

(L27) Definition of a Naïve Bayesian classifier

A

A very simple and successful BNet that allows us to classify entities in a set of classes, given a set of attributes.

37
Q

(L29) Techniques to simplify variable elimination.

A

  • All the variables of which the query is conditionally independent given the observations can be pruned from the BNet
  • Unobserved leaf nodes can be pruned
38
Q

(L29) Carry out variable elimination on a BNet by using factor representation and using the factor operations.

A

p.28
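Not the slide example, but a minimal sketch of the two factor operations (multiply and sum out), assuming factors over boolean variables represented as (variable list, table) pairs:

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors over boolean variables."""
    vars1, table1 = f1
    vars2, table2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out_table = {}
    for assignment in product([True, False], repeat=len(out_vars)):
        value = dict(zip(out_vars, assignment))
        a1 = tuple(value[v] for v in vars1)
        a2 = tuple(value[v] for v in vars2)
        out_table[assignment] = table1[a1] * table2[a2]
    return out_vars, out_table

def sum_out(f, var):
    """Sum a variable out of a factor."""
    vars_, table = f
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out_table = {}
    for assignment, value in table.items():
        reduced = assignment[:i] + assignment[i + 1:]
        out_table[reduced] = out_table.get(reduced, 0.0) + value
    return out_vars, out_table

# Example: multiply P(A) and P(B | A), then sum out A to get P(B).
f_a = (["A"], {(True,): 0.3, (False,): 0.7})
f_b_given_a = (["A", "B"], {(True, True): 0.9, (True, False): 0.1,
                            (False, True): 0.2, (False, False): 0.8})
print(sum_out(multiply(f_a, f_b_given_a), "A")[1])  # approximately {(True,): 0.41, (False,): 0.59}
```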

39
Q

(L30) What is a Markov Chain?

A

A sequence of random variables, one for each time slice; S_t represents the state at time t.

40
Q

(L30) What is the Markov Assumption?

A

Each random variable depends only on the previous one.

P(S_{t+1} | S_0, …, S_t) = P(S_{t+1} | S_t)

41
Q

(L30) What is the Stationary Process Assumption?

A

The mechanism that regulates how the state variables change over time is stationary: it can be described by a single transition model (a single CPT).

P(S_{t+1} | S_t) is the same for all t

42
Q

(L30) Specify Markov Chain and compute the probability of a sequence of states

A

By the stationary and Markov assumptions, we only need the stochastic transition matrix P(S_{t+1} | S_t) and the initial distribution P(S_0); the joint probability of a sequence of states is:

P(S_0, …, S_t) = P(S_0) * P(S_1 | S_0) * P(S_2 | S_1) * … * P(S_{t-1} | S_{t-2}) * P(S_t | S_{t-1})

43
Q

(L30) Describe the key problems in natural language processing

A
  1. Assign a probability to a sentence (Machine translation)
  2. Predict the next word (Speech recognition)

Assuming:
~ 10^5 words in a language
~ 10 words per sentence
→ We would need probabilities for (10^5)^10 = 10^50 possible sentences

44
Q

(L30) Justify and apply Markov Chains to compute the probability of a Natural Language sentence

A

Write down the probability of the sequence of states (words).
Count sequential pairs of words and count the occurrences of each word.
Estimate the bigram probability P(w_i | w_{i-1}) = Count(w_{i-1}, w_i) / Count(w_{i-1}), then multiply the bigrams along the sentence (a minimal sketch follows below).
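A minimal sketch of bigram estimation and sentence scoring on a toy corpus (the corpus is made up; a real model would also need smoothing for unseen bigrams):

```python
from collections import Counter

corpus = "the dog barks the dog runs the cat runs".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev_word, word):
    """Estimate P(word | prev_word) = Count(prev_word, word) / Count(prev_word)."""
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

def sentence_prob(sentence):
    """Markov-chain probability of the word sequence (ignoring the first word's prior)."""
    p = 1.0
    for prev_word, word in zip(sentence, sentence[1:]):
        p *= bigram_prob(prev_word, word)
    return p

print(sentence_prob("the dog runs".split()))  # (2/3) * (1/2) = 1/3
```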

45
Q

What is a Hidden Markov Model (HMM)?

A

The reasoning system does not have access to the states but can make observations that give some information about the current state.

46
Q

Specify the components of a Hidden Markov Model (HMM). (conditions, dynamics, sensor model representations and domains)

A
  1. Start with a Markov Chain
  2. Add a noisy observation about the state at each time step

P(S_0) specifies the initial conditions
P(S_{t+1} | S_t) specifies the dynamics
P(O_t | S_t) specifies the sensor model

With k states and h possible observation values:
Size of P(S_0) = k
Size of P(S_{t+1} | S_t) = k * k
Size of P(O_t | S_t) = k * h

47
Q

Can we prune in HMM?

A

No. When we use a hidden Markov model, all of the observations are always known, and the states are never observed, so even if we also know the actions that were taken there is nothing we can prune.

48
Q

Justify and apply HMMs to problems such as robot localization

A

p.34
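Not the slide example on p.34, but a minimal sketch of HMM filtering (the repeated predict-then-update step used for localization), with a toy two-room model whose numbers are assumed:

```python
# Toy HMM: the robot is in one of two rooms and noisily senses whether it sees a wall.
states = ["room_a", "room_b"]
prior = {"room_a": 0.5, "room_b": 0.5}                      # P(S_0)
transition = {"room_a": {"room_a": 0.7, "room_b": 0.3},     # P(S_{t+1} | S_t)
              "room_b": {"room_a": 0.3, "room_b": 0.7}}
sensor = {"room_a": {"wall": 0.9, "no_wall": 0.1},          # P(O_t | S_t)
          "room_b": {"wall": 0.2, "no_wall": 0.8}}

def forward(belief, observation):
    """One filtering step: predict with the dynamics, weight by the sensor model, normalize."""
    predicted = {s: sum(belief[s0] * transition[s0][s] for s0 in states) for s in states}
    unnormalized = {s: sensor[s][observation] * predicted[s] for s in states}
    z = sum(unnormalized.values())
    return {s: p / z for s, p in unnormalized.items()}

belief = prior
for obs in ["wall", "wall", "no_wall"]:
    belief = forward(belief, obs)
print(belief)  # belief over the robot's location after three observations
```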

49
Q

Compare and contrast stochastic single-stage (one-off) decisions vs multistage decisions

A

Single-stage decisions:
Set of one or more primitive decisions that can be treated as a single macro decision to be made before acting, with no prior observations.

Represented as a decision tree

  • The agent has a set of decisions to make (a macro-action it can perform)
  • Decisions can influence random variables
  • Decisions have probability distributions over outcomes

Multistage decisions:
  • A set of one or more decisions, each of which can depend on observations made after earlier decisions

50
Q

Define a Utility Function on possible worlds

A

A utility function assigns a value to each possible world; the expected utility of a decision is then the weighted average of the utilities of the possible worlds, weighted by their probabilities.

51
Q

How are decision nodes and utility nodes represented in single stage decision networks?

A

Decision nodes: the agent chooses the value (rectangles)

Utility node: Parents are variables on which the utility function depends (diamond)

52
Q

Compute optimal decisions by Variable Elimination

A
  1. Create a factor for each conditional probability and for the utility
  2. Multiply factors and sum out all the random variables (this creates a factor that gives the expected utility for each decision)
  3. Choose the decision with the maximum value in the factor (a minimal sketch follows below)
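A minimal sketch of steps 2 and 3 for a single-stage network with one decision and one random variable (the umbrella scenario and its numbers are assumed):

```python
# Hypothetical single-stage decision: take the umbrella? Random variable: Rain.
p_rain = {"rain": 0.3, "no_rain": 0.7}                     # P(Rain)
utility = {("take", "rain"): 70, ("take", "no_rain"): 20,  # U(Decision, Rain)
           ("leave", "rain"): 0, ("leave", "no_rain"): 100}

# Sum out the random variable: a factor giving the expected utility of each decision.
expected_utility = {
    d: sum(p_rain[r] * utility[(d, r)] for r in p_rain)
    for d in ("take", "leave")
}
best = max(expected_utility, key=expected_utility.get)
print(expected_utility, best)  # approximately {'take': 35.0, 'leave': 70.0} -> choose 'leave'
```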