Practice Exam Flashcards
T/F
Q-learning can learn the optimal Q-function $$Q^*$$ without ever executing the optimal policy.
True
Yes, this is a property called off-policy learning.
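To make the off-policy point concrete, here is a minimal sketch of tabular Q-learning on a made-up 4-state chain world (the environment, constants, and names are illustrative assumptions, not from the exam): the behavior policy is uniformly random, yet the update bootstraps from $$\max_{a'} Q(s', a')$$, so the estimates still converge toward $$Q^*$$.

```python
import random

# Illustrative toy setup (not from the exam): a 4-state chain where
# reaching state 3 pays +1 and ends the episode.
ALPHA, GAMMA = 0.1, 0.9
STATES, ACTIONS = range(4), (0, 1)   # 0 = move left, 1 = move right
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(s, a):
    """Move along the chain; state 3 is terminal with reward +1."""
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

s = 0
for _ in range(20000):
    a = random.choice(ACTIONS)        # random behavior policy: the
    s2, r = step(s, a)                # optimal policy is never executed
    done = (s2 == 3)
    # Off-policy target: bootstrap from the greedy value at s2.
    target = r if done else r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    s = 0 if done else s2             # reset after escaping

print({k: round(v, 2) for k, v in Q.items()})  # approximates Q*
```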
Which of the following would be the best reward function for a robot
that is trying to learn to escape a maze quickly (assume a discount of $$\gamma = 1$$):
(A) Reward of +1 for escaping the maze and a reward of zero at all other times.
(B) Reward of +1 for escaping the maze and a reward of -1 at all other times.
(C) Reward of +1000 for escaping the maze and a reward of 1 at all other times.
(B) Reward of +1 for escaping the maze and a reward of -1 at all other times.
With $$\gamma = 1$$, the -1 per step makes every extra step costly, so the agent escapes as fast as possible; (A) gives no incentive for speed, and (C) actually rewards staying in the maze.
What does regret let us quantify?
(A) Whether our policy is optimal or not.
(B) The relative goodness of exploration procedures.
(C) The negative utility of a state like a fire pit.
(D) How accurately we estimated the probabilities of the transition function.
(B) The relative goodness of exploration procedures.
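One common formalization (a sketch; exact definitions vary by setting): after $$T$$ steps,

$$\text{Regret}(T) = \sum_{t=1}^{T} \left( r_t^* - r_t \right)$$

where $$r_t^*$$ is the expected reward of the optimal action at step $$t$$ and $$r_t$$ is the reward actually received. Exploration procedures with lower cumulative regret are better.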
Which of the following is NOT true for both MDPs and Reinforcement Learning?
(A) A discounted future reward is used.
(B) An instantaneous reward is used.
(C) After selecting an action at a state, the resulting state is probabilistically determined.
(D) The values for the transition function are known in advance.
(D) The values for the transition function are known in advance.
T/F
The utility function estimate must be completely accurate in order to get an optimal policy.
False
The greedy policy can be optimal even with inaccurate utility estimates, as long as they rank the actions correctly.
What is a contraction?
(A) The time savings from estimating the optimal policy via policy iteration instead of value iteration.
(B) Part of the proof of convergence for the value iteration algorithm.
(C) A shorter path to a node in the A* algorithm when that node is already present on the priority queue.
(D) The part of the state space that is not observable in partially observable MDPs.
(B) Part of the proof of convergence for the value iteration algorithm.
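For reference, the standard statement: the Bellman update $$B$$ is a contraction by a factor $$\gamma$$ in the max norm,

$$\| BU - BU' \|_{\infty} \le \gamma \| U - U' \|_{\infty}$$

which guarantees that repeated application (value iteration) converges to a unique fixed point, the optimal utility function.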
In the MDP framework we model the interaction between an agent and an environment. Which of the following statements are true of that framework?
2.3.1
The agent selects actions, which deterministically move it to a new state in the environment.
False
Transitions in an MDP are stochastic: the next state is drawn from $$P(s'|s, a)$$.
In the MDP framework we model the interaction between an agent and an environment. Which of the following statements are true of that framework?
2.3.2
The agent receives a reward only once it arrives in its goal state.
False
Rewards can be received at every time step, not only in a goal state.
You roll two regular six-sided dice. What is the probability of getting a total sum of 10 or more, given that the first die shows a 6? Write your answer as a decimal.
0.5
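Worked out: given that the first die shows 6, the sum is 10 or more exactly when the second die shows 4, 5, or 6:

$$P(\text{sum} \ge 10 \,|\, d_1 = 6) = P(d_2 \ge 4) = \frac{3}{6} = 0.5$$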
How many ways are there to apply the chain rule to a joint distribution with $$N$$ random variables?
(A) $$N$$
(B) $$N^2$$
(C) $$2^N$$
(D) $$N!$$
(D) $$N!$$
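Each ordering of the $$N$$ variables gives one valid factorization. For example, for $$N = 3$$:

$$P(A, B, C) = P(A) \cdot P(B|A) \cdot P(C|A,B) = P(C) \cdot P(B|C) \cdot P(A|B,C) = \ldots$$

with $$3! = 6$$ orderings in total.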
T/F
The Markov property says that given the past state, the present and the future are independent.
False
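The correct statement conditions on the present, not the past:

$$P(X_{t+1} | X_1, \ldots, X_t) = P(X_{t+1} | X_t)$$

i.e., given the present state, the future is independent of the past. The statement on the card swaps "past" and "present", hence False.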
If a process is stationary, it means that:
(A) the state itself does not change
(B) the conditional probability table does not change over time
(C) the transition table is deterministic
(D) the agent has reached a terminal state
(B) the conditional probability table does not change over time
Which of the following is unnecessary to construct a dynamic Bayesian network (DBN)?
(A) The sensor model.
(B) The transition model.
(C) The prior distribution over the state variables.
(D) Multiple state and evidence variables.
(D) Multiple state and evidence variables.
What is the effect of the Markov assumption in n-gram language models?
(A) It makes it possible to estimate the probabilities from data.
(B) Long distance relationships, like subject verb agreement, are taken into account.
(C) The probability of a word is determined by all previous words in the sentence.
(D) The probability of a word is determined only by a single preceding word.
(A) It makes it possible to estimate the probabilities from data.
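For example, a bigram model makes the approximation

$$P(w_i | w_1, \ldots, w_{i-1}) \approx P(w_i | w_{i-1})$$

and the right-hand side can be estimated directly from bigram counts in a corpus, which is what makes (A) possible.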
How are n-gram language models typically evaluated?
(A) Correlation with human judgments
(B) Cross-entropy measured against gold standard labels
(C) Perplexity on a test set
(D) Precision and recall
(C) Perplexity on a test set
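For a test set $$W = w_1 w_2 \cdots w_N$$, perplexity is the inverse probability normalized by the number of words:

$$PP(W) = P(w_1 w_2 \cdots w_N)^{-1/N}$$

where lower perplexity indicates a better model.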
T/F
The conditional probability distribution of a variable in a Bayesian network should be specified based on the probability distributions of all of the other variables (nodes).
False
Just need the parents
Write the joint probability for the Bayes’ Net shown below, encoding its independence assumptions into your equation. $$P(A, B, C, D, E) =$$
$$P(A) \cdot P(B) \cdot P(C|A) \cdot P(D|C,A) \cdot P(E|C,B)$$
Which of the following is true of locally structured (sparse) systems?
(A) Each subcomponent must interact directly with all the other components.
(B) The structure grows linearly in complexity (rather than exponentially).
(C) Every variable cannot be influenced by all of the others.
(D) All such systems are compact.
(B) The structure grows linearly in complexity (rather than exponentially).
T/F
It is possible that more than one Bayesian network can be used to represent the same joint distribution.
True
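For example, both orderings of two variables give valid one-edge networks for the same joint: $$A \rightarrow B$$ encodes $$P(A) \cdot P(B|A)$$ and $$B \rightarrow A$$ encodes $$P(B) \cdot P(A|B)$$, and both products equal $$P(A, B)$$.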
If two variables (nodes) X and Y in a Bayesian network do not share a path, which of the following must be true?
(A) X and Y can never be true at the same time.
(B) X and Y are conditionally independent.
(C) X has a direct influence on Y.
(D) There exists a causal relationship between X and Y.
(B) X and Y are conditionally independent.
T/F
The Naive Bayes model is “naive” because it assumes that the features are conditionally independent of each other, given the class.
True
Write down the form of the joint probability model $$P(X_1, X_2, X_3, Y )$$ for this data using the Naive Bayes assumption.
$$P(Y)P(X_1|Y)P(X_2|Y)P(X_3|Y)$$
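As a concrete sketch of how this factorization is used for prediction, the code below picks the class maximizing $$P(Y)\prod_i P(X_i|Y)$$ via log probabilities. The class names and probability tables are made-up numbers, purely illustrative.

```python
import math

prior = {"spam": 0.4, "ham": 0.6}        # P(Y), toy values
likelihood = {                           # P(X_i = 1 | Y), toy values
    "spam": [0.8, 0.6, 0.3],
    "ham":  [0.1, 0.4, 0.2],
}

def predict(x):
    """Return argmax_y P(y) * prod_i P(x_i | y), computed in log space."""
    scores = {}
    for y in prior:
        logp = math.log(prior[y])
        for xi, p in zip(x, likelihood[y]):
            logp += math.log(p if xi else 1 - p)
        scores[y] = logp
    return max(scores, key=scores.get)

print(predict([1, 1, 0]))   # -> 'spam' with these toy tables
```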
Which of the following can handle unknown contexts (assume words not in the vocabulary are all assigned to the same token)?
(Check all that apply.)
[A] Maximum Likelihood Estimate
[B] Stupid backoff
[C] Laplace Smoothing
[D] Smart backoff
[B] Stupid backoff
[C] Laplace Smoothing
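For intuition, here is a minimal sketch of stupid backoff (Brants et al., 2007): if an n-gram was never seen, back off to the lower-order score scaled by a constant (0.4 in the original paper). The toy corpus below is an illustrative assumption, and the scores are not normalized probabilities.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()  # toy corpus
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = sum(unigrams.values())

def score(word, prev=None, alpha=0.4):
    """S(w | prev): relative bigram frequency if seen, else a
    backed-off unigram score scaled by alpha."""
    if prev is not None and (prev, word) in bigrams:
        return bigrams[(prev, word)] / unigrams[prev]
    return alpha * unigrams[word] / total   # handles unseen contexts

print(score("sat", prev="cat"))   # seen bigram: 1/2 = 0.5
print(score("mat", prev="dog"))   # unseen context: 0.4 * 1/9
```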
Which of the following best characterizes the difference between parametric and nonparametric models?
(A) Parametric models cannot summarize data with a large number of training examples.
(B) Parametric models can be used if each hypothesis considers all of the other training examples to make the next prediction.
(C) Instance-based learning and memory-based learning use parametric models.
(D) A parametric model has a fixed number of parameters.
(D) A parametric model has a fixed number of parameters.