ExamQuestions Flashcards

(193 cards)

1
Q

What is the difference between Q-learning and SARSA?

A

Both are used in reinforcement learning: Q-learning is off-policy, SARSA is on-policy.

Use SARSA: When safety during exploration matters (e.g., robot navigation near obstacles).
Use Q-learning: When learning optimal policy is more important than short-term realism (e.g., games, simulations).
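A minimal sketch of the two tabular update rules in Python (the dict-based Q-table and the variable names are illustrative assumptions, not taken from any particular library):

```python
# Q maps (state, action) -> estimated value; alpha is the learning rate, gamma the discount factor.

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the best action in the next state,
    # regardless of which action the behavior policy will actually take.
    best_next = max(Q.get((s_next, a_), 0.0) for a_ in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action the agent actually takes next.
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0))
```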

2
Q

Why is reproducibility important in AI?

A

To verify and trust scientific results and models.

3
Q

What is the MEU principle?

A

Maximum Expected Utility – choose the action with highest expected utility.

4
Q

What is the role of the learning rate α in RL?

A

It determines how much new information overrides old estimates.

5
Q

What is a transition matrix?

A

It defines the probabilities of moving from one state to another.

6
Q

What is the importance of AI transparency?

A

To ensure accountability and trust in AI systems.

7
Q

What is batch normalization?

A

A technique to normalize layer inputs for faster and more stable training.

8
Q

Why is the decision node not allowed to influence chance nodes?

A

Because decisions don’t change the underlying state, only actions.

In a decision network, a decision node is not allowed to influence a chance node because that would imply the agent controls a random event directly, which violates the principle of modeling uncertainty. The probability that it rains tomorrow shouldn’t change based on the robot’s choice.

9
Q

What is the difference between hidden and observed variables in HMMs?

A

A Hidden Markov Model (HMM) is a statistical model used to describe systems that evolve over time with hidden internal states that produce observable outputs.

States (S) – The internal, unobservable states of the system (e.g., “sunny” or “rainy” when you are inside).
Observations (O) – What you can see or measure (e.g., someone carrying an umbrella).
Transition Probabilities (P(Sₜ | Sₜ₋₁)) – Probability of moving from one state to another.
Emission Probabilities (P(Oₜ | Sₜ)) – Probability of an observation given a state.
Initial State Distribution (P(S₀)) – Probability of starting in each state.

10
Q

What is backpropagation?

A

Backpropagation is the algorithm used to train neural networks.
It tells us how to adjust the weights in the network to reduce the error between predicted and actual output.
An algorithm to compute the gradient of the loss function w.r.t. each weight.

11
Q

Why is it important that a Bayesian network is a DAG?

A

Because it avoids cycles, which would make probability calculations inconsistent.

12
Q

What are convolutional filters?

A

Small weight matrices applied across an input to detect local features. A convolutional filter is a small matrix (like 3×3 or 5×5) of weights that is slid over an image (or feature map) to detect patterns, like edges, textures, or other features.

13
Q

What is a heuristic?

A

A heuristic is a strategy or rule-of-thumb that helps an algorithm make decisions faster by estimating how close a state is to the goal. In AI search, a heuristic is a function:
h(n) = estimated cost from node n to a goal

14
Q

What is tokenization?

A

Splitting text into words or sub-word units.

15
Q

What is stemming?

A

Reducing words to their root forms. The goal is to group different forms of a word so they can be treated as the same during tasks like text classification or search. Playing -> play

16
Q

How does observing a collider activate a path?

A

A collider is a node that has two incoming arrows. A → C ← B
Without any observation: A and B are independent.
A and B become dependent given C
Suppose:
A = “burglary”
B = “earthquake”
C = “alarm goes off”
If you know the alarm went off (C), learning there was a burglary (A) makes it less likely there was also an earthquake (B) — they now compete to explain the same event.

17
Q

What is algorithmic bias?

A

Systematic errors in decision making due to biased training data.
Algorithmic bias occurs when an algorithm systematically produces unfair, prejudiced, or discriminatory outcomes — usually because it has learned patterns from biased training data or is influenced by biased assumptions in its design.

18
Q

What does it mean if an MDP has a stationary policy?

A

In a Markov Decision Process (MDP), a stationary policy is a decision strategy where the action the agent chooses in each state does not change over time. The best action depends only on the current state.

19
Q

What is L2 regularization?

A

L2 regularization adds a penalty term to the loss function to discourage the model from learning large weights. It doesn’t change the goal of minimizing error — it just adds a “cost” to making the model too complex.

A penalty on the squared values of the weights to reduce overfitting.

20
Q

What are the three main problems solved by HMMs?

A

Evaluation: Compute the probability of an observed sequence, using Forward algorithm

Decoding: Find the most likely sequence of hidden states, using Viterbi algorithm

Learning: Adjust model parameters to best explain observed data, using Baum-Welch algorithm (an EM algorithm)

21
Q

What is an expert system?

A

An expert system is a type of AI program designed to replicate the decision-making ability of a human expert in a specific domain. It uses rules, facts, and a reasoning engine to draw conclusions and solve problems.

22
Q

What is an episode in RL?

A

An episode is a complete sequence of interactions between an RL agent and the environment, starting from an initial state and ending in a terminal state (or after a set number of steps).
It’s like one full run of the agent trying to achieve its goal — e.g., finishing a game, navigating a maze, or completing a task.

23
Q

What is the backward algorithm used for?

A

The backward algorithm is one of the core algorithms used in Hidden Markov Models (HMMs). It computes the probability of the remaining portion of the observation sequence given a current state: if the system is in state s_t at time t, what is the probability of seeing the observations from time t+1 to the end? In short, it computes the probability of future observations from a given state.

24
Q

What is a recurrent neural network (RNN)?

A

A Recurrent Neural Network (RNN) is a type of neural network designed to work with sequential data, such as time series, text, or speech. Unlike standard neural networks, RNNs have memory — they can retain information from previous inputs to help influence future predictions.

25
What is marginalization in probability?
Marginalization is the process of summing (or integrating) out one or more variables to get the probability distribution over a subset of variables. It's used to simplify complex distributions by removing unneeded variables. If you have two variables, X and Y, and you want the marginal probability of X, you sum over all values of Y. P(X=x) = sum_y P(X=x, Y=y).
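A tiny Python illustration of marginalizing Y out of a joint distribution (the numbers are made up):

```python
# Marginalizing Y out of a small joint distribution P(X, Y).
joint = {("x1", "y1"): 0.2, ("x1", "y2"): 0.1,
         ("x2", "y1"): 0.4, ("x2", "y2"): 0.3}

p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p   # P(X=x) = sum over y of P(X=x, Y=y)

print(p_x)  # approximately {'x1': 0.3, 'x2': 0.7}
```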
26
How does a decision network differ from a Bayesian network?
Bayesian network:
Purpose: Represent probabilistic relationships among variables
Nodes: Only chance nodes (random variables)
Output: Probabilities of outcomes
Used for: Inference, prediction, diagnosis
Decision network:
Purpose: Make decisions under uncertainty based on outcomes and utilities
Nodes: Chance, decision, and utility nodes
Output: Optimal decisions to maximize expected utility
Used for: Rational decision-making, planning
27
What is the difference between joint and conditional probability?
Joint is the probability of events together; conditional is given another event.
28
What is explainability in AI?
Explainability (or interpretability) in AI refers to how easily humans can understand why and how an AI system made a particular decision or prediction. 🤖🗣️ It’s about making AI transparent, trustworthy, and accountable to users, developers, and regulators.
29
What is a benefit of symbolic AI in terms of ethics?
Symbolic AI uses explicit rules, logic, and symbols, with human-readable knowledge bases (like "IF fever AND cough THEN flu"). Modern machine learning (like deep learning) learns patterns automatically and often works like a black box that is hard to inspect or explain. Symbolic AI is more interpretable and easier to audit: symbolic systems can clearly explain why they made a decision, using human-understandable logic and rules.
30
What is the forward algorithm used for?
The forward algorithm computes the probability of a sequence of observations given an HMM: how likely is it that this observed sequence was generated by this model? In short, it is used to compute the probability of an observed sequence.
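A minimal Python sketch of the forward algorithm on a toy umbrella-style HMM (all probabilities are invented for illustration):

```python
states = ["rainy", "sunny"]
init   = {"rainy": 0.5, "sunny": 0.5}                       # P(S_0)
trans  = {"rainy": {"rainy": 0.7, "sunny": 0.3},            # P(S_t | S_t-1)
          "sunny": {"rainy": 0.3, "sunny": 0.7}}
emit   = {"rainy": {"umbrella": 0.9, "no_umbrella": 0.1},   # P(O_t | S_t)
          "sunny": {"umbrella": 0.2, "no_umbrella": 0.8}}

def forward(observations):
    # alpha[s] = P(o_1..o_t, S_t = s); summing over states at the end gives P(observations | model)
    alpha = {s: init[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states) for s in states}
    return sum(alpha.values())

print(forward(["umbrella", "umbrella", "no_umbrella"]))
```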
31
What is max pooling in CNNs?
A down-sampling operation that keeps the maximum value in each region. Max pooling slides a small window (e.g., 2×2) across the feature map and, at each location, selects the maximum value within that window. It keeps the strongest feature and discards weaker signals in each region.
32
What is the policy iteration algorithm?
The policy iteration algorithm is a classic method in Reinforcement Learning and Markov Decision Processes (MDPs) used to find the optimal policy — that is, the best set of actions an agent can take in every state to maximize expected reward. An algorithm that evaluates and improves a policy until it converges.
33
What is a case base?
A case base is a core concept in Case-Based Reasoning (CBR), a type of AI that solves new problems by remembering and adapting previous similar problems. A case base is a collection of past cases stored in memory. Each case typically includes: A problem description A solution (Optional) Outcome or success rating of the solution
34
What are terminal states in MDPs?
In Markov Decision Processes (MDPs), a terminal state is a special type of state that ends the episode. A terminal state is a state in an MDP where: No further actions are taken. No more rewards are received (typically). The episode ends once the agent enters this state. Example: Maze: exit, Chess: Checkmate or draw States from which no further transitions occur.
35
What is similarity measurement in CBR?
In Case-Based Reasoning (CBR), similarity measurement is a critical step — it determines which past cases are most relevant for solving a new problem. Similarity measurement is the process of comparing a new problem to past cases in the case base to find the most similar ones.
36
What is temporal difference learning?
Temporal Difference (TD) Learning is one of the most important ideas in Reinforcement Learning (RL). It’s the foundation of popular algorithms like Q-learning and SARSA. Temporal Difference learning is a method where an agent learns value estimates by comparing predictions at successive time steps — rather than waiting until the final outcome. At time t, the agent is in state s_t, and takes action a_t, gets reward r_(t+1) and lands in new state s_(t+1). Then it updates its value estimate for s_t based on: The immediate reward and the estimated value of the next state V(s_(t+1))
37
What is adaptation in CBR?
In Case-Based Reasoning (CBR), adaptation is the step where the system modifies a retrieved solution from a past case to make it fit a new problem.
38
What is the purpose of normalization in Bayesian inference?
In Bayesian inference, normalization ensures that the final result is a valid probability distribution — that is, all probabilities add up to 1.
39
What is the chain rule of probability?
P(A, B, C) = P(A)*P(B|A)*P(C|A,B).
40
How can influence diagrams support rational decision making?
Influence diagrams (also known as decision networks) are powerful tools in AI and decision theory that support rational decision-making under uncertainty. By allowing systematic evaluation of expected utility for each decision. Influence diagram might include: 🎲 Chance node: Disease (yes/no), Test Result (positive/negative) 🔷 Decision node: Run Test or Not 💰 Utility node: Patient health, cost of test
41
What is the black box problem in AI?
Difficulty in interpreting how complex models make decisions.
42
What is a long short-term memory (LSTM)?
A Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) designed to remember information over long sequences — something traditional RNNs struggle with. An LSTM is a neural network architecture that can learn long-term dependencies in sequential data by using a system of gates and a memory cell. 🧾 It's designed to preserve information over long sequences and forget irrelevant parts.
43
What is d-separation used for?
To determine whether a set of variables is conditionally independent in a Bayesian network. Does knowing variable A tell me anything more about variable B, once I already know C? If yes, then A and B are dependent given C. If no, then A and B are conditionally independent given C → they are d-separated.
44
What is a Markov blanket?
The Markov blanket is a key concept in Bayesian networks that tells you exactly which variables you need to know about to make a node conditionally independent from the rest of the network. The Markov blanket of a node X is the smallest set of nodes in a Bayesian network that, when known (observed), makes X conditionally independent of all the other nodes in the network. The set of a node's parents, children, and co-parents of its children.
45
In a Bayesian network, what type of node structure creates a v-structure?
Two parent nodes pointing to a common child.
46
What is the difference between value iteration and policy iteration?
Value iteration and policy iteration are two fundamental algorithms for solving Markov Decision Processes (MDPs). Both aim to find an optimal policy, but they go about it in different ways. Value iteration updates utilities; policy iteration updates policies. Example: Value iteration: You rank every city (state) by how close it is to the destination and then decide what to do. Policy iteration: You start with a full plan (policy), see how well it works (evaluation), then tweak it.
47
When is Bayes’ Rule typically used?
To reverse conditional probabilities when direct measurement is hard. P(X|Y) = P(Y|X) * P(X) / P(Y)
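A small worked example in Python, using invented numbers for a disease test:

```python
# Bayes' rule: P(disease | positive test), all probabilities are made up for illustration.
p_disease = 0.01                 # P(X): prior
p_pos_given_disease = 0.95       # P(Y|X)
p_pos_given_healthy = 0.05       # false positive rate

p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)   # P(Y)
posterior = p_pos_given_disease * p_disease / p_pos                               # P(X|Y)
print(round(posterior, 3))       # approximately 0.161
```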
48
What is a language model used for?
To predict the probability of a sequence of words. A language model (LM) is a model that learns to assign probabilities to sequences of words. It predicts how likely a given sequence is — or what word is likely to come next.
49
What is named entity recognition?
Named Entity Recognition (NER) is a key task in Natural Language Processing (NLP) that involves identifying and classifying specific pieces of information in text: detecting entities (words or phrases) in a sentence and classifying them into predefined categories like 🧑 Person, 🌍 Location, or 🏢 Organization.
50
Why is CBR interpretable?
Case-Based Reasoning (CBR) is considered highly interpretable because it solves new problems by referring to concrete, human-understandable past cases — rather than relying on abstract or opaque computations like deep neural networks. Because solutions are based on concrete past examples.
51
What is a utility function?
A function that assigns a numerical value to outcomes representing preferences.
52
Why is discounting used in MDPs?
To prioritize immediate rewards over distant future rewards. In MDPs, we use a discount factor γ∈[0,1] to reduce the weight of future rewards when computing the total expected reward. Total_reward = R(t+1) + γR(t+2) + γ^2R(t+3) ...
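A quick Python illustration of a discounted return with gamma = 0.9 (the reward values are made up):

```python
gamma = 0.9
rewards = [1, 0, 0, 10]          # R(t+1), R(t+2), R(t+3), R(t+4)
total = sum(gamma**k * r for k, r in enumerate(rewards))
print(total)                     # 1 + 0.9**3 * 10, approximately 8.29
```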
53
What is the vanishing gradient problem?
When gradients become too small for deep networks to learn effectively. The vanishing gradient problem is a major challenge in training deep neural networks — especially Recurrent Neural Networks (RNNs) and deep feedforward networks. Occurs when gradients (used to update weights) become extremely small as they are propagated backward through the network during training. As a result, early layers (or earlier time steps in RNNs) learn very slowly or not at all — because the weight updates are near zero. This is especially problematic when using sigmoid or tanh activation functions, because their derivatives are small (max ~0.25), so gradients shrink with each layer or time step.
54
What is Occam's Razor in AI?
Prefer simpler models that explain the data. In AI and machine learning, this means: When building models, choose the simplest model that fits the data well. Occam’s Razor helps AI systems: Avoid overfitting (memorizing noise in training data) Generalize better to unseen data Choose between models with similar performance
55
What is lemmatization?
Technique in Natural Language Processing (NLP). Simplify words while preserving their meaning and grammatical role. Reducing words to their dictionary base form. Unlike stemming, which may just chop off endings, lemmatization uses context (like part of speech) to return the correct, meaningful root word.
56
What is indexing in CBR?
Organizing cases to make retrieval efficient. In Case-Based Reasoning (CBR), indexing is the process of organizing and accessing cases efficiently so the system can quickly retrieve relevant past experiences when faced with a new problem.
57
Why are non-linear activation functions needed?
To allow networks to model complex functions. With only linear functions, no matter how deep the network is, it can only learn linear functions — which severely limits what it can model.
58
What is the Bellman optimality equation for Q-values?
The Bellman optimality equation for Q-values is a central concept in Reinforcement Learning (RL) — it defines how to compute the maximum possible expected return from any given state–action pair under an optimal policy. Q(s,a) = E[r + γ max_a' Q(s',a') | s, a], where Q(s,a) is the Q-value for a state–action pair, r is the received reward, γ is the discount factor, and E is the expectation over outcomes when transitions are stochastic.
59
What is model interpretability?
How easily humans can understand a model's logic.
60
What is a loss function?
A function that measures the difference between predicted and actual outputs.
61
What are the components of a Hidden Markov Model?
States, observations, transition model, observation model.
Hidden states S: What we want to infer (not directly observed)
Observations O: What we observe at each time step
Transition matrix A: Probabilities of moving between hidden states
Emission matrix B: Probabilities of observations given hidden states
Initial distribution π: Where the sequence starts
62
What does the Turing Test evaluate?
Whether a machine's behavior is indistinguishable from a human's.
63
How are word embeddings used in spam classification?
Word embeddings are dense vector representations of words that capture semantic meaning and contextual similarity. ✅ Capturing meaning, not just exact words ✅ Grouping similar spammy words together (e.g., “buy now” ~ “purchase today”) ✅ Feeding useful input features into machine learning models They convert words to vectors which are then aggregated and fed to classifiers.
64
What is a decision tree?
A flowchart-like structure used for classification or regression. A decision tree is a supervised machine learning model that makes predictions by splitting data into branches based on feature values — like asking a series of if-then questions. Example:
Is "free" in email?
├─ Yes → Contains "click"?
│         ├─ Yes → Spam
│         └─ No → Not Spam
└─ No → Not Spam
65
What does it mean for an agent to be rational?
It means the agent chooses actions that maximize expected utility. A rational agent is an agent that always chooses the action that maximizes its expected performance measure, based on: Its current knowledge Its perception of the environment Its goals In other words: It does the right thing, given the information and goals it has.
66
What is a CPT in Bayesian networks?
Conditional Probability Table, specifying probabilities for a node given its parents. Nodes represent random variables, edges represent dependencies. Each node has a CPT that defines P(Node | Parents). If a node has no parents, its CPT is just a prior distribution.

| Burglary | P(Alarm = T) | P(Alarm = F) |
| -------- | ------------ | ------------ |
| True     | 0.95         | 0.05         |
| False    | 0.01         | 0.99         |
67
Why are CNNs better for image classification than MLPs?
Because of spatial locality, fewer parameters, and shared weights. Unlike Multi-Layer Perceptrons (MLPs), which treat every pixel independently, Convolutional Neural Networks (CNNs): ✅ Understand the spatial layout of images ✅ Use filters to detect meaningful patterns like edges and textures ✅ Reuse weights across the image to reduce complexity and improve learning MLP: One pattern for every cat in every possible position — inefficient and brittle CNN: Learns cat features (ears, whiskers, tail) and recognizes them anywhere in the image — efficient and generalizable
68
What is the goal of weak AI?
To build systems that can perform specific tasks intelligently, without having general understanding or consciousness. It’s about creating software that behaves intelligently in a narrow domain, like recognizing faces, classifying spam, or recommending movies.
69
What is pruning in decision trees?
Pruning removes parts of the tree that are not useful or are too specific to the training data. A fully grown decision tree might perfectly classify training examples but perform poorly on new, unseen data. This is called overfitting. Pruning helps combat this by making the tree smaller and more general.
70
What is a convolutional neural network (CNN)?
A Convolutional Neural Network (CNN) is a type of deep learning model designed to automatically and efficiently learn spatial patterns — especially from images and visual data. A Convolutional Neural Network (CNN) is a neural network that: Uses convolutional layers to detect features (like edges, textures, shapes) Learns to classify, detect, or segment images by processing them in parts Preserves the spatial structure of data (unlike traditional MLPs) 📸 It's the go-to architecture for tasks like image classification, object detection, and face recognition. Main layers: Convolution, ReLU, Pooling, Fully Connected
71
What does it mean if two variables are conditionally independent?
Two variables A and B are said to be conditionally independent given a third variable C if once you know C, knowing A gives you no additional information about B — and vice versa.
72
What is value iteration?
Value iteration is a classic dynamic programming algorithm used in Markov Decision Processes (MDPs) to compute the optimal policy by iteratively improving value estimates for each state. 💡 It’s used to find out: “What is the best thing to do in each state to maximize long-term reward?”
73
Can weak AI pass the Turing test?
Turing test: If a machine can engage in a text conversation such that a human cannot reliably tell whether they’re talking to a machine or a person, it "passes" the test. Yes, weak AI can pass the Turing test. Many modern AI systems — like ChatGPT or other large language models — are examples of weak AI: They don’t understand language like a human. They don’t have goals, consciousness, or emotions. But they can mimic human-like responses extremely well. So even though the intelligence is simulated and narrow, the output may be convincing enough to fool a human — thus passing the test.
74
What is the difference between prior and posterior probability?
Prior is the initial belief; posterior is the updated belief after evidence.
75
How does ID3 build decision trees?
ID3 (Iterative Dichotomiser 3) is a classic algorithm used to build decision trees for classification tasks. It does this by greedily selecting the best feature at each step based on information gain. ID3 is a top-down, greedy algorithm that builds a decision tree by: Choosing the best attribute to split the data at each node Recursively creating child nodes for each possible value of that attribute Stopping when: All examples in the subset belong to the same class There are no more attributes left to split on
76
What is alpha in filtering algorithms?
In the context of filtering algorithms — especially in Bayesian filtering (like in Hidden Markov Models, Kalman filters, or particle filters) — α (alpha) is often used as a normalization constant. Normalizes the probabilities so they sum to 1.
77
What is backward smoothing?
Backward smoothing is a technique used in Hidden Markov Models (HMMs) and Bayesian filtering to improve the estimate of the hidden state at time t, after observing future evidence (up to time T). Backward smoothing uses both past and future observations to give a better estimate of a hidden state than filtering alone. (Filtering only uses past and current observations.)
78
What is the difference between strong and weak AI?
Strong AI is conscious and self-aware; weak AI is not.
79
What is information gain?
Information gain is a key concept in decision tree learning (like ID3) that tells us which attribute is most useful for splitting the data at each step. Information gain measures how much uncertainty (entropy) is reduced after splitting a dataset on a particular attribute.
80
What is a perceptron?
A perceptron is the simplest type of artificial neural network, originally proposed by Frank Rosenblatt in 1958, designed to mimic a single neuron in the human brain. It’s a binary classifier that makes decisions by weighing input signals, applying a threshold, and outputting either 0 or 1.
81
What is the difference between classification and regression?
Both classification and regression are types of supervised learning, where you train a model using labeled data. The main difference lies in the type of output they predict: Classification predicts a category or label, like spam or not spam. Regression predicts continuous numeric values, like house price.
82
What is prediction in HMM?
Prediction in HMM refers to the process of computing the most likely future hidden state(s) based on past observations — without yet seeing future evidence.
83
What are pooling layers in CNNs?
Pooling layers are a key component of Convolutional Neural Networks (CNNs) that reduce the spatial dimensions (width and height) of the feature maps while preserving important information.
✅ 1. Max Pooling (most common): takes the maximum value in each window; highlights the most prominent feature.
✅ 2. Average Pooling: takes the average value in the window; smoother output, but may dilute sharp features.
Typical architecture: Input image → Convolution → ReLU → Pooling → Convolution → ReLU → Pooling → Fully Connected → Output
84
What are activation functions?
In a neural network, an activation function determines the output of a neuron given its input — it introduces non-linearity into the model, allowing it to learn complex patterns.
ReLU: max(0, x) — most common, fast, avoids vanishing gradients
Sigmoid: output between 0 and 1, used in binary classification
Tanh: output between -1 and 1, zero-centered
Softmax: turns outputs into probabilities — for multi-class classification
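A plain-NumPy sketch of these functions (illustrative, not a framework's API):

```python
import numpy as np

def relu(x):    return np.maximum(0, x)
def sigmoid(x): return 1 / (1 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def softmax(x):
    e = np.exp(x - np.max(x))    # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1
```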
85
What is Q-learning?
Q-learning is a fundamental reinforcement learning (RL) algorithm that teaches an agent how to act optimally in an environment by learning the value of actions without knowing the environment’s dynamics. Imagine a robot in a maze: It tries random moves (exploration) It gets rewards when it finds good outcomes (e.g., reaching the goal) It updates its Q-table with better estimates over time Eventually, it learns the best action for each state Q-learning: Model-free RL algorithm to learn optimal actions Learns: Q-values: how good each action is in each state Goal: Maximize long-term cumulative reward Used in: Games (like Atari, chess), robotics, navigation, decision-making
86
What is a policy in MDPs?
In an MDP (Markov Decision Process), a policy defines the agent’s behavior — that is: 🔁 A policy π is a mapping from states to actions. It tells the agent what action a to take when in state s. Two types of policies: Deterministic: Always choose same action in state. In state s1 choose a1. Stochastic: Choose action based on probability. In state s1, 20% a1, 80% a2
87
What is semi-supervised learning?
Learning from a small amount of labeled data and a large amount of unlabeled data. Labeled data is costly; unlabeled is cheap and plentiful
88
What are the four types of AI agents?
Simple reflex: A vacuum cleaner that turns left if it sees a wall.
Model-based reflex: A robot that knows the room layout and updates where it thinks it is, even when it can't see everything.
Goal-based: A GPS system finding the fastest route to a destination.
Utility-based: A self-driving car balancing speed, safety, and passenger comfort.
89
What is entropy in decision trees?
In decision trees, entropy is a measure of impurity or uncertainty in a dataset. It tells us how mixed the class labels are at a given node. 🎯 The goal in building a decision tree is to split the data in a way that reduces entropy — meaning the split leads to purer subsets.
90
How is a joint probability distribution computed in a Bayesian network?
A Bayesian network represents the joint probability distribution (JPD) over a set of variables using a directed acyclic graph (DAG). You multiply the conditional probabilities of each variable given its parents. Imagine a Bayesian network with 3 nodes: A has no parents, B's parent is A, and C's parents are A and B. Then the joint probability is P(A,B,C) = P(A)⋅P(B|A)⋅P(C|A,B), i.e., the product of the conditional probabilities of each variable given its parents.
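A small Python illustration of evaluating that product for one assignment (all probabilities are invented):

```python
# P(A, B, C) = P(A) * P(B|A) * P(C|A,B), evaluated for a = True, b = True, c = False.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.8, False: 0.2},
               False: {True: 0.1, False: 0.9}}                    # P_B_given_A[a][b]
P_C_given_AB = {(True, True): {True: 0.9, False: 0.1},
                (True, False): {True: 0.5, False: 0.5},
                (False, True): {True: 0.4, False: 0.6},
                (False, False): {True: 0.05, False: 0.95}}         # P_C_given_AB[(a, b)][c]

a, b, c = True, True, False
p = P_A[a] * P_B_given_A[a][b] * P_C_given_AB[(a, b)][c]
print(p)  # 0.3 * 0.8 * 0.1, approximately 0.024
```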
91
What is a language model?
A language model (LM) is a model that learns to assign probabilities to sequences of words. It predicts how likely a given sequence is — or what word is likely to come next.
92
What are the three canonical structures for conditional independence?
Chain, common cause, common effect (v-structure).
93
What makes a neural network 'deep'?
Having multiple hidden layers.
94
What are some ethical concerns in AI?
Bias, surveillance, job loss, transparency, misuse in weapons.
95
What is the retain step in CBR?
The retain step is the final phase of the Case-Based Reasoning (CBR) cycle, where the system stores the new problem-solving experience so it can use it in the future. The full CBR cycle has four main steps: Retrieve — Find similar past cases Reuse — Adapt the solution to the new problem Revise — Test and improve the proposed solution Retain — Store the new solution as a new case.
96
What is the update rule in Q-learning?
Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] α = learning rate γ = discount factor
97
What is a Bayesian network?
A graphical model that represents probabilistic relationships among variables.
98
What is filtering in HMM?
Filtering in an HMM is the process of computing the probability distribution over the current hidden state given all evidence (observations) up to now: "Given everything I've observed so far, what is the most likely state I'm in right now?" In an HMM, hidden states (like weather conditions) are not directly observable; we only see evidence (like whether someone is carrying an umbrella). Filtering helps us maintain a belief state — a probability distribution over possible current states — as new evidence arrives.
99
What is the explore-exploit dilemma?
The trade-off between exploring new actions and exploiting known good ones. If the agent only exploits: ✅ It gets good rewards now ❌ But it might miss even better actions it never tried If the agent only explores: ✅ It might find the best action eventually ❌ But it wastes time (and reward) trying bad ones
100
What is the Chinese Room argument?
A thought experiment against strong AI asserting that syntax alone isn't understanding. Imagine: You're locked in a room. You don’t speak Chinese at all. You’re given a book of rules (a program) that tells you: "When you see this Chinese symbol, write this other one in response." Native Chinese speakers slip questions into the room in Chinese. You follow the rules and send back perfectly written answers — in Chinese. From the outside, it looks like you understand Chinese. But you don’t — you’re just manipulating symbols.
101
What does the learning rate control?
The learning rate controls how much the model's weights are updated during training in response to the error it sees. Specifically, it determines the step size at each iteration of the optimization algorithm (typically gradient descent) when moving toward a minimum of the loss function. Too high a learning rate can cause the model to overshoot the minimum, potentially leading to divergence or erratic behavior. Too low a learning rate makes training very slow and might get stuck in local minima. It's a key hyperparameter that affects both the speed and stability of training.
102
What is supervised learning?
Learning from labeled data.
103
What causes overfitting in decision trees?
Using too many splits or fitting noise in the training data. Overfitting in decision trees happens when the tree learns not just the general patterns in the training data, but also the noise and specific quirks. This leads to excellent performance on the training set but poor generalization to new, unseen data. Common causes include:
Tree depth is too large – the tree keeps splitting until each leaf has very few samples (or even just one), capturing noise.
No pruning – without pruning, the tree doesn't remove branches that add complexity without improving accuracy.
Low minimum samples per leaf or split – if the model is allowed to split even on very small data subsets, it can memorize the training data.
Too many features – especially with irrelevant or redundant features, the tree can make splits that are too specific.
No regularization – lack of constraints like max_depth, min_samples_split, or min_samples_leaf.
104
What is expected utility?
Expected utility is a concept from decision theory that represents the average value of an outcome, weighted by the probability of each possible outcome and the utility (or value) the decision-maker assigns to those outcomes. Expected Utility=∑(Probability of Outcome×Utility of Outcome)
105
What is a word embedding?
A dense vector representing the semantic meaning of a word. A word embedding is a way to represent words as dense vectors of real numbers in a continuous vector space, where semantically similar words are mapped to nearby points. Unlike one-hot encoding (which is sparse and doesn't capture meaning), word embeddings capture relationships and context between words. For example, in a well-trained embedding space: vector("king") - vector("man") + vector("woman") ≈ vector("queen") Popular word embedding models include: Word2Vec GloVe FastText
106
What is inference in Bayesian networks?
Inference in Bayesian networks is the process of computing the probability of one or more variables given evidence about others. In other words, it's about updating beliefs based on observed data using the network's structure and conditional probabilities. If a Bayesian network models a medical diagnosis and you observe that a patient has a cough, inference can help compute the probability they have the flu, given that observation. Computing the posterior distribution of a variable given evidence.
107
What is the Bellman equation?
The Bellman equation is a fundamental recursive relationship in reinforcement learning and dynamic programming. It describes how the value of a state (or state-action pair) is related to the values of successor states. The value of a state equals the expected immediate reward plus the discounted value of the next state.
108
What are utility and probability used for in rational agents?
They are used to make decisions under uncertainty. In rational agents, utility and probability are used together to guide decision-making under uncertainty: Probability is used to model uncertainty about the world — i.e., how likely different outcomes are. Utility is used to express preferences — i.e., how desirable each outcome is to the agent. A rational agent chooses actions to maximize expected utility, which combines these two: Expected Utility = ∑ (Probability of outcome) × (Utility of outcome)
109
What is the most likely explanation in an HMM?
The sequence of states that most likely generated the observations. The Viterbi algorithm uses dynamic programming to avoid recomputing subproblems and finds the single path through the HMM that has the highest joint probability of both the state sequence and the observations.
110
What is the revise step in CBR?
The proposed solution is tested and possibly corrected. In Case-Based Reasoning (CBR), the revise step is where the proposed solution from a retrieved and adapted past case is evaluated and potentially improved before being retained for future use. Specifically, in the CBR cycle (Retrieve, Reuse, Revise, Retain), the Revise step involves: Testing the proposed solution (e.g. in the real world or a simulation), Detecting errors or mismatches between the solution and the actual outcome, Correcting the solution if needed. This step ensures that the final solution is accurate and suitable before it is learned from (i.e., retained in the case base).
111
What happens during reuse in CBR?
The solution of the retrieved case is adapted to the new case. During the reuse step in Case-Based Reasoning (CBR), the system takes the solution from the most relevant past case(s) and adapts it to fit the new problem. Specifically, reuse involves: Extracting the solution from the retrieved case(s), Modifying or adapting that solution if the new problem differs in important ways, Producing a proposed solution to be tested in the revise step. For example, in a troubleshooting system: If a similar past case fixed a network issue by restarting the router, reuse might propose the same action — unless the current case has a different network setup, in which case it might need to adapt the solution (e.g., restart a switch instead).
112
How does RL differ from MDPs?
In RL, the transition model and rewards are unknown and learned through experience. Reinforcement Learning (RL) and Markov Decision Processes (MDPs) are closely related but not the same: 🔁 MDP: A formal model that describes decision-making in environments with: States Actions Transition probabilities Rewards Assumes full knowledge of the model (i.e., transition and reward functions are known). Used to define the problem. 🤖 RL: A learning framework used to solve MDPs when the model is unknown. The agent learns: What actions to take (policy) From interacting with the environment (trial and error) Estimates transition and reward functions or learns value functions/policies directly.
113
What is gradient descent used for in neural nets?
To update weights in order to minimize error. Gradient descent is used in neural networks to optimize the model's weights by minimizing the loss function — which measures how far the network's predictions are from the actual targets. Here's how it works: Compute the loss (e.g., cross-entropy or MSE) for a batch of training data. Calculate the gradient of the loss with respect to each weight (using backpropagation). Update each weight in the direction that reduces the loss: w ← w − η ∂L/∂w
114
What is the sum rule in probability?
P(A) = Σ P(A, B) over all B.
115
What is a Markov Decision Process?
A model for sequential decision making with states, actions, transition probabilities, and rewards. A Markov Decision Process (MDP) is a formal framework used to model decision-making in environments where outcomes are partly random and partly under the control of a decision-maker (agent). MDPs satisfy the Markov property: The future is independent of the past given the present. In other words, the next state and reward depend only on the current state and action, not on the full history.
116
What is overfitting?
When a model performs well on training data but poorly on unseen data.
117
What is the goal of reinforcement learning?
To learn a policy that maximizes expected cumulative reward. The goal of reinforcement learning (RL) is to train an agent to learn a policy — a strategy for choosing actions — that maximizes cumulative reward over time while interacting with an environment. The agent learns through trial and error, using feedback from the environment (in the form of rewards or punishments) to improve its behavior over time — without knowing the environment's dynamics in advance.
118
What is the role of transition models?
To specify the probability of moving between states. The role of transition models is to describe how an environment changes in response to actions — specifically, they define the probability of moving to a new state given the current state and action. Formally, in a Markov Decision Process (MDP), the transition model is P(s' | s, a). In model-free RL, the agent learns without explicitly using a transition model. In model-based RL, the agent tries to learn or is given the transition model to plan more efficiently.
119
In a chain structure A -> B -> C, is A independent of C given B?
Yes.
120
What is a utility node?
A node that quantifies the agent's preferences. A utility node is a component in a decision network that represents the agent’s preferences over outcomes by assigning numerical utility values to different states of the world.
121
What is dropout?
Dropout is a regularization technique used in neural networks to prevent overfitting during training. During each training step, randomly selected neurons are “dropped out” (i.e., temporarily removed) from the network with a certain probability (e.g., 0.5). This means those neurons don’t contribute to the forward pass or backpropagation in that step. At test time, no neurons are dropped, but their outputs are scaled to account for the dropout during training.
122
What is the purpose of a decision node?
A decision node in a decision network (or influence diagram) represents a choice the agent can make — that is, a point where the agent selects an action based on the available information. Key characteristics: Typically shown as a rectangle in diagrams. Has no probabilities (unlike chance nodes). Inputs: Can take information from chance nodes or other decisions (what the agent knows when making the choice). Output: Feeds into utility nodes (to evaluate the consequences of the decision).
123
What is unsupervised learning?
Finding patterns in unlabeled data.
124
What is TF-IDF?
TF-IDF (Term Frequency–Inverse Document Frequency) is a numerical statistic used in information retrieval and text mining to reflect how important a word is to a document in a collection (corpus). Words that appear frequently in a single document but rarely across others (like "robot" in a robotics paper) get high TF-IDF scores — meaning they are more informative. Common words like "the" or "and" get low scores because they appear in many documents.
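A minimal Python sketch of one common TF-IDF variant (real libraries differ in smoothing and normalization details; the toy corpus is made up):

```python
import math

docs = [["robot", "learns", "to", "walk"],
        ["the", "robot", "arm", "moves"],
        ["the", "weather", "is", "nice"],
        ["the", "cat", "sat"]]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)                    # term frequency within the document
    df = sum(1 for d in corpus if term in d)           # number of documents containing the term
    idf = math.log(len(corpus) / df)                   # rare across the corpus -> higher idf
    return tf * idf

print(tf_idf("robot", docs[0], docs))   # informative word -> higher score
print(tf_idf("the", docs[2], docs))     # common word -> lower score
```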
125
What are decision networks?
Bayesian networks extended with decision and utility nodes.
126
What is the discount factor?
A factor γ ∈ [0,1] that determines how future rewards are valued.
127
What is the conditional independence assumption?
That certain variables are independent given others, reducing the number of probabilities needed.
128
Why are word embeddings better than one-hot encoding?
Word embeddings are better than one-hot encoding for most NLP tasks because they provide dense, meaningful, and scalable representations of words. Here's why: One-hot encoding: Words are represented as binary vectors with a single 1 and the rest 0s — no information about meaning or similarity. Example: "cat" and "dog" are just as unrelated as "cat" and "laptop". Word embeddings: Words are mapped to dense vectors where similar words have similar values (e.g., "cat" and "dog" are close in the vector space).
129
In a v-structure A -> C <- B, are A and B independent given C?
No, they become dependent when C is observed.
130
What is the product rule in probability?
P(A, B) = P(A|B) * P(B).
131
Why is fairness important in AI?
To ensure that decisions do not systematically disadvantage any group. Fairness is important in AI because AI systems increasingly make decisions that directly impact people's lives, and unfair or biased systems can cause real harm — including discrimination, exclusion, or unequal treatment.
132
What is the Viterbi algorithm used for?
The Viterbi algorithm is used to find the most likely sequence of hidden states (also called the best path) in a Hidden Markov Model (HMM), given a sequence of observed events. Common applications: speech recognition, part-of-speech tagging, DNA sequence analysis, spell checking and error correction. What it does: given a sequence of observations (e.g., sounds or words) and an HMM with known transition and emission probabilities, the Viterbi algorithm finds the single best state sequence S = (s1, s2, s3, ...) that maximizes the joint probability, arg max_S P(S, O), where O is the observed sequence. It uses dynamic programming to avoid recomputing overlapping subproblems, making it much faster than brute-force enumeration of all possible state sequences.
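A minimal Python sketch of Viterbi on a toy umbrella-style HMM (all numbers are invented for illustration):

```python
states = ["rainy", "sunny"]
init   = {"rainy": 0.5, "sunny": 0.5}
trans  = {"rainy": {"rainy": 0.7, "sunny": 0.3},
          "sunny": {"rainy": 0.3, "sunny": 0.7}}
emit   = {"rainy": {"umbrella": 0.9, "no_umbrella": 0.1},
          "sunny": {"umbrella": 0.2, "no_umbrella": 0.8}}

def viterbi(observations):
    # delta[s] = probability of the best state path ending in s; back[t][s] remembers the best predecessor
    delta = {s: init[s] * emit[s][observations[0]] for s in states}
    back = []
    for obs in observations[1:]:
        prev = {s: max(states, key=lambda p: delta[p] * trans[p][s]) for s in states}
        delta = {s: delta[prev[s]] * trans[prev[s]][s] * emit[s][obs] for s in states}
        back.append(prev)
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for prev in reversed(back):       # walk the back-pointers to recover the full path
        path.insert(0, prev[path[0]])
    return path

print(viterbi(["umbrella", "umbrella", "no_umbrella"]))  # ['rainy', 'rainy', 'sunny']
```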
133
What is a Bayesian network?
A Bayesian network is a type of probabilistic graphical model that uses a directed acyclic graph (DAG) to represent a set of random variables and their conditional dependencies. Each node represents a variable, and each edge represents a direct influence from one variable to another. These networks are used to compute probabilities efficiently by leveraging conditional independencies among variables.
134
What is conditional independence in Bayesian networks?
Two variables are conditionally independent given a third if knowing the third makes the first two independent. In Bayesian networks, this means that once you observe the 'middle' variable on a path (unless it's a collider), the information from one end of the path doesn't influence the other. Conditional independence is critical for simplifying joint probability computations.
135
What is the difference between on-policy and off-policy reinforcement learning?
On-policy learning means the agent learns the value of the policy it is currently using to make decisions (like SARSA). Off-policy learning, like Q-learning, means the agent learns the value of an optimal policy regardless of the policy it is actually using. This allows Q-learning to learn optimal strategies even while exploring with a different behavior.
136
What is utility in decision theory?
Utility is a numeric value that represents the desirability or preference for a particular outcome. In AI, utility functions help agents evaluate and compare outcomes, guiding rational decision-making under uncertainty. Higher utility corresponds to more preferred outcomes.
137
What is the CBR (Case-Based Reasoning) cycle?
CBR involves solving new problems based on solutions to past similar problems. The cycle includes four steps: (1) Retrieve the most similar case(s), (2) Reuse the solution of the retrieved case, (3) Revise the proposed solution if needed, and (4) Retain the new experience for future use. It is a model of learning from experience.
138
What is overfitting in machine learning?
Overfitting happens when a model learns patterns specific to the training data, including noise, rather than general patterns. This results in poor performance on unseen data. It's like memorizing answers for a test rather than understanding the material. Techniques to prevent overfitting include regularization, dropout, cross-validation, and using more training data.
139
What is an HMM (Hidden Markov Model)?
An HMM is a statistical model used to represent systems that are Markov processes with hidden states. It assumes that the system being modeled is a sequence of observations generated by hidden states, each of which follows a Markov process (only depends on the previous state). HMMs are widely used in speech recognition, bioinformatics, and time series analysis.
140
What is Q-learning?
Q-learning is a model-free reinforcement learning algorithm. It learns a Q-value function that estimates the expected cumulative reward for taking a given action in a given state, and following the best policy thereafter. It updates its estimates based on the Bellman equation, using the maximum reward of the next state, even if the current action is exploratory.
141
What is a decision tree?
A decision tree is a flowchart-like structure used in supervised learning to make decisions or classifications. Each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or output. It works by recursively partitioning the data into subsets that are as pure as possible.
142
What is reinforcement learning?
Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. It receives rewards or penalties for its actions and aims to learn a policy that maximizes cumulative reward over time. Unlike supervised learning, it doesn't get direct feedback on the correct action but must learn from trial and error.
143
What is a Hidden Markov Model (HMM) used for?
A Hidden Markov Model is used to model systems where the underlying process (state) is not directly observable (hidden), but we have observations that are probabilistically related to these states. It is especially useful in time-series data like speech recognition, where the actual spoken words (states) are inferred from audio signals (observations). HMMs use two key probabilities: transition probabilities between states and emission probabilities for observations.
144
What does the Bellman equation express in reinforcement learning?
The Bellman equation defines the relationship between the value of a state and the expected return of future states. It breaks down the value of a state into the immediate reward received from the current action and the discounted future value. This recursive structure allows dynamic programming methods like value iteration to efficiently compute optimal policies in Markov Decision Processes (MDPs).
145
What is filtering in the context of HMMs?
Filtering is the task of computing the probability distribution over the current hidden state given all past observations. It is useful for tracking or monitoring applications. The forward algorithm is commonly used to perform filtering efficiently by propagating beliefs through time using the transition and observation models.
146
How does value iteration work in MDPs?
Value iteration is an algorithm that updates the utility of each state using the Bellman equation until the values converge. It starts with arbitrary utilities and repeatedly updates each state’s value based on expected utilities of successor states and rewards. Once the values stabilize, the optimal policy can be derived by choosing actions that maximize expected utility at each state.
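A minimal value-iteration sketch in Python for a tiny two-state MDP (states, actions, and all numbers are invented for illustration):

```python
states  = ["s1", "s2"]
actions = ["stay", "move"]
gamma   = 0.9
# P[(s, a)] is a list of (probability, next_state, reward) transitions
P = {("s1", "stay"): [(1.0, "s1", 0)],
     ("s1", "move"): [(0.8, "s2", 5), (0.2, "s1", 0)],
     ("s2", "stay"): [(1.0, "s2", 1)],
     ("s2", "move"): [(1.0, "s1", 0)]}

V = {s: 0.0 for s in states}
for _ in range(100):                        # repeat the Bellman backup until values stabilize
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                for a in actions)
         for s in states}

# Derive the greedy policy from the converged values
policy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)]))
          for s in states}
print(V, policy)
```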
147
What is an influence diagram?
An influence diagram is an extension of a Bayesian network that includes decision nodes and utility nodes. It is used for decision-making under uncertainty. Chance nodes represent random variables, decision nodes represent choices, and utility nodes represent preferences. The diagram encodes dependencies and helps compute expected utilities to guide rational decisions.
148
What is the difference between supervised and unsupervised learning?
In supervised learning, the model is trained on labeled data where the output is known, such as classification or regression tasks. In unsupervised learning, the model tries to find patterns or structures in data without labeled outputs, such as clustering or dimensionality reduction. Supervised learning is used when prediction is the goal, while unsupervised learning is used for exploration and understanding data structure.
149
What is the exploration vs. exploitation trade-off in reinforcement learning?
This trade-off refers to the dilemma of choosing between exploring new actions to discover better rewards and exploiting known actions that yield high rewards. Effective RL strategies need to balance these two approaches to avoid getting stuck in suboptimal behavior. Common solutions include ε-greedy policies or algorithms like Upper Confidence Bound (UCB).
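A Python sketch of ε-greedy action selection, one simple way to strike this balance (the dict-based Q-table is an illustrative assumption):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)                          # explore: pick a random action
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit: pick the best known action
```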
150
What is a Markov Decision Process (MDP)?
An MDP is a mathematical framework for modeling decision-making problems where outcomes are partly random and partly under the control of an agent. It consists of states, actions, transition probabilities, a reward function, and a discount factor. The goal is to find a policy that maximizes the expected sum of rewards over time.
151
What is the purpose of dropout in neural networks?
Dropout is a regularization technique used during training of neural networks to prevent overfitting. It randomly 'drops out' a proportion of neurons during each training step, forcing the network to learn redundant representations. This helps ensure the model generalizes better to new data by preventing it from relying too heavily on specific neurons.
152
What is a convolutional neural network (CNN)?
A CNN is a deep learning model designed for processing data with a grid-like topology, such as images. It uses convolutional layers to apply filters that detect local patterns like edges or textures, followed by pooling layers to reduce spatial dimensions. CNNs are efficient in learning hierarchical features and are widely used in computer vision tasks.
153
What are word embeddings in NLP?
Word embeddings are dense vector representations of words where words with similar meaning have similar representations. Unlike one-hot encoding, which is sparse and does not capture relationships, embeddings like Word2Vec or GloVe capture semantic and syntactic similarities and allow algorithms to understand context and meaning in text data.
154
What is the role of the discount factor γ in reinforcement learning?
The discount factor determines how much future rewards are valued compared to immediate rewards. A value close to 1 means future rewards are nearly as valuable as immediate ones, promoting long-term planning. A value near 0 emphasizes short-term gains. It ensures the sum of future rewards is finite and reflects time preference in decision making.
155
What is a utility function in AI decision making?
A utility function assigns a numeric value to each possible outcome, reflecting the agent’s preference for that outcome. Higher utility means the outcome is more desirable. In decision theory, agents use utility functions to evaluate options and choose actions that maximize expected utility, ensuring rational and goal-directed behavior.
156
How does SARSA differ from Q-learning?
SARSA (State-Action-Reward-State-Action) is an on-policy algorithm that updates its value estimates based on the actual action taken in the next state. Q-learning is off-policy and updates based on the best possible action in the next state, regardless of which action was actually taken. SARSA tends to be more conservative and safer, especially in risky environments.
157
What is the Chinese Room argument?
The Chinese Room argument, proposed by philosopher John Searle, challenges the notion that a computer running a program can understand language or possess a mind. In the thought experiment, a person manipulates Chinese symbols using a rule book without understanding the language, suggesting that syntactic processing (like AI) does not equal semantic understanding.
158
What is the Turing Test and what does it evaluate?
The Turing Test, proposed by Alan Turing, is a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. If a human evaluator cannot reliably tell whether responses come from a machine or a person, the machine is considered to have passed the test. It evaluates behavioral intelligence, not consciousness or understanding.
159
What is smoothing in HMMs?
Smoothing refers to estimating the hidden state at a previous time step, given the entire sequence of observations. Unlike filtering, which only uses past and current observations, smoothing incorporates future observations to make better estimates of past states. The forward-backward algorithm is commonly used for this purpose.
160
What is the most likely explanation in HMMs?
The most likely explanation is the sequence of hidden states that is most probable given the entire sequence of observations. It is computed using the Viterbi algorithm, which efficiently finds the single best path through the state space rather than the most probable state at each time step.
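A compact NumPy sketch of the Viterbi algorithm (the matrix shapes and variable names are just one common convention):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    # pi: initial distribution (K,), A: transition matrix (K, K),
    # B: emission matrix (K, M), obs: list of observation indices.
    K, T = len(pi), len(obs)
    delta = np.zeros((T, K))            # best path probability ending in each state
    back = np.zeros((T, K), dtype=int)  # backpointers for reconstructing the path
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(K):
            scores = delta[t - 1] * A[:, j]
            back[t, j] = np.argmax(scores)
            delta[t, j] = scores[back[t, j]] * B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]  # trace back the single best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```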
161
What are the key steps of the gradient descent algorithm?
Gradient descent involves computing the gradient (partial derivatives) of the loss function with respect to the model parameters, then taking a small step in the direction of the negative gradient, which locally reduces the loss. The step size is controlled by the learning rate, and the update is repeated iteratively until the loss stops improving (or a step budget is reached).
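A minimal worked example: gradient descent on a one-parameter least-squares fit (the data and learning rate are arbitrary toy choices):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])   # roughly y = 2x

w, learning_rate = 0.0, 0.01
for step in range(200):
    grad = 2 * np.mean(x * (w * x - y))   # d/dw of mean((w*x - y)^2)
    w -= learning_rate * grad             # step against the gradient
print(w)  # converges to about 2.0
```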
162
What is backpropagation in neural networks?
Backpropagation is an algorithm used to compute the gradient of the loss function with respect to each weight in the network. It works by propagating errors backward from the output layer to the input layer, applying the chain rule of calculus. It enables efficient training of multi-layer networks via gradient descent.
163
What is entropy in the context of decision trees?
Entropy is a measure of impurity or uncertainty in a dataset. In decision trees, it's used to determine how mixed the classes are within a dataset. If a dataset contains only one class, its entropy is 0, meaning it is pure. The goal of splitting data in decision trees is to reduce entropy, creating branches that are as pure as possible.
164
What is information gain in decision trees?
Information gain measures the reduction in entropy achieved by splitting a dataset based on a particular attribute. It helps select the attribute that best separates the data into different classes. The attribute with the highest information gain is chosen for splitting at each node in the tree-building process.
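A small self-contained sketch that computes entropy and information gain for a made-up split:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, children):
    # Parent entropy minus the size-weighted entropy of the child subsets.
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5                      # entropy = 1.0 bit
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(information_gain(parent, children))  # ≈ 0.28 bits
```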
165
What is pruning in decision trees?
Pruning is the process of removing nodes or branches from a decision tree to reduce its complexity and improve generalization. This helps prevent overfitting, which can occur when the tree memorizes noise in the training data. Pruning can be done during or after training using techniques like cost complexity pruning or reduced error pruning.
166
What is a perceptron?
A perceptron is a basic type of neural network unit used for binary classification. It computes a weighted sum of inputs, applies an activation function (like the sign function), and outputs either +1 or -1. If the data is linearly separable, the perceptron learning algorithm can find a separating hyperplane that classifies the data correctly.
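A sketch of the perceptron learning rule on a tiny, made-up linearly separable dataset:

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    # Labels are in {-1, +1}; X has one example per row.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi                   # nudge the hyperplane toward xi
                b += yi
    return w, b

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8]])
y = np.array([-1, 1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # matches y because the data is separable
```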
167
What is a learning rate in neural networks?
The learning rate is a hyperparameter that controls how much the model's weights are adjusted in response to the calculated error each time they are updated. If the learning rate is too high, the model may overshoot the optimal weights. If it is too low, training can be very slow or get stuck in local minima.
168
What is the difference between classification and regression?
Classification is the task of predicting a categorical label (e.g., spam or not spam), whereas regression predicts a continuous quantity (e.g., price of a house). Both are types of supervised learning, but use different loss functions and evaluation metrics suited to their problem type.
169
What is regularization in machine learning?
Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function. This discourages the model from learning overly complex patterns. Common types include L1 regularization (Lasso), which promotes sparsity, and L2 regularization (Ridge), which penalizes large weights.
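For L2 (Ridge), the penalty simply adds λ·Σw² to the loss; a minimal sketch:

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    # Mean squared error plus an L2 penalty that discourages large weights.
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def ridge_gradient(w, X, y, lam=0.1):
    # The penalty contributes an extra 2*lam*w, shrinking weights toward zero.
    n = len(y)
    return (2.0 / n) * X.T @ (X @ w - y) + 2 * lam * w
```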
170
What is the curse of dimensionality?
The curse of dimensionality refers to the challenges that arise when working with high-dimensional data. As the number of dimensions increases, data becomes sparse, distances between points become less meaningful, and algorithms may struggle to generalize. This makes learning and visualization more difficult without dimensionality reduction techniques.
171
What is dimensionality reduction?
Dimensionality reduction is the process of reducing the number of input variables in a dataset while preserving as much information as possible. Techniques like Principal Component Analysis (PCA) or t-SNE are used to simplify data, improve visualization, and make machine learning algorithms more efficient.
172
What is Principal Component Analysis (PCA)?
PCA is a technique used for dimensionality reduction that transforms the original features into a new set of uncorrelated features (principal components). These components are ordered by the amount of variance they explain in the data. PCA helps reduce complexity while retaining the most important information.
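A NumPy sketch of PCA via the SVD of the centred data matrix:

```python
import numpy as np

def pca(X, n_components=2):
    # X: (n_samples, n_features). Centre the data, then take the top
    # right-singular vectors as the principal directions.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # directions of maximal variance
    explained_variance = (S ** 2) / (len(X) - 1)    # variance along each direction
    return Xc @ components.T, components, explained_variance[:n_components]
```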
173
What is a value function in reinforcement learning?
A value function estimates how good it is for an agent to be in a given state (or to take a specific action in a state), in terms of expected future rewards. It helps guide the agent's decisions by indicating which states or actions lead to higher cumulative rewards over time.
174
What is policy evaluation in reinforcement learning?
Policy evaluation is the process of determining the expected return (value) of each state under a specific policy. This involves calculating the value function by averaging the rewards and transitions when following the policy, and is a key step in methods like policy iteration.
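A sketch of iterative policy evaluation, assuming the same dict-style MDP format as in the card-150 sketch above (P[s][a] is a list of (probability, next_state, reward) tuples):

```python
def evaluate_policy(P, policy, gamma=0.9, theta=1e-8):
    # Repeatedly apply the Bellman expectation backup until values stop changing.
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V
```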
175
What is policy improvement in reinforcement learning?
Policy improvement is the process of using the value function of a current policy to derive a better policy. The agent chooses actions that maximize the expected value in each state, effectively creating a new policy that is guaranteed to perform at least as well as the old one.
176
What is a greedy policy in reinforcement learning?
A greedy policy is one that always selects the action with the highest estimated value in a given state. While this strategy can quickly find good actions, it may miss better long-term strategies due to lack of exploration. It's often combined with exploration strategies like ε-greedy.
177
What is ε-greedy strategy?
The ε-greedy strategy balances exploration and exploitation by choosing the best known action most of the time (with probability 1-ε), and exploring a random action with probability ε. This prevents the agent from getting stuck with suboptimal policies by ensuring occasional exploration.
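A minimal sketch of ε-greedy action selection (Q assumed to be a dict keyed by (state, action)):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[(state, a)])   # exploit
```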
178
What is an activation function in neural networks?
An activation function introduces non-linearity into a neural network, allowing it to learn complex patterns. Common activation functions include ReLU, sigmoid, and tanh. Without them, the network would be equivalent to a linear model and unable to solve problems like image recognition or natural language understanding.
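For reference, the three common activations applied to a small made-up input:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu    = np.maximum(0.0, x)        # [0, 0, 0, 1.5]
sigmoid = 1 / (1 + np.exp(-x))      # squashes values into (0, 1)
tanh    = np.tanh(x)                # squashes values into (-1, 1)
```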
179
What is the role of the softmax function?
Softmax is an activation function commonly used in the output layer of classification networks. It converts raw output scores (logits) into probabilities that sum to 1. Each score is exponentiated and divided by the sum of all exponentiated scores, allowing the network to predict class probabilities.
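A numerically stable implementation subtracts the maximum logit before exponentiating, which does not change the result:

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # stability: avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.66, 0.24, 0.10], sums to 1
```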
180
What is the vanishing gradient problem?
The vanishing gradient problem occurs when gradients become very small during backpropagation, especially in deep networks. This causes weights in earlier layers to update very slowly, hindering learning. It often occurs with sigmoid or tanh activations and is mitigated by using ReLU or residual connections.
181
What is a recurrent neural network (RNN)?
An RNN is a type of neural network designed to handle sequential data by maintaining a hidden state that captures information from previous time steps. This makes them suitable for tasks like language modeling, time-series prediction, and speech recognition. However, they struggle with long-range dependencies.
182
What are long short-term memory (LSTM) networks?
LSTMs are a special kind of RNN designed to remember information for longer periods. They include memory cells and gating mechanisms that regulate the flow of information, allowing them to overcome the vanishing gradient problem and capture long-term dependencies in sequences like text or speech.
183
What is the main idea behind deep learning?
Deep learning uses multi-layered neural networks to learn complex representations of data. Each layer learns increasingly abstract features, from simple edges in early layers to high-level concepts in deeper ones. This allows deep learning models to achieve high performance on tasks like image and speech recognition.
184
What is case-based reasoning (CBR)?
Case-based reasoning is a problem-solving approach where new problems are solved by adapting solutions from similar past problems. Instead of learning general rules from data, CBR stores and reuses specific experiences. This makes it useful in domains where human-like reasoning based on past cases is important, such as legal or medical decision-making.
185
What is the difference between model-free and model-based reinforcement learning?
Model-free RL learns policies or value functions directly from experience, without building a model of the environment. Examples include Q-learning and SARSA. Model-based RL, on the other hand, involves learning or using a model of the environment's dynamics to plan ahead and simulate outcomes, which can lead to more sample-efficient learning.
186
What is a utility node in a decision network?
A utility node represents the agent’s preferences over possible outcomes in a decision network. It assigns numerical utility values to different outcomes, allowing the agent to compare and choose between them rationally. The goal is to select decisions that maximize expected utility based on probabilities and outcomes.
187
What is a decision node in a decision network?
A decision node represents a point where the agent must choose between different actions or options. Its value is set by the agent rather than by a probability distribution, so it has no conditional probability table; any arcs entering it are information links showing what the agent knows when the decision is made. Its chosen action influences downstream utility (and possibly chance) nodes, and the agent selects the decision that leads to the highest expected utility.
188
What is a chance node in a Bayesian or decision network?
A chance node represents a random variable whose value is not controlled by the decision-maker but is determined probabilistically. Its conditional probabilities are defined in a Conditional Probability Table (CPT), and its value affects other nodes in the network.
189
What is the difference between a belief state and a state in RL?
A state in reinforcement learning is the complete representation of the environment at a given time. A belief state, used in partially observable environments (POMDPs), is a probability distribution over possible actual states, representing the agent’s uncertainty about the current situation.
190
What is the forward algorithm used for in HMMs?
The forward algorithm is used to compute the probability of an observed sequence of events in an HMM. It efficiently sums over all possible hidden state sequences using dynamic programming, avoiding the need to enumerate all paths explicitly. It is used in filtering and sequence evaluation.
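A compact NumPy sketch (same matrix conventions as the Viterbi sketch above):

```python
import numpy as np

def forward(pi, A, B, obs):
    # pi: initial distribution (K,), A: transitions (K, K),
    # B: emissions (K, M), obs: list of observation indices.
    alpha = pi * B[:, obs[0]]            # joint prob. of each state and the first obs
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate one step and weight by emission
    return alpha.sum()                   # total probability of the observed sequence
```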
191
What is an emission probability in HMMs?
An emission probability is the probability of observing a particular evidence or output given a specific hidden state. It models how the hidden states produce observations and is a key part of the observation model in HMMs.
192
Why are CNNs more efficient than fully connected networks for image data?
CNNs take advantage of spatial locality in images by using shared filters across different regions of the image. This drastically reduces the number of parameters compared to fully connected layers, allowing for faster training, better generalization, and the ability to learn spatial hierarchies of features.
193
What is transfer learning in deep learning?
Transfer learning involves using a model trained on one task and adapting it to a different but related task. It is especially useful when data for the target task is limited. A common approach is to take a pre-trained deep network and fine-tune it on new data, leveraging previously learned features.