Artificial Intelligence (Written Midterm) Flashcards

Question

Fill in the blank: The _______ is a function used for performance measurement in a search problem.

Answer 1

[cost function]

Answer 2

False ## Footnote The goal state can be a set of one or more target states.

Answer 3

P (Polynomial) problems ## Footnote Example: Sorting an unordered list (e.g., Quicksort)

Answer 4

O(n^k), where k is a constant ## Footnote Examples: O(n), O(n^2), O(n^3), O(1)

Answer 5

Nondeterministic Polynomial time ## Footnote NP is the class of problems that a nondeterministic Turing machine can solve in polynomial time: it guesses a candidate solution, which can then be verified in polynomial time.

Answer 6

Verifying a Sudoku solution ## Footnote This demonstrates that solutions can be verified in polynomial time.
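
As a rough illustration of polynomial-time verification, the sketch below checks a completed 9x9 grid with a fixed number of passes over its cells. The function name `is_valid_sudoku_solution` and the sample grid are assumptions made for this example, not part of the original cards.

```python
def is_valid_sudoku_solution(grid):
    """Return True if a completed 9x9 grid satisfies every Sudoku constraint."""
    expected = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # 27 groups of 9 cells each: the work grows polynomially with grid size.
    return all(group == expected for group in rows + cols + boxes)


# Illustrative solved grid generated from a shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Copy the grid and break one row by duplicating a value.
broken = [row[:] for row in solved]
broken[0][0] = broken[0][1]
```

Checking a proposed solution is cheap even though finding one for a generalized n^2 x n^2 grid is believed to be hard.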

Answer 7

P ⊆ NP ## Footnote This means all problems that can be solved in polynomial time can also be verified in polynomial time.

Answer 8

Problems that are at least as hard as every problem in NP: every NP problem can be reduced to them in polynomial time. They are not necessarily in NP themselves ## Footnote Their solutions are not guaranteed to be verifiable quickly.

Answer 9

The Halting problem ## Footnote This problem is undecidable, so it is not in NP.

Answer 10

Problems that are both in NP and NP-Hard ## Footnote Their solutions can be verified quickly, and every other NP problem can be reduced to them in polynomial time.

Answer 11

If an NP-Complete problem can be solved in polynomial time, then all NP problems can be solved in polynomial time ## Footnote This is a critical concept in computational theory.

Answer 12

The Traveling Salesman Problem (TSP) ## Footnote This problem is well-known for its complexity and applications in optimization.

Answer 13

NP-Complete problems are both verifiable in polynomial time and NP-Hard ## Footnote Solutions to NP problems can be verified quickly, while solutions to NP-Hard problems may not be.

Answer 14

* Completeness * Optimality * Time Complexity * Space Complexity ## Footnote Completeness asks whether the algorithm is guaranteed to find a solution when one exists, optimality whether the solution found has the lowest cost, time complexity how long the search takes, and space complexity how much memory it uses.

Answer 15

All nodes in the state space level by level ## Footnote BFS is guaranteed to find a solution if one exists.
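
A minimal sketch of level-by-level exploration, assuming the state space is given as an adjacency dictionary. The names `bfs` and `graph` and the sample graph are illustrative only.

```python
from collections import deque


def bfs(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([[start]])   # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()  # the shallowest unexpanded path comes first
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
path = bfs(graph, "A", "F")  # ['A', 'B', 'D', 'F']
```

The FIFO queue is what enforces the level-by-level order: every node at depth d is expanded before any node at depth d + 1.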

Answer 16

O(b^d), where b is the branching factor and d is the depth of the shallowest solution ## Footnote This complexity indicates how the algorithm scales with the size of the search space.

Answer 17

Explores a branch deeply before backtracking ## Footnote DFS typically needs far less memory than BFS, but it does not guarantee an optimal solution.

Answer 18

Time complexity: O(b^m), Space complexity: O(bm), where m is the maximum depth ## Footnote These complexities reflect how the algorithm's performance varies with the maximum depth.
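
The sketch below shows depth-first exploration on a small adjacency dictionary, chosen so that the first path found is not the shortest one. The names `dfs` and `dfs_graph` are assumptions for this example.

```python
def dfs(graph, node, goal, path=None, visited=None):
    """Return the first path found by going deep before backtracking, or None."""
    path = (path or []) + [node]
    visited = visited if visited is not None else set()
    visited.add(node)
    if node == goal:
        return path
    for neighbor in graph.get(node, []):
        if neighbor not in visited:
            result = dfs(graph, neighbor, goal, path, visited)
            if result is not None:
                return result
    return None  # dead end: backtrack to the caller


dfs_graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"], "G": []}
found = dfs(dfs_graph, "A", "G")  # ['A', 'B', 'D', 'G'], although A-C-G is shorter
```

Only the current path is kept on the call stack, which is where the linear space bound comes from; this version additionally stores a `visited` set to avoid revisiting nodes in graphs.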

Answer 19

Reduces the search space size and provides more efficient solutions for complex problems ## Footnote Heuristic methods help streamline the search process.

Answer 20

An estimate that measures the distance between the current state and the goal state ## Footnote Heuristics guide the search towards more promising paths.

Answer 21

The estimated cost of the shortest path from the current state n to a goal state ## Footnote This estimation is crucial for evaluating potential paths in the search space.
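
One common example of such an estimate is the Manhattan distance on a grid where moves are horizontal or vertical with unit cost. The grid setting and the name `manhattan_h` are assumptions for this sketch.

```python
def manhattan_h(state, goal):
    """Estimate the remaining cost as the sum of horizontal and vertical gaps."""
    (x1, y1), (x2, y2) = state, goal
    return abs(x1 - x2) + abs(y1 - y2)


goal = (3, 4)
successors = [(1, 0), (0, 1), (2, 3)]
# A greedy search would expand the successor with the smallest estimate first.
most_promising = min(successors, key=lambda s: manhattan_h(s, goal))
```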

Answer 22

Either true or false ## Footnote This means it operates under binary logic.

Answer 23

A numerical belief ## Footnote Ranges from 0 (certainly false) to 1 (certainly true).

Answer 24

Lacks complete information about the environment

Answer 25

Outcomes of actions are unpredictable

Answer 26

Other agents or dynamic elements actively oppose or interfere with the agent’s goals

Answer 27

The set of all possible world states

Answer 28

All possible sensor reports

Answer 29

A utility value

Answer 30

States with higher utility

Answer 31

Make the best available decision, not just survive

Answer 32

Always chooses the action that results in the highest expected utility
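
A small sketch of this decision rule, assuming each action is described by a list of (probability, utility) outcomes. The action names and numbers are made up for illustration.

```python
def expected_utility(outcomes):
    """Sum of probability-weighted utilities for one action."""
    return sum(p * u for p, u in outcomes)


def best_action(actions):
    """Pick the action whose expected utility is highest."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


actions = {
    "safe": [(1.0, 5.0)],                # EU = 5.0
    "risky": [(0.5, 12.0), (0.5, 0.0)],  # EU = 6.0
    "bad": [(0.9, 1.0), (0.1, 20.0)],    # EU = 2.9
}
choice = best_action(actions)
```

Note that the agent picks "risky" even though "bad" contains the single highest payoff: the rule ranks actions by their average outcome, not their best case.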

Answer 33

Either true or false (0 or 1)

Answer 34

Partial truths (0 to 1)

Answer 35

The agent observes input-output pairs and learns a mapping between the input and output

Answer 36

The agent only sees input data without explicit feedback. Goal: Identifying patterns and structures.

Answer 37

The agent takes actions and receives rewards or penalties from the environment. Goal: Finding the optimal strategy to maximize long-term rewards.

Answer 38

The output belongs to a finite set. Weather: "sunny", "cloudy", "rainy". Image recognition: ("dog", "cat", "car").

Answer 39

The output is a continuous value. Examples: Expected temperature tomorrow (15.6 °C). Estimating the market value of a house based on square footage.

Answer 40

Two variables change together, they are related, but one does not necessarily cause the other. Example: Ice cream consumption and the number of heat strokes both increase in summer.

Answer 41

Changing one variable directly affects another variable. Example: Wearing a seatbelt reduces the number of fatal accidents.

Answer 42

A third factor may be responsible for changes in both variables. Example: Ice cream sales and drowning incidents – the common cause is the weather.

Answer 43

A model is fit to a set of training data (X, with known Y values) to create a function that predicts the Y value for new X data.

Answer 44

The model’s tendency to deviate from the expected value across different datasets. Bias typically arises from a limited hypothesis space, meaning the model cannot learn certain patterns. Example: Linear models can only fit straight lines.

Answer 45

The model does not sufficiently learn the patterns within the data. Cause: Too simple hypothesis space (high bias). Example: A linear model attempting to fit a curved dataset.

Answer 46

The model fits the training data too well but does not generalize well to new data. Cause: Too complex hypothesis space (high variance). Example: A 12th-degree polynomial that fits the training data perfectly but produces vastly different results on test data.

Answer 47

Use a fixed number of parameters to summarize the data. Example: Linear regression

Answer 48

No fixed number of parameters — the model size depends on the amount of data. Example: Piecewise linear model

Answer 49

The model uses the two closest known points as a basis for interpolation for each new query. The prediction is read off the straight line through those two points.
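
A minimal one-dimensional sketch of this idea, assuming the training inputs are sorted and distinct. The function name `piecewise_linear_predict` and the data are illustrative.

```python
from bisect import bisect_left


def piecewise_linear_predict(xs, ys, x_query):
    """Predict y from the line through the two training points around x_query.

    Assumes xs is sorted in ascending order with no duplicates. Queries outside
    the data range reuse the nearest end segment.
    """
    i = bisect_left(xs, x_query)
    i = min(max(i, 1), len(xs) - 1)  # clamp so that xs[i - 1] and xs[i] both exist
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x_query - x0)


xs = [0, 1, 2, 4]
ys = [0, 2, 3, 7]
```

Every training point is kept, so the "model" grows with the data, which is what makes this approach nonparametric.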

Answer 50

Uses the k nearest neighbors for prediction. Two variations: (1) the prediction is the average value of the k points; (2) local linear regression fitted to the k points.
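
The averaging variant can be sketched as follows for one-dimensional inputs. The name `knn_regress` and the sample data are assumptions for this example.

```python
def knn_regress(xs, ys, x_query, k=3):
    """Predict by averaging the targets of the k training points nearest to x_query."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


train_x = [1, 2, 3, 10]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_regress(train_x, train_y, 2.2, k=3)  # averages the targets at x = 1, 2, 3
```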

Answer 51

Instead of a discrete k number of neighbors, the model weights the neighboring points. Weighting is done using a kernel function that determines the influence of a point on the prediction.
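
One simple way to realize this weighting is a kernel-weighted average of the training targets. The sketch assumes a Gaussian kernel and a hand-picked `bandwidth`; both are illustrative choices, not something the card specifies.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly as the distance from the query grows."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def kernel_regress(xs, ys, x_query, bandwidth=1.0):
    """Predict as the kernel-weighted average of all training targets."""
    weights = [gaussian_kernel(abs(x - x_query), bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


kx = [0.0, 2.0]
ky = [0.0, 4.0]
midpoint = kernel_regress(kx, ky, 1.0)     # both points weigh equally
near_left = kernel_regress(kx, ky, 0.25)   # the point at x = 0 dominates
```

A smaller bandwidth makes the prediction depend on fewer, closer points; a larger one smooths over more of the data.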

Answer 52

Finds the optimal decision boundary: the hyperplane that maximizes the margin, i.e. the distance between the boundary and the nearest samples of each class.

Answer 53

Logistic Regression: Estimates probabilities and draws a decision boundary. LDA: Uses a statistical approach, assuming classes are normally distributed. SVM: Seeks the optimal margin.

Answer 54

Separates classifications with lines

Answer 55

Separates classifications into rings

Answer 56

Separates classification with lines that can curve

Answer 57

A decision tree is a classification model that makes decisions based on the elements of the input vector. It uses a tree structure where each internal node is a test, and each leaf is a decision. Decisions are made by sequentially testing attributes. Starting from the root, the path is followed down the appropriate branch until a leaf node is reached. The leaf node contains the classification of the given example.
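
The root-to-leaf procedure can be sketched with a tree stored as nested dictionaries. The representation, the attribute names, and the tiny weather tree are assumptions made for illustration.

```python
# Internal nodes are dicts holding a test attribute and one branch per value;
# leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy", "branches": {True: "no", False: "yes"}},
    },
}


def classify(node, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        value = example[node["attribute"]]
        node = node["branches"][value]
    return node


label = classify(tree, {"outlook": "sunny", "humidity": "normal"})
```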

Answer 58

The goal is to reduce bias and variance and so achieve better generalization performance.
Main methods:
* Bagging (Bootstrap Aggregation): building multiple independent models on different samples of the dataset. Example: Random Forest.
* Boosting: models that build upon each other, each trying to correct the previous mistakes. Examples: AdaBoost, Gradient Boosting.
* Stacking: using the outputs of various models as inputs for a meta-model.
* Voting: decision-making based on the votes of multiple models. Example: combining the results of different algorithms.

Answer 59

A model based on the ensemble learning of decision trees. Multiple independent decision trees are built, and their decisions are combined through voting.
Operation:
* Bagging: each decision tree learns from a different sample of the dataset.
* Random attribute selection: only a random subset of attributes is considered for each split.
* Majority voting: the final decision is made based on the votes of the individual trees.
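
Two of these ingredients, bootstrap sampling and majority voting, can be sketched in a few lines. Tree training itself is omitted; the helper names and the seeded generator are assumptions for this example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, as bagging does for each tree."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)                       # seeded for repeatable samples
dataset = list(range(10))
sample = bootstrap_sample(dataset, rng)      # same size, possibly with repeats
decision = majority_vote(["cat", "dog", "cat"])
```

Because each sample is drawn with replacement, some examples appear several times and others not at all, which is what makes the individual trees differ.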

Answer 60

Combines multiple weak learners to create a stronger model. During training, hard-to-classify examples are given higher weights, focusing the model more on them:
* Initially, all examples have equal weights.
* The first model learns a decision rule.
* Misclassified examples have their weights increased, and the next model focuses more on them.
* This process repeats until the pre-set number of models is reached.
* The final model is a weighted combination of all individual models.
Popular algorithms: AdaBoost, Gradient Boosting, XGBoost.
Advantages:
* Bias reduction: the ensemble model can capture patterns that a single model might miss.
* Better generalization: using multiple weak models results in more stable performance.
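
The re-weighting step can be sketched with the update used by AdaBoost for binary classification. This is one round only, and it assumes the weighted error lies strictly between 0 and 1; the function name `adaboost_reweight` is illustrative.

```python
import math


def adaboost_reweight(weights, misclassified):
    """Perform one AdaBoost-style weight update.

    weights: current example weights, summing to 1.
    misclassified: booleans, True where the current weak model was wrong.
    Returns the normalized new weights and the model's vote weight alpha.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * math.log((1 - error) / error)  # larger for more accurate models
    updated = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha


start = [0.25, 0.25, 0.25, 0.25]                  # all examples start equal
new_weights, alpha = adaboost_reweight(start, [True, False, False, False])
```

After the update the misclassified example carries half of the total weight, so the next weak model cannot ignore it.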

Answer 61

Combination of multiple different models trained on the same dataset. Instead of relying on a single model, a meta-model learns how to combine the outputs of the base models:
* The different models (e.g., SVM, logistic regression, decision tree) are trained on the original data.
* A second-level meta-model (e.g., another logistic regression) learns how to combine the predictions of the base models.
Advantages:
* Bias reduction: using multiple models together helps generalize.
* Better performance: often better than any single base model.

Answer 62

What is the difference?
* Supervised learning: the model learns from labeled data $(x_i, y_i)$.
* Unsupervised learning: the model searches for structures and hidden patterns in unlabeled data $x_i \in X$.
Examples:
* Supervised learning: image classification (e.g., cat or dog?).
* Unsupervised learning: automatic image grouping based on similarity, without predefined labels.

Answer 63

Definition: Discover hidden categories within the data by assigning each data point a hidden label $z_n$.
Examples:
* Grouping documents by topic (politics, sports, economics).
* Automatically grouping images without knowing their content beforehand.
Important: Clustering is different from classification: classification involves labeled data, while clustering groups data in an unsupervised manner.

Answer 64

Task: Given a dataset $D = \{x_n\}_{n=1}^{N}$, where $x_n \in \mathbb{R}^d$, the goal is to partition the data into K clusters. The clusters are not predefined; they are discovered by the algorithm.
Key idea: Group elements based on their proximity. Assign a prototype point $\mu_k$ to each cluster, which represents the cluster center.
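
A minimal sketch of the usual alternating procedure (assign each point to its nearest center, then move each center to the mean of its points). It assumes the initial centers are supplied by the caller and runs a fixed number of iterations; all names and data are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and updating the cluster centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean_point(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(data, centers=[(0, 0), (10, 10)])
```

The result depends on the starting centers, so practical implementations usually try several initializations.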

Answer 65

Main idea: Construct a nested hierarchy of clusters. The result is a dendrogram showing the order of cluster merging.
Two main approaches:
* The agglomerative approach builds from the bottom up: each sample starts as its own cluster, and clusters are then merged.
* The divisive method works in the opposite direction, starting from the top and gradually splitting into smaller clusters.

Answer 66

K-Means vs GMM
* K-Means performs hard assignments: each point belongs to a single cluster.
* GMM applies probabilistic clustering: each point has a probability of belonging to each cluster.
* K-Means assumes spherical clusters, while GMM can model elliptical clusters.
GMM:
* Assumes the data comes from multiple Gaussian distributions.
* Each cluster is modeled as a Gaussian distribution.
* Uses the Expectation-Maximization (EM) algorithm for parameter optimization.

Answer 67

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
K-Means and GMM assume specific cluster shapes. DBSCAN does not require the number of clusters to be predefined, and it handles clusters of different shapes as well as noisy data.
Key concepts:
* Core sample: a point with at least minPoints neighbors within radius ε.
* Border sample: a neighbor of a core point, but not a core point itself.
* Outlier: a point that does not belong to any cluster.
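
The three point types can be sketched as below. This only labels points and does not build the clusters; it also assumes a point is not counted as its own neighbor, a convention that differs between implementations. The function name and data are illustrative.

```python
import math


def label_points(points, eps, min_points):
    """Label each point as 'core', 'border', or 'outlier'."""
    neighbors = [
        [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps]
        for i, p in enumerate(points)
    ]
    core = {i for i, near in enumerate(neighbors) if len(near) >= min_points}
    labels = []
    for i, near in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in near):
            labels.append("border")   # close to a core point, but not dense itself
        else:
            labels.append("outlier")
    return labels


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
labels = label_points(pts, eps=1.0, min_points=2)
```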

Answer 68

Goal: Reduce the number of features while preserving the most important information. A commonly used technique: Principal Component Analysis (PCA).
Why is it useful?
* Helps visualize data in lower dimensions.
* Reduces noise and removes redundant data.
* Improves computational efficiency on large datasets.
Example: In an image database, retaining key colors and shapes while removing noisy details.

Answer 69

Goal: Reduce the dimensionality of data while preserving the maximum variance. PCA finds new axes (principal components) along which the variance of the data is maximized. The method projects the data onto these axes. Suitable for: visualization (2D/3D), noise reduction, preprocessing before supervised learning.
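
A rough sketch of finding the first principal component without external libraries: center the data, build the covariance matrix, and run power iteration to approximate its dominant eigenvector. Power iteration is a stand-in chosen here for simplicity (it can fail if the starting vector is orthogonal to the component); names and data are illustrative.

```python
import math


def first_principal_component(data, iterations=100):
    """Return a unit vector along the direction of maximum variance, plus the centered data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d          # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v, centered


samples = [(-1.0, -2.0), (0.0, 0.0), (1.0, 2.0)]   # all points lie on y = 2x
component, centered = first_principal_component(samples)
# Projecting onto the component reduces each 2D point to a single coordinate.
projected = [sum(x * c for x, c in zip(row, component)) for row in centered]
```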

Answer 70

1. Dimensionality Reduction and Visualization: If k = 2 or k = 3, the data can be visually represented.
2. Preprocessing for Machine Learning Models: Dimensionality reduction decreases computational costs. It also helps prevent overfitting: lower-dimensional inputs lead to simpler hypothesis classes.
3. Noise Reduction and Pattern Recognition: In face recognition, PCA projects 100x100 pixel faces into a lower-dimensional space (eigenfaces method). This removes noise caused by small lighting variations and imaging differences, and distance-based similarity measurement in the reduced-dimensional space allows successful face comparison.

Answer 71

Algorithms that identify data points that significantly deviate from "normal" patterns. Typical applications: fraud detection, fault detection, network security.
Two popular solutions:
* One-Class SVM: defines a boundary around the data and considers points outside it as anomalies.
* Isolation Forest: isolates unusual points using random decision trees; efficient for large datasets.

(96 cards)