Artificial Intelligence (Written Midterm) Flashcards

(96 cards)

1
Q

What is intelligence?

A

The ability to:
* Gain and apply experience.
* Develop knowledge, skills, and problem-solving abilities.
* Adapt to new situations.

2
Q

What is human intelligence?

A

Intelligence that arises from biological processes.

3
Q

What does artificial intelligence (AI) refer to?

A

Systems that can:
* Think like humans or act rationally.
* Solve complex problems using algorithms and technology.

4
Q

What is Narrow AI?

A

Specialized in specific tasks (e.g., recommendation systems).

5
Q

What is General AI?

A

Hypothetical AI capable of performing any intellectual task a human can (understanding, learning, and applying knowledge).

6
Q

What is Superintelligence?

A

Theoretical, surpassing human intelligence.

7
Q

What is the Turing-test approach in AI?

A

Acting humanly. Example: a chatbot that can successfully pass as human in conversation.

8
Q

What is the cognitive modeling approach in AI?

A

Thinking humanly; involves experimental observation and study of humans and living beings.

9
Q

What does the logic-based approach in AI focus on?

A

Thinking rationally; includes syllogism, deductive systems, formal logic, and probability theory.

10
Q

What is a rational agent?

A

An autonomous entity that acts to maximize its expected performance measure.

11
Q

What is the role of sensors and actuators in agents?

A

Sensors perceive the environment (e.g., camera, microphone) and actuators affect the environment (e.g., motors, display).

12
Q

What is an agent’s percept sequence?

A

The complete history of everything the agent has ever perceived.

13
Q

What does a rational agent’s behavior depend on?

A

Its built-in knowledge and the entire percept sequence it has observed so far.

14
Q

How is an agent’s behavior mathematically described?

A

By an agent function that maps each percept sequence to an action.
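As a minimal sketch of this mapping, the agent below looks up its action in a table keyed by the full percept sequence. The two-cell vacuum world, the percept encoding, and the names `AGENT_TABLE` and `table_driven_agent` are illustrative assumptions, not part of the deck.

```python
from typing import Dict, List, Tuple

Percept = Tuple[str, str]  # (location, status): a hypothetical encoding

# Hypothetical lookup table: percept sequence -> action
AGENT_TABLE: Dict[Tuple[Percept, ...], str] = {
    (("A", "dirty"),): "suck",
    (("A", "clean"),): "right",
    (("A", "dirty"), ("A", "clean")): "right",
    (("A", "clean"), ("B", "dirty")): "suck",
}


def table_driven_agent(percepts: List[Percept]) -> str:
    """Return the action for the whole percept sequence seen so far."""
    # Unknown sequences fall back to doing nothing.
    return AGENT_TABLE.get(tuple(percepts), "noop")
```

A table like this grows with every possible percept history, which is why practical agents compute the function instead of storing it.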

15
Q

What are the types of environments an agent can operate in?

A

* Fully or partially observable.
* Single- or multi-agent.
* Deterministic or non-deterministic (stochastic).
* Episodic or sequential.
* Static or dynamic.
* Discrete or continuous.
* Known or unknown (whether the rules of the environment are known).

16
Q

True or False: A rational agent’s performance should be measured and adjusted to expectations.

A

True.

17
Q

What is the formal definition of a search problem?

A

A search problem is defined by the state space, initial state, goal state, actions, state-transition model, and cost function.

A search problem involves finding a path from the initial state to the goal state using available actions.
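The components listed in this definition can be sketched as a small data structure. This is only one possible encoding: the class name `SearchProblem`, its method names, and the toy problem are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class SearchProblem:
    initial_state: str
    goal_states: FrozenSet[str]
    # State-transition model with costs: state -> {action: (next_state, step_cost)}
    transitions: Dict[str, Dict[str, Tuple[str, float]]]

    def actions(self, state: str) -> List[str]:
        """Actions available in the given state."""
        return list(self.transitions.get(state, {}))

    def result(self, state: str, action: str) -> str:
        """State reached by applying the action in the given state."""
        return self.transitions[state][action][0]

    def step_cost(self, state: str, action: str) -> float:
        """Cost of applying the action in the given state."""
        return self.transitions[state][action][1]

    def is_goal(self, state: str) -> bool:
        return state in self.goal_states

    def path_cost(self, plan: Iterable[str]) -> float:
        """Total cost of executing a sequence of actions from the initial state."""
        state, total = self.initial_state, 0.0
        for action in plan:
            total += self.step_cost(state, action)
            state = self.result(state, action)
        return total


# Hypothetical toy problem: two routes from S to the goal G.
toy = SearchProblem(
    initial_state="S",
    goal_states=frozenset({"G"}),
    transitions={
        "S": {"to_A": ("A", 1.0), "to_G": ("G", 5.0)},
        "A": {"to_G": ("G", 1.0)},
    },
)
```

Here the plan `["to_A", "to_G"]` and the plan `["to_G"]` are both solutions, and the cost function is what lets a search algorithm prefer the cheaper one.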

18
Q

What does the state space represent in a search problem?

A

The state space is the set of all possible states of the environment.

It encompasses every configuration that the agent might encounter.

19
Q

What is the initial state in a search problem?

A

The initial state is the agent’s starting state.

This is where the search begins.

20
Q

What is the goal state in a search problem?

A

The goal state is the set of one or more target states to be achieved.

It defines what the search is trying to accomplish.

21
Q

What are the actions in a search problem?

A

The actions are the finite set of operations available to the agent in a given state.

These are the moves or operations that the agent can perform.

22
Q

What is a state-transition model?

A

A state-transition model defines the result of each action.

It specifies how actions change the current state.

23
Q

What is a cost function in a search problem?

A

A cost function is used for performance measurement, specifying the cost of actions.

It helps evaluate the efficiency of different paths.

24
Q

What additional elements are associated with search problems?

A

Additional elements include:
* Path
* Solution
* Optimal solution
* State-transition graph

These elements help in understanding the structure and outcomes of the search process.

25
Fill in the blank: The _______ is a function used for performance measurement in a search problem.
[cost function]
26
True or False: The goal state can only be one specific state in a search problem.
False ## Footnote The goal state can be a set of one or more target states.
27
What are problems that can be solved efficiently called?
P (Polynomial) problems ## Footnote Example: Sorting an unordered list (e.g., Quicksort)
28
What is the time complexity notation for polynomial problems?
O(n^k), where k is a constant ## Footnote Examples: O(n), O(n^2), O(n^3), O(1)
29
What does NP stand for?
Nondeterministic Polynomial time ## Footnote NP is the class of decision problems that a nondeterministic Turing machine can solve in polynomial time; equivalently, a proposed solution can be verified in polynomial time.
30
Give an example of an NP problem.
Sudoku (generalized to n² × n² grids) ## Footnote Finding a solution may be hard, but checking a filled-in grid takes only polynomial time.
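To illustrate what "verified in polynomial time" means, here is a minimal sketch of a checker for a completed standard 9x9 grid. The function name `is_valid_sudoku_solution` is an assumption; the check visits each cell a constant number of times.

```python
from typing import List


def is_valid_sudoku_solution(grid: List[List[int]]) -> bool:
    """Check that every row, column and 3x3 box contains 1..9 exactly once."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # 27 groups of 9 cells each: the work grows polynomially with grid size.
    return all(group == digits for group in rows + cols + boxes)
```

The verifier says nothing about how hard it is to find the filled-in grid in the first place, which is exactly the gap between checking and solving.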
31
What is the relationship between P and NP?
P ⊆ NP ## Footnote This means all problems that can be solved in polynomial time can also be verified in polynomial time.
32
What are NP-Hard problems?
Problems at least as hard as every problem in NP: every NP problem can be reduced to them in polynomial time ## Footnote They are not necessarily in NP themselves, so their solutions are not guaranteed to be verifiable quickly.
33
Give an example of an NP-Hard problem.
The Halting problem ## Footnote It is NP-Hard but undecidable, so it is not in NP.
34
What defines NP-Complete problems?
Problems that are both in NP and NP-Hard ## Footnote Their solutions can be verified quickly, and every NP problem can be reduced to them in polynomial time.
35
What is the significance of solving an NP-Complete problem in polynomial time?
If an NP-Complete problem can be solved in polynomial time, then all NP problems can be solved in polynomial time (P = NP) ## Footnote Whether such an algorithm exists is the open P vs. NP question.
36
Provide an example of an NP-Complete problem.
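The decision version of the Traveling Salesman Problem (TSP): is there a tour of length at most k? ## Footnote The optimization version (find the shortest tour) is NP-Hard; TSP is well known for its complexity and applications in optimization.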
37
What is the key difference between NP and NP-Complete?
NP-Complete problems are the hardest problems in NP: they are both verifiable in polynomial time and NP-Hard ## Footnote Every NP problem can be verified quickly, but only the NP-Complete ones are also NP-Hard.
38
What are the evaluation criteria for search algorithms?
Completeness, optimality, time complexity, and space complexity ## Footnote Completeness: is the algorithm guaranteed to find a solution when one exists? Optimality: does it find the best (lowest-cost) solution? Time complexity: how long does it take? Space complexity: how much memory does it use?
39
What does Breadth-First Search (BFS) examine?
All nodes in the state space, level by level ## Footnote BFS is complete: it is guaranteed to find a solution if one exists (assuming a finite branching factor).
40
What is the time and space complexity of Breadth-First Search?
O(b^d), where b is the branching factor and d is the depth ## Footnote This complexity indicates how the algorithm scales with the size of the search space.
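A minimal sketch of breadth-first search on an explicit graph, assuming states are strings and the graph is an adjacency dictionary (both are illustrative choices, as is the function name `bfs`). The FIFO queue is what produces the level-by-level order.

```python
from collections import deque
from typing import Dict, List, Optional


def bfs(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([start])                      # FIFO queue of states to expand
    parent: Dict[str, Optional[str]] = {start: None}  # also serves as the visited set
    while frontier:
        node = frontier.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None
```

The `parent` dictionary holds every generated state, which is where the exponential space cost comes from.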
41
How does Depth-First Search (DFS) explore the search space?
Explores a branch deeply before backtracking ## Footnote DFS needs far less memory than BFS, but it does not guarantee an optimal solution.
42
What are the time and space complexities of Depth-First Search?
Time complexity: O(b^m), Space complexity: O(bm), where m is the maximum depth ## Footnote These complexities reflect how the algorithm's performance varies with the maximum depth.
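A minimal recursive sketch of depth-first search, again assuming an adjacency-dictionary graph with string states (the name `dfs` and the example graph in the note are hypothetical). It returns the first path it finds, which need not be the shortest.

```python
from typing import Dict, List, Optional, Set


def dfs(
    graph: Dict[str, List[str]],
    node: str,
    goal: str,
    visited: Optional[Set[str]] = None,
) -> Optional[List[str]]:
    """Return the first path found from node to goal, or None."""
    if visited is None:
        visited = set()
    visited.add(node)
    if node == goal:
        return [node]
    for nxt in graph.get(node, []):
        if nxt not in visited:
            sub_path = dfs(graph, nxt, goal, visited)  # go deep before trying siblings
            if sub_path is not None:
                return [node] + sub_path
    return None  # dead end: backtrack
```

On the graph `{"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"], "G": []}` this returns the four-state path through A and C, even though the three-state path through B exists.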
43
Why is heuristic search important?
Reduces the search space size and provides more efficient solutions for complex problems ## Footnote Heuristic methods help streamline the search process.
44
What is a heuristic in the context of search algorithms?
An estimate of the distance (remaining cost) between the current state and the goal state ## Footnote Heuristics guide the search towards more promising paths.
45
What does h(n) represent in heuristic search?
The estimated cost of the shortest path from the current state n to a goal state ## Footnote This estimation is crucial for evaluating potential paths in the search space.
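As one common example of h(n), the sketch below uses the Manhattan distance on a grid, assuming states are `(row, column)` cells and moves are horizontal or vertical. The grid setting and the helper `most_promising` are assumptions for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column) on a hypothetical grid


def manhattan(state: Cell, goal: Cell) -> int:
    """h(n): estimated number of grid moves remaining to reach the goal."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])


def most_promising(candidates: List[Cell], goal: Cell) -> Cell:
    """Pick the candidate state with the smallest heuristic value."""
    return min(candidates, key=lambda s: manhattan(s, goal))
```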
46
What does a Logical Agent consider a statement to be?
Either true or false ## Footnote This means it operates under binary logic.
47
What does a Probabilistic Agent assign to a statement?
A numerical belief ## Footnote Ranges from 0 (certainly false) to 1 (certainly true).
48
What is Partial observability?
The agent lacks complete information about the environment
49
What does Nondeterminism refer to?
The outcomes of actions cannot be predicted with certainty
50
What are Hostile environments characterized by?
Other agents or dynamic elements actively oppose or interfere with the agent’s goals
51
What is a Belief State?
The set of all world states the agent could currently be in, given what it has perceived so far
52
What does Contingent Planning prepare for?
All possible sensor reports
53
What does Utility Theory assign to each state?
A utility value
54
What does an agent prefer according to Utility Theory?
States with higher utility
55
What is the goal of an agent in Utility Theory?
Make the best available decision, not just survive
56
What defines a rational agent?
Always chooses the action that results in the highest expected utility
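The utility-theory cards above can be made concrete with a small calculation: weight each outcome's utility by its probability and pick the action with the largest total. The function names and the umbrella example in the note are made up for illustration.

```python
from typing import Dict, List, Tuple

Outcome = Tuple[float, float]  # (probability, utility)


def expected_utility(outcomes: List[Outcome]) -> float:
    """Sum of utility weighted by probability over all outcomes of one action."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, List[Outcome]]) -> str:
    """Return the action whose outcomes have the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a]))
```

For example, with a 30% chance of rain, `{"take_umbrella": [(0.3, 70), (0.7, 80)], "no_umbrella": [(0.3, 0), (0.7, 100)]}` gives expected utilities of about 77 and 70, so a rational agent takes the umbrella.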
57
In classical logic, how is a statement evaluated?
Either true or false (0 or 1)
58
What does fuzzy logic allow for?
Partial truths (0 to 1)
59
Supervised Learning
The agent observes input-output pairs and learns a mapping between the input and output
60
Unsupervised Learning
The agent only sees input data without explicit feedback. Goal: Identifying patterns and structures.
61
Reinforcement Learning
The agent takes actions and receives rewards or penalties from the environment. Goal: Finding the optimal strategy to maximize long-term rewards.
62
Classification:
The output belongs to a finite set. Weather: "sunny", "cloudy", "rainy". Image recognition: ("dog", "cat", "car").
63
Regression:
The output is a continuous value. Examples: Expected temperature tomorrow (15.6 °C). Estimating the market value of a house based on square footage.
64
Correlation
Two variables change together, they are related, but one does not necessarily cause the other. Example: Ice cream consumption and the number of heat strokes both increase in summer.
65
Causation (Causality)
Changing one variable directly affects another variable. Example: Wearing a seatbelt reduces the number of fatal accidents.
66
Correlation ≠ Causation
A third factor may be responsible for changes in both variables. Example: Ice cream sales and drowning incidents – the common cause is the weather.
67
Fitting
A model is fit to a set of training data (X, Y) to produce a function that predicts the Y value for new X data.
68
Bias
The model’s systematic tendency to deviate from the true value, averaged over different training sets. Bias typically arises from a limited hypothesis space, meaning the model cannot learn certain patterns. Example: Linear models can only fit straight lines.
69
Underfitting
The model does not sufficiently learn the patterns within the data. Cause: Too simple hypothesis space (high bias). Example: A linear model attempting to fit a curved dataset.
70
Overfitting
The model fits the training data too well but does not generalize well to new data. Cause: Too complex hypothesis space (high variance). Example: A 12th-degree polynomial that fits the training data perfectly but produces vastly different results on test data.
71
Parametric models:
Use a fixed number of parameters to summarize the data. Example: Linear regression
72
Non-parametric models:
No fixed number of parameters — the model size depends on the amount of data. Example: Piecewise linear model
73
Piecewise Linear Regression
For each new query, the model uses the two closest known points as a basis for interpolation. The prediction is read off the straight line through those two points.
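A minimal one-dimensional sketch of this idea, assuming the training inputs `xs` are sorted in ascending order and that "the two closest points" means the pair bracketing the query (or the two end points when the query lies outside the data). The function name is hypothetical.

```python
from bisect import bisect_left
from typing import List


def piecewise_linear_predict(xs: List[float], ys: List[float], x: float) -> float:
    """Predict y at x from the line through the two points around x."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two (x, y) training points")
    # Index of the right-hand point, clamped so a left-hand point always exists.
    i = min(max(bisect_left(xs, x), 1), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (x - x0) / (x1 - x0)  # relative position along the segment
    return y0 + t * (y1 - y0)
```

Because every training point is kept and consulted at prediction time, the model has no fixed parameter count, which is what makes it non-parametric.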
74
k-Nearest Neighbors (k-NN) Regression
Uses the k nearest neighbors for prediction. Two variations: (1) the prediction is the average value of the k points; (2) local linear regression fitted to the k points.
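The averaging variation can be sketched in a few lines for one-dimensional inputs. The function name `knn_regress` and the use of absolute difference as the distance are assumptions for illustration.

```python
from typing import List


def knn_regress(xs: List[float], ys: List[float], x: float, k: int = 3) -> float:
    """Predict y at x as the mean target of the k nearest training points."""
    if k < 1 or k > len(xs):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training pairs by distance to the query and keep the k closest.
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k
```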
75
Locally Weighted Regression (LWR)
Instead of a discrete k number of neighbors, the model weights the neighboring points. Weighting is done using a kernel function that determines the influence of a point on the prediction.
76
Support Vector Machine (SVM):
Finds the optimal decision boundary between classes: a hyperplane that maximizes the margin, i.e., the distance to the nearest samples of each class.
77
Logistic Regression/LDA/SVM
Logistic Regression: Estimates probabilities and draws a decision boundary. LDA: Uses a statistical approach, assuming classes are normally distributed. SVM: Seeks the optimal margin.
78
Linear kernel (SVM)
Separates classes with a straight line (a hyperplane in higher dimensions)
79
RBF (Radial Basis Function) kernel (SVM)
Separates classes with closed, ring-like boundaries around groups of points
80
Polynomial kernel (SVM)
Separates classes with boundaries that can curve
81
Decision Tree
A decision tree is a classification model that makes decisions based on the elements of the input vector. It uses a tree structure where each internal node is a test, and each leaf is a decision. Decisions are made by sequentially testing attributes. Starting from the root, the path is followed down the appropriate branch until a leaf node is reached. The leaf node contains the classification of the given example.
82
Ensemble Learning
The goal is to reduce bias and variance, achieving better generalization performance.
Main methods:
* Bagging (Bootstrap Aggregation): Building multiple independent models on different samples of the dataset. Example: Random Forest.
* Boosting: Models that build upon each other, trying to correct previous mistakes. Examples: AdaBoost, Gradient Boosting.
* Stacking: Using the outputs of various models as inputs for a meta-model.
* Voting: Decision-making based on the votes of multiple models. Example: Combining the results of different algorithms.
83
Random Forest
A model based on the ensemble learning of decision trees. Multiple independent decision trees are built, and their decisions are combined through voting.
Operation:
* Bagging: Each decision tree learns from a different sample of the dataset.
* Random attribute selection: Only a random subset of attributes is considered for each split.
* Majority voting: The final decision is made based on the votes of individual trees.
84
Boosting: Learning from Difficult Examples
Combines multiple weak learners to create a stronger model. During training, hard-to-classify examples are given higher weights, focusing the model more on them:
* Initially, all examples have equal weights.
* The first model learns a decision rule.
* Misclassified examples have their weights increased, and the next model focuses more on them.
* This process repeats until the pre-set number of models is reached.
* The final model is a weighted combination of all individual models.
Popular algorithms: AdaBoost, Gradient Boosting, XGBoost.
Advantages:
* Bias reduction: The ensemble model can capture patterns that a single model might miss.
* Better generalization: Using multiple weak models results in more stable performance.
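The reweighting step can be sketched with the update rule used by AdaBoost for binary classification. The card does not give a formula, so treating it as the AdaBoost rule is an assumption, as is the function name `adaboost_reweight`.

```python
import math
from typing import List


def adaboost_reweight(weights: List[float], correct: List[bool]) -> List[float]:
    """One AdaBoost-style round: raise weights of misclassified examples, renormalize."""
    # Weighted error of the current weak model.
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)  # this model's weight in the final vote
    # Shrink weights of correctly classified examples, grow the misclassified ones.
    scaled = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled]
```

With four equally weighted examples and one mistake, the misclassified example ends up carrying half of the total weight, so the next model cannot ignore it.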
85
Stacking: Layered Ensemble Learning
Combination of multiple different models trained on the same dataset. Instead of relying on a single model, a meta-model learns how to combine the outputs of the base models:
* The different models (e.g., SVM, logistic regression, decision tree) are trained on the original data.
* A second-level meta-model (e.g., another logistic regression) learns how to combine the predictions of the base models.
Advantages:
* Bias reduction: Using multiple models together helps generalize.
* Better performance: Often better than any single base model.
86
Supervised vs. Unsupervised Learning
What is the difference?
* Supervised learning: The model learns from labeled data $(x_i, y_i)$. Example: Image classification (e.g., cat or dog?).
* Unsupervised learning: The model searches for structures and hidden patterns in unlabeled data $x_i \in X$. Example: Automatic image grouping based on similarity without predefined labels.
87
Clustering
Definition: Discover hidden categories within the data by assigning each data point a hidden label $z_n$.
Examples:
* Grouping documents by topic (politics, sports, economics).
* Automatically grouping images without knowing their content beforehand.
Important: Clustering is different from classification: classification involves labeled data, while clustering groups data in an unsupervised manner.
88
K-Means
Task: Given a dataset $D = \{x_n\}_{n=1}^{N}$, where $x_n \in \mathbb{R}^d$, the goal is to partition the data into K clusters. The clusters are not predefined; they are discovered by the algorithm.
Key idea:
* Group elements based on their proximity.
* Assign a prototype point $\mu_k$ to each cluster, which represents the cluster center.
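A minimal sketch of the usual alternating procedure (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. The card does not name a specific procedure, so this choice, the function names, and the caller-supplied initial centers are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: List[Point], initial_centers: List[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Return (cluster centers, cluster index per point)."""
    centers = list(initial_centers)
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment
```

The result depends on the initial centers, so practical implementations usually run several random initializations and keep the best one.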
89
Hierarchical Clustering
Main idea: Construct a nested hierarchy of clusters. The result is a dendrogram showing the order of cluster merging.
Two main approaches:
* The agglomerative approach builds from the bottom up: each sample starts as its own cluster, and clusters are then merged.
* The divisive approach works in the opposite direction, starting from the top and gradually splitting into smaller clusters.
90
Gaussian Mixture Model
K-Means vs. GMM:
* K-Means performs hard assignments: each point belongs to a single cluster.
* GMM applies probabilistic clustering: each point has a probability of belonging to each cluster.
* K-Means assumes spherical clusters, while GMM can model elliptical clusters.
GMM:
* Assumes data comes from multiple Gaussian distributions.
* Each cluster is modeled as a Gaussian distribution.
* Uses the Expectation-Maximization (EM) algorithm for parameter optimization.
91
DBSCAN – Density-Based Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise). K-Means and GMM assume specific cluster shapes. DBSCAN does not require the number of clusters to be predefined. Handles clusters of different shapes and noisy data.
Key concepts:
* Core sample: A point with at least minPoints neighbors within radius ϵ.
* Border sample: A neighbor of a core point, but not a core point itself.
* Outlier: A point that does not belong to any cluster.
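The three key concepts can be sketched as a labeling pass over 2-D points. This is not the full DBSCAN algorithm (it does not grow clusters), and it assumes that a point counts as its own neighbor; the function name `label_points` is hypothetical.

```python
import math
from typing import List, Tuple


def label_points(
    points: List[Tuple[float, float]], eps: float, min_points: int
) -> List[str]:
    """Label each point as 'core', 'border' or 'outlier'."""
    n = len(points)
    # Indices within radius eps of each point (assumption: includes the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_points for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")   # near a core point, but not dense enough itself
        else:
            labels.append("outlier")  # not reachable from any core point
    return labels
```

A full implementation would additionally chain together core points that lie within `eps` of each other to form the clusters.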
92
Dimensionality Reduction
Goal: Reduce the number of features while preserving the most important information. A commonly used technique: Principal Component Analysis (PCA).
Why is it useful?
* Helps visualize data in lower dimensions.
* Reduces noise and removes redundant data.
* Improves computational efficiency on large datasets.
Example: In an image database, retaining key colors and shapes while removing noisy details.
93
Principal Component Analysis (PCA)
Goal: Reduce the dimensionality of data while preserving the maximum variance. PCA finds new axes (principal components) along which the variance of the data is maximized. The method projects the data onto these axes. Suitable for: visualization (2D/3D), noise reduction, preprocessing before supervised learning.
94
Applications of PCA
1. Dimensionality reduction and visualization: If k = 2 or k = 3, the data can be visually represented.
2. Preprocessing for machine learning models: Dimensionality reduction decreases computational costs and helps prevent overfitting, since lower-dimensional inputs lead to simpler hypothesis classes.
3. Noise reduction and pattern recognition: In face recognition, PCA projects 100×100 pixel faces into a lower-dimensional space (eigenfaces method). This removes noise caused by small lighting variations and imaging differences, and distance-based similarity measurement in the reduced-dimensional space enables successful face comparison.
95
What is anomaly detection?
Algorithms that identify data points that significantly deviate from "normal" patterns. Typical applications: fraud detection, fault detection, network security.
Two popular solutions:
* One-Class SVM: defines a boundary around the data and considers points outside it as anomalies.
* Isolation Forest: isolates unusual points using random decision trees; efficient for large datasets.
96