MI Flashcards
(59 cards)
What is Supervised Learning?
Supervised learning is a machine learning approach that trains on labeled data, where each example has a known correct output or label.
Describe the structure of a Decision Tree.
A Decision Tree has: internal nodes for feature tests, branches for outcomes, and leaves for final classifications.
How is a Decision Tree constructed?
1. Start with all data at the root. 2. Select a feature that best splits the data (e.g., using Gini or Information Gain). 3. Partition data accordingly. 4. Recursively repeat until a stopping condition (max depth, min samples, etc.) is reached.
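The split-selection step (step 2) can be sketched with a Gini-impurity calculation. This is a minimal illustration, not a full tree builder; the budget values, labels, and the `best_threshold` helper are invented for this example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try each midpoint between sorted values; return (score, threshold)
    for the split with the lowest weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, t)
    return best

# Budget (in dollars) vs. "buys laptop" labels (invented data)
budgets = [500, 800, 1100, 1500, 2000]
buys = ["no", "no", "yes", "yes", "yes"]
score, threshold = best_threshold(budgets, buys)
print(threshold)  # 950.0 -- splits the two classes cleanly
```

A real implementation would repeat this search over every feature and recurse on each partition.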
Give an example of a Decision Tree classification.
Example: Predicting laptop purchases. If the person is a student, check budget. If budget > $1000, predict purchase.
What is the k-Nearest Neighbor (k-NN) algorithm?
k-NN is an instance-based learning method that classifies (or regresses) a new point based on its closest k neighbors.
Describe the k-NN algorithm.
1. Choose k (number of neighbors). 2. Compute distance (e.g., Euclidean) from the new point to all training points. 3. Select the k closest. 4. Combine results via majority vote (classification) or average (regression).
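The four steps above can be sketched in a few lines of pure Python; the training points and labels are invented for illustration.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """train: list of (point, label) pairs. Classify new_point by
    majority vote among its k nearest neighbors (Euclidean distance)."""
    # Steps 2-3: compute all distances, then take the k closest
    dists = sorted((math.dist(p, new_point), label) for p, label in train)
    top_k = [label for _, label in dists[:k]]
    # Step 4: majority vote
    return Counter(top_k).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"),
         ((6, 5), "B"), ((2, 1), "A")]
print(knn_classify(train, (1.5, 1.5), k=3))  # "A": all 3 nearest neighbors are A
```

For regression, step 4 would average the neighbors' values instead of voting.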
What are key evaluation metrics in supervised learning?
Common metrics include: Accuracy (correct/total), Precision & Recall (especially for imbalanced data), and ROC AUC (trade-off between TPR and FPR).
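A toy computation of accuracy, precision, and recall; the label vectors are invented, and `metrics` is a hypothetical helper rather than a library function.

```python
def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true positives, how many were found
    return accuracy, precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(metrics(y_true, y_pred))  # (0.75, 0.666..., 0.666...)
```

Note how precision and recall expose errors that a single accuracy number can hide on imbalanced data.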
What is a Neural Network?
A Neural Network is composed of: an input layer (features), one or more hidden layers (transformation of inputs), and an output layer (final predictions).
How does backpropagation train a Neural Network?
1. Forward pass to get predictions. 2. Compute loss by comparing to true label. 3. Use gradients (via backpropagation) to update weights. 4. Repeat until convergence.
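The loop above can be sketched for a single sigmoid neuron with squared loss; the tiny dataset, learning rate, and iteration count are invented for illustration, and a real network repeats the same chain-rule step layer by layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented dataset: learn y = 1 when x > 0, else 0
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b, lr = 0.0, 0.0, 0.5

for _ in range(2000):
    for x, y in data:
        pred = sigmoid(w * x + b)            # 1. forward pass
        # 2. loss = (pred - y)^2; d(loss)/d(pred) = 2*(pred - y)
        # 3. chain rule back through the sigmoid to w and b
        grad = 2 * (pred - y) * pred * (1 - pred)
        w -= lr * grad * x
        b -= lr * grad

# 4. after enough repetitions, predictions approach the true labels
print(round(sigmoid(w * 2.0 + b)), round(sigmoid(w * -2.0 + b)))
```

This is stochastic gradient descent with one weight; backpropagation is the same gradient computation applied through every layer of a deeper network.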
Where are neural networks commonly used?
Neural networks are used in image recognition, speech processing, and self-driving cars.
What is Unsupervised Learning?
Unsupervised learning discovers patterns from unlabeled data, unlike supervised learning which relies on labeled examples.
What is Clustering?
Clustering groups data points so that points in the same group are more similar to each other (e.g., closer in distance) than to points in other groups.
How does k-Means clustering work?
1. Pick the number of clusters k. 2. Randomly choose k initial centers. 3. Assign each point to the nearest center. 4. Update centers based on assignments. 5. Repeat until centers stabilize.
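The loop above can be sketched in 1-D; the points and the starting centers are chosen by hand for illustration (step 2 is normally random).

```python
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Step 3: assign each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Step 4: move each center to the mean of its assigned points
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(kmeans_1d(points, centers=[0.0, 5.0]))  # [1.5, 10.5]
```

Here the centers stabilize after one pass; in practice the loop runs until assignments stop changing (step 5).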
How does Expectation-Maximization clustering work?
It alternates: E-Step (assign probabilistic memberships to clusters) and M-Step (update cluster parameters based on those memberships).
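The E/M alternation can be sketched for a two-component 1-D Gaussian mixture; the data points and starting means are invented, and the variances and mixture weights are held fixed (at 1 and equal) to keep the sketch short.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_means(points, mu, iters=20):
    """Estimate two component means by alternating E and M steps."""
    for _ in range(iters):
        # E-step: probabilistic membership of each point in each component
        resp = []
        for x in points:
            p0, p1 = normal_pdf(x, mu[0]), normal_pdf(x, mu[1])
            resp.append((p0 / (p0 + p1), p1 / (p0 + p1)))
        # M-step: update each mean as the responsibility-weighted average
        for k in (0, 1):
            total = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, points)) / total
    return mu

points = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0]
print([round(m, 2) for m in em_means(points, mu=[0.0, 4.0])])  # [0.5, 5.5]
```

Unlike k-means' hard assignments, each point contributes to both means in proportion to its membership probabilities.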
Where is clustering used?
Clustering is used in customer segmentation, text classification, and medical diagnosis.
What is a Constraint Satisfaction Problem (CSP)?
A CSP involves finding values for a set of variables within their domains so that all constraints are satisfied.
What are the key components of a CSP?
1. Variables (items to assign). 2. Domains (possible values). 3. Constraints (rules that restrict valid assignments).
How can CSPs be solved?
1. Backtracking Search (systematically try assignments). 2. Constraint Propagation (reduce domains before/during search). 3. Local Search (iteratively refine a complete assignment).
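Backtracking search (method 1) can be sketched on a toy map-coloring CSP; the three regions, their adjacencies, and the color domains are invented for illustration.

```python
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
# Constraint: adjacent regions must take different colors
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}

def consistent(var, value, assignment):
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment                        # all variables assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):   # try an assignment
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result                    # success: pass it up
    return None                                  # dead end: backtrack

print(backtrack({}))  # e.g. {'WA': 'red', 'NT': 'green', 'SA': 'blue'}
```

Real solvers add variable/value ordering heuristics and interleave constraint propagation with this search.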
What is Generalized Arc Consistency (GAC) in CSPs?
GAC ensures that, for every constraint, each value in a variable’s domain participates in at least one assignment to the constraint’s other variables (drawn from their domains) that satisfies the constraint.
How does the Generalized Arc Consistency (GAC) algorithm work?
1. Initialize a queue with all constraints. 2. Make a variable’s domain consistent for each constraint. 3. If values are removed, re-check related constraints. 4. Repeat until no more changes.
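The loop above can be sketched for the binary-constraint case (the classic AC-3 algorithm); the three variables, their domains, and the ordering constraints X < Y < Z are invented for illustration.

```python
from collections import deque

domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}, "Z": {1, 2, 3}}
# Directed arcs with their constraint checks: X < Y and Y < Z
constraints = {("X", "Y"): lambda a, b: a < b,
               ("Y", "X"): lambda a, b: b < a,
               ("Y", "Z"): lambda a, b: a < b,
               ("Z", "Y"): lambda a, b: b < a}

def revise(xi, xj):
    """Step 2: drop values of xi with no supporting value in xj's domain."""
    check = constraints[(xi, xj)]
    removed = {v for v in domains[xi]
               if not any(check(v, w) for w in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

queue = deque(constraints)                 # step 1: enqueue every arc
while queue:                               # step 4: until no more changes
    xi, xj = queue.popleft()
    if revise(xi, xj):
        for xk, xl in constraints:         # step 3: re-check arcs into xi
            if xl == xi:
                queue.append((xk, xl))

print(domains)  # {'X': {1}, 'Y': {2}, 'Z': {3}}
```

Propagation alone solves this instance: each domain shrinks to a single value with no search at all.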
Why is Generalized Arc Consistency (GAC) useful?
It prunes inconsistent values early, reducing the search space and preventing needless backtracking.
What is a Bayesian Network?
A Bayesian Network is a directed acyclic graph whose nodes represent variables and edges indicate probabilistic dependencies.
How does inference work in a Bayesian Network?
1. Exact Inference (e.g., variable elimination). 2. Approximate Inference (e.g., Gibbs sampling, MCMC).
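Exact inference can be sketched by enumeration on a two-node network Rain → WetGrass; the probabilities below are invented for illustration, and variable elimination generalizes this summing-out idea to larger networks.

```python
# Prior P(Rain) and conditional P(WetGrass=true | Rain) -- invented numbers
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: 0.9, False: 0.1}

# Query: P(Rain=true | WetGrass=true), by enumerating the joint
joint = {r: p_rain[r] * p_wet_given_rain[r] for r in (True, False)}
posterior = joint[True] / (joint[True] + joint[False])
print(round(posterior, 3))  # 0.692: observing wet grass raises P(rain) from 0.2
```

This is just Bayes' rule written as a sum over the joint distribution; the graph structure tells us which conditional tables to multiply.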
What are some applications of Bayesian Networks?
They are used for medical diagnosis, spam filtering, robotics, and decision support systems.