dsaPOML Flashcards
(32 cards)
What is classification in supervised learning?
Learning a function that maps an item into one of a set of predefined classes.
Give an example of classification.
Categorizing email messages as ‘spam’ or ‘not spam’.
What is regression in supervised learning?
Learning a function that maps an item to a real value (a continuous output rather than a discrete class).
What is the goal of a supervised learning task?
Assign previously unseen records to a class as accurately as possible.
What is a test set used for?
To estimate the accuracy of the model on data it was not trained on.
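Worked example: accuracy on a test set can be sketched as the fraction of test records whose predicted class matches the true class. The labels and predictions below are made-up illustration data, and `accuracy` is a hypothetical helper name, not part of any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical test-set labels and the model's predictions for them.
test_labels = ["spam", "not spam", "spam", "not spam", "spam"]
predictions = ["spam", "not spam", "not spam", "not spam", "spam"]

test_accuracy = accuracy(test_labels, predictions)  # 4 of 5 correct -> 0.8
```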
Define classification.
The process of predicting which of a set of predefined classes a new item belongs to.
What is a classifier?
An algorithm that maps the input data to a specific category.
What is a feature?
An individual measurable property of a phenomenon being observed.
What is binary classification?
Classification task with two possible outcomes.
What is multi-class classification?
Classification with more than two classes where each sample is assigned to one target label.
What is multi-label classification?
Classification task where each sample is mapped to a set of target labels.
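Worked example: one way to see the difference between multi-class and multi-label targets is in how the labels are encoded. This is a minimal sketch; the class names and the helpers `one_hot` and `multi_hot` are hypothetical and chosen only for illustration.

```python
# Hypothetical label vocabulary for news articles.
CLASSES = ["sports", "politics", "tech"]

def one_hot(label):
    """Multi-class target: exactly one label per sample."""
    return [1 if c == label else 0 for c in CLASSES]

def multi_hot(labels):
    """Multi-label target: a set of labels per sample."""
    return [1 if c in labels else 0 for c in CLASSES]

multi_class_target = one_hot("tech")                # [0, 0, 1]
multi_label_target = multi_hot({"sports", "tech"})  # [1, 0, 1]
```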
What is an example application of classification in direct marketing?
Reducing mailing costs by targeting consumers likely to buy a new product.
What is the goal of fraud detection in classification?
Predict fraudulent cases in credit card transactions.
Name one classification technique.
Decision tree
What is Naive Bayes?
A probabilistic classifier based on Bayes' theorem that assumes the features are conditionally independent given the class.
What does the Naive Bayes algorithm calculate?
The conditional probability that an event occurs given that another event has already occurred; here, the probability of a class given the observed features.
What is the formula for the posterior probability in Naive Bayes?
P(c|x) = P(x|c) * P(c) / P(x)
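Worked example: the posterior formula can be evaluated directly once P(x|c) and P(c) are known for every class, with P(x) obtained by summing P(x|c) * P(c) over the classes. The probabilities below are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(likelihoods, priors):
    """Return P(c|x) for every class c.

    likelihoods: dict mapping class -> P(x|c)
    priors:      dict mapping class -> P(c)
    """
    # Evidence P(x): sum over all classes of P(x|c) * P(c).
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Assumed numbers: x is "the email contains the word 'offer'".
priors = {"spam": 0.4, "not spam": 0.6}
likelihoods = {"spam": 0.5, "not spam": 0.1}

result = posterior(likelihoods, priors)
# P(spam|x) = 0.5 * 0.4 / (0.5 * 0.4 + 0.1 * 0.6) = 0.20 / 0.26, about 0.769
```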
What are the types of Naive Bayes?
- Gaussian
- Multinomial
- Bernoulli
What is K-Nearest Neighbors (K-NN)?
A non-parametric, supervised learning classifier that uses proximity: a new point is assigned the class most common among its K nearest training points.
What is the first step in KNN prediction?
Calculate the distance between the new data point and every point in the training set.
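Worked example: a minimal sketch of a K-NN prediction, assuming Euclidean distance and a simple majority vote. The card only states the first step; the sorting and voting steps are the usual continuation and are added here as an assumption. The data points and the name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` from its k nearest training points."""
    # Step 1: distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Assumed remaining steps: keep the k closest, then take a majority vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (2, 2))  # "A"
near_b = knn_predict(points, labels, (6, 5))  # "B"
```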
What does a decision tree classifier do?
Splits the population into homogeneous sets based on significant attributes.
What is Gini impurity?
A measure of how mixed the class labels in a node or dataset are: 0 when every item belongs to the same class, and higher as the classes become more evenly mixed.
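Worked example: Gini impurity is commonly computed as 1 minus the sum of the squared class proportions in a node. The sketch below assumes that definition; `gini_impurity` is a hypothetical helper name and the label lists are made up.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_i ** 2), where p_i is the proportion of class i."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

pure_node = gini_impurity(["yes", "yes", "yes", "yes"])   # 0.0
mixed_node = gini_impurity(["yes", "yes", "no", "no"])    # 0.5
skewed_node = gini_impurity(["yes", "yes", "yes", "no"])  # 0.375
```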
What are common stopping conditions for decision tree growth?
- All data in a node belong to the same class
- A maximum depth is reached
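Worked example: the two conditions listed above can be sketched as a single check made before splitting a node. The function name `should_stop` and its parameters are hypothetical, not taken from any library.

```python
def should_stop(labels, depth, max_depth):
    """Return True if a node should become a leaf instead of being split."""
    # Condition 1: all data in the node belong to the same class.
    all_same_class = len(set(labels)) <= 1
    # Condition 2: the maximum depth has been reached.
    depth_limit_reached = depth >= max_depth
    return all_same_class or depth_limit_reached

stop_pure = should_stop(["spam", "spam"], depth=1, max_depth=5)       # True
stop_deep = should_stop(["spam", "not spam"], depth=5, max_depth=5)   # True
keep_going = should_stop(["spam", "not spam"], depth=2, max_depth=5)  # False
```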
What does low Gini impurity indicate?
A node is relatively pure, meaning most of its data points belong to a single class.