dsaPOML Flashcards by Daki Dima

What is classification in supervised learning?

Learning a function that maps an item into one of a set of predefined classes.

Give an example of classification.

Categorizing email messages as ‘spam’ or ‘not spam’.

What is regression in supervised learning?

Learning a function that maps an item to a real-valued (continuous) output rather than to a discrete class.

What is the goal of a supervised learning task?

Predict the target of previously unseen records as accurately as possible; in classification, that means assigning each record to the correct class.

What is a test set used for?

To estimate how accurately the model predicts on data it did not see during training.

Define classification.

The process of predicting which of a set of predefined classes a new item belongs to.

What is a classifier?

An algorithm that maps the input data to a specific category.

What is a feature?

An individual measurable property of a phenomenon being observed.

What is binary classification?

Classification task with two possible outcomes.

What is multi-class classification?

Classification with more than two classes where each sample is assigned to one target label.

What is multi-label classification?

Classification task where each sample is mapped to a set of target labels.

What is an example application of classification in direct marketing?

Reducing mailing costs by targeting consumers likely to buy a new product.

What is the goal of fraud detection in classification?

Predict fraudulent cases in credit card transactions.

Name one classification technique.

Decision tree

What is Naive Bayes?

A probabilistic classifier based on Bayes' theorem that assumes the features are conditionally independent given the class.

What does the Naive Bayes algorithm calculate?

The conditional probability of a class given the observed features, that is, the probability that one event occurs given that another has already occurred.

What is the formula for the posterior probability in Naive Bayes?

P(c|x) = P(x|c) * P(c) / P(x)
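
The formula can be checked with a small worked example. The sketch below is only illustrative: the function name `posterior` and the spam numbers are assumptions made up for the example, not part of the deck. It computes P(x) as the sum of P(x|c) * P(c) over all classes.

```python
def posterior(likelihoods, priors):
    """Return P(c|x) for every class c using Bayes' rule.

    likelihoods: dict mapping class -> P(x|c)
    priors:      dict mapping class -> P(c)
    """
    # Numerator of Bayes' rule for each class: P(x|c) * P(c)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(x) is the sum of the numerators over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: 40% of mail is spam, and the word "offer"
# appears in 50% of spam messages and 10% of non-spam messages.
priors = {"spam": 0.4, "not spam": 0.6}
likelihoods = {"spam": 0.5, "not spam": 0.1}
result = posterior(likelihoods, priors)
print(result)
```

With these made-up numbers the posterior for "spam" is 0.2 / 0.26, so a message containing the word is more likely spam than not even though spam is the less common class.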

What are the types of Naïve Bayes?

Gaussian
Multinomial
Bernoulli

What is K-Nearest Neighbors (K-NN)?

A non-parametric, supervised learning classifier that uses proximity: a new point is classified by the labels of its K closest training points.

What is the first step in KNN prediction?

Calculate the distance between the new data point and every point in the training set.
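
A minimal sketch of how that first step fits into a full prediction, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    # Step 1: distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote among the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["a", "a", "b", "b"]
prediction = knn_predict(train_points, train_labels, (1.2, 1.4), k=3)
print(prediction)
```

Other distance measures (Manhattan, cosine) could be swapped in for `math.dist`; Euclidean is used here only because it is the simplest choice.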

What does a decision tree classifier do?

Splits the population into homogeneous sets based on significant attributes.

What is Gini impurity?

A measure of how mixed the class labels in a node or dataset are, computed as 1 minus the sum of the squared class proportions.
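
As a rough illustration, Gini impurity can be computed as 1 minus the sum of the squared class proportions. The helper name `gini_impurity` is an assumption for this sketch.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


pure = gini_impurity(["yes", "yes", "yes", "yes"])   # one class only
mixed = gini_impurity(["yes", "yes", "no", "no"])    # even two-class split
print(pure, mixed)
```

A node with a single class scores 0, and an even two-class split scores 0.5, which is the maximum for two classes.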

What are the stopping conditions for decision tree growth?

All data in a node belong to the same class
A maximum depth is reached

What does low Gini impurity indicate?

A node is relatively pure, meaning most of its data points belong to a single class.

What does entropy measure in decision trees?

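The average amount of information needed to identify the class of an item in a node. Like Gini impurity it measures disorder, but it is computed with logarithms of the class proportions.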
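
A short sketch of the entropy calculation, assuming log base 2 so that the result is measured in bits. The function name `entropy` is chosen for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    proportions = [count / n for count in Counter(labels).values()]
    # Each class contributes -p * log2(p); absent classes contribute nothing
    return sum(-p * math.log2(p) for p in proportions)


pure = entropy(["a", "a", "a", "a"])    # single class
mixed = entropy(["a", "a", "b", "b"])   # even two-class split
print(pure, mixed)
```

As with Gini impurity, a pure node scores 0; an even two-class split scores 1 bit.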

What are the steps in constructing a decision tree?

* Start with all training data at the root
* Choose the best feature to split the data
* Split the dataset into subsets
* Repeat recursively on each subset
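
The steps above can be sketched as a small recursive builder. This is a simplified illustration under stated assumptions, not a reference implementation: it assumes numeric features, binary splits of the form `value <= threshold`, Gini impurity as the split criterion, and a `max_depth` limit. All function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or the depth limit is reached
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no split reduces impurity
        return majority
    f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    # Recurse on each subset
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    # Internal nodes are dicts; leaves are plain class labels
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data with two numeric features
rows = [(2.0, 1.0), (3.0, 1.5), (7.0, 3.0), (8.0, 3.5)]
labels = ["low", "low", "high", "high"]
tree = build_tree(rows, labels)
print(tree)
```

On this toy data a single split on the first feature separates the two classes, so both children are pure leaves and the recursion stops immediately.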

What is a lazy learner algorithm?

An algorithm that does not build a model from the training set up front; it stores the dataset and defers the computation until a prediction is requested.

What is the elbow method in KNN?

A method where the model's error rate is plotted against different values of K; the best K is the point (the "elbow") after which the error stops decreasing noticeably.
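
A rough sketch of the idea, with the plot replaced by a printed table of error rates. The data, the candidate K values, and the helper names are all hypothetical; the training set deliberately contains one point labeled "b" inside the "a" cluster so that K = 1 is misled by it.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train is a list of (point, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, validation, k):
    # Fraction of validation points whose predicted label is wrong
    wrong = sum(knn_predict(train, point, k) != label
                for point, label in validation)
    return wrong / len(validation)


# Hypothetical toy data: two clusters plus one noisy training point at (2, 2)
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "b"),
         ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b")]
validation = [((1.8, 1.9), "a"), ((6.5, 6.5), "b"), ((1.2, 1.1), "a")]

errors = {k: error_rate(train, validation, k) for k in (1, 3, 5)}
for k, err in errors.items():
    print(k, round(err, 3))
```

In practice the `errors` values would be plotted against K, and the error would be measured on a held-out validation set or with cross-validation rather than on three hand-picked points.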

What is the main challenge in choosing K in KNN?

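K controls the bias-variance trade-off: a small K gives low bias but high variance (predictions are sensitive to noise), while a large K gives smoother predictions with higher bias.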

What is a regression tree?

A tree where the target variable can take continuous values (real numbers).

What is the significance of using odd numbers for K in binary classification?

To avoid a tie in majority voting.
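
A tiny illustration of why an even K can tie with two classes while an odd K cannot. The helper `vote_is_tied` and the neighbor labels are made up for this example.

```python
from collections import Counter


def vote_is_tied(neighbor_labels):
    """True if the two most common labels received the same number of votes."""
    counts = Counter(neighbor_labels).most_common(2)
    return len(counts) == 2 and counts[0][1] == counts[1][1]


tied = vote_is_tied(["spam", "not spam", "spam", "not spam"])   # K = 4
not_tied = vote_is_tied(["spam", "not spam", "spam"])           # K = 3
print(tied, not_tied)
```

With more than two classes an odd K no longer guarantees a clear winner, so the rule applies to the binary case only.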

What is the process of KNN regression?

The prediction is the average of the target values of the K nearest neighbors.
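
A minimal sketch of that averaging step, assuming Euclidean distance and an unweighted mean. The function name `knn_regress` and the data are hypothetical.

```python
import math


def knn_regress(train_points, train_values, query, k=3):
    # Rank training points by distance to the query
    ranked = sorted(zip(train_points, train_values),
                    key=lambda pair: math.dist(pair[0], query))
    nearest_values = [value for _, value in ranked[:k]]
    # Prediction is the plain average of the neighbors' target values
    return sum(nearest_values) / len(nearest_values)


# Hypothetical one-feature data; points are 1-tuples so math.dist works
train_points = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_values = [10.0, 20.0, 30.0, 100.0]
prediction = knn_regress(train_points, train_values, (2.1,), k=3)
print(prediction)
```

Here the three nearest neighbors have values 20, 30, and 10, so the distant point with value 100 does not influence the prediction.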
