dsaPOML Flashcards

(32 cards)

1
Q

What is classification in supervised learning?

A

Learning a function that maps an item into one of a set of predefined classes.

2
Q

Give an example of classification.

A

Categorizing email messages as ‘spam’ or ‘not spam’.

3
Q

What is regression in supervised learning?

A

Learning a function that maps an item to a real-valued (continuous) output rather than to a discrete class.

4
Q

What is the goal of a supervised learning task?

A

Predict the target of previously unseen records as accurately as possible; in classification, this means assigning each record to the correct class.

5
Q

What is a test set used for?

A

To estimate the accuracy of the model on data it did not see during training.
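
Example (an illustrative sketch, not part of the original card): accuracy on a test set is the fraction of test records whose predicted label matches the true label. The function name `accuracy` and the labels below are made up for illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical held-out test labels and model predictions.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]

test_accuracy = accuracy(y_true, y_pred)
print(test_accuracy)  # 0.8 (4 of 5 predictions are correct)
```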

6
Q

Define classification.

A

The process of predicting which of a set of predefined classes a new item belongs to.

7
Q

What is a classifier?

A

An algorithm that maps the input data to a specific category.

8
Q

What is a feature?

A

An individual measurable property of a phenomenon being observed.

9
Q

What is binary classification?

A

A classification task with exactly two possible classes.

10
Q

What is multi-class classification?

A

Classification with more than two classes where each sample is assigned to one target label.

11
Q

What is multi-label classification?

A

Classification task where each sample is mapped to a set of target labels.

12
Q

What is an example application of classification in direct marketing?

A

Reducing mailing costs by targeting consumers likely to buy a new product.

13
Q

What is the goal of fraud detection in classification?

A

Predict fraudulent cases in credit card transactions.

14
Q

Name one classification technique.

A

Decision tree

15
Q

What is Naive Bayes?

A

A probabilistic classifier based on Bayes' theorem that assumes the features are conditionally independent given the class.

16
Q

What does the Naive Bayes algorithm calculate?

A

The probability that an event will occur given that another event has already occurred.

17
Q

What is the formula for the posterior probability in Naive Bayes?

A

P(c|x) = P(x|c) * P(c) / P(x)
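
Worked example (a minimal sketch; the probabilities are hypothetical numbers chosen for illustration, and `posterior` is a made-up helper name): the formula can be evaluated directly once the evidence P(x) is obtained from the law of total probability.

```python
def posterior(likelihood, prior, evidence):
    """Return P(c|x) = P(x|c) * P(c) / P(x)."""
    if evidence <= 0:
        raise ValueError("evidence P(x) must be positive")
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of emails are spam; the word "offer" appears
# in 50% of spam emails and in 5% of non-spam emails.
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# Evidence P(x) via the law of total probability over both classes.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_word_given_spam, p_spam, p_word)
print(round(p_spam_given_word, 3))  # 0.714
```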

18
Q

What are the common types of Naive Bayes classifier?

A
  • Gaussian
  • Multinomial
  • Bernoulli
19
Q

What is K-Nearest Neighbors (K-NN)?

A

A non-parametric supervised learning algorithm that classifies a data point according to the classes of its nearest neighbors in the training set.

20
Q

What is the first step in KNN prediction?

A

Calculate the distance between the new data point and every point in the training set.
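
Example (a minimal sketch, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the data are made up for illustration): the distance step is followed by picking the K closest points and voting on their labels.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    # Step 1: distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))  # a
```

Other distance measures (for example Manhattan distance) could be swapped in at step 1 without changing the rest of the procedure.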

21
Q

What does a decision tree classifier do?

A

Splits the population into homogeneous sets based on significant attributes.

22
Q

What is Gini impurity?

A

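A measure of how mixed the class labels in a node are: the probability that a randomly chosen item would be mislabeled if it were labeled at random according to the node's class distribution.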
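
Example (a minimal sketch; `gini_impurity` is an illustrative helper name): Gini impurity is commonly computed as 1 minus the sum of the squared class proportions, so a pure node scores 0 and an evenly split two-class node scores 0.5.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2) over the class proportions p_i."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity(["spam"] * 4))         # 0.0 (pure node)
print(gini_impurity(["spam", "ham"] * 2))  # 0.5 (evenly mixed, two classes)
```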

23
Q

What are common stopping conditions for decision tree growth?

A
  • All data in a node belong to the same class
  • A maximum depth is reached
24
Q

What does low Gini impurity indicate?

A

A node is relatively pure, meaning most of its data points belong to a single class.

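25
Q

What does entropy measure in decision trees?

A

The impurity (disorder) of the class labels in a node, expressed as the average information needed to classify an item. It plays the same role as Gini impurity but uses a logarithmic formula.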
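
Example (a minimal sketch using base-2 logarithms, so the result is in bits; `entropy` is an illustrative helper name): entropy sums p * log2(1/p) over the class proportions p in a node.

```python
import math
from collections import Counter


def entropy(labels):
    """Return the Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(labels).values()
    )


print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
print(entropy(["yes", "no", "yes", "no"]))    # 1.0 (evenly mixed, two classes)
```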

26
Q

What are the steps in constructing a decision tree?

A
  • Start with all training data at the root
  • Choose the best feature to split the data
  • Split the dataset into subsets
  • Repeat recursively on each subset
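
Example (a rough sketch of the steps above, assuming numeric features, threshold splits, and Gini impurity as the split criterion; all function names and the data are made up for illustration, and real implementations differ in many details):

```python
from collections import Counter


def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:  # only accept splits that reduce impurity
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    # Stop when the node is pure or the maximum depth is reached.
    split = None
    if len(set(labels)) > 1 and depth < max_depth:
        split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        # Recurse on each subset produced by the split.
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    # Walk down the tree until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical one-feature dataset that a single split separates perfectly.
rows = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
print(predict(tree, (2.5,)), predict(tree, (10.5,)))  # a b
```

Because this sketch only accepts splits that strictly reduce impurity, it stops early on data where no single split helps; that is a simplification, not a general rule.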

27
Q

What is a lazy learner algorithm?

A

An algorithm that does not build a model from the training set up front; it stores the dataset and defers computation until a prediction is requested.

28
Q

What is the elbow method in KNN?

A

A method in which the model's error rate is plotted against different values of K; the best K is typically taken where the error stops decreasing noticeably (the "elbow").
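
Example (a minimal sketch, assuming Euclidean distance, majority voting, and a small held-out validation set; the data and function names are made up, and the error rates are printed rather than plotted to keep the example dependency-free):

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    # train is a list of (point, label) pairs.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, validation, k):
    wrong = sum(1 for point, label in validation
                if knn_predict(train, point, k) != label)
    return wrong / len(validation)


# Hypothetical data: two clusters plus one deliberately mislabeled point.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
    ((1.1, 1.0), "b"),  # noisy point inside the "a" cluster
]
validation = [
    ((1.1, 1.05), "a"), ((0.9, 0.9), "a"),
    ((5.1, 5.0), "b"), ((4.9, 4.9), "b"),
]

# Error rate for each candidate K; in practice these values would be plotted.
errors = {k: error_rate(train, validation, k) for k in (1, 3, 5)}
for k, err in errors.items():
    print(f"K={k}: error rate {err}")
```

In this toy data, K=1 is misled by the noisy point, while larger K values outvote it.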

29
Q

What is the main challenge in choosing K in KNN?

A

K controls the bias-variance trade-off: a small K makes the model sensitive to noise (high variance), while a large K oversmooths the decision boundary (high bias).

30
Q

What is a regression tree?

A

A tree where the target variable can take continuous values (real numbers).

31
Q

What is the significance of using odd numbers for K in binary classification?

A

To avoid a tie in majority voting.

32
Q

What is the process of KNN regression?

A

The prediction is the average of the target values of the K nearest neighbors.
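
Example (a minimal sketch, assuming Euclidean distance and an unweighted mean; `knn_regress` and the data are made up for illustration):

```python
import math


def knn_regress(train_points, train_values, query, k=3):
    # Sort training pairs by distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_points, train_values),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    # Predict the mean of the neighbors' target values.
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical one-feature data; the far-away point is ignored when k=3.
points = [(1.0,), (2.0,), (3.0,), (10.0,)]
values = [10.0, 20.0, 30.0, 100.0]

print(knn_regress(points, values, (2.1,), k=3))  # 20.0
```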