ML Part 3 Flashcards
(19 cards)
What is a random forest?
An ensemble of decision trees trained on random subsets of data and features.
What is bagging?
Bootstrap aggregating: training models on random samples and averaging their predictions.
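The bagging card can be sketched in a few lines of NumPy. This is an illustrative toy, not a real library API: the weak learner is a hypothetical median-split "stump", and bagging trains many of them on bootstrap samples and averages their predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    # Toy weak learner: split at the median of x, predict the mean of y
    # on each side of the split.
    split = np.median(x)
    return split, y[x <= split].mean(), y[x > split].mean()

def predict_stump(stump, x):
    split, left, right = stump
    return np.where(x <= split, left, right)

def bagging_predict(x_train, y_train, x_test, n_models=50):
    preds = []
    for _ in range(n_models):
        # Bootstrap: sample n rows with replacement.
        idx = rng.integers(0, len(x_train), len(x_train))
        stump = fit_stump(x_train[idx], y_train[idx])
        preds.append(predict_stump(stump, x_test))
    # Aggregate: average the models' predictions.
    return np.mean(preds, axis=0)

x = rng.uniform(0, 10, 200)
y = (x > 5).astype(float) + rng.normal(0, 0.1, 200)
pred = bagging_predict(x, y, np.array([1.0, 9.0]))
```

A random forest follows the same recipe with full decision trees, plus a random feature subset at each split.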
Why do random forests reduce overfitting?
By averaging predictions from many trees trained on different data subsets.
What is feature importance in random forests?
A score for each feature reflecting how much it contributes to predictions, typically measured by the impurity reduction it produces across all splits in all trees.
What is the out-of-bag (OOB) score?
A validation score computed by evaluating each tree on the training samples left out of its bootstrap sample, estimating generalization without a separate validation set.
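The OOB idea rests on a property of bootstrap sampling that is easy to verify: each bootstrap sample leaves out roughly 1/e ≈ 36.8% of the training rows, and those held-out rows can score the corresponding tree for free. A minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
idx = rng.integers(0, n, n)          # one bootstrap sample of size n
in_bag = np.zeros(n, dtype=bool)
in_bag[idx] = True                   # rows that were drawn at least once
oob_fraction = 1.0 - in_bag.mean()   # fraction left out, ~= 1/e ~= 0.368
```

Averaging each row's predictions over only the trees for which it was out-of-bag gives the OOB score.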
What is boosting?
An ensemble method that combines weak learners sequentially to improve performance.
What is gradient boosting?
A boosting method that minimizes loss by adding trees that correct previous errors.
What is the learning rate in boosting?
A hyperparameter that controls the contribution of each tree to the ensemble.
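The boosting, gradient boosting, and learning rate cards can be tied together with a from-scratch sketch for squared loss (illustrative only; real libraries fit full trees, not this hypothetical median-split stump). Each round fits the stump to the current residuals, the negative gradient of squared loss, and adds it scaled by the learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_stump(x, residual):
    # Weak learner: split at the median of x, predict the mean residual
    # on each side.
    split = np.median(x)
    return split, residual[x <= split].mean(), residual[x > split].mean()

def predict_stump(stump, x):
    split, left, right = stump
    return np.where(x <= split, left, right)

def gradient_boost(x, y, n_rounds, learning_rate):
    pred = np.zeros_like(y)
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of squared loss
        stump = fit_stump(x, residual)   # fit a corrector to current errors
        pred += learning_rate * predict_stump(stump, x)  # shrunken update
    return pred

x = rng.uniform(0, 10, 300)
y = np.where(x > 5, 2.0, -1.0)           # step-function target

mse_many = np.mean((gradient_boost(x, y, n_rounds=100, learning_rate=0.1) - y) ** 2)
mse_few = np.mean((gradient_boost(x, y, n_rounds=5, learning_rate=0.1) - y) ** 2)
```

With a learning rate of 0.1, only a tenth of each correction is applied per round, so few rounds underfit (`mse_few`) while more rounds drive the error down (`mse_many`); this is the trade-off early stopping manages.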
What is early stopping in boosting?
Halting training when validation performance no longer improves.
Name two popular gradient boosting libraries.
XGBoost and LightGBM.
What is the Naive Bayes algorithm?
A probabilistic classifier based on Bayes’ theorem assuming feature independence.
What assumption makes Naive Bayes ‘naive’?
It assumes that all features are conditionally independent given the class.
Why is Naive Bayes effective for text classification?
Because it handles high-dimensional sparse data well.
What is Laplace smoothing?
A technique to handle zero probabilities by adding a small constant (typically 1) to every count, so unseen feature–class combinations don't get probability zero.
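The Naive Bayes cards above can be combined into one from-scratch sketch: a Bernoulli Naive Bayes on toy word-presence features, with add-one Laplace smoothing. The data and helper names are invented for illustration.

```python
import numpy as np

X = np.array([[1, 1, 0],    # toy documents: 1 = word present
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([0, 0, 1, 1])  # two classes, e.g. ham vs spam

def fit_nb(X, y, alpha=1.0):
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Laplace smoothing: add alpha to the counts so that a word never
    # seen with a class still gets a small nonzero probability.
    likelihoods = np.array([
        (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
        for c in classes
    ])
    return classes, np.log(priors), likelihoods

def predict_nb(model, X):
    classes, log_priors, p = model
    # "Naive" assumption: features are independent given the class,
    # so per-feature log-probabilities simply add up.
    log_lik = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
    return classes[np.argmax(log_priors + log_lik, axis=1)]

model = fit_nb(X, y)
preds = predict_nb(model, np.array([[1, 1, 0], [0, 0, 1]]))
```

The additive log-likelihood is also why it scales so well to high-dimensional sparse text: one pass over the nonzero features per document.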
What is Principal Component Analysis (PCA)?
A method for reducing dimensionality by projecting data onto directions of maximum variance.
What are principal components?
New orthogonal axes capturing the most variance in the data.
Why use PCA?
To reduce dimensionality and noise, and to make downstream computation more efficient.
What is an eigenvector in PCA?
A direction in feature space (an eigenvector of the data's covariance matrix) along which variance is measured; the top eigenvectors define the principal components.
What is an eigenvalue in PCA?
The amount of variance captured by its corresponding eigenvector.
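The PCA cards map directly onto an eigendecomposition of the covariance matrix (a from-scratch sketch; production implementations such as scikit-learn's PCA use an SVD instead):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: most variance lies along one direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)                   # center the data first

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors = principal components (orthogonal directions);
# each eigenvalue = variance captured along its eigenvector.
explained_ratio = eigvals / eigvals.sum()
Z = X @ eigvecs[:, :1]                   # project onto the first PC
```

Here the first component should explain most of the variance, and the variance of the projected data equals the first eigenvalue.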