ML Part 3 Flashcards
(19 cards)
What is a random forest?
An ensemble of decision trees trained on random subsets of data and features.
What is bagging?
Bootstrap aggregating: training models on random samples and averaging their predictions.
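The bagging card can be sketched in a few lines of NumPy. This is an illustrative toy, not a real library API: the weak learner is a hypothetical median-split "stump", and bagging trains many of them on bootstrap samples and averages their predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    # Toy weak learner: split at the median of x, predict the mean of y
    # on each side of the split.
    split = np.median(x)
    return split, y[x <= split].mean(), y[x > split].mean()

def predict_stump(stump, x):
    split, left, right = stump
    return np.where(x <= split, left, right)

def bagging_predict(x_train, y_train, x_test, n_models=50):
    preds = []
    for _ in range(n_models):
        # Bootstrap: sample n rows with replacement.
        idx = rng.integers(0, len(x_train), len(x_train))
        stump = fit_stump(x_train[idx], y_train[idx])
        preds.append(predict_stump(stump, x_test))
    # Aggregate: average the models' predictions.
    return np.mean(preds, axis=0)

x = rng.uniform(0, 10, 200)
y = (x > 5).astype(float) + rng.normal(0, 0.1, 200)
pred = bagging_predict(x, y, np.array([1.0, 9.0]))
```

A random forest follows the same recipe with full decision trees, plus a random feature subset at each split.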
Why do random forests reduce overfitting?
By averaging predictions from many trees trained on different data subsets.
What is feature importance in random forests?
A score for each feature reflecting how much it contributes to predictions, typically measured by the impurity reduction it produces across all splits in all trees.
What is the out-of-bag (OOB) score?
A validation score computed by evaluating each tree on the training samples left out of its bootstrap sample, estimating generalization without a separate validation set.
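The OOB idea rests on a property of bootstrap sampling that is easy to verify: each bootstrap sample leaves out roughly 1/e ≈ 36.8% of the training rows, and those held-out rows can score the corresponding tree for free. A minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
idx = rng.integers(0, n, n)          # one bootstrap sample of size n
in_bag = np.zeros(n, dtype=bool)
in_bag[idx] = True                   # rows that were drawn at least once
oob_fraction = 1.0 - in_bag.mean()   # fraction left out, ~= 1/e ~= 0.368
```

Averaging each row's predictions over only the trees for which it was out-of-bag gives the OOB score.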
What is boosting?
An ensemble method that combines weak learners sequentially to improve performance.
What is gradient boosting?
A boosting method that minimizes loss by adding trees that correct previous errors.
What is the learning rate in boosting?
A hyperparameter that controls the contribution of each tree to the ensemble.
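The boosting, gradient boosting, and learning rate cards can be tied together with a from-scratch sketch for squared loss (illustrative only; real libraries fit full trees, not this hypothetical median-split stump). Each round fits the stump to the current residuals, the negative gradient of squared loss, and adds it scaled by the learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_stump(x, residual):
    # Weak learner: split at the median of x, predict the mean residual
    # on each side.
    split = np.median(x)
    return split, residual[x <= split].mean(), residual[x > split].mean()

def predict_stump(stump, x):
    split, left, right = stump
    return np.where(x <= split, left, right)

def gradient_boost(x, y, n_rounds, learning_rate):
    pred = np.zeros_like(y)
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of squared loss
        stump = fit_stump(x, residual)   # fit a corrector to current errors
        pred += learning_rate * predict_stump(stump, x)  # shrunken update
    return pred

x = rng.uniform(0, 10, 300)
y = np.where(x > 5, 2.0, -1.0)           # step-function target

mse_many = np.mean((gradient_boost(x, y, n_rounds=100, learning_rate=0.1) - y) ** 2)
mse_few = np.mean((gradient_boost(x, y, n_rounds=5, learning_rate=0.1) - y) ** 2)
```

With a learning rate of 0.1, only a tenth of each correction is applied per round, so few rounds underfit (`mse_few`) while more rounds drive the error down (`mse_many`); this is the trade-off early stopping manages.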
What is early stopping in boosting?
Halting training when validation performance no longer improves.
Name two popular gradient boosting libraries.
XGBoost and LightGBM.
What is the Naive Bayes algorithm?
A probabilistic classifier based on Bayes’ theorem assuming feature independence.
What assumption makes Naive Bayes ‘naive’?
It assumes that all features are conditionally independent given the class.
Why is Naive Bayes effective for text classification?
Because it handles high-dimensional sparse data well.
What is Laplace smoothing?
A technique to handle zero probabilities by adding a small constant (typically 1) to every count, so unseen feature–class combinations don't get probability zero.
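The Naive Bayes cards above can be combined into one from-scratch sketch: a Bernoulli Naive Bayes on toy word-presence features, with add-one Laplace smoothing. The data and helper names are invented for illustration.

```python
import numpy as np

X = np.array([[1, 1, 0],    # toy documents: 1 = word present
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([0, 0, 1, 1])  # two classes, e.g. ham vs spam

def fit_nb(X, y, alpha=1.0):
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Laplace smoothing: add alpha to the counts so that a word never
    # seen with a class still gets a small nonzero probability.
    likelihoods = np.array([
        (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
        for c in classes
    ])
    return classes, np.log(priors), likelihoods

def predict_nb(model, X):
    classes, log_priors, p = model
    # "Naive" assumption: features are independent given the class,
    # so per-feature log-probabilities simply add up.
    log_lik = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
    return classes[np.argmax(log_priors + log_lik, axis=1)]

model = fit_nb(X, y)
preds = predict_nb(model, np.array([[1, 1, 0], [0, 0, 1]]))
```

The additive log-likelihood is also why it scales so well to high-dimensional sparse text: one pass over the nonzero features per document.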
What is Principal Component Analysis (PCA)?
A method for reducing dimensionality by projecting data onto directions of maximum variance.
What are principal components?
New orthogonal axes capturing the most variance in the data.
Why use PCA?
To reduce dimensionality and noise, and to make downstream computation more efficient.
What is an eigenvector in PCA?
A direction in feature space (an eigenvector of the data's covariance matrix) along which variance is measured; the top eigenvectors define the principal components.
What is an eigenvalue in PCA?
The amount of variance captured by its corresponding eigenvector.
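The PCA cards map directly onto an eigendecomposition of the covariance matrix (a from-scratch sketch; production implementations such as scikit-learn's PCA use an SVD instead):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: most variance lies along one direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)                   # center the data first

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors = principal components (orthogonal directions);
# each eigenvalue = variance captured along its eigenvector.
explained_ratio = eigvals / eigvals.sum()
Z = X @ eigvecs[:, :1]                   # project onto the first PC
```

Here the first component should explain most of the variance, and the variance of the projected data equals the first eigenvalue.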