ML Part 5 Flashcards
(20 cards)
What is feature engineering?
The process of creating, selecting, or transforming variables to improve model performance.
What is one-hot encoding?
Converting categorical variables into binary indicator columns.
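A minimal pure-Python sketch of one-hot encoding, for illustration only. The function name `one_hot_encode` and the sample data are assumptions, and the column order (sorted category names) is just one possible convention.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has a single 1 in its category's column."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
# categories -> ['blue', 'green', 'red']
# encoded    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each category gets its own column, so no artificial ordering is introduced between categories.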
What is label encoding?
Assigning each category a unique integer value.
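A small sketch of label encoding in plain Python. The helper name `label_encode` and the example values are hypothetical. Integers are assigned in alphabetical order here, which is an arbitrary choice.

```python
def label_encode(values):
    """Map each distinct category to an integer and encode the input list."""
    categories = sorted(set(values))  # alphabetical order decides the integers
    mapping = {cat: i for i, cat in enumerate(categories)}
    return mapping, [mapping[v] for v in values]


sizes = ["small", "large", "medium", "small"]
mapping, codes = label_encode(sizes)
# mapping -> {'large': 0, 'medium': 1, 'small': 2}
# codes   -> [2, 0, 1, 2]
```

Note that the integers imply an order (here large < medium < small) that a model may treat as meaningful, even though it is only an artifact of sorting.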
What is feature scaling?
Rescaling input features to a common scale (e.g., the range 0 to 1, or z-scores).
When is feature scaling important?
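For distance-based or margin-based models like k-NN or SVM, and for models trained with gradient descent or regularization. Tree-based models are largely insensitive to it.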
What is normalization?
Rescaling each feature to a fixed range, typically 0 to 1 (min-max scaling).
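As a sketch, scaling to the range 0 to 1 can be written as (x - min) / (max - min). The function name `min_max_scale` and the sample values are assumptions. Mapping a constant feature to 0.0 is one possible convention.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale, so map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 60]
scaled = min_max_scale(ages)
# scaled -> [0.0, 0.25, 0.5, 1.0]
```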
What is standardization?
Rescaling data to have mean 0 and standard deviation 1.
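A minimal sketch of standardization using only the standard library. The name `standardize` and the sample data are illustrative, and the population standard deviation is used here as an assumption (some tools use the sample version instead).

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # Constant feature: every value equals the mean
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


heights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
z = standardize(heights)
# z -> [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```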
What is missing value imputation?
Filling in missing values using a rule (e.g., the mean, the median, or a model's prediction).
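A simple sketch of mean and median imputation, assuming missing values are represented as `None`. The function name `impute` and the sample data are hypothetical. Model-based imputation is not shown.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


incomes = [30.0, None, 50.0, 100.0, None]
mean_filled = impute(incomes, "mean")      # fills with 60.0
median_filled = impute(incomes, "median")  # fills with 50.0
```

The median is less affected by the outlier (100.0) than the mean, which is why the two fill values differ.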
What is data leakage in preprocessing?
Using information in training that would not be available at prediction time.
Why should scaling be fit only on training data?
So that test-set statistics (e.g., the mean and variance) do not leak into the model and inflate its measured performance.
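The fit-on-train, apply-to-both pattern can be sketched as follows. The helper names `fit_scaler` and `transform` and the data are assumptions for illustration. The key point is that the mean and standard deviation come from the training split only.

```python
import statistics


def fit_scaler(train_values):
    """Learn scaling statistics from the training split only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def transform(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 10.0]

mean, std = fit_scaler(train)              # statistics never see the test data
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)   # reuses the training mean and std
```

Scaled test values may fall outside the range seen in training. That is expected and reflects what would happen with new data at prediction time.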
What is feature selection?
Choosing a subset of input variables to use in a model.
What is univariate feature selection?
Scoring each feature individually with a statistical test against the target and keeping the top-scoring features.
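A rough sketch of univariate selection that scores each feature by the absolute Pearson correlation with the target. The choice of Pearson correlation is an assumption (chi-squared or F-tests are common alternatives), and the names `pearson`, `select_top_k`, and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(features, target, k):
    """Score each feature on its own, then keep the k highest scores."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


features = {
    "rooms": [1, 2, 3, 4, 5],
    "age": [5, 3, 4, 1, 2],
    "noise": [2, 9, 1, 8, 3],
}
price = [10, 20, 30, 40, 50]
selected, scores = select_top_k(features, price, k=2)
# selected -> ['rooms', 'age']
```

Because each feature is scored in isolation, this approach cannot detect features that are only useful in combination.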
What is recursive feature elimination (RFE)?
A method that repeatedly fits a model and removes the least important features until the desired number remains.
What is the curse of dimensionality?
As the number of features grows, the data becomes sparse in the feature space, so model performance tends to degrade unless the amount of data grows as well.
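One symptom of the curse of dimensionality is that distances become less informative: with many features, the nearest and farthest points end up almost equally far away. The sketch below illustrates this with uniformly random points. The function name `distance_contrast`, the point count, and the seed are all assumptions.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)


low_dim = distance_contrast(dim=2)     # nearest is much closer than farthest
high_dim = distance_contrast(dim=500)  # nearest and farthest are nearly equal
```

A ratio close to 1 means every point looks about equally distant, which weakens methods such as k-NN that rely on "near" versus "far".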
What is multicollinearity?
When two or more features are highly correlated, which makes coefficient estimates unstable and hard to interpret in linear models.
What is hyperparameter tuning?
The process of finding good values for settings that are chosen before training rather than learned from the data (e.g., learning rate, tree depth).
What is grid search?
An exhaustive search over every combination in a manually specified grid of hyperparameter values.
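A minimal sketch of grid search in plain Python. The function `toy_score` is a hypothetical stand-in for "train a model with these settings and return its validation score", and the grid values are made up for illustration.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical objective that peaks at depth=4, lr=0.1
    return -((params["depth"] - 4) ** 2) - (params["lr"] - 0.1) ** 2


grid = {"depth": [2, 4, 6], "lr": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
# best_params -> {'depth': 4, 'lr': 0.1}
```

The number of evaluations is the product of the list lengths (3 x 3 = 9 here), so the cost grows quickly as more hyperparameters are added.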
What is random search?
Sampling hyperparameter combinations at random from specified ranges or distributions for a fixed number of trials.
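A sketch of random search, assuming each hyperparameter is drawn uniformly from a (low, high) range. The objective `toy_score`, the search space, and the trial count are hypothetical. Real searches often sample some parameters on a log scale instead.

```python
import random


def random_search(space, score_fn, n_trials, seed=0):
    """Sample n_trials random configurations and return the best one."""
    rng = random.Random(seed)  # seeded for reproducibility
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical objective that peaks at lr=0.1, dropout=0.3
    return -((params["lr"] - 0.1) ** 2) - (params["dropout"] - 0.3) ** 2


space = {"lr": (0.0, 1.0), "dropout": (0.0, 0.5)}
best_params, best_score = random_search(space, toy_score, n_trials=50)
```

Unlike grid search, the budget (`n_trials`) is chosen directly and does not grow with the number of hyperparameters.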
What is Bayesian optimization for tuning?
A sequential method that builds a probabilistic surrogate model of the objective function and uses it to choose the most promising hyperparameters to try next.
What is cross-validation used for during tuning?
To estimate how well each hyperparameter configuration generalizes, without using the test set.
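The splitting step of k-fold cross-validation can be sketched as below. The generator name `k_fold_indices` is an assumption, and the folds are contiguous and unshuffled for simplicity. In practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
# Validation folds -> [0, 1, 2, 3], [4, 5, 6], [7, 8, 9]
```

During tuning, each configuration would be trained on every `train` split and scored on the matching validation split, and the scores averaged to compare configurations.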