Feature Selection Flashcards
(15 cards)
What is feature selection?
The process of identifying and selecting the most relevant features (variables) from a dataset to improve model performance, reduce overfitting, and enhance interpretability, by removing irrelevant or redundant features
Why does feature selection improve model performance?
By focusing on the most predictive features, models can learn more accurately and generalize better to unseen data
Why does feature selection reduce overfitting?
Overfitting occurs when a model learns the training data too well, including noise and irrelevant details, leading to poor performance on new data. Feature selection helps prevent this by focusing on the most relevant information.
Why does feature selection enhance interpretability?
Simpler models with fewer features are easier to understand and explain, which can be crucial for decision-making and trust in the model’s predictions.
Why does feature selection reduce computational cost?
Using fewer features can significantly speed up model training and reduce the memory required to store and process the data.
Why does feature selection reduce data size?
Feature selection helps to reduce the dimensionality of the dataset, which can be beneficial for both storage and processing
What are filter methods?
These methods evaluate the relevance of features independently of any machine learning algorithm, using statistical measures like correlation, chi-square, or information gain
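A minimal filter-method sketch, assuming scikit-learn is available: each feature is scored against the target with a univariate statistical test (here an ANOVA F-test), independently of any downstream model.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest F-scores against the target.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
```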
What are wrapper methods?
These methods use a machine learning algorithm to evaluate different subsets of features, selecting the subset that yields the best performance
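A brute-force wrapper-method sketch, assuming scikit-learn: every 2-feature subset is scored with cross-validation and the best-performing subset is kept. Real wrapper methods use smarter search strategies, since exhaustive search grows combinatorially.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Evaluate each candidate subset by the model's cross-validated accuracy.
best_score, best_subset = 0.0, None
for subset in combinations(range(X.shape[1]), 2):
    score = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
    if score > best_score:
        best_score, best_subset = score, subset

print("best 2-feature subset:", best_subset, "cv accuracy:", round(best_score, 3))
```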
What are embedded methods?
These methods integrate feature selection into the model training process, allowing the model to learn which features are most important during training
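An embedded-method sketch, assuming scikit-learn: an L1-penalized (Lasso) regression drives some coefficients to exactly zero during training, so feature selection happens as part of the fit itself.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# The L1 penalty zeroes out weakly predictive features while fitting.
lasso = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weights

print("features kept by L1 penalty:", kept)
```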
What is recursive feature elimination (RFE)?
This wrapper method iteratively removes features based on their importance scores, starting with all features and gradually reducing the set until the desired number of features is reached
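An RFE sketch, assuming scikit-learn: starting from all features, the estimator is refit repeatedly and the least important feature is dropped each round until the target count is reached.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Recursively eliminate the lowest-weighted feature until 2 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2)
rfe.fit(X, y)

print("selected feature mask:", rfe.support_)   # True = kept
print("elimination ranking:  ", rfe.ranking_)   # 1 = kept, higher = dropped earlier
```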
What is sequential feature selection (SFS)?
This method builds a feature subset by sequentially adding or removing features based on their impact on model performance
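An SFS sketch, assuming scikit-learn (0.24 or later, which provides `SequentialFeatureSelector`): in forward mode, the feature that most improves cross-validated performance is added at each step.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Greedily add one feature at a time until 2 are selected.
sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=2,
                                direction="forward")
sfs.fit(X, y)

print("selected feature mask:", sfs.get_support())
```

Setting `direction="backward"` instead starts from all features and removes them one at a time.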
What is principal component analysis (PCA)?
This technique transforms the data into a new set of uncorrelated variables (principal components), allowing for dimensionality reduction. Strictly speaking it is feature extraction rather than selection, since the components are combinations of the original features rather than a subset of them.
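A PCA sketch, assuming scikit-learn: the data is projected onto the top principal components, which capture most of the variance in far fewer dimensions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # (150, 4) -> (150, 2)
print("variance explained:", pca.explained_variance_ratio_.sum())
```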
What is the chi-squared test?
A widely used filter method that computes a chi-square score between each input variable and the target; the variables with the highest scores are selected as the input features. It requires non-negative inputs, so it suits categorical or count data.
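A chi-square filter sketch, assuming scikit-learn: `chi2` scores each (non-negative) feature against the class labels, and the top-scoring features are kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # all measurements are non-negative

# Score every feature against the target and keep the top 2.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)

print("chi-square scores:", selector.scores_)
print("selected mask:    ", selector.get_support())
```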
What is the Pearson correlation?
The Pearson correlation coefficient is computed between each pair of input variables; when two variables are highly correlated with each other, one of the pair is removed as redundant. It can also be computed between each input and the target to rank features by relevance.
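A correlation-filter sketch, assuming pandas and NumPy, on synthetic data where column "b" is deliberately a near-duplicate of "a": one member of each highly correlated pair (|r| > 0.9) is dropped.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100)})
df["b"] = df["a"] * 0.95 + rng.normal(scale=0.1, size=100)  # near-duplicate of "a"
df["c"] = rng.normal(size=100)                              # independent feature

# Absolute pairwise correlations; keep only the upper triangle so each
# pair is checked once.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

print("dropping:", to_drop)  # ['b']
```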
What are decision tree algorithms?
At each node, the tree selects the feature and threshold whose split best separates the data, so that similar values of the target variable end up in the same branch. The resulting feature importances can be used as an embedded form of feature selection.
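A tree-importance sketch, assuming scikit-learn: impurity-based importances reflect how much each feature's splits purify the target, and can be used to rank features.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Importances measure each feature's total impurity reduction across splits.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print("feature importances:", tree.feature_importances_)  # sums to 1.0
```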