Feature engineering Flashcards

1
Q

Feature engineering

A

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models, resulting in improved model accuracy on unseen data. It is an essential step that requires a mix of domain knowledge, intuition, and a degree of trial and error; done well, it can significantly improve the performance of machine learning models.

2
Q
1. Definition
A

Feature engineering is a crucial step in the machine learning pipeline that involves creating new features or modifying existing ones to improve model performance.

3
Q
2. Importance
A

The performance of machine learning models heavily depends on the quality of the features in the dataset. Even sophisticated models cannot learn from irrelevant features. Good feature engineering can often make the difference between a poor model and an excellent one.

4
Q
3. Domain Knowledge
A

Incorporating domain knowledge helps create features that make machine learning algorithms work better. By understanding the context of the problem, one can craft relevant features that capture its essential aspects.
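
For instance, a minimal sketch with hypothetical column names: in a health-risk setting, medical knowledge suggests deriving a body-mass-index feature from raw height and weight columns.

```python
import pandas as pd

# Hypothetical patient data: raw height and weight columns.
df = pd.DataFrame({
    "height_m": [1.65, 1.80, 1.72],
    "weight_kg": [60.0, 85.0, 95.0],
})

# Domain knowledge (medicine): BMI often relates to health
# outcomes more directly than height or weight alone.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
print(df)
```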

5
Q
4. Categorical Encoding
A

Many machine learning models require the input data to be in numerical format. Categorical variables (like ‘color’ or ‘city’) are therefore typically converted to numerical form using techniques like one-hot encoding, label encoding, or target encoding.
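
A minimal sketch of one-hot and label encoding using pandas and scikit-learn; the ‘color’/‘city’ data is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "city": ["Paris", "Lyon", "Paris"]})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df, columns=["color", "city"])

# Label encoding: each category mapped to an integer.
df["color_label"] = LabelEncoder().fit_transform(df["color"])

print(one_hot)
print(df)
```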

6
Q
5. Handling Missing Values
A

Missing data is a common problem in real-world datasets. Techniques to handle missing data include imputation (filling missing values with statistical measures like mean or median) and creating an indicator feature to highlight when a value was missing.
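
A short sketch of both approaches with pandas, using a hypothetical ‘age’ column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25.0, np.nan, 40.0, np.nan]})

# Indicator feature: flag rows where the value was missing.
df["age_was_missing"] = df["age"].isna()

# Imputation: fill missing values with the column median.
df["age"] = df["age"].fillna(df["age"].median())

print(df)
```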

7
Q
6. Feature Scaling
A

Certain machine learning algorithms, such as linear regression, logistic regression, SVMs, k-nearest neighbors (KNN), and neural networks, work best when the input features are on similar scales. Techniques like min-max scaling and standardization are used to rescale the features.
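
A minimal sketch of both techniques using scikit-learn on made-up data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 600.0]])

# Min-max scaling: rescales each feature to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: zero mean and unit variance per feature.
X_std = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_std)
```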

8
Q
7. Feature Transformation
A

Features can be transformed to better fit the assumptions of a machine learning algorithm, for example by reducing skew. Common transformations include the logarithmic, square-root, and square transformations.
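
A short sketch of these transformations with NumPy, applied to an illustrative right-skewed feature.

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])  # right-skewed feature

# log1p compresses large values and handles zeros safely.
x_log = np.log1p(x)

# Square root is a milder variance-stabilizing transform.
x_sqrt = np.sqrt(x)

# Squaring can expose a quadratic relationship to a linear model.
x_sq = x ** 2
```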

9
Q
8. Feature Selection
A

Feature selection involves selecting the most useful features to train your machine learning model. This can reduce overfitting, improve accuracy, and reduce training time. Methods include correlation coefficients, the chi-square test, mutual information, and feature importances from tree-based models.
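
A minimal sketch of a mutual-information filter and tree-based importances, using scikit-learn's bundled iris dataset for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Filter method: keep the 2 features with the highest mutual
# information with the target.
X_best = SelectKBest(mutual_info_classif, k=2).fit_transform(X, y)

# Embedded method: feature importances from a tree-based model.
forest = RandomForestClassifier(random_state=0).fit(X, y)
print(forest.feature_importances_)
```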

10
Q
9. Feature Extraction
A

Feature extraction reduces the dimensionality of high-dimensional data by deriving a smaller set of new features. Techniques like Principal Component Analysis (PCA), t-SNE, and UMAP are used for feature extraction.
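
A minimal PCA sketch, using scikit-learn's bundled digits dataset for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64-dimensional inputs

# Project onto the 10 principal components capturing the most variance.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.sum())
```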

11
Q
10. Time-Series Specific
A

In time-series problems, features are often engineered from date-time variables, such as the hour of day, day of week, month, quarter, and year.
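
A short sketch with pandas, using hypothetical timestamps.

```python
import pandas as pd

# Hypothetical timestamp column.
df = pd.DataFrame({"timestamp": pd.to_datetime(
    ["2024-01-15 08:30", "2024-06-20 17:45"])})

# Derive calendar features from the datetime.
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month
df["quarter"] = df["timestamp"].dt.quarter
df["year"] = df["timestamp"].dt.year
print(df)
```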
