Feature Engineering Flashcards

(13 cards)

1
Q

What is feature engineering?

A

The manipulation of a data set (adding, deleting, combining, or transforming features) to improve machine learning model training, leading to better performance and greater accuracy.

2
Q

What are the common feature types?

A

Numerical
Categorical
Ordinal
Binary
Text

3
Q

What is data exploration and understanding?

A

Examining the types of features and their distributions to understand the overall shape of the data.
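A minimal sketch of this kind of profiling in plain Python, assuming each feature arrives as a list of values with None marking missing entries. The function name profile_feature is hypothetical, and the type inference is deliberately crude (a feature counts as numerical only if every observed value is an int or float).

```python
from collections import Counter
from statistics import mean, median, pstdev

def profile_feature(values):
    """Summarize one feature: inferred type, missing count, and distribution."""
    present = [v for v in values if v is not None]
    summary = {"count": len(present), "missing": len(values) - len(present)}
    # bool is excluded explicitly because it is a subclass of int in Python.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        # Numerical feature: describe its center, spread, and range.
        summary.update(
            type="numerical",
            mean=mean(present),
            median=median(present),
            stdev=pstdev(present),
            min=min(present),
            max=max(present),
        )
    else:
        # Anything else is treated as categorical: describe category frequencies.
        summary.update(type="categorical", frequencies=Counter(present))
    return summary

print(profile_feature([1, 2, 3, None]))
print(profile_feature(["red", "blue", "red"]))
```

Libraries such as pandas provide richer summaries, but the idea is the same: know each feature's type, how much is missing, and how its values are distributed before transforming anything.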

4
Q

What is handling missing data?

A

Addressing missing values, either by imputing (filling in) them or by removing the instances (rows) or features (columns) that contain them.
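A minimal sketch of both options in plain Python, assuming missing values are stored as None and rows are dicts. Mean imputation is only one strategy (median or most-frequent-value imputation are common alternatives), and the function names are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def drop_incomplete(rows):
    """Remove any row (a dict) that contains at least one missing value."""
    return [row for row in rows if None not in row.values()]

print(impute_mean([1.0, None, 3.0]))
print(drop_incomplete([{"a": 1, "b": None}, {"a": 2, "b": 3}]))
```

In a real pipeline the fill value should be computed on the training data only and then reused for validation and test data, so no information leaks from the held-out sets.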

5
Q

What is variable encoding?

A

Converting categorical variables into a numerical format suitable for ML algorithms.
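A minimal sketch of two common encodings in plain Python: one-hot encoding for nominal categories (no natural order) and ordinal encoding when the categories do have an order. The function names and the example category order are hypothetical.

```python
def one_hot_encode(values):
    """Map each value to a 0/1 vector with one column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

def ordinal_encode(values, order):
    """Map each value to its position in a caller-supplied category order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
print(categories)
print(encoded)
print(ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"]))
```

One-hot encoding avoids implying a false order between categories; ordinal encoding keeps a single column but is only appropriate when the order is meaningful.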

6
Q

What is feature scaling?

A

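Transforming numerical features so they lie on a similar scale (for example, standardizing them to zero mean and unit variance), which improves the performance of models that are sensitive to feature magnitudes, such as distance-based models and models trained with gradient descent.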
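A minimal sketch of standardization (z-score scaling) in plain Python, assuming a single numerical feature given as a list. Without scaling, a feature such as income (in the tens of thousands) would dominate one such as age (0 to 100) in any distance computation. The function name is hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

print(standardize([2.0, 4.0, 6.0]))
```

As with imputation, the mean and standard deviation should be computed on the training data and reused to transform validation and test data.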

7
Q

What is feature creation?

A

Generating new features from existing ones (for example, ratios, sums, or products) to capture relationships between variables.
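A minimal sketch in plain Python, using a hypothetical housing record with the made-up columns sqft, bedrooms, and bathrooms. It derives a ratio feature and a combined count; neither column name comes from a real data set.

```python
def add_derived_features(rows):
    """Return copies of the rows with two hypothetical derived features added."""
    enriched = []
    for row in rows:
        new_row = dict(row)
        # Ratio feature: living area per bedroom (None if there are no bedrooms).
        new_row["sqft_per_bedroom"] = (
            row["sqft"] / row["bedrooms"] if row["bedrooms"] else None
        )
        # Combination feature: total number of bedrooms and bathrooms.
        new_row["rooms_total"] = row["bedrooms"] + row["bathrooms"]
        enriched.append(new_row)
    return enriched

print(add_derived_features([{"sqft": 1500, "bedrooms": 3, "bathrooms": 2}]))
```

A derived ratio can expose a relationship (spaciousness per room) that a model would otherwise have to learn from the raw columns.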

8
Q

What is handling outliers?

A

Identifying outliers in the data and addressing them through techniques such as trimming them or transforming the data.
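A minimal sketch of both techniques in plain Python. Outliers are flagged with the interquartile range (IQR) rule, where values more than 1.5 times the IQR outside the quartiles count as outliers; the 1.5 multiplier is a common convention, not a fixed standard. The log transform is one example of a transformation that compresses large values. Function names are hypothetical.

```python
import math
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences based on the interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def trim_outliers(values, k=1.5):
    """Drop values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

def log_transform(values):
    """Compress large values with log(1 + x); assumes all values are > -1."""
    return [math.log1p(v) for v in values]

print(trim_outliers([10, 12, 11, 13, 12, 11, 100]))
print(log_transform([0, 9, 99]))
```

Trimming discards information, so transforming (or capping) is often preferred when the extreme values are genuine rather than errors.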

9
Q

What is normalization?

A

Rescaling features to a common range, often [0, 1] using min-max scaling: x' = (x - min) / (max - min).
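A minimal sketch of min-max normalization in plain Python, applying x' = (x - min) / (max - min) to one feature. The function name is hypothetical, and mapping a constant feature to zeros is an assumption made to avoid dividing by zero.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: (max - min) would be zero, so return all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 30]))
```

Because min and max come from the data, a single extreme value squeezes all other values into a narrow band, which is one reason to handle outliers first.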

10
Q

What is binning or discretization?

A

Converting continuous features into discrete bins to capture specific patterns in certain ranges.
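A minimal sketch of binning in plain Python using the standard-library bisect module. The cut points and labels (an age feature split at 18 and 65) are hypothetical; with bisect_right, each bin includes its lower edge and excludes its upper edge.

```python
from bisect import bisect_right

def bin_values(values, edges, labels):
    """Assign each value a label; edges are sorted interior cut points.

    Bins are (-inf, edges[0]), [edges[0], edges[1]), ..., [edges[-1], +inf).
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(edges, v)] for v in values]

ages = [5, 18, 40, 70]
print(bin_values(ages, edges=[18, 65], labels=["minor", "adult", "senior"]))
```

Binning lets a model treat each range as its own category, which can capture non-linear effects at the cost of losing detail within a bin.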

11
Q

What is text data processing?

A

Preparing raw text for modeling with tasks such as tokenization, stemming, and stop-word removal.
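A minimal sketch of the three steps in plain Python. The stop-word list is a tiny hypothetical sample, and crude_stem is a toy suffix stripper rather than a real stemming algorithm; libraries such as NLTK provide proper ones (for example, its PorterStemmer).

```python
import re

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def crude_stem(token):
    """Strip one common suffix, keeping at least three characters (toy stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stop words, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The cats are playing in the garden"))
```

Stop words are removed before stemming so that the stop-word lookup matches the original word forms.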

12
Q

What are time series features?

A

Extracting relevant time-based features, such as lag features or rolling statistics, from time series data.
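A minimal sketch in plain Python, assuming the series is an evenly spaced list of numbers. A lag feature is the value from a fixed number of steps earlier; a rolling mean averages a sliding window. Positions without enough history are filled with None, which is an assumption; function names are hypothetical.

```python
def lag_feature(series, lag=1):
    """Value from `lag` steps earlier, or None where no earlier value exists."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]

def rolling_mean(series, window=3):
    """Mean of the last `window` values, including the current one."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out

sales = [10, 12, 14, 16]
print(lag_feature(sales, 1))
print(rolling_mean(sales, 2))
```

This rolling mean includes the current value; when forecasting that value, shift the window back one step so the feature uses only past data.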

13
Q

What are vector features?

A

In ML, each data point is represented by its features, and these features are organized into a vector (a feature vector) that is used for training. A vector is a mathematical object that has both magnitude and direction.
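A minimal sketch in plain Python that turns a record into a feature vector and computes its magnitude (Euclidean length) and direction (the unit vector). The record fields and function names are hypothetical.

```python
import math

def to_feature_vector(record, feature_names):
    """Arrange a record's features into a fixed-order list of floats."""
    return [float(record[name]) for name in feature_names]

def magnitude(vec):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vec))

def direction(vec):
    """Unit vector pointing the same way as vec."""
    norm = magnitude(vec)
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in vec]

record = {"age": 30, "income": 52000, "is_member": 1}
print(to_feature_vector(record, ["age", "income", "is_member"]))
print(magnitude([3.0, 4.0]), direction([3.0, 4.0]))
```

The feature order must be fixed and identical for every record, because each position in the vector stands for one specific feature.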
