Feature Engineering Flashcards
(13 cards)
What is feature engineering?
manipulation — addition, deletion, combination, mutation — of your data set to improve machine learning model training, leading to better performance and greater accuracy
What are the common feature types?
Numerical
Categorical
Ordinal
Binary
Text
What is data exploration and understanding?
understanding the types of features and their distributions - the SHAPE of the data
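A minimal sketch of what exploring the shape of the data can look like, in plain Python. The `rows` data set and its column names are hypothetical, made up for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data set: one numerical and one categorical feature.
rows = [
    {"age": 22, "city": "Paris"},
    {"age": 35, "city": "Lyon"},
    {"age": 35, "city": "Paris"},
    {"age": 60, "city": "Paris"},
]

# Feature types: inspect the Python type stored in each column.
feature_types = {name: type(value).__name__ for name, value in rows[0].items()}

# Numerical distribution: range and central tendency.
ages = [row["age"] for row in rows]
age_summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
}

# Categorical distribution: how often each category occurs.
city_counts = Counter(row["city"] for row in rows)
```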
What is handling missing data?
this addresses missing values through imputation or removal of instances/features with missing data
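The two options can be sketched as follows, assuming a hypothetical `ages` column where `None` marks a missing value. Mean imputation is only one possible strategy; median or mode imputation follow the same pattern.

```python
from statistics import mean

# Hypothetical toy column; None marks a missing value.
ages = [25, None, 31, 40, None]

# Option 1: imputation. Replace each missing value with the mean of the observed ones.
observed = [a for a in ages if a is not None]
fill_value = mean(observed)
imputed = [fill_value if a is None else a for a in ages]

# Option 2: removal. Drop the instances that have a missing value.
dropped = [a for a in ages if a is not None]
```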
What is variable encoding?
converting categorical variables into numerical format suitable for ML algorithms
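As an illustration, here is a small sketch of two common encodings, integer (label) encoding and one-hot encoding, written in plain Python. The `colors` column is a hypothetical example.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Fix a stable category order so the encoding is reproducible.
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Integer (label) encoding: each category maps to its index.
index_of = {category: i for i, category in enumerate(categories)}
label_encoded = [index_of[c] for c in colors]

# One-hot encoding: one 0/1 column per category.
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]
```

One-hot encoding avoids implying an order between categories, at the cost of one extra column per category.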
What is feature scaling?
transforming numerical features so they are on a similar scale, which helps scale-sensitive models (for example, distance-based or gradient-descent-based ones) train well
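One common form of scaling is standardization (z-scores), sketched below on a hypothetical `values` column. The choice of standardization here is an assumption for illustration; the card does not name a specific method.

```python
from statistics import mean, pstdev

# Hypothetical numerical feature.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(values)       # feature mean
sigma = pstdev(values)  # population standard deviation

# Standardize: subtract the mean, divide by the standard deviation.
scaled = [(v - mu) / sigma for v in values]
```

After this step the feature has mean 0 and standard deviation 1.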
What is feature creation?
generating new features by combining existing ones to capture relationships between variables
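A small sketch of combining two existing features into a new one. The `houses` records and the derived `price_per_sqft` feature are hypothetical examples.

```python
# Hypothetical records with two existing features.
houses = [
    {"price": 300000, "sqft": 1500},
    {"price": 450000, "sqft": 1800},
]

# New feature: a ratio that relates price to size.
for house in houses:
    house["price_per_sqft"] = house["price"] / house["sqft"]
```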
What is handling outliers?
Identifying and addressing outliers in the data through techniques like trimming or transforming data
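Both techniques can be sketched in plain Python. The 1.5 x IQR rule used to identify outliers and the log transform are assumptions chosen for illustration, and `values` is a hypothetical column.

```python
import math
from statistics import quantiles

# Hypothetical feature with one extreme value.
values = [10, 12, 11, 13, 12, 95]

# Identify outliers with the 1.5 * IQR rule.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Trimming: keep only the values inside the fences.
trimmed = [v for v in values if lower <= v <= upper]

# Transforming: a log transform compresses large values instead of dropping them.
logged = [math.log1p(v) for v in values]
```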
What is normalization?
rescaling features to a common scale, commonly the range [0, 1], so that no feature dominates simply because of its units
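A minimal sketch of min-max normalization, one widely used way to bring a feature into the range [0, 1]. The `values` column is hypothetical.

```python
# Hypothetical numerical feature.
values = [10, 20, 30, 50]

lo, hi = min(values), max(values)

# Min-max normalization: map the smallest value to 0 and the largest to 1.
normalized = [(v - lo) / (hi - lo) for v in values]
```

This sketch assumes the feature is not constant; if `hi == lo` the division is undefined and the feature needs separate handling.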
What is binning or discretization?
Converting continuous features into discrete bins to capture specific patterns in certain ranges
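A short sketch of binning with the standard-library `bisect` module. The bin edges and labels are hypothetical and would normally come from domain knowledge or from the data's distribution.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels (one more label than edges).
edges = [18, 35, 65]
labels = ["child", "young adult", "adult", "senior"]

ages = [5, 18, 34, 35, 70]

# bisect_right counts how many edges are <= age, which is the bin index.
binned = [labels[bisect_right(edges, age)] for age in ages]
```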
What is text data processing?
When dealing with text, performing tasks like tokenization, stemming, and stop-word removal
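The three steps can be sketched in plain Python. The stop-word list is a tiny hypothetical sample, and `crude_stem` is a deliberately naive suffix stripper written for illustration, not a real stemming algorithm such as Porter's.

```python
import re

# Tiny hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "are", "and", "over"}


def crude_stem(token):
    """Naive stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "The cats are running and jumping over the fences"

# Tokenization: lowercase the text and split it into alphabetic words.
tokens = re.findall(r"[a-z]+", text.lower())

# Stop-word removal: drop very common words that carry little signal.
content_tokens = [t for t in tokens if t not in STOP_WORDS]

# Stemming: reduce each remaining word to a rough root form.
stems = [crude_stem(t) for t in content_tokens]
```

The stems are not always dictionary words ("running" becomes "runn" here), which is typical of simple rule-based stemmers.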
What are time series features?
Relevant time-based features, such as lag features or rolling statistics, extracted from time series data
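A minimal sketch of a lag feature and a rolling mean, assuming a hypothetical `sales` series ordered by time and a window of 3. Positions without enough history are marked `None`.

```python
# Hypothetical series, ordered from oldest to newest.
sales = [10, 12, 14, 16, 18]

# Lag feature: the value from one step earlier.
lag_1 = [None] + sales[:-1]

# Rolling statistic: mean of the current and two previous values.
window = 3
rolling_mean = [
    None if i + 1 < window else sum(sales[i + 1 - window : i + 1]) / window
    for i in range(len(sales))
]
```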
What are vector features?
Commonly used for training in ML: data is represented as features, and these features are organized into vectors. A vector is a mathematical object that has magnitude and direction.
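To make this concrete, here is a small sketch that turns a record into a feature vector and computes its magnitude and direction. The `record` and its feature names are hypothetical.

```python
import math

# Hypothetical record with two numerical features.
record = {"width": 3.0, "height": 4.0}

# Feature vector: the feature values in a fixed order.
feature_names = ["width", "height"]
vector = [record[name] for name in feature_names]

# Magnitude: the Euclidean length of the vector.
magnitude = math.hypot(*vector)

# Direction: the unit vector pointing the same way.
direction = [x / magnitude for x in vector]
```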