Feature extraction Flashcards

1
Q

Feature extraction

A

Feature extraction is a process that transforms the original data into a smaller set of new composite features that are more interpretable and more useful for training machine learning models. Feature extraction techniques are powerful tools for simplifying machine learning problems by reducing the dimensionality of the data; the choice of method depends on the specific requirements and constraints of the problem at hand.

2
Q
Definition
A

Feature extraction is a dimensionality reduction process in which an initial set of raw data is reduced to more manageable groups (features) for processing, while still accurately and comprehensively describing the original data set.

3
Q
Goal
A

The primary goal of feature extraction is to derive from the raw data the set of features most relevant to the task at hand. This reduces computational complexity and helps address the 'curse of dimensionality' problem in machine learning.

4
Q
Principal Component Analysis (PCA)
A

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. The first principal component captures the largest possible variance, and each succeeding component captures the highest possible variance under the constraint that it is orthogonal to the preceding components.
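
A minimal sketch of PCA with scikit-learn; the random data matrix, shapes, and number of components below are illustrative assumptions, not part of the card.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 samples with 5 possibly correlated features.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)

# Keep the 2 components that capture the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```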

5
Q
Linear Discriminant Analysis (LDA)
A

LDA is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. It is primarily used as a supervised dimensionality reduction step in pre-processing for pattern classification.
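
A minimal sketch of supervised dimensionality reduction with scikit-learn's LinearDiscriminantAnalysis; the choice of the Iris dataset is purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA can produce at most (n_classes - 1) components; Iris has 3 classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)  # note: class labels y are required

print(X_reduced.shape)  # (150, 2)
```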

6
Q
t-Distributed Stochastic Neighbor Embedding (t-SNE)
A

t-SNE is a machine learning algorithm for visualization. It is a non-linear dimensionality reduction technique that is particularly well suited for the visualization of high-dimensional datasets.
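
A minimal sketch of a 2-D t-SNE embedding with scikit-learn; the digits dataset and the perplexity value are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 64-dimensional image vectors

# Embed into 2 dimensions for plotting; t-SNE is intended for visualization,
# not for producing features to feed into a downstream model.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)

print(X_embedded.shape)  # (1797, 2)
```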

7
Q
Autoencoders
A

Autoencoders are a type of artificial neural network used for learning efficient codings of input data. They have an input layer, an output layer with the same number of nodes as the input layer, and one or more hidden layers connecting them. The network is trained to reconstruct its own input, so a narrower hidden (bottleneck) layer is forced to learn a compressed representation; those bottleneck activations serve as the extracted features, which is why autoencoders are used for dimensionality reduction.
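
A minimal sketch of an undercomplete autoencoder in Keras, assuming TensorFlow is installed; the input dimension, bottleneck size, placeholder data, and training settings are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim = 64, 8  # illustrative sizes

# Encoder compresses the input; decoder reconstructs it.
inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(code_dim, activation="relu")(inputs)
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)  # reused later for feature extraction

autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, input_dim)  # placeholder data in [0, 1]
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

features = encoder.predict(X)  # compressed 8-dimensional representation
print(features.shape)          # (1000, 8)
```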

8
Q
Non-Negative Matrix Factorization (NMF)
A

NMF is a group of algorithms in which a matrix V is factorized into (usually) two matrices W and H, such that V ≈ WH and all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to interpret, since each sample is expressed as an additive combination of parts.
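
A minimal sketch of NMF with scikit-learn; the random non-negative matrix V and the number of components are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
V = rng.rand(100, 20)  # all entries non-negative, as NMF requires

# Factorize V (100x20) into W (100x5) and H (5x20), all non-negative.
model = NMF(n_components=5, init="random", random_state=0)
W = model.fit_transform(V)
H = model.components_

print(W.shape, H.shape)           # (100, 5) (5, 20)
print(np.abs(V - W @ H).mean())   # average reconstruction error
```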

9
Q
Independent Component Analysis (ICA)
A

ICA is a computational method for separating a multivariate signal into additive subcomponents, under the assumption that the subcomponents are statistically independent and non-Gaussian. It is a special case of blind source separation.
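
A minimal sketch of ICA with scikit-learn's FastICA, recovering independent sources from linear mixtures; the synthetic signals and mixing matrix are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sources, mixed linearly (a blind source separation setup).
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # sine wave + square wave
A = np.array([[1.0, 0.5], [0.5, 2.0]])            # mixing matrix
X = S @ A.T                                        # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)  # recovered independent components

print(S_estimated.shape)  # (2000, 2)
```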

10
Q
Benefits of Feature Extraction
A

The main benefits of feature extraction include lower computational cost, reduced problem complexity, and, in some settings, better data privacy, since models can work with derived features rather than the raw data itself.

11
Q
Limitations of Feature Extraction
A

The limitations of feature extraction include loss of interpretability, since the new composite features may not have a clear real-world meaning, and the risk of information loss caused by reducing the dimensionality.
