Which machine learning algorithm should I use? Flashcards
(33 cards)
Dimension reduction
Reducing the number of variables under consideration. In many applications, the raw data have very high dimensional features and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to find the true, latent relationship.
Supervised learning
Supervised learning algorithms make predictions based on a set of examples.
- Classification
- Regression
- Forecasting
PCA
An unsupervised dimensionality reduction method which maps the original data space into a lower-dimensional space while preserving as much information as possible. PCA finds the subspace that best preserves the data variance, with the subspace defined by the dominant eigenvectors of the data's covariance matrix.
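A minimal NumPy sketch of PCA as described on this card (the toy data and the choice of one component are illustrative):

```python
import numpy as np

def pca(X, k):
    """Project X onto the k dominant eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort by variance, descending
    components = eigvecs[:, order[:k]]      # dominant eigenvectors
    return Xc @ components, components

# toy data lying near the line y = 2x, so one component captures most variance
X = np.array([[i, 2 * i + 0.1 * (-1) ** i] for i in range(10)], dtype=float)
Z, components = pca(X, k=1)
```

The dominant eigenvector recovered here points (up to sign) along the direction of greatest spread, roughly (1, 2).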
CheatSheet

Linear SVM and kernel SVM
When the classes are not linearly separable, a kernel trick can be used to map a non-linearly separable space into a higher dimension linearly separable space.
When most independent variables (features) are numeric, logistic regression and SVM should be the first try for classification.
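The kernel trick can be illustrated with an explicit feature map on XOR-like data, which no line separates in 2-D; mapping to a 3-D space with the extra feature x1*x2 makes a linear (planar) boundary sufficient. A toy sketch (real kernel SVMs compute these inner products implicitly rather than building the map):

```python
import numpy as np

# XOR-like data: not linearly separable in the original 2-D space
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# explicit feature map phi(x) = (x1, x2, x1*x2); in 3-D the classes
# are separated by the linear boundary z = 0
phi = np.column_stack([X, X[:, 0] * X[:, 1]])
predictions = np.sign(phi[:, 2])
```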

Unsupervised: Clustering

Factors to consider in ML algorithm
- The size, quality, and nature of data.
- The available computational time.
- The urgency of the task.
- What you want to do with the data.
Supervised: Classification

SVD
- SVD is also widely used as a topic modeling tool, known as latent semantic analysis, in natural language processing (NLP).
- SVD of a user-versus-movie matrix is able to extract the user profiles and movie profiles which can be used in a recommendation system.
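A toy sketch of the recommendation use: a truncated SVD of a small user-by-movie rating matrix yields low-rank user and movie profiles (the ratings and rank are illustrative):

```python
import numpy as np

# toy user-by-movie rating matrix (rows: users, columns: movies)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # rank-2 approximation of R
user_profiles = U[:, :k] * s[:k]            # rows: latent user profiles
movie_profiles = Vt[:k].T                   # rows: latent movie profiles
```

Similar users (here, users 0 and 1, who rate the same movies highly) end up with similar latent profiles, which is what a recommender exploits.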
Classification
When the data are being used to predict a categorical variable
DBSCAN
When the number of clusters k is not given, DBSCAN (density-based spatial clustering of applications with noise) can be used; it connects samples through density diffusion.
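A minimal NumPy sketch of the density-diffusion idea (the eps and min_pts values are illustrative, and a point counts itself among its neighbors):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: -1 marks noise, other labels are cluster ids."""
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                    # already assigned, or not a core point
        labels[i] = cluster
        stack = list(neighbors[i])      # grow the cluster by density diffusion
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:  # j is a core point too
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

# two dense blobs plus one isolated outlier
blob1 = np.array([[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1]])
X = np.vstack([blob1, blob1 + 5, [[10, 10]]])
labels = dbscan(X, eps=0.5, min_pts=3)
```

Note that k was never specified: the two clusters emerge from the density structure, and the outlier is flagged as noise.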

Regression
When predicting continuous values
Hierarchical result
Use hierarchical clustering.
Semi-supervised learning
Use unlabeled examples with a small amount of labeled data to improve the learning accuracy.
When trying to solve a new ML problem what are the three steps?
- Define the problem. What problems do you want to solve?
- Start simple. Be familiar with the data and the baseline results.
- Then try something more complicated.
Why we need PCA, SVD and LDA
We generally do not want to feed a large number of features directly into a machine learning algorithm since some features may be irrelevant or the "intrinsic" dimensionality may be smaller than the number of features.
Supervised: Regression

Neural networks and deep learning
- A neural network consists of three parts: an input layer, hidden layers, and an output layer.
- The number of hidden layers defines the model complexity and modeling capacity.
- If the output layer is a categorical variable, the network addresses classification problems.
- If the output layer is a continuous variable, the network can be used for regression.
- If the output layer is the same as the input layer, the network can be used to extract intrinsic features (i.e., as an autoencoder).
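The three-part structure can be sketched as a single forward pass in NumPy (the layer sizes and random weights are illustrative; a softmax output corresponds to the classification case):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """Input layer -> one hidden layer (ReLU) -> softmax output layer."""
    h = np.maximum(0, x @ W1 + b1)      # hidden layer activations
    logits = h @ W2 + b2                # output layer (pre-softmax)
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()                  # class probabilities

W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # input dim 4, 8 hidden units
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)  # 3 output classes
probs = forward(rng.normal(size=4), W1, b1, W2, b2)
```

Swapping the softmax for a linear output gives the regression case, and setting the output size equal to the input size gives the feature-extraction (autoencoder) case.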
What are [1], [2], [3], and [4]?

[1] Unsupervised: Dimensionality Reduction
[2] Unsupervised: Clustering
[3] Supervised: Regression
[4] Supervised: Classification
Considerations when choosing an algorithm
- Accuracy (Phase III)
- Training time (Phase II)
- Ease of use (Phase I)
What are PCA, SVD and LDA
Principal component analysis (PCA)
Singular value decomposition (SVD)
Latent Dirichlet allocation (LDA)
Hierarchical clustering
Hierarchical partitions can be visualized using a tree structure (a dendrogram). Hierarchical clustering does not need the number of clusters as an input, and the partitions can be viewed at different levels of granularity (i.e., clusters can be refined or coarsened) by cutting the dendrogram at different levels.
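A naive NumPy sketch of agglomerative clustering with single (minimum) linkage; cutting at a different n_clusters corresponds to reading the dendrogram at a different level (the toy data are illustrative):

```python
import numpy as np

def single_linkage(X, n_clusters):
    """Repeatedly merge the two closest clusters (single-linkage distance)."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the two closest clusters
        del clusters[b]
    return clusters

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
result = single_linkage(X, n_clusters=2)
```

This O(n^3) loop is only for illustration; library implementations build the full merge tree once and let you cut it at any level.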

Perform dimension reduction
Principal component analysis
k-means, k-modes, and GMM (Gaussian mixture model) clustering
- Clustering aims to partition n observations into k clusters.
- K-means uses hard assignment: each sample is associated with one and only one cluster.
- GMM uses soft assignment: each sample has a probability of being associated with each cluster.
- Both algorithms are simple and fast enough for clustering when the number of clusters k is given.
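The hard-vs-soft contrast can be sketched in NumPy. Below is a minimal k-means (hard assignment), followed by a simplified GMM-style soft assignment computed from the same centers (equal weights and unit variances are assumed; a real GMM fits means, covariances, and weights via EM):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means: alternate hard assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)    # hard assignment: one cluster per sample
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = kmeans(X, k=2)

# GMM-style soft assignment (simplified: equal weights, unit variances):
# each sample gets a probability of belonging to each cluster
d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
resp = np.exp(-0.5 * d2)
resp /= resp.sum(axis=1, keepdims=True)  # each row sums to 1
```

Note that both require k up front, which is the limitation the DBSCAN card addresses.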

