Which machine learning algorithm should I use? Flashcards
(33 cards)
Dimension reduction
Reducing the number of variables under consideration. In many applications, the raw data have very high dimensional features and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to find the true, latent relationship.
Supervised learning
Supervised learning algorithms make predictions based on a set of examples.
- Classification
- Regression
- Forecasting
PCA
An unsupervised dimensionality reduction method which maps the original data space into a lower-dimensional space while preserving as much information as possible. PCA finds the subspace that best preserves the data variance, with the subspace defined by the dominant eigenvectors of the data's covariance matrix.
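A minimal NumPy sketch of PCA as described on this card (the toy data and the choice of one component are illustrative):

```python
import numpy as np

def pca(X, k):
    """Project X onto the k dominant eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort by variance, descending
    components = eigvecs[:, order[:k]]      # dominant eigenvectors
    return Xc @ components, components

# toy data lying near the line y = 2x, so one component captures most variance
X = np.array([[i, 2 * i + 0.1 * (-1) ** i] for i in range(10)], dtype=float)
Z, components = pca(X, k=1)
```

The dominant eigenvector recovered here points (up to sign) along the direction of greatest spread, roughly (1, 2).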
CheatSheet

Linear SVM and kernel SVM
When the classes are not linearly separable, a kernel trick can be used to map a non-linearly separable space into a higher dimension linearly separable space.
When most independent variables (features) are numeric, logistic regression and SVM should be the first try for classification.
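The kernel trick can be illustrated with an explicit feature map on XOR-like data, which no line separates in 2-D; mapping to a 3-D space with the extra feature x1*x2 makes a linear (planar) boundary sufficient. A toy sketch (real kernel SVMs compute these inner products implicitly rather than building the map):

```python
import numpy as np

# XOR-like data: not linearly separable in the original 2-D space
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# explicit feature map phi(x) = (x1, x2, x1*x2); in 3-D the classes
# are separated by the linear boundary z = 0
phi = np.column_stack([X, X[:, 0] * X[:, 1]])
predictions = np.sign(phi[:, 2])
```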

Unsupervised: Clustering

Factors to consider in ML algorithm
- The size, quality, and nature of data.
- The available computational time.
- The urgency of the task.
- What you want to do with the data.
Supervised: Classification

SVD
- SVD is also widely used as a topic modeling tool, known as latent semantic analysis, in natural language processing (NLP).
- SVD of a user-versus-movie matrix is able to extract the user profiles and movie profiles which can be used in a recommendation system.
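A toy sketch of the recommendation use: a truncated SVD of a small user-by-movie rating matrix yields low-rank user and movie profiles (the ratings and rank are illustrative):

```python
import numpy as np

# toy user-by-movie rating matrix (rows: users, columns: movies)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # rank-2 approximation of R
user_profiles = U[:, :k] * s[:k]            # rows: latent user profiles
movie_profiles = Vt[:k].T                   # rows: latent movie profiles
```

Similar users (here, users 0 and 1, who rate the same movies highly) end up with similar latent profiles, which is what a recommender exploits.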
Classification
When the data are being used to predict a categorical variable
DBSCAN
When the number of clusters k is not given, DBSCAN (density-based spatial clustering of applications with noise) can be used; it connects samples through density diffusion.
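A minimal NumPy sketch of the density-diffusion idea (the eps and min_pts values are illustrative, and a point counts itself among its neighbors):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: -1 marks noise, other labels are cluster ids."""
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                    # already assigned, or not a core point
        labels[i] = cluster
        stack = list(neighbors[i])      # grow the cluster by density diffusion
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:  # j is a core point too
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

# two dense blobs plus one isolated outlier
blob1 = np.array([[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1]])
X = np.vstack([blob1, blob1 + 5, [[10, 10]]])
labels = dbscan(X, eps=0.5, min_pts=3)
```

Note that k was never specified: the two clusters emerge from the density structure, and the outlier is flagged as noise.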

Regression
When predicting continuous values
Hierarchical result
Use hierarchical clustering.
Semi-supervised learning
Use unlabeled examples with a small amount of labeled data to improve the learning accuracy.
When trying to solve a new ML problem what are the three steps?
- Define the problem. What problems do you want to solve?
- Start simple. Be familiar with the data and the baseline results.
- Then try something more complicated.
Why we need PCA, SVD and LDA
We generally do not want to feed a large number of features directly into a machine learning algorithm since some features may be irrelevant or the "intrinsic" dimensionality may be smaller than the number of features.
Supervised: Regression

Neural networks and deep learning
- A neural network consists of three parts: an input layer, hidden layers, and an output layer.
- The number of hidden layers defines the model complexity and modeling capacity.
- If the output layer is a categorical variable, the network addresses classification problems.
- If the output layer is a continuous variable, the network can be used for regression.
- If the output layer is the same as the input layer, the network can be used to extract intrinsic features (i.e., as an autoencoder).
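The three-part structure can be sketched as a single forward pass in NumPy (the layer sizes and random weights are illustrative; a softmax output corresponds to the classification case):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """Input layer -> one hidden layer (ReLU) -> softmax output layer."""
    h = np.maximum(0, x @ W1 + b1)      # hidden layer activations
    logits = h @ W2 + b2                # output layer (pre-softmax)
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()                  # class probabilities

W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # input dim 4, 8 hidden units
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)  # 3 output classes
probs = forward(rng.normal(size=4), W1, b1, W2, b2)
```

Swapping the softmax for a linear output gives the regression case, and setting the output size equal to the input size gives the feature-extraction (autoencoder) case.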
What are [1], [2], [3], and [4]?

[1] Unsupervised: Dimensionality Reduction
[2] Unsupervised: Clustering
[3] Supervised: Regression
[4] Supervised: Classification
Considerations when choosing an algorithm
- Accuracy (Phase III)
- Training time (Phase II)
- Ease of use (Phase I)
What are PCA, SVD and LDA
Principal component analysis (PCA)
Singular value decomposition (SVD)
Latent Dirichlet allocation (LDA)
Hierarchical clustering
Hierarchical partitions can be visualized using a tree structure (a dendrogram). Hierarchical clustering does not need the number of clusters as an input, and the partitions can be viewed at different levels of granularity (i.e., clusters can be refined or coarsened) by cutting the dendrogram at different levels.
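A naive NumPy sketch of agglomerative clustering with single (minimum) linkage; cutting at a different n_clusters corresponds to reading the dendrogram at a different level (the toy data are illustrative):

```python
import numpy as np

def single_linkage(X, n_clusters):
    """Repeatedly merge the two closest clusters (single-linkage distance)."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the two closest clusters
        del clusters[b]
    return clusters

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
result = single_linkage(X, n_clusters=2)
```

This O(n^3) loop is only for illustration; library implementations build the full merge tree once and let you cut it at any level.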

Perform dimension reduction
Principal component analysis
k-means, k-modes, and GMM (Gaussian mixture model) clustering
- Clustering aims to partition n observations into k clusters.
- K-means uses hard assignment: each sample is associated with one and only one cluster.
- GMM uses soft assignment: each sample has a probability of being associated with each cluster.
- Both algorithms are simple and fast enough for clustering when the number of clusters k is given.
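The hard-vs-soft contrast can be sketched in NumPy. Below is a minimal k-means (hard assignment), followed by a simplified GMM-style soft assignment computed from the same centers (equal weights and unit variances are assumed; a real GMM fits means, covariances, and weights via EM):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means: alternate hard assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)    # hard assignment: one cluster per sample
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = kmeans(X, k=2)

# GMM-style soft assignment (simplified: equal weights, unit variances):
# each sample gets a probability of belonging to each cluster
d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
resp = np.exp(-0.5 * d2)
resp /= resp.sum(axis=1, keepdims=True)  # each row sums to 1
```

Note that both require k up front, which is the limitation the DBSCAN card addresses.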

