Machine Learning - Unsupervised Flashcards

1
Q

Anomaly Detection

A

Identifying items, events, or observations that do not conform to an expected pattern or to the other items in a data set. Common applications include fraud detection, fault detection, and network intrusion detection.

2
Q

Cluster Analysis

A

Methods to assign a set of objects into groups, called clusters, such that objects in a cluster are more similar to each other than to objects in other clusters. Well-known algorithms include hierarchical clustering, k-means, fuzzy clustering, and spectral clustering.

3
Q

Clustering: Canopy

A

A preprocessing step for k-means or hierarchical clustering, intended to speed up clustering operations on large data sets:

1. Begin with the set of data points to be clustered.
2. Remove a point from the set, beginning a new 'canopy'.
3. For each point left in the set, assign it to the new canopy if its distance to the canopy's starting point is less than the loose distance T1.
4. If the distance is additionally less than the tight distance T2, remove the point from the original set.
5. Repeat from step 2 until there are no more data points in the set to cluster.

These relatively cheaply formed canopies can then be sub-clustered using a more expensive but accurate algorithm, as in the sketch below.
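
A minimal Python sketch of the steps above, assuming Euclidean distance; the function name and thresholds are illustrative, not taken from any particular library.

```python
import numpy as np

def canopy(points, t1, t2):
    """Group points into (possibly overlapping) canopies.

    t1: loose distance (canopy membership); t2: tight distance (removal).
    Requires t2 < t1. Returns a list of (center, member_indices) pairs.
    """
    assert t2 < t1
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        # Start a new canopy from an arbitrary remaining point.
        center_idx = remaining.pop(0)
        center = points[center_idx]
        members = [center_idx]
        kept = []
        for idx in remaining:
            d = np.linalg.norm(points[idx] - center)
            if d < t1:
                members.append(idx)  # within the loose radius: join this canopy
            if d >= t2:
                kept.append(idx)     # outside the tight radius: stays available
        remaining = kept
        canopies.append((center, members))
    return canopies

canopies = canopy(np.random.rand(100, 2), t1=0.5, t2=0.2)
```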

4
Q

Clustering: Definition

A

Methods to assign a set of objects into groups. Objects in a cluster are more similar to each other than to those in other clusters. Enables understanding of the differences as well as the similarities within the data.

5
Q

Cluster Analysis: Distance Measures Between Clusters

A

In hierarchical clustering:

1. Average linkage: the average distance between all pairs of points, one drawn from each of the two clusters.
2. Single linkage: the distance between the nearest points in the two clusters.
3. Complete linkage: the distance between the farthest points in the two clusters.
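
SciPy's hierarchical clustering exposes these linkage criteria directly; a brief sketch (the data here is random and purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)  # 20 points in 2-D

# The 'method' argument selects the between-cluster distance measure.
Z_avg = linkage(X, method='average')   # average linkage
Z_sgl = linkage(X, method='single')    # single linkage (nearest points)
Z_cmp = linkage(X, method='complete')  # complete linkage (farthest points)

# Cut the average-linkage dendrogram into 3 flat clusters.
labels = fcluster(Z_avg, t=3, criterion='maxclust')
```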

6
Q

Cluster Analysis: Distance Metrics Between Items

A
1. Euclidean distance: the geometric (straight-line) distance between objects in multidimensional space, i.e., the shortest path between two objects. Used to obtain sphere-shaped clusters.
2. City block (Manhattan) distance: the sum of the distances along each dimension; less sensitive to outliers. Used to obtain diamond-shaped clusters.
3. Cosine similarity: the cosine of the angle between two objects; used mostly to compute the similarity between two sets of transaction data.
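
A quick NumPy illustration of the three measures (the vectors are arbitrary; note that b = 2a, so the cosine similarity comes out exactly 1):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

euclidean = np.linalg.norm(a - b)  # straight-line distance
manhattan = np.sum(np.abs(a - b))  # sum of per-dimension distances
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle, not magnitude
```
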
7
Q

Clustering: Flavors

A

Ward hierarchical clustering, k-means, Gaussian mixture models, spectral clustering, BIRCH, affinity propagation, fuzzy clustering

8
Q

Cluster Analysis: Gaussian Mixture Models (GMM)

A

An unsupervised learning technique for clustering that generates a mixture of clusters from the full data set, using a Gaussian (normal) distribution model for each cluster. The GMM's output is a set of cluster attributes (mean, covariance, and mixture weight) for each cluster, thereby producing characterization metadata that serves as a compact descriptive model of the full data collection.
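
A short sketch using scikit-learn's GaussianMixture on illustrative random data; the fitted attributes correspond to the per-cluster parameters described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(200, 2)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# The compact descriptive model: per-cluster mean, covariance, and weight.
print(gmm.means_, gmm.covariances_, gmm.weights_)

labels = gmm.predict(X)  # hard assignments; predict_proba(X) gives soft ones
```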

9
Q

Cluster Analysis: Hierarchical Clustering

A

Builds a hierarchy of clusters, either agglomeratively (each object starts in its own cluster and the closest pairs of clusters are merged, moving up the hierarchy) or divisively (all objects start in one cluster, which is split recursively, moving down). The result is typically visualized as a dendrogram, which can be cut at a chosen level to produce a flat clustering.

10
Q

Clustering: K-Means

A

For a given k, finds k clusters by iteratively moving the cluster centers to the clusters' centers of gravity and re-adjusting the point-to-cluster assignments.

11
Q

Cluster Analysis: K-Means Overview

A

What: k-means is one of the most widely used clustering techniques because of its simplicity and speed. It partitions the data into a user-specified number of clusters, k. Why: simplicity and speed; it is fast for large data sets, which are common in segmentation.

12
Q

Cluster Analysis: K-Means: 4 Key Steps

A

1. Initialize k centroids.
2. Assign each data point to its nearest centroid.
3. Relocate each centroid to the center (mean) of its points.
4. Repeat steps 2 and 3 until assignments no longer change.

13
Q

Cluster Analysis: K-Means: Cautions

A

K-means may converge to a local minimum of its objective, so the clusters obtained might not be the right ones. To avoid this, it can help to run the algorithm with different initial cluster centroids and compare the results, as in the sketch below.
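
scikit-learn's KMeans does these restarts automatically; a brief sketch on illustrative data:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(500, 2)

# n_init=10 runs k-means from 10 different random centroid seeds and
# keeps the run with the lowest inertia (within-cluster sum of squares),
# mitigating convergence to a poor local minimum.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.inertia_, km.cluster_centers_)
```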

14
Q

Cluster Analysis: K-Means: How

A
1. Initialization: pick the initial k cluster representatives, or 'centroids'. These initial seeds can be sampled at random from the data set, or taken by clustering a small subset of the data.
2. Data assignment: assign each data point to its closest centroid, with ties broken arbitrarily. This results in a partitioning of the data.
3. Relocation of the 'means': relocate each cluster representative to the center (mean) of all data points assigned to it.
4. Repeat steps 2 and 3 until the convergence criterion is met (e.g., the assignment of objects to clusters no longer changes over multiple iterations) or the maximum number of iterations is reached.
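
A minimal NumPy implementation of these steps (Lloyd's algorithm); it assumes no cluster ever goes empty, which production code would need to handle:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialization, sampling k data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: data assignment, each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: relocate each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids (and hence assignments) are stable.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centers = kmeans(np.random.rand(300, 2), k=3)
```
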
15
Q

Cluster Analysis: K-Means: Scaling Options

A

Each iteration needs N × k distance comparisons, which determines the time complexity of one iteration. The number of iterations required for convergence varies and may depend on N, but as a first cut this algorithm can be considered linear in the data set size. The k-means algorithm can also take advantage of data parallelism: when the data objects are distributed across processors, the assignment step can be parallelized easily by assigning each object to its nearest centroid in parallel.
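
Another common scaling option (a related technique, not described in this card) is scikit-learn's MiniBatchKMeans, which updates centroids from small random batches so each iteration costs roughly batch_size × k rather than N × k:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.random.rand(100_000, 10)

# Each iteration touches only a 1024-point batch instead of all N points.
mbk = MiniBatchKMeans(n_clusters=8, batch_size=1024, random_state=0).fit(X)
```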

16
Q

Clustering: Preference Bias

A

Prefers data that is in groupings given some form of distance (Euclidean, Manhattan, or others)

17
Q

Clustering: Restriction Bias

A

No restriction

18
Q

Clustering: Type

A

Unsupervised learning; class type: clustering

19
Q

Gaussian Mixture Models

A

A probabilistic model that assumes the data are generated from a mixture of a finite number of Gaussian distributions with unknown parameters, typically fit with the expectation-maximization (EM) algorithm. See Cluster Analysis: Gaussian Mixture Models (GMM).

20
Q

Hidden Markov Models

A

A Markov model in which the underlying state sequence is hidden: only the observations emitted from each state are visible, and the hidden states must be inferred from the observation sequence.

21
Q

Hidden Markov Models: Cons

A

The Markov assumption limits the ability to capture long-range dependencies; the number of hidden states must be chosen in advance; and training (e.g., with the Baum-Welch/EM algorithm) can converge to a local optimum.

22
Q

Hidden Markov Models: Definition

A

Markov models are a kind of probabilistic model often used in language modeling. The observations are assumed to follow a Markov chain, where each observation is independent of all past observations given the previous one. In a hidden Markov model, the Markov state sequence itself is not observed; only the outputs emitted from each state are visible, and the hidden states are inferred from them.
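
A tiny sketch of the underlying Markov chain idea, using a hypothetical two-state (sunny/rainy) transition matrix:

```python
import numpy as np

# Row i holds P(next state | current state i); each row sums to 1.
P = np.array([[0.9, 0.1],   # sunny -> sunny, sunny -> rainy
              [0.5, 0.5]])  # rainy -> sunny, rainy -> rainy

rng = np.random.default_rng(0)
state = 0  # start in state 0 ('sunny')
states = [state]
for _ in range(10):
    state = rng.choice(2, p=P[state])  # depends only on the current state
    states.append(state)
```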

23
Q

Hidden Markov Models: Example Applications

A

Temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, and bioinformatics.

24
Q

Hidden Markov Models: Flavors

A

Markov chains, Hidden Markov Models

25
Q

Hidden Markov Models: Preference Bias

A

Generally works well for systems where the Markov assumption holds

26
Q

Hidden Markov Models: Pros

A

Markov chains are useful models of many natural processes and the basis of powerful techniques in probabilistic inference and randomized algorithms.

27
Q

Hidden Markov Models: Restriction Bias

A

Prefers time-series data and memoryless processes

28
Q

Hidden Markov Models: Type

A

Supervised or unsupervised; class type: Markovian

29
Q

Markov Models

A

Markov models are a kind of probabilistic model often used in language modeling. The observations are assumed to follow a Markov chain, where each observation is independent of all past observations given the previous one.

In a Markov chain, a system transitions stochastically from one state to another. It is a memoryless process, in the sense that the distribution over the next state depends only on the current state, and not on the state at any past time. Markov chains are useful models of many natural processes and the basis of powerful techniques in probabilistic inference and randomized algorithms.

A famous Markov chain is the so-called 'drunkard's walk', a random walk on the number line where, at each step, the position may change by +1 or -1 with equal probability. From any position there are two possible transitions, to the next or previous integer. The transition probabilities depend only on the current position, not on the manner in which the position was reached. For example, the transition probabilities from 5 to 4 and from 5 to 6 are both 0.5, and all other transition probabilities from 5 are 0. These probabilities are independent of whether the system was previously at 4 or 6.
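
A minimal simulation of the drunkard's walk described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# At each step the position changes by +1 or -1 with equal probability;
# the next position depends only on the current one (memorylessness).
position = 0
for _ in range(1000):
    position += rng.choice([-1, 1])
print(position)
```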

30
Q

PCA

A

Principal Component Analysis. A dimensionality-reduction technique that projects the data onto the orthogonal directions of maximum variance (the principal components), which are the eigenvectors of the data's covariance matrix, ordered by the amount of variance they explain.

31
Q

SVD

A

Singular Value Decomposition. Factorizes a matrix X into U S V^T, where U and V have orthonormal columns and S is diagonal with non-negative singular values. Widely used for dimensionality reduction, latent semantic analysis, and as the standard way to compute PCA; see the sketch below.

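A NumPy sketch connecting SVD to PCA on illustrative random data (the rows of Vt are the principal components of the centered data):

```python
import numpy as np

X = np.random.rand(100, 5)

# Center the data, then factorize: Xc = U @ np.diag(S) @ Vt.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the top 2 principal components.
X_reduced = Xc @ Vt[:2].T
```
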
32
Q

Unsupervised Learning

A

Allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.