Anomaly Detection + Distances Flashcards

1
Q

What is DTW and how does it work?

A

Dynamic Time Warping is a sequence-alignment technique: it aligns two time series non-linearly in time (stretching or compressing segments as needed) and returns the cost of the best alignment.
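
A minimal sketch of the classic DTW dynamic program (pure Python, assuming 1-D sequences and absolute-difference local cost):

import math

def dtw(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] and b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local match cost
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

print(dtw([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: same shape, shifted in time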

2
Q

How can you create an Isolation Forest?

A

Repeat N times (once per tree):
* Randomly pick a feature (dimension) f
* Split f at a random value between its [min, max]
* Continue until all leaves contain singletons
* The path length from the root to a point's leaf is its isolation score
* Average this length over all trees to get the anomaly score: the shorter the average path, the more anomalous the point (a sketch follows below)
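
A hedged sketch using scikit-learn's IsolationForest (in scikit-learn, score_samples returns values where lower means more anomalous):

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # normal points
X = np.vstack([X, [[8.0, 8.0]]])     # one obvious outlier

clf = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = clf.score_samples(X)        # lower = shorter average path = more anomalous
print(scores.argmin())               # 200: the injected outlier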

3
Q

What are the two main ways to approach anomaly detection?

A
  • Distance-based: a point is anomalous when it is far from other points
  • Density-based: a point is anomalous when it lies in a low-density region

4
Q

How to compute local reachability density (lrd)?

A

lrd(p) = k / (sum of reachability distances from p to its k nearest neighbours); the reachability distance is reach-dist(p, o) = max(k-distance(o), d(p, o)).
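
A worked sketch in NumPy, using plain distances to the k nearest neighbours (the full definition substitutes reachability distances):

import numpy as np

def lrd(X, k=2):
    """Simplified local reachability density: k over the summed
    distance from each point to its k nearest neighbours."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)           # a point is not its own neighbour
    knn = np.sort(D, axis=1)[:, :k]       # the k smallest distances per point
    return k / knn.sum(axis=1)

X = np.array([[0.0, 0], [0, 1], [1, 0], [10, 10]])
print(lrd(X))  # the isolated point [10, 10] gets a much lower density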

5
Q

How to compute local outlier factor (lof)?

A

lof(p) = (1/k) * (sum of lrd of the k neighbours) / lrd(p). Values near 1 mean p is about as dense as its neighbours; values well above 1 flag outliers.
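
In practice one would likely use scikit-learn's LocalOutlierFactor, which implements the full definition; a minimal sketch:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[0.0, 0], [0, 1], [1, 0], [1, 1], [10, 10]])
lof = LocalOutlierFactor(n_neighbors=3).fit(X)
print(-lof.negative_outlier_factor_)  # about 1 for inliers, much larger for [10, 10]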

6
Q

What is a residual?

A

The difference between the expected (predicted) value and the observed value.

7
Q

What is PCA?

A

Principal Component Analysis reduces the dimensionality of data by finding the orthogonal directions (principal components) that capture most of the data's variance.
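
A minimal sketch with scikit-learn, assuming 2-D data that mostly varies along one direction:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))                  # hidden 1-D factor
X = t @ [[3.0, 1.0]] + rng.normal(scale=0.1, size=(500, 2))

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)           # first component captures almost all variance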

8
Q

Why is PCA useful for anomaly detection?

A

The components PCA finds should explain most of the variance, so normal points vary mainly along the top components. Outliers show variability along the smallest components, which should be nearly constant: datapoints that vary in these unexplained dimensions are anomalous.
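
A sketch of the reconstruction-error approach: keep only the top component, reconstruct, and score points by the residual in the dropped (smallest) component:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ [[3.0, 1.0]] + rng.normal(scale=0.05, size=(200, 2))
X = np.vstack([X, [[-1.0, 3.0]]])          # a point off the principal axis

pca = PCA(n_components=1).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))
err = np.linalg.norm(X - X_hat, axis=1)    # variability in the dropped component
print(err.argmax())                        # 200: the off-axis point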

9
Q

How does dimensionality reduction find outliers in general?

A

Abnormal data maps to an abnormal (latent) code, so when we reconstruct from that code, anomalies stand out through their large reconstruction error.

10
Q

What is edit distance?

A

The minimum number of changes (insertions, deletions, and substitutions) required to turn one series into another.
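
A minimal Levenshtein sketch (assuming insertions, deletions, and substitutions all cost 1):

def edit_distance(a, b):
    """Minimum number of edits needed to turn sequence a into sequence b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                          # delete all of a[:i]
    for j in range(m + 1):
        dp[0][j] = j                          # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            dp[i][j] = min(sub,               # substitute (or match for free)
                           dp[i - 1][j] + 1,  # delete
                           dp[i][j - 1] + 1)  # insert
    return dp[n][m]

print(edit_distance("kitten", "sitting"))  # 3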

11
Q

What are the properties of a distance metric?

A

d(a,b) >= 0 (non-negativity)
d(a,b) = 0 iff a = b (identity)
d(a,b) = d(b,a) (symmetry)
d(a,c) <= d(a,b) + d(b,c) (triangle inequality)

12
Q

How does K-means work?

A

Randomly choose as many cluster centers as the number of clusters you want.

Assign each observation to the cluster whose center is closest, then recompute each center as the centroid of its assigned observations.

Iterate until the cluster assignments stop changing.
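
A minimal NumPy sketch (ignores the empty-cluster edge case):

import numpy as np

def kmeans(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    while True:
        # assign each observation to its closest center
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        # recompute each center as the centroid of its cluster
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):                   # assignments stopped changing
            return labels, centers
        centers = new

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
print(kmeans(X, 2)[0])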

13
Q

How does hierarchical clustering work?

A

Treat each observation as a cluster.
Find the two closest clusters and merge them. Repeat until all points belong to one cluster.
Create a dendrogram -> remove links from the top until you have as many clusters as you want.
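
A sketch with SciPy; the method argument selects the linkage described in the next card:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])

Z = linkage(X, method="average")                 # repeatedly merge the closest clusters
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
print(labels)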

14
Q

What are the types of cluster distance?

A

Single linkage: distance between closest points from the different clusters

Complete linkage: distance between the furthest points in the clusters

Average linkage: average distance between all pairs of points.

Ward distance: the increase in total within-cluster sum of squares caused by the merge, i.e. the within-cluster SS of the merged cluster minus the sum of the two clusters' separate within-cluster SS

15
Q

Does hierarchical clustering need the distance to be a metric?

A

Single, complete, and average linkage only require symmetry and non-negativity.

Ward and centroid-based linkage require Euclidean distance, since they minimize squared error.

16
Q

How does DBSCAN work?

A

Classify points as Core, Border, or Noise, based on a radius e and an expected neighbour count t.

Core: the number of points at distance e or less is >= t.

Border: the number of points at distance e or less is < t, but there is a core point within distance e.

Noise: neither core nor border.

DBSCAN only requires the distance to be symmetric and non-negative.
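
A minimal sketch with scikit-learn, where eps plays the role of e and min_samples the role of t:

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)),
               rng.normal(4, 0.2, (30, 2)),
               [[10.0, 10.0]]])               # a lone point far from both blobs

db = DBSCAN(eps=0.5, min_samples=5).fit(X)
print(db.labels_[-1])                         # -1: the lone point is labelled noise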

17
Q

Graph-based (spectral) clustering

A

Construct a neighbourhood graph G by connecting points whose distance is below a threshold, or each point to its nearest neighbours. Set the edge weights from the distances. Find communities using graph mining and return the found communities as clusters.
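
A sketch with scikit-learn's SpectralClustering, which builds a nearest-neighbour graph and partitions it via the graph Laplacian (a stand-in for the generic graph-mining step):

import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors", n_neighbors=5)
print(sc.fit_predict(X))   # two graph communities = two clusters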

18
Q

When is a clustering considered good in terms of inter and intra - cluster distances?

A

When the ratio of intra-cluster distance to inter-cluster distance is small (tight clusters that are far apart).

19
Q

What is CURE?

A

Clustering Using REpresentatives is useful for non-centroid clustering (arbitrarily shaped clusters). Since comparing all points is very expensive, we select a few points to represent each cluster (prototyping).

20
Q

What does BFR do?

A
  • aims to minimize the time and space required by k-means
  • points can be assigned to three sets:
    > discard set - assigned to a pre-existing cluster
    > compressed set - mini-clusters
    > retained set - outliers, points that do not belong to any cluster
21
Q

What do we require for a cluster in BFR?

A
  • N - the number of data points in the cluster
  • SUM - a vector with the sums of all feature values
  • SUMSQ - a vector with sums of squared feature values

With these we can keep track of the centroid and variance of the cluster, and compute a threshold for cluster membership.
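
A toy example of the bookkeeping (the numbers are made up for illustration):

import numpy as np

N = 4                                   # points in the cluster
SUM = np.array([8.0, 12.0])             # per-feature sums
SUMSQ = np.array([18.0, 38.0])          # per-feature sums of squares

centroid = SUM / N                      # [2., 3.]
variance = SUMSQ / N - centroid ** 2    # per-feature variance: [0.5, 0.5]
std = np.sqrt(variance)

# Membership test: accept a new point if its normalized distance to the
# centroid is below some threshold (a Mahalanobis-style distance, assuming
# independent features)
p = np.array([2.5, 3.5])
print(np.sqrt((((p - centroid) / std) ** 2).sum()))  # 1.0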

22
Q

What are the 3 types of anomalies?

A

Point: a single datapoint is anomalous w.r.t. the rest of the data.
Contextual: detectable only in context. Using sliding windows and the assumption that the next datapoint should be close to the previous one, large leaps are considered anomalous.
Collective: the individual instances within the collective anomaly are not anomalous by themselves; they require a relationship among data instances (sequential, spatial, graph). Also detected with sliding windows.

23
Q

What is hamming distance?

A

The number of positions at which two equal-length bit strings differ.
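
A one-line sketch:

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming("10110", "10011"))  # 2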

24
Q

What do silhouette values of 1 and -1 indicate?

A

1 means the point is well matched to its own cluster (good clustering); -1 means it is mixed in with, and would fit better in, a neighbouring cluster. Values near 0 indicate boundary points.
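
A sketch computing the mean silhouette with scikit-learn:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))   # close to 1: well-separated clusters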