20-anomaly detection Flashcards

1
Q

What is an outlier / anomaly?

A

A pattern in the data that does not conform to the normal/standard/expected behaviour

2
Q

What are applications of anomaly detection?

A

Fraud detection
Ecosystem disturbance
Medicine and public health
Aviation safety

3
Q

What are the types of anomalies?

A

Point/global anomalies
Contextual/conditional anomalies
Collective anomalies

4
Q

What is a point / global anomaly?

A

An individual data instance that is anomalous with respect to the rest of the data

5
Q

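What are the two types of attributes in contextual anomaly detection?

A

Contextual attributes: define the context of a data instance (e.g. time or location)
Behavioural attributes: define the non-contextual characteristics of the instance (e.g. the measured temperature)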
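A minimal Python sketch, assuming hypothetical (month, temperature) readings where the month is the contextual attribute and the temperature is the behavioural attribute. Each reading is compared only with the other readings from the same month using a leave-one-out z-score; the function name and the threshold of 3 are illustrative choices, not a standard method.

```python
from collections import defaultdict
from statistics import mean, pstdev


def contextual_anomalies(records, threshold=3.0):
    """Return (context, value) records that are anomalous within their context.

    Each value is compared with the other values sharing its context
    (leave-one-out), so a value can be normal globally but anomalous locally.
    """
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        others = list(by_context[context])
        others.remove(value)  # leave this record out of its own baseline
        if len(others) < 2:
            continue  # too few readings in this context to judge
        mu, sigma = mean(others), pstdev(others)
        if sigma == 0:
            if value != mu:
                flagged.append((context, value))
        elif abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged


readings = [("Jan", t) for t in (2, 4, 3, 5, 1)] + [
    ("Jul", t) for t in (30, 31, 29, 32, 30, 5)
]
print(contextual_anomalies(readings))  # [('Jul', 5)]
```

A reading of 5 is normal in January but is flagged in July, which is what makes it a contextual rather than a point anomaly.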

6
Q

What are collective anomalies?

A

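A collection of related data instances is anomalous with respect to the whole data set, even though the individual instances may not be anomalous on their own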
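A minimal sketch of one kind of collective anomaly, assuming a sensor series in which a long run of identical consecutive readings (e.g. a stuck sensor) is suspicious even though each reading is an ordinary value. The hypothetical `flat_runs` helper and its `min_len` parameter are illustrative; other collective anomalies need different detectors.

```python
def flat_runs(series, min_len=4):
    """Return (start, end) index ranges (end exclusive) of runs of identical
    consecutive values whose length is at least min_len."""
    runs = []
    start = 0
    for i in range(1, len(series) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(series) or series[i] != series[start]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


signal = [1, 3, 2, 4, 2, 2, 2, 2, 2, 3, 1, 4]
print(flat_runs(signal))  # [(4, 9)]
```

The value 2 also occurs on its own at index 2 without being flagged; only the run of five is reported.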

7
Q

What is the difference between anomaly and noise?

A

Noise is random error and not interesting. Anomalies are interesting and

8
Q

What is supervised anomaly detection?

A

Labels are available for both normal data and anomalies. A classifier is trained to distinguish between normal data and anomalies
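A minimal sketch of the supervised setting, using a nearest-centroid classifier as a stand-in for whatever classifier is actually chosen. The helper names and the toy labelled data are hypothetical.

```python
from math import dist
from statistics import mean


def fit_centroids(points, labels):
    """Learn one centroid per class from labelled training points."""
    centroids = {}
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*members))
    return centroids


def predict(centroids, point):
    """Classify a point with the label of the nearest class centroid."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.5, 4.8)]
y = ["normal", "normal", "normal", "anomaly", "anomaly"]
model = fit_centroids(X, y)
print(predict(model, (1.1, 1.0)))  # normal
print(predict(model, (4.9, 5.2)))  # anomaly
```

In practice anomalies are usually far rarer than normal data, so the classes are heavily imbalanced, which makes supervised training harder than this toy example suggests.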

9
Q

What is semi-supervised anomaly detection?

A

Labels are available only for normal data. Build a model of normal objects and report objects that do not match the model as outliers
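A minimal sketch of semi-supervised detection on a single feature, assuming the normal class is well described by its mean and standard deviation. Only the labelled normal data is used to fit the model; the 3-standard-deviation threshold and the helper names are illustrative assumptions.

```python
from statistics import mean, stdev


def fit_normal_model(normal_values):
    """Model the labelled normal data by its mean and sample standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_outlier(model, value, threshold=3.0):
    """Report a value as an outlier if it lies more than `threshold`
    standard deviations from the mean of the normal data."""
    mu, sigma = model
    return abs(value - mu) / sigma > threshold


normal_training = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7]  # labelled normal only
model = fit_normal_model(normal_training)
new_values = [10.05, 9.6, 14.0, 6.5]
print([v for v in new_values if is_outlier(model, v)])  # [14.0, 6.5]
```

This also shows the false-alarm risk: a legitimate value such as 11.0 that never appeared in the normal training data would still be flagged.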

10
Q

What are the challenges with semi-supervised anomaly detection?

A

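Requires labelled data from the normal class. Possibly a high false-alarm rate, since legitimate normal behaviour that is missing from the training data will be reported as anomalous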

11
Q

What is unsupervised point anomaly detection?

A

No labels are used. Approaches include proximity-based, density-based, clustering-based and statistical anomaly detection

12
Q

What is statistical anomaly detection?

A

Anomalies are objects that are fit poorly by a statistical model.
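A minimal sketch of the statistical approach, assuming the unlabelled data follow a single normal distribution. The model is fitted to all of the data, and values with a z-score above 3 are reported; the threshold is a common but arbitrary choice and the function name is hypothetical.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Fit a normal distribution to all the (unlabelled) data and return the
    values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


data = [48, 52, 50, 49, 51, 50, 47, 53, 50, 49, 51, 95]
print(zscore_outliers(data))  # [95]
```

The outlier itself inflates the fitted standard deviation, so in a small sample a single extreme value only just clears the threshold (here its z-score is about 3.15).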

13
Q

What is proximity-based anomaly detection?

A

An object is an anomaly if its nearest neighbours are far away. Requires computing the distance between every pair of data points
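A minimal sketch of a proximity-based score, assuming Euclidean distance and using the distance to the nearest neighbour as the outlier score. The full pairwise distance matrix is built explicitly, which is where the O(n^2) cost comes from; the helper name is hypothetical.

```python
from math import dist


def nearest_neighbour_scores(points):
    """Outlier score = distance from each point to its nearest neighbour.

    Builds the full n x n pairwise distance matrix (O(n^2) distances).
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    return [min(d[i][j] for j in range(n) if j != i) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = nearest_neighbour_scores(pts)
print(max(range(len(pts)), key=scores.__getitem__))  # 4
```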

14
Q

What are the ways to detect anomalies through proximity?

A
  • Data points for which there are fewer than p neighboring points within a distance D
  • The top n data points whose distance to the kth nearest neighbor is greatest
  • The top n data points whose average distance to the k nearest neighbors is greatest
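The three criteria above can be sketched in Python as follows, assuming Euclidean distance and that "within a distance D" includes points at exactly D. The function names and the toy data are hypothetical.

```python
from math import dist


def neighbour_distances(points):
    """Sorted distances from each point to every other point (O(n^2) pairs)."""
    return [sorted(dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def fewer_than_p_within_d(points, p, D):
    """Points with fewer than p neighbouring points within distance D."""
    return [i for i, ds in enumerate(neighbour_distances(points))
            if sum(d <= D for d in ds) < p]


def top_n_by_kth_distance(points, k, n):
    """The n points whose distance to their k-th nearest neighbour is greatest."""
    nd = neighbour_distances(points)
    return sorted(range(len(points)), key=lambda i: nd[i][k - 1], reverse=True)[:n]


def top_n_by_avg_knn_distance(points, k, n):
    """The n points whose average distance to their k nearest neighbours is greatest."""
    nd = neighbour_distances(points)
    return sorted(range(len(points)), key=lambda i: sum(nd[i][:k]) / k,
                  reverse=True)[:n]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (6, 6)]
print(fewer_than_p_within_d(pts, p=2, D=1.5))   # [5]
print(top_n_by_kth_distance(pts, k=2, n=1))      # [5]
print(top_n_by_avg_knn_distance(pts, k=2, n=1))  # [5]
```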
15
Q

What are the pros of proximity-based anomaly detection?

A

Determining a proximity measure is often easier than determining a statistical distribution
Gives a quantitative measure of the degree to which an object is an outlier

16
Q

What are the cons of proximity-based anomaly detection?

A

O(n^2) complexity
Outlier score is sensitive to the choice of k
Does not work well when the data has regions of varying density

17
Q

What is density-based outlier detection?

A

Outliers are objects in regions of low density. The outlier score is the inverse of the density around the object

18
Q

What are examples of density scores?

A

Number of objects within a specified distance d of the object. Inverse of the average distance to the k nearest neighbours
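A minimal sketch of both density scores, assuming Euclidean distance, with the outlier score taken as the inverse of the density. The helper names are hypothetical.

```python
from math import dist


def other_distances(points):
    """Sorted distances from each point to every other point (O(n^2) pairs)."""
    return [sorted(dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def count_density(points, d):
    """Density = number of other objects within distance d."""
    return [sum(x <= d for x in ds) for ds in other_distances(points)]


def knn_density(points, k):
    """Density = inverse of the average distance to the k nearest neighbours."""
    densities = []
    for ds in other_distances(points):
        avg = sum(ds[:k]) / k
        densities.append(float("inf") if avg == 0 else 1 / avg)
    return densities


def outlier_scores(densities):
    """Outlier score = inverse of density (infinite where the density is zero)."""
    return [float("inf") if rho == 0 else 1 / rho for rho in densities]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(count_density(pts, d=1.5))                  # [3, 3, 3, 3, 0]
print(outlier_scores(knn_density(pts, k=2))[:4])  # [1.0, 1.0, 1.0, 1.0]
```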

19
Q

What is cluster-based anomaly detection?

A

Outliers are objects that do not belong strongly to any cluster. Assess the degree to which each object belongs to a cluster, e.g. using its distance, or its relative distance, to the closest cluster centroid
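A minimal sketch of cluster-based detection, assuming k-means with Euclidean distance. A point's relative distance is taken as its distance to its own centroid divided by the median such distance in that cluster, and points whose relative distance exceeds a threshold are flagged. Seeding k-means with the first k points, the threshold of 2 and all function names are simplifying, hypothetical choices.

```python
from math import dist
from statistics import mean, median


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means, seeded with the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new.append(tuple(mean(xs) for xs in zip(*members)) if members
                       else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def cluster_outliers(points, k, threshold=2.0):
    """Indices of points whose relative distance to their centroid exceeds threshold."""
    centroids, labels = kmeans(points, k)
    d = [dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        med = median(d[i] for i in idx)
        # Compare d > threshold * med instead of dividing, so med == 0 is safe.
        flagged.extend(i for i in idx if d[i] > threshold * med)
    return sorted(flagged)


pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10), (1, 1), (11, 11), (1, 5)]
print(cluster_outliers(pts, k=2))  # [8]
```

The outlier at index 8 also drags its cluster's centroid from (0.5, 0.5) to (0.6, 1.4), a small instance of outliers affecting the clusters themselves.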

20
Q

What are the pros of cluster-based anomaly detection?

A

Can run in roughly linear, O(n), time, e.g. with k-means. Extends the concept of an outlier from single objects to groups of objects.

21
Q

What are the cons of cluster-based anomaly detection?

A

Requires a distance threshold. Results are sensitive to the number of clusters chosen. Outliers can distort the clusters themselves.

22
Q
A