Lecture 17: 12th November 2019 Flashcards

Intelligent Intrusion Detection Systems

1
Q

What are Intelligent Intrusion Detection Systems?

A

Conceptual IDSes that are accurate enough in their detection and classification of intrusions to be called intelligent.

2
Q

What are some types of IDS?

A
  • Host Based (HIDS)
  • Network Based (NIDS)
  • Hypervisor Based
  • Application Based
  • Protocol Based
  • Server Based
3
Q

What are HIDS?

A

Host-Based IDS (HIDS): an intrusion detection system that monitors the computer infrastructure on which it is installed, analysing traffic and logging malicious behaviour. These examine log files and verify message digests/checksums of key system files - effectively an internal system monitor.

A Network-based IDS (NIDS) looks for attack signatures in network traffic, whereas a Host-based IDS (HIDS) looks for attack signatures in the log files of hosts.

4
Q

What are NIDS?

A

Network-Based IDS (NIDS): IDSes intelligently distributed within networks that passively inspect traffic traversing the devices on which they sit and report back to an administrator. They can be trained to recognise attack signatures via pattern matching (e.g. expressions, bytecode) or frequency/threshold crossing (e.g. uncommon port usage).

A Network-based IDS (NIDS) looks for attack signatures in network traffic, whereas a Host-based IDS (HIDS) looks for attack signatures in the log files of hosts.

5
Q

What are hypervisor-based IDSes?

A

A proposed cloud-based IDS to overcome the limitations of network load in a cloud.

6
Q

What are application-based IDSes?

A

IDSes which control data exchange per application, such as a web browser or email client.

7
Q

What are protocol-based IDSes?

A

IDSes which control one protocol, e.g. all HTTP traffic

8
Q

What are server-based IDSes?

A

IDSes shared by a subnetwork or server group

9
Q

What are some methods of intrusion detection used in IDSes? Which of them require a training phase?

A
  • Misuse Based
  • Anomaly Based
  • Classification Based
  • Combination Based or Hybrid Approaches

The last three require a training phase.

10
Q

What is Misuse Based intrusion detection?

A

The misuse-based approach uses a set of signatures representing the patterns of already-known attacks to filter malicious activities. Activities are matched against previously defined patterns, so the approach can't be applied to new intrusion types dynamically. Pattern matching can be accelerated with general-purpose GPUs.
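
As a sketch, the signature-matching idea can be shown with a toy Python example (the signature list and payloads are invented for illustration, not real IDS rules):

```python
import re

# Toy signature database: pattern -> attack name (illustrative only)
SIGNATURES = {
    r"(?i)union\s+select": "SQL injection",
    r"\.\./\.\./": "directory traversal",
    r"(?i)<script>": "cross-site scripting",
}

def match_signatures(payload):
    """Return the names of all known attack patterns found in a payload."""
    return [name for pattern, name in SIGNATURES.items()
            if re.search(pattern, payload)]

hits = match_signatures("GET /item.php?id=1 UNION SELECT password FROM users")
# A payload matching no known pattern yields no hits - the misuse-based
# weakness: novel attacks are invisible until a signature is written.
misses = match_signatures("GET /index.html")
```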

11
Q

What is Anomaly Based intrusion detection?

A

An anomaly-based intrusion detection system is an intrusion detection system for detecting both network and computer intrusions and misuse by monitoring system activity and classifying it as either normal or anomalous. The classification is based on statistics, ML algorithms, or heuristics, rather than discrete patterns or signatures, and attempts to detect any type of misuse that falls out of normal system operation. This is as opposed to signature-based systems, which can only detect attacks for which a signature has previously been created.
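
A minimal illustrative sketch of the statistical idea, assuming a single per-metric baseline (the traffic numbers are invented):

```python
import statistics

def fit_baseline(values):
    """Learn 'normal' behaviour as the mean and standard deviation of a
    metric (e.g. requests per minute) observed in a clean training window."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomalous(value, mean, stdev, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - mean) > threshold * stdev

mean, stdev = fit_baseline([98, 102, 100, 97, 103, 99, 101])
is_anomalous(500, mean, stdev)  # a burst far outside normal operation
```

Unlike the signature approach, nothing here encodes a specific attack; anything sufficiently far from the learned baseline is flagged, including attack types never seen before.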

12
Q

What is Classification Based intrusion detection?

A

IDSes using classification algorithms or methods such as binary or multi-class classification, Decision Trees, SVM, NN, Bayes, or KNN (K Nearest Neighbour). These require a pre-phase of labelling but can be applied to new intrusion types.
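
For illustration, a tiny KNN classifier can be written in plain Python (the labelled flow features here are hypothetical):

```python
import math

def knn_classify(samples, query, k=3):
    """Label a query point by majority vote among its k nearest labelled
    training samples (Euclidean distance on the feature vectors)."""
    by_distance = sorted(samples, key=lambda s: math.dist(s[0], query))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical labelled flows: (packet_rate, mean_packet_size) -> class
training = [
    ((10, 500), "normal"), ((12, 480), "normal"), ((9, 520), "normal"),
    ((900, 40), "anomalous"), ((950, 60), "anomalous"), ((880, 50), "anomalous"),
]
knn_classify(training, (920, 55))  # lands near the anomalous cluster
```

The "pre-phase of labelling" is visible here: every training flow must already carry a class label before the classifier can be used.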

13
Q

What is Combination Based/Hybrid Approaches intrusion detection?

A

Approaches that combine the best of the other techniques, but suffer from high computational costs.

14
Q

Which new technologies might give rise to new types of intrusion detection?

A

cloud computing and IoT

15
Q

What is batch processing? What is stream processing? Which types of IDS use each?

A

Batch processing = load in large chunks of data and then process it.

Stream processing = process data as it comes in.

Traditional IDSes (all those discussed) almost always use batch processing, but with cloud, IoT, and other large-scale applications, stream processing makes more sense because it gives real-time results.
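
The difference can be sketched with a running mean - batch computes over the whole dataset at once, while streaming updates O(1) state as each item arrives:

```python
def batch_mean(values):
    """Batch processing: load the whole dataset, then compute."""
    return sum(values) / len(values)

class StreamingMean:
    """Stream processing: update the statistic per item, keeping O(1)
    state - the current result is available in real time."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # estimate after this item

stream = StreamingMean()
for packet_size in [64, 128, 512, 64]:
    estimate = stream.update(packet_size)
assert estimate == batch_mean([64, 128, 512, 64])  # same final answer
```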

16
Q

What are ensemble methods? When are they used?

A

Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking). Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

They’re used to build single models from big datasets.
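
A minimal bagging sketch in Python (the 1-D data and the nearest-class-mean base learner are invented for illustration; real ensembles use stronger learners such as decision trees):

```python
import random

def train_bagged(data, learn, n_models=5, seed=0):
    """Bagging: train several base models, each on a bootstrap resample
    (drawn with replacement) of the training data."""
    rng = random.Random(seed)
    return [learn([rng.choice(data) for _ in data]) for _ in range(n_models)]

def learn_nearest_mean(sample):
    """Trivial base learner: predict the class whose mean value is closest."""
    means = {}
    for label in {l for _, l in sample}:
        values = [v for v, l in sample if l == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda lbl: abs(x - means[lbl]))

def vote(models, x):
    """Ensemble prediction: majority vote across the bagged models."""
    predictions = [m(x) for m in models]
    return max(set(predictions), key=predictions.count)

# Hypothetical 1-D data: class 0 clusters near 0, class 1 near 10.
data = [(i * 0.1, 0) for i in range(10)] + [(10 + i * 0.1, 1) for i in range(10)]
models = train_bagged(data, learn_nearest_mean)
```

Because each model sees a different resample, averaging their votes reduces the variance of any single model, which is the "bagging" case mentioned above.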

17
Q

What is drift?

A

Change in a model property over time

18
Q

What are the two types of drift we talk about?

A

concept and feature

19
Q

What is concept drift?

A

Concept Drift is when a data distribution varies over time (abrupt, incremental, gradual or recurring drifts) and describes the nature of network traffic.

In predictive analytics and machine learning, the concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.

20
Q

What is feature drift?

A

Feature Drift is when features change over time as changes in data patterns dictate different levels of features (packet types). Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; thus, learners must detect and adapt to these changes accordingly.

21
Q

What is MOA?

A

Massive Online Analysis = an open-source software framework for building and running machine learning or data mining experiments on evolving data streams.

22
Q

What is the quality of datasets like in IDS?

A

Poor. Most are outdated and sparse because companies don’t want to share their data for commercial and privacy-related reasons, which has an adverse effect on the quality of IDSes. Datasets from before 2000 are still commonly used in academia today; companies probably have their own in-house datasets.

23
Q

What are some common problems with IDS?

A

Some pre-processing is needed for data cleaning: formatting the data and removing redundant, duplicate, or incomplete values. Some fragmented packets may be removed. Some data may also be transformed into normalised forms.

With any predictive model building, a training phase must occur. This often uses feature selection, with features such as source and destination addresses, protocol types, packet length, timings, flags, etc. Another technique is dimension reduction, which combines features into subgroups to create a lower number of manipulable dimensions.

Also class balancing: keeping IDSes accurate even when anomalous packets are few and the vast majority are normal, since the imbalance overweights the normal class.

Lack of realistic modern datasets to develop new IDS techniques. Old and synthetic data are used.

Compensating for and mitigating drift.

Maintaining high network performance (speed) whilst still detecting intrusions and maintaining security.

Lots of time is needed for training with some types of predictive models. Every algorithm and technique involves tradeoffs - none is anywhere near perfect.

24
Q

What are classes in IDS?

A

Labelled groups applied to packets, such as normal vs anomalous

25
Q

What is the class balancing problem?

A

The idea that the vast majority of traffic being normal leads to inaccurate prediction algorithms in IDSes. The difference can make ML/DM algorithms favour normality too much to be correct more often, resulting in a high FAR/FPR.

26
Q

How can you gain or split data samples for training and sampling with predictive models?

A
  • K-fold partitioning: build a model on K-1 folds (splits) of the data and evaluate it against the held-out fold, rotating which fold is held out. The Leave One Out (LOO) method is the extreme case where K equals the number of instances.
  • Hold out: K = 2, implying one part for training, one for testing.
  • Prospective sampling uses a newly sampled dataset separate from the training set.
  • Randomization methods sample instances without replacement.
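
K-fold partitioning can be sketched in a few lines of Python (illustrative only):

```python
def k_fold_splits(data, k):
    """Partition `data` into k folds; yield (train, test) pairs where each
    fold in turn is held out for testing and the rest used for training."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
splits = list(k_fold_splits(data, 5))
# 5 splits, each with 8 training and 2 test instances; every instance is
# tested exactly once. Hold-out is k=2; leave-one-out is k=len(data).
```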
27
Q

What is a confusion matrix?

A

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.

28
Q

How can confusion matrices evaluate the accuracy of a predictive data model for IDSes?

A

The column headers are the predicted values (e.g. true and false) and the row headers are the actual values. Cells hold the frequency at which each combination (e.g. predicted false, actually true) is seen.

This lets you find the true pos, true neg, false pos, and false neg counts and rates.
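
For illustration, a confusion matrix can be tallied directly from predictions (the labels here are hypothetical):

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("anomalous", "normal")):
    """Count each (actual, predicted) combination over a test set.
    Rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["anomalous", "normal", "normal", "anomalous", "normal"]
predicted = ["anomalous", "normal", "anomalous", "normal", "normal"]
confusion_matrix(actual, predicted)
# [[1, 1],   anomalous row: 1 true positive, 1 false negative
#  [1, 2]]   normal row:    1 false positive, 2 true negatives
```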

29
Q

What are some accuracy measures and their formulae to calculate them?

A

Sensitivity is the True Positive Rate, i.e. the detection rate or recall (TPR) = TP / (TP + FN)

Specificity is the True Negative Rate (TNR) = TN / (TN + FP)

Precision = TP / (TP + FP)

The F-measure is the Harmonic Mean of Precision and Sensitivity = 2TP / (2TP + FP + FN)

The Geometric Mean (G-mean) measure computes accuracy over every class separately and then computes their geometric mean.

G-mean = square root (TPR x TNR)
= square root [(TP / [TP + FN]) x (TN / [TN + FP])]
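
These formulae translate directly into code; a small Python sketch (the counts are invented):

```python
import math

def metrics(tp, tn, fp, fn):
    """Accuracy measures computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # TPR / recall / detection rate
    specificity = tn / (tn + fp)            # TNR
    precision   = tp / (tp + fp)
    f_measure   = 2 * tp / (2 * tp + fp + fn)
    g_mean      = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, precision, f_measure, g_mean

# e.g. 90 attacks detected, 10 missed, 950 normal flows passed, 50 flagged:
metrics(tp=90, tn=950, fp=50, fn=10)
```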

30
Q

In what forms can a ML algorithm return a prediction?

A

Either as a discrete class/group or as a probability/set of probabilities. For example, it could classify a day as “going to be sunny” or as “90% chance sunny, 2% cloudy, 8% rain”.

31
Q

What is the discriminating threshold?

A

The cutoff point for a probability value to fall within a certain group or classification. For example, if there is a 95% or more chance a day will be sunny, class it as predicted to be sunny.
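
A one-line illustration of the idea, using the card's sunny-day example:

```python
def classify(p_sunny, threshold=0.95):
    """Assign the 'sunny' class only when the predicted probability clears
    the discriminating threshold (95% here, per the card's example)."""
    return "sunny" if p_sunny >= threshold else "not sunny"

classify(0.97)  # clears the threshold
classify(0.90)  # probable, but below the cutoff
```

Raising the threshold trades fewer false positives for more false negatives, which is exactly the tradeoff the ROC curve (next card) visualises.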

32
Q

What is an ROC curve?

A

Receiver Operating Characteristic = a plot of the FPR (x) against the TPR (y) for every discriminating threshold value used to classify instances. Since both axes are rates of positives, points near y = x are poor (equivalent to random guessing): you want high y and low x.
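
Sketching how the ROC points arise, one (FPR, TPR) pair per threshold (scores and labels invented):

```python
def roc_points(scores, labels, thresholds):
    """For each discriminating threshold, compute (FPR, TPR) over scored
    instances; labels are 1 for actual positives, 0 for negatives."""
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p and l)
        fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
        tpr = tp / sum(labels)
        fpr = fp / (len(labels) - sum(labels))
        points.append((fpr, tpr))
    return points

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]  # a perfect ranker: positives scored highest
roc_points(scores, labels, [0.1, 0.5, 0.95])
# -> [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```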

33
Q

What is the AUC measure?

A

AUC = the area under the curve for ROC graphs. It is a single scalar summary of model performance ranging from 0 to 1, where 1 is a perfect classifier and 0.5 is no better than random guessing.

34
Q

What are some research issues with IDS?

A

IDS performance depends on the patterns of users and the characteristics of various services and protocols. The use of adaptive thresholds is a way to approach changing traffic profiles.

Anomaly-based systems tend to have high false positive rates (alarms) because of their current model’s inability to adapt to changes in data patterns. Batch learning systems of training followed by testing are slow to react to network traffic changes.

Proposed real-time learning systems would aggregate data from distributed sensors on hosts into a single data warehouse. However, using fixed thresholds to flag anomalies may not be as accurate as adaptable thresholds. Scalability issues also occur here; various data mining, Markov chain, and other models have been tried, but all perform poorly on high-volume, realistic data.

35
Q

What did Ishbel and Al Tobi’s IDS research involve?

A

Investigating how well thresholds adapt and their accuracy with different algorithms.

36
Q

What results did Ishbel and Al Tobi’s IDS research give?

A

Changing (tuning) thresholds improves detection rates. Some algorithms are more adaptable (tunable) than others - they improve the most when thresholds are tuned.

Threshold tuning reduces the severe effects of traffic variability on detection models.

10% (or less) of a full dataset is required to tune thresholds to reach a G-mean accuracy of 90%. Random Forest only needed 0.05%: little data is needed for training (or retraining), so a lot can be left for use in testing.

37
Q

What can combat drift?

A

Use MOA to help build changing predictive models, either by strategically training multiple weak models or by adapting a model once drift is detected.
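
MOA itself is a Java framework; as a language-agnostic illustration of the detect-then-adapt idea, here is a toy window-comparison drift detector (invented for illustration, not MOA's actual algorithms such as ADWIN):

```python
from collections import deque

class WindowDriftDetector:
    """Toy drift detector: compare the mean of a recent window against a
    reference window filled at the start; signal drift when they diverge
    by more than a fixed tolerance."""
    def __init__(self, window=50, tolerance=0.2):
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)
        self.tolerance = tolerance

    def add(self, value):
        """Feed one observation; return True when drift is detected."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(value)  # still learning the baseline
            return False
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref = sum(self.reference) / len(self.reference)
        cur = sum(self.recent) / len(self.recent)
        return abs(cur - ref) > self.tolerance  # drift: time to retrain

detector = WindowDriftDetector(window=20, tolerance=0.2)
drifted = [detector.add(v) for v in [0.1] * 40 + [0.9] * 20]
any(drifted)  # drift is signalled once the distribution shifts to 0.9
```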

38
Q

How have researchers sought to combat the class balancing problem?

A
  • undersampling of the majority class or oversampling of the minority class
  • cost-function-based approaches that assign costs/weights to minority instances
  • one-class anomaly learning methods that build a model using only one class
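
Random undersampling, the simplest of these, can be sketched as follows (the flow data is invented):

```python
import random

def undersample(instances, majority_label, seed=0):
    """Randomly drop majority-class instances until classes are balanced.
    Instances are (features, label) pairs."""
    rng = random.Random(seed)
    minority = [x for x in instances if x[1] != majority_label]
    majority = [x for x in instances if x[1] == majority_label]
    return minority + rng.sample(majority, len(minority))

# 98 normal flows vs 2 anomalous: heavily imbalanced toy dataset.
flows = [(i, "normal") for i in range(98)] + [(i, "anomalous") for i in range(2)]
balanced = undersample(flows, "normal")
# 2 anomalous + 2 randomly kept normal instances remain
```

The cost is discarding potentially useful majority-class data, which is why cost-based and one-class approaches exist as alternatives.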