Lecture 17: 12th November 2019 Flashcards

Intelligent Intrusion Detection Systems

1
Q

What are Intelligent Intrusion Detection Systems?

A

Conceptual IDSes that are accurate enough in their detection and classification of intrusions to be called intelligent.

2
Q

What are some types of IDS?

A
  • Host Based (HIDS)
  • Network Based (NIDS)
  • Hypervisor Based
  • Application Based
  • Protocol Based
  • Server Based
3
Q

What are HIDS?

A

Host-Based IDS (HIDS): an intrusion detection system that monitors the computer infrastructure on which it is installed, analysing traffic and logging malicious behaviour. These examine log files and verify message digests/checksums of key system files - effectively an internal system monitor.

A Network-based IDS (NIDS) looks for attack signatures in network traffic, whereas a Host-based IDS (HIDS) looks for attack signatures in the log files of hosts.

4
Q

What are NIDS?

A

Network-Based IDS (NIDS): IDSes intelligently distributed within networks that passively inspect traffic traversing the devices on which they sit and report back to an administrator. They can be trained to recognise attack signatures via pattern matching (e.g. expressions, bytecode) or frequency/threshold crossing (e.g. uncommon port usage).

A Network-based IDS (NIDS) looks for attack signatures in network traffic, whereas a Host-based IDS (HIDS) looks for attack signatures in the log files of hosts.

5
Q

What are hypervisor-based IDSes?

A

A proposed cloud-based IDS to overcome the limitations of network load in a cloud.

6
Q

What are application-based IDSes?

A

IDSes which control data exchange per application, such as a web browser or email client.

7
Q

What are protocol-based IDSes?

A

IDSes which control one protocol, e.g. all HTTP traffic

8
Q

What are server-based IDSes?

A

IDSes shared by a subnetwork or server group

9
Q

What are some methods of intrusion detection used in IDSes? Which of them require a training phase?

A
  • Misuse Based
  • Anomaly Based
  • Classification Based
  • Combination Based or Hybrid Approaches

The last three require a training phase.

10
Q

What is Misuse Based intrusion detection?

A

The misuse-based approach uses a set of signatures representing the patterns of already-known attacks to filter malicious activities. Activities are matched against previously defined patterns, so the approach can't be applied to new intrusion types dynamically. Pattern matching can be accelerated with general-purpose GPUs.
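
As a sketch, the signature-matching idea can be shown with a toy Python example (the signature list and payloads are invented for illustration, not real IDS rules):

```python
import re

# Toy signature database: pattern -> attack name (illustrative only)
SIGNATURES = {
    r"(?i)union\s+select": "SQL injection",
    r"\.\./\.\./": "directory traversal",
    r"(?i)<script>": "cross-site scripting",
}

def match_signatures(payload):
    """Return the names of all known attack patterns found in a payload."""
    return [name for pattern, name in SIGNATURES.items()
            if re.search(pattern, payload)]

hits = match_signatures("GET /item.php?id=1 UNION SELECT password FROM users")
# A payload matching no known pattern yields no hits - the misuse-based
# weakness: novel attacks are invisible until a signature is written.
misses = match_signatures("GET /index.html")
```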

11
Q

What is Anomaly Based intrusion detection?

A

An anomaly-based intrusion detection system is an intrusion detection system for detecting both network and computer intrusions and misuse by monitoring system activity and classifying it as either normal or anomalous. The classification is based on statistics, ML algorithms, or heuristics, rather than discrete patterns or signatures, and attempts to detect any type of misuse that falls out of normal system operation. This is as opposed to signature-based systems, which can only detect attacks for which a signature has previously been created.
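
A minimal illustrative sketch of the statistical idea, assuming a single per-metric baseline (the traffic numbers are invented):

```python
import statistics

def fit_baseline(values):
    """Learn 'normal' behaviour as the mean and standard deviation of a
    metric (e.g. requests per minute) observed in a clean training window."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomalous(value, mean, stdev, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - mean) > threshold * stdev

mean, stdev = fit_baseline([98, 102, 100, 97, 103, 99, 101])
is_anomalous(500, mean, stdev)  # a burst far outside normal operation
```

Unlike the signature approach, nothing here encodes a specific attack; anything sufficiently far from the learned baseline is flagged, including attack types never seen before.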

12
Q

What is Classification Based intrusion detection?

A

IDSes using classification algorithms or methods such as binary or multi-class classification, Decision Trees, SVM, NN, Bayes, or KNN (K Nearest Neighbour). These require a pre-phase of labelling but can be applied to new intrusion types.
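
For illustration, a tiny KNN classifier can be written in plain Python (the labelled flow features here are hypothetical):

```python
import math

def knn_classify(samples, query, k=3):
    """Label a query point by majority vote among its k nearest labelled
    training samples (Euclidean distance on the feature vectors)."""
    by_distance = sorted(samples, key=lambda s: math.dist(s[0], query))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical labelled flows: (packet_rate, mean_packet_size) -> class
training = [
    ((10, 500), "normal"), ((12, 480), "normal"), ((9, 520), "normal"),
    ((900, 40), "anomalous"), ((950, 60), "anomalous"), ((880, 50), "anomalous"),
]
knn_classify(training, (920, 55))  # lands near the anomalous cluster
```

The "pre-phase of labelling" is visible here: every training flow must already carry a class label before the classifier can be used.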

13
Q

What is Combination Based/Hybrid Approaches intrusion detection?

A

Approaches that combine the best of the other techniques, but suffer from high computational costs.

14
Q

Which new technologies might give rise to new types of intrusion detection?

A

cloud computing and IoT

15
Q

What is batch processing? What is stream processing? Which types of IDS use each?

A

Batch processing = load in large chunks of data and then process it.

Stream processing = process data as it comes in.

Traditional IDSes (all those discussed) almost always use batch processing, but with cloud, IoT, and other large-scale applications, stream processing makes more sense because it gives real-time results.
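
The difference can be sketched with a running mean - batch computes over the whole dataset at once, while streaming updates O(1) state as each item arrives:

```python
def batch_mean(values):
    """Batch processing: load the whole dataset, then compute."""
    return sum(values) / len(values)

class StreamingMean:
    """Stream processing: update the statistic per item, keeping O(1)
    state - the current result is available in real time."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # estimate after this item

stream = StreamingMean()
for packet_size in [64, 128, 512, 64]:
    estimate = stream.update(packet_size)
assert estimate == batch_mean([64, 128, 512, 64])  # same final answer
```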

16
Q

What are ensemble methods? When are they used?

A

Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking). Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

They’re used to build single models from big datasets.
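
A minimal bagging sketch in Python (the 1-D data and the nearest-class-mean base learner are invented for illustration; real ensembles use stronger learners such as decision trees):

```python
import random

def train_bagged(data, learn, n_models=5, seed=0):
    """Bagging: train several base models, each on a bootstrap resample
    (drawn with replacement) of the training data."""
    rng = random.Random(seed)
    return [learn([rng.choice(data) for _ in data]) for _ in range(n_models)]

def learn_nearest_mean(sample):
    """Trivial base learner: predict the class whose mean value is closest."""
    means = {}
    for label in {l for _, l in sample}:
        values = [v for v, l in sample if l == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda lbl: abs(x - means[lbl]))

def vote(models, x):
    """Ensemble prediction: majority vote across the bagged models."""
    predictions = [m(x) for m in models]
    return max(set(predictions), key=predictions.count)

# Hypothetical 1-D data: class 0 clusters near 0, class 1 near 10.
data = [(i * 0.1, 0) for i in range(10)] + [(10 + i * 0.1, 1) for i in range(10)]
models = train_bagged(data, learn_nearest_mean)
```

Because each model sees a different resample, averaging their votes reduces the variance of any single model, which is the "bagging" case mentioned above.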

17
Q

What is drift?

A

Change in a model property over time

18
Q

What are the two types of drift we talk about?

A

concept and feature

19
Q

What is concept drift?

A

Concept Drift is when a data distribution varies over time (abrupt, incremental, gradual or recurring drifts) and describes the nature of network traffic.

In predictive analytics and machine learning, the concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.

20
Q

What is feature drift?

A

Feature Drift is when features change over time as changes in data patterns dictate different levels of features (packet types). Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; thus, learners must detect and adapt to these changes accordingly.

21
Q

What is MOA?

A

Massive Online Analysis = an open-source software framework for building and running machine learning or data mining experiments on evolving data streams.

22
Q

What is the quality of datasets like in IDS?

A

Poor. Most are outdated and sparse because companies don’t want to share their data for commercial and privacy-related reasons, which has an adverse effect on the quality of IDSes. Datasets from before 2000 are still commonly used in academia today; companies probably have their own in-house datasets.

23
Q

What are some common problems with IDS?

A

Some pre-processing is needed for data cleaning: formatting the data and removing redundant, duplicate, or incomplete values. Some fragmented packets may be removed. Some data may also be transformed into normalised forms.

With any predictive model building, a training phase must occur. This often uses feature selection, with features such as source and destination addresses, protocol types, packet length, timings, flags, etc. Another technique is dimension reduction, which combines features into subgroups to create a lower number of manipulable dimensions.

Also class balancing: keeping IDSes accurate even when anomalous packets are few and the vast majority are normal, since the imbalance overweights the normal class.

Lack of realistic modern datasets to develop new IDS techniques. Old and synthetic data are used.

Compensating for and mitigating drift.

Maintaining high network performance (speed) whilst still detecting intrusions and maintaining security.

Lots of time is needed for training with some types of predictive models. Every algorithm and technique involves tradeoffs - none is anywhere near perfect.

24
Q

What are classes in IDS?

A

Labelled groups applied to packets, such as normal vs anomalous

25
Q

What is the class balancing problem?

A

The idea that the vast majority of traffic being normal leads to inaccurate prediction algorithms in IDSes. The difference can make ML/DM algorithms favour normality too much to be correct more often, resulting in a high FAR/FPR.

26
Q

How can you gain or split data samples for training and sampling with predictive models?

A
  • K-fold partitioning: build a model on K-1 folds (splits) of the data and evaluate it against the held-out fold, rotating which fold is held out. The Leave One Out (LOO) method is the extreme case where K equals the number of instances.
  • Hold out: K = 2, implying one part for training, one for testing.
  • Prospective sampling uses a newly sampled dataset separate from the training set.
  • Randomization methods sample instances without replacement.
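
K-fold partitioning can be sketched in a few lines of Python (illustrative only):

```python
def k_fold_splits(data, k):
    """Partition `data` into k folds; yield (train, test) pairs where each
    fold in turn is held out for testing and the rest used for training."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
splits = list(k_fold_splits(data, 5))
# 5 splits, each with 8 training and 2 test instances; every instance is
# tested exactly once. Hold-out is k=2; leave-one-out is k=len(data).
```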
27
Q

What is a confusion matrix?

A

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.

28
Q

How can confusion matrices evaluate the accuracy of a predictive data model for IDSes?

A

The column headers are the predicted values (e.g. true and false) and the row headers are the actual values. Cells hold the frequency at which each combination (e.g. predicted false, actually true) is seen.

This lets you find the true pos, true neg, false pos, and false neg counts and rates.
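
For illustration, a confusion matrix can be tallied directly from predictions (the labels here are hypothetical):

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("anomalous", "normal")):
    """Count each (actual, predicted) combination over a test set.
    Rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["anomalous", "normal", "normal", "anomalous", "normal"]
predicted = ["anomalous", "normal", "anomalous", "normal", "normal"]
confusion_matrix(actual, predicted)
# [[1, 1],   anomalous row: 1 true positive, 1 false negative
#  [1, 2]]   normal row:    1 false positive, 2 true negatives
```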

29
Q

What are some accuracy measures and their formulae to calculate them?

A

Sensitivity is the True Positive Rate, i.e. the detection rate or recall (TPR) = TP / (TP + FN)

Specificity is the True Negative Rate (TNR) = TN / (TN + FP)

Precision = TP / (TP + FP)

The F-measure is the Harmonic Mean of Precision and Sensitivity = 2TP / (2TP + FP + FN)

The Geometric Mean (G-mean) measure computes accuracy over every class separately and then computes their geometric mean.

G-mean = square root (TPR x TNR)
= square root [(TP / [TP + FN]) x (TN / [TN + FP])]
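
These formulae translate directly into code; a small Python sketch (the counts are invented):

```python
import math

def metrics(tp, tn, fp, fn):
    """Accuracy measures computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # TPR / recall / detection rate
    specificity = tn / (tn + fp)            # TNR
    precision   = tp / (tp + fp)
    f_measure   = 2 * tp / (2 * tp + fp + fn)
    g_mean      = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, precision, f_measure, g_mean

# e.g. 90 attacks detected, 10 missed, 950 normal flows passed, 50 flagged:
metrics(tp=90, tn=950, fp=50, fn=10)
```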

30
Q

In what forms can a ML algorithm return a prediction?

A

Either as a discrete class/group or as a probability/set of probabilities. For example, it could classify a day as “going to be sunny” or as “90% chance sunny, 2% cloudy, 8% rain”.

31
Q

What is the discriminating threshold?

A

The cutoff point for a probability value to fall within a certain group or classification. For example, if there is a 95% or more chance a day will be sunny, class it as predicted to be sunny.
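
A one-line illustration of the idea, using the card's sunny-day example:

```python
def classify(p_sunny, threshold=0.95):
    """Assign the 'sunny' class only when the predicted probability clears
    the discriminating threshold (95% here, per the card's example)."""
    return "sunny" if p_sunny >= threshold else "not sunny"

classify(0.97)  # clears the threshold
classify(0.90)  # probable, but below the cutoff
```

Raising the threshold trades fewer false positives for more false negatives, which is exactly the tradeoff the ROC curve (next card) visualises.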

32
Q

What is an ROC curve?

A

Receiver Operating Characteristic = a plot of the FPR (x) against the TPR (y) for every discriminating threshold value used to classify instances. Since both axes are rates of positives, points near y = x are poor (equivalent to random guessing): you want high y and low x.
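
Sketching how the ROC points arise, one (FPR, TPR) pair per threshold (scores and labels invented):

```python
def roc_points(scores, labels, thresholds):
    """For each discriminating threshold, compute (FPR, TPR) over scored
    instances; labels are 1 for actual positives, 0 for negatives."""
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p and l)
        fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
        tpr = tp / sum(labels)
        fpr = fp / (len(labels) - sum(labels))
        points.append((fpr, tpr))
    return points

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]  # a perfect ranker: positives scored highest
roc_points(scores, labels, [0.1, 0.5, 0.95])
# -> [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```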

33
Q

What is the AUC measure?

A

AUC = the area under the curve for ROC graphs. It is a single scalar summary of model performance ranging from 0 to 1, where 1 is a perfect classifier and 0.5 is no better than random guessing.

34
Q

What are some research issues with IDS?

A

IDS performance depends on the patterns of users and the characteristics of various services and protocols. The use of adaptive thresholds is a way to approach changing traffic profiles.

Anomaly-based systems tend to have high false positive rates (alarms) because of their current model’s inability to adapt to changes in data patterns. Batch learning systems of training followed by testing are slow to react to network traffic changes.

Proposed real-time learning systems would aggregate data from distributed sensors on hosts into a single data warehouse. However, using fixed thresholds to flag anomalies may not be as accurate as adaptable thresholds. Scalability issues also occur here; various data mining, Markov chain, and other models have been tried, but all perform poorly on high-volume, realistic data.

35
Q

What did Ishbel and Al Tobi’s IDS research involve?

A

Investigating how well thresholds adapt and their accuracy with different algorithms.

36
Q

What results did Ishbel and Al Tobi’s IDS research give?

A

Changing (tuning) thresholds improves detection rates. Some algorithms are more adaptable (tunable) than others - they improve the most when thresholds are tuned.

Threshold tuning reduces the severe effects of traffic variability on detection models.

10% (or less) of a full dataset is required to tune thresholds to reach a G-mean accuracy of 90%. Random Forest only needed 0.05%: little data is needed for training (or retraining), so a lot can be left for use in testing.

37
Q

What can combat drift?

A

Use MOA to help build changing predictive models, either by strategically training multiple weak models or by adapting a model once drift is detected.
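
MOA itself is a Java framework; as a language-agnostic illustration of the detect-then-adapt idea, here is a toy window-comparison drift detector (invented for illustration, not MOA's actual algorithms such as ADWIN):

```python
from collections import deque

class WindowDriftDetector:
    """Toy drift detector: compare the mean of a recent window against a
    reference window filled at the start; signal drift when they diverge
    by more than a fixed tolerance."""
    def __init__(self, window=50, tolerance=0.2):
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)
        self.tolerance = tolerance

    def add(self, value):
        """Feed one observation; return True when drift is detected."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(value)  # still learning the baseline
            return False
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref = sum(self.reference) / len(self.reference)
        cur = sum(self.recent) / len(self.recent)
        return abs(cur - ref) > self.tolerance  # drift: time to retrain

detector = WindowDriftDetector(window=20, tolerance=0.2)
drifted = [detector.add(v) for v in [0.1] * 40 + [0.9] * 20]
any(drifted)  # drift is signalled once the distribution shifts to 0.9
```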

38
Q

How have researchers sought to combat the class balancing problem?

A
  • undersampling of the majority class or oversampling of the minority class
  • cost-function-based approaches that assign costs/weights to minority instances
  • one-class anomaly learning methods that build a model using only one class
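
Random undersampling, the simplest of these, can be sketched as follows (the flow data is invented):

```python
import random

def undersample(instances, majority_label, seed=0):
    """Randomly drop majority-class instances until classes are balanced.
    Instances are (features, label) pairs."""
    rng = random.Random(seed)
    minority = [x for x in instances if x[1] != majority_label]
    majority = [x for x in instances if x[1] == majority_label]
    return minority + rng.sample(majority, len(minority))

# 98 normal flows vs 2 anomalous: heavily imbalanced toy dataset.
flows = [(i, "normal") for i in range(98)] + [(i, "anomalous") for i in range(2)]
balanced = undersample(flows, "normal")
# 2 anomalous + 2 randomly kept normal instances remain
```

The cost is discarding potentially useful majority-class data, which is why cost-based and one-class approaches exist as alternatives.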