Lecture 17: 12th November 2019 Flashcards Preview

CS4203 Computer Security > Lecture 17: 12th November 2019 > Flashcards


What are Intelligent Intrusion Detection Systems?

Conceptual IDSes that are accurate enough in their detection and classification of intrusions to be called intelligent.


What are some types of IDS?

- Host Based (HIDS)
- Network Based (NIDS)
- Hypervisor Based
- Application Based
- Protocol Based
- Server Based


What are HIDS?

Host-Based IDS (HIDS): an intrusion detection system that monitors the host on which it is installed, analysing traffic and logging malicious behaviour. These look at log files and verify message digests/checksums of key system files; they are effectively an internal system monitor.

A Network-based IDS (NIDS) looks for attack signatures in network traffic whereas a Host-based IDS (HIDS) looks for attack signatures in log files of hosts.


What are NIDS?

Network-Based IDS (NIDS): IDSes that are intelligently distributed within networks, passively inspecting traffic traversing the devices on which they sit and reporting back to an administrator. They can be trained to recognise attack signatures via pattern matching (e.g. expressions, bytecode), frequency, or threshold crossing (uncommon port usage).

A Network-based IDS (NIDS) looks for attack signatures in network traffic whereas a Host-based IDS (HIDS) looks for attack signatures in log files of hosts.


What are hypervisor-based IDSes?

A proposed cloud-based IDS to overcome the limitations of network load in a cloud.


What are application-based IDSes?

IDSes which control data exchange per application, such as a web browser or email client.


What are protocol-based IDSes?

IDSes which control one protocol, e.g. all HTTP traffic


What are server-based IDSes?

IDSes shared by a subnetwork or server group


What are some methods of intrusion detection used in IDSes? Which of them require a training phase?

- Misuse Based
- Anomaly Based
- Classification Based
- Combination Based or Hybrid Approaches

last 3 require a training phase


What is Misuse Based intrusion detection?

The misuse-based approach uses a set of signatures representing the patterns of already known attacks to filter malicious activities. Traffic is matched against these previously defined patterns, so the approach can't be applied to new intrusion types dynamically. Signature matching can be accelerated using general-purpose GPUs.
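Misuse-based detection reduces to matching traffic against a signature set. A minimal sketch, where the two patterns are hypothetical illustrations rather than real attack signatures:

```python
import re

# Hypothetical signature set: each pattern stands in for a known-attack fingerprint.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)'\s*or\s+1=1"),
    "path_traversal": re.compile(r"\.\./\.\./"),
}

def match_signatures(payload: str) -> list:
    """Return the names of all known-attack signatures found in a payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]
```

Note the limitation the card describes: a payload matching no stored pattern is passed as benign, even if it is a brand-new attack.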


What is Anomaly Based intrusion detection?

An anomaly-based intrusion detection system is an intrusion detection system for detecting both network and computer intrusions and misuse by monitoring system activity and classifying it as either normal or anomalous. The classification is based on statistics, ML algorithms, or heuristics, rather than discrete patterns or signatures, and attempts to detect any type of misuse that falls out of normal system operation. This is as opposed to signature-based systems, which can only detect attacks for which a signature has previously been created.


What is Classification Based intrusion detection?

IDSes using classification algorithms or methods such as binary or multi-class classification, Decision Trees, SVM, NN, Bayes, or KNN (K Nearest Neighbour). These require a labelling pre-phase but can be applied to new intrusion types.


What is Combination Based/Hybrid Approaches intrusion detection?

Hybrid approaches combine the best of each other type of technique but suffer from high computational costs.


Which new technologies might give rise to new types of intrusion detection?

cloud computing and IoT


What is batch processing? What is stream processing? Which types of IDS use each?

Batch processing = load in large chunks of data and then process it.

Stream processing = process data as it comes in.

Traditional IDSes (all those discussed) almost always use batch processing, but with cloud and IoT (and large-scale applications) stream processing makes more sense, as it gives real-time results.
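The contrast can be sketched in a few lines: a streaming statistic updates per arriving value with O(1) memory, while the batch version needs the whole chunk loaded first (the running mean here is just an illustrative statistic):

```python
class StreamingMean:
    """Stream processing: update the statistic as each value arrives."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        self.total += x
        return self.total / self.count  # result available in real time

def batch_mean(values) -> float:
    """Batch processing: load the whole chunk of data, then process it."""
    return sum(values) / len(values)
```

Both produce the same final answer; the streaming version simply never needs the full dataset in memory, which is what makes it attractive at cloud/IoT scale.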


What are ensemble methods? When are they used?

Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking). Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

They're used to build single models from big datasets.
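A minimal majority-vote ensemble shows the combination idea; the three threshold "classifiers" below are toy stand-ins for trained models, and the packet-size feature is an assumption for illustration:

```python
from collections import Counter

def majority_vote(classifiers, sample):
    """Combine several base classifiers into one predictive model
    by taking the most common predicted label."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Toy base learners with different decision thresholds on packet size (bytes).
clf_a = lambda size: "anomalous" if size > 1000 else "normal"
clf_b = lambda size: "anomalous" if size > 1400 else "normal"
clf_c = lambda size: "anomalous" if size > 900 else "normal"
```

Bagging and boosting differ in how the base models are trained (resampled data vs reweighted errors), but both ultimately aggregate votes or scores in this spirit.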


What is drift?

Change in a model property over time


What are the two types of drift we talk about?

concept and feature


What is concept drift?

Concept Drift is when a data distribution varies over time (abrupt, incremental, gradual, or recurring drifts); in an IDS context it describes changes in the nature of network traffic.

In predictive analytics and machine learning, the concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.
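One simple (and purely illustrative) way to notice the accuracy loss drift causes is to track accuracy over a sliding window of recent predictions and flag when it falls below a threshold:

```python
from collections import deque

class DriftMonitor:
    """Heuristic drift detector: suspect drift when windowed accuracy drops."""
    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.results = deque(maxlen=window)  # recent correct/incorrect flags
        self.threshold = threshold

    def record(self, prediction, actual) -> bool:
        """Log one prediction; return True if drift is suspected."""
        self.results.append(prediction == actual)
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.threshold
```

Real stream-learning frameworks use more principled detectors, but the idea is the same: monitor a model property over time and react when it changes.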


What is feature drift?

Feature Drift is when features change over time as changes in data patterns dictate different levels of features (packet types). Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; thus, learners must detect and adapt to these changes accordingly.


What is MOA?

Massive Online Analysis = an open-source software framework for building and running machine learning or data mining experiments on evolving data streams.


What is the quality of datasets like in IDS?

Poor. Most are outdated and sparse because companies don't want to share their data for commercial and privacy-related reasons, which has an adverse effect on the quality of IDSes. Datasets from before 2000 are still commonly used today in academia; companies probably have their own in-house datasets.


What are some common problems with IDS?

Pre-processing is needed for data cleaning: formatting the data and removing redundant, duplicate, or incomplete values. Some fragmented packets may be removed, and some data may be transformed into normalised forms.

With any predictive model building, a training phase must occur. This often uses features selection such as source and destination addresses, protocol types, length of the packet, timings, flags, etc. Another technique is to use dimension reduction which combines features into subgroups to create a lower number of manipulable dimensions.

Class balancing: keeping IDSes accurate when anomalous packets are few against a vast majority of normal ones, which otherwise overweights the normal class.

Lack of realistic modern datasets to develop new IDS techniques. Old and synthetic data are used.

Compensating for and mitigating drift.

Maintaining high network performance (speed) whilst still detecting intrusions and maintaining security.

Lots of time is needed for training with some types of predictive models. There are trade-offs between different algorithms and techniques; none is anywhere near perfect.


What are classes in IDS?

Labelled groups applied to packets, such as normal vs anomalous


What is the class balancing problem?

The idea that the vast majority of traffic being normal leads to inaccurate prediction algorithms in IDSes. The imbalance can make ML/DM algorithms favour the normal class simply because that is correct more often, resulting in a high FAR/FPR.
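A quick illustration of why imbalance makes raw accuracy misleading (the 990:10 split is made up but typical of the normal-vs-anomalous skew in traffic): a degenerate model that always predicts "normal" scores 99% accuracy while detecting nothing.

```python
# Toy labels: 990 normal packets, 10 anomalous ones.
actual = ["normal"] * 990 + ["anomalous"] * 10

# A degenerate model that always predicts the majority class.
predicted = ["normal"] * 1000

accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
detected = sum(p == a == "anomalous" for p, a in zip(predicted, actual))
# accuracy is 0.99, yet not a single intrusion is detected
```

This is why the balanced measures later in the deck (sensitivity, G-mean) matter more than plain accuracy for IDS evaluation.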


How can you gain or split data samples for training and sampling with predictive models?

- K-fold partitioning is a technique that builds a model on K-1 folds (splits of the dataset) and evaluates it against the remaining fold; the Leave One Out (LOO) method is the extreme case where K equals the number of samples.

- Hold out (K=2, implying one part for training, one for testing).

- Prospective sampling uses a newly sampled dataset separate from the training set.

- Randomisation methods sample instances without replacement.
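K-fold partitioning can be sketched in plain Python (hold-out is just the K=2 case); this assumes the data divides evenly into K folds for simplicity:

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs: test is one fold, train is the other K-1."""
    fold_size = len(data) // k
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, test
```

Each sample appears in exactly one test fold across the K iterations, so every instance is used for both training and evaluation.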


What is a confusion matrix?

A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.


How can confusion matrices evaluate the accuracy of a predictive data model for IDSes?

The column headers are the predicted values (e.g. true and false) and the row headers are the actual values (e.g. true and false). Cells hold the frequency at which each combination (e.g. predicted false, was actually true) is seen.

This lets you find the true pos, true neg, false pos, and false neg counts and rates.
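Building the matrix is just tallying four cells; a minimal sketch for the binary normal-vs-anomalous case (the label names are illustrative):

```python
def confusion_counts(predicted, actual, positive="anomalous"):
    """Tally the four confusion-matrix cells for a binary classifier."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1            # predicted positive, actually positive
        elif p == positive:
            fp += 1            # predicted positive, actually negative
        elif a == positive:
            fn += 1            # predicted negative, actually positive
        else:
            tn += 1            # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}
```

These four counts feed directly into the rate formulae on the next card.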


What are some accuracy measures and their formulae to calculate them?

Sensitivity is the True Positive Rate, i.e. the detection rate or recall (TPR) = TP / (TP + FN)

Specificity is the True Negative Rate (TNR) = TN / (TN + FP)

Precision = TP / (TP + FP)

The F-measure is the Harmonic Mean of Precision and Sensitivity = 2TP / (2TP + FP + FN)

The Geometric Mean (G-mean) measure computes accuracy over every class separately and then computes their geometric mean.

G-mean = sqrt(TPR x TNR)
= sqrt[(TP / [TP + FN]) x (TN / [TN + FP])]
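The formulae above translate directly to code; a small helper taking binary confusion-matrix counts (and assuming no division by zero for brevity):

```python
import math

def metrics(tp, tn, fp, fn):
    """Compute the standard accuracy measures from confusion-matrix counts."""
    tpr = tp / (tp + fn)                     # sensitivity / recall / detection rate
    tnr = tn / (tn + fp)                     # specificity
    precision = tp / (tp + fp)
    f_measure = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and TPR
    g_mean = math.sqrt(tpr * tnr)            # geometric mean of per-class accuracy
    return {"TPR": tpr, "TNR": tnr, "precision": precision,
            "F": f_measure, "G-mean": g_mean}
```

On an imbalanced test set, G-mean punishes a model that sacrifices the minority class, which plain accuracy does not.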


In what forms can a ML algorithm return a prediction?

Either as a discrete class/group or as a probability/set of probabilities. For example, it could classify a day as "going to be sunny" or as "90% chance sunny, 2% cloudy, 8% rain".