[1] Data Mining Flashcards

(104 cards)

1
Q

What is KDD?

A

Knowledge Discovery in Databases; the process of extracting knowledge from data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the steps for KDD?

A

[1] Selection - choose which data to use
[2] Pre-processing - handle missing data and errors
[3] Transformation - change the data format; consider feature reduction or extraction
[4] Data mining - apply algorithms to the transformed data to generate results
[5] Interpretation and evaluation - i.e. visualization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why is data mining important?

A

The exponential growth of data exceeds human abilities to understand it themselves

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What factors influence the selection of DM algorithms?

A
  • The kind of data i.e. numeric or symbolic
  • Whether the task is supervised or unsupervised
  • What is the desired output i.e. a blackbox or a tree
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the ways to approach KDD?

A
  • Top-down discovery - analyst suggests hypotheses; results are analysed to see if they support this
  • Bottom-up discovery - system automatically suggests patterns, analyst considers if they are relevant
  • Mixed - analyst provides an area of search, and refine this based on the hypotheses generated
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How are database queries different from DM?

A

Database queries return a subset of the data or an aggregate of it

DM outputs a KDD object i.e. a rule or tree which dind’t exist before

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a data warehouse?

A

A collection of data on a single subject where the number of datasets can change

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How is KDD applied to websites?

A

Web content mining - collect and analyse the content of pages
Web structure mining - use graph theory to study how pages are connected with hyperlinks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is IR?

A

Information Retrieval i.e. search

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

At a high-level, what do classification algorithms do?

A

Find separating curves. Note that SVM provides the equations for these, while neural networks only give the regions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What defines AI systems?

A

They are machines that mimic cognitive functions such as learning and problem solving

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are fuzzy sets?

A

In contrast to crisp sets, they allow partial membership based on an membership function

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do set operators work for fuzzy sets?

A

AND(x,y) -> m(x N y) = min(m(x), m(y))
OR(x,y) -> m(x V y) = max(m(x), m(y))
NOT(x) -> m(~x) = 1 - m(x)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the process for fuzzy logic?

A

~ Fuzzify the input values into membership functions
~ Use rules to compute fuzzy outputs
~ De-fuzzify the output functions to get crisp outputs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the main DM tasks?

A

Classification - map data into #pre-defined# classes
Regression - map data to real-valued output
Prediction - predict a future state #in time# based on current state
Time Series Analysis - special case of classification, regression and prediction when the attributes vary over time
[C.R.P.T]

Clustering - the groups are pre-defined, so it is unsupervised
Segmentation - special cases of clustering where the DB is partitioned into #dis-jointed# sets
Association Rules - find relationships between instances i.e. items that frequently occur together
Sequence Discovery - find relationships based on time i.e. if X then Y will happen
[C.S.A.S]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the assumptions used for Bayes classifiers?

A
  • Only one hypothesis (class) can occur at a time

- Conditional independence - P(X1, X2 | Y) = P(X1|Y) * P(X2|Y)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is a key practical consideration for Bayes classifiers?

A

When the conditional etc. probabilities are calculated, start the counts at 1 to avoid multiplying by 0

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What at the names of the probabilities in Bayes classifiers?

A

Posterior - p(hj|xi) - [this is the target]
Prior - p(hj)
Unconditional probability - p(x)
Conditional probability - p(xi|hj)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What do SVMs do?

A

Find a hyperplane which correctly classifies the data points and #separates them as far as possible#

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

In the context of SVMs, what are margins and what types are there?

A

This is a measure of how well they separate the data

A hard margin is the distance to the nearest point; a soft margin penalizes close points

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is a key limitation of SVMs, and how can it be addressed?

A

They can only classify data if it is linearly separable.

However, kernels transform data into higher dimensions so they do become separable (the actual hyperplanes are always linear)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are the limitations of SVM?

A
  • They require selection a good kernel
  • They use a lot of memory and CPU
  • Special consideration is needed for multiple classes i.e. non-binary problems
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How do SVMs handle multiple classes?

A

Transform the problem into a set of binary problems, and either:

  • one-vs-rest - train a classifier for each class to determine if it is or isn’t that class
  • one-vs-many - train a classifier for each pair of classes, and use a voting protocol
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

In the context of DM, what is segmentation?

A

A special case of clustering where the tuples are disjointed i.e. aren’t neighbors

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
What is the capacity of a function?
A measure of its expressive power and flexibility etc.
26
What does shattering mean?
A function f(theta) shatters a set of point X if for all assignments of classes to that point, there exists a theta such that all the points are correctly classified
27
What is the VC Dimension?
A measure of #capacity# for a function based on the largest number of points it can shatter. Note that it might only work for a particular set of locations for the points
28
What is Structural Risk Minimization?
The process of finding models with the lowest *risk* Risk is the chance of prediction error on some unseen data points which are iid for some unknown distribution
29
How is risk found?
The *expected risk* is the true risk, but it can't be determined However, it is bounded by the *empirical risk* (based on points already classified) and the *VC dimension confidence* (based on the number of points and the model's VC)
30
How the risk of a model be reduced?
- Train a better model to reduce to empirical risk - Minimize the VC dimension i.e. use a simpler model - Use more data points, as this lowers the VC dimension confidence
31
Why is Structural Risk Minimization often infeasible?
The VC dimension can only be calculated for a limited number of models
32
What are some key ways to measure the error of an algorithm?
- TSS (total sum of squared error) - MSE - RMSE - R-squared - the ratio of the error to the variance of the response
33
What is the advantage of R-squared, and how is it calculated?
It is easily interprete. R2 = 1 - TSS/Var(y)
34
What is accuracy? What is its opposite?
Accuracy is the portion of instances correctly classified. *Error rate* is the portion of instances incorrectly classified
35
What is TPR?
The portion of all actually true instances which are classified as true. It is also known as sensitivity or recall TPR = TP / (TP + FN)
36
What is sensitivity?
The TPR
37
What is recall?
The TPR
38
What is TNR?
The portion of truly negative instances which are classified as negative It is also known as specificity
39
What is specificity?
TNR
40
How are FPR and FNR related to TPR and TNR
They sum to 1 i.e.e FPR+TPR = 1
41
What is an ROC curve?
It plots how the FPR on the x-axis affects the TPR on the y-axis
42
What is the worst possible result on an ROC curve?
A diagonal line (y=x), if it were below this, predictions could be flipped
43
What is AUC and what are its limits?
It is bounded between 0.5 and 1, with higher values being better However: - it is difficult to calculate when the points are not well spread across the ROC space - it depends a lot on the non-interesting part of the ROC curve
44
What is the distance measure?
The distance from the center of the ROC and the intersection of the ROC curve and the minor diagonal
45
What metrics are used for object detection?
- Detection rate (DR) - False alarm / objects (FA/O) - Extended ROC
46
What is detection rate?
objects found / total objects
47
What is false alarm / objects?
non-objects reported / total objects It can be greater than 100% for difficult problems
48
What is the extended ROC curve?
It has FA/O on the x-axis and DR on the y-axis Unlike ROC, models can perform worse than the diagonal
49
What evaluation metrics are used for information retrieval?
Precision and recall
50
In the context of IR, what is precision?
The quality of records retrieved: relevant documents retrieved / number of documents retrieved
51
In the context of IR, what is recall?
How useful the information retrieved was: number of relevant documents retrieved / total relevant documents in DB
52
What are the disciplines within computer imaging?
* computer vision* - the outputs are for machines | * image processing* - the outputs are for people
53
What are two tasks to do with improving the quality of images?
* image restoration* - return image to its original appearance * image enhancement* - make image better than ever
54
What is dynamic range?
The ratio between brightest and darkest values that can be shown i.e. the contrast ratio
55
How many colors does 8 bit color give?
256 i.e. 0 to 255
56
What are images outside human's perceptual range called?
Multi-spectral
57
Describe JPEG images.
They have 24 bit color (16 million colors); compress well but have artifacts; don't support transparency or animation
58
Describe PNG images.
They use lossless compression. Tehy support 24-bit color but can't be animated
59
Describe GIF images.
Support transparency and animation. They use a palette of 256 colors. They are lossless if less than this many colors
60
Describe TIFF images.
They support a range of color formats i.e. indexed color, 24-bit RGB, and 32-bit RGBA They can have a range of compression methods and support layers
61
Describe BMP images.
They support 8-bit, 16-bit or 24-bit RGB color, but don't scale well
62
What is the image format taught in this course?
PNM (Portable aNyMap). It supports: - binary (.pbm) - grayscale (.pgm) - pixmap (.ppm)
63
What is the format of PNM images?
- A magic number - The width of the image - The height of the image - The maximum grayscale value i.e. 15 or 255 - Optional: comments starting with # - The actual image data: pixels separated by whitespace
64
What are the magic numbers for PNM images?
P2 for PGM in ASCII, P5 for PGM in RAWBITS
65
What are the basic types of image pre-processing?
- Region of Interest Extraction - Image Algebra - Spatial filters
66
What is ROI extraction?
Use cropping, zooming, shrinking, translation or rotation focus on part of the image
67
What is the opposite of zooming?
Shrinking
68
What are the techniques for zooming?
*zero-order hold* repeats pixel values; *first-order-hold* uses averaging or convolution etc.
69
What is image algebra used for?
- Addition: superimpose image - Subtraction: remove additive noise or detect motion - Multiplication: adjust brightness (generally with a constant) - Logical operators: masking
70
What are spatial filters?
Image operations which change each pixel based on its neighborhood. They include: - mean filters - median filters - enhancement filters - image quantization
71
What is special about mean filters?
They are form of *linear filter*, which means that: - If the coefficients sum to 1, the average brightness is maintained - If the coefficients sum to 0, the average brightness is lost - If the coefficients alternate positive and negative, edge information will be returned - If the coefficients are all positive, the image will be blurred
72
What are enhancement features? What are the main types?
They enhance features in an image Laplacian filters enhance the image in all directions equally: - -1 on edges with 8 in the centre - -1 in NESW, 4 in the centre, 0 on the diagonals Difference filters enhance in a particular direction. Note they have two 1s and one -1
73
What is image quantization?
The process of reducing the amount of image data. It includes: - Grey-level reduction - Spatial reduction
74
What is grey-level reduction?
A form of image quantization in which the number of grey levels is reduced. Thresholding makes it binary based on whether the value is above or below the threshold
75
What is spatial reduction?
Image quantization where groups of pixels that are #spatially adjacent# are mapped to a single pixel This can be done with averaging, median or *decimation* in which some of the pixels are simply eliminated
76
What is edge detection?
The process of identifying sharp changes in brightness over short spatial distances This is particularly difficult in noisy images
77
What are the methods for edge detection?
- Roberts operator - Sobel operator - Canny edge detector
78
What is Roberts operator?
It only marks edge points. It does so by checking change of intensity in the diagonal direction. It is generally used for binary images
79
What is the Sobel operator?
Detect edges in the horizontal and vertical directions. The kernels are: [-1 -2 -1; 0 0 0; 1 2 2] for horizontal (S1) The edge magnitude is sqrt(S1^2 + S2^2). The direction in atan(S1/S2)
80
What is the Canny edge detector?
A general process for edge detection: - Smooth the image using Gaussian to minimize the impact of noise - Estimate the gradients using partial derivatives or Sobel - Thin edges using *non-maxima suppression* (suppress pixels that aren't larger than their neighbor in the positive and negative #gradient direction#
81
What is image segmentation?
It divides images into segments (#connected# pixels)
82
How is connectivity of pixels defined?
Four-connectivity, eight-connectivity and six-connectivity (includes top-left and bottom-right)
83
What is grey level segmentation? What are its limitations?
Uses thresholding to segment based on brightness. It is very vulnerable to noise. The threshold can be chosen manually, or automatically using K-means clustering or by minimizing group variance
84
What is a general algorithm for image segmentation?
The *splitting and merging algorithm* [1] Define a *homogeneity test* to measure how similar regions are [2] Split the image into equally sized regions [3] Calculate the homogeneity measure for each region. If the test is passed, attempt to merge with neighbors; Otherwise, split [4] Repeat [3] until all regions pass the homogeneity test
85
What are the variations of the splitting are merging algorithm?
Growing - no splitting is applied | Shrinking - no merging is applied
86
How can changes in brightness over space be analysed?
By transformation from the spatial domain to the frequency domain (spectral domain)
87
What are the affects of transformation to the frequency domain?
The image will have the same size Areas of rapid brightness change correspond to high frequency.
88
How are images transformed to the frequency domain?
A set of basis images is used, where each basis image is the same size of the original image T(u,v) = SumSum I(r,c)B(r,c;u,v) Fourier transforms are generally used; cosine transforms don't use sine functions
89
How is filtering applied to the frequency domain?
Low-pass filters eliminate high-frequencies, blurring the image High-pass frequencies can be used for edge detection
90
What are non-ideal filters?
Those that let some of the signal through if it is close to the threshold
91
What are wavelet transforms?
Transforms that include both frequency and spatial information They break the image into four sub-sampled images by high-pass and low-pass filtering the image in both directions They are commonly used for image compression
92
What is feature extraction for images? What are some key considerations?
Feature extraction is the general process of turning image data into feature vectors The features may be domain specific or domain independent Good features are RST invariant i.e. unaffected by rotation, scale or transformation
93
What are some key ways of doing feature extraction in images?
- Using binary objects of interest - Histogram features - Global features - LBP - SIFT - Pixel Statistics
94
How are features extracted from a binary object of interest?
Features can be extracted based on its: - area - center of area - axis of the least second moment (gives an indication of its orientation) - thinness - the Euler number
95
What is thinness?
A measure of how thin an object is based on the ratio of its area to perimeter: T = 4 pi A/P^2 Note that a circle has a thinness of 1, a very thin object approaches 0
96
What is the Euler number of an object?
The number of objects in an image minus the number of holes
97
What are histogram features?
Global features based on the image's brightness or contrast
98
How do global and local features compare semantically?
Global features generally assume that there is only one object in the image; local filters can be more difficult because you have to implement the case when there are multiple objects Local features generally perform better than global if there is significant clutter or occlusion
99
What is LBP?
Local Binary Pattern is a #global# feature based on: - For each central pixel, identify p points at distance r from the central pixel - Calculate the difference between the central pixel and each pixel - Encode these differences in binary - 1 if the positive values and 0 for negative - Create a histogram of these binary codes across the image
100
What is SIFT?
An algorithm for matching features between images.
101
What should be considered when evaluating image features?
- Resolution - what is the minimum size objects that can be distinguished - Number of features - algorithms take longer if more features are used - Invariances - are the features affected by rotation or lighting changes etc. Redundancy - could fewer features encode the same amount of information? Completeness - can the original image be reconstructed based on the features
102
What are pixel statistics?
The image is divided into regions and aggregates of pixel values are computed for each region
103
What are the basic approaches to image classification?
Template matching - the template is swept across the image to find matches Nearest neighbor - find labelled images most similar to the current image
104
How are detected objects classified?
With a cutout square. It should be large enough to contain any object, but not large enough to contain multiple objects