Discrete & Continuous Data Flashcards

1
Q

Types of Attributes

A

Continuous
Ordinal
Nominal

2
Q

Instances aren’t labelled

A

Unsupervised ML

3
Q

Not enough instances are labelled

A

Semi-supervised ML

4
Q

Instances are all labelled

A

Supervised ML

5
Q

Instances are ordered

A

Sequence learning

6
Q

Nominal learners

A

NB (Naive Bayes)
1-R (One-R)
DT (Decision Tree)

7
Q

Continuous learners

A

k-NN (k-nearest neighbours)
NP
SVM (support vector machine)

8
Q

Nominal Attributes, but Numeric Learner

A

(1) For k-NN and NP: Hamming distance
(2) randomly assign numbers to attribute values
• If scale is constant between attributes, this is not as bad an idea as it sounds! (But still undesirable)
• Worse with higher-arity attributes (more attribute values)
• Imposes an attribute ordering which may not exist
(3) one-hot encoding

9
Q

Hamming distance

A

Between two strings of equal length: the minimum number of substitutions required to change one string into the other, or equivalently the minimum number of errors that could have transformed one string into the other
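
A minimal Python sketch of this (the function name and the example values are illustrative, not from the cards):

def hamming_distance(a, b):
    # number of positions at which two equal-length sequences differ
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

hamming_distance(["sunny", "hot", "high"], ["sunny", "mild", "normal"])  # -> 2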

10
Q

one-hot encoding

A

If a nominal attribute takes m values, replace it with m Boolean attributes.

Example:
hot = [1, 0, 0]
mild = [0, 1, 0]
cool = [0, 0, 1]
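
A small Python sketch of the same idea (function name and value list are illustrative):

def one_hot(value, values):
    # replace one nominal value with m Boolean attributes (m = len(values))
    return [1 if value == v else 0 for v in values]

one_hot("mild", ["hot", "mild", "cool"])  # -> [0, 1, 0]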

11
Q

Pros & Cons of one-hot encoding

A

Pro: lets nominal attributes be used with continuous learners

Con: massively increases the size of the feature space

12
Q

Numeric Attributes, but Nominal Learner

A

(1) NB (e.g. Gaussian NB)
(2) DT (e.g. binarisation)
(3) 1-R

General solution: discretisation

13
Q

Types of Naive Bayes

A

• Multivariate NB: attributes are nominal, and can take any (fixed) number of values

• Binomial (or Bernoulli) NB: attributes are binary (a special case of multivariate NB)

• Multinomial NB: attributes are natural numbers, corresponding to frequencies

• Gaussian NB: attributes are real numbers; use a probability density function
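
As a rough illustration only (assuming scikit-learn is available; not part of the original card), these variants map onto separate estimators in sklearn.naive_bayes:

import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

X_counts = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 4]])    # natural-number frequencies
X_real = np.array([[0.3, -1.2], [1.5, 0.7], [2.1, 0.2]])  # real-valued attributes
y = np.array([0, 1, 1])

MultinomialNB().fit(X_counts, y)    # multinomial NB: frequency counts
BernoulliNB().fit(X_counts > 0, y)  # Bernoulli/binomial NB: binary attributes
GaussianNB().fit(X_real, y)         # Gaussian NB: real values via a PDF
# (recent scikit-learn versions also offer CategoricalNB for the multivariate/nominal case)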

14
Q

Numeric attributes for DT

A

(1) Binarisation

(2) Range

15
Q

Binarisation

A

Each node is labelled with a_k and has two branches: one branch is a_k ≤ m, the other is a_k > m.

Info Gain/Gain Ratio must be calculated for each non-trivial “split point” for each attribute
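
A hedged Python sketch of enumerating candidate split points m for one numeric attribute (one common simplification takes the midpoints between adjacent distinct sorted values; each candidate would then be scored with Info Gain / Gain Ratio):

def candidate_split_points(values):
    # midpoints between adjacent distinct values of a numeric attribute
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]

candidate_split_points([64, 65, 68, 64, 70])  # -> [64.5, 66.5, 69.0]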

16
Q

Con of Binarisation

A

leads to arbitrarily large trees

17
Q

Discretisation

A

the translation of continuous attributes into nominal attributes

Steps:

  1. decide on the intervals (out of scope here)
  2. map each continuous value onto a discrete value

Types:

  1. Unsupervised (does not know/use the class label)
  2. Supervised (knows/uses the class label)

18
Q

Unsupervised Discretisation

A

(1) Naive
(2) Equal Size
(3) Equal Frequency
(4) K-Means Clustering

19
Q

Naive Unsupervised Discretisation

A

treat each unique value as a discrete nominal value

20
Q

Pros & Cons of Naive Unsupervised Discretisation

A

Advantages:
• simple to implement

Disadvantages:
• loss of generality
• no sense of ordering
• describes the training data, but nothing more (overfitting)

21
Q

Equal Size Unsupervised Discretisation

A

Identify the upper and lower bounds and partition the overall space into n equal intervals = equal width

min = 64
max = 83
with n = 4: width = (83 - 64) / 4 = 4.75, giving intervals 64-68.75, 68.75-73.5, 73.5-78.25, 78.25-83
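
A minimal Python sketch (the choice n = 4 and the sample values are assumptions for illustration):

def equal_width_bins(values, n):
    # map each value to one of n equal-width intervals between min and max
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [min(int((v - lo) / width), n - 1) for v in values]  # the max value falls in the last bin

equal_width_bins([64, 70, 75, 83], n=4)  # -> [0, 1, 2, 3]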

22
Q

Pros & Cons of Equal Size Unsupervised Discretisation

A

Advantages:
• simple

Disadvantages:
• badly affected by outliers
• arbitrary n

23
Q

Equal Frequency Unsupervised Discretisation

A
sort the values, and identify breakpoints which
produce n (roughly) equal-sized partitions = equal
frequency

1st bin: 1st-4th instances
2nd bin: 5th-8th instances
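
A minimal Python sketch of the same idea (data and n are illustrative; ties at bin boundaries are not handled specially):

def equal_frequency_bins(values, n):
    # assign bin indices so each bin holds (roughly) the same number of instances
    order = sorted(range(len(values)), key=lambda i: values[i])
    per_bin = len(values) / n
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(int(rank / per_bin), n - 1)
    return bins

equal_frequency_bins([70, 64, 83, 75, 68, 81, 72, 65], n=2)  # the four smallest values get bin 0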

24
Q

Pros & Cons of Equal Frequency Unsupervised Discretisation

A

Advantages:
• simple

Disadvantages:
• arbitrary n

25
Q

K-Means Clustering

A

(1) Select k points at random (or otherwise) to act as seed clusters
(2) Assign each instance to the cluster with the nearest centroid
(3) Recompute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. mean point, of the cluster)
(4) Go back to (2); stop when there are no reassignments (convergence)

Convergence is not guaranteed in general, but fast convergence is fairly typical.

One typical improvement runs k-means multiple times (with random seeds), looking for a common clustering, and simply ignores runs which don't converge within τ iterations.
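
A compact Python sketch of these four steps on 1-D values (the data and k below are placeholders):

import random

def kmeans(values, k, max_iter=100):
    centroids = random.sample(values, k)                  # (1) seed clusters
    assignments = None
    for _ in range(max_iter):
        new_assignments = [min(range(k), key=lambda c: abs(v - centroids[c]))
                           for v in values]               # (2) nearest centroid
        if new_assignments == assignments:                # (4) no reassignments -> stop
            break
        assignments = new_assignments
        for c in range(k):                                # (3) recompute centroids as cluster means
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

kmeans([64, 65, 68, 69, 70, 71, 80, 81, 83, 85], k=2)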

26
Q

Pros & Cons of K-Means Clustering

A

Strengths:
• relatively efficient: O(tkn), where n is the number of instances, k the number of clusters, and t the number of iterations; normally k, t ≪ n

Weaknesses:
• tends to converge to local minimum; sensitive to seed
instances
• need to specify k in advance
• not able to handle non-convex clusters
• “mean” ill-defined for nominal attributes

27
Q

Supervised Discretisation

A

(1) Naive
(2) v1 improvement
(3) v2 improvement

28
Q

Naive Supervised Discretisation

A

“Group” values into class-contiguous intervals

Steps:
1. Sort the values, and identify breakpoints in class
membership
2. Reposition any breakpoints where there is no change in numeric value
3. Set the breakpoints midway between the neighbouring values
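
A rough Python sketch of these steps (hypothetical data; instead of repositioning breakpoints where the numeric value does not change, this version simply skips them):

def class_contiguous_breakpoints(values, labels):
    pairs = sorted(zip(values, labels))                   # 1. sort the values
    breakpoints = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:                         # 2. skip breakpoints with no change in value
            breakpoints.append((v1 + v2) / 2)             # 3. midway between neighbouring values
    return breakpoints

class_contiguous_breakpoints([64, 65, 68, 69, 70, 71], ["no", "no", "yes", "yes", "no", "no"])
# -> [66.5, 69.5]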

29
Q

Pros & Cons of Naive Supervised Discretisation

A

Advantages:
• simple to implement

Disadvantages:
• no sense of ordering
• usually creates too many categories (overfitting)

30
Q

Improvement on Naive Supervised Discretisation

A

v1:
delay inserting a breakpoint until each “cluster” contains at least n instances of the majority class

v2:
merge neighbouring clusters until they reach a certain size/at least n instances of the majority class

31
Q

Probability Mass Functions (PMF)

A

For a discrete random variable X that takes on a finite or countably infinite number of possible values, we determine P(X = x) for all possible values of X and call this the probability mass function.

32
Q

Probability Density Function (PDF)

A

For continuous random variables, the probability that X takes on any particular value x is 0; that is, finding P(X = x) for a continuous random variable X is not going to work. Instead, we need the probability that X falls in some interval (a, b), i.e. P(a < X < b). We do that using a probability density function.
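
In symbols (the standard textbook definition, stated here for reference), with density f:

P(a < X < b) = \int_a^b f(x)\,dx, \quad f(x) \ge 0, \quad \int_{-\infty}^{\infty} f(x)\,dx = 1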

33
Q

A popular PDF

A

Gaussian/normal distribution

34
Q

Gaussian distribution

A
  • symmetric about the mean
  • area under the curve = 1
  • to estimate probabilities, we need the mean µ and standard deviation σ of the distribution of X
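
A minimal Python sketch (example values assumed) of estimating µ and σ from data and evaluating the Gaussian PDF, as a Gaussian NB learner would per class and attribute:

import math

def gaussian_pdf(x, mu, sigma):
    # density of N(mu, sigma^2) at x (a density, not a probability)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

temps = [64, 65, 68, 69, 70, 71, 72, 75]
mu = sum(temps) / len(temps)
sigma = math.sqrt(sum((t - mu) ** 2 for t in temps) / len(temps))
gaussian_pdf(66, mu, sigma)
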
35
Q

Why Gaussians?

A

• In practice, a normal distribution is a reasonable
approximation for many events

  • This is a consequence of the Central Limit Theorem
  • More careful analysis shows that the mean is almost always normally distributed, but outliers can wreak havoc on our probability estimates