Discrete & Continuous Data Flashcards

1
Q

Types of Attributes

A

Continuous
Ordinal
Nominal

2
Q

Instances aren’t labelled

A

Unsupervised ML

3
Q

Not enough instances are labelled

A

Semi-supervised ML

4
Q

Instances are all labelled

A

Supervised ML

5
Q

Instances are ordered

A

Sequence learning

6
Q

Nominal learners

A

NB (Naive Bayes)
1-R (One-R)
DT (Decision Tree)

7
Q

Continuous learners

A

k-NN (k-nearest neighbours)
NP
SVM (support vector machine)

8
Q

Nominal Attributes, but Numeric Learner

A

(1) For k-NN and NP: Hamming distance
(2) randomly assign numbers to attribute values
• If scale is constant between attributes, this is not as bad an idea as it sounds! (But still undesirable)
• Worse with higher-arity attributes (more attribute values)
• Imposes an attribute ordering which may not exist
(3) one-hot encoding

9
Q

Hamming distance

A

Between two strings of equal length: the minimum number of substitutions required to change one string into the other, or equivalently the minimum number of errors that could have transformed one string into the other
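
A minimal Python sketch of this (the function name and the example values are illustrative, not from the cards):

def hamming_distance(a, b):
    # number of positions at which two equal-length sequences differ
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

hamming_distance(["sunny", "hot", "high"], ["sunny", "mild", "normal"])  # -> 2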

10
Q

one-hot encoding

A

If a nominal attribute takes m values, replace it with m Boolean attributes.

Example:
hot = [1, 0, 0]
mild = [0, 1, 0]
cool = [0, 0, 1]
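
A small Python sketch of the same idea (function name and value list are illustrative):

def one_hot(value, values):
    # replace one nominal value with m Boolean attributes (m = len(values))
    return [1 if value == v else 0 for v in values]

one_hot("mild", ["hot", "mild", "cool"])  # -> [0, 1, 0]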

11
Q

Pros & Cons of one-hot encoding

A

Pro: lets nominal attributes be used with continuous learners

Con: massively increases the size of the feature space

12
Q

Numeric Attributes, but Nominal Learner

A

(1) NB (e.g. Gaussian NB)
(2) DT (e.g. binarisation)
(3) 1-R

General solution: discretisation

13
Q

Types of Naive Bayes

A

• Multivariate NB: attributes are nominal, and can take any (fixed) number of values

• Binomial (or Bernoulli) NB: attributes are binary (a special case of multivariate NB)

• Multinomial NB: attributes are natural numbers, corresponding to frequencies

• Gaussian NB: attributes are real numbers; use a probability density function
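
As a rough illustration only (assuming scikit-learn is available; not part of the original card), these variants map onto separate estimators in sklearn.naive_bayes:

import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

X_counts = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 4]])    # natural-number frequencies
X_real = np.array([[0.3, -1.2], [1.5, 0.7], [2.1, 0.2]])  # real-valued attributes
y = np.array([0, 1, 1])

MultinomialNB().fit(X_counts, y)    # multinomial NB: frequency counts
BernoulliNB().fit(X_counts > 0, y)  # Bernoulli/binomial NB: binary attributes
GaussianNB().fit(X_real, y)         # Gaussian NB: real values via a PDF
# (recent scikit-learn versions also offer CategoricalNB for the multivariate/nominal case)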

14
Q

Numeric attributes for DT

A

(1) Binarisation

(2) Range

15
Q

Binarisation

A

Each node is labelled with a_k and has two branches: one branch is a_k ≤ m, the other is a_k > m.

Info Gain/Gain Ratio must be calculated for each non-trivial “split point” for each attribute
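
A hedged Python sketch of enumerating candidate split points m for one numeric attribute (one common simplification takes the midpoints between adjacent distinct sorted values; each candidate would then be scored with Info Gain / Gain Ratio):

def candidate_split_points(values):
    # midpoints between adjacent distinct values of a numeric attribute
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]

candidate_split_points([64, 65, 68, 64, 70])  # -> [64.5, 66.5, 69.0]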

16
Q

Con of Binarisation

A

leads to arbitrarily large trees

17
Q

Discretisation

A

the translation of continuous attributes into nominal attributes

Steps:

  1. decide on the intervals (out of scope here)
  2. map each continuous value onto a discrete value

Types:

  1. Unsupervised (does not know/use the class label)
  2. Supervised (knows/uses the class label)

18
Q

Unsupervised Discretisation

A

(1) Naive
(2) Equal Size
(3) Equal Frequency
(4) K-Means Clustering

19
Q

Naive Unsupervised Discretisation

A

treat each unique value as a discrete nominal value

20
Q

Pros & Cons of Naive Unsupervised Discretisation

A

Advantages:
• simple to implement

Disadvantages:
• loss of generality
• no sense of ordering
• describes the training data, but nothing more (overfitting)

21
Q

Equal Size Unsupervised Discretisation

A

Identify the upper and lower bounds and partition the overall space into n equal intervals = equal width

min = 64
max = 83
with n = 4: width = (83 - 64) / 4 = 4.75, giving intervals 64-68.75, 68.75-73.5, 73.5-78.25, 78.25-83
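
A minimal Python sketch (the choice n = 4 and the sample values are assumptions for illustration):

def equal_width_bins(values, n):
    # map each value to one of n equal-width intervals between min and max
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [min(int((v - lo) / width), n - 1) for v in values]  # the max value falls in the last bin

equal_width_bins([64, 70, 75, 83], n=4)  # -> [0, 1, 2, 3]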

22
Q

Pros & Cons of Equal Size Unsupervised Discretisation

A

Advantages:
• simple

Disadvantages:
• badly affected by outliers
• arbitrary n

23
Q

Equal Frequency Unsupervised Discretisation

A
sort the values, and identify breakpoints which
produce n (roughly) equal-sized partitions = equal
frequency

1st bin: 1st-4th instances
2nd bin: 5th-8th instances
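
A minimal Python sketch of the same idea (data and n are illustrative; ties at bin boundaries are not handled specially):

def equal_frequency_bins(values, n):
    # assign bin indices so each bin holds (roughly) the same number of instances
    order = sorted(range(len(values)), key=lambda i: values[i])
    per_bin = len(values) / n
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(int(rank / per_bin), n - 1)
    return bins

equal_frequency_bins([70, 64, 83, 75, 68, 81, 72, 65], n=2)  # the four smallest values get bin 0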

24
Q

Pros & Cons of Equal Frequency Unsupervised Discretisation

A

Advantages:
• simple

Disadvantages:
• arbitrary n

25
Q

K-Means Clustering

A

(1) Select k points at random (or otherwise) to act as seed clusters
(2) Assign each instance to the cluster with the nearest centroid
(3) Recompute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. mean point, of the cluster)
(4) Go back to (2); stop when there are no reassignments (convergence)

Convergence is not guaranteed in general, but fast convergence is fairly typical.

One typical improvement runs k-means multiple times (with random seeds), looking for a common clustering, and simply ignores runs which don't converge within τ iterations.
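
A compact Python sketch of these four steps on 1-D values (the data and k below are placeholders):

import random

def kmeans(values, k, max_iter=100):
    centroids = random.sample(values, k)                  # (1) seed clusters
    assignments = None
    for _ in range(max_iter):
        new_assignments = [min(range(k), key=lambda c: abs(v - centroids[c]))
                           for v in values]               # (2) nearest centroid
        if new_assignments == assignments:                # (4) no reassignments -> stop
            break
        assignments = new_assignments
        for c in range(k):                                # (3) recompute centroids as cluster means
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

kmeans([64, 65, 68, 69, 70, 71, 80, 81, 83, 85], k=2)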

26
Q

Pros & Cons of K-Means Clustering

A

Strengths:
• relatively efficient: O(tkn), where n is the number of instances, k the number of clusters, and t the number of iterations; normally k, t ≪ n

Weaknesses:
• tends to converge to local minimum; sensitive to seed
instances
• need to specify k in advance
• not able to handle non-convex clusters
• “mean” ill-defined for nominal attributes

27
Q

Supervised Discretisation

A

(1) Naive
(2) v1 improvement
(3) v2 improvement

28
Q

Naive Supervised Discretisation

A

“Group” values into class-contiguous intervals

Steps:
1. Sort the values, and identify breakpoints in class
membership
2. Reposition any breakpoints where there is no change in numeric value
3. Set the breakpoints midway between the neighbouring values
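
A rough Python sketch of these steps (hypothetical data; instead of repositioning breakpoints where the numeric value does not change, this version simply skips them):

def class_contiguous_breakpoints(values, labels):
    pairs = sorted(zip(values, labels))                   # 1. sort the values
    breakpoints = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:                         # 2. skip breakpoints with no change in value
            breakpoints.append((v1 + v2) / 2)             # 3. midway between neighbouring values
    return breakpoints

class_contiguous_breakpoints([64, 65, 68, 69, 70, 71], ["no", "no", "yes", "yes", "no", "no"])
# -> [66.5, 69.5]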

29
Q

Pros & Cons of Naive Supervised Discretisation

A

Advantages:
• simple to implement

Disadvantages:
• no sense of ordering
• usually creates too many categories (overfitting)

30
Q

Improvement on Naive Supervised Discretisation

A

v1:
delay inserting a breakpoint until each “cluster” contains at least n instances of the majority class

v2:
merge neighbouring clusters until they reach a certain size/at least n instances of the majority class

31
Q

Probability Mass Functions (PMF)

A

For a discrete random variable X that takes on a finite or countably infinite number of possible values, we determine P(X = x) for all possible values of X and call this the probability mass function.

32
Q

Probability Density Function (PDF)

A

For continuous random variables, the probability that X takes on any particular value x is 0; that is, finding P(X = x) for a continuous random variable X is not going to work. Instead, we need the probability that X falls in some interval (a, b), i.e. P(a < X < b). We do that using a probability density function.
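
In symbols (the standard textbook definition, stated here for reference), with density f:

P(a < X < b) = \int_a^b f(x)\,dx, \quad f(x) \ge 0, \quad \int_{-\infty}^{\infty} f(x)\,dx = 1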

33
Q

A popular PDF

A

Gaussian/normal distribution

34
Q

Gaussian distribution

A
  • symmetric about the mean
  • area under the curve = 1
  • to estimate probabilities, we need the mean µ and standard deviation σ of the distribution of X
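
A minimal Python sketch (example values assumed) of estimating µ and σ from data and evaluating the Gaussian PDF, as a Gaussian NB learner would per class and attribute:

import math

def gaussian_pdf(x, mu, sigma):
    # density of N(mu, sigma^2) at x (a density, not a probability)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

temps = [64, 65, 68, 69, 70, 71, 72, 75]
mu = sum(temps) / len(temps)
sigma = math.sqrt(sum((t - mu) ** 2 for t in temps) / len(temps))
gaussian_pdf(66, mu, sigma)
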
35
Q

Why Gaussians?

A

• In practice, a normal distribution is a reasonable
approximation for many events

  • This is a consequence of the Central Limit Theorem
  • More careful analysis shows that the mean is almost always normally distributed, but outliers can wreak havoc on our probability estimates