Lecture 4: Decision Trees and k-means clustering Flashcards

1
Q

What is A ⇒ B?

A

A implies B

2
Q

What is Deduction?

A

The conclusion follows necessarily from the premises.

From A ⇒ B and A, we conclude B.

Example:
All men are mortal.
Socrates is a man.
Socrates is mortal.

3
Q

What is Abduction?

A

The conclusion is a hypothetical (most probable) explanation for the premises.

From A ⇒ B and B, we conclude A.

Ex:
Drunk people do not walk straight.
John does not walk straight.
John is drunk.

Not sound, but it may be the most likely explanation for B.

4
Q

What is Induction?

A

A conclusion about all members of a class from the examination of only a few members of the class.
From A ∧ C ⇒ B and A ∧ D ⇒ B, we conclude A ⇒ B.
We construct a general explanation based on a specific case.
Ex:
All CS students in COMP 472 are smart.
All CS students on vacation are smart.
All CS students are smart.
Not sound, but it can be seen as hypothesis construction or generalisation.

5
Q

What is Inductive Learning?

A

Learning from examples; this covers most work in ML.
Examples (positive and/or negative) are given to train a system in a classification (or regression) task.
Given a new instance X you have never seen, you must find an estimate of the function f(X), where f(X) is the desired output.

6
Q

What is the framework for inductive learning?

A

Input data are represented by a vector of features (attributes), X.
Each vector X is a list of (attribute, value) pairs.
Ex: X = [nose:big, teeth:big, eyes:big, moustache:no]
The number of attributes is fixed (positive, finite).
Each attribute has a fixed, finite number of possible values.
Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.

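For concreteness, the same example can be written as a tiny Python snippet (the attribute names and values just echo the hypothetical example above):

    # Illustrative only: one example as a fixed set of (attribute, value) pairs.
    X = {"nose": "big", "teeth": "big", "eyes": "big", "moustache": "no"}
    n = len(X)  # 4 attributes, so X is a point in a 4-dimensional feature space
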
7
Q

What are 3 common techniques in Machine Learning?

A

Probabilistic methods (ex: Naïve Bayes classifier)
Decision trees: use only discriminating features as questions in a big if-then-else tree
Neural networks (also called parallel distributed processing or connectionist systems): intelligence arises from having a large number of simple computational units

8
Q

How does a decision tree work?

A

Look for features that are very good indicators of the result and place these features (as questions) in the nodes of the tree.
Split the examples so that those with different values for the chosen feature end up in different sets.
Repeat the same process with another feature.

9
Q

How do you select an attribute in a decision tree?

A

Search the space of all decision trees: always pick the next attribute for splitting the data based on its "discriminating power" (information gain).

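A minimal, self-contained Python sketch of this greedy selection (not the lecture's code; the representation of examples as (features, label) pairs and all names are assumptions made for illustration):

    # ID3-style sketch: repeatedly split on the attribute with the highest
    # information gain. Examples are (features_dict, label) pairs; attribute
    # values are discrete.
    from collections import Counter
    from math import log2

    def entropy(labels):
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def gain(examples, attribute):
        labels = [y for _, y in examples]
        remainder = 0.0
        for value in {x[attribute] for x, _ in examples}:
            subset = [y for x, y in examples if x[attribute] == value]
            remainder += (len(subset) / len(examples)) * entropy(subset)
        return entropy(labels) - remainder

    def build_tree(examples, attributes):
        labels = [y for _, y in examples]
        if len(set(labels)) == 1:                  # pure node -> leaf
            return labels[0]
        if not attributes:                         # nothing left to ask -> majority leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: gain(examples, a))
        tree = {best: {}}
        for value in {x[best] for x, _ in examples}:
            subset = [(x, y) for x, y in examples if x[best] == value]
            tree[best][value] = build_tree(subset, [a for a in attributes if a != best])
        return tree
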
10
Q

What are the 4 different factors that quantify the size of a tree?

A

 Number of leaves
 Height of the tree
 External Path Length
 Weighted External Path Length

11
Q

What is height of a tree?

A

The longest path in the tree from the root to a leaf.

12
Q

What is External Path length?

A

Start at a leaf, go up to the root, and count the number of edges.
Do this for every leaf and add up the numbers.

13
Q

What is weighted external path length?

A

Idea: not all paths are equally important/likely.
Use the training data to compute a weighted sum.

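A small Python sketch, assuming a nested-dict tree where each leaf stores (label, number of training examples that reached it); both the representation and the choice of example counts as weights are assumptions for illustration, not the lecture's definition:

    # A node is {attribute: {value: subtree}}; a leaf is (label, n_examples).
    def path_lengths(tree, depth=0):
        if not isinstance(tree, dict):             # leaf
            label, n_examples = tree
            return depth, n_examples * depth       # (plain, weighted) contribution
        plain = weighted = 0
        for branches in tree.values():
            for child in branches.values():
                p, w = path_lengths(child, depth + 1)
                plain, weighted = plain + p, weighted + w
        return plain, weighted                     # external / weighted external path length
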
14
Q

What is the equation for entropy?

A

H(X) = - Σ (i = 1 to n) p(x_i) log2 p(x_i)
where n is the number of possible outcomes

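Two quick sanity checks of the formula (a worked example, not from the slides):

    from math import log2
    # A fair coin (two equally likely outcomes) has maximal entropy: 1 bit.
    H_fair = -(0.5 * log2(0.5) + 0.5 * log2(0.5))    # = 1.0
    # A 90/10 outcome is much more predictable, so its entropy is lower.
    H_biased = -(0.9 * log2(0.9) + 0.1 * log2(0.1))  # ≈ 0.47
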
15
Q

What is the formula to choose the best feature?

A

gain(S, A) = H(S) - H(S|A)
           = H(S) - Σ (v ∈ Values(A)) (|S_v| / |S|) · H(S_v)

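A small hypothetical example of the formula in Python (the 9+/5- counts and the split are made up for illustration):

    from math import log2

    def H(p, n):     # entropy of a set with p positive and n negative examples
        total = p + n
        return -sum((c / total) * log2(c / total) for c in (p, n) if c)

    # S = [9+, 5-]; attribute A splits S into S_v1 = [6+, 2-] and S_v2 = [3+, 3-]
    g = H(9, 5) - (8 / 14) * H(6, 2) - (6 / 14) * H(3, 3)   # ≈ 0.048
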
16
Q

What are the benefits of decision trees?

A

One of the most widely used learning methods in practice.
Fast, simple, and traceable (explainable AI!).
Can outperform human experts in many problems.

17
Q

What is the equation for F-measure, and how does β affect it?

A

A weighted combination of precision and recall:
F = [(β^2 + 1) · P · R] / [β^2 · P + R]

β controls the relative importance of precision and recall:
when β = 1, precision and recall have the same importance
when β > 1, recall is favored
when β < 1, precision is favored

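A quick numeric check in Python of how β shifts the balance (the P and R values are made up):

    def f_measure(p, r, beta=1.0):
        return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

    # With P = 0.5 and R = 1.0:
    f_measure(0.5, 1.0, beta=1)    # ≈ 0.667  harmonic mean of P and R
    f_measure(0.5, 1.0, beta=2)    # ≈ 0.833  closer to R: beta > 1 favors recall
    f_measure(0.5, 1.0, beta=0.5)  # ≈ 0.556  closer to P: beta < 1 favors precision
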
18
Q

What is clustering?

A

The organization of unlabeled data into similarity groups called clusters.
A cluster is a collection of data items which are "similar" to each other and "dissimilar" to data items in other clusters.

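The lecture title also mentions k-means; here is a minimal sketch of that clustering algorithm (my own simplified version: Euclidean distance, points given as a list of equal-length tuples of floats):

    import random

    def kmeans(points, k, iters=100):
        centroids = random.sample(points, k)       # k random points as initial centroids
        for _ in range(iters):
            # Assignment step: each point joins the cluster of its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k),
                              key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
                clusters[nearest].append(p)
            # Update step: move each centroid to the mean of its cluster.
            new_centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                             for i, c in enumerate(clusters)]
            if new_centroids == centroids:         # converged
                break
            centroids = new_centroids
        return centroids, clusters
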