Exam 3 Flashcards

(43 cards)

1
Q

What is unsupervised learning (clustering)?

A
  • the class labels of the training data are unknown
  • given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

2
Q

What do decision trees do?

A

identify ways to split a data set

3
Q

What does a decision tree start with?

A

Root Node

4
Q

What predicts discrete labels?

A

classification

5
Q

What predicts continuous quantities or values?

A

regression

6
Q

What does multi-class classification require?

A

requires that each sample be assigned to exactly one class

7
Q

What is a small portion of a decision tree called?

A

sub-tree

8
Q

Types of classification algorithms in machine learning? (4)

A
  • linear classifiers
    - k-nearest-neighbors
  • decision trees
  • support vector machines
  • neural networks
9
Q

The data used to build a classification model is called…

A

Training Data

10
Q

In supervised learning, training data includes both ____ and _____

A

input & desired output

11
Q

Validation data is used for…

A

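evaluating and tuning the model during training (e.g., choosing parameters such as k), before final testing on separate test data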

12
Q

For SVMs, the trick is to do ____ ______ data mapping

A

high-dimensional
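As a hedged illustration of why high-dimensional mapping can be cheap, the sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)² and its explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). Both the kernel and the names `phi`, `dot`, and `poly_kernel` are assumptions chosen for the example, not taken from the course. The kernel value computed in 2-D equals the dot product after mapping to 3-D, so the mapping never has to be built explicitly.

```python
import math

def phi(x):
    """Explicit (hypothetical example) feature map from 2-D to 3-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel computed directly in the original 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))  # dot product in the mapped 3-D space
via_kernel = poly_kernel(x, z)  # same value without building the mapping
print(explicit, via_kernel)
```

The two values agree up to floating-point rounding, which is the idea behind the kernel trick.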

13
Q

The effectiveness of SVM depends on…

A
  • the selection of the kernel
  • the kernel parameters and the soft-margin parameter C

14
Q

SVMs are a useful alternative to which model?

A

ANN (artificial neural networks)

15
Q

Dividing the data into distinct groups so that points within a group are very similar is the main goal of which model?

A

k-means clustering
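A minimal k-means sketch (Lloyd's algorithm), assuming Euclidean distance and, for determinism, using the first k points as the initial centroids; real implementations usually initialize randomly or with k-means++. The function name `kmeans` and the data are hypothetical. Each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of its assigned points, until the assignments stop changing.

```python
import math

def kmeans(points, k, max_iters=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption: first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
    return labels, centroids

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```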

16
Q

Example of a non-probabilistic binary linear classifier

A

SVM (with the kernel method, SVMs can also perform non-linear classification)

17
Q

In supervised learning, training data is accompanied by…

A

class labels indicating the class of each observation

18
Q

The mathematical methods for choosing the best split are… (2)

A

Entropy & Information Gain
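A short sketch of both measures, assuming base-2 logarithms (entropy in bits) and class labels given as plain Python lists; the function names and data are hypothetical. Information gain is the parent's entropy minus the size-weighted entropy of the child partitions, so the best split is the one with the highest gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical node with 4 "yes" and 4 "no" labels, split two different ways.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]
print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```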

19
Q

For regression decision trees (continuous target), the splitting method is by…

A

reduction in variance
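A minimal sketch of variance reduction, assuming population variance (`statistics.pvariance`) and hypothetical target values. It mirrors information gain: parent variance minus the size-weighted variance of the children. A split that groups similar target values gives a larger reduction.

```python
from statistics import pvariance

def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the child partitions."""
    n = len(parent)
    weighted = sum(len(ch) / n * pvariance(ch) for ch in children)
    return pvariance(parent) - weighted

# Hypothetical continuous targets at a node, split two different ways.
parent = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
good_split = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
bad_split = [[1.0, 10.0, 2.0], [11.0, 3.0, 12.0]]
print(variance_reduction(parent, good_split))
print(variance_reduction(parent, bad_split))
```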

20
Q

What is Overfitting?

A

The model fits the training data too closely and may have poor accuracy on unseen samples

21
Q

Two approaches to avoid overfitting in decision trees

A

pre-pruning & post-pruning

22
Q

The basic algorithm for decision trees is

A

recursive partitioning (top-down recursive divide-and-conquer manner)
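The top-down recursive divide-and-conquer idea can be sketched as follows. This is an ID3-style illustration under stated assumptions: categorical attributes only, information gain as the split measure, no pruning, and rows stored as dicts. The function `build_tree`, the row format, and the data are hypothetical. At each node it picks the best attribute, partitions the rows by that attribute's values, and recurses until a node is pure or no attributes remain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    # Stop when the node is pure or no attributes remain; predict the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def split_entropy(attr):
        # Size-weighted entropy of the partitions produced by splitting on attr.
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[attr], []).append(lab)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Lowest weighted child entropy is the same as highest information gain.
    best = min(attributes, key=split_entropy)
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Recurse on the subset of rows that share this attribute value.
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)

rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

The printed tree splits on `outlook` first. Only the `rain` branch needs a second split, on `windy`.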

23
Q

Typically the ______ between each pair of adjacent values is considered as a possible split point

A

midpoint
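For a continuous attribute, candidate split points come from the midpoints between adjacent distinct values after sorting. Each candidate m defines the binary test "value ≤ m" versus "value > m". A minimal sketch, assuming the hypothetical helper name `candidate_split_points` and made-up values:

```python
def candidate_split_points(values):
    """Midpoints between each pair of adjacent distinct sorted values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

ages = [25, 40, 32, 25, 51]  # hypothetical attribute values
splits = candidate_split_points(ages)
print(splits)  # [28.5, 36.0, 45.5]
```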

24
Q

Random forest uses the ____ ____ to construct decision trees

25
Q

Trees represent knowledge in the form of _________ rules

A

IF-THEN

26
Q

The motivation for SVM is to categorize new, unseen objects into two separate groups based on their ______ and _______

A

properties & a set of known, already-categorized examples

27
Q

What is one of the key areas in machine learning?

A

Kernel Methods

28
Q

What are the two key concepts of SVM?

A
  • maximize the margin
  • the kernel trick

29
Q

What are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns?

A

Support Vector Machines (SVM)

30
Q

How do you choose the best hyperplane in SVM?

A

Choose the hyperplane that maximizes the margin between the classes

31
Q

What are the data points (vectors) that the margin lines touch called?

A

Support Vectors

32
Q

Large value of parameter C = _____ margin

A

Small

33
Q

Small value of parameter C = _____ margin

A

Large

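The effect of C can be seen in the standard primal soft-margin objective, 0.5·‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). A smaller ‖w‖ means a wider margin (width 2/‖w‖). The hinge terms penalize points inside the margin or on the wrong side. This sketch does not train an SVM. It only compares two hand-picked hyperplanes on hypothetical 1-D data to show which one the objective prefers for a small and a large C. The function name `soft_margin_objective` is hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective: 0.5 * ||w||^2 + C * (sum of hinge losses)."""
    reg = 0.5 * sum(wi * wi for wi in w)  # small ||w||  <=>  wide margin
    hinge = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return reg + C * hinge

# Hypothetical 1-D data with labels -1 / +1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
wide = ([0.5], 0.0)    # margin width 2 / 0.5 = 4, but x = -1 and x = 1 fall inside it
narrow = ([1.0], 0.0)  # margin width 2 / 1 = 2, no margin violations

for C in (0.1, 10.0):
    obj_wide = soft_margin_objective(*wide, X, y, C)
    obj_narrow = soft_margin_objective(*narrow, X, y, C)
    preferred = "wide" if obj_wide < obj_narrow else "narrow"
    print(f"C={C}: lower objective for the {preferred} margin")
```

With C = 0.1 the wide margin wins despite two violations. With C = 10.0 the violations cost too much and the narrow margin wins, matching the two cards above.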
34
Q

How is distance measured for KNN?

A

Usually Euclidean distance

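For two points p = (p_1, …, p_n) and q = (q_1, …, q_n), the Euclidean distance is the square root of the summed squared coordinate differences. For example, p = (1, 2) and q = (4, 6) give a distance of 5:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}, \qquad d\big((1,2),(4,6)\big) = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$$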
35
Q

What does the KNN algorithm assume?

A

Similar things exist in close proximity

36
Q

What is the K value in KNN?

A

K is the number of nearest existing data points (neighbors) used to decide the label of the new data point

37
Q

How are data points assigned in KNN?

A

The K closest neighbors of the new point are found, and the new point is assigned to the category that holds the majority among those neighbors

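A minimal KNN classification sketch, assuming Euclidean distance (`math.dist`), a simple majority vote, and hypothetical two-class data. The function name `knn_classify` is hypothetical. It ranks all training points by distance to the query and votes among the K nearest. K = 3 is odd, so two classes cannot tie.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Assign query to the majority class among its k nearest training points."""
    # Rank training points by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    k_labels = [label for _, label in ranked[:k]]
    # Majority vote among the k nearest neighbors.
    return Counter(k_labels).most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, (2.0, 2.0), k=3))  # A
print(knn_classify(train_X, train_y, (6.0, 7.0), k=3))  # B
```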
38
Q

What happens if K is too small?

A

The result can be sensitive to noise

39
Q

What happens if K is too large?

A

The neighborhood might include points from other classes

40
Q

Should the value of k be even or odd?

A

Odd (for two-class problems), to avoid ties in the majority vote

41
Q

Which model makes NO ASSUMPTIONS about the underlying data distribution?

A

KNN (it is non-parametric)

42
Q

Typically choose the value of k which has the lowest ____ _____ in _____ data

A

error rate; validation

43
Q

When using KNN for regression (predicting a continuous value), the model uses the…

A

average of the K nearest neighbors' response values
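For regression, the KNN vote is replaced by an average. A minimal sketch, assuming Euclidean distance, an unweighted mean of the K nearest response values, and hypothetical 1-D data. The function name `knn_regress` is hypothetical.

```python
import math

def knn_regress(train_X, train_y, query, k=3):
    """Predict a continuous value as the mean response of the k nearest neighbors."""
    # Rank training points by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical 1-D training data: (feature vector, response value).
train_X = [(10.0,), (12.0,), (15.0,), (20.0,), (25.0,)]
train_y = [200.0, 230.0, 260.0, 330.0, 400.0]
prediction = knn_regress(train_X, train_y, (13.0,), k=3)
print(prediction)  # 230.0
```

The three nearest points to 13 are 12, 15, and 10, so the prediction is (230 + 260 + 200) / 3.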