Exam 3 Flashcards
(43 cards)
What is unsupervised learning (clustering)?
- the class labels of the training data are unknown
- given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
What do decision trees do?
identify ways to split a data set
What does a decision tree start with?
Root Node
What predicts discrete labels?
classification
What predicts continuous quantity or values?
regression
What does multi-class classification require?
each sample belongs to exactly one class
What is a small portion of a decision tree called?
sub-tree
Types of classification algorithms in machine learning? (5)
- linear classifiers
- k-nearest-neighbors
- decision trees
- support vector machines
- neural networks
The data used to build a classification model is called…
Training Data
In supervised learning, training data includes both ____ and _____
input & desired output
Validation data is used for…
testing the model on data it was not trained on (e.g., to tune settings and compare candidate models)
For SVM, the trick is to do ____ ______ data mapping
high-dimensional
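A small sketch of the high-dimensional mapping idea, not taken from the course material. It assumes a degree-2 polynomial feature map (the names `phi` and `poly_kernel` are illustrative) and checks that the kernel value computed in the original 2-D space matches the dot product in the mapped 3-D space.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z) ** 2."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # dot product computed in the mapped 3-D space
implicit = poly_kernel(x, z)    # same quantity without ever building phi
```

Both values agree up to floating-point rounding, which is why an SVM can work in the mapped space without computing the mapping itself.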
The effectiveness of SVM depends on…
- selection of the kernel
- the kernel's parameters
SVMs are a useful alternative to which model?
ANN (artificial neural network)
Dividing the data into distinct groups so that points in a group are very similar is the main goal of what model?
K-means clustering
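A minimal sketch of how K-means groups similar points, assuming 2-D points, squared Euclidean distance, and starting centroids supplied by the caller (the function name `kmeans` and its parameters are illustrative, not from the course material).

```python
def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

With these made-up points, the three points near (1, 1.5) end up in one cluster and the three near (8.5, 8.5) in the other.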
Example of a non-probabilistic binary linear classifier
SVM (with the kernel method, an SVM can also perform non-linear classification)
In supervised learning, training data is accompanied by…
class labels indicating the class of each observation
The mathematical methods for choosing the best split are… (2)
Entropy & Information Gain
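A short sketch of how the two measures fit together, using the standard definitions: entropy is the sum of -p * log2(p) over the classes, and information gain is the parent's entropy minus the size-weighted entropy of the child subsets. The function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]  # children are as mixed as the parent
```

A split that produces pure children recovers all of the parent's entropy as gain, while a split that leaves the class mix unchanged gains nothing, so the tree prefers the first.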
For a decision tree with a continuous target (regression tree), the splitting method is by…
reduction in variance
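A minimal sketch of reduction in variance, assuming population variance and a binary split (function names and target values are made up for illustration). The score is the parent's variance minus the size-weighted variance of the two children.

```python
def variance(values):
    """Population variance of a list of numeric target values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(parent) - weighted

targets = [1.0, 1.0, 5.0, 5.0]
good = variance_reduction(targets, [1.0, 1.0], [5.0, 5.0])  # children are constant
poor = variance_reduction(targets, [1.0, 5.0], [1.0, 5.0])  # children as spread as parent
```

The split with the larger reduction is preferred, since its children have more homogeneous target values.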
What is Overfitting?
The model is too specific to the training data and may have poor accuracy on unseen samples
Two approaches to avoid overfitting
pre-pruning & post-pruning
The basic algorithm for decision trees is…
recursive partitioning (top-down recursive divide-and-conquer manner)
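A rough sketch of top-down recursive partitioning under simplifying assumptions: a single numeric feature, and the split chosen by the fewest misclassified samples rather than entropy. The names `build_tree`, `predict`, and `max_depth` are hypothetical, and the data is made up.

```python
from collections import Counter

def errors(labels):
    """Number of samples outside the majority class of a node."""
    return len(labels) - max(Counter(labels).values())

def build_tree(rows, depth=0, max_depth=3):
    """rows: list of (feature_value, label). Returns a nested dict or a leaf label."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or the depth limit is reached.
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    values = sorted({value for value, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        score = errors([lab for _, lab in left]) + errors([lab for _, lab in right])
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # all feature values identical, nothing to split on
        return majority
    _, threshold, left, right = best
    # Divide and conquer: recurse on each partition independently.
    return {
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth),
        "right": build_tree(right, depth + 1, max_depth),
    }

def predict(tree, value):
    """Walk from the root node down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if value <= tree["threshold"] else tree["right"]
    return tree

rows = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (5.0, "a")]
tree = build_tree(rows)
```

Each call picks one split for the current node and then repeats the same procedure on the two resulting subsets, which is the divide-and-conquer pattern named on the card.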
Typically the ______ between each pair of adjacent values is considered as a possible split point
midpoint
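A tiny sketch of generating candidate split points for a numeric attribute, assuming duplicates are collapsed before taking midpoints (the function name and the sample values are illustrative).

```python
def candidate_split_points(values):
    """Midpoints between adjacent distinct values of a numeric attribute."""
    ordered = sorted(set(values))
    return [(low + high) / 2 for low, high in zip(ordered, ordered[1:])]

ages = [25, 32, 32, 40, 51]
splits = candidate_split_points(ages)  # [28.5, 36.0, 45.5]
```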
Random forest uses the ____ ____ to construct decision trees
Gini index
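A short sketch of the Gini index using its standard definition, one minus the sum of squared class proportions. The function name and labels are illustrative; lower values mean a purer node.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

pure = gini_index(["a", "a", "a", "a"])   # 0.0, only one class present
mixed = gini_index(["a", "a", "b", "b"])  # 0.5, two equally frequent classes
```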