Supervised Learning Flashcards

(36 cards)

1
Q

What is supervised learning?

A

A subcategory of machine learning defined by the use of labeled input/output pairs.

2
Q

What is the difference between regression and classification?

A

Regression predicts continuous values such as price or income; the goal is to find a best-fit line (or curve). Classification predicts a discrete class label; the goal is to find a decision boundary.
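
To make the contrast concrete, here is a minimal Python sketch (assuming scikit-learn is available; the toy data and values are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Regression: continuous output (best-fit line)
reg = LinearRegression().fit(X, np.array([1.1, 1.9, 3.2, 3.9]))
print(reg.predict([[2.5]]))   # a real number, roughly 2.5

# Classification: discrete output (decision boundary)
clf = LogisticRegression().fit(X, np.array([0, 0, 1, 1]))
print(clf.predict([[2.5]]))   # a class label, 0 or 1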

3
Q

What kind of problems can you solve with classification and regression?

A

Regression: weather prediction, housing price prediction
Classification: spam detection, speech recognition, cancer cell identification.

4
Q

Why is training set error performance unreliable?

A

Training-set error says little about how the model generalizes to unseen data; perfect training-set performance is typically a sign of overfitting.
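
A minimal sketch of why evaluation needs held-out data (assuming scikit-learn; the dataset and model choice are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print(model.score(X_tr, y_tr))   # 1.0 with k=1: each training point is its own nearest neighbor
print(model.score(X_te, y_te))   # the held-out accuracy is the number that matters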

5
Q

What is machine learning?

A

A field of artificial intelligence concerned with algorithms that can learn from data.

6
Q

Two main branches of Machine Learning?

A

Supervised learning
Unsupervised learning

7
Q

3 requirements for machine learning?

A

1) A pattern exists
2) that cannot be pinned down mathematically
3) We have data on it

8
Q

Define data (for M.L.)

A

Input-output pairs (feature, label):
input - real-valued or categorical
output - real-valued (regression) or categorical (classification)

9
Q

Goal of supervised learning?

A

To model the dependency between features and labels.

10
Q

Goal of a supervised learning model?

A

To predict labels for new instances.

11
Q

What is a training set?

A

A set of input-output pairs.

12
Q

Classification output value types?

A

Categorical, or binary (-1, +1).

13
Q

Regression output value type?

A

Real numbers.

14
Q

Examples of supervised learning problems?

A

Junk mail:
features - word frequencies
class - junk/not junk

Access control system:
features - images
class - ID of the person

Medical diagnosis:
features - BMI, age, symptoms, test results
class - diagnostic code

15
Q

Formal components of learning.

A

Input (x) - e.g. a customer application
Output (y) - e.g. approval/denial of the application
Target function f: X -> Y (the ideal credit approval formula)
Data: {(x1, y1), ..., (xn, yn)} (historical records)
Hypothesis g: X -> Y
Hypothesis set (H): the set of functions in which we look for our solution
Supervised learning uses the training data to pick a function g from H that can be applied to new data.
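
A toy Python sketch of these components (all names and the threshold grid are invented for illustration): H is a small set of 1-D threshold classifiers, and learning picks the g in H with the lowest error on the data.

import numpy as np

def make_hypothesis(t):                      # one member of H: g_t(x) = sign(x - t)
    return lambda x: np.where(x > t, 1, -1)

X = np.array([0.2, 0.5, 1.1, 1.8, 2.5])      # inputs x_i
y = np.array([-1, -1, 1, 1, 1])              # labels y_i produced by the unknown target f

thresholds = np.linspace(0, 3, 31)           # parameterizes the hypothesis set H
errors = [np.mean(make_hypothesis(t)(X) != y) for t in thresholds]
g = make_hypothesis(thresholds[int(np.argmin(errors))])   # the learned hypothesis g
print(g(np.array([0.9, 2.0])))               # predictions for new instances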

16
Q

Building blocks for an M.L. algorithm?

A

Model class (hypothesis set), e.g.
-linear or quadratic functions
-decision trees
-neural networks, clustering
Error measure (score function)
Algorithm - finds a good model as defined by the score function
Validation

17
Q

Dangers of overfitting

A

The model memorizes the training data and does not generalize beyond it:
100% accuracy on the training data, yet possibly no better than random guessing on new instances.

18
Q

Dangers of underfitting

A

The model is not expressive enough, e.g. linear functions on non-linear problems.

19
Q

Approximation-Generalization tradeoff

A

Goal: approximate the target function as closely as possible.
More complex hypothesis set: better chance of approximating the target function f.
Less complex hypothesis set: better chance of generalizing outside the training set.
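
The tradeoff shows up when varying a single complexity knob, e.g. polynomial degree (a sketch assuming scikit-learn; the data is synthetic):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 100)   # noisy non-linear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):   # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error keeps falling
          mean_squared_error(y_te, model.predict(X_te)))   # test error eventually rises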

20
Q

Ideal hypothesis set H

A

H = {f}, we already know the target function, no need for M.L.

21
Q

Occam’s Razor

A

The principle that favors the simplest hypothesis (set) that can explain a given set of observations well.

22
Q

Criteria for a good model

A

Interpretability
Computational complexity

23
Q

How to control Hypothesis set complexity?

A

With hyperparameters, e.g.:
-max degree of polynomials
-number of nearest neighbors (k)
-regularization parameter
-depth of decision tree
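
In scikit-learn, for example, these correspond directly to constructor arguments (the values below are illustrative):

from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier

PolynomialFeatures(degree=2)           # max degree of polynomials
KNeighborsClassifier(n_neighbors=5)    # number of nearest neighbors
Ridge(alpha=1.0)                       # regularization parameter
DecisionTreeClassifier(max_depth=3)    # depth of decision tree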

24
Q

What kind of methods should you start with?

A

Simple ones: linear regression, kNN, naive Bayes.
-easier to understand
-less tuning, less risk of overfitting
-often just as good as more advanced methods

25
Q

K Nearest Neighbors (kNN)

A

-classic method (1951)
-classification based on the k most similar training instances
-parameter k tunes model complexity
-can learn complex non-linear functions

26
Q

kNN Classification

A

Choose the majority class among the k nearest neighbors for prediction.

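A from-scratch sketch of the majority-vote rule (function and variable names are invented for illustration):

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)             # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                         # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority class among them

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X, y, np.array([4.5, 5.0])))             # -> 1
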
27
Q

kNN Regression

A

Take the mean value of the k nearest neighbors for prediction.

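The regression variant only changes the last step: average instead of vote (same illustrative naming as above):

import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return y_train[nearest].mean()                # mean target value of the neighbors
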
28
Q

Disadvantages of the kNN predictor

A

All k nearest neighbors have the same influence on the prediction. Maybe closer neighbors should have more influence?

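One common remedy, sketched below (one option among several; names are illustrative): weight each neighbor by inverse distance so that closer instances count more.

import numpy as np

def weighted_knn_regress(X_train, y_train, x, k=3, eps=1e-12):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)              # closer neighbor -> larger weight
    return np.average(y_train[nearest], weights=weights)
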
29
Q

Distance measures in kNN

A

Standard: Euclidean distance
Others: Manhattan, Mahalanobis, Chebyshev, Hamming

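Most of these are one-liners in NumPy (Mahalanobis additionally needs an inverse covariance matrix and is omitted here):

import numpy as np

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 3.0])
print(np.linalg.norm(a - b))   # Euclidean: square root of summed squared differences
print(np.abs(a - b).sum())     # Manhattan: summed absolute differences
print(np.abs(a - b).max())     # Chebyshev: largest absolute difference
print((a != b).mean())         # Hamming: fraction of differing entries
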
30
Q

Small vs large k in kNN

A

Small: local, complex model that depends on a handful of instances
Large: global, simpler model averaged over a large set of instances

31
Q

kNN with k=1?

A

Overfitting! 0% training error, but it won't generalize.

32
Q

Advantages of kNN

A

-simple
-non-linear modeling
-simple model complexity tuning (k)
-customizable (distance measure, feature/neighbor weighting)
-good results in many applications

33
Q

Disadvantages of kNN

A

-large computational/memory complexity (O(nm), where m is the dimensionality of the data)
-sensitive to feature scaling
-irrelevant features are problematic
-black box
-not state of the art

34
Q

Criteria to be balanced in learning

A

Fit to the data (low error) vs. model complexity.

35
Q

4 main ingredients of a kNN algorithm

A

Distance metric
Number of neighbors (k)
Weighting function for neighbors
Prediction function

36
Q

Method to automatically determine the appropriate k value for kNN

A

Cross-validation.
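
A sketch using scikit-learn's GridSearchCV (the dataset and k range are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(1, 31))}, cv=5)
search.fit(X, y)
print(search.best_params_)   # the k with the best mean cross-validated accuracy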