Intro ML Flashcards Preview

Flashcards in Intro ML Deck (6)

Machine Learning, definition

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. Machine learning searches for mappings which generalize.


Machine Learning, formulation

- Learn: input data x^{(i)} from the dataset
- Predict (sometimes called inference): output y^{(i)}, here binary classification
Depends on w (the parameter to learn/fit) and the dataset D = {(x^{(i)}, y^{(i)})}_{i=1}^N.
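
A minimal sketch of this formulation, assuming a linear decision rule sign(w · x) with labels in {-1, +1}. The card does not specify the model, so the rule, the function name `predict`, the toy data, and the value of w are all illustrative assumptions.

```python
def predict(w, x):
    """Binary prediction from parameter w and input x (assumed linear rule)."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score >= 0 else -1

# Toy dataset D = {(x^(i), y^(i))}, i = 1..N, with labels in {-1, +1} (made up)
D = [((1.0, 2.0), 1), ((-1.0, -1.5), -1), ((2.0, 0.5), 1)]

# w is assumed to have been learned already; no fitting is shown here
w = (1.0, 1.0)

# Fraction of dataset points whose predicted label matches the given label
accuracy = sum(predict(w, x) == y for x, y in D) / len(D)
```

Learning would mean choosing w from D so that predictions like these also hold up on unseen inputs.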


Machine Learning algorithms characterisation

- Available annotated data (supervised vs. unsupervised)
- Complexity of model (linear vs. non-linear)
- Structure of output (independent vs. structured)
- Modeling of data: joint (x^{(i)}, y^{(i)}) vs. label only y^{(i)} (generative vs. discriminative); generative models learn from both input and output


Nearest neighbor, definition

Dataset: D = {(x^{(i)}, y^{(i)})}_{i=1}^N
For a query point x, predict y = y^{(k)}, where
k = arg min_{i∈{1,...,N}} ∥x^{(i)} − x∥_2^2 = arg min_{i∈{1,...,N}} d(x^{(i)}, x)
with d the squared Euclidean distance.
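
The rule above can be sketched in a few lines of plain Python. The function names and the toy dataset are illustrative assumptions, not part of the card.

```python
def squared_l2(a, b):
    """Squared Euclidean distance d(a, b) = ||a - b||_2^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_neighbor(D, x):
    """Return y^(k), where k indexes the stored point closest to the query x."""
    k = min(range(len(D)), key=lambda i: squared_l2(D[i][0], x))
    return D[k][1]

# Toy dataset of (x^(i), y^(i)) pairs (made up for illustration)
D = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((5.0, 5.0), "c")]

print(nearest_neighbor(D, (0.9, 1.2)))  # → b
```

There is no training step: the dataset itself is the model, and every prediction scans all N stored points.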


Nearest neighbor, shortcomings

- Sensitive to outliers → improve by using multiple neighbors (k-nearest neighbors) with a majority vote, optionally weighted
- Defining a distance can be hard (categorical / text data)
- Prediction can be slow (cost grows with the number of data points and dimensions)
- Storage can be difficult (the whole dataset must be kept)
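
A hedged sketch of the k-nearest-neighbor fix for the outlier problem, using a majority vote over the k closest points. The inverse-distance weighting is one common choice and is an assumption here; the card does not say which weighting is meant. Names and data are illustrative.

```python
from collections import defaultdict

def squared_l2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(D, x, k=3, weighted=False):
    """Majority vote over the k stored points closest to the query x."""
    neighbors = sorted(D, key=lambda pair: squared_l2(pair[0], x))[:k]
    votes = defaultdict(float)
    for xi, yi in neighbors:
        if weighted:
            # Assumed scheme: inverse squared distance; epsilon avoids division by zero
            votes[yi] += 1.0 / (squared_l2(xi, x) + 1e-12)
        else:
            votes[yi] += 1.0
    return max(votes, key=votes.get)

# Three "a" points near the origin plus one mislabeled-looking "b" outlier among them
D = [((0.0, 0.0), "a"), ((0.2, 0.0), "a"), ((0.0, 0.2), "a"),
     ((0.1, 0.1), "b"), ((3.0, 3.0), "b"), ((3.2, 3.0), "b")]
x = (0.1, 0.12)

one_nn = knn_predict(D, x, k=1)      # the outlier is the single closest point
three_nn = knn_predict(D, x, k=3)    # two "a" neighbors outvote the outlier
```

With k=1 the query takes the outlier's label, while k=3 recovers the surrounding class. Distance weighting can bring the outlier's influence back when the query lies very close to it, so the weighting scheme is a design choice rather than a free improvement.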


Nearest neighbor, applications

- recommendation systems
- cluster analysis
- spell check