# Machine Learning Flashcards

What is Machine Learning (ML)?

The study of computer algorithms that improve automatically through experience and the use of data. It is a subfield of artificial intelligence (AI).

How does ML work?

Machine learning algorithms build a model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.

How do you formalize ML?

ML can be described as a function Y = H(X), where the goal is to find the simplest H that predicts Y from the input X at a given prediction accuracy

What do you call the measure of how well H matches Y using X?

The objective function

How is the objective function defined?

Obj(H) = L(H) + Omega(H)

where L(H) is the matching error (loss)

and Omega(H) is the regularization term (the complexity of H)
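A minimal sketch of this formula, assuming H is a straight line, L is the mean squared error, and Omega is an L2 penalty on the coefficients. The names `mse_loss`, `l2_penalty`, `objective`, and the weight `lam` are illustrative choices, not part of the definition above.

```python
def mse_loss(h, xs, ys):
    """Matching error L(H): mean squared error of h on the samples."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def l2_penalty(coeffs, lam=0.1):
    """Complexity term Omega(H): assumed here to be lam * sum of squared coefficients."""
    return lam * sum(c ** 2 for c in coeffs)


def objective(coeffs, xs, ys, lam=0.1):
    """Obj(H) = L(H) + Omega(H) for the line H(x) = a*x + b."""
    a, b = coeffs
    return mse_loss(lambda x: a * x + b, xs, ys) + l2_penalty(coeffs, lam)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1

obj_exact = objective((2.0, 1.0), xs, ys)  # zero matching error, non-zero complexity
obj_flat = objective((0.0, 0.0), xs, ys)   # zero complexity, large matching error
print(obj_exact, obj_flat)
```

The two calls show the two terms pulling in opposite directions: the exact line pays only the complexity penalty, while the all-zero model pays only the matching error.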

What does ML consist of in terms of the objective function?

Minimizing Obj(H) to find the best compromise between prediction accuracy and model complexity

What are the main categories of Machine Learning?

Supervised: classification & regression

Unsupervised: clustering, association & dimensionality reduction (generalization)

What is the difference between supervised and unsupervised ML?

Supervised: the data is labeled (pre-categorized)

Unsupervised: the data is not labeled

What are the main ML applications/tasks?

Forecasting and classification

What are the main categories of ML engines?

- Linear/non-linear regressions

- Random forests and boosted trees

- Deep learning and neural networks

What is a linear regression?

You model the relationship between two variables Y and X, where X explains Y, such that:

Y = aX + b

where a = Cov(Y, X) / Var(X)

and b = E(Y) - aE(X)

(remember Y is what you want to predict and X is the explanatory variable)
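The slope and intercept formulas can be checked on a small sample. This is a sketch using only the Python standard library; the function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) with a = Cov(Y, X) / Var(X) and b = E(Y) - a * E(X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable X (illustrative values)
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # variable to predict Y (illustrative values)

a, b = fit_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
print(a, b)              # approximately 1.99 and 0.05
print(sum(residuals))    # approximately 0, up to floating-point error
```

Because the line includes the intercept b, the residuals of the fitted line average to zero on the sample it was fitted on.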

What do you need for the regression to be complete?

The residuals should be normally distributed with a mean of 0

What are the steps in training AI predictive models?

Building the model

Training the model on sample data

Testing the model on different sample data
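The three steps above can be sketched on synthetic data. Everything specific here is an assumption for illustration: the model is a straight line, the data follows y = 2x + 1 plus Gaussian noise, the split is 15 training points to 5 test points, and the helper names `fit_line` and `mse` are hypothetical.

```python
import random


def fit_line(points):
    """Step 1 (the model): a least-squares line y = a*x + b through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def mse(a, b, points):
    """Mean squared prediction error of the line on the given points."""
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in range(20)]
rng.shuffle(data)
train, test = data[:15], data[15:]  # the test points are never used for fitting

a, b = fit_line(train)           # Step 2: train on the training sample
train_error = mse(a, b, train)
test_error = mse(a, b, test)     # Step 3: test on a different sample
print(a, b, train_error, test_error)
```

Comparing `train_error` with `test_error` is what tells you whether the model holds up on data it has not seen.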

What is one of the main challenges in training ML algorithms?

Avoiding overfitting, where the model works well only on the training data sample and fails on new data

How do you avoid overfitting?

You keep the model as simple as possible (few parameters)
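A toy illustration of why fewer parameters can help, using hand-made data. The "memorizer" (a nearest-neighbour lookup that stores every training point) stands in for an overly complex model and the two-parameter line for a simple one. Both models and all data values are assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Simple model: a least-squares line with two parameters (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def memorizer(xs, ys):
    """Complex model: stores every training point and answers with the nearest one."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x = [0.0, 2.0, 4.0, 6.0, 8.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0]  # y = x with alternating noise of +1 / -1
test_x = [0.5, 2.5, 4.5, 6.5]
test_y = [0.5, 2.5, 4.5, 6.5]        # noise-free y = x

line = fit_line(train_x, train_y)
memo = memorizer(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
memo_train, memo_test = mse(memo, train_x, train_y), mse(memo, test_x, test_y)
print("line:     ", line_train, line_test)
print("memorizer:", memo_train, memo_test)
```

On this data the memorizer has zero error on the training sample but a larger error on the test sample than the line, because it has also memorized the noise.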