General ML, Evaluation & GDPR Flashcards Preview


Flashcards in General ML, Evaluation & GDPR Deck (35)

What does the statement 'Machine Learning is an ill-posed problem' mean?

- An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available.
- In terms of ML, the training set represents only a small sample of the possible sets of instances in the domain.
- Many different models are consistent with the sample training dataset, so a single best model cannot be chosen based on the training data alone.
- If a predictive model is to be useful, it must be able to make predictions for queries that are not present in the data.
- A predictive model that makes the correct predictions for these queries captures the underlying relationship between the descriptive features and target features and is said to **generalize well**.



Analytics Base Table


Inductive bias

- necessary for learning to occur
- the set of assumptions that defines the model selection criteria of a machine learning algorithm
- two types (restriction, preference)


Two types of inductive bias

1. Restriction bias
2. Preference bias


Restriction Bias

Constrains the set of models that the algorithm will consider during the learning process


Preference Bias

Guides the learning algorithm to prefer certain models over others


No Free Lunch Theorem

There's no single inductive bias that's best to use across all problems


What is Predictive Data Analytics?

The art of building and using models that make predictions based on patterns extracted from historical data


Applications of predictive data analytics

- price prediction
- dosage prediction
- risk assessment
- propensity modelling (likelihood of an individual or customer to take different actions)
- diagnosis
- document classification


Consistency of a model?

~ memorizing the dataset
- consistency with noise in the data isn't desirable
- coverage through memorization is never possible in real problems


What is the goal of a predictive model?

A model that generalizes well beyond the dataset and that is invariant to the noise in the dataset


What is under-fitting?

Occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target features.


What is over-fitting?

Occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data.


Goldilocks model

Strikes a good balance between under-fitting and over-fitting
- found by using ML algorithms with appropriate inductive biases
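
The under-fitting, over-fitting and Goldilocks cards above can be illustrated with a small sketch. It assumes a hypothetical 1-D domain (y = x^2 plus noise) and uses k-nearest-neighbour regression, which is not named in the deck, purely because one parameter (k) moves it from memorizing the data (k = 1) to ignoring the descriptive feature entirely (k = size of the training set).

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error over `data` of a k-NN model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

random.seed(0)

def sample(n):
    # Hypothetical domain (an assumption for illustration): y = x^2 + noise.
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, x * x + random.gauss(0, 0.1)) for x in xs]

train, test = sample(40), sample(40)

# k = 1 memorizes the training set (zero training error, noise included);
# k = len(train) always predicts the overall mean (too simplistic);
# an intermediate k sits between the two extremes.
for k in (1, 5, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.4f}  "
          f"test MSE={mse(train, test, k):.4f}")
```

With k = 1 the training error is exactly zero because every training point is its own nearest neighbour. That is consistency with the data, not evidence of generalization, so the test error is the number to compare across k.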


2 defining characteristics of ensembles

1. Build multiple different models from the same dataset by inducing each model using a modified version of the dataset
2. Makes a prediction by aggregating the predictions of the different models in the ensemble
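
The two characteristics above can be sketched as bagging: each model is induced from a bootstrap sample (one possible "modified version of the dataset") and predictions are combined by majority vote. The base learner here, `train_stump`, is a hypothetical 1-D threshold classifier chosen only to keep the example short; the deck does not prescribe a base learner or an aggregation rule.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Modified dataset: draw len(data) rows with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(data):
    """Hypothetical base learner: predict True when x > threshold.

    Each observed x is tried as the threshold; the one with the fewest
    training errors is kept.
    """
    best_errors, best_t = None, None
    for t, _ in data:
        errors = sum((x > t) != label for x, label in data)
        if best_errors is None or errors < best_errors:
            best_errors, best_t = errors, t
    return lambda x: x > best_t

def majority_vote(predictions):
    """Aggregate by returning the most common prediction."""
    return Counter(predictions).most_common(1)[0][0]

def bagged_ensemble(data, n_models, seed=0):
    """Characteristic 1: many models from modified datasets.
    Characteristic 2: one prediction by aggregating their outputs."""
    rng = random.Random(seed)
    models = [train_stump(bootstrap(data, rng)) for _ in range(n_models)]
    return lambda x: majority_vote(m(x) for m in models)

# Toy dataset (made up for illustration): (x, label) pairs.
data = [(0.1, False), (0.2, False), (0.3, False),
        (0.7, True), (0.8, True), (0.9, True)]
ensemble = bagged_ensemble(data, n_models=11)  # odd count avoids tied votes
print(ensemble(0.05), ensemble(0.95))
```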


What is an ensemble?

A prediction model that is composed of a set of models is called a model ensemble.
- Rather than creating a single model, they generate a set of models and then make predictions by aggregating the output of these models


Motivation behind ensembles

The idea that a committee of experts working together on a problem are more likely to solve it successfully than a single expert working alone


Bayes Optimal Ensemble

- an ensemble of all the hypotheses in the hypothesis space
- on average, no other ensemble can outperform it
- not possible to practically implement a Bayes Optimal Classifier
- no upper limit on the number of models (Law of Large Numbers: as the number of samples gets bigger, the estimate gets better).
- In practice, setting the number of models in the ensemble very high tends to give good performance.


2 properties of good ensembles

1. Individual models should be strong
2. Correlation between models should be weak
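
A rough way to see why both properties matter: under the idealized assumption that the models err independently (zero correlation, which real ensembles do not achieve) and that each is correct with the same probability p, the accuracy of a majority vote follows from the binomial distribution. The function name `majority_accuracy` is just for this sketch.

```python
from math import comb

def majority_accuracy(p, n):
    """P(majority of n independent models is correct), n odd,
    each model correct with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Models better than chance (p > 0.5): the vote improves as n grows.
for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(0.6, n), 3))
```

If the models were perfectly correlated instead, they would all make the same mistakes and the vote would be no better than a single model, whatever n is. If p is below 0.5, the same formula shows the vote getting worse as n grows, which is the "strong individual models" half of the card.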


What is the bias/variance trade-off?

TLDR: High bias = underfitting, high variance = overfitting

The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
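
For squared-error loss the two terms can be written down explicitly. The notation is an assumption of this note, not of the deck: the target is y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², the learned model is \hat{f}, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Reducing one of the first two terms usually increases the other, which is the trade-off; the noise term cannot be reduced by any choice of model.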


Why is it that local minima ain't so bad after all

In the case of neural nets, local minima are not necessarily that much of a problem.

1. Some of the local minima are due to the fact that you can get a functionally identical model by permuting the hidden layer units, or by negating both the input and output weights of a hidden unit, etc.

2. Also, if a local minimum is only slightly non-optimal, then the difference in performance will be minimal and so it won't really matter.

3. Lastly, and this is an important point, the key problem in fitting a neural network is over-fitting, so aggressively searching for the global minimum of the cost function is likely to result in overfitting and a model that performs poorly.
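
Point 1 can be checked on a toy network. The sketch assumes a one-hidden-layer net with a scalar input, tanh hidden units, a linear output and no biases; the weights are arbitrary values picked for illustration. The sign-flip symmetry relies on tanh being an odd function, so it would not hold for every activation.

```python
import math

def forward(x, w_in, w_out):
    """Scalar input -> tanh hidden units -> linear output (no biases)."""
    return sum(wo * math.tanh(wi * x) for wi, wo in zip(w_in, w_out))

w_in, w_out = [0.5, -1.2, 2.0], [1.0, 0.3, -0.7]
x = 0.8
base = forward(x, w_in, w_out)

# Symmetry 1: permute the hidden units (same reordering for both lists).
permuted = forward(x,
                   [w_in[2], w_in[0], w_in[1]],
                   [w_out[2], w_out[0], w_out[1]])

# Symmetry 2: negate the input and output weight of one hidden unit.
flipped = forward(x, [-w_in[0]] + w_in[1:], [-w_out[0]] + w_out[1:])

# Different points in weight space, same function (up to float rounding).
print(math.isclose(base, permuted, abs_tol=1e-12),
      math.isclose(base, flipped, abs_tol=1e-12))
```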

Regularization such as weight decay can help combat overfitting
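
As a minimal sketch of weight decay, assuming the common formulation where an L2 penalty (weight_decay / 2) * sum(w^2) is added to the loss: the penalty contributes weight_decay * w to each gradient, so every step pulls the weights toward zero. The function name and parameters are illustrative, not taken from a specific library.

```python
def sgd_step(weights, grads, lr, weight_decay):
    """One gradient-descent step on loss + (weight_decay / 2) * sum(w^2).

    `grads` holds the gradient of the unpenalized loss for each weight.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

# With a zero loss gradient the step only shrinks the weights,
# each by the factor (1 - lr * weight_decay).
print(sgd_step([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5))
```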


In practice local minima are rarely a problem with large networks. Discuss.

LeCun, Bengio & Hinton (2015) Nature

Regardless of initial conditions, the system nearly always reaches solutions of very similar quality.

Recent theoretical and empirical results suggest -> not a serious issue

Instead, the landscape is packed with a combinatorially large number of saddle points where the gradient is zero, and the surface curves up in most dimensions and curves down in the remainder

Analysis seems to show that saddle points with only a few downward curving directions are present in very large numbers, but almost all of them have very similar values of the objective function. So it does not much matter which of these saddle points the algorithm gets stuck at
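
For intuition about what a saddle point is, here is the simplest textbook case, f(x, y) = x^2 - y^2, which is not from the paper: the gradient vanishes at the origin, yet the surface curves up along x and down along y. Curvature is estimated with a central finite difference.

```python
def f(x, y):
    return x * x - y * y

def grad(x, y):
    """Analytic gradient of f."""
    return (2 * x, -2 * y)

def curvature(x, y, dx, dy, h=1e-3):
    """Second derivative of f along direction (dx, dy), by central differences."""
    return (f(x + h * dx, y + h * dy) - 2 * f(x, y)
            + f(x - h * dx, y - h * dy)) / (h * h)

print(grad(0, 0))                 # zero gradient at the origin
print(curvature(0, 0, 1, 0) > 0)  # curves up along x
print(curvature(0, 0, 0, 1) < 0)  # curves down along y
```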


Recommendations of GDPR paper (Wachter et al. 2018)

1. Add a right to explanation to legally binding Article 22
2. Clarify 'significance...envisaged consequences...logic involved'.
3. Clarify 'solely' for automated processing
4. Clarify 'legal' or 'significant effect' of automated processing
5. Clarify 'necessary for entering or performance of a contract'
6. Clarify if a prohibition is meant by 'right not to be subject to'
7. Implement external auditing mechanism for automated decision-making (counterweight to trade secret)
8. Support further research into alternative accountability mechanisms


GDPR in a nutshell

- 25th May 2018
- replaces the 1995 Data Protection Directive
- transparency, security, accountability
- standardizing and strengthening the rights of an individual to data privacy

Regulations surrounding profiling, automated decision making


Why might DL become illegal according to GDPR paper?

1. General data protection
2. Prohibition on profiling/automated decision-making
3. Right to explanation


GDPR - General Data protection

1. Direct personal data
2. Indirect personal data

Onus on data controllers to be responsible

Data subjects can request to have info erased, object to direct marketing, have inaccuracies corrected, restrict automated processing, and exercise data portability


Article 22 - Prohibition on profiling/automated DM

Allowed under 3 conditions
1. necessary for contract
2. allowed under member state law
3. explicit consent

- right not to be subject to... but what is a 'legal effect' or 'similarly significant effect'?


Right to explanation

* system functionality
* specific decision


What's the verdict - does GDPR mandate a right to explanation?

Consensus is no
Article 22 is vague (maybe intentionally)
Recital 71 - some hope - but just guidance and not legally binding


Broadly speaking, what is the difference between evaluating ML methods in industry versus academia?

Industry - evaluate a model that we would like to deploy for a specific task

Academia - compare ML methods


Reasons for evaluating an ML model in industry

1. determine which model is most suitable for a task
2. estimate performance after deployment
3. convince users that model will meet their needs


Reasons for evaluating an ML model in academia

1. Evaluate the performance of a new method against existing baselines
2. Determine best ML approach for a problem
3. Perform benchmark experiment

All boils down to comparing multiple approaches on multiple datasets


Key difference between evaluation in industry versus academia

Significance testing


Performance measures for Industry

1. macro-averaging vs micro-averaging
2. hold-out test
3. k-fold CV
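
The items above can be sketched in a few lines. Recall is used as the example measure (the deck does not say which measure is averaged): macro-averaging gives every class equal weight, micro-averaging gives every instance equal weight. The k-fold helper is a bare-bones assumption with no shuffling or stratification, which a real evaluation would usually want.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}

def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def micro_recall(y_true, y_pred):
    """Pool all instances first; for single-label data this is accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx); every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Imbalanced toy labels: a model that always predicts "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
print(macro_recall(y_true, y_pred), micro_recall(y_true, y_pred))
```

On imbalanced data the two averages can disagree sharply, as here: the micro average looks healthy while the macro average exposes that the minority class is never predicted.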


Performance measures for academia

Two-fold process:

1. Friedman Aligned Rank Test - test if there's a significant difference between the performances of the algos across the datasets (p < 0.05)
2. Nemenyi Test - If there was a significant difference in part 1, find out where the difference exists between algo-pairings

Nemenyi Test -> Significance matrix, and Critical Differences Plot
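
The post-hoc step can be sketched as follows. This does not implement the Friedman Aligned Rank Test itself; it only computes the per-algorithm average ranks across datasets and the Nemenyi critical difference, CD = q_alpha * sqrt(k(k+1) / (6N)) for k algorithms and N datasets, which is what a critical differences plot draws. The scores are made up, and q_alpha must be looked up in a published critical-value table; the 2.343 used below for k = 3 at alpha = 0.05 is an assumption to verify before relying on it.

```python
import math

def ranks(scores):
    """Rank one dataset's scores (1 = best, higher score is better).
    Tied scores share the mean of the ranks they occupy."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = mean_rank
        i = j + 1
    return result

def average_ranks(table):
    """table[d][a] = score of algorithm a on dataset d."""
    per_dataset = [ranks(row) for row in table]
    k = len(table[0])
    return [sum(r[a] for r in per_dataset) / len(table) for a in range(k)]

def nemenyi_cd(q_alpha, k, n):
    """Critical difference for k algorithms compared over n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical accuracies: 4 datasets (rows) x 3 algorithms (columns).
table = [[0.90, 0.85, 0.80],
         [0.88, 0.88, 0.70],
         [0.75, 0.80, 0.60],
         [0.95, 0.90, 0.85]]
avg = average_ranks(table)
cd = nemenyi_cd(q_alpha=2.343, k=3, n=len(table))  # q_alpha: assumed value
# Significance matrix: a pair differs if its rank gap exceeds the CD.
significant = [[abs(a - b) > cd for b in avg] for a in avg]
print(avg, round(cd, 3), significant)
```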