Supervised And Un-Supervised Learning Flashcards by Emma Bowles

Supervised learning

Task: learns from labelled training data (each sample/instance/data point consists of an input and a desired output)
Goal: find/train a model whose predictions match the desired output

Algorithms: linear regression, decision tree, ANN

Unsupervised learning

Task: learns from unlabelled data to describe hidden structure/patterns
Goal: task dependent (e.g. group data points, find the distribution)

Algorithms: K-means, Apriori algorithm

Reinforcement learning

Learns over time via trial and error using feedback/rewards from actions
Goal: learn a policy to make decisions (to maximise rewards)

Algorithms: deep neural networks, Q-learning, deep Q-networks

Training data is used to…

Learn the parameters of the model

What is the process for supervised learning

Prepare the training data
There should be input and output datasets
Partition the dataset into two subsets
e.g. 70% training, 30% testing

Train the model on the relevant training data

Test and evaluate the model on the test set
Measure how accurately the trained model performs on the test set
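
The process above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the toy dataset, the one-parameter threshold model and the helper names (`train_test_split`, `train_threshold`, `accuracy`) are all assumptions made for the example.

```python
import random

def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle (input, output) pairs and split them into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold(train):
    """Learn the model's single parameter: a threshold midway between the class means."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, data):
    """Fraction of samples whose predicted class matches the desired output."""
    correct = sum(1 for x, y in data if (x > threshold) == (y == 1))
    return correct / len(data)

# Hypothetical labelled dataset: inputs below 5 are class 0, the rest class 1.
data = [(x / 2, 0 if x / 2 < 5 else 1) for x in range(20)]

train, test = train_test_split(data)       # 70% training, 30% testing
threshold = train_threshold(train)         # train on the training data only
test_accuracy = accuracy(threshold, test)  # evaluate on the unseen test set
```

The test set is never used during training, which is why the accuracy measured on it can be read as an estimate of performance on new data.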

What is the test set used for

Assessing how well the model generalises to new data
Gives an unbiased estimate of how well the model works

What does data pre-processing involve

Processing data into a suitable format for machine learning

Use the pandas library (DataFrame)
Improve model accuracy
Reduce noise and bias

Involves data collection, data cleaning and data transformation

What is data collection

Gathering relevant data from sources

What is data transformation

Encode categorical values
Normalise numerical data
Split data (training, testing)
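
The first two transformations can be sketched as follows. This is a plain-Python illustration under stated assumptions: one-hot encoding and min-max scaling are just one common choice for "encode" and "normalise", and the column values and function names are hypothetical.

```python
def one_hot_encode(values):
    """Encode categorical values as 0/1 vectors, one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories

def min_max_normalise(values):
    """Rescale numerical values linearly into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

colours = ["red", "green", "red", "blue"]  # hypothetical categorical column
heights = [150, 160, 170, 200]             # hypothetical numerical column

encoded, categories = one_hot_encode(colours)
normalised = min_max_normalise(heights)
```

Here `categories` fixes the column order of the encoding, so the same order can be reused when encoding the test data.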

What is data cleaning

Handling missing data, imputation
Removing outliers and duplicates
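
A rough sketch of these cleaning steps in plain Python. The choices here are assumptions for illustration only: missing values are represented as `None`, imputation uses the mean, and outliers are defined by a hand-picked valid range.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def drop_duplicates(rows):
    """Remove repeated rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def remove_outliers(values, low, high):
    """Keep only values inside an assumed valid range [low, high]."""
    return [v for v in values if low <= v <= high]

ages = [20, None, 30, 40]                       # hypothetical column with a gap
rows = [("ann", 20), ("bob", 30), ("ann", 20)]  # hypothetical records
readings = [21, 22, 500, 23]                    # 500 is treated as an outlier

imputed = impute_mean(ages)
unique_rows = drop_duplicates(rows)
filtered = remove_outliers(readings, low=0, high=100)
```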

What is clustering

Unsupervised learning

Group similar objects
Given: un-labelled dataset D and similarity/distance metric
Goal: find natural partitioning, or groups of similar data points

K-means clustering

Choose number of clusters ‘k’

Initialise k cluster centroids randomly
Repeat until the assignments stop changing or a maximum number of iterations is reached
A. Assign each data point to the nearest centroid (based on the distance metric)
B. Update each centroid by computing the mean of the data points assigned to it

Output the final cluster assignments and centroids
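
The steps above can be sketched in plain Python. This is a minimal version under assumptions that the card does not specify: squared Euclidean distance as the metric, initial centroids drawn from the data points, and an early stop once the assignments no longer change. The function name and sample points are hypothetical.

```python
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise k centroids randomly
    assignments = []
    for _ in range(max_iterations):
        # A. Assign each data point to the nearest centroid.
        new_assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the centroids would not move either
        assignments = new_assignments
        # B. Update each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
assignments, centroids = k_means(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialisation.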

Association Rules

Unsupervised learning

Discover correlation between two or more given variables
Given: a set of records containing items
Goal: produce dependency rules that predict the occurrence of variable X from the occurrence of variable(s) Y
Algorithms: Apriori algorithm, frequent pattern growth
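
To make the idea of a dependency rule concrete, the sketch below scores a single rule on a handful of records. It is not the Apriori algorithm itself; it only computes support and confidence, the two standard measures such algorithms rely on. The shopping-basket records and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the records containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical set of records containing items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

# Rule under test: bread -> butter
rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
```

Two of the four records contain both items and three contain bread, so the rule has support 0.5 and confidence 2/3.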

Steps of building a k-means clustering model (unsupervised learning example)

Prepare the training
Data preparation
Understand the problem (which approaches will you use)
Read in the dataset
Observe/visualise the dataset

Training a k-means model
- Prepare the training data x
- Create a k-means clustering model

Results
What are the k centres
Show the centroids

Categorical vs continuous supervised learning

The distinction refers to the labels (outputs)
If the labels are categorical, we use classification
If the labels are continuous, we use regression

Regression models

Show the relationship between y and x

Regression training

Based on the given data find the function that minimises its mean squared error to ‘fit’ the samples
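
As an illustration, the sketch below fits a straight line y = slope * x + intercept using the closed-form least-squares solution, which is the line with the smallest mean squared error on the samples. Restricting the function to a straight line, the sample values and the function names are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return the slope and intercept that minimise the mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between the samples and the fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0, 1, 2, 3, 4]            # hypothetical inputs
ys = [1.1, 2.9, 5.2, 6.8, 9.0]  # hypothetical noisy outputs, roughly y = 2x + 1

slope, intercept = fit_line(xs, ys)
mse = mean_squared_error(xs, ys, slope, intercept)
```

Any other line, for example the "true" y = 2x + 1, gives a mean squared error at least as large on these samples.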

Polynomial linear regression cons

Can overfit: the curve fits the noise, outliers and errors in the training data too closely

Pros and cons of regression

Short training time
Easy to interpret
Easy to implement

Sensitive to noise and outliers
Cannot handle complicated relationships

What is classification

Learn to predict which class an instance belongs to, based on pre-labelled (classified) instances

Decision tree - internal nodes

Decision rules on features (decision variables, input)

Decision tree - branches

Course of decision or action

Decision tree - leaf nodes

A predicted class label (output)
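
The three parts of a decision tree (internal nodes, branches, leaf nodes) can be seen together in a small hand-written tree. This is only a sketch of the structure and of prediction, not of how a tree is learned; the features (`outlook`, `windy`) and the class labels are hypothetical.

```python
tree = {
    "feature": "outlook",            # internal node: decision rule on a feature
    "branches": {                    # branches: one per outcome of the rule
        "sunny": {"label": "play"},  # leaf node: predicted class label
        "rainy": {
            "feature": "windy",      # another internal node
            "branches": {
                True: {"label": "stay in"},
                False: {"label": "play"},
            },
        },
    },
}

def predict(node, instance):
    """Follow branches from the root until a leaf node is reached."""
    while "label" not in node:
        node = node["branches"][instance[node["feature"]]]
    return node["label"]

prediction = predict(tree, {"outlook": "rainy", "windy": True})
```

For this instance the path is outlook = rainy, then windy = True, which ends at the leaf labelled "stay in".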

Pros and cons of decision tree

Reasonable training time
Can handle large number of features
Easy to implement
Easy to interpret

Only simple decision boundaries
Problems with lots of missing data
Cannot handle complicated relationships
Over-complex trees overfit the data

What does ANN stand for

Artificial neural networks

Neural Networks history

1943 - first neural network: combine simple inputs
1949 - first learning rule: added fixed weights
1950s-60s - Perceptron: converge to correct weights (learning -> thinking)
The AI winter
Mid 80s - solved the non-linearly separable problem with multi-layered networks

Basic model of a neural network

Neurons connected by directed weighted paths
Each neuron has a threshold t
The weighted sum of the inputs to a neuron has to be greater than or equal to the threshold for the neuron to fire
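
A single threshold neuron of this kind can be sketched in a few lines. The weights and threshold below are hypothetical values, picked so that the neuron happens to compute logical AND.

```python
def neuron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Hypothetical weights and threshold: both inputs must be 1 for the sum to reach 2.
and_weights = [1, 1]
and_threshold = 2

patterns = ([0, 0], [0, 1], [1, 0], [1, 1])
outputs = [neuron_fires(p, and_weights, and_threshold) for p in patterns]
```

Only the pattern [1, 1] produces a weighted sum of 2, so it is the only input that makes the neuron fire.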

Training neural networks

Epoch - the entire training set fed to the neural network
The AND function - an epoch consists of four sets of inputs (patterns) to feed into the network ([0,0],[0,1],[1,0],[1,1])
Training value, Y - the value that we require the network to produce (supervised learning)
Error, Err - the amount by which the network output O differs from the training value Y
LR = learning rate
Xi = inputs to the neuron
Wi = weight from input to output

While epoch produces an error
  Check next input/pattern from epoch
  Err = Y - O
  If Err <> 0 then
    Wi = Wi + LR * Xi * Err
  End if
End while
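
The training loop can be written out as runnable Python for the AND function. This is a sketch under assumptions that the card leaves open: the threshold is fixed at 0.5, the weights start at 0, the learning rate is 0.1, and a `max_epochs` cap is added as a safety stop for data that never produces an error-free epoch.

```python
def train_perceptron(patterns, targets, threshold=0.5, learning_rate=0.1, max_epochs=100):
    """Adjust the weights until a whole epoch produces no error."""
    weights = [0.0] * len(patterns[0])
    for _ in range(max_epochs):
        epoch_has_error = False
        for inputs, target in zip(patterns, targets):
            weighted_sum = sum(x * w for x, w in zip(inputs, weights))
            output = 1 if weighted_sum >= threshold else 0
            err = target - output  # Err = Y - O
            if err != 0:
                epoch_has_error = True
                # Wi = Wi + LR * Xi * Err
                weights = [w + learning_rate * x * err for w, x in zip(weights, inputs)]
        if not epoch_has_error:
            break
    return weights

patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]  # one epoch of the AND function
targets = [0, 0, 0, 1]                       # training values Y

weights = train_perceptron(patterns, targets)
predictions = [
    1 if sum(x * w for x, w in zip(p, weights)) >= 0.5 else 0
    for p in patterns
]
```

With these settings only the pattern [1, 1] ever produces an error, so both weights grow in equal steps until their sum reaches the threshold.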

Pros and cons of ANN

Can learn more complicated class boundaries
Can be more accurate
Can handle large number of features

Hard to implement - trial and error for choosing parameters and network structure
Slow training time
Can overfit the data - find patterns in random noise
Hard to interpret

What is deep learning

Based on artificial neural networks, multiple layers of processing are used to extract progressively higher level features from data

Excitatory vs Inhibitory weights

Excitatory weights encourage the neuron to fire
Inhibitory weights prevent the neuron from firing

What is linearly separable data

Where data points (classes) can be separated using a linear boundary
Only linearly separable functions can be represented by a single-layer NN

(32 cards)