Week 6 - Machine Learning I Flashcards
Define Artificial Intelligence (AI)
AI is the science and engineering of making intelligent machines
Define Machine Learning (ML)
In ML, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Define Deep Learning
Deep Learning uses computational structures known as ‘neural networks’ to automatically recognise patterns in data and provide a suitable output, such as a prediction or evidence for a decision
Define Natural Language Processing (NLP)
NLP ranges from something as simple as counting word frequencies to compare different writing styles to ‘understanding’ complete human utterances
What are the types of machine learning?
- Supervised learning
- Unsupervised learning
- Reinforcement learning
What is supervised machine learning?
Supervised machine learning takes a set of input features (predictors) and labelled output variables and learns a mapping function from input to output
What is unsupervised machine learning?
Unsupervised machine learning learns patterns from unlabeled data (without the explicit output variable)
What is reinforcement machine learning?
The agent (learning system) learns to perform a task by interacting with an unknown environment and learns a policy that maximises the reward. Dialogue generation and machine translation are application areas in NLP for reinforcement learning.
What is the typical process of Machine Learning?
- Obtain the data
- Preprocess the data
- Extract the input features
- Define the output (the labels to predict)
- Apply a machine learning algorithm
What are examples of preprocessing?
Preprocessing includes:
1. Converting text to lower case
2. Removing special characters and stopwords
3. Removing URLs, numbers, punctuation and extra whitespace, and applying stemming
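The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list is a tiny hypothetical one, and `crude_stem` is a made-up suffix stripper standing in for a real stemmer.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lower-case, strip URLs/numbers/punctuation/stopwords, then stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    # Replace every punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    tokens = text.split()                                # also collapses whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [crude_stem(t) for t in tokens]


print(preprocess("Visit https://example.com: 3 recruiters are MESSAGING the students!"))
# ['visit', 'recruiter', 'messag', 'student']
```

Note that stemming can produce non-words such as "messag"; that is expected, since the aim is only to map related word forms to the same token.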
What are examples of input features?
Input features would include:
1. Bag of Words
2. TF-IDF
3. TF-IDF with n-grams
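As a rough sketch of how these three feature types relate, the Python below builds a Bag of Words, word n-grams, and TF-IDF weights by hand. It assumes one common TF-IDF variant (term frequency = count / document length, IDF = log(N / document frequency)); libraries often use smoothed variants, so their numbers will differ. The function names are illustrative only.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Bag of Words: how often each token occurs, ignoring order."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Contiguous word n-grams, e.g. n=2 gives bigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["join", "us", "now"], ["join", "the", "club"]]
print(bag_of_words(docs[0]))
print(ngrams(docs[0], 2))   # ['join us', 'us now']
print(tf_idf(docs)[0])      # 'join' gets weight 0.0 because it is in every doc

# TF-IDF with n-grams: add the bigrams to each document's tokens first.
docs_with_bigrams = [doc + ngrams(doc, 2) for doc in docs]
print(tf_idf(docs_with_bigrams)[0])
```

The point of the IDF factor is visible in the output: a term that appears in every document carries no weight, while rarer terms are weighted up.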
What are examples of output features?
Output features (the class labels to predict) would include:
1. Recruitment and not-recruitment
2. Predatory and non-predatory
What are examples of machine learning algorithms?
- Naive Bayes
- Support vector machine
- Logistic regression
- Neural networks
How many sub-classes of supervised learning are there?
There are two.
Classification - the training dataset has input features and a discrete output (target or class variable). The goal of classification is to learn a function that maps the input to the discrete output
Regression - predicts the real or continuous value of the output (target variable) given a set of input features called predictors
What classifies as binary classification and multiclass classification?
Binary classification - deals with two possible classes (e.g., spam or not spam, fraudulent or legitimate, hate or not-hate, hate-speech or normal-speech)
Multiclass classification - deals with three or more classes (e.g., fake news, partially fake, true, unknown; or identity theft, cyberstalking, fraudulent sales, legitimate)
What’s the key challenge of supervised learning?
- Learning from the data (training data)
- Performing well on ‘unseen’ data
What is an optimal model in ML terms and what are its key challenges?
- Underfitting - the model is too simple and fails to capture the relationship between input and output variables. High error on the training data and high test error on unseen (new) data
- Overfitting - low training error and high test error
- Optimal fitting - low training error and low test error
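The three cases above can be written as a small decision rule over the two error rates. This is only a sketch: the function name and the `high` cut-off of 0.2 are hypothetical, and what counts as a "high" error depends entirely on the task.

```python
def diagnose_fit(train_error, test_error, high=0.2):
    """Label a model from its training and test error rates.

    `high` is a hypothetical cut-off separating "low" from "high" error.
    """
    if train_error >= high:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if test_error >= high:
        # Fits the training data but fails on unseen data.
        return "overfitting"
    return "optimal fitting"


print(diagnose_fit(0.35, 0.40))  # underfitting
print(diagnose_fit(0.02, 0.30))  # overfitting
print(diagnose_fit(0.05, 0.07))  # optimal fitting
```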
What is the process of applied machine learning in practice (part 1)?
- Problem statement - identify the appropriate task for ML
- Data acquisition
- Clean and pre-process data
What is the process of applied machine learning in practice (part 2)?
- Feature engineering and selection of features
- Select appropriate machine learning algorithms
- Train the model
- Model validation and evaluation: tune the hyperparameters and evaluate the performance
- Interpret the results so you can deploy the model
How can you choose the best algorithm for the task?
You can use the algorithm cheat-sheet, which points you to a family of methods based on the kind of task your data calls for: classification, clustering, regression or dimensionality reduction
What is Naive Bayes?
Naive Bayes is a simple and fast supervised machine learning algorithm used for classification (binary and multiclass).
It is a probabilistic classifier that applies Bayes' theorem.
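To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It assumes add-one (Laplace) smoothing and ignores words never seen in training; the "naive" part is treating words as independent given the class, so their log-probabilities are simply added. The function names and the toy spam/ham data are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)           # log prior P(class)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token in vocab:                          # skip unseen words
                # Add-one smoothed likelihood P(word | class).
                likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (made up for this example).
docs = [["win", "cash", "now"], ["cheap", "cash", "offer"],
        ["meeting", "at", "noon"], ["lunch", "at", "noon"]]
labels = ["spam", "spam", "ham", "ham"]

model = train_nb(docs, labels)
print(predict_nb(model, ["cash", "offer"]))     # spam
print(predict_nb(model, ["lunch", "meeting"]))  # ham
```

The same code handles multiclass problems unchanged, since it just scores every label it saw during training.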
What is Support Vector Machine (SVM)?
SVM is used for classification, regression and outlier detection (one-class SVMs). It performs well on complex (high-dimensional feature) data and on small-to-medium-sized datasets
What is the goal of SVM?
SVM’s goal is to find the maximum-margin hyperplane, the one that provides the largest distance between the two classes. New observations are predicted based on which side of the hyperplane they fall on
How do you test whether a model in R performs well on unseen (new) data?
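The prediction rule can be illustrated for a linear SVM that has already been trained. In the sketch below the weights `w` and bias `b` are hypothetical values, not the result of any real training run. The margin formula 2 / ||w|| assumes the usual scaling in which the closest points (the support vectors) satisfy |w·x + b| = 1.

```python
import math


def decision_value(w, b, x):
    """Signed score w·x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_side(w, b, x):
    """Class +1 on one side of the hyperplane, class -1 on the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin boundaries: 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
print(predict_side(w, b, [3.0, 2.0]))  # 1  (score is 2.0)
print(predict_side(w, b, [0.5, 1.0]))  # -1 (score is -1.5)
print(margin_width(w))
```

Training an SVM amounts to choosing `w` and `b` so that the margin is as wide as possible while still separating the classes; this sketch covers only what happens afterwards.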
You should split your dataset into training and test sets. Train the model on the training set, then test its performance on the test set, which the model has not seen.
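The idea is the same in any language. As a language-neutral illustration, here it is sketched in Python rather than R, using only the standard library. The 80/20 split, the fixed seed and the helper names are assumptions made for the example, not a prescribed recipe.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly, then hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (training set, test set)


def accuracy(predict, test_rows):
    """Share of held-out (features, label) pairs the model predicts correctly."""
    correct = sum(1 for features, label in test_rows if predict(features) == label)
    return correct / len(test_rows)


# Toy data: the label is 1 when the single feature is at least 5.
data = [(x, int(x >= 5)) for x in range(10)]
train_rows, test_rows = train_test_split(data)
print(len(train_rows), len(test_rows))  # 8 2

# A hypothetical "model"; a real one would be fitted on train_rows only.
print(accuracy(lambda x: int(x >= 5), test_rows))  # 1.0
```

Shuffling before splitting matters when the data is ordered (for example by date or by class), since otherwise the test set may not resemble the training set.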