Foundations Flashcards

1
Q

Computer vision

A

Image detection and classification; near real-time video analysis

2
Q

Computer vision ML model

A

Modern computer vision models typically use neural networks

3
Q

3 components of neural networks

A

Input layer: receives the input data
Hidden layer(s): find features in the data that have predictive power with respect to the labels
Output layer: generates the output

4
Q

CNN

A

Convolutional neural networks (CNNs), used in modern computer vision. Their shared filters need far fewer weights than fully connected layers, which allows faster training on images

5
Q

Feature extraction

A

In CNNs, the process by which the hidden layers extract different information (features) from images

6
Q

Image classification

A

Answers “What’s in this image?” Applications include text detection, OCR and content moderation

7
Q

Object detection

A

More granular than image classification: finds how many objects are in the image and which classes they belong to

8
Q

Semantic segmentation

A

Goes down to the pixel level: identifies which pixels of the image belong to the object

9
Q

Activity recognition

A

Based on video, which adds a time component. Detects changes that occur over time

10
Q

Machine learning (ML)

A

is a modern software development technique and a type of artificial intelligence (AI) that enables computers to solve problems by using examples of real-world data. It allows computers to automatically learn and improve from experience without being explicitly programmed to do so.

11
Q

supervised learning

A

every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values.

12
Q

unsupervised learning

A

there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data.

13
Q

reinforcement learning

A

the algorithm figures out which actions to take in a situation to maximise a reward (in the form of a number) on the way to reaching a specific goal. This is a completely different approach from supervised and unsupervised learning.

14
Q

model

A

an extremely generic program, made specific by the data used to train it. A more technical definition would be that a machine learning model is a block of code or framework that can be modified to solve different but related problems based on the data provided.

15
Q

Model training algorithms

A

work through an iterative process where the current model iteration is analysed to determine what changes can be made to get closer to the goal. Those changes are made and the iteration continues until the model is evaluated to meet the goals.

16
Q

Model inference

A

is when the trained model is used to generate predictions.

17
Q

5 machine learning steps

A
  1. Define the problem
  2. Build the dataset
  3. Train the model
  4. Evaluate the model
  5. Use the model
18
Q

labels

A

the solutions (target outputs) attached to the data; labelled data already contains these solutions

19
Q

Supervised vs unsupervised learning

A

Labelled data is supervised. Unlabelled data is unsupervised

20
Q

clustering

A

Used in unsupervised learning to determine whether there are any naturally occurring groups in the data

21
Q

categorical label

A

Has a discrete value (Sunday, Monday, etc.). Often used in classification tasks

22
Q

continuous label (regression)

A

Has no discrete set of possible values, which usually means you’re working with numerical data (such as a price). Used in regression tasks

23
Q

Discrete

A

A term taken from statistics referring to an outcome taking on only a finite number of values (such as days of the week).

24
Q

Using unlabelled data

A

you don’t need to provide the model with any kind of label or solution while the model is being trained.

25
Q

The Four Aspects of Working with Data

A
  1. Data collection
  2. Data inspection
  3. Summary statistics
  4. Data visualisation
26
Q

Data collection

A

SQL queries, web scraping etc. Perhaps run a model over the data to generate the labels. Question: Does the data you’ve collected match the machine learning task and problem you have defined?

27
Q

Data inspection

A

The quality of the data will affect how your model performs. Look for outliers and missing or incomplete values, and consider using a pre-processor to transform the data into the correct format

28
Q

Summary statistics

A

Check that your data is in line with the underlying assumptions of your chosen machine learning model.
Use the mean, interquartile range and standard deviation. These provide insight into the scope, scale and shape of the dataset.
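
A minimal sketch of these summary statistics using only Python's standard `statistics` module (Python 3.8+). The `summarize` helper and the `ages` sample are hypothetical. Note that `statistics.quantiles` uses the "exclusive" quartile method by default, and other tools use slightly different conventions, so their IQR can differ a little.

```python
import statistics

def summarize(values):
    """Return the mean, interquartile range (IQR) and sample standard deviation."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # cut points: Q1, median, Q3
    return {
        "mean": statistics.mean(values),
        "iqr": q3 - q1,  # spread of the middle 50% of the data
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

ages = [22, 25, 27, 30, 31, 35, 40, 41, 45, 60]  # hypothetical sample
stats = summarize(ages)
print(stats)
```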

29
Q

Data visualization

A

Use this to see outliers and trends in the data.

30
Q

Impute

A

referring to different statistical tools which can be used to calculate missing values from your dataset
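
One of the simplest imputation tools is mean imputation. A minimal sketch, assuming missing values are stored as `None`; the `impute_mean` helper and the sample readings are hypothetical. The median is a common alternative when the data is skewed or has outliers.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

temperatures = [21.0, None, 23.5, 22.0, None]  # hypothetical readings with gaps
filled = impute_mean(temperatures)
print(filled)
```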

31
Q

Outliers

A

data points that are significantly different from others in the same sample
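
One common (but not the only) way to flag outliers is Tukey's fences: anything more than 1.5 × IQR below the first quartile or above the third quartile. A minimal sketch; the `iqr_outliers` helper, the `k=1.5` default and the `prices` sample are illustrative assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

prices = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sample with one suspicious value
print(iqr_outliers(prices))  # [95]
```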

32
Q

Dataset splitting

A

How: Randomly split the data

  1. Training dataset: most of the data (typically about 80%), used to train the model
  2. Testing dataset: the remaining data, used to evaluate the model against unseen data, i.e. how well the model will generalise to new data
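
A minimal sketch of a random 80/20 split in plain Python. The `train_test_split` helper here is hand-rolled and hypothetical; scikit-learn ships a similarly named function in `sklearn.model_selection` that also handles arrays and stratification.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows, then cut it into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

Shuffling before cutting matters: if the data is sorted (for example by date or label), taking the first 80% without shuffling would give a test set that does not look like the training data.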
33
Q

What does the model training algorithm do

A

The model training algorithm iteratively updates a model’s parameters to minimize some loss function.

34
Q

Model parameters

A

Model parameters are settings or configurations the training algorithm can update to change how the model behaves

35
Q

Loss function

A

A loss function is used to codify the model’s distance from its goal.
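
For example, mean squared error (MSE) is a common loss for regression. A minimal sketch, assuming a linear model y = m·x + b; the `mse_loss` helper and the toy data are hypothetical. A lower loss means the model's parameters are closer to the goal.

```python
def mse_loss(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b on the given points."""
    errors = [(m * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # generated by y = 2x + 1
print(mse_loss(2, 1, xs, ys))  # 0.0: these parameters fit the data exactly
print(mse_loss(1, 1, xs, ys))  # 3.5: a worse fit gives a higher loss
```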

36
Q

The end-to-end training process is

A
  1. Feed the training data into the model.
  2. Compute the loss function on the results.
  3. Update the model parameters in a direction that reduces loss.

Continue to cycle through these steps until you reach a predefined stop condition. This might be based on training time, the number of training cycles, or an even more intelligent or application-aware mechanism.
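
The three steps above can be sketched as a gradient descent loop that fits a line y = m·x + b by minimizing mean squared error. This is one training algorithm among many, chosen as an assumption for illustration. `learning_rate`, `max_epochs` and `tolerance` are hypothetical hyperparameter names, and the last two act as the stop conditions.

```python
def train_linear(xs, ys, learning_rate=0.05, max_epochs=5000, tolerance=1e-10):
    """Fit y = m*x + b by gradient descent on mean squared error."""
    m, b = 0.0, 0.0  # model parameters, updated during training
    n = len(xs)
    for _ in range(max_epochs):  # stop condition: number of training cycles
        # 1. Feed the training data into the model.
        preds = [m * x + b for x in xs]
        # 2. Compute the loss function on the results.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        if loss < tolerance:  # stop condition: loss is small enough
            break
        # 3. Update the parameters against the gradient, which reduces the loss.
        grad_m = 2 / n * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_b = 2 / n * sum(p - y for p, y in zip(preds, ys))
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b, loss

m, b, loss = train_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(round(m, 3), round(b, 3))  # close to 2.0 and 1.0
```

The learning rate is a hyperparameter: too large and the updates overshoot and diverge, too small and training needs many more cycles.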

37
Q

model selection

A

A process to determine which model to use

38
Q

Hyperparameters

A

settings on the model which are not changed during training but can affect how quickly or how reliably the model trains, such as the number of clusters the model should identify.

39
Q

machine learning frameworks

A

Practitioners often use machine learning frameworks that already have working implementations of models and model training algorithms. You could implement these from scratch, but you probably won’t need to do so unless you’re developing new models or algorithms.

40
Q

*Linear models

A

One of the most common models covered in introductory coursework, linear models simply describe the relationship between a set of input numbers and a set of output numbers through a linear function (think of y = mx + b or a line on an x vs y chart).

41
Q

*Tree-based models

A

Tree-based models are probably the second most common model type covered in introductory coursework. They learn to categorize or regress by building an extremely large structure of nested if/else blocks, splitting the world into different regions at each if/else block. Training determines exactly where these splits happen and what value is assigned at each leaf region.
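
A minimal sketch of the smallest possible tree, a single if/else split (a "decision stump") for regression. Training tries each candidate threshold, assigns each leaf the mean of its training targets, and keeps the split with the lowest squared error; a full tree would repeat this recursively inside each region. The helper names and toy data are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the single split on x that minimizes squared error; leaves predict means."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # candidate split points
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    if x < threshold:  # the learned if/else block
        return left_value
    else:
        return right_value

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]
stump = fit_stump(xs, ys)
print(stump)  # (10, 5.5, 20.0)
```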

42
Q

*Deep learning models

A

Extremely popular and powerful, deep learning is a modern approach based around a conceptual model of how the human brain functions. The model (also called a neural network) is composed of collections of neurons (very simple computational units) connected together by weights (mathematical representations of how much information to allow to flow from one neuron to the next). The process of training involves finding values for each weight.

43
Q

*neural network structures

A

FFNN: The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.
CNN: Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.
RNN/LSTM: Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
Transformer: A more modern replacement for RNN/LSTMs, the transformer architecture enables training over larger datasets involving sequences of data.
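
A minimal sketch of a forward pass through a tiny FFNN, where each neuron holds one weight per neuron in the previous layer. The sigmoid activation, the bias terms and the hand-picked weights are assumptions for illustration; in practice training finds the weight values.

```python
import math

def dense_layer(inputs, weights, biases):
    """Each output neuron computes a weighted sum over all neurons in the previous layer."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def ffnn_forward(x):
    # Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
    hidden = [sigmoid(z) for z in dense_layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])]
    output = dense_layer(hidden, [[2.0, -1.0]], [0.0])
    return output[0]

print(ffnn_forward([0.0, 0.0]))  # 0.5
```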

44
Q

*Machine Learning Using Python Libraries

A

For more classical models (linear, tree-based) as well as a set of common ML-related tools, take a look at scikit-learn. The web documentation for this library is also organized for those getting familiar with the space and can be a great place to get familiar with some extremely useful tools and techniques.
For deep learning, mxnet, tensorflow and pytorch are the three most common libraries. For the majority of machine learning needs, they offer comparable features and are largely equivalent.

45
Q

Model accuracy

A

Accuracy is the fraction of predictions a model gets right.
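
A minimal sketch of the calculation; the `accuracy` helper and the labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"]))  # 0.75
```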

46
Q

*Log loss

A

seeks to calculate how uncertain your model is about the predictions it is generating. In this context, uncertainty refers to how likely a model thinks the predictions being generated are to be correct.
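
A minimal sketch of binary log loss, assuming 0/1 labels and predicted probabilities for class 1; the helper name and the small clipping constant `eps` are assumptions. A correct but unsure prediction is penalized more than a correct, confident one.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: y_true holds 0/1 labels, p_pred the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(log_loss([1, 0], [0.9, 0.1]))  # confident and correct: about 0.105
print(log_loss([1, 0], [0.6, 0.4]))  # correct but unsure: about 0.511
```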

47
Q

Model inference

A

generate predictions on real-world problems using unseen data in the field

48
Q

*root mean square (RMS)

A

RMS can be thought of roughly as the “average error” across your test dataset, so you want this value to be low
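
A minimal sketch of the root mean square error over a test set; the `rmse` helper and the numbers are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Square each error, average the squares, then take the square root."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

print(rmse([10, 12, 14, 16], [12, 10, 16, 14]))  # 2.0: every prediction is off by 2
```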

49
Q

Hyperplane

A

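A mathematical term for a flat surface with one dimension fewer than the space it sits in (a line in 2D, a plane in 3D). It generalizes the idea of a plane to spaces with more than three dimensions.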

50
Q

Plane

A

A mathematical term for a flat two-dimensional surface (like a piece of paper) on which any two points can be joined by a straight line that lies entirely on the surface.

51
Q

stop words

A

A list of words removed by natural language processing tools when building your dataset. There is no single universal list of stop words used by all natural language processing tools.
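
A minimal sketch of stop-word removal; the `STOP_WORDS` set is a tiny hypothetical list, since each real NLP tool ships its own.

```python
# Hypothetical stop-word list; real tools use larger, tool-specific lists.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop any word in the stop-word list."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

print(remove_stop_words("The cat is in the hat"))  # ['cat', 'hat']
```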

52
Q

data vectorization

A

A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model
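
One common vectorization technique for categorical data is one-hot encoding, shown here as an illustrative choice: each category gets its own position in the vector. The `one_hot` helper is hypothetical.

```python
def one_hot(value, categories):
    """Turn one categorical value into a vector with a 1 at that category's position."""
    return [1 if value == category else 0 for category in categories]

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
print(one_hot("Wednesday", days))  # [0, 0, 1, 0, 0, 0, 0]
```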

53
Q

k-means & k

A

a common cluster-finding model. k is how many clusters the model will try to find in your dataset.
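
A minimal sketch of k-means (Lloyd's algorithm) on 2-D points: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. Using the first k points as starting centroids is a naive assumption for brevity; libraries usually use smarter initialization such as k-means++. Requires Python 3.8+ for `math.dist`.

```python
import math

def k_means(points, k, iterations=100):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (0.5, 1.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)  # [(1.0, 1.5), (8.5, 8.5)]
```

Here k = 2 is a hyperparameter: the algorithm will always return exactly k clusters, whether or not the data really has that many groups.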

54
Q

silhouette coefficient

A

A score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters. A score approaching 1 indicates successful identification of discrete non-overlapping clusters.
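
A minimal sketch of the calculation: for each point, a is its mean distance to the rest of its own cluster and b is its mean distance to the nearest other cluster, giving s = (b - a) / max(a, b); the overall score is the mean of s. It assumes at least two clusters and scores single-point clusters as 0, a common convention. The helper name and sample clusters are hypothetical.

```python
import math

def silhouette_score(clusters):
    """Mean silhouette coefficient; `clusters` is a list of clusters, each a list of points."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for i, p in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # convention for a single-point cluster
                continue
            # a: mean distance from p to the other points in its own cluster
            a = sum(math.dist(p, q) for j, q in enumerate(cluster) if j != i) / (len(cluster) - 1)
            # b: mean distance from p to the points of the nearest other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for oj, other in enumerate(clusters)
                if oj != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

separated = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
overlapping = [[(0, 0), (2, 0)], [(1, 0), (3, 0)]]
print(silhouette_score(separated))    # about 0.9: distinct clusters
print(silhouette_score(overlapping))  # -0.25: some points sit closer to the other cluster
```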

55
Q

Bag of words

A

A technique used to extract features from text. It counts how many times each word appears in each document of a corpus (a collection of documents), and then transforms those counts into a dataset
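
A minimal sketch using `collections.Counter`, assuming lowercase whitespace tokenization. The `bag_of_words` helper and the two-document corpus are hypothetical. Each document becomes one row of word counts over a shared vocabulary.

```python
from collections import Counter

def bag_of_words(documents):
    """Return the sorted vocabulary and one word-count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

corpus = ["the cat sat", "the cat saw the dog"]
vocab, vectors = bag_of_words(corpus)
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```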

56
Q

CNN

A

are a special type of neural network particularly good at processing images.
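
The core operation in a CNN is sliding a small filter over grid-organized data. A minimal sketch with no padding and stride 1; like most deep learning libraries, it computes cross-correlation (the kernel is not flipped). The hand-picked edge filter is an assumption: in a real CNN these weights are learned during training.

```python
def convolve2d(image, kernel):
    """Slide a small filter over a 2-D grid, keeping only positions where it fully fits."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A 4x4 "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
vertical_edge = [[-1, 1], [-1, 1]]  # hypothetical hand-picked filter
print(convolve2d(image, vertical_edge))  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The filter responds strongly only where the edge is, which is the kind of feature the hidden layers learn to extract.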

57
Q

Neural network

A

a collection of very simple models connected together.

  1. These simple models are called neurons
  2. the connections between these models are trainable model parameters called weights.