Modelling - Past Questions Flashcards

1
Q

What is semantic segmentation?

A

A deep learning algorithm that labels (categorises) every pixel in an image.

2
Q

When you are trying to find items that are similar, what algorithm would you use?

A

K-nearest neighbour

3
Q

What does the linear learner algorithm show?

A

How a change in an independent variable affects a dependent variable.

4
Q

What type of problem is Random Cut Forest predominantly used for?

A

Anomaly detection (it is an unsupervised algorithm, not a classifier)

5
Q

What SageMaker algorithm supports recommendations?

A

Factorisation Machines

6
Q

What SageMaker algorithm supports regression?

A

Linear Learner

7
Q

What 4 types of problem can XGBoost be used to solve?

A

Regression, Binary Classification, Multi-class classification and Ranking

8
Q

What format should the training data be in for XGBoost?

A

CSV or libsvm

9
Q

What is Random Cut Forest used for?

A

To identify anomalies in data (e.g. to detect fraud)

10
Q

How does Random Cut Forest find an anomaly?

A

It provides a score for each data point. A low score = similar to most of the data, high score = anomaly
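
The scoring idea above can be sketched as a simple threshold on the scores. This is a minimal illustration, not the Random Cut Forest algorithm itself: the scores are made-up values, `flag_anomalies` is a hypothetical helper, and the cutoff of three standard deviations above the mean is only a commonly suggested starting point that should be tuned per dataset.

```python
from statistics import mean, pstdev


def flag_anomalies(scores, num_std=3.0):
    """Return the indices of scores above mean + num_std * standard deviation."""
    cutoff = mean(scores) + num_std * pstdev(scores)
    return [i for i, score in enumerate(scores) if score > cutoff]


# Hypothetical anomaly scores: 19 ordinary points and one unusual point.
scores = [1.0] * 19 + [5.0]
anomalies = flag_anomalies(scores)  # indices of the high-scoring points
```

Low scores stay below the cutoff and are treated as normal; only the high-scoring point is flagged.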

11
Q

What format should training data for Random Cut Forest be in?

A

CSV or x-recordio-protobuf format

12
Q

For online testing what type of data should you use?

A

live data

13
Q

For offline testing what sort of data should you use?

A

historical data

14
Q

When you perform offline testing of your models which endpoints should you deploy your trained models to?

A

alpha endpoints

15
Q

When using online testing which endpoint should you deploy your trained models to?

A

SageMaker endpoint

16
Q

When trying to select the correct trained model for real-time ml what steps would you take?

A

Deploy your models to a SageMaker endpoint, then send a portion of live data to each model, and finally evaluate each model.
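
The "send a portion of live data to each model" step can be pictured as weighted routing. The sketch below only simulates the split locally; the variant names `model-a` and `model-b`, the weights, and the `route_request` helper are all hypothetical, and on a real endpoint the traffic split would be configured on the endpoint rather than written by hand.

```python
import random


def route_request(variant_weights, rng):
    """Pick one model variant at random, in proportion to its traffic weight."""
    names = list(variant_weights)
    weights = [variant_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical 50/50 split of 1000 live requests between two candidate models.
rng = random.Random(0)
variant_weights = {"model-a": 0.5, "model-b": 0.5}
counts = {name: 0 for name in variant_weights}
for _ in range(1000):
    counts[route_request(variant_weights, rng)] += 1
```

Each model then sees its own share of the live traffic, and its predictions can be evaluated separately.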

17
Q

What is object detection used for?

A

to identify all instances of an object within an image

18
Q

How does object detection give the location of a particular object?

A

It uses a bounding box

19
Q

What type of ML algorithm is Object detection?

A

Supervised

20
Q

What format is recommended for Object detection training data?

A

Apache MXNet RecordIO

21
Q

What is incremental training?

A

You seed the training of a new model with the artefacts of a previously trained model.

22
Q

When would object detection not be a good idea?

A

For problems at scale

23
Q

What is Latent Dirichlet Allocation used for?

A

Discovering the topics in a set of documents

24
Q

What algorithm would you use to classify millions of high-resolution images?

A

SageMaker built-in Image Classification

25
Q

How does SageMaker’s built-in Image Classification work?

A

It uses a convolutional neural network (CNN) to classify images, and it supports multi-label classification.

26
Q

What is a Factorisation Machine primarily used for?

A

Detecting interactions between features, e.g. reactions to ads on a web page or item recommendations

27
Q

What are factorization machines used for?

A

Classification and regression

28
Q

If you want to find all instances of an item in an image and surround each with a bounding box, what algorithm would you use?

A

Object Detection Algorithm

29
Q

What is a Neural Topic Model algorithm used for?

A

to group documents into topics using the statistical distribution of words in the documents

30
Q

What do you use XGBoost for?

A

Predicting a target variable very quickly and efficiently

31
Q

What does XGBoost do with redundant features?

A

It includes them, which can lead to performance drag

32
Q

Why is removing redundant features outright a bad idea?

A

There is a risk of information loss

33
Q

How would you solve the issue of redundant features most efficiently and quickly?

A

Principal Component Analysis

34
Q

How does Principal component analysis work?

A

It finds composites of features that are uncorrelated
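
To make "composites of features that are uncorrelated" concrete, here is a minimal sketch of PCA for the special case of two features, using only the standard library. The data points are made up, and the helper names `pca_2d` and `covariance` are hypothetical; real workloads would use a library implementation.

```python
import math


def pca_2d(points):
    """Rotate centred 2-D points onto their principal axes."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centred) / n
    var_y = sum(y * y for _, y in centred) / n
    cov_xy = sum(x * y for x, y in centred) / n
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    c, s = math.cos(theta), math.sin(theta)
    # Each output pair is (first component, second component).
    return [(x * c + y * s, -x * s + y * c) for x, y in centred]


def covariance(pairs):
    """Population covariance between the two coordinates of each pair."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n


# Hypothetical, strongly correlated (redundant) features.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
components = pca_2d(points)
```

The two original features are almost copies of each other, while the two resulting components have zero covariance, and nearly all of the variance sits in the first one.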

35
Q

What is online learning?

A

the process of training your model incrementally by giving it data observations as individual observations or in mini-batches

36
Q

What technique can you use within SageMaker to expedite the deployment and operation of your model?

A

Transfer learning

37
Q

What is transfer learning?

A

You start with an off-the-shelf trained model and apply it to your different but similar observations

38
Q

What is incremental learning?

A

You begin with an existing model you have already trained and extend it with new data.

39
Q

When do you use Out-of-core learning?

A

When training with huge datasets that you can’t load into your server’s memory.

40
Q

How does Out-of-core learning work?

A

The algorithm loads some of the data, trains on that subset, loads another subset of observations, trains on that subset and repeats
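
The load, train, repeat loop above can be sketched as follows. This is an illustration under stated assumptions: the generator `iter_chunks` stands in for reading a large file piece by piece, the target `y = 2x` is made up, and the model is a single weight fitted by gradient descent rather than any particular SageMaker algorithm.

```python
def iter_chunks(n_rows, chunk_size):
    """Yield (xs, ys) chunks lazily so only one chunk is in memory at a time."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        xs = [i / n_rows for i in range(start, stop)]
        ys = [2.0 * x for x in xs]  # hypothetical target: y = 2x
        yield xs, ys


def train_out_of_core(n_rows=100, chunk_size=10, epochs=5, lr=0.5):
    """Fit y = w * x by updating w on one chunk at a time."""
    w = 0.0
    for _ in range(epochs):
        for xs, ys in iter_chunks(n_rows, chunk_size):
            # Gradient of the mean squared error on this chunk only.
            grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            w -= lr * grad
    return w


weight = train_out_of_core()
```

The full dataset is never held in memory; the weight still converges towards 2 because every chunk nudges it in the same direction.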

41
Q

What does the early_stopping hyperparameter do?

A

Decides whether the algorithm is allowed to stop training early when further training is not expected to improve the model

42
Q

What does the learning_rate hyperparameter do?

A

Decides how quickly the model adapts to new or changing data. Values range from 0.0 to 1.0.

43
Q

What does a learning_rate close to 1.0 do?

A

The model will learn quickly and take into account new observations quickly

44
Q

What does a learning_rate close to 0.0 do?

A

The model will learn slowly and take into account new observations slowly
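
The effect of a high versus a low learning rate can be illustrated with a toy update rule. This is a simplified sketch, not the update used by any specific SageMaker algorithm: the `track` helper and the observation values are hypothetical.

```python
def track(observations, learning_rate, start=0.0):
    """Move an estimate towards each new observation by a fraction learning_rate."""
    estimate = start
    for obs in observations:
        estimate += learning_rate * (obs - estimate)
    return estimate


# Hypothetical stream: the true value jumps from 0 to 10 and stays there.
observations = [10.0] * 5
fast = track(observations, learning_rate=0.9)  # close to 1.0: adapts quickly
slow = track(observations, learning_rate=0.1)  # close to 0.0: adapts slowly
```

After five observations the high learning rate has almost reached the new value, while the low learning rate is still less than halfway there.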

45
Q

What does the use_pretrained_model hyperparameter do?

A

Defines if you want a pre-trained model to be loaded in before training.

46
Q

What are the three steps needed for deploying a model using Amazon SageMaker Hosting services?

A
  1. Create a model in Amazon SageMaker, including the S3 path where the model artefacts are stored and the Docker registry path for the inference image
  2. Create an endpoint configuration for an HTTPS endpoint
  3. Create an HTTPS endpoint
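
The three steps above map onto three calls of the boto3 SageMaker client. The sketch below is a minimal outline under assumptions: `deploy_model` is a hypothetical helper, the `-config` and `-endpoint` name suffixes, the variant name `AllTraffic` and the instance type `ml.m5.large` are illustrative choices, and the client is passed in (for example `boto3.client("sagemaker")`) so the function itself has no dependencies.

```python
def deploy_model(sm_client, name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.large"):
    """Run the three hosting steps and return the endpoint name."""
    # Step 1: create the model from the S3 artefacts and the inference image.
    sm_client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # Step 2: create the endpoint configuration that references the model.
    config_name = name + "-config"
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    )
    # Step 3: create the HTTPS endpoint from that configuration.
    endpoint_name = name + "-endpoint"
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    return endpoint_name
```

Calling it against a real account requires valid values for the image URI, S3 path and IAM role; none are supplied here.
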
47
Q

What does IoT Core do?

A

Allows you to send IoT messages to AWS services without managing infrastructure

48
Q

What does IoT Greengrass do?

A

Helps you quickly build edge device software and remotely deploy and manage it.

49
Q

What is IoT Analytics specifically built for?

A

Analysing and enriching highly unstructured IoT data

49
Q

What are Inference Pipelines used for?

A

to define and deploy pre-trained SageMaker algorithms

49
Q

Can Inference pipelines be used with IoT devices?

A

No, they do not have the inference integration

50
Q

If you wanted to enrich data using Kinesis Data Streams would you need any additional steps?

A

Yes, you would need Lambda functions to perform the enrichment steps.

51
Q

Which Amazon ML services/features would you use to manage multiple experiments at scale?

A

Amazon SageMaker model tracking capability

52
Q

What is Amazon SageMaker Inference pipeline used for?

A

To deploy pre-trained SageMaker algorithms packaged in Docker containers.

53
Q

What can you search for in the Amazon SageMaker model tracking capability?

A

Key model attributes, e.g. hyperparameter values, algorithms used and tags associated with the models.

54
Q

What does Amazon SageMaker model experiments capability do?

A

It does not exist

55
Q

What does Amazon SageMaker model containers capability do?

A

It does not exist

56
Q

What format must the labelling file be in when using AWS Glue FindMatches ML Transform?

A

CSV

57
Q

How should the labelling file be structured when using AWS Glue FindMatches ML Transform?

A

The first two columns are labeling_set_id and label. The remaining columns should match the schema of the data to be processed.

58
Q

What happens if AWS Glue FindMatches ML Transform can’t find a match for a record?

A

it is assigned a unique label

59
Q

How should the labelling file be encoded when using AWS Glue FindMatches ML Transform?

A

UTF-8 without BOM
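
The format, column order and encoding from the FindMatches cards above can be combined in one small sketch. The data columns `first_name` and `last_name`, the row values and the `build_labelling_csv` helper are hypothetical; only the two leading column names and the encoding come from the cards.

```python
import csv
import io


def build_labelling_csv(data_columns, rows):
    """Return CSV bytes: labeling_set_id, label, then the data columns."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["labeling_set_id", "label"] + list(data_columns))
    writer.writerows(rows)
    # "utf-8" emits no byte order mark ("utf-8-sig" would prepend one).
    return buffer.getvalue().encode("utf-8")


# Hypothetical labelled records: rows sharing a label are treated as matches.
rows = [
    ("set-1", "A", "Jane", "Doe"),
    ("set-1", "A", "Jane", "Do"),
    ("set-1", "B", "John", "Smith"),
]
payload = build_labelling_csv(["first_name", "last_name"], rows)
```

The resulting bytes can be written to a file in binary mode and uploaded as the labelling file.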

60
Q

Does SageMaker support GPU instances for the Random Cut Forest Algorithm?

A

No, it does not. It only supports CPU instances.

61
Q

What is K-means?

A

An unsupervised learning algorithm. It attempts to find discrete groupings within data where members of a group are as similar as possible to one another.

61
Q

What is the difference between KNN and K-means?

A

K-Means is unsupervised and KNN is supervised.

62
Q

When do you use logistic regression?

A

When doing supervised classification and the decision boundary is linear.

63
Q

You are building a binary classifier with highly unbalanced data. What three things can you do to improve model performance?

A
  • Collect more data of the class with less data
  • Oversample the class with less data
  • Create more samples using algorithms such as SMOTE
64
Q

How does SMOTE work?

A

It uses a k-nearest-neighbours approach that excludes members of the majority class while creating synthetic examples similar to the minority class.
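
A minimal sketch of the SMOTE idea: pick a minority-class point, find its nearest minority-class neighbours, and create a new point somewhere on the line between the two. The `smote` helper, the sample points and the choice of `k=2` are hypothetical, and this simplified version omits details of library implementations.

```python
import random


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base, searched among minority points only.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=6)
```

Majority-class points never enter the neighbour search, so every synthetic example lies between two existing minority examples.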

65
Q

What is the easiest

A