Things I got wrong - MLS Flashcards by Al Them

Can KDF perform data format transformations?

Yes, e.g. JSON to Parquet

What type of data is well suited to imputation by deep learning?

Categorical

What can object2vec turn into embeddings?

Full sentences

What is incremental training?

Continuing to train an existing model on new data, starting from its saved artifacts rather than retraining from scratch

How does KDF perform data transformations?

By invoking AWS Lambda functions on the incoming records

What type of algorithm are support vector machines and what can they be used for?

A supervised ML algorithm that can be employed for both classification and regression

Can K-Means be used for classification?

No, it is for clustering

What type of learning is classification?

Supervised learning with labelled data

If you are overfitting, should you use more or less training data?

More, to introduce a greater diversity in the training data and force the model to generalise more

When should you use PCA in randomised mode?

For datasets with a large number of both observations and features, because randomised mode uses an approximation algorithm that runs faster

If a model has high specificity, it means that nearly all ____ ______ have been weeded out.

If a model has high specificity, it means that nearly all false positives have been weeded out
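
As a quick check of the definition, specificity is TN / (TN + FP), so it approaches 1 as false positives disappear. A minimal sketch; the function name and the counts are made up for illustration:

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of actual negatives that the model correctly labels negative."""
    total_negatives = true_negatives + false_positives
    if total_negatives == 0:
        raise ValueError("no actual negatives to evaluate")
    return true_negatives / total_negatives


# Hypothetical confusion-matrix counts: 95 true negatives, 5 false positives.
print(specificity(95, 5))  # 0.95
```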

What is the equation for tf-idf?

TF x IDF, where:
TF is the number of times a word appears in a document / the total number of words in the document.
IDF is the log of (the total number of documents / the number of documents containing the word).
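
The formula can be sketched in a few lines. This assumes documents are already tokenised into lists of words and uses the natural log (the card does not specify a base); the function name and toy corpus are illustrative only:

```python
import math


def tf_idf(term, document, corpus):
    """TF-IDF of `term` in `document`; every document is a list of tokens."""
    tf = document.count(term) / len(document)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0  # term never appears, so it carries no weight
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred", "loudly"],
]
print(tf_idf("the", corpus[0], corpus))  # 0.0, because "the" is in every document
print(tf_idf("dog", corpus[1], corpus))  # (1/3) * ln(3), roughly 0.366
```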

Can Glue transform data into RecordIO-Protobuf?

If you are underfitting, what 2 actions can you take?

Use more features
Remove regularisation

What is the fastest command to move data from S3 to Redshift?

COPY, this is much faster than INSERT

What algorithm should you use to forecast sales for a new product that has no sales history of its own, but for which similar products already exist?

DeepAR

What type of algorithm is DeepAR?

Supervised RNN

Can you integrate your own scripts with XGBoost?

Yes

What are two scaling techniques that can be used to reduce the effect of outliers in your data?

Robust standardisation and logarithm transformation
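
A minimal sketch of both techniques, assuming robust standardisation means (x - median) / IQR and that the log transform is applied to non-negative values as log(1 + x). The function names and sample data are illustrative:

```python
import math
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


def log_transform(values):
    """Compress large values with log(1 + x); assumes every value is >= 0."""
    return [math.log1p(v) for v in values]


data = [1, 2, 3, 4, 100]  # 100 is the outlier
print(robust_scale(data))
print(log_transform(data))
```

Because the median and IQR ignore the extreme value, the four ordinary points keep a sensible spread instead of being squashed towards zero as they would be with mean and standard deviation.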

Does SageMaker support resource based policies and service linked roles?

Does training with a SageMaker built-in algorithm require an IAM role?

Yes

What is the read/write of each KDS shard?

Read max is 2 MB per second
Write max is 1 MB per second
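
These per-shard limits give a rough way to size a stream. The sketch below considers only the two byte-throughput limits on this card and ignores any per-record limits; the function name and workload figures are hypothetical:

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1
READ_MB_PER_SEC_PER_SHARD = 2


def shards_needed(write_mb_per_sec: float, read_mb_per_sec: float) -> int:
    """Smallest shard count that satisfies both throughput limits."""
    for_writes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    for_reads = math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)
    return max(for_writes, for_reads, 1)


# Hypothetical workload: producers write 5 MB/s, consumers read 12 MB/s in total.
print(shards_needed(5, 12))  # 6, because reads are the bottleneck
```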

What two conditions must CSVs going into SageMaker supervised learning algorithms meet?

They should not have a header record and the target variable should be in the first column
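
A minimal sketch of reshaping a CSV to meet both conditions, using only the standard library. The function name, column names, and sample rows are made up for illustration:

```python
import csv
import io


def to_training_csv(raw_csv: str, target: str) -> str:
    """Drop the header row and move the target column to the front."""
    reader = csv.reader(io.StringIO(raw_csv))
    header = next(reader)  # consumed here, so it is never written out
    target_idx = header.index(target)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        features = row[:target_idx] + row[target_idx + 1:]
        writer.writerow([row[target_idx]] + features)
    return out.getvalue()


raw = "age,income,label\n34,52000,1\n51,48000,0\n"
print(to_training_csv(raw, "label"))
```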

Are InvokeEndpoint calls to SageMaker monitored by CloudTrail?

What is a Poisson distribution?

A discrete probability distribution that gives the probability of an event occurring a given number of times in a fixed interval, assuming events occur independently at a constant average rate
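
For reference, the probability of seeing exactly k events when the average is `rate` events per interval is rate^k * e^(-rate) / k!. A small sketch; the function name and the example rate are illustrative:

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events in an interval averaging `rate` events."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


# Hypothetical example: a site averages 3 sign-ups per hour.
for k in range(6):
    print(k, round(poisson_pmf(k, 3), 4))
```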

What part of a dataset is standardised or normalised?

Specific numerical features, NOT the whole thing

What is the difference between standardisation and normalisation?

Standardisation rescales values so that the mean is 0 and the standard deviation is 1. Normalisation rescales the values so that they all lie between 0 and 1.
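
A minimal sketch of the two rescalings on one numerical feature, taking standardisation as the z-score (subtract the mean, divide by the standard deviation) and normalisation as min-max scaling. The function names and data are illustrative, and both functions assume the values are not all identical:

```python
import statistics


def standardise(values):
    """Z-score: the result has mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def normalise(values):
    """Min-max scaling: the result lies in the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


data = [2.0, 4.0, 6.0, 8.0]
print(standardise(data))
print(normalise(data))
```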

What is elastic inference?

A way to speed up the throughput and decrease the latency of getting real-time inferences from SageMaker deep learning models

What is AWS Panorama?

Enables you to do computer vision at the edge through AWS devices. Works with existing camera networks

K-fold validation splits data into k equal parts, trains the model k times, each time using a different fold as validation data and the remaining k-1 folds as training data. The model's performance is averaged across all k iterations. Common value for k is 5 or 10.
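
The procedure can be sketched without any ML library. Here `train_and_score` is a hypothetical callback that trains on the given training indices and returns a score on the validation indices; the sketch does not shuffle, although in practice the data is usually shuffled first:

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop


def cross_validate(n_samples: int, k: int, train_and_score) -> float:
    """Average the score returned by `train_and_score` over all k folds."""
    scores = [train_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


for train, val in k_fold_indices(10, 5):
    print("train:", train, "validation:", val)
```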

(30 cards)