Things I got wrong - MLS Flashcards
(30 cards)
Can KDF perform data format transformations?
Yes, e.g. JSON to Parquet
What type of data is well suited to imputation by deep learning?
Categorical
What can object2vec turn into embeddings?
Full sentences
What is incremental training?
Training a new model that starts from an existing model's artifacts and uses an expanded or updated dataset, instead of training from scratch
How does KDF perform data transformations?
By invoking an AWS Lambda function on the incoming records (AWS provides Lambda blueprints for common transformations)
What type of algorithm are support vector machines and what can they be used for?
A supervised ML algorithm that can be employed for both classification and regression
Can K-Means be used for classification?
No, it is for clustering
What type of learning is classification?
Supervised learning with labelled data
If you are overfitting, should you use more or less training data?
More, to introduce a greater diversity in the training data and force the model to generalise more
When should you use PCA in randomised mode?
For datasets with a large number of observations and features, as it then uses an approximation algorithm to run faster
If a model has high specificity, it means that nearly all ____ ______ have been weeded out.
If a model has high specificity, it means that nearly all false positives have been weeded out
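To make the specificity card concrete, here is a minimal sketch using the standard definition, specificity = TN / (TN + FP). The function name and the example counts are made up for illustration.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity (true negative rate) = TN / (TN + FP)."""
    total_negatives = true_negatives + false_positives
    if total_negatives == 0:
        raise ValueError("specificity is undefined without any actual negatives")
    return true_negatives / total_negatives


# Illustrative counts only: few false positives gives high specificity.
high = specificity(true_negatives=990, false_positives=10)
# Many false positives pulls specificity down.
low = specificity(true_negatives=600, false_positives=400)
```

The fewer false positives the model produces, the closer the value gets to 1.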
What is the equation for tf-idf?
TF x IDF, where:
TF is the number of times a word appears in a document / the total number of words in the document.
IDF is the log of (the total number of documents / the number of documents containing the specified word).
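The definition above can be sketched in a few lines of Python. This assumes the natural log (the card does not say which base) and documents that are already tokenised into word lists; the function names and the tiny corpus are hypothetical.

```python
import math


def tf(term, document):
    """Term frequency: occurrences of the term / total words in the document."""
    return document.count(term) / len(document)


def idf(term, corpus):
    """Inverse document frequency: log(total documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        raise ValueError("term does not appear in any document")
    return math.log(len(corpus) / containing)  # natural log is an assumption


def tf_idf(term, document, corpus):
    return tf(term, document) * idf(term, corpus)


# Hypothetical corpus of three tokenised documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred", "the", "cat"],
]

common_word_score = tf_idf("the", corpus[0], corpus)  # "the" is in every document
rarer_word_score = tf_idf("cat", corpus[0], corpus)   # "cat" is in two of three
```

A word that appears in every document gets an IDF of log(1) = 0, so its tf-idf is 0 no matter how often it occurs.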
Can Glue transform data into RecordIO-Protobuf?
No
If you are underfitting, what 2 actions can you take?
Use more features
Remove regularisation
What is the fastest command to move data from S3 to Redshift?
COPY; it is much faster than INSERT
What algorithm should you use to forecast sales for a new product that has no sales history of its own, but for which there are similar existing products?
DeepAR
What type of algorithm is DeepAR?
Supervised RNN
Can you integrate your own scripts with XGBoost?
Yes
What are two scaling techniques that can be used to reduce the effect of outliers in your data?
Robust standardisation and logarithm transformation
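A rough sketch of both techniques, using only the standard library. It assumes robust standardisation means (x - median) / IQR and that the log transform is log(1 + x) so zeros are allowed. The function names and data are illustrative, and the quartiles depend on the default method of `statistics.quantiles`.

```python
import math
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    median = statistics.median(values)
    return [(v - median) / iqr for v in values]


def log_transform(values):
    """Compress large values with log(1 + x); inputs must be greater than -1."""
    return [math.log1p(v) for v in values]


# Illustrative data with one extreme outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 1000]

scaled = robust_scale(data)
logged = log_transform(data)
```

The median and IQR barely move when the outlier changes, so the scaling of the typical points is stable. The log transform instead shrinks the outlier itself towards the rest of the data.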
Does SageMaker support resource-based policies and service-linked roles?
No
Does training with a SageMaker built-in algorithm require an IAM role?
Yes
What is the read/write throughput limit of each KDS shard?
Read max is 2 MB per second
Write max is 1 MB per second (or 1,000 records per second)
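The per-shard limits translate into a quick shard-count estimate. This is a back-of-the-envelope sketch that considers only the 1 MB/s write and 2 MB/s read figures from the card and ignores any per-record limits; the function name and the example workloads are hypothetical.

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1  # write limit from the card
READ_MB_PER_SEC_PER_SHARD = 2   # read limit from the card


def shards_needed(write_mb_per_sec, read_mb_per_sec):
    """Smallest shard count that covers both the write and the read throughput."""
    for_writes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    for_reads = math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)
    return max(1, for_writes, for_reads)


write_bound = shards_needed(write_mb_per_sec=5, read_mb_per_sec=8)    # writes dominate
read_bound = shards_needed(write_mb_per_sec=1.5, read_mb_per_sec=10)  # reads dominate
```

Whichever side needs more shards sets the total, so a read-heavy stream can need more shards than its write volume alone suggests.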
What two conditions must CSVs going into SageMaker supervised learning algorithms meet?
They should not have a header record and the target variable should be in the first column
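One way to get a CSV into that shape, sketched with the standard `csv` module. The function name, column names, and sample rows are made up for illustration; it assumes the input has a header row that names the target column.

```python
import csv
import io


def to_training_csv(raw_csv: str, target_column: str) -> str:
    """Drop the header row and move the target column to the first position."""
    reader = csv.reader(io.StringIO(raw_csv))
    header = next(reader)  # consumed here, never written to the output
    target_index = header.index(target_column)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        features = row[:target_index] + row[target_index + 1:]
        writer.writerow([row[target_index]] + features)
    return out.getvalue()


# Hypothetical input: the target "label" is the last column and a header is present.
raw = "age,income,label\n34,52000,1\n51,61000,0\n"
prepared = to_training_csv(raw, "label")
```

Here `prepared` has no header line and each row starts with the label, followed by the features in their original order.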
Are InvokeEndpoint calls to SageMaker monitored by CloudTrail?
No