Things I got wrong - MLA Flashcards by Al Them

How many variables’ distributions are shown in a histogram?

What is JSONL?

JSON Lines. A text format in which each line is a separate, self-contained JSON value (typically an object), so records can be streamed or appended one line at a time.
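
As a rough illustration of the format, here is a minimal sketch using only the Python standard library. The helper names `dumps_jsonl` and `loads_jsonl` and the sample records are made up for this example.

```python
import io
import json

# Hypothetical sample records, purely for illustration.
records = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]


def dumps_jsonl(rows):
    """Serialise each object as one JSON document per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def loads_jsonl(text):
    """Parse JSONL text line by line, skipping blank lines."""
    return [json.loads(line) for line in io.StringIO(text) if line.strip()]


encoded = dumps_jsonl(records)
decoded = loads_jsonl(encoded)
```

Because every line parses on its own, a reader can process a large file one record at a time instead of loading a single huge JSON array.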

What is Parquet?

A columnar data format which is well suited to batch data processing

What is a scatter plot useful for?

Visualising the relationship (e.g. the correlation) between two continuous variables

What is the main use of Managed Service for Apache Kafka?

Streaming data in real time between different applications and systems

Can EFS be used outside of EC2 instances?

What is FSx for Lustre good for with regard to SageMaker?

Distributed training. Works well due to its high performance

Is Ground Truth using humans or machines to do the labelling?

Both!

Do you change data’s format first or filter it first?

You filter it first, then change the format

What does the “difference in proportions of labels” metric tell you?

The difference in the proportion of positive labels between two groups (facets) in the data. For example, it shows how much the observed churn rate differs between demographics, so you are aware of potential biases
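
The calculation can be sketched in a few lines, assuming binary labels (1 = positive) and two groups. The function names and the sample label lists are hypothetical, chosen only to show the arithmetic.

```python
def positive_rate(labels):
    """Fraction of labels that are positive (equal to 1)."""
    return sum(labels) / len(labels)


def difference_in_proportions_of_labels(labels_a, labels_d):
    """Positive-label rate of group a minus that of group d."""
    return positive_rate(labels_a) - positive_rate(labels_d)


# Hypothetical churn labels for two demographic groups.
group_a = [1, 1, 1, 0]  # 3 of 4 positive
group_d = [1, 0, 0, 0]  # 1 of 4 positive

dpl = difference_in_proportions_of_labels(group_a, group_d)
```

A value near 0 means both groups see positive labels at a similar rate, while a value far from 0 suggests an imbalance worth investigating.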

What does Cramer’s V tell you?

The association between two categorical variables
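
One common way to compute it is from the chi-squared statistic of the contingency table. The sketch below uses only the standard library and the uncorrected form of the statistic (no bias correction). The function name and toy data are assumptions for illustration.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramer's V for two equal-length categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint = Counter(zip(xs, ys))

    # Chi-squared: compare observed cell counts with those expected
    # if the two variables were independent.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: plan is fully determined by region, colour is unrelated to it.
region = ["north", "north", "south", "south"]
plan = ["basic", "basic", "premium", "premium"]
colour = ["red", "blue", "red", "blue"]

v_associated = cramers_v(region, plan)
v_independent = cramers_v(region, colour)
```

The result ranges from 0 (no association) to 1 (one variable fully determines the other).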

What type of splitting is used when there is a date or time series?

Ordered splitting. This means that the model only has access to historical data, otherwise it wouldn’t make sense to ask it to predict for dates it already knows
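
A minimal sketch of an ordered split, assuming each record carries a date field. The `ordered_split` helper, the 80/20 fraction and the sample rows are illustrative choices, not a prescribed method.

```python
from datetime import date

# Hypothetical daily sales records, deliberately out of order.
rows = [
    {"day": date(2024, 1, 3), "sales": 12},
    {"day": date(2024, 1, 1), "sales": 10},
    {"day": date(2024, 1, 4), "sales": 15},
    {"day": date(2024, 1, 2), "sales": 11},
    {"day": date(2024, 1, 5), "sales": 14},
]


def ordered_split(records, key, train_fraction=0.8):
    """Sort by time, then cut once so training data precedes held-out data."""
    ordered = sorted(records, key=key)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


train, holdout = ordered_split(rows, key=lambda r: r["day"])
```

Unlike a random split, no shuffling happens here, so every held-out record is later than every training record.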

What does Spearman’s rank correlation tell you and when should it be used?

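It measures the strength and direction of a monotonic relationship between two variables. Use it when the relationship may be non-linear or the data is ordinal, since it works on ranks and doesn't assume a particular distribution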
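
To make the "works on ranks" idea concrete, here is a small standard-library sketch: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks. The helper names and sample data are assumptions for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic in x, but not linear

rho = spearman(x, y)
```

Here the relationship is cubic rather than linear, yet the ranks agree perfectly, so the coefficient reaches its maximum.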

Why might SageMaker processing jobs be advantageous over Glue for ML workloads?

SageMaker Processing jobs integrate with ML libraries such as TensorFlow

What is the purpose of SageMaker Batch Transform?

Allows you to perform inference on large datasets without having to maintain a persistent endpoint

What is the maximum processing time of SageMaker Serverless Inference?

60 seconds

What is random cut forest used for?

Anomaly detection in high-dimensional data, such as time series or multivariate data

Do VPC gateway endpoints allow data transfer across regions? What about interface endpoints?

Gateway: no, Interface: yes

What is the purpose of SageMaker IP Insights?

Identify IPs that deviate from normal patterns through the analysis of historical logs

Do S3 access points have authentication or access control?

What is the SageMaker DeepAR algorithm used for?

Time series forecasting only

What type of instances are often good for deep learning inference?

Accelerated computing instances with GPUs or FPGAs

What is the decorator that you have to use to integrate a custom step into a SageMaker Pipeline?

@step

Can Comprehend be used to redact/anonymise PII? When would you use this over Macie?

Yes, and you’d use it for real-time or near-real-time use cases e.g. customer communications

What is the difference between SageMaker Clarify and SageMaker Model Monitor?

SageMaker Clarify covers explainability and bias detection. SageMaker Model Monitor watches deployed models for drift and degrading model performance

Is using a notebook for production deployments efficient?

Not really. Notebooks are better for getting a feel for how the system works and how you can interact with it

What is model bias?

The model's output is systematically skewed towards a specific outcome for a specific group, e.g. older applicants always get bigger loans

What is feature attribution?

The idea that you can say which input feature drove a certain outcome in the model's results, e.g. you can see that the 'income' field contributes the most to the model's inferences

When is Pipe mode better than Fast File mode?

Pipe mode is better if the data should be streamed sequentially, but otherwise Fast File mode is probably more suitable

Is Amazon MWAA serverless?

What ensemble method should you use if you want to assemble multiple different algorithms/model types?

Stacking. Bagging and boosting typically work with one model type at a time.

What is feature splitting?

Breaking a complex input feature into two or more smaller, simpler sub-features, e.g. splitting a timestamp into a date and an hour.
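
As one possible illustration, a timestamp string can be split into a date and an hour-of-day feature. The `split_timestamp` helper and the chosen sub-features are assumptions for this sketch.

```python
from datetime import datetime


def split_timestamp(ts):
    """Split an ISO 8601 timestamp string into two simpler sub-features."""
    moment = datetime.fromisoformat(ts)
    return {"date": moment.date().isoformat(), "hour": moment.hour}


features = split_timestamp("2024-03-15T14:30:00")
```

A model can then learn from the hour of day separately from the calendar date, which a single opaque timestamp makes hard.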

What is one hot encoding?

A technique that converts a categorical variable into a set of binary (0/1) columns, one per category, so no order is implied between the categories
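
A minimal pure-Python sketch of the idea is shown here. The `one_hot_encode` helper is hypothetical, and sorting the categories alphabetically is just one way to fix the column order.

```python
def one_hot_encode(values):
    """Return the category order and one binary indicator row per value."""
    categories = sorted(set(values))
    encoded = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
```

Each row contains exactly one 1, in the column of its category.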

What is the purpose of AWS X-Ray?

Finding bottlenecks and tracking the flow of execution in microservices architectures

The ROC is the plot of the XXX XXX vs XXX XXX rates at various thresholds.

The ROC is the plot of the true positive vs false positive rates at various thresholds.
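
The points on the curve can be computed by sweeping a threshold over the model's scores. This is a small sketch assuming binary labels (1 = positive) and that a score at or above the threshold counts as a positive prediction. The function name and sample data are made up.

```python
def roc_points(y_true, scores):
    """Return (threshold, false positive rate, true positive rate) tuples."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    # Sweep from the strictest threshold to the most lenient one.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
        fp = sum(1 for p, t in zip(predicted, y_true) if p and t == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical labels and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
```

Lowering the threshold can only add predicted positives, so both rates rise (or stay flat) as the sweep proceeds.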

What is ordinal encoding?

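A technique that encodes categorical data as integers, one per category. It suits categories with a natural order (e.g. small < medium < large), because the integers preserve that order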
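
A minimal sketch, assuming the caller supplies the categories in their intended order. The `ordinal_encode` helper and the size example are hypothetical.

```python
def ordinal_encode(values, ordered_categories):
    """Map each value to its position in the given category order."""
    mapping = {category: index for index, category in enumerate(ordered_categories)}
    return [mapping[value] for value in values]


sizes = ordinal_encode(["medium", "small", "large"], ["small", "medium", "large"])
```

Passing the order explicitly matters: deriving it alphabetically would put "large" before "small".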

(36 cards)