MLOps Question Bank Flashcards

Question 1

Q

How many row-level predictions are saved each day?

Answer

A

Up to 2.4M. We save the first 100k scored records per hour
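
The daily figure follows from the hourly cap applied over 24 hours. A minimal sketch of that arithmetic, where `HOURLY_CAP` and `rows_saved` are illustrative names rather than anything from the product:

```python
HOURLY_CAP = 100_000  # first 100k scored records saved per hour (per the answer above)

def rows_saved(hourly_volumes):
    """Rows retained for a day, given the number of scored records in each hour."""
    return sum(min(volume, HOURLY_CAP) for volume in hourly_volumes)

# A deployment scoring 250k rows every hour hits the cap each hour.
busy_day = rows_saved([250_000] * 24)
# A deployment scoring 50k rows every hour never hits the cap.
quiet_day = rows_saved([50_000] * 24)
print(busy_day, quiet_day)
```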

Question 2

Q

How does DataRobot calculate data drift?

Answer

A

The data drift plot defaults to population stability index (PSI) scores. You can also get KL divergence, Hellinger divergence, Jensen–Shannon divergence, and Histogram intersection scores through the Python client
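
For intuition, PSI can be computed from two binned distributions with the standard textbook formula. The sketch below is not DataRobot's implementation: the binning, the `eps` smoothing floor, and the function name are all assumptions made for illustration.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions.

    Both arguments are per-bin counts over the same bins (training vs. scoring).
    eps is an assumed floor that avoids log(0); it is not a DataRobot setting.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("bin counts must align")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    score = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_pct = max(expected / expected_total, eps)
        a_pct = max(actual / actual_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

training = [300, 400, 300]
same_shape = psi(training, [30, 40, 30])   # identical proportions, no drift
shifted = psi(training, [10, 40, 50])      # mass moved toward the last bin
print(same_shape, shifted)
```

A score near zero means the scoring distribution matches training; larger values mean more drift.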

Question 3

Q

What types of features do we calculate data drift for?

Answer

A

The Feature Drift vs. Feature Importance chart monitors the 10 most impactful numerical or categorical features in your data. This chart excludes any text, percentage, or currency features, which means that you can have fewer than 10 features plotted.

Question 4

Q

How do you change the cutoff for accuracy drift?

Answer

A

You can change the cutoff in the Monitoring tab of the Settings section inside a deployment.

Question 5

Q

How does the AssociationID work? What happens when you upload an actual with a duplicate AssociationID?

Answer

A

The actuals payload must contain the column names associationId and actualValue. Use the optional column wasActedOn to indicate if the prediction was acted on in a way that could have affected the actual outcome. If you submit multiple actuals with the same association ID value, either in the same or a subsequent request, DataRobot uses the latest actuals value
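
The "latest value wins" rule can be pictured as follows. This is only a sketch of the described behavior on plain dictionaries; it does not call any DataRobot API, and `resolve_actuals` is a hypothetical helper name.

```python
def resolve_actuals(rows):
    """Keep the most recently submitted row per associationId (later rows win)."""
    latest = {}
    for row in rows:
        latest[row["associationId"]] = row
    return latest

# Rows use the column names from the answer above; wasActedOn is optional.
submitted = [
    {"associationId": "a-1", "actualValue": 0, "wasActedOn": False},
    {"associationId": "a-2", "actualValue": 1},
    {"associationId": "a-1", "actualValue": 1, "wasActedOn": True},  # duplicate ID
]
resolved = resolve_actuals(submitted)
print(resolved["a-1"]["actualValue"])
```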

Question 6

Q

At what granularity can we see drift and service health? Hourly, daily, weekly, monthly?

Answer

A

At what granularity can we see drift and service health? Hourly, daily, weekly, monthly?

Question 7

Q

What roles are associated with deployments? How are they different than roles associated with projects?

Answer

A

User/Admin/Consumer? Need to look up.

Question 8

Q

Do we support any language?

Answer

A

We support models built in most languages. Two notable exceptions in 5.2 are SAS and DataRobot models (however, you can use Agents to monitor codegen).

Question 9

Q

Do we support any modeling type (e.g., regression, classification, multiclass)?

Answer

A

We currently support only regression and binary classification. We plan to relax these limitations in future releases.

Question 10

Q

What is the maximum number of environments we can support?

Question 11

Q

Can you adjust the threshold for drift?

Answer

A

Yes - MLOps Agent. The MLOps agent can report predictions data and metrics for all external deployments, whether fully connected, intermittently connected, or completely disconnected from MLOps. If you have deployments running in isolated environments and disconnected from the network, for example, you can provide their data to the agent, and then view and manage them from MLOps.

Question 12

Q

What is the lowest resolution that you can display Data Drift and Accuracy?

Question 13

Q

Do we offer tracking of externally deployed models?

Answer

A

Yes - MLOps Agent. The MLOps agent can report predictions data and metrics for all external deployments, whether fully connected, intermittently connected, or completely disconnected from MLOps. If you have deployments running in isolated environments and disconnected from the network, for example, you can provide their data to the agent, and then view and manage them from MLOps.

Question 14

Q

What are the levels of Role Based Access Control (or User Management) provided by MLOps?

Answer

A

Deployment Admin, Owner, User, Consumer

Question 15

Q

How do you change the prediction threshold in MLOps, and how does this differ in MMM for AutoML/AutoTS?

Answer

A

You currently cannot change the prediction threshold in AutoML/AutoTS for DataRobot models, but you can change it for Custom Models inside MLOps.

Question 16

Q

What access does the Deployment Admin user offer users?

Answer

A

The deployment administrator role, assigned by the system administrator, has User role permissions for all existing and newly created deployments within the organization. The deployment admin can also approve new deployments, which facilitates governance.

Question 17

Q

Does MLOps offer governance capabilities? Explain.

Answer

A

Role Based Access Control (User Management), Deployment Admin, Workflow Approvals, Materiality Score (Importance), Real Time Monitoring

Question 18

Q

What is the alert notification structure for MLOps, and what are you able to customize?

Answer

A

Email. You can customize when alerts are scheduled and who receives them, as well as thresholds for accuracy.

Question 19

Q

How does DataRobot calculate accuracy drift?

Answer

A

How does DataRobot calculate accuracy drift?

Question 20

Q

What are some components of service health that MLOps will monitor?

Answer

A

Number of predictions, number of requests, execution time, response time, predictions over certain time, data error, system error, consumers, cache hit rate

Question 21

Q

What is Association ID and how is it used by MLOps?

Answer

A

An Association ID is a unique identifier for each prediction request. It can be an optional or mandatory field passed with a prediction request. Once actuals are available, they are sent back to DR along with the association ID so that DR can tie the actuals back to the predictions.
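
Conceptually, the association ID acts as a join key between stored predictions and later actuals. A minimal sketch with made-up IDs and values, using plain dictionaries rather than any DataRobot API:

```python
# Hypothetical data: association ID -> predicted probability / observed outcome.
predictions = {"req-1": 0.91, "req-2": 0.12, "req-3": 0.55}
actuals = {"req-1": 1, "req-3": 0, "req-9": 1}  # req-9 has no stored prediction

def match(predictions, actuals):
    """Pair each actual with its prediction by shared association ID."""
    return {
        key: (predictions[key], actuals[key])
        for key in actuals
        if key in predictions
    }

matched = match(predictions, actuals)
print(sorted(matched))
```

Only matched pairs can contribute to accuracy tracking; predictions without actuals and actuals without predictions are left out.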

Question 22

Q

What are some reasons for Red/Failing model health (service, data, or accuracy)?

Answer

A

Service: at least one 5xx error. Data: at least one higher-importance attribute’s distribution has shifted since the model was deployed. Accuracy: accuracy has severely declined since the model was deployed.

Question 23

Q

Does MLOps support external deployments?

Answer

A

Two components: 1. MLOps Agent 2. Adding External Deployment. You can use the deployment management tool to analyze historical predictions and continuously assess model quality based on prediction output. Note that while not all the tools of model management are available to externally imported datasets, the inventory, data drift, and accuracy tabs provide an excellent starting point.

Question 24

Q

What data from training is used as a baseline to calculate Model Health?

Answer

A

Model Health uses the holdout data distribution. If the model was trained into its holdout, DataRobot goes back to the same blueprint (BP) on non-holdout data and uses that for the baseline. For Custom Models, a baseline is not currently provided, but it is on the roadmap.

Question 25

Q

What types of model classes does MMM support? What about MLOps?

Answer

A

MMM: Binary Classification, Regression, Limited TS. MLOps: Binary Classification, Regression

Question 26

Q

Does MLOps support time aware projects?

Answer

A

No TS, but does support OTV partitioning

Question 27

Q

How many features do we currently support drift tracking for? What variable types are supported?

Answer

A

The Feature Details chart provides a histogram that compares the distribution of a selected feature in the training data to the distribution of that feature in the scoring data. From the results you can easily identify details of changed values (which values became more or less frequent), helping to assess the severity of the problem as well as its causes and resolutions. The chart plots up to the top 25 values based on Feature Impact from your training data, as well as Missing and Other counts. The Missing count contains all records with missing feature values (that is, NaN as the value of one of the features). The Other count, which applies to categorical features, includes all values other than the 25 most frequent values.
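
The Missing and Other buckets can be illustrated with a small counting routine. This is a sketch under assumptions: it ranks values by frequency, represents missing values as `None` rather than NaN, and uses a hypothetical `bucket_counts` helper; it is not how the chart is actually computed.

```python
from collections import Counter

def bucket_counts(values, top_n=25):
    """Counts for the top_n most frequent values, plus Missing and Other."""
    missing = sum(1 for value in values if value is None)
    counts = Counter(value for value in values if value is not None)
    top = dict(counts.most_common(top_n))
    other = sum(counts.values()) - sum(top.values())
    return {"top": top, "Missing": missing, "Other": other}

# top_n=2 keeps the example small; the chart described above uses 25.
sample = ["car", "car", "home", None, "boat", "home", "car"]
buckets = bucket_counts(sample, top_n=2)
print(buckets)
```

Running the same routine on training data and on scoring data gives the two histograms that the chart compares.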

Question 28

Q

What metric is used to calculate feature importance for MLOps?

Answer

A

For DataRobot model deployments, DataRobot uses the Feature Impact score. You must calculate the Feature Impact score for the model before the chart is available. For external models, DataRobot uses the Importance score it calculated when ingesting the data (ACE). This is the same calculation run during EDA2 in the standard model building process.

Question 29

Q

In which quadrant of the Data Drift plot would you see Red/Failing features?

Answer

A

They should explain the four-quadrant interpretation of the Data Drift chart. Red/Failing features are high-importance features experiencing high drift. Users are advised to investigate these immediately.
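
The four-quadrant reading can be sketched as a simple classification on importance and drift. The cutoff values below are hypothetical placeholders, not DataRobot defaults, and `classify` is an illustrative name.

```python
def classify(importance, drift, importance_cut=0.5, drift_cut=0.15):
    """Return (quadrant label, failing flag) for one feature.

    The cutoffs are hypothetical placeholders, not DataRobot defaults.
    """
    high_importance = importance >= importance_cut
    high_drift = drift >= drift_cut
    label = "{} importance, {} drift".format(
        "high" if high_importance else "low",
        "high" if high_drift else "low",
    )
    # Only the high-importance, high-drift quadrant is treated as failing.
    return label, high_importance and high_drift

important_and_drifting = classify(0.9, 0.4)
unimportant_but_drifting = classify(0.1, 0.4)
print(important_and_drifting, unimportant_but_drifting)
```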

Question 30

Q

What is the Feature Details plot? How should it be framed to users?

Answer

A

The Feature Details chart provides a histogram that compares the distribution of a selected feature in the training data to the distribution of that feature in the scoring data. From the results you can easily identify details of changed values (which values became more or less frequent), helping to assess the severity of the problem as well as its causes and resolutions.

Question 31

Q

How does MLOps handle new categorical levels from your scoring data?

Answer

A

For categorical features, there is an additional New level count. It displays only after you make predictions with a scoring file that contains a value for a feature that did not appear in the training set.
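
One way to picture the New level count is as the number of scoring records whose value never appeared in training. The sketch below assumes that reading; `new_level_count` is a hypothetical helper, not a DataRobot function.

```python
def new_level_count(training_values, scoring_values):
    """Count scoring records whose categorical value was unseen in training."""
    known_levels = set(training_values)
    return sum(1 for value in scoring_values if value not in known_levels)

training = ["car", "home", "car", "boat"]
scoring = ["car", "wedding", "home", "wedding", "medical"]
unseen = new_level_count(training, scoring)
print(unseen)
```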

Question 32

Q

What are some of the restrictions around model replacement?

Answer

A

DR -> DR, Custom -> Custom

Question 33

Q

How does DataRobot calculate a baseline for drift?

Answer

A

There are two key components to Model Health: Accuracy and Drift. For DataRobot AutoML & AutoTS models, Model Monitoring uses holdout data for Data Drift AND Accuracy. If a model is trained into its holdout, then it goes back to the same BP on non-holdout data and uses that for its baseline. For Custom Models, the Data Drift baseline is created from that model's training data. An Accuracy baseline is not currently available for Custom Models; it is on the roadmap.

Question 34

Q

What happens when you click on test custom model?

Answer

A

A background process checks whether it can spin up the model using the environment and whether the model returns predictions on the test file you uploaded.

Question 35

Q

What are the mandatory fields for uploading custom models?

Answer

A

Name, tar file, target column, target type, and, for classification, the positive and negative target values

Question 36

Q

In tracking agents, what types of models are supported?

Answer

A

Currently Python and Java. R support is coming soon.

Question 37

Q

How is feature importance calculated for custom models?

Answer

A

ACE methodology on uploaded training data

Question 38

Q

My custom model treats zip_code as a categorical, but it appears as a numeric in the data drift tab. Why?

Answer

A

It’s possible that the DataRobot project created in the background detected zip_code as a numeric field

Question 39

Q

My custom model treats purpose as a categorical, but it’s not appearing in the data drift tab. Why?

Answer

A

My custom model treats purpose as a categorical, but it’s not appearing in the data drift tab. Why?

Question 40

Q

What are the different ways you can upload actuals?

Answer

A

Use a file or table and upload it with an API call, or upload it through the AI Catalog from the Settings tab.

Question 41

Q

What happens if you upload two different actuals for the same association ID?

Answer

A

The backend data will be overwritten with the latest record. The chart, however, will reflect the first record because it does not get refreshed. This will change in the future.

Question 42

Q

Is there a limit on the number of records/predictions DR will use for drift tracking?

Answer

A

Per hour 100k and overall 1 million per day

Question 43

Q

What is the “prediction was acted on” flag meant to indicate?

Question 44

Q

What metrics are available for Data Drift?

Answer

A

PSI for UI, and PSI, KL divergence, Hellinger divergence, Jensen–Shannon divergence and Histogram intersection using API

Question 45

Q

What are the product distinctions between Model Monitoring and Management (MMM), included with AutoML, and the MLOps product?

Answer

A

MMM: deploy, monitor, and manage DataRobot models, Deployment Admin, Security and User Management, Autoscaling Prediction Servers. MLOps Exclusive: deploy, monitor, and manage DR models + custom models + remote models + DR Scoring Code + Agents. Transfer models to a separate scoring env, approval workflow and compliance
