MLOps Question Bank Flashcards

1
Q

How many row-level predictions are saved each day?

A

Up to 2.4M. We save the first 100k scored records per hour (100k per hour over 24 hours).
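
The arithmetic behind that figure can be sketched as follows. This is only an illustration of a per-hour cap, not DataRobot's implementation; the function name `records_saved` and its `cap` parameter are hypothetical.

```python
def records_saved(hourly_scored_counts, cap=100_000):
    """Sum the records kept per hour when only the first `cap` per hour are saved."""
    return sum(min(count, cap) for count in hourly_scored_counts)

# A day in which every hour scores more than the cap reaches the daily maximum.
busy_day = [150_000] * 24
# A quiet day stays below the cap in every hour, so everything is kept.
quiet_day = [1_000] * 24
```

With these inputs, `records_saved(busy_day)` gives the 2.4M ceiling, while `records_saved(quiet_day)` gives 24,000.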

2
Q

How does DataRobot calculate data drift?

A

The data drift plot defaults to population stability index (PSI) scores. You can also get KL divergence, Hellinger divergence, Jensen–Shannon divergence, and histogram intersection scores through the Python client.
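
For intuition, the metrics named above can be computed from binned counts using their textbook definitions. This is a minimal sketch, not DataRobot's code: the binning, the `eps` floor for empty bins, the direction of the KL term, and the function names are all assumptions made for illustration.

```python
import math

def to_proportions(counts, eps=1e-6):
    """Convert bin counts to proportions, flooring at eps so logs stay finite."""
    total = sum(counts)
    floored = [max(c / total, eps) for c in counts]
    norm = sum(floored)
    return [f / norm for f in floored]

def kl(p, q):
    """KL divergence KL(p || q) between two proportion vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def drift_scores(training_counts, scoring_counts):
    """Compare a training histogram to a scoring histogram over the same bins."""
    p = to_proportions(training_counts)   # baseline distribution
    q = to_proportions(scoring_counts)    # distribution seen at prediction time
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return {
        "psi": sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q)),
        "kl_divergence": kl(q, p),  # assumed direction: scoring relative to training
        "hellinger": math.sqrt(
            0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
        ),
        "jensen_shannon": 0.5 * kl(p, m) + 0.5 * kl(q, m),
        "histogram_intersection": sum(min(pi, qi) for pi, qi in zip(p, q)),
    }

same = drift_scores([50, 30, 20], [50, 30, 20])
shifted = drift_scores([50, 30, 20], [20, 30, 50])
```

Identical histograms give a PSI of 0 and a histogram intersection of 1; the shifted example gives a PSI of roughly 0.55 and an intersection of 0.7.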

3
Q

What types of features do we calculate data drift for?

A

The Feature Drift vs. Feature Importance chart monitors the 10 most impactful numerical or categorical features in your data. This chart excludes any text, percentage, or currency features, which means that fewer than 10 features may be plotted.

4
Q

How do you change the cutoff for accuracy drift?

A

You can change the cutoff in the Monitoring tab of the Settings section inside a deployment.

5
Q

How does the AssociationID work? What happens when you upload an actual with a duplicate AssociationID?

A

The actuals payload must contain the column names associationId and actualValue. Use the optional column wasActedOn to indicate whether the prediction was acted on in a way that could have affected the actual outcome. If you submit multiple actuals with the same association ID value, either in the same or a subsequent request, DataRobot uses the latest actuals value.
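
The "latest value wins" rule can be sketched on plain rows. The column names come from the answer above; the helper `latest_actuals` is hypothetical and only mimics the described behavior locally, it does not call any DataRobot API.

```python
REQUIRED_COLUMNS = ("associationId", "actualValue")

def latest_actuals(rows):
    """Keep only the most recently submitted actual for each associationId."""
    latest = {}
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"actuals row is missing columns: {missing}")
        # Later rows overwrite earlier rows that share an associationId.
        latest[row["associationId"]] = row
    return list(latest.values())

submitted = [
    {"associationId": "a1", "actualValue": 0},
    {"associationId": "a2", "actualValue": 1, "wasActedOn": True},
    {"associationId": "a1", "actualValue": 1},  # duplicate ID, submitted later
]
resolved = latest_actuals(submitted)
```

Here `resolved` holds two rows, and the row for `a1` carries the later value of 1.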

6
Q

At what granularity can we see drift and service health? Hourly, daily, weekly, monthly?

A

At what granularity can we see drift and service health? Hourly, daily, weekly, monthly?

7
Q

What roles are associated with deployments? How are they different than roles associated with projects?

A

User/Admin/Consumer? Need to look up.

8
Q

Do we support any language?

A

We support models built in most languages. Two notable exceptions in 5.2 are SAS and DataRobot models (however, you can use Agents to monitor codegen).

9
Q

Do we support any modeling type (e.g. regression, classification, multiclass)?

A

We currently support only regression and binary classification. We plan to relax these limitations in future releases.

10
Q

What is the maximum number of environments we can support?

A

10

11
Q

Can you adjust the threshold for drift?

A

Yes - MLOps Agent. The MLOps agent can report predictions data and metrics for all external deployments, whether fully connected, intermittently connected, or completely disconnected from MLOps. If you have deployments running in isolated environments and disconnected from the network, for example, you can provide their data to the agent, and then view and manage them from MLOps.

12
Q

What is the lowest resolution that you can display Data Drift and Accuracy?

A

Daily

13
Q

Do we offer tracking of externally deployed models?

A

Yes - MLOps Agent. The MLOps agent can report predictions data and metrics for all external deployments, whether fully connected, intermittently connected, or completely disconnected from MLOps. If you have deployments running in isolated environments and disconnected from the network, for example, you can provide their data to the agent, and then view and manage them from MLOps.

14
Q

What are the levels of Role Based Access Control (or User Management) provided by MLOps?

A

Deployment Admin, Owner, User, Consumer

15
Q

How do you change the prediction threshold in MLOps, and how does this differ in MMM for AutoML/AutoTS?

A

You currently cannot change the prediction threshold in AutoML/AutoTS for DataRobot models, but you can change it for Custom Models inside MLOps.

16
Q

What access does the Deployment Admin role give users?

A

A deployment administrator role, assigned by the system administrator, has User role permissions for all existing and newly created deployments within their organization. The deployment admin can also approve new deployments, which facilitates governance.

17
Q

Does MLOps offer governance capabilities? Explain.

A

Yes: Role Based Access Control (User Management), Deployment Admin, Workflow Approvals, Materiality Score (Importance), and Real Time Monitoring.

18
Q

What is the alert notification structure for MLOps, and what are you able to customize?

A

Email. You can customize when alerts are scheduled and who receives them, as well as the thresholds for accuracy.

19
Q

How does DataRobot calculate accuracy drift?

A

How does DataRobot calculate accuracy drift?

20
Q

What are some components of service health that MLOps will monitor?

A

Number of predictions, number of requests, execution time, response time, predictions over a certain time, data errors, system errors, consumers, cache hit rate.

21
Q

What is Association ID and how is it used by MLOps?

A

The Association ID is a unique identifier for each prediction request. It can be an optional or a mandatory field passed with a prediction request. Once actuals are available, they are sent back to DR along with the association ID so that DR can tie the actuals back to the predictions.
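
Tying actuals back to predictions amounts to a join on the association ID. The sketch below shows that idea on in-memory rows; the `prediction` key and the function `join_on_association_id` are hypothetical names chosen for illustration, not part of any DataRobot API.

```python
def join_on_association_id(predictions, actuals):
    """Pair each prediction with its actual, skipping predictions with no actual yet."""
    actual_by_id = {row["associationId"]: row["actualValue"] for row in actuals}
    return [
        (row["associationId"], row["prediction"], actual_by_id[row["associationId"]])
        for row in predictions
        if row["associationId"] in actual_by_id
    ]

predictions = [
    {"associationId": "a1", "prediction": 0.82},
    {"associationId": "a2", "prediction": 0.10},
    {"associationId": "a3", "prediction": 0.55},  # no actual has arrived yet
]
actuals = [
    {"associationId": "a1", "actualValue": 1},
    {"associationId": "a2", "actualValue": 0},
]
matched = join_on_association_id(predictions, actuals)
```

Only the matched pairs can contribute to accuracy tracking; `a3` is left out until its actual is uploaded.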

22
Q

What are some reasons for Red/Failing model health (service, data, or accuracy)?

A

Service: at least one 5xx error. Data: at least one higher-importance attribute’s distribution has shifted since the model was deployed. Accuracy: accuracy has severely declined since the model was deployed.

23
Q

Does MLOps support external deployments?

A

Two components: 1. MLOps Agent 2. Adding External Deployment. You can use the deployment management tool to analyze historical predictions and continuously assess model quality based on prediction output. Note that while not all the tools of model management are available to externally imported datasets, the inventory, data drift, and accuracy tabs provide an excellent starting point.

24
Q

What data from training is used as a baseline to calculate Model Health?

A

DataRobot uses the holdout data distribution. If the model was trained on the holdout, it goes back to the same BP trained on non-holdout data and uses that for the baseline. For Custom Models, a baseline is not currently provided, but it is on the roadmap.

25
Q

What types of model classes does MMM support? What about MLOps?

A

MMM: Binary Classification, Regression, Limited TS. MLOps: Binary Classification, Regression.

26
Q

Does MLOps support time aware projects?

A

It does not support TS, but it does support OTV partitioning.

27
Q

How many features do we currently support drift tracking for? What variable types are supported?

A

The Feature Details chart provides a histogram that compares the distribution of a selected feature in the training data to the distribution of that feature in the scoring data. From the results you can easily identify details of changed values (which values became more or less frequent), helping to assess the severity of the problem as well as its causes and resolutions. The chart plots up to the top 25 values based on Feature Impact from your training data, as well as Missing and Other counts. The Missing count contains all records with missing feature values (that is, NaN as the value of one of the features). The Other count, which applies to categorical features, includes all values other than the 25 most frequent values.
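
The Missing and Other buckets described above can be sketched for a single categorical feature. This is an illustration under stated assumptions: ranking is done by frequency, missing values are represented as `None` or NaN, and the function name `feature_detail_counts` is hypothetical.

```python
import math
from collections import Counter

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def feature_detail_counts(values, top_k=25):
    """Bucket one feature's values into the top_k levels plus Missing and Other."""
    missing = sum(1 for v in values if is_missing(v))
    counts = Counter(v for v in values if not is_missing(v))
    top = counts.most_common(top_k)
    other = sum(counts.values()) - sum(count for _, count in top)
    return {"top": dict(top), "Missing": missing, "Other": other}

# top_k is lowered to 2 only to keep the example small.
buckets = feature_detail_counts(
    ["a", "a", "b", "c", None, float("nan"), "a", "b"], top_k=2
)
```

Running the same function on the training values and on the scoring values gives the two histograms that the chart compares.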

28
Q

What metric is used to calculate feature importance for MLOps?

A

For DataRobot model deployments, DataRobot uses the Feature Impact score. You must calculate the Feature Impact score for the model before the chart is available. For external models, DataRobot uses the Importance score it calculated when ingesting the data (ACE). This is the same calculation run during EDA2 in the standard model building process.

29
Q

In which quadrant of the Data Drift plot would you see Red/Failing features?

A

They should explain the four-quadrant interpretation of the Data Drift chart. Red/Failing features fall in the quadrant where high-importance features are experiencing high drift. Users are advised to investigate these immediately.
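
The four-quadrant reading can be sketched as a simple classification on two axes. The cutoff values and the function name `drift_quadrant` below are hypothetical; the only rule taken from the answer above is that high importance combined with high drift is the failing case.

```python
def drift_quadrant(importance, drift, importance_cutoff=0.5, drift_cutoff=0.2):
    """Return a quadrant label and whether the feature is in the failing quadrant.

    Both cutoffs are illustrative assumptions, not product defaults.
    """
    high_importance = importance >= importance_cutoff
    high_drift = drift >= drift_cutoff
    label = (
        f"{'high' if high_importance else 'low'} importance, "
        f"{'high' if high_drift else 'low'} drift"
    )
    return label, high_importance and high_drift

important_and_drifting = drift_quadrant(importance=0.9, drift=0.4)
important_but_stable = drift_quadrant(importance=0.9, drift=0.05)
```

Only the first example lands in the quadrant that calls for immediate investigation.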

30
Q

What is the Feature Details plot? How should it be framed to users?

A

The Feature Details chart provides a histogram that compares the distribution of a selected feature in the training data to the distribution of that feature in the scoring data. From the results you can easily identify details of changed values (which values became more or less frequent), helping to assess the severity of the problem as well as its causes and resolutions.

31
Q

How does MLOps handle new categorical levels from your scoring data?

A

For categorical features, there is an additional New level count. It displays only after you make predictions with a scoring file that contains a value for a feature that did not appear in the training set.
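
A New level count of this kind can be sketched as a set-membership check against the training levels. The function name `new_level_count` is hypothetical and the sketch does not reflect how DataRobot computes the count internally.

```python
def new_level_count(training_values, scoring_values):
    """Count scoring records whose categorical value never appeared in training."""
    known_levels = set(training_values)
    return sum(1 for value in scoring_values if value not in known_levels)

training_purpose = ["car", "home", "car", "education"]
scoring_purpose = ["car", "boat", "home", "boat", "wedding"]
unseen = new_level_count(training_purpose, scoring_purpose)
```

In this example three scoring records ("boat" twice and "wedding" once) carry levels that the training data did not contain.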

32
Q

What are some of the restrictions around model replacement?

A

A model can only be replaced by a model of the same kind: DR -> DR, Custom -> Custom.

33
Q

How does DataRobot calculate a baseline for drift?

A

There are two key components to Model Health: Accuracy and Drift. For DataRobot AutoML and AutoTS models, Model Monitoring uses holdout data for both Data Drift and Accuracy. If a model is trained into its holdout, then it goes back to the same BP trained on non-holdout data and uses that for its baseline. For Custom Models, the Data Drift baseline is created from that model's training data. An Accuracy baseline is not currently available for Custom Models, but it is on the roadmap.

34
Q

What happens when you click on test custom model?

A

A background process checks whether it can spin up the model using the environment, and whether the model returns predictions on the test file you uploaded.

35
Q

What are the mandatory fields for uploading custom models?

A

Name, tar file, target column, target type, and, for classification, the positive and negative target values.

36
Q

In tracking agents, what types of models are supported?

A

Currently Python and Java. R support is coming soon.

37
Q

How is feature importance calculated for custom models?

A

ACE methodology on uploaded training data

38
Q

My custom model treats zip_code as a categorical, but it appears as a numeric in the data drift tab. Why?

A

It’s possible that the DataRobot project created in the background detected zip_code as a numeric field

39
Q

My custom model treats purpose as a categorical, but it’s not appearing in the data drift tab. Why?

A

My custom model treats purpose as a categorical, but it’s not appearing in the data drift tab. Why?

40
Q

What are the different ways you can upload actuals?

A

Upload a file or table through an API call, or use the AI Catalog to upload it from the Settings tab.

41
Q

What happens if you upload two different actuals for the same association ID?

A

The backend data will be overwritten with the latest record. The chart, however, will reflect the first record because it does not get refreshed. This will change in the future.

42
Q

Is there a limit on the number of records/predictions DR will use for drift tracking?

A

100k per hour, and 1 million per day overall.

43
Q

What is the “prediction was acted on” flag meant to indicate?

A

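It indicates whether the prediction was acted on in a way that could have affected the actual outcome. It is passed in the optional wasActedOn column of the actuals payload.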

44
Q

What metrics are available for Data Drift?

A

PSI in the UI. Through the API: PSI, KL divergence, Hellinger divergence, Jensen–Shannon divergence, and histogram intersection.

45
Q

What are the product distinctions between the Model Monitoring and Management (MMM) included with AutoML and the MLOps product?

A

MMM: deploy, monitor, and manage DataRobot models, Deployment Admin, Security and User Management, Autoscaling Prediction Servers. MLOps Exclusive: deploy, monitor, and manage DR models + custom models + remote models + DR Scoring Code + Agents. Transfer models to a separate scoring env, approval workflow and compliance.