test3 Flashcards
https://www.examgo.com/exams/microsoft/dp-100/ (69 cards)
You plan to use a Deep Learning Virtual Machine (DLVM) to train deep learning models using Compute Unified Device Architecture (CUDA) computations.
You need to configure the DLVM to support CUDA.
What should you implement?
Intel Software Guard Extensions (Intel SGX) technology
Solid State Drives (SSD)
Graphic Processing Unit (GPU)
Computer Processing Unit (CPU) speed increase by using overcloking
High Random Access Memory (RAM) configuration
HOTSPOT
You have a dataset that contains 2,000 rows. You are building a machine learning classification model by using Azure Learning Studio. You add a Partition and Sample module to the experiment.
You need to configure the module.
You must meet the following requirements:
✑ Divide the data into subsets
✑ Assign the rows into folds using a round-robin method
✑ Allow rows in the dataset to be reused
How should you configure the module? To answer, select the appropriate options in the dialog box in the answer area. NOTE: Each correct selection is worth one point.
Partition and Sample
Partition or sample mode
Assign to Folds ▼
Pick Fold
Sampling
Head
☐ Use replacement in the partitioning
☐ Randomized split
HOTSPOT
You create an Azure Machine Learning workspace and set up a development environment. You plan to train a deep neural network (DNN) by using the Tensorflow framework and by using estimators to submit training scripts.
You must optimize computation speed for training runs.
You need to choose the appropriate estimator to use as well as the appropriate training compute target configuration.
Which values should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Answer Area
Parameter Value
Estimator:
Estimator
SKLearn
PyTorch
Tensorflow
Chainer
Training compute:
12 vCPU, 48 GB memory, 96 GB SSD
12 vCPU, 112 GB memory, 680 GB SSD, 2 GPU, 24 GB GPU memory
16 vCPU, 128 GB memory, 160 GB HDD, 80 GB NVME disk (4000 MBps)
44 vCPU, 352 GB memory, 3.4 GHz CPU frequency all cores
You create a Python script that runs a training experiment in Azure Machine Learning. The script uses the Azure Machine Learning SDK for Python.
You must add a statement that retrieves the names of the logs and outputs generated by the script.
You need to reference a Python class object from the SDK for the statement.
Which class object should you use?
Run
ScripcRunConfig
Workspace
Experiment
You register a file dataset named csvjolder that references a folder. The folder includes multiple com ma-separated values (CSV) files in an Azure storage blob container. You plan to use the following code to run a script that loads data from the file dataset.
You create and instantiate the following variables:
Variable Description
remote_cluster References the Azure Machine Learning compute cluster
ws References the Azure Machine Learning workspace
You have the following code:
from azureml.train. estimator import Estimator
file_dataset = ws.datasets.get(‘csv_folder’)
estimator = Estimator(source_directory=script_folder,
compute_target = remote_cluster,
entry_script =’script.py’)
run = experiment.submit(config=estimator)
You need to pass the dataset to ensure that the script can read the files it references .
Which code segment should you insert to replace the code comment?
inputs=[file_dataset.as_named_input(‘training_files’).to_pandas_dataframe()],
inputs=[file_dataset.as_named_input(‘training_files’).as_mount()],
script_params={‘–training_files’: file_dataset},
inputs=[file_dataset.as_named_input(‘training_files’)],
HOTSPOT
You are using an Azure Machine Learning workspace. You set up an environment for model testing and an environment for production.
The compute target for testing must minimize cost and deployment efforts. The compute target for production must provide fast response time, autoscaling of the deployed service, and support real-time inferencing.
You need to configure compute targets for model testing and production.
Which compute targets should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Environment Compute target
Testing:
Local web service
Azure Kubernetes Services (AKS)
Azure Container Instances
Azure Machine Learning compute clusters
Production:
Local web service
Azure Kubernetes Services (AKS)
Azure Container Instances
Azure Machine Learning compute clusters
You have a dataset that includes confidential data. You use the dataset to train a model.
You must use a differential privacy parameter to keep the data of individuals safe and private.
You need to reduce the effect of user data on aggregated results.
What should you do?
Decrease the value of the epsilon parameter to reduce the amount of noise added to the data
Increase the value of the epsilon parameter to decrease privacy and increase accuracy
Decrease the value of the epsilon parameter to increase privacy and reduce accuracy
Set the value of the epsilon parameter to 1 to ensure maximum privacy
You create a pipeline in designer to train a model that predicts automobile prices.
Because of non-linear relationships in the data, the pipeline calculates the natural log (Ln) of the prices in the training data, trains a model to predict this natural log of price value, and then calculates the exponential of the scored label to get the predicted price.
The training pipeline is shown in the exhibit. (Click the Training pipeline tab.)
Training pipeline
+——————+
| Automobile data |
+——–+———+
|
v
+—————————–+
| Apply Math Operation |
| Replace price with Ln(price)|
+————–+————–+
|
v
+————-+
| Split Data |
|70% train / |
|30% validate |
+—–+——-+
|
+——+——+
| |
v v
+——————+ |
| Train Model | |
| Predict Ln(price)| |
+———+———- |
| |
v |
+——————+ |
| Linear Regression| |
+——————+ |
|
|
v
+————+
|Score Model |
| Get Ln(price) prediction |
+————+
|
v
+———————————-+
| Apply Math Operation |
| Replace Scored Labels with |
| Exp(Scored Labels) |
+—————-+—————–+
|
v
+————————————-+
| Apply SQL Transformation |
| SELECT [Scored Labels] AS |
| predicted_price |
+————————————-+
You create a real-time inference pipeline from the training pipeline, as shown in the exhibit. (Click the Real-time pipeline tab.)
Real-time pipeline
+——————+ +——————+
| Web Service Input| | Automobile data |
+——–+———+ +——–+———+
\ /
\ /
v v
+——————————+
| Apply Math Operation |
| Replace price with Ln(price)|
+—————+————–+
|
v
+——————————+
| MD-Automobile_Price_Regress…|
+—————+————–+
|
v
+——————+
| Score Model |
| Get Ln(price) |
| prediction |
+—–+——+—–+
| |
| v
| +———————+
| | Web Service Output |
| +———————+
|
v
+————————————+
| Apply Math Operation |
| Replace Scored Labels with |
| Exp(Scored Labels) |
+—————-+——————-+
|
v
+——————————————+
| Apply SQL Transformation |
| SELECT [Scored Labels] AS predicted_price|
+——————————————+
You need to modify the inference pipeline to ensure that the web service returns the exponential of the scored label as the predicted automobile price and that client applications are not required to include a price value in the input values.
Which three modifications must you make to the inference pipeline? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
Connect the output of the Apply SQL Transformation to the Web Service Output module.
Replace the Web Service Input module with a data input that does not include the price column.
Add a Select Columns module before the Score Model module to select all columns other than price.
Replace the training dataset module with a data input that does not include the price column.
Remove the Apply Math Operation module that replaces price with its natural log from the data flow.
Remove the Apply SQL Transformation module from the data flow.
You plan to use automated machine learning to train a regression model. You have data that has features which have missing values, and categorical features with few distinct values.
You need to configure automated machine learning to automatically impute missing values and encode categorical features as part of the training task.
Which parameter and value pair should you use in the AutoMLConfig class?
featurization = ‘auto’
enable_voting_ensemble = True
task = ‘classification’
exclude_nan_labels = True
enable_tf = True
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Stratified split for the sampling mode.
Does the solution meet the goal?
Yes
No
HOTSPOT
You use Azure Machine Learning to train and register a model.
You must deploy the model into production as a real-time web service to an inference cluster named service-compute that the IT department has created in the Azure Machine Learning workspace.
Client applications consuming the deployed web service must be authenticated based on their Azure Active Directory service principal.
You need to write a script that uses the Azure Machine Learning SDK to deploy the model.
The necessary modules have been imported.
How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Assume the necessary modules have been imported
deploy_target = __________▼(ws, “service-compute”)
AksCompute
AmlCompute
RemoteCompute
BatchCompute
deployment_config = __________▼.deploy_configuration(cpu_cores=1, memory_gb=1,
AksWebservice
AciWebservice
LocalWebService
_____________________________▼)
token_auth_enabled=True
token_auth_enabled=False
auth_enabled=True
auth_enabled=False
service = Model.deploy(ws, “ml-service”,
[model], inference_config, deployment_config, deploy_target)
service.wait_for_deployment(show_output = True)
HOTSPOT
You are hired as a data scientist at a winery. The previous data scientist used Azure Machine Learning.
You need to review the models and explain how each model makes decisions.
Which explainer modules should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Answer Area
Model type Explainer
A random forest model for predicting the alcohol
content in wine given a set of covariates:
Tabular
HAN
Text
Image
A natural language processing model for
analyzing field reports:
Tree
HAN
Text
Image
An image classifier that determines the quality of
the grape based upon its physical characteristics:
Kernel
HAN
Text
Image
You use the Azure Machine Learning Python SDK to define a pipeline to train a model.
The data used to train the model is read from a folder in a datastore.
You need to ensure the pipeline runs automatically whenever the data in the folder changes.
What should you do?
Set the regenerate_outputs property of the pipeline to True
Create a ScheduleRecurrance object with a Frequency of auto. Use the object to create a Schedule for the pipeline
Create a PipelineParameter with a default value that references the location where the training data is stored
Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder containing the training data in the path_on_datascore property
HOTSPOT
You need to configure the Edit Metadata module so that the structure of the datasets match.
Which configuration options should you select? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Answer Area
Properties
Project
◿ Edit Metadata
Column
Selected columns:
Column names: Median Value
Launch column selector
Floating point
DateTime
TimeSpan
Integer
Unchanged
Make Categorical
Make Uncategorical
Fields
5
You run a script as an experiment in Azure Machine Learning.
You have a Run object named run that references the experiment run. You must review the log files that were generated during the experiment run.
You need to download the log files to a local folder for review.
Which two code segments can you run to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
run.get_details()
run.get_file_names()
run.get_metrics()
run.download_files(output_directory=’./runfiles’)
run.get_all_logs(destination=’./runlogs’)
You are conducting feature engineering to prepuce data for further analysis.
The data includes seasonal patterns on inventory requirements.
You need to select the appropriate method to conduct feature engineering on the data.
Which method should you use?
Exponential Smoothing (ETS) function.
One Class Support Vector Machine module
Time Series Anomaly Detection module
Finite Impulse Response (FIR) Filter module.
You register the following versions of a model.
Model name Model version Tags Properties
healthcare_model 3 ‘Training context’:’CPU Compute’ value: 87.43
healthcare_model 2 ‘Training context’:’CPU Compute’ value: 54.98
healthcare_model 1 ‘Training context’:’CPU Compute’ value: 23.56
You use the Azure ML Python SDK to run a training experiment. You use a variable named run to reference the experiment run.
After the run has been submitted and completed, you run the following code:
run.register_model(model_path=’outputs/model.pkl’,
model_name=’healthcare_model’,
tags={‘Training context’:’CPU Compute’} )
For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth
one point.
Answer Area
The code will cause a previous version of the saved model to be overwritten.
The version number will now be 4.
The latest version of the stored model will have a property of value: 87.43.
You develop and train a machine learning model to predict fraudulent transactions for a hotel booking website.
Traffic to the site varies considerably. The site experiences heavy traffic on Monday and Friday and much lower traffic on other days. Holidays are also high web traffic days. You need to deploy the model as an Azure Machine Learning real-time web service endpoint on compute that can dynamically scale up and down to support demand .
Which deployment compute option should you use?
attached Azure Databricks cluster
Azure Container Instance (ACI)
Azure Kubernetes Service (AKS) inference cluster
Azure Machine Learning Compute Instance
attached virtual machine in a different region
You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual Machine (DSVM).
What should you recommend?
Apache Drill
Tensorflow
Rattle
Weka
HOTSPOT
You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two datasets.
The image consists of two side-by-side graphs, each showing: A solid curve – represents the true probability distribution A dashed curve – represents a model’s predicted distribution Graph 1 The solid line: Starts from the bottom left Rises steadily to a peak towards the middle-right Then gently declines to the right Shape: Left-skewed or unimodal with right tail The dashed line:
A smaller, narrower peak than the solid line Slightly right-shifted compared to the solid line Indicates: The model underestimates the earlier values and shifts the peak Graph 2 The solid line: Rises sharply near the beginning (left side) Peaks early
Gradually declines toward the right Shape: Right-skewed with a longer right tail The dashed line:
A smaller and more compact peak Also positioned toward the left but not matching the solid curve closely Indicates: Model underestimates the longer tail and over-focuses on the peak
Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic. NOTE: Each correct selection is worth one point.
Answer Area
Question Answer choice
Which type of distribution is shown for the
dataset density curve of Graph 1?
Negative skew
Positive skew
Normal distribution
Bimodal distribution
Which type of distribution is shown for the
dataset density curve of Graph 2?
Negative skew
Positive skew
Normal distribution
Bimodal distribution
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a
unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while
others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in
the review screen.
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length
of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1
score, and AUC.
Does the solution meet the goal?
A. Yes
B. No
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a
unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while
others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in
the review screen.
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length
of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?
A. Yes
B. No
DRAG DROP
You are producing a multiple linear regression model in Azure Machine Learning Studio.
Several independent variables are highly correlated.
You need to select appropriate methods for conducting effective feature engineering on all the data.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the
answer area and arrange them in the correct order.
Select and Place:
Action
Evaluate the probability function
Remove duplicate rows
Use the Filter Based Feature Selection module
Test the hypothesis using t-Test
Compute linear correlation
Build a counting transform
Answer area
HOTSPOT
You are performing feature scaling by using the scikit-learn Python library for x.1 x2, and x3 features.
Original and scaled data is shown in the following image.
Use the drop-down menus to select the answer choice that answers each question based on the information presented in
the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer Area
Question
Which scaler is used in graph A?
Standard Scaler
Min Max Scale
Normalizer
Which scaler is used in graph B?
Standard Scaler
Min Max Scale
Normalizer
Which scaler is used in graph C?
Standard Scaler
Min Max Scale
Normalizer
Answer choice
Which scaler is used in graph A?
Standard Scaler
Which scaler is used in graph B?
Min Max Scale
Which scaler is used in graph C?
Normalizer