test2 Flashcards
https://www.dumpsbase.com/freedumps/?s=DP+100 (69 cards)
Topic 2, Case Study 2
Case study
Overview
You are a data scientist for Fabrikam Residences, a company specializing in quality private and commercial property in the United States. Fabrikam Residences is considering expanding into Europe and has asked you to investigate prices for private residences in major European cities. You use Azure Machine Learning Studio to measure the median value of properties. You produce a regression model to predict property prices by using the Linear Regression and Bayesian Linear Regression modules.
Datasets
There are two datasets in CSV format that contain property details for two cities, London and Paris, with the following columns:
The two datasets have been added to Azure Machine Learning Studio as separate datasets and included as the starting point of the experiment.
Dataset issues
The AccessibilityToHighway column in both datasets contains missing values. The missing data must be replaced with new data so that it is modeled conditionally using the other variables in the data before filling in the missing values.
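Modeling a missing value conditionally means predicting it from the other columns instead of filling in a constant. The sketch below shows a single regression-imputation step. It is a simplification, not the full MICE algorithm: MICE cycles through every incomplete column over several iterations. The predictor column name `DistanceToCentre` is hypothetical.

```python
# Simplified conditional imputation: fill missing AccessibilityToHighway values
# with a least-squares fit on one other column. Full MICE (multiple imputation by
# chained equations) repeats this over all incomplete columns; this is one step only.

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def impute_conditionally(rows, target, predictor):
    """Replace None in row[target] with a prediction from row[predictor]."""
    complete = [r for r in rows if r[target] is not None]
    a, b = fit_line([r[predictor] for r in complete], [r[target] for r in complete])
    for r in rows:
        if r[target] is None:
            r[target] = a + b * r[predictor]
    return rows


# "DistanceToCentre" is a hypothetical predictor column used only for illustration.
rows = [
    {"DistanceToCentre": 1.0, "AccessibilityToHighway": 2.0},
    {"DistanceToCentre": 2.0, "AccessibilityToHighway": 4.0},
    {"DistanceToCentre": 3.0, "AccessibilityToHighway": None},
    {"DistanceToCentre": 4.0, "AccessibilityToHighway": 8.0},
]
impute_conditionally(rows, "AccessibilityToHighway", "DistanceToCentre")
```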
Columns in each dataset contain missing and null values. The datasets also contain many outliers. The Age column has a high proportion of outliers. You need to remove the rows that have outliers in the Age column. The MedianValue and AvgRoomsinHouse columns both hold data in numeric format. You need to select a feature selection algorithm to analyze the relationship between the two columns in more detail.
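A common way to quantify and then remove outliers is the interquartile-range rule (Tukey fences). The sketch uses that rule as an assumption; it is not necessarily the threshold a Studio module such as Clip Values applies. It also assumes Age has no missing values left at this point.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(rows, column):
    """Return (rows kept, rows flagged as outliers) for one numeric column."""
    low, high = iqr_bounds([r[column] for r in rows])
    kept = [r for r in rows if low <= r[column] <= high]
    outliers = [r for r in rows if not (low <= r[column] <= high)]
    return kept, outliers


ages = [30, 32, 35, 36, 38, 40, 41, 43, 45, 200]
kept, outliers = split_outliers([{"Age": a} for a in ages], "Age")
```

Counting `outliers` before dropping them gives the quantification step; `kept` is the cleaned dataset.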
Model fit
The model shows signs of overfitting. You need to produce a more refined regression model that reduces the overfitting.
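Regularization is one standard remedy for overfitting: a penalty on coefficient size pulls the fit toward simpler models. The one-feature ridge (L2) sketch below, with no intercept for brevity, shows how increasing the weight shrinks the learned slope. The function name is illustrative, and the exact penalty form used by each Studio module is not asserted here.

```python
def ridge_slope(xs, ys, weight):
    """Slope w minimizing sum((y - w*x)^2) + weight * w^2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + weight)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
unregularized = ridge_slope(xs, ys, 0.0)   # plain least squares
regularized = ridge_slope(xs, ys, 14.0)    # heavier penalty, smaller slope
```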
Experiment requirements
You must set up the experiment to cross-validate the Linear Regression and Bayesian Linear Regression modules to evaluate performance.
In each case, the label (the value to be predicted) is the column named MedianValue. An initial investigation showed that the datasets are identical in structure apart from the MedianValue column. The smaller Paris dataset contains the MedianValue in text format, whereas the larger London dataset contains the MedianValue in numerical format. You must ensure that the datatype of the MedianValue column of the Paris dataset matches the structure of the London dataset.
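Making the Paris column match the London one amounts to parsing the text values as floating-point numbers. The sketch assumes values use a dot as the decimal separator and contain no thousands separators; values formatted otherwise (for example with a decimal comma) would need extra handling.

```python
def to_float_column(rows, column):
    """Convert a text column to float in place; blank or None values become None."""
    for r in rows:
        value = r[column]
        if value is None or str(value).strip() == "":
            r[column] = None
        else:
            # Assumes '.' decimal separator and no thousands separators.
            r[column] = float(str(value).strip())
    return rows


paris = [{"MedianValue": "452000.5"}, {"MedianValue": " 310000 "}, {"MedianValue": ""}]
to_float_column(paris, "MedianValue")
```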
You must prioritize the columns of data for predicting the outcome. You must use non-parametric statistics to measure the relationships.
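Spearman rank correlation is one non-parametric measure (Kendall correlation is another): it correlates the ranks of the values rather than the values themselves, so it captures any monotonic relationship. The sketch ranks candidate columns by the absolute Spearman correlation with MedianValue; the sample values are invented for illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def prioritize(columns, label):
    """Column names ordered by |Spearman correlation| with the label, strongest first."""
    return sorted(columns, key=lambda name: abs(spearman(columns[name], label)), reverse=True)


median_value = [100, 150, 200, 260]
columns = {"Age": [30, 10, 40, 20], "AvgRoomsinHouse": [4, 5, 6, 7]}
priority = prioritize(columns, median_value)
```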
You must use a feature selection algorithm to analyze the relationship between the MedianValue and AvgRoomsinHouse columns.
Model training
Given a trained model and a test dataset, you need to compute the permutation feature importance scores of feature variables. You need to set up the Permutation Feature Importance module to select the correct metric to investigate the model’s accuracy and replicate the findings.
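Permutation feature importance shuffles one feature at a time in the test data and measures how much the chosen metric degrades; a fixed random seed makes the shuffles, and therefore the scores, reproducible. The sketch assumes a regression model scored with mean absolute error; the toy model and column values are hypothetical.

```python
import random


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, labels, features, metric=mae, seed=0):
    """Increase in error when each feature is shuffled (higher = more important)."""
    rng = random.Random(seed)  # fixed seed so the findings can be replicated
    baseline = metric(labels, [predict(r) for r in rows])
    scores = {}
    for name in features:
        shuffled = [r[name] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
        scores[name] = metric(labels, [predict(r) for r in permuted]) - baseline
    return scores


def toy_model(row):
    # Hypothetical trained model that only uses AvgRoomsinHouse.
    return 3 * row["AvgRoomsinHouse"]


test_rows = [{"AvgRoomsinHouse": n, "Age": 40 + n} for n in range(1, 6)]
test_labels = [toy_model(r) for r in test_rows]
scores = permutation_importance(toy_model, test_rows, test_labels, ["AvgRoomsinHouse", "Age"])
```

A feature the model ignores (Age here) scores exactly zero, because shuffling it cannot change the predictions.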
You want to configure hyperparameter tuning so that the learning phase runs faster. In addition, this configuration should cancel the lowest-performing runs at each evaluation interval, thereby directing effort and resources toward models that are more likely to be successful.
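Cancelling a fixed share of the weakest runs at each evaluation interval is the idea behind truncation selection. The sketch below is a simplified illustration of that rule, not the HyperDrive implementation: it assumes a higher primary metric is better, rounds the number of cancelled runs down, and ignores settings such as `delay_evaluation`.

```python
import math


def truncation_selection(latest_metrics, truncation_percentage):
    """Run ids to cancel: the lowest-scoring truncation_percentage % of runs.

    latest_metrics maps run id -> primary metric at this interval (higher is better).
    """
    n_cancel = math.floor(len(latest_metrics) * truncation_percentage / 100)
    worst_first = sorted(latest_metrics, key=latest_metrics.get)
    return set(worst_first[:n_cancel])


interval_metrics = {"run1": 0.91, "run2": 0.72, "run3": 0.85, "run4": 0.60, "run5": 0.88}
cancel_20 = truncation_selection(interval_metrics, 20)
cancel_40 = truncation_selection(interval_metrics, 40)
```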
You are concerned that the model might not use compute resources efficiently during hyperparameter tuning. You are also concerned that the overall tuning time might increase. Therefore, you need to implement an early stopping criterion on models that provides savings without terminating promising jobs.
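A median-based stopping rule is a conservative criterion: a run is stopped only when even its best result so far falls below the median of the running averages of all runs, so runs that are still competitive keep going. The sketch is a simplified reading of that idea under the assumption that higher metric values are better; it is not the HyperDrive source.

```python
import statistics


def median_stopping(history, interval):
    """Runs to stop at a 0-based evaluation interval.

    history maps run id -> list of primary-metric values, one per interval.
    """
    running_averages = [
        statistics.mean(values[: interval + 1])
        for values in history.values()
        if len(values) > interval
    ]
    median = statistics.median(running_averages)
    return {
        run
        for run, values in history.items()
        if len(values) > interval and max(values[: interval + 1]) < median
    }


history = {"a": [0.5, 0.6], "b": [0.7, 0.8], "c": [0.2, 0.3]}
to_stop = median_stopping(history, interval=1)
```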
Testing
You must produce multiple partitions of a dataset based on sampling using the Partition and Sample module in Azure Machine Learning Studio. You must create three equal partitions for cross-validation. You must also configure the cross-validation process so that the rows in the test and training datasets are divided evenly by properties that are near each city’s main river. The data that identifies that a property is near a river is held in the column named NextToRiver. You want to complete this task before the data goes through the sampling process.
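Dividing rows evenly by NextToRiver across three folds is a stratified split: rows are grouped by the stratification value and each group is dealt round-robin into the folds. The sketch assumes NextToRiver holds boolean values; the Partition and Sample stratified-split option is intended to achieve a comparable result, but its internals are not reproduced here.

```python
import random
from collections import defaultdict


def stratified_folds(rows, strata_column, n_folds=3, seed=0):
    """Fold number (0..n_folds-1) for each row, spreading every stratum evenly."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[strata_column]].append(index)
    folds = [None] * len(rows)
    for indices in groups.values():
        rng.shuffle(indices)  # random order within each stratum
        for position, index in enumerate(indices):
            folds[index] = position % n_folds
    return folds


properties = [{"NextToRiver": i % 2 == 0} for i in range(12)]
folds = stratified_folds(properties, "NextToRiver")
```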
When you train a Linear Regression module using a property dataset that shows data for property prices for a large city, you need to determine the best features to use in a model. You can choose standard metrics provided to measure performance before and after the feature importance process completes. You must ensure that the distribution of the features across multiple training models is consistent.
Data visualization
You need to provide the test results to the Fabrikam Residences team. You create data visualizations to aid in presenting the results.
You must produce a Receiver Operating Characteristic (ROC) curve to conduct a diagnostic test evaluation of the model. You need to select appropriate methods for producing the ROC curve in Azure Machine Learning Studio to compare the Two-Class Decision Forest and the Two-Class Decision Jungle modules with one another.
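An ROC curve plots the true positive rate against the false positive rate as the decision threshold sweeps over the scored probabilities. Computing the points for each model's scored output and plotting both curves on one chart is what the comparison amounts to. The labels and scores below are invented for illustration.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs, sweeping the threshold over every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


curve = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
```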
DRAG DROP -
You need to define an evaluation strategy for the crowd sentiment models.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Actions
Define a cross-entropy function activation.
Add cost functions for each target state
Evaluate the classification error metric.
Evaluate the distance error metric.
Add cost functions for each component metric
Define a sigmoid loss function activation.
Answer Area
DRAG DROP -
You need to correct the model fit issue.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Actions
Add the Ordinal Regression module.
Add the Two-Class Averaged Perceptron module.
Augment the data.
Add the Bayesian Linear Regression module.
Decrease the memory size for L-BFGS.
Add the Multiclass Decision Jungle module.
Configure the regularization weight.
Answer Area
HOTSPOT -
You need to set up the Permutation Feature Importance module according to the model training requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer Area
Tune Model Hyperparameters
Specify parameter sweeping mode: Random sweep
Maximum number of runs on random sweep: 5
Random seed: 0
Label column
Selected columns
Column names: MedianValue
Launch column selector
Metric for measuring performance for classification
F-score
Precision
Recall
Accuracy
Metric for measuring performance for regression
Root of mean squared error
R-squared
Mean zero one error
Mean absolute error
HOTSPOT -
You need to configure the Permutation Feature Importance module for the model training requirements.
What should you do? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer Area
Permutation Feature importance
Random seed
0
500
Regression - Root Mean Square Error
Regression - R-squared
Regression - Mean Zero One Error
Regression - Mean Absolute Error
HOTSPOT -
You need to configure the Edit Metadata module so that the structure of the datasets match.
Which configuration options should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer Area
Properties
Project
Edit Metadata
Column
Selected columns:
Column names: MedianValue
Launch column selector
Floating point
DateTime
TimeSpan
Integer
Unchanged
Make Categorical
Make Uncategorical
Fields
5
HOTSPOT -
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer Area
Properties
Project
Partition and Sample
Assign to Folds
Sampling
Head
Partition or sample mode
Use replacement in the partitioning (uncheck)
Randomized split (checked)
Random seed
0
True
False
Partition evenly
Partition with custom partitions
Specify the partitioner method
Partition evenly
Specify number of folds to split evenly into
3
HOTSPOT -
You need to replace the missing data in the AccessibilityToHighway columns.
How should you configure the Clean Missing Data module? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer Area
Properties
Project
Clean Missing Data
Columns to be cleaned
Selected columns:
Column names: AccessibilityToHighway
Launch column selector
Minimum missing value ratio
0
Maximum missing value ratio
1
Cleaning mode:
Replace using MICE
Replace with Mean
Replace with Median
Replace with Mode
Cols with all missing values:
Propagate
Remove
Generate missing value indicator column
Number of iterations
5
You need to visually identify whether outliers exist in the Age column and quantify the outliers before the outliers are removed.
Which three Azure Machine Learning Studio modules should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Create Scatterplot
B. Summarize Data
C. Clip Values
D. Replace Discrete Values
E. Build Counting Transform
DRAG DROP -
You are building an experiment using the Azure Machine Learning designer.
You split a dataset into training and testing sets. You select the Two-Class Boosted Decision Tree as the algorithm.
You need to determine the Area Under the Curve (AUC) of the model.
Which three modules should you use in sequence? To answer, move the appropriate modules from the list of modules to the answer area and arrange them in the correct order.
Select and Place:
Modules
Export Data
Tune Model Hyperparameters
Cross Validate Model
Evaluate Model
Score Model
Train Model
Answer Area
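The AUC that an evaluation step reports can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count one half). The sketch computes it directly from scored probabilities, the kind of output a scoring step produces; the values are invented for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise positive/negative comparisons."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
mixed = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```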
You need to select a feature extraction method.
Which method should you use?
Spearman correlation
Mutual information
Mann-Whitney test
Pearson’s correlation
DRAG DROP -
You create an image classification model in Azure Machine Learning Studio.
You need to deploy the model as a containerized web service.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Actions
Start the container
Create a container image
Create an Azure Batch AI account
Get the http endpoint of the web service
Register the container image
Train the model
Answer Area
You need to select a feature extraction method.
Which method should you use?
Mutual information
Mood’s median test
Kendall correlation
Permutation Feature Importance
DRAG DROP -
You need to implement early stopping criteria as stated in the model training requirements.
Which three code segments should you use to develop the solution? To answer, move the appropriate code segments from the list of code segments to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive the credit for any of the correct orders you select.
Select and Place:
Code segments
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20, delay_evaluation=5)
import BanditPolicy
import TruncationSelectionPolicy
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)
from azureml.train.hyperdrive
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)
import MedianStoppingPolicy
Answer Area
Topic 3, Mix Questions
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model.
You configure a HyperDriveConfig for the experiment by running the following code:
You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validation data are stored in a variable named y_test, and the predicted probabilities from the model are stored in a variable named y_predicted.
You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.
Solution: Run the following code:
Does the solution meet the goal?
Yes
No
You are a data scientist creating a linear regression model.
You need to determine how closely the data fits the regression line.
Which metric should you review?
Coefficient of determination
Recall
Precision
Mean absolute error
Root Mean Square Error
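The coefficient of determination (R²) compares the squared residuals of the regression line with the squared deviations from the mean of the label: 1 means the line explains all the variance, 0 means it does no better than predicting the mean. A minimal sketch with invented values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


perfect_fit = r_squared([1, 2, 3, 4], [1, 2, 3, 4])
mean_only = r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5])
```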
You train and register a model in your Azure Machine Learning workspace.
You must publish a pipeline that enables client applications to use the model for batch inferencing. You must use a pipeline with a single ParallelRunStep step that runs a Python inferencing script to get predictions from the input data.
You need to create the inferencing script for the ParallelRunStep pipeline step.
Which two functions should you include? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
run(mini_batch)
main()
batch()
init()
score(mini_batch)
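A ParallelRunStep entry script is a plain Python module: `init()` runs once per worker process to load the model, and `run(mini_batch)` runs once per mini-batch and returns one result per input item. The sketch replaces the registered model with a hypothetical threshold function and treats each mini-batch as a list of feature lists; in a real pipeline the mini-batch is a list of file paths or a pandas DataFrame, depending on the input dataset type.

```python
model = None  # set once per worker by init()


def init():
    """Load the model once per worker process."""
    global model

    def threshold_model(features):
        # Hypothetical stand-in for loading the registered model from disk.
        return 1 if sum(features) > 1.0 else 0

    model = threshold_model


def run(mini_batch):
    """Score one mini-batch and return a list with one result per item."""
    results = []
    for features in mini_batch:
        results.append(f"{features}: {model(features)}")
    return results


init()
predictions = run([[0.2, 0.3], [0.9, 0.8]])
```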
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
scatter plot
coefficient of determination
Receiver Operating Characteristic (ROC) curve
Gradient descent
You run an experiment that uses an AutoMLConfig class to define an automated machine learning task with a maximum of ten model training iterations. The task will attempt to find the best performing model based on a metric named accuracy.
You submit the experiment with the following code:
from azureml.core.experiment import Experiment
automl_experiment = Experiment(ws, 'automl_experiment')
automl_run = automl_experiment.submit(automl_config, show_output=True)
You need to create Python code that returns the best model that is generated by the automated machine learning task.
Which code segment should you use?
best_model = automl_run.get_details()
best_model = automl_run.get_output()[1]
best_model = automl_run.get_file_names()[1]
best_model = automl_run.get_metrics()
You plan to run a script as an experiment using a Script Run Configuration. The script uses modules from the scipy library as well as several Python packages that are not typically installed in a default conda environment.
You plan to run the experiment on your local workstation for small datasets and scale out the experiment by running it on more powerful remote compute clusters for larger datasets.
You need to ensure that the experiment runs successfully on local and remote compute with the least administrative effort.
What should you do?
Create and register an Environment that includes the required packages. Use this Environment for all experiment runs.
Always run the experiment with an Estimator by using the default packages.
Do not specify an environment in the run configuration for the experiment. Run the experiment by using the default environment.
Create a config.yaml file defining the conda packages that are required and save the file in the experiment folder.
Create a virtual machine (VM) with the required Python configuration and attach the VM as a compute target. Use this compute target for all experiment runs.
You are a data scientist building a deep convolutional neural network (CNN) for image classification.
The CNN model you built shows signs of overfitting.
You need to reduce overfitting and converge the model to an optimal fit.
Which two actions should you perform? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
Reduce the amount of training data.
Add an additional dense layer with 64 input units
Add L1/L2 regularization.
Use training data augmentation
Add an additional dense layer with 512 input units.
You are creating a new Azure Machine Learning pipeline using the designer.
The pipeline must train a model using data in a comma-separated values (CSV) file that is published on a website. You have not created a dataset for this file.
You need to ingest the data from the CSV file into the designer pipeline using the minimal administrative effort.
Which module should you add to the pipeline in Designer?
Convert to CSV
Enter Data Manually
Import Data
Dataset
You are building a regression model for estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
The label data must be a negative value.
The label data can be positive or negative.
The label data must be a positive value.
The label data must be non-discrete.
The data must be whole numbers.
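Poisson regression models counts, so a quick pre-check is that every label is a whole number and not negative (a Poisson count can be zero). The helper name below is hypothetical and only checks the label values, not the other modeling assumptions.

```python
def is_poisson_label(values):
    """True if every label is a non-negative whole number (a count)."""
    return all(float(v).is_integer() and v >= 0 for v in values)


call_counts = [0, 3, 12, 7]
```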
You create an Azure Machine Learning compute resource to train models.
The compute resource is configured as follows:
✑ Minimum nodes: 2
✑ Maximum nodes: 4
You must decrease the minimum number of nodes and increase the maximum number of nodes to the following values:
✑ Minimum nodes: 0
✑ Maximum nodes: 8
You need to reconfigure the compute resource.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
Use the Azure Machine Learning studio.
Run the update method of the AmlCompute class in the Python SDK.
Use the Azure portal.
Use the Azure Machine Learning designer.
Run the refresh_state() method of the BatchCompute class in the Python SDK