Path5.Mod1.a - Run Pipelines - Creating a Component Flashcards

1
Q

build share scale

Components
- What they are
- Three reasons for using them
- How to make them accessible to other users in the Workspace

A
  • Reusable, self-contained scripts that are easily shared with all users in an ML Workspace. Ideally you design a component to perform one step or a specific action relevant to your ML workflow.
  • Reasons:
    1. To build a pipeline (dur!)
    2. To share reusable code
    3. When you’re preparing your code for scale
  • To make them accessible in the Workspace you need to register your Components to the Workspace.

A pipeline is a workflow of ML tasks, related to training an ML Model, with each step or task being a Component. In other words, a pipeline is a workflow made up of Components.
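The same idea expressed declaratively: a minimal, hypothetical sketch of a pipeline job YAML in which each entry under jobs runs one Component (the component file names and data path are illustrative, not from this deck):

```yaml
# Hypothetical pipeline job: two steps, each step is a Component
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: train-model-pipeline
jobs:
  prep_step:
    type: command
    component: ./prep.yml        # illustrative component file
    inputs:
      input_data:
        type: uri_file
        path: ./data/raw.csv     # illustrative data path
  train_step:
    type: command
    component: ./train.yml       # illustrative component file
    inputs:
      training_data: ${{parent.jobs.prep_step.outputs.output_data}}
```

Note how train_step consumes prep_step's output; the `${{parent.jobs...}}` binding is what chains Components into a workflow.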

2
Q

Met Int CCE

Three parts to a Component

A
  • Metadata: ex. the Component name, version, etc.
  • Interface: Expected input parameters (ex. dataset, hyperparameters) and expected output (ex. metrics, artifacts)
  • Code/Command/Environment: Specifies where your code is and how to run it
3
Q

Two files required to create a Component, and two ways to create the latter…

A
  • A script containing the workflow you want to execute
  • A YAML file to define the Three Parts of your Component

The YAML file can be written manually, or generated by applying command_component() as a decorator (from the mldesigner package) to your Python function.

4
Q

Required Libraries for creating a Component
Know this for the exam!!!

A

azure.identity for credentialing
azure.ai.ml, obviously (MLClient, load_component)
azure.ai.ml.dsl for the pipeline decorator

from azure.identity import DefaultAzureCredential
from azure.identity import InteractiveBrowserCredential

from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component
5
Q

Given the following code, determine where The Three Parts of a Component are defined:

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prepare training data
version: 1
type: command
inputs:
  input_data: 
    type: uri_file
outputs:
  output_data:
    type: uri_file
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
  python prep.py 
  --input_data ${{inputs.input_data}}
  --output_data ${{outputs.output_data}}
A

Metadata:
- Component name “prep_data”
- Version 1
- Display name “Prepare training data”
- Type “command”

Interface:
- inputs.input_data.type uri_file
- outputs.output_data.type uri_file

Code/Command/Environment:
- code (the location) “./src”
- environment, similar to a Docker base image
- command, what to execute when the Component is used:
~~~
python prep.py
--input_data ${{inputs.input_data}}
--output_data ${{outputs.output_data}}
~~~

6
Q

unusual path, usual path

Code samples for Loading (and its parameter) and Registering a Component

A

Given the YAML file defined earlier, to load it:

from azure.ai.ml import load_component

parent_dir = "./src"
loaded_comp = load_component(source=parent_dir + "/prep.yml")

To register the Component:
prep = ml_client.components.create_or_update(loaded_comp)

7
Q

For referencing web-based data, use this entity, and note the significance of using it

A
from azure.ai.ml import Input

Why use Input though? Because it creates a reference to the data source location, meaning the data remains in its existing location, and we incur no extra storage cost.

8
Q

What the command_component() decorator does in the example code below, and what the method’s signature translates to in ML Designer:

import os
from pathlib import Path
from mldesigner import command_component, Input, Output

@command_component(
    name="prep_data",
    version="1",
    display_name="Prep Data",
    description="Convert data to CSV file, and split to training and test data",
    environment=dict(
        conda_file=Path(__file__).parent / "conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    # implementation here
A

Lets you define a Component’s interface, metadata and code from a Python function. The code will be transformed into a single static specification (YAML) that a pipeline can process.

Note what the Component will look like in ML Designer: the display name, the input/output ports matching the function signature, etc.
