Introduction to Cloud Computing - Unit 5 Flashcards
(59 cards)
What is MLflow?
An open-source platform to streamline the process of developing and deploying ML models in a version-controlled manner.
What components compose MLflow, and what do they mean?
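- MLflow Tracking: logs parameters, metrics, and model artifacts for each run so that runs can be compared
- MLflow Projects: a format for packaging ML code so that it can be run reproducibly elsewhere
- MLflow Models: a standard format for packaging models for deployment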
- MLflow Model Registry: centralized location for trained models
When packaging ML models into MLflow Projects, which technologies are used?
Conda and Docker
What is a virtual environment in Python?
An isolated context with defined dependencies.
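As a minimal sketch of the idea, the standard-library venv module can create such an isolated context. The directory name demo-env and the helper make_env are illustrative assumptions, not part of the course material.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: str) -> Path:
    """Create an isolated environment under `parent` and return its path."""
    env_dir = Path(parent) / "demo-env"  # hypothetical name
    # with_pip=False keeps creation fast and offline; dependencies would
    # normally be installed afterwards with the environment's own pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(tmp)
    # pyvenv.cfg is the file that marks a directory as a virtual environment.
    has_cfg = (env_dir / "pyvenv.cfg").exists()
    print(has_cfg)
```

On the command line, the usual equivalent is python -m venv followed by the target directory.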
What is the relationship between an experiment and a run in MLflow?
An experiment can hold any number of runs.
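The one-to-many relationship can be pictured with a toy in-memory model. This is not MLflow's API: the classes Experiment and Run, the experiment name, and all parameter and metric values below are invented for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: int
    params: dict
    metrics: dict


@dataclass
class Experiment:
    name: str
    runs: list = field(default_factory=list)

    def start_run(self, params: dict, metrics: dict) -> Run:
        # Each new run is appended to the same experiment.
        run = Run(run_id=len(self.runs), params=params, metrics=metrics)
        self.runs.append(run)
        return run

    def best_run(self, metric: str) -> Run:
        # Comparing runs is the main reason to group them in one experiment.
        return max(self.runs, key=lambda r: r.metrics[metric])


exp = Experiment("demo-experiment")
exp.start_run({"max_depth": 3}, {"accuracy": 0.81})  # made-up values
exp.start_run({"max_depth": 5}, {"accuracy": 0.86})  # made-up values
best = exp.best_run("accuracy")
print(len(exp.runs), best.params)
```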
How is automatic logging achieved with MLflow in Python?
By calling mlflow.autolog(), or a framework-specific variant such as mlflow.sklearn.autolog().
What is the MLflow UI?
A locally hosted web page that allows visualization and comparison of different models.
What is one of the main offers of Databricks?
Fully managed Spark clusters.
What is the origin of the word "lakehouse"? What is a lakehouse in the context of Databricks?
A combination of "data lake" and "data warehouse". A lakehouse is a unified data storage solution for structured and unstructured data.
What does the acronym ACID relate to, and what does it stand for?
Properties that database transactions should guarantee:
- atomicity
- consistency
- isolation
- durability
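Atomicity, the first of these properties, can be demonstrated with a small sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


# alice only has 100, so the debit fails and the credit is rolled back too.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to bob is applied first on purpose: without atomicity it would survive the failed debit.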
What is Delta Lake used for?
Big data storage: it is a storage layer that adds ACID transactions to data lakes.
What different tables exist in Delta Live Tables?
- bronze: unprocessed raw data
- silver: partly pre-processed, enriched data
- gold: ready-to-use data for business needs
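The three layers can be sketched in plain Python. This is only an illustration of the bronze/silver/gold idea, not Delta Live Tables syntax; the sample rows and the helpers to_silver and to_gold are invented for this example.

```python
# Bronze: raw records exactly as they arrived (made-up sample data).
bronze = [
    {"user": " Ann ", "amount": "12.50"},
    {"user": "bob", "amount": "n/a"},
    {"user": "ann", "amount": "7.50"},
]


def to_silver(rows):
    """Clean and type the raw rows; drop rows whose amount cannot be parsed."""
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        silver.append({"user": row["user"].strip().lower(), "amount": amount})
    return silver


def to_gold(rows):
    """Aggregate cleaned rows into a business-level view: total per user."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```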
What is Unity Catalog used for, and on top of what?
For data governance on top of Delta Lake.
Does Databricks have only one environment in which it executes?
No, it depends on the context (SQL, streaming, ML, etc.)
What specialized service does Google Cloud offer for ML?
Specialized VMs
What is the Google Cloud AI Platform?
A centralized collection of services developed for typical ML and DS workflows.
What does the Google Cloud offering AI Platform Pipelines offer?
- regular re-training of models
- CI/CD pipelines
- labeling service
What is Google Vertex AI?
A development environment designed to cover typical data science workflows.
What technologies, in conjunction with Google Vertex AI, can be used to develop an ML model without coding at all?
AutoML and a GUI
What languages does Google Vertex AI Workbench support?
Python, SQL, or R
What is transfer learning in ML?
Adapting a pre-trained ML model to particular use cases.
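A minimal pure-Python sketch of the idea: a fixed feature extractor stands in for the pre-trained model, and only a small new "head" is trained for the new task. The weights in PRETRAINED_W, the toy data, and all function names are assumptions for illustration; real transfer learning would reuse an actual pre-trained network.

```python
import math

# Stand-in for a pre-trained model: these weights are never updated.
PRETRAINED_W = [[0.5, -0.2], [0.1, 0.8]]


def frozen_features(x):
    """Apply the frozen layer to an input vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def predict(head, x):
    """Logistic output of the trainable head on top of the frozen features."""
    z = sum(h * f for h, f in zip(head, frozen_features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, data):
    total = 0.0
    for x, y in data:
        p = predict(head, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def fine_tune(head, data, lr=0.5, epochs=200):
    """Train only the head with stochastic gradient descent."""
    head = list(head)
    for _ in range(epochs):
        for x, y in data:
            error = predict(head, x) - y
            feats = frozen_features(x)
            head = [h - lr * error * f for h, f in zip(head, feats)]
    return head


# Toy data for the new use case (made up for this example).
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 0.5], 1), ([0.5, 2.0], 0)]
initial_head = [0.0, 0.0]
loss_before = log_loss(initial_head, data)
tuned_head = fine_tune(initial_head, data)
loss_after = log_loss(tuned_head, data)
print(loss_before > loss_after)
```

Only the two head weights change during training; the pre-trained part is reused as-is, which is why far less data and compute are needed than when training from scratch.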
What is the meaning of the following ML offerings on Google Cloud:
- Teachable Machine
- Natural Language AI
- Dialogflow
- Teachable Machine: transfer learning without writing code
- Natural Language AI: extract numeric information from unstructured text data
- Dialogflow: used to create chatbots and voice bots
What is Amazon Elastic MapReduce used for, and with which technologies?
Big data processing using Apache Spark, Hive, Presto, or another big data processing framework.
What does Amazon SageMaker offer?
A broad range of services and tools covering the complete lifecycle of a typical data science project.