Domain 1 Solutions Flashcards by Natasha WrightPope

Helps you set up a secure data lake and govern, secure, and globally share data for ML and analytics. Manages fine-grained access control on S3 and metadata in Glue Data Catalog with its own permissions model that augments IAM.

Lake Formation

Preferred storage option

Used to build, train, and deploy ML models

SageMaker

A file system service that speeds up training jobs by serving your S3 data to SageMaker at high speeds

FSx for Lustre

A training data source that lets you launch training jobs directly from the service, without moving data, for faster training start times

EFS

Block-level storage device that you can attach to your instances and use as you would use a physical hard drive

EBS

An ETL service that categorizes, cleans, enriches, and moves data between various data stores. Used for batch ingestion; automates data discovery

Glue

This batch ingestion service reads historical data from source systems, such as relational database management systems, data warehouses, and NoSQL databases, at any desired interval

DMS

Batch ingestion service that automates various ETL tasks that involve complex workflows

Step Functions

Uses the Kinesis Producer Library (KPL) to write to a Kinesis data stream

Kinesis Data Streams

Batches and compresses data to generate incremental views, and runs custom transformation logic using Lambda before delivering the incremental view to S3

Kinesis Firehose
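
The Lambda transformation step on this card can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes each incoming record carries a JSON payload, and the `source` field and its value are hypothetical. The record shape (`recordId`, `result`, base64-encoded `data`) follows the Firehose data-transformation contract.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the shape Firehose expects."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers the payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["source"] = "firehose-demo"  # hypothetical enrichment field
            line = json.dumps(payload) + "\n"  # newline-delimited JSON for S3
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            result = "Ok"
        except (ValueError, KeyError, TypeError):
            # Return the record untouched and flag it as failed.
            data = record["data"]
            result = "ProcessingFailed"
        output.append({"recordId": record["recordId"], "result": result, "data": data})
    return {"records": output}
```

Records marked `ProcessingFailed` are treated by Firehose as unsuccessfully processed, so malformed input does not silently end up in the transformed output.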

Easiest way to process and transform data streaming through Kinesis Data Streams or Firehose using SQL; provides near-real-time insights from incremental streams before storing them in S3

Kinesis Data Analytics

Used to ingest/analyze video/audio data

Kinesis Video Streams

A distributed data store optimized for ingesting and processing streaming data in real-time. Used to publish and subscribe to streams of records, effectively store streams of records in the order in which records were generated, and process streams of records in real time

Apache Kafka

Supports many instance types that have proportionally high CPU with increased network performance, which is well suited for HPC (high-performance computing) applications

EMR

Customers can store a single source of data in Amazon S3 and perform ad hoc analysis

Athena

Uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price performance at any scale

Redshift

Provides a protocol of data processing and node task distribution and management and uses algorithms to split datasets into subsets and distribute them across nodes in a compute cluster

Spark

A managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data

EMR

Used to build a visual dashboard for metrics

QuickSight

An open-source Java software framework that supports massive data processing across a cluster of instances. Uses various processing models, such as MapReduce, to distribute processing across multiple instances and also uses a distributed file system called HDFS to store data across multiple instances

Hadoop
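
The MapReduce model named on this card can be illustrated with a single-process word count. This is a plain-Python sketch of the split, map, shuffle, and reduce steps only; it does not use the Hadoop API, and all function names are made up for the example. On a real cluster each chunk would be processed on a different instance.

```python
from collections import defaultdict


def split_into_chunks(records, num_chunks):
    """Split the dataset into subsets, one per (simulated) node."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the values emitted by all mappers by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, num_chunks=2):
    chunks = split_into_chunks(lines, num_chunks)
    mapped = [map_phase(chunk) for chunk in chunks]  # one call per simulated node
    return reduce_phase(shuffle(mapped))
```

For example, `word_count(["a b a", "b c"])` counts `a` and `b` twice each and `c` once, whichever chunk each line lands in.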

A serverless, NoSQL, fully managed database with single-digit millisecond performance at any scale that addresses need to overcome scaling and operational complexities of relational databases

DynamoDB

A service that allows you to visually prepare and clean your data, normalize your data, and run a number of different feature transforms on the dataset without writing code

Glue DataBrew

An agnostic, free, open-source command line tool that works on top of Git repositories

Data Version Control (DVC)

Allows users to leverage Hadoop MapReduce using a SQL interface, enabling analytics at a massive scale, in addition to distributed and fault-tolerant data warehousing

Hive

_____ is an AI service that makes it easy for users to implement image or video analysis workflows into their applications. It aims to leverage Amazon's vast experience in using deep learning for various image-based workloads such as image classification, object detection, detection of text in image, facial recognition, sentiment, and most recently, public safety.

Amazon Rekognition

_____ is an AI service that allows you to quickly extract intelligence from documents such as financial reports, medical records, tax forms, and university application forms beyond simple optical character recognition (OCR). With this, you don't have to build deep learning computer vision models to extract text, forms, or tables from PDF documents; this will do that for you, so you can focus on using the extracted information for downstream business tasks.

Amazon Textract

_____ converts speech to text. It leverages the same technologies that power Amazon Alexa, offered as a transcription service, so you can transcribe your voice data without any prior machine learning knowledge.

Amazon Transcribe

Translates text from various languages.

Amazon Translate

Converts text to speech (TTS)

Amazon Polly

_____ is an AWS service, powered by natural language understanding (NLU) and automatic speech recognition (ASR), that allows users to build and deploy conversational interfaces for their applications. With this, you can build a tailored and personalized experience for your customers to engage with your platform without any deep learning expertise.

Amazon Lex

_____ allows you to build your own search application using natural language that provides highly relevant responses to user queries as you would get from a human expert within your organization.

Amazon Kendra

_____ is a machine learning service that allows businesses to rapidly develop personalized recommendation systems to provide a better customer experience to their end customers.

Amazon Personalize

_____ is an AI service that uses both statistical and deep learning–based algorithms to provide highly accurate forecasts. As with personalization, it draws on Amazon's experience as a major retailer and cloud services provider.

Amazon Forecast

_____ provides a set of natural language processing–based APIs to pretrained and custom models that can extract insights from text. It can analyze a document for entities, key phrases, PII, language, sentiment, and syntax.

Amazon Comprehend

_____ uses program analysis and machine learning built from millions of lines of Java and Python code from the Amazon codebase to provide intelligent recommendations for improving code performance and quality. It consists of two main services: Reviewer and Profiler.

Amazon CodeGuru

_____ is used to get a secondary human review of a low-confidence prediction from machine learning models. It works out of the box with Amazon Rekognition and Textract, but you can also use it with your own custom ML models. It is usually used when you want to review low-confidence predictions or to audit a random sample of predictions regardless of confidence levels.

Amazon Augmented AI (or A2I)
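
The two review triggers on this card (low confidence, or a random audit sample) can be sketched as a simple routing rule. This is an illustration of the idea only, not the A2I API: the function name, the `confidence` key, and the default threshold are all assumptions made for the example.

```python
import random


def route_for_review(predictions, threshold=0.8, audit_rate=0.0, seed=0):
    """Split predictions into auto-accepted ones and ones sent to human review.

    A prediction goes to review if its confidence is below `threshold`,
    or if it is picked by the random audit sample (`audit_rate` in [0, 1]).
    """
    rng = random.Random(seed)  # seeded so the audit sample is reproducible
    accepted, review = [], []
    for pred in predictions:
        low_confidence = pred["confidence"] < threshold
        audited = rng.random() < audit_rate
        if low_confidence or audited:
            review.append(pred)
        else:
            accepted.append(pred)
    return accepted, review
```

With `audit_rate=0.0` only low-confidence predictions are reviewed; raising it adds a random sample of the rest, regardless of confidence.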

_____ lets you create robotics applications at scale using the Robot Operating System (ROS) framework and extends this to other cloud services like SageMaker for machine learning. It provides you with a robot development environment on the cloud, with simulation capabilities to test these robots on the cloud.

AWS RoboMaker