Big Data & Serverless Flashcards

(53 cards)

1
Q

What does ETL stand for?

A

Extract, Transform, Load

2
Q

Big data ETL tool that can run open-source software (such as Spark, HBase, etc.) natively on AWS

A

Amazon EMR

3
Q

What Amazon services can you run an EMR cluster on?

A

EC2
EKS
Outposts

EMR processes the data and can store the results in an S3 bucket.

4
Q

Open source tools that can be run on EMR

A

Spark
HBase
Hadoop
Presto

5
Q

How can you reduce the overall cost of EMR clusters running on EC2?

A

Use Spot Instances or Reserved Instances (RIs)

6
Q

Real-time streaming service for AWS

A

Kinesis

7
Q

Kinesis service that provides real-time speed, where you manage the producers and consumers yourself (and must scale the shards)

A

Kinesis Data Streams

8
Q

Kinesis service that provides near-real-time speed, where you don't have to worry as much about scaling because AWS manages it

A

Kinesis Data Firehose

9
Q

Endpoints available for Kinesis Data Firehose

A

Elasticsearch
S3
Redshift

10
Q

Analyze Kinesis data using standard SQL

A

Kinesis Data Analytics

11
Q

An application requires real-time message delivery. Which service should you use?

A

Kinesis

12
Q

TRUE or FALSE: Kinesis Data Analytics is serverless

A

TRUE

13
Q

Interactive query service that makes it easy to analyze data in S3 using SQL

A

Athena

14
Q

Serverless data integration service to perform ETL without having to manage servers

A

AWS Glue

15
Q

How do you use Athena with Glue?

A

Set up an S3 bucket with the data

Set up a Glue crawler to analyze the data in the bucket

The crawler records the discovered schema as tables in the Glue Data Catalog

Amazon Athena can run queries against the structured tables in the Catalog

Use Amazon QuickSight to visualize the data in a dashboard
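
The crawler and query steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes the crawler already exists, that `glue` and `athena` behave like `boto3.client("glue")` and `boto3.client("athena")`, and the function name `crawl_then_query` and all argument values are hypothetical.

```python
import time


def crawl_then_query(glue, athena, crawler_name, database, sql, output_s3,
                     poll_seconds=5):
    """Run an existing Glue crawler, wait for it, then start an Athena query.

    `glue` and `athena` are assumed to be boto3-style clients.
    Returns the Athena query execution ID.
    """
    # The crawler scans the S3 data and writes table metadata
    # to the Glue Data Catalog.
    glue.start_crawler(Name=crawler_name)
    while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    # Athena queries the catalog tables; results are written to output_s3.
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return response["QueryExecutionId"]
```

With real clients, a call might look like `crawl_then_query(boto3.client("glue"), boto3.client("athena"), "my-crawler", "my_db", "SELECT * FROM my_table LIMIT 10", "s3://my-results-bucket/")`, where every name is a placeholder.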

16
Q

TRUE or FALSE, Athena is serverless

A

TRUE

17
Q

Fully managed data visualization service for BI similar to Tableau

A

Amazon QuickSight

18
Q

Managed ETL service for automating the movement and transformation of your data. It lets you create data-driven workflows and enforces the logic you define.

A

AWS Data Pipeline

19
Q

How do you configure notifications (for example, on failures) in AWS Data Pipeline?

A

Via Amazon SNS

20
Q

AWS storage and compute services that AWS Data Pipeline can be integrated with

A

Storage:
DynamoDB
RDS
Redshift
S3

Compute:
EC2
EMR

21
Q

TRUE or FALSE: with AWS Data Pipeline, you cannot use Reserved Instances (RIs)

A

FALSE. You can use previously existing instances.

22
Q

What are Data Pipeline Task Runners?

A

EC2 instances that poll for tasks and run them when found

23
Q

What are Data Pipeline data nodes?

A

They define the locations and types of data used as inputs and outputs

24
Q

Popular use cases for Data Pipeline

A

Processing data on EMR with Hadoop Streaming

Importing and Exporting DynamoDB data

Copying CSV files or data between S3 buckets

Exporting RDS data to S3

Copying Data to Redshift

Exporting MySQL data to S3 to generate reports

25
Q

Fully managed streaming service that leverages Apache Kafka. Easy to use, with strong fault tolerance, and well suited to integrating previously existing apps.

A

Amazon MSK
26
Q

Fully managed streaming service for running Apache Kafka. Easy to use and a good fit for previously existing applications.

A

Amazon MSK
27
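Q

Which operations can you use with Amazon MSK?

A

Apache Kafka data-plane operations, for producing and consuming data. (MSK itself provides the control-plane operations, such as creating, updating, and deleting clusters.)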
28
Q

TRUE or FALSE: Amazon MSK uses KMS keys for server-side encryption (SSE) by default

A

TRUE
29
Q

With Amazon MSK, where can you send broker logs?

A

CloudWatch
Kinesis Data Firehose
S3
30
Q

Analytics and visualization service that is used to analyze and search logs

A

Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)
31
Q

Minimum and maximum memory size of Lambda functions

A

Min: 128 MB
Max: 10,240 MB (10 GB)
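
As a quick check of the arithmetic, 10 GB is 10 × 1,024 MB = 10,240 MB. The sketch below validates a requested memory size against those limits. The constant and function names are hypothetical, not part of any AWS SDK.

```python
LAMBDA_MIN_MEMORY_MB = 128
LAMBDA_MAX_MEMORY_MB = 10 * 1024  # 10 GB expressed in MB


def is_valid_lambda_memory(memory_mb: int) -> bool:
    """Return True if memory_mb falls within Lambda's memory limits."""
    return LAMBDA_MIN_MEMORY_MB <= memory_mb <= LAMBDA_MAX_MEMORY_MB
```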
32
Q

TRUE or FALSE: a Lambda function can run inside a VPC

A

TRUE. A Lambda function can run inside or outside a VPC.
33
Q

Service that allows you to easily find, deploy, and publish your own serverless applications using SAM templates

A

AWS Serverless Application Repository
34
Q

Two options to choose from in Serverless Application Repository

A

Publish
Deploy
35
Q

Default visibility of templates that you publish in Serverless Application Repository

A

Private by default. You can share them with specific AWS accounts or make them public to everyone.
36
Q

TRUE or FALSE: you must have an AWS account to deploy a Serverless Application Repository template

A

TRUE. Deploying requires an AWS account; you can browse public applications without one.
37
Q

Container flow (steps from start to finish of creating a container)

A

Create a Dockerfile
Create an image from the Dockerfile
Put the image in a container registry service
Launch a container based on the image
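
The container flow above maps onto standard Docker CLI commands. The sketch below only builds the command lists, it does not run them. The function name and the image, tag, and registry values are hypothetical, and registry login is left out.

```python
def container_workflow(image: str, tag: str, registry: str) -> list[list[str]]:
    """Return the Docker CLI commands for build, tag, push, and run."""
    local = f"{image}:{tag}"
    remote = f"{registry}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local, "."],  # build image from ./Dockerfile
        ["docker", "tag", local, remote],       # name the image for the registry
        ["docker", "push", remote],             # put the image in the registry
        ["docker", "run", "--rm", remote],      # launch a container from the image
    ]
```

Each inner list could be passed to `subprocess.run(cmd, check=True)` on a machine with Docker installed.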
38
Q

What container service should I use if I need to run my containers on-prem?

A

Kubernetes
39
Q

Which container service should I use if I need to easily integrate with other AWS services?

A

ECS
40
Q

If I want to run a container and cost is a concern, should I pick EC2 or Fargate?

A

For long-running containers that run 24/7, choose EC2
41
Q

Two types of rules allowed in EventBridge

A

Event pattern
Schedule
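
To illustrate the difference, an event pattern is a JSON filter matched against incoming events, while a schedule is a `rate(...)` or `cron(...)` expression. The matcher below is a deliberately simplified sketch that handles exact-value matching only. The real service supports more operators, and the function name `matches` is hypothetical.

```python
# A schedule is just an expression string.
SCHEDULE_EXAMPLES = ["rate(5 minutes)", "cron(0 12 * * ? *)"]

# An event pattern lists the accepted values for each field.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified check: each pattern field must be present in the event
    and hold one of the listed values; nested dicts are checked recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```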
42
Q

What types of images or artifacts are allowed in Amazon ECR?

A

Docker images
OCI images
OCI artifacts
43
Q

Is there a place to get public images in AWS?

A

Yes, Amazon ECR Public
44
Q

How can I prevent old ECR images that I no longer use from piling up in my repository?

A

Use a lifecycle policy
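
A lifecycle policy is a JSON document of rules. The sketch below builds one rule that expires images a given number of days after they were pushed. The function name and the day count are hypothetical, and a real policy should be tuned to the repository.

```python
import json


def expire_old_images_policy(days: int) -> str:
    """Build ECR lifecycle policy text that expires images after `days` days."""
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire images pushed more than {days} days ago",
                "selection": {
                    "tagStatus": "any",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": days,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy, indent=2)
```

The resulting text could be supplied as `lifecyclePolicyText` to boto3's `put_lifecycle_policy` on an ECR client, along with a `repositoryName`.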
45
Q

Security feature in ECR that helps identify software vulnerabilities

A

Image scanning
46
Q

Are ECR repositories global or regional?

A

Regional
47
Q

TRUE or FALSE: ECR images can be shared only within their own region, though they can be shared across multiple accounts

A

FALSE. Images can be shared both cross-region and cross-account.
48
Q

TRUE or FALSE: tags can be overwritten in ECR

A

It depends on the repository's tag mutability setting. If tag immutability is turned on for the ECR repository, tags cannot be overwritten; if tags are mutable (the default), they can.
49
Q

Services ECR integrates with

A

ECS
EKS
Amazon Linux containers
On-prem, for your own containers
50
Q

I want to use EKS but don't want it to be managed by AWS. What can I use?

A

EKS-D (Amazon EKS Distro), which can be run anywhere; the user is fully responsible for managing it
51
Q

I want to run EKS on-prem, managed by the customer but with Amazon EKS efficiencies. What do I use?

A

EKS Anywhere
52
Q

Can I run ECS on-prem?

A

Yes, using ECS Anywhere
53
Q

Requirements for running ECS Anywhere

A

Must have the SSM Agent, ECS agent, and Docker installed on the local server
Must register instances as SSM managed instances
Create an installation script in the ECS console (it must contain the SSM activation keys and commands for the required software)
Execute the script on the on-prem servers or VMs
Deploy containers using the EXTERNAL launch type