AWS Data Collections Flashcards

1
Q

AWS Collection Types

A
  • Real-Time Collection
  • Near Real-Time Collection
  • Batch

2
Q

Real-Time Collection Services

A

Kinesis Data Streams
SQS
IoT

3
Q

Near Real-Time Collection Services

A

Kinesis Data Firehose
Database Migration Service (DMS)

4
Q

Batch - Historical Analytics Services

A

Snowball
Data Pipeline

5
Q

Explain the Kinesis Data Streams service

A

Managed service that allows you to collect, process, and analyze real-time streaming data from various sources such as IoT devices, mobile devices, server logs, social networks, and other real-time data sources.

6
Q

Kinesis Data Streams producers

A

Applications, Client, SDK, KPL, Kinesis Agent

7
Q

Kinesis Data Streams consumers

A

Apps (KCL, SDK), Lambda, Kinesis Data Firehose and Kinesis Data Analytics

8
Q

Kinesis Data Streams capacity modes

A

Provisioned Mode
On-Demand Mode
(both modes are shown in the sketch below)
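A minimal sketch of creating a stream in each capacity mode with boto3 (stream names are hypothetical; assumes AWS credentials are configured):

```python
import boto3

kinesis = boto3.client("kinesis")

# Provisioned mode: you choose the shard count up front.
kinesis.create_stream(
    StreamName="my-provisioned-stream",  # hypothetical name
    ShardCount=2,
    StreamModeDetails={"StreamMode": "PROVISIONED"},
)

# On-demand mode: capacity scales automatically, no shard count needed.
kinesis.create_stream(
    StreamName="my-on-demand-stream",  # hypothetical name
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
```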

9
Q

In provisioned mode, each Kinesis Data Streams shard gets

A

1 MB/s in, or 1,000 records per second

10
Q

Kinesis on-demand mode: default capacity provisioned

A

4 MB/s in, or 4,000 records per second

11
Q

Kinesis Data Streams security points

A

  • IAM - control access
  • Encryption in flight using HTTPS endpoints
  • Encryption at rest using KMS
  • Encryption/decryption of data on the client side
  • VPC endpoints
  • Monitor API calls using CloudTrail

12
Q

Explain the Kinesis producer SDK - PutRecord(s)

A

APIs used: PutRecord (one record) and PutRecords (many records) - see the sketch below
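A minimal boto3 sketch of both write APIs (stream name and payloads are hypothetical):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Single record.
kinesis.put_record(
    StreamName="my-stream",
    Data=json.dumps({"sensor": 1, "temp": 21.5}).encode(),
    PartitionKey="sensor-1",
)

# Many records in one HTTP request (batching).
kinesis.put_records(
    StreamName="my-stream",
    Records=[
        {"Data": json.dumps({"sensor": i}).encode(), "PartitionKey": f"sensor-{i}"}
        for i in range(10)
    ],
)
```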

13
Q

PutRecords uses…

A

Batching, which increases throughput and reduces the number of HTTP requests

14
Q

PutRecords uses batching, which means…

A

fewer HTTP requests

15
Q

Kinesis producer SDK - what happens if we go over the limits?

A

A ProvisionedThroughputExceeded exception is thrown if we go over the limits

16
Q

Managed AWS sources for Kinesis Data Streams

A

CloudWatch Logs, AWS IoT, Kinesis Data Analytics

17
Q

To send data asynchronously to Kinesis, we use the…

A

Kinesis Producer Library (KPL)

18
Q

Where does the Kinesis Producer Library submit metrics?

A

CloudWatch for monitoring

19
Q

KPL batching introduces some delay with…

A

RecordMaxBufferedTime (default 100 ms)

20
Q

Define the features of the Kinesis Agent

A

Monitors log files and sends them to KDS
Java-based agent
Installed in Linux server environments

21
Q

Data Collection Services

A

Amazon Kinesis
AWS IoT Core
AWS Snowball
SQS
DMS
Direct Connect

22
Q

What is ProvisionedThroughputExceeded?

A

An exception that can occur in Kinesis when the application reaches the provisioned throughput limit defined for the stream.

23
Q

Causes of ProvisionedThroughputExceeded exceptions

A

Exceeding the MB/s or TPS limit for any shard.

Make sure you don't have a hot shard (e.g., a bad partition key sending too much data to one partition).

24
Q

Solutions for ProvisionedThroughputExceeded exceptions

A
  • Retries with backoff (see the sketch below)
  • Increase shards (scaling)
  • Ensure your partition key is a good one
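A minimal retry-with-exponential-backoff sketch for the first bullet, using boto3 (stream name and backoff schedule are hypothetical choices):

```python
import time
import boto3

kinesis = boto3.client("kinesis")

def put_with_backoff(data: bytes, partition_key: str, max_attempts: int = 5):
    """Retry PutRecord when the shard's provisioned throughput is exceeded."""
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName="my-stream",  # hypothetical stream
                Data=data,
                PartitionKey=partition_key,
            )
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            time.sleep(2 ** attempt * 0.1)  # 100 ms, 200 ms, 400 ms, ...
    raise RuntimeError("still throttled after retries")
```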
25
Q

Influence of Kinesis Producer Library (KPL) batching

A

Introduces some delay with RecordMaxBufferedTime (default 100 ms)

26
Q

Kinesis Producer Library – when not to use

A

* The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
* Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance

27
Q

Kinesis Agent functions

A

* Monitors log files and sends them to Kinesis Data Streams
* Java-based agent, built on top of the KPL
* Installed in Linux-based server environments

28
Q

Kinesis Agent features

A

* Writes from multiple directories and to multiple streams
* Routing feature based on directory / log file
* Pre-processes data before sending to streams (single line, CSV to JSON, log to JSON…)
* The agent handles file rotation, checkpointing, and retry upon failure
* Emits metrics to CloudWatch for monitoring

29
Q

Classic Kinesis consumers

A

* Kinesis SDK
* Kinesis Client Library (KCL)
* Kinesis Connector Library
* 3rd-party libraries: Spark, Log4J Appenders, Flume, Kafka Connect…
* Kinesis Firehose
* AWS Lambda

30
Q

Kinesis consumer SDK - GetRecords features

A

* Classic Kinesis: records are polled by consumers from a shard (see the polling sketch below)
* Each shard has 2 MB total aggregate read throughput
* GetRecords returns up to 10 MB of data (then throttles for 5 seconds) or up to 10,000 records
* Maximum of 5 GetRecords API calls per shard per second = 200 ms latency
* If 5 consumer applications consume from the same shard, each consumer can poll once per second and receive less than 400 KB/s
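A minimal boto3 sketch of the classic polling flow described above (stream and shard values are hypothetical):

```python
import time
import boto3

kinesis = boto3.client("kinesis")

# Get an iterator for one shard, starting from the oldest record.
iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in resp["Records"]:
        print(record["PartitionKey"], record["Data"])
    iterator = resp["NextShardIterator"]
    time.sleep(0.2)  # stay under 5 GetRecords calls per shard per second
```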
31
Q

The Kinesis Connector Library writes data to:

A

* Amazon S3
* DynamoDB
* Redshift
* ElasticSearch

32
Q

Throughput per consumer per shard in Kinesis Enhanced Fan-Out

A

2 MB/s

33
Q

With 20 consumers in Kinesis Enhanced Fan-Out, total MB/s per shard

A

20 consumers × 2 MB/s = 40 MB/s per shard
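A minimal boto3 sketch of registering an enhanced fan-out consumer (the stream ARN and consumer name are hypothetical); each registered consumer then gets its own 2 MB/s per shard, pushed over HTTP/2 via SubscribeToShard:

```python
import boto3

kinesis = boto3.client("kinesis")

consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    ConsumerName="my-efo-consumer",
)
# Pass this ARN to SubscribeToShard to start receiving pushed records.
print(consumer["Consumer"]["ConsumerARN"])
```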
34
Q

Latency tolerated by standard consumers

A

~200 ms

35
Q

Latency requirement for enhanced consumers

A

~70 ms

36
Q

Kinesis Firehose destinations

A

S3, Redshift, Elasticsearch, Splunk

37
Q

True or False: Spark / KCL read from KDF

A

False

38
Q

You can stream CloudWatch Logs into…

A

* Kinesis Data Streams (see the subscription-filter sketch below)
* Kinesis Data Firehose
* AWS Lambda
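A minimal boto3 sketch of wiring a log group to Kinesis Data Streams with a subscription filter (all names and ARNs are hypothetical; the role must allow CloudWatch Logs to write to the stream):

```python
import boto3

logs = boto3.client("logs")

logs.put_subscription_filter(
    logGroupName="/my/app/logs",
    filterName="to-kinesis",
    filterPattern="",  # empty pattern forwards every event
    destinationArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    roleArn="arn:aws:iam::123456789012:role/CWLtoKinesisRole",
)
```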
39
Q

Data stream maximum write capacity in on-demand mode

A

200 MiB/second and 200,000 records/second

40
Q

Data stream maximum read capacity per consumer in on-demand mode

A

400 MiB/second

41
Q

Data stream write capacity in provisioned mode

A

1 MiB/second and 1,000 records/second

42
Q

Data stream read capacity in provisioned mode

A

2 MiB/second

43
Q

SQS use cases

A

* Order processing
* Image processing
* Auto-scaling queues according to messages
* Buffering and batching messages for future processing
* Request offloading

44
Q

Kinesis Data Streams use cases

A

* Fast log and event data collection and processing
* Real-time metrics and reports
* Mobile data capture
* Real-time data analytics
* Gaming data feeds
* Complex stream processing
* Data feeds from the “Internet of Things”
46
Q

Kinesis auto scaling features

A

* Not native to Kinesis
* The API call to change the number of shards is UpdateShardCount
* Auto-scaling can be implemented with Lambda (see the sketch below)
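A minimal boto3 sketch of the scaling call such a Lambda would make (stream name and target count are hypothetical):

```python
import boto3

kinesis = boto3.client("kinesis")

kinesis.update_shard_count(
    StreamName="my-stream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",  # the only supported scaling type
)
```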
47
Q

IoT overview

A

* We deploy IoT devices (“Things”)
* We configure them and retrieve data from them

48
Q

SQS limit per message sent

A

256 KB

49
Q

SQS: how to send large messages

A

Use the SQS Extended Client (Java library)

50
Q

SQS use cases

A

* Decouple applications
* Buffer writes to a database
* Handle large loads of incoming messages

51
Q

SQS can be integrated with…

A

Auto Scaling, through CloudWatch!

52
Q

SQS max messages per consumer

A

120,000

53
Q

SQS message content format

A

XML, JSON, unformatted text

54
Q

Maximum messages per second supported by SQS FIFO queues

A

3,000 messages per second (using batching)
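A minimal boto3 sketch of FIFO batching (queue URL and message fields are hypothetical); 300 SendMessageBatch calls/second × 10 messages per call gives the 3,000 msg/s figure:

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message_batch(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo",
    Entries=[
        {
            "Id": str(i),                      # unique within the batch
            "MessageBody": f"order-{i}",
            "MessageGroupId": "orders",        # FIFO ordering scope
            "MessageDeduplicationId": f"order-{i}",
        }
        for i in range(10)                     # max 10 messages per batch
    ],
)
```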
55
Q

SQS pricing model

A

* Pay per API request
* Pay per network usage

56
Q

SQS security types

A

* Encryption in flight using the HTTPS endpoint
* SSE (Server-Side Encryption) using KMS
* IAM policy
* SQS queue access policy

57
Q

IoT messaging uses which protocols?

A

MQTT, WebSockets, or HTTP 1.1

58
Q

Database Migration Service (DMS)

A

Quickly and securely migrate databases to AWS; resilient and self-healing

59
Q

DMS sources

A

* On-premises and EC2 instance databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP, DB2
* Azure: Azure SQL Database
* Amazon RDS: all, including Aurora
* Amazon S3

60
Q

DMS targets

A

* On-premises and EC2 instance databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP
* Amazon RDS
* Amazon Redshift
* Amazon DynamoDB
* Amazon S3
* ElasticSearch Service
* Kinesis Data Streams
* DocumentDB
61
Q

DMS tool to convert your database’s schema from one engine to another

A

Schema Conversion Tool (SCT)

62
Q

Direct Connect (DX)

A

Provides a dedicated private connection from a remote network to your VPC

63
Q

Direct Connect use cases

A

* Increased bandwidth throughput - working with large data sets at lower cost
* More consistent network experience - applications using real-time data feeds
* Hybrid environments (on-premises + cloud)

64
Q

Direct Connect Gateway

A

If you want to set up Direct Connect to one or more VPCs in many different regions (same account), you must use a Direct Connect Gateway

65
Q

Direct Connect – connection types

A

* Dedicated connections
* Hosted connections

66
Q

AWS Snow Family services

A

Snowcone, Snowball Edge, Snowmobile

67
Q

Snow Family data migration services

A

Snowcone, Snowball Edge, Snowmobile

68
Q

Edge computing services

A

Snowcone, Snowball Edge
69
Q

Snowball Edge Storage Optimized capacity

A

80 TB of HDD capacity

70
Q

Snowball Edge Compute Optimized capacity

A

42 TB of HDD capacity

71
Q

AWS Snowcone capacity

A

8 TB

72
Q

Edge computing use cases

A

* Preprocessing data
* Machine learning at the edge
* Transcoding media streams

73
Q

Snow Family – edge computing devices

A

* Snowcone (smaller)
* Snowball Edge – Compute Optimized
* Snowball Edge – Storage Optimized

74
Q

AWS OpsHub

A

Software you install on your computer/laptop to manage your Snow Family device

75
Q

Amazon MSK is:

A

Managed Streaming for Apache Kafka
76
Q

MSK – configurations

A

* Choose the number of AZs (3 – recommended, or 2)
* Choose the VPC & subnets
* The broker instance type (e.g., kafka.m5.large)
* The number of brokers per AZ (you can add brokers later)
* Size of your EBS volumes (1 GB – 16 TB) (see the sketch below)
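A minimal boto3 sketch mapping these choices onto the CreateCluster call (every value, including the Kafka version, is a hypothetical placeholder):

```python
import boto3

msk = boto3.client("kafka")

msk.create_cluster(
    ClusterName="my-msk-cluster",
    KafkaVersion="3.5.1",
    NumberOfBrokerNodes=3,  # total across the chosen AZs/subnets
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # one per AZ
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 100}},  # EBS size in GiB
    },
)
```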
77
Q

MSK – security

A

* Encryption
* Network security
* Authentication & authorization

78
Q

MSK authentication & authorization (important)

A

* Defines who can read/write to which topics
* Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
* SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
* IAM Access Control (AuthN + AuthZ)

79
Q

MSK – monitoring

A

* CloudWatch metrics
* Prometheus (open-source monitoring)
* Broker log delivery

80
Q

MSK broker log delivery options

A

* Delivery to CloudWatch Logs
* Delivery to Amazon S3
* Delivery to Kinesis Data Streams

81
Q

MSK Connect

A

You can deploy any Kafka Connect connector to MSK Connect as a plugin

82
Q

MSK data is stored on…

A

EBS volumes

83
Q

Example MSK producers

A

Kinesis, IoT, RDS

84
Q

Example MSK consumers

A

EMR, S3, SageMaker, Kinesis, RDS

85
Q

MSK: size of your EBS volumes

A

1 GB – 16 TB
86
Q

Components of Kinesis producers

A

* Kinesis SDK
* Kinesis Producer Library (KPL)
* Kinesis Agent
* Libraries: Spark, Log4J Appenders, Flume, Kafka Connect, NiFi…

87
Q

What is Kinesis Data Streams?

A

AWS streaming service that enables ingestion, processing, and analysis of data in real time.

88
Q

What are Kinesis Data Streams producers?

A

The component responsible for ingesting data in real time.

89
Q

What are Kinesis Data Streams consumers?

A

The component responsible for processing and analyzing the data; it reads data from one or more shards.

90
Q

What is Kinesis Data Analytics?

A

A service that lets you process and analyze data in real time using standard SQL queries.