Analytics Flashcards

1
Q

In Analytics there are 4 types of analysis you can make. What are their names?

A

-Descriptive analytics
-Diagnostic analytics
-Predictive analytics
-Prescriptive analytics

2
Q

What are Descriptive Analytics?

A

Descriptive analytics focuses on analyzing present and past data to determine what has happened and what is happening now.

3
Q

What are Diagnostic Analytics?

A

Diagnostic analytics focuses on analyzing data to determine why something happened.

4
Q

What are Predictive Analytics?

A

Predictive analytics focuses on determining what might happen in the future.

5
Q

What are Prescriptive Analytics?

A

Prescriptive analytics is similar to predictive analytics, but instead of only predicting what might happen, it also suggests actions to take and describes their consequences.

6
Q

Explain Amazon CodeWhisperer

A

Amazon CodeWhisperer is an AWS AI service that generates and comments code using LLM technology.

7
Q

True or False: Amazon CodeWhisperer can detect security vulnerabilities in your code

A

True

8
Q

What are the 5 big Vs of big data and what do they mean?

A

-Volume: The amount of data being ingested
-Variety: The number and types of data sources
-Velocity: The speed with which new data is processed and stored
-Veracity: The degree to which the data can be trusted
-Value: The amount of information that can be extracted from the data

9
Q

What are the 3 most frequent characterizations of data based on their format?

A

-Structured Data
-Semi-structured Data
-Unstructured Data

10
Q

What are the 4 data processing velocities?

A

-Scheduled
-Periodic
-Near real-time
-Real-time

11
Q

What is AWS Lake Formation?

A

A service that simplifies ingesting, cleaning, cataloging, transforming, and securing data on S3 Data Lakes.

12
Q

AWS’ main data warehousing solution is called Amazon __________

A

Redshift

13
Q

Complete the following statement regarding ETL on AWS:
When looking at a standard, simplified ETL pipeline on AWS, one should use ________. For customized processes, however, one should use __________.

A

-Glue
-EMR

14
Q

What are the 4 main functions of a Data Lake?

A

-Ingest and store data
-Catalog data for searches
-Secure and protect data
-Allow analytics and insights to be run

15
Q

What are the 6 stages of an Analytics pipeline?

A

-Data Source
-Ingestion
-Data Store
-Cataloging and processing stage
-Search and analytics stage
-Visualization stage

16
Q

What are the 3 main challenges in maintaining a data lake?

A

-Data governance
-Data quality
-Security

17
Q

What are AWS Lake Formation’s 4 main features?

A

-Automate building the data lake environment (collecting, moving, cleansing data, etc)
-Store metadata from raw and processed datasets
-Orchestrate ETL jobs, crawlers and triggers using AWS Glue
-Centralize access control to the data lake

18
Q

The Lake Formation security model consists of 3 security roles used to manage the lake. What are they?

A

-Lake Formation Administrator
-Database Creator
-Table Creator

19
Q

What permissions does the Lake Formation administrator have?

A

-Has full read access to resources
-Has data location permissions
-Can grant or revoke access to resources, including self
-Can create databases
-Can grant permission to create databases

20
Q

What permissions does the Lake Formation Database Creator have?

A

-Has all database permissions on databases that they create
-Has permissions on tables that they create
-Can use console or API to designate database creators

21
Q

What permissions does the Lake Formation Table Creator have?

A

-Has permissions on tables that they create
-Can grant permissions on tables that they create
-Can view databases containing the tables that they create

22
Q

The AWS Service for Data Mesh is called __________

A

DataZone

23
Q

What are the main Kinesis services?

A

-Kinesis Data Streams
-Kinesis Video Streams
-Kinesis Firehose
-Kinesis Analytics

24
Q

What are the default and the max Kinesis Data Streams data retention?

A

-Default: 24 hours
-Max: 365 days

25
True or False: Kinesis Data Streams data can only be deleted using the AWS SDK
False. Records cannot be deleted manually; they are removed only when the retention period expires.
26
What are the 2 Kinesis Data Streams provisioning types?
-On-demand
-Provisioned
27
Kinesis Data Streams data is stored inside ______ that make up _______
-Shards
-Streams
28
What are the parts of a record in a Kinesis Data Streams shard?
-Partition key: Determines which shard the record is written to
-Sequence number: Identifies the record's position inside the shard
-Data: Contains up to 1MB of data
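Kinesis routes a record by taking the MD5 hash of its partition key, which gives a 128-bit integer, and picking the shard whose hash key range contains that value. The sketch below illustrates the idea only. The function name `shard_for_key` is hypothetical, and it assumes the shards split the hash space evenly, which is not guaranteed on a real stream after splits and merges.

```python
import hashlib

MAX_HASH = 2**128  # an MD5 digest is a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range covers the key.

    Assumption: shard_count shards divide the 128-bit hash space into
    equal ranges. Real shards carry explicit start/end hash keys.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = MAX_HASH // shard_count
    # Clamp so the leftover values at the top of the space go to the last shard.
    return min(hash_value // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):
        print(key, "->", shard_for_key(key, 4))
```

Records with the same partition key always land on the same shard, which is why ordering is preserved per key.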
29
What are the Kinesis Data Streams data producers?
-The AWS SDK
-The AWS Kinesis Agent
-The Kinesis Producer Library (KPL)
30
True or False: A Kinesis Data Stream shard can have multiple producers and consumers
True. Multiple producers can write to the same shard, and multiple consumers can read from it.
31
What are the Kinesis Data Streams data consumers?
-AWS SDK
-Kinesis Client Library
-Lambda
32
How much data can you write to Kinesis Data Streams per second?
-1MB/s or 1000 records/s per shard
33
How much data can you read from Kinesis Data Streams per second?
-2MB/s or 5 API calls per second per shard
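The per-shard write and read limits on these cards can be turned into a rough shard-count estimate for a provisioned stream. This is a minimal sketch: `shards_needed` is a hypothetical helper, it uses only the figures quoted on the cards (1MB/s and 1000 records/s in, 2MB/s out), and it ignores the 5 read calls per second limit and any headroom you would want in practice.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # write limit per shard, MB/s
WRITE_RECORDS_PER_SHARD = 1000  # write limit per shard, records/s
READ_MB_PER_SHARD = 2.0         # read limit per shard, MB/s


def shards_needed(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies every per-shard limit."""
    by_write_bytes = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    print(shards_needed(write_mb_s=3.5, write_records_s=2500, read_mb_s=10))
```

In the example call the read side is the bottleneck: 10MB/s of reads needs 5 shards, while the writes alone would need 4.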
34
True or False: When using enhanced fan-out consumers with Kinesis Data Streams, you cannot poll for data with the read APIs, since it is a push model
True
35
AWS Kinesis Firehose can send data to 3 possible destinations: AWS, 3rd party partner or Custom (Any HTTP endpoint). What are the possible AWS destinations for Kinesis Firehose Data?
-S3
-Redshift
-OpenSearch
36
What's the minimum latency for Kinesis Firehose to write data to its destination?
60s
37
True or False: Kinesis Firehose updates data in real time
False, near real time (60s delay at least)
38
What AWS Service can you use to perform custom treatments on AWS Kinesis Firehose Data? (Select one): -Glue -Lambda -EMR -SageMaker
-Lambda
39
How do Kinesis Firehose buffers work?
The buffer has 2 main parameters: BufferSize (in MB) and BufferTime (in seconds). Firehose writes the buffered data to the destination as soon as either limit is reached, whichever comes first.
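The flush rule can be sketched as a small in-memory buffer. This is an illustration only, not Firehose's implementation: the class `BufferSketch` and its parameters are hypothetical, sizes are in bytes rather than MB to keep the example small, and the time limit is checked only when a record arrives, whereas the real service flushes on its own timer.

```python
class BufferSketch:
    """Collects records and flushes when a size or time limit is hit."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes      # stands in for the buffer size limit
        self.max_seconds = max_seconds  # stands in for the buffer time limit
        self._records = []
        self._bytes = 0
        self._opened_at = None          # arrival time of the first buffered record

    def add(self, record: bytes, now: float):
        """Buffer a record; return the flushed batch if a limit was reached, else None."""
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._bytes += len(record)
        size_hit = self._bytes >= self.max_bytes
        time_hit = (now - self._opened_at) >= self.max_seconds
        if size_hit or time_hit:
            return self.flush()
        return None

    def flush(self):
        """Hand back everything buffered so far and start a fresh buffer."""
        batch = self._records
        self._records, self._bytes, self._opened_at = [], 0, None
        return batch


if __name__ == "__main__":
    buf = BufferSketch(max_bytes=10, max_seconds=60)
    print(buf.add(b"abcd", now=0))   # below both limits
    print(buf.add(b"efgh", now=1))   # still below both limits
    print(buf.add(b"ij", now=2))     # size limit reached, batch is returned
```

A low-traffic source tends to hit the time limit first and a high-traffic source the size limit, which is why both parameters exist.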
40
True or False: Kinesis Firehose stores data that passes through it for 7 days
False, it does not store data at all
41
What are the main use cases for Kinesis Data Analytics?
-Streaming ETL (only simple transformations)
-Continuous Metric Generation
-Responsive analytics for certain metrics
42
What languages does Kinesis Data Analytics accept?
Flink and SQL
43
True or False: You can use AWS Lambda to pre-process Kinesis Data Analytics data
True
44
What's the name of the AWS Service used to run Apache Kafka inside of AWS?
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
45
True or False: Amazon MSK has both provisioned and serverless options
True
46
What are the differences between Kinesis Data Streams and Amazon MSK regarding:
-Message Size
-Data organization
-Structure resizing
-Encryption
-KDS has a max message size of 1MB, while MSK has a default size of 1MB that can be increased up to 10MB
-KDS has data streams with shards, while MSK has Kafka Topics with Partitions
-KDS accepts shard splitting and merging, while MSK can only add partitions to a topic
-Both support KMS and TLS, but MSK also allows unencrypted (plaintext) in-flight traffic
47
What are the accepted data consumers for Amazon MSK?
-Lambda
-Glue
-Kinesis Analytics
-Custom applications running on EC2, ECS, etc
48
What are the main EMR use cases?
Big Data ML, Big Data Processing, etc
49
True or False: EMR means Elastic MapReduce, and it creates Fargate Hadoop clusters to analyze and process vast amounts of data
False, EMR runs on EC2 clusters
50
What are the EMR node types and what are their functions?
-Master Node: Manages the cluster, coordinates the other nodes, and monitors health
-Core Node: Runs tasks and stores data
-Task Node (Optional): Only runs tasks, usually on Spot instances
51
What are the EMR purchasing options?
-On-demand
-Reserved (Min 1 year)
-Spot Instances
52
What are the types of EMR Instance Groups?
-Uniform instance groups: All nodes have the same instance type and configurations
-Instance fleet: Select target capacity, mix instance types and purchasing options
53
True or False: Neither type of EMR Instance Group supports auto-scaling
False, Uniform Instance Groups have Auto-Scaling
54
True or False: AWS Glue is fully serverless
True
55
What is Amazon QuickSight?
It's a BI tool offered by AWS to visualize data from multiple different sources
56
To control access to dashboards, QuickSight uses _________ and ________
Users and Groups