AWS Data Analytics Flashcards
In a single data dashboard, Amazon ___________ can include AWS data, third-party data, big data, spreadsheet data, SaaS data, B2B data, and more.
QuickSight
CloudWatch detailed monitoring sends data from your EC2 instance to CloudWatch in ______ intervals.
1-minute
____________ is an ETL service that captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services.
Kinesis Data Firehose
When Kinesis Data Firehose is configured to send data to Redshift, behind the scenes it has to load the streaming data to _______ first and then issue a ______ command to move the data to Redshift.
S3… COPY…
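The two-step delivery in the card above can be sketched in plain Python that builds the kind of COPY statement run against Redshift once a batch has been staged in S3. This is only an illustration: the helper `build_copy_statement`, the table name, the bucket path, and the IAM role ARN are all hypothetical, and the exact options Firehose uses depend on how the delivery stream is configured.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads staged S3 objects into a table."""
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS JSON 'auto';"
    )


# All names below are hypothetical placeholders.
statement = build_copy_statement(
    table="clickstream_events",
    s3_uri="s3://example-firehose-staging/2024/01/01/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```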
Within Kinesis Data Analytics, _________ __________ are a windowing method that aggregates time-based, overlapping groups of data that arrive at inconsistent times.
stagger windows
What are the three windows you can use to process data in Kinesis Data Analytics?
- Stagger Windows
- Tumbling Windows
- Sliding Windows
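To make the difference between the three window types concrete, here is a simplified in-memory sketch. It is not Kinesis Data Analytics SQL. The sample events, the 10-second window size, and the three helper functions are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample data: (event_time_in_seconds, partition_key) pairs.
events = [(1, "a"), (4, "a"), (11, "b"), (14, "a"), (19, "b"), (31, "a")]


def tumbling_counts(events, size):
    """Count events in fixed, non-overlapping windows aligned to multiples of `size`."""
    counts = defaultdict(int)
    for ts, _key in events:
        counts[(ts // size) * size] += 1
    return dict(counts)


def sliding_counts(events, size):
    """For each event, count the events in the `size`-second interval ending at it."""
    times = sorted(ts for ts, _key in events)
    return {t: sum(1 for u in times if t - size < u <= t) for t in times}


def stagger_counts(events, size):
    """Per key, open a window at the first event that no open window for that key covers."""
    open_start = {}
    counts = defaultdict(int)
    for ts, key in sorted(events):
        start = open_start.get(key)
        if start is None or ts >= start + size:
            start = ts  # this event opens a new window for its key
            open_start[key] = start
        counts[(key, start)] += 1
    return dict(counts)


print(tumbling_counts(events, 10))
print(sliding_counts(events, 10))
print(stagger_counts(events, 10))
```

In this sketch, tumbling windows start on fixed boundaries, sliding windows are re-evaluated at every event, and stagger windows start when the first matching event for a key arrives, so windows for different keys can overlap.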
___________ includes a built-in ML algorithm that can easily provide reliable forecasts for your data.
Amazon QuickSight
_______ is a fast, open-source, distributed SQL query engine designed for interactive analytic queries over large datasets from multiple sources (built by Facebook).
Presto
AWS Glue ETL scripts can be coded in _________ or _________ .
Python… Scala…
Amazon Redshift automatically integrates with ________ but not with an ________ (for encryption keys).
AWS KMS… HSM…
With Amazon Redshift, you can’t migrate to an _______-encrypted cluster by modifying the cluster. Modifying the cluster works only for enabling _______ encryption.
HSM… KMS…
To load data from S3 to Redshift, you can use a __________ _________ that lists out the specific S3 paths you want to be copied over.
manifest file
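As a rough sketch of what such a manifest looks like, the snippet below builds one with Python's standard `json` module. The bucket and object names are hypothetical, and `build_manifest` is an illustrative helper, not an AWS API.

```python
import json


def build_manifest(s3_urls, mandatory=True):
    """Build a Redshift COPY manifest listing the exact S3 objects to load."""
    return {"entries": [{"url": url, "mandatory": mandatory} for url in s3_urls]}


# Hypothetical object paths.
manifest = build_manifest(
    [
        "s3://example-bucket/orders/part-0000.csv",
        "s3://example-bucket/orders/part-0001.csv",
    ]
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The resulting JSON is uploaded to S3, and the COPY command then points at the manifest object and includes the `MANIFEST` keyword instead of pointing at a key prefix.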
Using the AWS Glue crawler for compressed files will cause the run time to ____________.
increase… It will take longer because the crawler has to download and decompress the file before reading it.
AWS Glue ___________ crawls process only the folders that were added since the last crawler run, which can save significant time and cost.
incremental
To enable permissions between S3 and QuickSight, you would need to configure the permissions from the _________ console.
QuickSight
The _________ command re-sorts rows and reclaims space in either a specified table or all tables in the current database in Amazon Redshift.
VACUUM
If QuickSight connects to the data store by using a ________ ________, the data automatically refreshes when you open an associated dataset, analysis, or dashboard.
direct query
________ ______ is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
Amazon EMR
Can you use AWS Glue triggers to run a job directly after a crawler completes?
No, but you can create an AWS Glue workflow with two triggers: one for the crawler and one for the job. This will achieve the same effect.
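The two-trigger workflow described in the answer above might be laid out as the plain dictionaries below. They are shaped like the keyword arguments of the boto3 Glue client's `create_trigger` call, but no AWS call is made here. The workflow, crawler, job, and trigger names are hypothetical, and the field names should be checked against the current API reference before use.

```python
# Hypothetical resource names used only for illustration.
WORKFLOW = "example-etl-workflow"
CRAWLER = "example-raw-data-crawler"
JOB = "example-transform-job"

# Trigger 1: starts the crawler when the workflow is run on demand.
start_crawler_trigger = {
    "Name": "example-start-crawler",
    "WorkflowName": WORKFLOW,
    "Type": "ON_DEMAND",
    "Actions": [{"CrawlerName": CRAWLER}],
}

# Trigger 2: starts the job once the crawler has finished successfully.
start_job_trigger = {
    "Name": "example-start-job-after-crawl",
    "WorkflowName": WORKFLOW,
    "Type": "CONDITIONAL",
    "StartOnCreation": True,
    "Predicate": {
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": CRAWLER,
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    "Actions": [{"JobName": JOB}],
}

for trigger in (start_crawler_trigger, start_job_trigger):
    print(trigger["Name"], "->", trigger["Actions"])
```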
The capacity limits of an Amazon Kinesis data stream are defined by the ________ _____ ________ within the data stream.
number of shards
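A back-of-the-envelope shard estimate can be sketched as follows. It assumes provisioned mode with the commonly cited per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for shared-throughput reads. The workload figures and the helper `shards_needed` are made up for illustration.

```python
import math

# Assumed per-shard limits (provisioned mode, shared-throughput consumers).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies every per-shard limit."""
    return max(
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
        1,  # a stream always has at least one shard
    )


# Hypothetical workload: 4.5 MB/s in, 3,000 records/s in, 9 MB/s out.
print(shards_needed(4.5, 3000, 9.0))
```

Whichever limit is hit first determines the shard count, so a stream with small but very frequent records can need more shards than its byte volume alone suggests.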
If you want an EMR cluster's log files archived to Amazon S3, you must enable this feature __________ (while / after) launching the cluster.
while
Does Amazon SQS support real-time streaming of data?
No.
What are the two Amazon EMR cluster types (in terms of how long each cluster keeps running)?
(1) persistent / long-running
(2) transient
In Kinesis Data Streams, you can create up to _____ registered consumers per stream.
20