Big Data Flashcards
What is Redshift?
A fully managed, petabyte-scale data warehouse service in the cloud.
How much data can Redshift hold?
Up to 16 petabytes
Is Redshift relational?
Yes
What is a typical use case for Redshift?
Business intelligence (BI) and analytics
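Is Redshift just a better RDS?
No. Redshift is built for analytical (OLAP) workloads and is not meant to replace RDS databases, which serve transactional (OLTP) workloads.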
What is EMR?
A managed big data platform for processing vast amounts of data with open-source frameworks such as Apache Hadoop and Apache Spark (often used as AWS's ETL tool)
What is Amazon Kinesis?
A service for ingesting, processing, and analyzing real-time streaming data (think of it as a huge data highway).
What is Kinesis Data Streams for?
Ingesting real-time streaming data so that your own consumers can process it
What is Kinesis Data Firehose for?
Delivering streaming data to destinations such as S3, Redshift, Elasticsearch (now Amazon OpenSearch Service), or Splunk
What is the downside of Kinesis Data Streams?
It takes a lot of work to set up: you must specify the number of shards and build the data consumers.
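Part of that setup is deciding how many shards the stream has. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below reproduces the idea locally, assuming the shards split the key space into equal ranges (as they do when a stream is first created with a given shard count); `shard_ranges` and `shard_for_key` are hypothetical helpers, not AWS SDK calls.

```python
import hashlib

# Hypothetical helpers: a local model of shard routing, not an AWS SDK API.
# Assumes shards split the MD5 key space evenly.
HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the 128-bit hash key space into equal, contiguous, inclusive ranges."""
    size = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose range contains the key's MD5 value."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(shard_ranges(shard_count)):
        if start <= hashed <= end:
            return index
    raise ValueError("hash value outside key space")  # unreachable for valid input
```

Because a given partition key always maps to the same shard, records with that key stay in order, and spreading load across shards depends on using many distinct keys.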
What can Kinesis Data Firehose be thought of as?
A simpler data stream: fully managed, with no shards or consumers to manage
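The difference shows up in the producer call. With the AWS SDK for Python (boto3), writing to a data stream requires a partition key (which selects the shard), while writing to a Firehose delivery stream takes only the data. The sketch below takes the boto3 clients as parameters so it can be exercised without AWS; the helper names and stream names are hypothetical.

```python
import json
from typing import Any


def send_to_data_stream(kinesis_client: Any, stream_name: str, event: dict, key: str) -> dict:
    # Kinesis Data Streams: the caller must supply a partition key, which picks the shard.
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=key,
    )


def send_to_firehose(firehose_client: Any, delivery_stream: str, event: dict) -> dict:
    # Kinesis Data Firehose: no shards or partition keys. Firehose buffers records
    # and delivers them to the configured destination (e.g. S3 or Redshift).
    # A trailing newline keeps records separable once they are concatenated in S3.
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream,
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )


# Usage with real clients (requires AWS credentials and existing streams):
#   import boto3
#   send_to_data_stream(boto3.client("kinesis"), "clickstream", {"page": "/"}, key="user-42")
#   send_to_firehose(boto3.client("firehose"), "clickstream-to-s3", {"page": "/"})
```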
What is Kinesis Data Analytics?
A service that lets you analyze data in the pipeline, as it streams, using standard SQL
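Streaming SQL queries commonly aggregate records over time windows. As a plain-Python illustration of that idea (not the Kinesis Data Analytics API), the sketch below counts events per key in one-minute tumbling (fixed, non-overlapping) windows; the `(timestamp_seconds, key)` record shape and the function name are assumptions.

```python
from collections import Counter, defaultdict
from typing import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[float, str]], window_seconds: int = 60
) -> dict[int, Counter]:
    """Group (timestamp, key) events into fixed, non-overlapping windows and count keys.

    Each window is identified by its start time in seconds.
    """
    windows: dict[int, Counter] = defaultdict(Counter)
    for timestamp, key in events:
        # Every timestamp falls into exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


events = [(0.5, "GET"), (30.0, "POST"), (59.9, "GET"), (61.0, "GET")]
# tumbling_window_counts(events) -> {0: Counter({'GET': 2, 'POST': 1}), 60: Counter({'GET': 1})}
```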
When would you choose Kinesis over SQS for messages?
If messages need real-time delivery
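Does Kinesis Data Streams or Kinesis Data Firehose scale automatically?
Kinesis Data Firehose. Kinesis Data Streams (in provisioned mode) requires you to manage shard capacity yourself.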
What is Amazon Athena?
An interactive query service that makes it easy to analyze data in S3 using SQL. It lets you query data directly in S3 without loading it into a database.
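A minimal sketch of running an Athena query from Python with boto3's `start_query_execution` and `get_query_execution`, polling until the query finishes. It assumes a table (here the hypothetical `weblogs.access_logs`) has already been defined over the S3 data, for example in the AWS Glue Data Catalog, and that the results bucket exists; the client is passed in so the function can be tested without AWS.

```python
import time
from typing import Any


def run_athena_query(
    athena_client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> str:
    """Start an Athena query over data in S3 and wait until it finishes.

    Returns the query execution ID; raises RuntimeError if the query does not succeed.
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # results land in S3
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


# Usage with a real client (hypothetical database, table, and bucket):
#   import boto3
#   athena = boto3.client("athena")
#   qid = run_athena_query(athena, "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
#                          "weblogs", "s3://my-athena-results/")
#   rows = athena.get_query_results(QueryExecutionId=qid)
```

No database server is provisioned at any point: the query runs against the files in S3, which is why Athena is the usual answer for "serverless SQL".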
What is AWS Glue?
A serverless service that allows you to perform ETL workloads without managing underlying servers
If you ever need serverless SQL, what should you think of?
Athena
What is Amazon QuickSight?
A fully managed business intelligence data visualization service
What is AWS Data Pipeline?
A managed ETL service for automating the movement and transformation of your data
What is a pipeline definition in AWS Data Pipeline?
Where you specify the business logic of your data management: data sources and destinations, activities, schedules, and preconditions
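A pipeline definition is usually written as JSON made of objects that reference each other by `id`. The sketch below builds a hypothetical two-step definition as a Python dict: `Transform` depends on `CopyRaw`, both activities allow two retries, and both point `onFail` at an `SnsAlarm`. The object types and field names are based on the AWS Data Pipeline object reference, but verify them against current documentation; all IDs, commands, and the topic ARN are placeholders, and `unresolved_refs` is a hypothetical helper.

```python
import json

# Hypothetical pipeline: copy data, then transform it once the copy has succeeded.
pipeline_definition = {
    "objects": [
        {
            "id": "Default",
            "name": "Default",
            "scheduleType": "ondemand",
            "failureAndRerunMode": "CASCADE",
            "role": "DataPipelineDefaultRole",
            "resourceRole": "DataPipelineDefaultResourceRole",
        },
        {"id": "Worker", "name": "Worker", "type": "Ec2Resource", "terminateAfter": "1 Hour"},
        {
            "id": "CopyRaw",
            "name": "CopyRaw",
            "type": "ShellCommandActivity",
            "command": "echo copy",  # placeholder for the real copy step
            "runsOn": {"ref": "Worker"},
            "maximumRetries": "2",
            "onFail": {"ref": "FailureAlarm"},
        },
        {
            "id": "Transform",
            "name": "Transform",
            "type": "ShellCommandActivity",
            "command": "echo transform",  # placeholder for the real transform step
            "runsOn": {"ref": "Worker"},
            "dependsOn": {"ref": "CopyRaw"},  # data-driven: waits for CopyRaw to succeed
            "maximumRetries": "2",
            "onFail": {"ref": "FailureAlarm"},
        },
        {
            "id": "FailureAlarm",
            "name": "FailureAlarm",
            "type": "SnsAlarm",
            "topicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
            "subject": "Pipeline activity failed",
            "message": "A pipeline activity failed.",
            "role": "DataPipelineDefaultRole",
        },
    ]
}


def unresolved_refs(definition: dict) -> list[str]:
    """Return every {"ref": ...} target that does not match a defined object id."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append(value["ref"])
    return missing


definition_json = json.dumps(pipeline_definition, indent=2)
```

The JSON string could then be saved to a file and uploaded with `aws datapipeline put-pipeline-definition --pipeline-id <id> --pipeline-definition file://pipeline.json`.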
How do you create dependencies between tasks and activities?
With data-driven workflows: a task runs only after the tasks it depends on have completed successfully.
What service can you use with AWS Data Pipeline to alert you of failures?
Amazon SNS (Simple Notification Service)
Does AWS Data Pipeline automatically retry failed activities in data-driven workflows?
Yes
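To make data-driven dependencies, retries, and failure alerts concrete, here is a small local simulation (not AWS Data Pipeline itself): activities run in dependency order, a failed activity is retried up to a limit, a `notify` callback standing in for an SNS alarm fires when retries are exhausted, and downstream activities are skipped. `run_pipeline`, its parameters, and the status strings are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    activities: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
    max_retries: int,
    notify: Callable[[str], None],
) -> dict[str, str]:
    """Run activities in dependency order with retries; return each activity's final status."""
    unknown = {dep for deps in depends_on.values() for dep in deps} - set(activities)
    if unknown:
        raise ValueError(f"dependencies on undefined activities: {sorted(unknown)}")

    status: dict[str, str] = {}
    graph = TopologicalSorter({name: depends_on.get(name, set()) for name in activities})
    for name in graph.static_order():
        # Data-driven: run only if every upstream activity succeeded.
        if any(status.get(dep) != "SUCCEEDED" for dep in depends_on.get(name, set())):
            status[name] = "SKIPPED"
            continue
        for _ in range(1 + max_retries):  # first attempt plus retries
            try:
                activities[name]()
                status[name] = "SUCCEEDED"
                break
            except Exception:
                status[name] = "FAILED"
        if status[name] == "FAILED":
            notify(f"{name} failed after {max_retries} retries")  # stand-in for SNS
    return status
```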
What does Amazon MSK stand for?
Amazon Managed Streaming for Apache Kafka