Chapter 5 - ANALYTICS: Amazon Athena, Amazon EMR, Amazon Kinesis, Amazon Redshift, AWS Glue, AWS Data Pipeline, Amazon QuickSight, AWS Lake Formation, Elasticsearch Flashcards

1
Q

Which AWS service will you use for real-time analytics of streaming data such as IoT telemetry data, application logs, and website clickstreams?

  1. Amazon Athena
  2. Amazon Kinesis
  3. Amazon Elasticsearch Service
  4. Amazon QuickSight
A
  2. Amazon Kinesis
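A producer application forwards such events to a Kinesis data stream with `PutRecord`. A minimal sketch of assembling that request with the boto3 parameter shape (the stream name and event payload are hypothetical):

```python
import json

def build_put_record(stream_name, event, partition_key):
    """Assemble keyword arguments for a Kinesis PutRecord call.

    The actual call would be boto3.client("kinesis").put_record(**kwargs);
    here we only build the request so its shape is easy to inspect.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # payload must be bytes
        "PartitionKey": partition_key,              # determines the target shard
    }

# Hypothetical clickstream event
kwargs = build_put_record(
    "clickstream",                       # assumed stream name
    {"page": "/home", "user": "u-42"},
    "u-42",
)
```

Keying the partition on a user or session identifier keeps one user's events ordered within a single shard.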
2
Q

Which of the following are Kinesis services? Choose 4.

  1. Kinesis Video Streams
  2. Kinesis Data Streams
  3. Kinesis Data Firehose
  4. Kinesis QuickSight
  5. Kinesis Data Analytics
A
  1. Kinesis Video Streams
  2. Kinesis Data Streams
  3. Kinesis Data Firehose
  5. Kinesis Data Analytics
3
Q

You want to collect log and event data from sources such as servers, desktops, and mobile devices and then have a custom application continuously process the data, generate metrics, power live dashboards, and emit aggregated data into stores such as Amazon S3. Which main AWS service will you use?

  1. Kinesis Data Streams
  2. Kinesis Data Firehose
  3. Kinesis Video Streams
  4. Kinesis Data Analytics
A
  1. Kinesis Data Streams
4
Q

Which of the following are ideal use cases for Kinesis Data Streams? Choose 3.

  1. Real time data analytics
  2. Long term data storage and analytics
  3. Log and data feed intake and processing
  4. Real time metrics and reporting
  5. ETL Batch jobs
A
  1. Real time data analytics
  3. Log and data feed intake and processing
  4. Real time metrics and reporting
5
Q

What are features of Amazon Redshift? Choose 3.

  1. Fully managed data warehouse service.
  2. Allows you to run complex analytic queries against petabytes of structured data using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution.
  3. Also includes Amazon Athena, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3 data lakes
  4. Also includes Amazon Redshift Spectrum, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3 data lakes
  5. Fully managed data lake service.
A
  1. Fully managed data warehouse service.
  2. Allows you to run complex analytic queries against petabytes of structured data using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution.
  4. Also includes Amazon Redshift Spectrum, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3 data lakes.
6
Q

You are working as a solutions architect for a financial services company that is planning to create a new data warehouse solution leveraging Amazon Redshift. The raw data will first be exported to S3 and an EMR cluster and then copied into Redshift. The query results will be exported to another S3 data lake. How can you ensure that all data exchange (COPY, UNLOAD) between Redshift and other AWS resources does not traverse the internet, while also leveraging VPC security and monitoring features?

  1. Use AWS Glue to copy and upload data to Redshift cluster
  2. Use AWS Data pipeline to copy and upload data to Redshift cluster
  3. Enable enhanced VPC routing on your Redshift cluster
  4. Enable VPC flow logs on your Redshift cluster
A
  3. Enable enhanced VPC routing on your Redshift cluster
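Enhanced VPC routing can be toggled on an existing cluster via the `ModifyCluster` API. A hedged sketch of building the boto3 call parameters (the cluster identifier is hypothetical):

```python
def modify_cluster_kwargs(cluster_id, enable=True):
    """Arguments for redshift.modify_cluster that force COPY/UNLOAD
    traffic through the VPC instead of over the public internet."""
    return {
        "ClusterIdentifier": cluster_id,
        "EnhancedVpcRouting": enable,
    }

kwargs = modify_cluster_kwargs("sales-dw")  # hypothetical cluster name
# boto3.client("redshift").modify_cluster(**kwargs) would apply it
```

With enhanced VPC routing on, COPY/UNLOAD traffic is subject to the VPC's security groups, route tables, and VPC Flow Logs.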
7
Q

Which AWS service will you use for business analytics dashboards and visualizations?

  1. Amazon Athena
  2. Amazon EMR
  3. Amazon Elasticsearch Service
  4. Amazon QuickSight
A
  4. Amazon QuickSight
8
Q

You are the solutions architect for a national retail chain with stores in major cities. Each store uses an on-premises application for sales transactions. At the end of the day, at 11 pm, data from each store should be uploaded to Amazon storage, amounting to more than 30 TB of data; the data should then be processed in Hadoop and the results stored in a data warehouse. Which combination of AWS services will you use?

  1. Amazon Data Pipeline, Amazon S3, Amazon EMR, Amazon DynamoDB
  2. Amazon Data Pipeline, Amazon Elastic Block Storage, Amazon S3, Amazon EMR, Amazon Redshift
  3. Amazon Data Pipeline, Amazon S3, Amazon EMR, Amazon Redshift
  4. Amazon Data Pipeline, Amazon Kinesis, Amazon S3, Amazon EMR, Amazon Redshift, Amazon EC2
A
  3. Amazon Data Pipeline, Amazon S3, Amazon EMR, Amazon Redshift
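The final step of such a pipeline is typically a Redshift COPY that loads the processed S3 output in parallel. A minimal sketch of assembling the statement (the table, bucket prefix, and IAM role are hypothetical):

```python
def build_copy_statement(table, s3_path, iam_role):
    """Assemble a Redshift COPY statement that loads S3 data in parallel
    across the cluster's slices."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )

sql = build_copy_statement(
    "daily_sales",
    "s3://example-bucket/sales/2024-01-15/",            # hypothetical prefix
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",  # hypothetical role
)
```

Pointing COPY at a prefix rather than a single file lets Redshift split the load across multiple files for parallelism.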
9
Q

Which AWS Analytics service gives you the ability to process nearly unlimited streams of data?

  1. Amazon Kinesis Streams
  2. Amazon Kinesis Firehose
  3. Amazon EMR
  4. Amazon Redshift
A
  1. Amazon Kinesis Streams
10
Q

Which of the following are scenarios where Amazon QuickSight cannot be used?

  1. Highly formatted canned Reports
  2. Quick interactive ad-hoc exploration and optimized visualization of data. Create and share dashboards and KPI’s to provide insight into your data
  3. Analyze and visualize data in various AWS resources, e.g., Amazon RDS databases, Amazon Redshift, Amazon Athena, and Amazon S3.
  4. Analyze and visualize data from on premise databases like SQL Server, Oracle, PostgreSQL, and MySQL
  5. Analyze and visualize data in data sources that can be connected to using JDBC/ODBC connection.
A
  1. Highly formatted canned reports
11
Q

Which of the following AWS services can you leverage to analyze logs for customer-facing applications and websites? Choose 2.

  1. Amazon S3
  2. Amazon Elasticsearch
  3. Amazon Athena
  4. Amazon CloudWatch
A
  2. Amazon Elasticsearch
  3. Amazon Athena
12
Q

Which AWS service will you use for data warehouse and analytics requirements?

  1. DynamoDB
  2. Aurora
  3. Redshift
  4. S3
A
  3. Redshift
13
Q

Which AWS database service will you choose for Online Analytical Processing (OLAP)?

  1. Amazon RDS
  2. Amazon Redshift
  3. Amazon Glacier
  4. Amazon DynamoDB
A
  2. Amazon Redshift
14
Q

Which AWS service reduces the complexity and upfront costs of setting up Hadoop by providing a fully managed, on-demand Hadoop framework?

  1. Amazon Redshift
  2. Amazon Kinesis
  3. Amazon EMR
  4. Amazon Hadoop
A
  3. Amazon EMR
15
Q

Which of the following use cases is not well suited for Amazon EMR?

  1. Log processing and analytics
  2. Large extract, transform, and load (ETL) data movement
  3. Ad targeting and click stream analytics
  4. Genomics, Predictive analytics, Ad hoc data mining and analytics
  5. Small Data Set and ACID transaction requirements
  6. Risk modeling and threat analytics
A
  5. Small data sets and ACID transaction requirements
16
Q

You want to do clickstream analysis of a website to detect user behavior by analyzing the sequence of clicks a user makes, the amount of time the user spends, where they usually begin the navigation, and how it ends. By tracking this user behavior in real time, you want to update recommendations, perform advanced A/B testing, push notifications based on session length, and much more. Which AWS services will you use to ingest the captured clickstream data and analyze the sessions?

  1. Data ingestion: Kinesis Data Streams; sessionization analytics: Kinesis Data Analytics
  2. Data ingestion: Kinesis Data Firehose; sessionization analytics: Kinesis Data Analytics
  3. Data ingestion: Kinesis Data Streams; sessionization analytics: AWS Glue
  4. Data ingestion: Kinesis Data Streams; sessionization analytics: Amazon EMR
A
  1. Data ingestion: Kinesis Data Streams; sessionization analytics: Kinesis Data Analytics
17
Q

Which Kinesis service is integrated with Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service?

  1. Kinesis Data Streams
  2. Kinesis Data Firehose
  3. Kinesis Quicksight
  4. Kinesis Data Analytics
A
  2. Kinesis Data Firehose
18
Q

Your company has recently migrated on-premises applications to AWS, deploying them in VPCs. For proactive monitoring and audit purposes, they want to continuously analyze the CloudTrail event logs to collect operational metrics in real time. For example:

  • Total calls by IP, service, API call, IAM user
  • Amazon EC2 API failures (or any other service)
  • Anomalous behavior of Amazon EC2 API (or any other service)
  • Top 10 API calls across all services

Which AWS services will you use?

  1. S3, Kinesis Data Analytics, Lambda, DynamoDB
  2. EC2, S3, Kinesis Data Analytics, DynamoDB
  3. EC2, S3, Kinesis Data Analytics, Lambda, DynamoDB
  4. Kinesis Data Firehose, S3, Kinesis Data Analytics, Lambda, DynamoDB
A
  4. Kinesis Data Firehose, S3, Kinesis Data Analytics, Lambda, DynamoDB
19
Q

Which of the following are features of the EMR HDFS file system? Choose 4.

  1. It is a distributed, scalable, and portable file system for Hadoop. HDFS is an implementation of the Hadoop FileSystem API, which models POSIX file system behavior.
  2. It allows clusters to store data in Amazon S3.
  3. Instance store and/or EBS volume storage is used for HDFS data.
  4. Amazon EBS volumes attached to EMR clusters are ephemeral: the volumes are deleted upon cluster and instance termination.
  5. HDFS is common choice for persistent clusters.
  6. HDFS is common choice for transient clusters.
A
  1. It is a distributed, scalable, and portable file system for Hadoop. HDFS is an implementation of the Hadoop FileSystem API, which models POSIX file system behavior.
  3. Instance store and/or EBS volume storage is used for HDFS data.
  4. Amazon EBS volumes attached to EMR clusters are ephemeral: the volumes are deleted upon cluster and instance termination.
  5. HDFS is a common choice for persistent clusters.
20
Q

Scenario for Q20-Q21. ABC Tolls operates toll highways throughout the country. Customers that register with ABC Tolls receive a transceiver for their automobile. When a customer drives through a tolling area, a sensor receives information from the transceiver and records details of the transaction to a relational database. Their current solution stores records in a file system as part of a batch process. ABC Tolls has a traditional batch architecture: each day, a scheduled extract-transform-load (ETL) process runs that processes the daily transactions and transforms them so they can be loaded into their Amazon Redshift data warehouse. The next day, the ABC Tolls business analysts review the data using a reporting tool. In addition, once a month (at the end of the billing cycle) another process aggregates all the transactions for each of the ABC Tolls customers to calculate their monthly payment. ABC Tolls would like to make some modifications to its system. Q20-Q21 are each specific to one requirement.

The first requirement comes from the business analyst team. They have asked for the ability to run reports from their data warehouse with data that is no older than 30 minutes. The ABC Tolls engineering team determines that their current architecture needs some modifications to support this requirement. They have decided to build a streaming data ingestion and analytics system. Which of the following statements are correct to meet this requirement? Choose 2.

  1. Create a Kinesis Firehose delivery stream and configure it so that it would copy data to their Amazon Redshift table every 15 minutes.
  2. Use the Amazon Kinesis Agent on servers to forward their data to Kinesis Firehose.
  3. Create a Kinesis data stream and configure it so that it would copy data to their Amazon Redshift table every 15 minutes.
  4. Use the Amazon Kinesis Agent on servers to forward their data to Kinesis data stream.
A
  1. Create a Kinesis Firehose delivery stream and configure it so that it would copy data to their Amazon Redshift table every 15 minutes.
  2. Use the Amazon Kinesis Agent on servers to forward their data to Kinesis Firehose.
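The 15-minute cadence maps to the buffering hints Firehose applies to its S3 staging area before issuing each Redshift COPY. A sketch of building that configuration fragment (the 64 MB size is an assumed value, and the interval limits reflect the documented 60-900 second range):

```python
def buffering_hints(minutes, size_mb=64):
    """BufferingHints for the S3 staging area of a Firehose -> Redshift
    delivery stream; Firehose issues a COPY after each buffer flush."""
    seconds = minutes * 60
    if not 60 <= seconds <= 900:
        raise ValueError("IntervalInSeconds must be between 60 and 900")
    return {"IntervalInSeconds": seconds, "SizeInMBs": size_mb}

hints = buffering_hints(15)  # flush (and COPY) every 15 minutes
```

Firehose delivers when either the interval or the size threshold is reached first, so data can arrive sooner than 15 minutes under heavy load.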
21
Q

Scenario for Q20-Q21: same ABC Tolls scenario as described in Q20 above.

ABC Tolls is also developing a new mobile application for its customers. While developing the application, they decided to create some new features. One feature will give customers the ability to set a spending threshold for their account. If a customer’s cumulative toll bill surpasses this threshold, ABC Tolls wants to send an in-application message to notify the customer that the threshold has been breached, within 10 minutes of the breach occurring. Which AWS services will they use to deliver this feature so that the solution is scalable and cost-effective?

  1. Kinesis Analytics, Kinesis Streams, EC2, SNS, DynamoDB
  2. Kinesis Analytics, Kinesis Streams, Lambda, SQS, DynamoDB
  3. Kinesis Analytics, Kinesis Streams, EC2, SQS, DynamoDB
  4. Kinesis Analytics, Kinesis Streams, Lambda, SNS, DynamoDB
A
  4. Kinesis Analytics, Kinesis Streams, Lambda, SNS, DynamoDB
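The core of the feature is a running total compared against the customer's limit. The stream-side logic reduces to something like the sketch below, where the in-memory dict stands in for the DynamoDB item and the breach flag would drive the SNS publish (all names are hypothetical):

```python
def process_toll(totals, customer_id, amount, threshold):
    """Accumulate a customer's toll spend and report whether this
    transaction pushes the cumulative total past the threshold.

    `totals` stands in for a DynamoDB table keyed by customer; a breach
    would be published to SNS to drive the in-app notification.
    """
    before = totals.get(customer_id, 0.0)
    after = before + amount
    totals[customer_id] = after
    breached_now = before <= threshold < after  # fires only on the crossing
    return after, breached_now

totals = {}
process_toll(totals, "cust-1", 8.0, 10.0)              # still under threshold
_, breach = process_toll(totals, "cust-1", 3.0, 10.0)  # crosses 10.0 -> notify
```

Flagging only the transaction that crosses the threshold avoids sending a notification on every subsequent toll.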
22
Q

You want to leverage Amazon Web Services to build an end-to-end log analytics solution that collects, ingests, processes, and loads both batch and streaming data, and makes the processed data available to your users in near real time, in the analytics systems they are already using. The solution should be highly reliable and cost-effective, scale automatically to varying data volumes, and require almost no IT administration. It should be extensible to use cases such as analyzing log data from websites, mobile devices, servers, sensors, and more, for applications such as digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. Which services will you use to build this log analytics solution?

  1. Kinesis Firehose, Kinesis Analytics, S3, Elasticsearch, Kibana
  2. Kinesis Firehose, Kinesis Analytics, RDS Aurora, Elasticsearch, Kibana
  3. Kinesis Firehose, Kinesis Analytics, DynamoDB, Elasticsearch, Kibana
  4. EC2, Lambda, S3, Elasticsearch, Kibana
A
  1. Kinesis Firehose, Kinesis Analytics, S3, Elasticsearch, Kibana
23
Q

You want to leverage AWS native components for clickstream analytics: collecting, analyzing, and reporting aggregate data about which webpages someone visits on your website and in what order. The solution should provide these capabilities:

  • Streaming data ingestion that can process millions of website clicks (clickstream data) a day from global websites.
  • Near real-time visualizations and recommendations, with web usage metrics that include events per hour, visitor count, web/HTTP user agents (e.g., a web browser), abnormal events, aggregate event count, referrers, and recent events.
  • A recommendation engine built on a data warehouse.
  • Analysis and visualizations of your clickstream data, both real-time and analytical.

Which AWS native services will you use to build this solution?

  1. Amazon IoT Core, Amazon Elasticsearch, Amazon S3, Amazon RDS, Amazon Redshift, Amazon QuickSight, Amazon Athena
  2. Amazon Kinesis Data Firehose, Amazon Elasticsearch, Amazon S3, Amazon Redshift, Amazon QuickSight, Amazon Athena
  3. Amazon IoT Core, Amazon Elasticsearch, Amazon S3, Amazon DynamoDB, Amazon Redshift, Amazon QuickSight, Amazon Athena
  4. Amazon EC2, Amazon Elasticsearch, Amazon S3, Amazon Redshift, Amazon QuickSight, Amazon Athena
A
  2. Amazon Kinesis Data Firehose, Amazon Elasticsearch, Amazon S3, Amazon Redshift, Amazon QuickSight, Amazon Athena
24
Q

Which of the following are features of the EMR EMRFS file system? Choose 3.

  1. EMRFS is common choice for persistent clusters.
  2. EMRFS is common choice for transient clusters.
  3. Is an implementation of the Hadoop file system used for reading and writing regular files from Amazon EMR directly to Amazon S3.
  4. Provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
A
  2. EMRFS is a common choice for transient clusters.
  3. Is an implementation of the Hadoop file system used for reading and writing regular files from Amazon EMR directly to Amazon S3.
  4. Provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
25
Q

Which AWS service is a fully managed, serverless ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores?

  1. AWS EMR
  2. AWS Data Pipeline
  3. AWS Glue
  4. AWS Data Migration Service
A
  3. AWS Glue
26
Q

Which of the following is not a use case for using AWS Glue?

  1. To build a data warehouse to organize, cleanse, validate, and format data.
  2. Run serverless queries against your Amazon S3 data lake.
  3. Create event-driven ETL pipelines with AWS Glue.
  4. Create business intelligence dashboard on top of data warehouse.
A
  4. Create business intelligence dashboard on top of data warehouse.
27
Q

You are using Kinesis Data Streams and Kinesis Data Analytics for ingestion and real-time clickstream analytics of a newly launched website. The solution worked fine while there were limited users, but as the popularity of the website increased you are observing performance issues and exceptions in the logs. What should you do to improve performance?

  1. Increase the number of shards in the Kinesis stream.
  2. Decrease the number of shards in the Kinesis stream.
  3. Replace the Kinesis data stream with Kinesis Data Firehose.
  4. Replace the Kinesis data stream with Lambda.
A
  1. Increase the number of shards in the Kinesis stream.
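Resharding decisions can be sized from the per-shard write limits of 1 MB/s and 1,000 records/s. A small estimator as a sketch (the example traffic numbers are hypothetical):

```python
import math

def shards_needed(mb_per_sec, records_per_sec):
    """Minimum open shards for a given ingest rate, using the
    per-shard write limits of 1 MB/s and 1,000 records/s."""
    by_bytes = math.ceil(mb_per_sec / 1.0)        # 1 MB/s per shard
    by_records = math.ceil(records_per_sec / 1000.0)  # 1,000 records/s per shard
    return max(1, by_bytes, by_records)

# e.g. 5 MB/s of clicks arriving at 12,000 records/s needs 12 shards
```

Whichever limit is hit first (bytes or record count) dictates the shard count, which is why throughput exceptions can appear even well below 1 MB/s per shard.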
28
Q

You are building an ETL solution for daily sales report analysis. All the regional headquarters in the country upload their sales data between 7 pm and 11 pm to an S3 bucket. Upon upload, each file should be transformed and loaded into a data warehouse. Which services will you use to design this solution in the most cost-effective way? Choose 2.

  1. Configure S3 event notification to trigger a lambda function which will kick start ETL job whenever a file is uploaded.
  2. Use AWS Glue for ETL and Redshift for Data warehouse
  3. Use AWS Data Pipeline for ETL and Redshift for Data warehouse
  4. Use AWS Glue for ETL and Amazon EMR for Data warehouse
A
  1. Configure S3 event notification to trigger a lambda function which will kick start ETL job whenever a file is uploaded.
  2. Use AWS Glue for ETL and Redshift for Data warehouse
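The event-driven trigger can be a small Lambda that reads the S3 event notification and starts a Glue job run per uploaded file. A sketch in which the job name (`sales-etl`) and the `--source_path` job argument are assumptions, with the Glue client injected so the logic runs without AWS credentials:

```python
def extract_uploads(s3_event):
    """Pull (bucket, key) pairs out of an S3 event notification payload."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in s3_event.get("Records", [])
    ]

def handler(event, context=None, glue=None):
    """Lambda entry point: start one Glue job run per uploaded file.

    `glue` would normally be boto3.client("glue"); it is injected here
    so the routing logic can be exercised locally.
    """
    runs = []
    for bucket, key in extract_uploads(event):
        args = {"--source_path": f"s3://{bucket}/{key}"}  # assumed job argument
        if glue is not None:
            glue.start_job_run(JobName="sales-etl", Arguments=args)
        runs.append(args)
    return runs
```

Because both Lambda and Glue are serverless, nothing runs (or bills compute) outside the 7-11 pm upload window.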
29
Q

You are launching an Amazon Redshift cluster in a virtual private cloud (VPC). What information do you need to provide apart from the VPC ID?

  1. Redshift Subnet Group
  2. DW Subnet Group
  3. Cluster Subnet Group
  4. DB Subnet Group
A
  3. Cluster Subnet Group
30
Q

What are the differences between AWS Glue and AWS Data Pipeline? Choose 3.

  1. AWS Glue provides a managed ETL service that runs on a serverless Apache Spark environment. AWS Data Pipeline provides a managed orchestration service that gives you greater flexibility in terms of the execution environment, access and control over the compute resources that run your code, as well as the code itself that does data processing.
  2. AWS Data Pipeline, you don’t have to worry about configuring and managing the underlying compute resources. AWS Glue launches compute resources in your account allowing you direct access to the Amazon EC2 instances or Amazon EMR clusters.
  3. AWS Glue, you don’t have to worry about configuring and managing the underlying compute resources. AWS Data Pipeline launches compute resources in your account allowing you direct access to the Amazon EC2 instances or Amazon EMR clusters.
  4. AWS Glue ETL jobs are Scala or Python based. AWS Data Pipeline, you can setup heterogeneous set of jobs that run on a variety of engines like Hive, Pig, etc.
  5. AWS Data Pipeline provides a managed ETL service that runs on a serverless Apache Spark environment. AWS Glue provides a managed orchestration service that gives you greater flexibility in terms of the execution environment, access and control over the compute resources that run your code, as well as the code itself that does data processing.
A
  1. AWS Glue provides a managed ETL service that runs on a serverless Apache Spark environment. AWS Data Pipeline provides a managed orchestration service that gives you greater flexibility in terms of the execution environment, access and control over the compute resources that run your code, as well as the code itself that does data processing.
  3. With AWS Glue, you don’t have to worry about configuring and managing the underlying compute resources. AWS Data Pipeline launches compute resources in your account, allowing you direct access to the Amazon EC2 instances or Amazon EMR clusters.
  4. AWS Glue ETL jobs are Scala or Python based. With AWS Data Pipeline, you can set up a heterogeneous set of jobs that run on a variety of engines like Hive, Pig, etc.
31
Q

Which of the following are components of AWS Data Pipeline? Choose 3.

  1. Pipeline definition
  2. Pipeline
  3. Data Catalog
  4. Task Runner
A
  1. Pipeline definition
  2. Pipeline
  4. Task Runner
32
Q

Which of the following are not components of AWS Glue? Choose 2.

  1. AWS Glue Data Node
  2. AWS Glue Console
  3. AWS Glue Data Catalog
  4. AWS Glue Job Scheduler
  5. AWS Glue Crawlers and Classifiers
  6. AWS Glue ETL Operations
  7. AWS Glue Jobs System
A
  1. AWS Glue Data Node
  4. AWS Glue Job Scheduler
33
Q

Which of the following are features of Amazon Athena? Choose 3.

  1. Is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
  2. Query jobs are executed on a clusters of EC2 instances.
  3. Is serverless, so there is no infrastructure to setup or manage, and you can start analyzing data immediately.
  4. Uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet and Avro.
  5. Is an interactive query service that makes it easy to analyze data in Amazon RDS using standard SQL
A
  1. Is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
  3. Is serverless, so there is no infrastructure to set up or manage, and you can start analyzing data immediately.
  4. Uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet, and Avro.
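An Athena interaction through boto3 is asynchronous: you submit the query with `start_query_execution` and fetch results later. A sketch of assembling those parameters (the database, query, and results bucket are hypothetical):

```python
def athena_query_kwargs(sql, database, output_s3):
    """Arguments for athena.start_query_execution; results land in the
    S3 output location and are retrieved later with get_query_results."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

kwargs = athena_query_kwargs(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    "weblogs",                       # assumed Glue/Athena database
    "s3://example-athena-results/",  # assumed results bucket
)
# boto3.client("athena").start_query_execution(**kwargs) would submit it
```

There is no cluster to size or start; the only infrastructure-like choice is where the result files are written.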
34
Q

Which of the following data sources is not supported by Amazon QuickSight?

  1. Amazon RDS, Amazon Aurora, Amazon Redshift
  2. Amazon Athena and Amazon S3
  3. Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
  4. EBS and EFS
  5. Connect to on-premises databases like SQL Server, MySQL and PostgreSQL
  6. Import data from SaaS applications like Salesforce
A
  4. EBS and EFS