AWS Glue | Using AWS Glue vs. other AWS services Flashcards

1
Q

Do I have to use both AWS Glue Data Catalog and Glue ETL to use the service?

Using AWS Glue vs. other AWS services

AWS Glue | Analytics

A

No. While we do believe that using both the AWS Glue Data Catalog and ETL provides an end-to-end ETL experience, you can use either one of them independently without using the other.
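
As an illustration of using the Data Catalog on its own, with no Glue ETL job involved, the sketch below reads a table's column metadata through the Glue `get_table` API. The helper name `catalog_columns` is hypothetical, as are the database and table names in the usage note. The client is passed in so the function can be exercised without AWS credentials.

```python
def catalog_columns(glue_client, database, table):
    """Return (name, type) pairs for a table registered in the Glue Data Catalog.

    glue_client is assumed to behave like boto3.client("glue"); only its
    get_table(DatabaseName=..., Name=...) call is used here.
    """
    response = glue_client.get_table(DatabaseName=database, Name=table)
    # Column definitions live under the table's storage descriptor.
    columns = response["Table"]["StorageDescriptor"]["Columns"]
    return [(col["Name"], col["Type"]) for col in columns]
```

With boto3 installed and credentials configured, a call might look like `catalog_columns(boto3.client("glue"), "sales_db", "orders")`, where `sales_db` and `orders` are placeholder names.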

2
Q

When should I use AWS Glue vs. AWS Data Pipeline?

A

AWS Glue provides a managed ETL service that runs on a serverless Apache Spark environment. This allows you to focus on your ETL job and not worry about configuring and managing the underlying compute resources. AWS Glue takes a data-first approach: you focus on the data's properties and on the manipulations needed to transform it into a form from which you can derive business insights. It provides an integrated data catalog that makes metadata available for ETL as well as for querying via Amazon Athena and Amazon Redshift Spectrum.

AWS Data Pipeline provides a managed orchestration service that gives you greater flexibility in terms of the execution environment, access and control over the compute resources that run your code, as well as the code itself that does data processing. AWS Data Pipeline launches compute resources in your account allowing you direct access to the Amazon EC2 instances or Amazon EMR clusters.

Furthermore, AWS Glue ETL jobs are written in Scala or Python. If your use case requires an engine other than Apache Spark, or if you want to run a heterogeneous set of jobs on a variety of engines such as Hive or Pig, then AWS Data Pipeline would be a better choice.
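
The rule of thumb in this answer can be written down as a small decision helper. This is only a study aid that restates the card, not an official selection tool. The function name `suggest_service` and its parameters are made up for illustration.

```python
# Per the card: Glue ETL jobs run on Apache Spark.
GLUE_ENGINES = {"spark"}


def suggest_service(engines, needs_compute_access=False):
    """Suggest a service from the engines a workload needs.

    engines: iterable of engine names, e.g. ["spark"] or ["hive", "pig"].
    needs_compute_access: True if you want direct access to the EC2
    instances or EMR clusters that run the code.
    """
    required = {engine.lower() for engine in engines}
    # Any non-Spark engine, or a need for direct compute access,
    # points to Data Pipeline according to the card.
    if needs_compute_access or not required <= GLUE_ENGINES:
        return "AWS Data Pipeline"
    return "AWS Glue"
```

For example, `suggest_service(["spark"])` suggests AWS Glue, while `suggest_service(["hive", "pig"])` suggests AWS Data Pipeline.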

3
Q

When should I use AWS Glue vs. Amazon EMR?

A

AWS Glue works on top of the Apache Spark environment to provide a scale-out execution environment for your data transformation jobs. AWS Glue infers, evolves, and monitors your ETL jobs to greatly simplify the process of creating and maintaining jobs. Amazon EMR provides you with direct access to your Hadoop environment, affording you lower-level access and greater flexibility in using tools beyond Spark.

4
Q

When should I use AWS Glue vs. AWS Database Migration Service?

A

AWS Database Migration Service (DMS) helps you migrate databases to AWS easily and securely. For use cases that require a database migration from on-premises to AWS, or database replication between on-premises sources and sources on AWS, we recommend you use AWS DMS. Once your data is in AWS, you can use AWS Glue to move and transform data from your data source into another database or data warehouse, such as Amazon Redshift.

5
Q

When should I use AWS Glue vs. AWS Batch?

A

AWS Batch enables you to easily and efficiently run any batch computing job on AWS, regardless of the nature of the job. AWS Batch creates and manages the compute resources in your AWS account, giving you full control of and visibility into the resources being used. AWS Glue is a fully managed ETL service that provides a serverless Apache Spark environment to run your ETL jobs. For your ETL use cases, we recommend you explore using AWS Glue. For other batch-oriented use cases, including some ETL use cases, AWS Batch might be a better fit.
