AWS Glue | Extract, transform, and load (ETL) Flashcards

1
Q

If I am already using Amazon Athena or Amazon Redshift Spectrum and have tables in Amazon Athena’s internal data catalog, how can I start using the AWS Glue Data Catalog as my common metadata repository?

Extract, transform, and load (ETL)

AWS Glue | Analytics

A

Before you can use the AWS Glue Data Catalog as a common metadata repository for Amazon Athena, Amazon Redshift Spectrum, and AWS Glue, you must upgrade your Amazon Athena data catalog to the AWS Glue Data Catalog. The upgrade steps are described in the Amazon Athena documentation.

2
Q

What programming language can I use to write my ETL code for AWS Glue?

A

You can use either Scala or Python.

3
Q

How can I customize the ETL code generated by AWS Glue?

A

AWS Glue’s ETL script recommendation system generates Scala or Python code. The generated code uses Glue’s custom ETL library to simplify access to data sources and to manage job execution. You can find more details about the library in the AWS Glue documentation. You can customize the generated code in three ways: edit it inline in the AWS Glue console script editor, download it and edit it in your own IDE, or start from one of the samples hosted in the AWS Glue GitHub repository and adapt it. In each case, you can use AWS Glue’s custom library or write arbitrary Scala or Python code.

4
Q

Can I import custom libraries as part of my ETL script?

A

Yes. You can import custom Python libraries and JAR files into your AWS Glue ETL job. For more details, see the AWS Glue documentation.
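
As a rough sketch of how this can look in practice: extra Python modules and JAR files are typically passed to a job through the special parameters `--extra-py-files` and `--extra-jars`, each taking comma-separated Amazon S3 paths. The helper name `library_arguments` and the bucket paths below are hypothetical; the resulting dictionary is intended to be supplied as the job's default arguments.

```python
def library_arguments(py_files=(), jars=()):
    """Build Glue job arguments that point at extra libraries stored in S3.

    Each parameter value is a comma-separated list of S3 paths.
    The paths in the example call are placeholders, not real buckets.
    """
    args = {}
    if py_files:
        args["--extra-py-files"] = ",".join(py_files)
    if jars:
        args["--extra-jars"] = ",".join(jars)
    return args


# Hypothetical library locations, for illustration only.
default_arguments = library_arguments(
    py_files=["s3://example-bucket/libs/helpers.zip"],
    jars=["s3://example-bucket/libs/custom-udfs.jar"],
)
```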

5
Q

Can I bring my own code?

A

Yes. You can write your own code using AWS Glue’s ETL library, or write your own Scala or Python code and upload it to a Glue ETL job. For more details, see the AWS Glue documentation.

6
Q

How can I develop my ETL code using my own IDE?

A

You can create and connect to development endpoints that offer ways to connect your notebooks and IDEs.

7
Q

How can I build an end-to-end ETL workflow using multiple jobs in AWS Glue?

A

In addition to the ETL library and code generation, AWS Glue provides a robust set of orchestration features that allow you to manage dependencies between multiple jobs to build end-to-end ETL workflows. AWS Glue ETL jobs can either be triggered on a schedule or on a job completion event. Multiple jobs can be triggered in parallel or sequentially by triggering them on a job completion event. You can also trigger one or more Glue jobs from an external source such as an AWS Lambda function.
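
To illustrate the last point, here is a minimal sketch of an AWS Lambda handler that starts a Glue job. It assumes the boto3 Glue client and its `start_job_run` call; the job name `example-nightly-etl`, the `--source` argument, and the optional `glue_client` parameter (added so the handler can be exercised without AWS access) are hypothetical.

```python
def start_glue_job(glue_client, job_name, arguments=None):
    """Start one Glue job run and return its run ID.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments=arguments or {},
    )
    return response["JobRunId"]


def lambda_handler(event, context, glue_client=None):
    """Lambda entry point; Lambda itself passes only event and context."""
    if glue_client is None:
        import boto3  # imported lazily; assumed available in the Lambda runtime
        glue_client = boto3.client("glue")
    # Hypothetical job name and argument, for illustration only.
    run_id = start_glue_job(
        glue_client,
        "example-nightly-etl",
        {"--source": event.get("source", "unknown")},
    )
    return {"JobRunId": run_id}
```

Keeping the client injectable is a design choice for testing: a stub object with a `start_job_run` method can stand in for the real client.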

8
Q

How does AWS Glue monitor dependencies?

A

AWS Glue manages dependencies between two or more jobs or dependencies on external events using triggers. Triggers can watch one or more jobs as well as invoke one or more jobs. You can either have a scheduled trigger that invokes jobs periodically, an on-demand trigger, or a job completion trigger.
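
The three trigger kinds can be sketched as plain parameter dictionaries, shaped the way the boto3 `create_trigger` call is understood to accept them (`Type` values `SCHEDULED`, `ON_DEMAND`, and `CONDITIONAL`). The helper names, trigger names, and job names are hypothetical, and nothing here calls AWS.

```python
def _actions(job_names):
    """One action per job that the trigger should invoke."""
    return [{"JobName": name} for name in job_names]


def scheduled_trigger(name, cron_expression, job_names):
    """Trigger that invokes jobs periodically on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": cron_expression,
        "Actions": _actions(job_names),
    }


def on_demand_trigger(name, job_names):
    """Trigger that fires only when started explicitly."""
    return {"Name": name, "Type": "ON_DEMAND", "Actions": _actions(job_names)}


def job_completion_trigger(name, watched_jobs, job_names):
    """Trigger that fires once every watched job has succeeded."""
    conditions = [
        {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
        for job in watched_jobs
    ]
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {"Logical": "AND", "Conditions": conditions},
        "Actions": _actions(job_names),
    }


# Hypothetical pipeline: two extract jobs must succeed before the load job runs.
load_after_extract = job_completion_trigger(
    "load-after-extract",
    watched_jobs=["extract-orders", "extract-customers"],
    job_names=["load-warehouse"],
)
```

With a real client, each dictionary would be passed as keyword arguments, for example `glue.create_trigger(**load_after_extract)`.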

9
Q

How does AWS Glue handle errors?

A

AWS Glue monitors job event metrics and errors, and pushes all notifications to Amazon CloudWatch. With Amazon CloudWatch, you can configure a host of actions that can be triggered based on specific notifications from AWS Glue. For example, if you get an error or a success notification from Glue, you can trigger an AWS Lambda function. Glue also provides default retry behavior that will retry all failures three times before sending out an error notification.
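
One way to wire up the Lambda example is an event rule in CloudWatch Events (now Amazon EventBridge) that matches Glue job state changes. The sketch below assumes the event shape `source: aws.glue`, `detail-type: Glue Job State Change`, with `jobName`, `jobRunId`, and `state` under `detail`; the function name and the sample event values are hypothetical.

```python
# Event pattern for a rule that forwards unsuccessful Glue job runs.
FAILURE_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED", "TIMEOUT", "STOPPED"]},
}


def describe_failure(event):
    """Return a short message for a failed Glue job event, else None.

    The rule should already filter events; the handler re-checks so that
    a misconfigured rule does not produce misleading alerts.
    """
    detail = event.get("detail", {})
    if event.get("source") != "aws.glue":
        return None
    if detail.get("state") not in FAILURE_PATTERN["detail"]["state"]:
        return None
    return "Glue job {} (run {}) ended in state {}".format(
        detail.get("jobName"), detail.get("jobRunId"), detail.get("state")
    )


# Hypothetical sample event, trimmed to the fields used above.
sample_event = {
    "source": "aws.glue",
    "detail-type": "Glue Job State Change",
    "detail": {"jobName": "example-nightly-etl", "jobRunId": "jr_123", "state": "FAILED"},
}
message = describe_failure(sample_event)
```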

10
Q

Can I run my existing ETL jobs with AWS Glue?

A

Yes. You can run your existing Scala or Python code on AWS Glue. Simply upload the code to Amazon S3 and create one or more jobs that use that code. You can reuse the same code across multiple jobs by pointing them to the same code location on Amazon S3.
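
A minimal sketch of that reuse, assuming the parameter shape of the boto3 `create_job` call (`Name`, `Role`, `Command` with `ScriptLocation`, `DefaultArguments`). The script path, role name, job names, and the `--region` argument are hypothetical, and no AWS call is made.

```python
# Hypothetical S3 location of an existing ETL script.
SHARED_SCRIPT = "s3://example-bucket/scripts/clean_orders.py"


def job_definition(name, role, script_location, arguments=None):
    """Build parameters for one Glue ETL job that runs an existing script."""
    return {
        "Name": name,
        "Role": role,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "DefaultArguments": dict(arguments or {}),
    }


# Two jobs that point at the same script but differ in their arguments.
jobs = [
    job_definition("clean-orders-eu", "ExampleGlueRole", SHARED_SCRIPT, {"--region": "eu"}),
    job_definition("clean-orders-us", "ExampleGlueRole", SHARED_SCRIPT, {"--region": "us"}),
]
```

With a real client, each entry would be passed along the lines of `glue.create_job(**jobs[0])`.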

11
Q

How can I use AWS Glue to ETL streaming data?

A

AWS Glue ETL is batch oriented, and the shortest interval at which you can schedule ETL jobs is 5 minutes. While it can process micro-batches, it does not handle streaming data. If your use case requires you to ETL data as it streams in, you can perform the first leg of your ETL using Amazon Kinesis, Amazon Kinesis Firehose, or Amazon Kinesis Analytics, store the data in either Amazon S3 or Amazon Redshift, and then trigger a Glue ETL job to pick up that dataset and apply additional transformations.
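
For the scheduling side, a short sketch of building a micro-batch schedule. It assumes Glue's six-field cron syntax, `cron(Minutes Hours Day-of-month Month Day-of-week Year)`; the helper name is hypothetical, and it only accepts intervals that divide an hour evenly so the spacing stays uniform.

```python
def every_n_minutes(n):
    """Return a cron expression that fires every n minutes.

    Rejects intervals shorter than 5 minutes, which is the minimum
    described above, and intervals that do not divide 60 evenly.
    """
    if n < 5 or n >= 60 or 60 % n != 0:
        raise ValueError("interval must be 5-30 minutes and divide 60 evenly")
    return "cron(0/{} * * * ? *)".format(n)


# Shortest supported micro-batch schedule.
micro_batch_schedule = every_n_minutes(5)
```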
