Data Engineering - Batch Processing for ML Flashcards

1
Q

Define Batch Processing

A

Processing performed on a set schedule rather than as data arrives. Data typically accumulates and waits until the next batch run to be processed.

2
Q

Name an AWS service commonly used for batch ETL processing in ML

A

AWS Glue

3
Q

What type of service is Glue?

A

A fully managed extract, transform, load (ETL) service

4
Q

Name the steps of AWS Glue

A

A Glue crawler scans the data source and infers its schema, then the resulting metadata is stored as tables in a Glue Data Catalog database
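
The crawl-then-catalog flow can be pictured with a toy sketch. This is not Glue code: `crawl_csv`, `infer_type`, the dict-based `catalog`, and the sample data are all hypothetical stand-ins, and the type names are only illustrative. It shows the idea of sampling data, inferring column types, and recording the schema under a database and table name.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest illustrative type that fits every value."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def crawl_csv(text, database, table, catalog):
    """Infer a schema from CSV text and register it in the catalog dict."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = {
        col: infer_type([row[i] for row in data])
        for i, col in enumerate(header)
    }
    # Only metadata is stored; the data itself stays where it was.
    catalog.setdefault(database, {})[table] = columns
    return columns


catalog = {}  # hypothetical stand-in for a data catalog
sample = "id,price,city\n1,9.99,Oslo\n2,12.50,Lima\n"
crawl_csv(sample, "sales_db", "orders", catalog)
print(catalog)
```

Note that only the schema lands in the catalog; the source data is left in place.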

5
Q

Name some built-in data classifiers that Glue offers

A

Parquet, JSON, BSON, XML, CSV, PostgreSQL, MySQL

6
Q

What if the data is not in a format that Glue has a built-in classifier for?

A

You can build a custom classifier using a grok pattern, an XML row tag, a JSON path, or CSV settings
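
To make the grok option concrete, here is a minimal sketch of how a grok-style pattern maps onto a regular expression with named fields. The three-entry `GROK_PATTERNS` table, the helper names, and the sample log line are assumptions for illustration; real grok libraries ship far more patterns, and Glue's own matching is not reproduced here.

```python
import re

# Hypothetical, heavily simplified pattern library.
GROK_PATTERNS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
    "LOGLEVEL": r"DEBUG|INFO|WARN|ERROR",
    "GREEDYDATA": r".*",
}


def grok_to_regex(grok):
    """Expand each %{NAME:field} token into a named regex group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK_PATTERNS[name]})"

    return re.compile("^" + re.sub(r"%\{(\w+):(\w+)\}", expand, grok) + "$")


def classify(line, grok):
    """Return the extracted fields, or None if the line does not match."""
    match = grok_to_regex(grok).match(line)
    return match.groupdict() if match else None


PATTERN = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
row = classify("2024-01-15T08:30:00 ERROR disk full", PATTERN)
print(row)
```

A line that does not fit the pattern returns `None`, which mirrors the idea that a classifier either recognises a format or passes.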

7
Q

Describe AWS Database Migration Service (DMS) for data ingestion

A

Designed to transfer data between databases

8
Q

Which source databases can AWS Database Migration Service ingest from?

A

Databases hosted on Amazon RDS, on EC2 instances, and on premises

9
Q

Why is AWS Database Migration Service so reliable?

A

It transfers data transactionally: if a transfer fails, any records in transit are rolled back, so you can be confident that the data has been transferred completely.
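
The all-or-nothing idea can be illustrated with a small analogy using Python's built-in `sqlite3` module. This is not DMS code and does not reflect how DMS is implemented; the `customers` table, the `copy_batch` helper, and the simulated failure are assumptions made for the sketch.

```python
import sqlite3


def copy_batch(source_rows, target, fail_on=None):
    """Copy rows into the target in one transaction; roll back on error."""
    try:
        with target:  # commits on success, rolls back if an exception escapes
            for row_id, name in source_rows:
                if row_id == fail_on:
                    raise RuntimeError(f"simulated failure on row {row_id}")
                target.execute(
                    "INSERT INTO customers VALUES (?, ?)", (row_id, name)
                )
        return True
    except RuntimeError:
        return False


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
target.commit()

rows = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]

# First attempt fails on the last row, so the earlier inserts are undone.
first_ok = copy_batch(rows, target, fail_on=3)
count_after_failure = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# The retry succeeds and all rows arrive together.
second_ok = copy_batch(rows, target)
count_after_success = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(first_ok, count_after_failure, second_ok, count_after_success)
```

Because the failed attempt leaves no partial rows behind, the retry can run from a clean state.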

10
Q

When can AWS Database Migration Service be used?

A
  • One-off migration
  • Scheduled transfers, configured to move data on a schedule
  • Continuous data replication, where data is transferred from the source as soon as it is created