AWS Data Pipeline Flashcards

1
Q

Is Data Pipeline a serverless product?

A

Yes. Data Pipeline requires no infrastructure to be deployed.

2
Q

For Data Pipeline, what are data nodes?

A

Data nodes define the names, locations, and formats of your data, for example in S3, DynamoDB, or Redshift. You can think of them as the reader module that reads data into the pipeline.
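
A rough sketch of how data nodes might look in a pipeline definition, built here as plain Python dictionaries and printed as JSON. The ids, bucket path, and table name are made-up placeholders, and the field names (`directoryPath`, `tableName`) are given as best understood; verify them against the service documentation. A real definition needs further objects and fields that are omitted here.

```python
import json

# Hypothetical data nodes: ids, bucket path, and table name are placeholders.
s3_input = {
    "id": "InputS3Node",
    "type": "S3DataNode",
    "directoryPath": "s3://example-bucket/input/",
}
dynamodb_output = {
    "id": "OutputTable",
    "type": "DynamoDBDataNode",
    "tableName": "example-table",
}

# A pipeline definition is a JSON document holding a list of objects.
definition = {"objects": [s3_input, dynamodb_output]}
print(json.dumps(definition, indent=2))
```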

3
Q

For Data Pipeline, what inputs can I use?

A

Redshift
RDS
DynamoDB
S3

4
Q

For Data Pipeline, what are schedules?

A

A schedule is part of a pipeline definition and defines when the pipeline is to be executed. This can be based on pipeline activation or on time.

5
Q

For Data Pipeline, what are preconditions?

A

A precondition is a check that must pass before an activity runs, for example: is the data available?

6
Q

When you run your Data Pipeline, is it real time?

A

No, things are scheduled.

7
Q

For Data Pipeline, what are the resources?

A

These are EC2 instances and EMR clusters.

9
Q

For Data Pipeline, what are actions?

A

These are ways you can receive updates on the Data Pipeline status.

11
Q

For Data Pipeline, can I process data in real time?

A

No. Data Pipeline is about processing data, but not in real time.

12
Q

What can I do with Data Pipeline?

A
  • Export data from DynamoDB to S3
  • Import data from S3 to DynamoDB
  • RDS MySQL to S3
  • RDS incremental copy to S3
  • S3 to RDS table
  • RDS table to Redshift
  • RDS incremental copy to Redshift
  • Load data from Redshift to RDS
13
Q

I want to stream live data from RDS to Redshift. How can I use Data Pipeline?

A

You cannot do it with Data Pipeline. Data Pipeline runs only on a schedule and is not meant for live streaming.

14
Q

I need to batch shift and transform data from RDS to S3. How can I do this?

A

You can use Data Pipeline.

15
Q

What is a precondition?

A

A precondition checks that your data exists before an activity runs, for example:

  • S3 files or directories exist
  • DynamoDB table exists
  • RDS queries
  • Redshift queries
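
A minimal sketch of the S3 case, assuming the precondition type `S3KeyExists` with an `s3Key` field and a `precondition` reference on the data node. The ids and S3 paths are invented placeholders, and required fields such as roles are left out for brevity.

```python
import json

# Hypothetical precondition: passes only if this S3 object exists.
input_ready = {
    "id": "InputReady",
    "type": "S3KeyExists",
    "s3Key": "s3://example-bucket/input/_SUCCESS",
}

# Data node that points at the precondition by id (assumed "ref" syntax).
s3_input = {
    "id": "InputS3Node",
    "type": "S3DataNode",
    "directoryPath": "s3://example-bucket/input/",
    "precondition": {"ref": input_ready["id"]},
}

print(json.dumps({"objects": [input_ready, s3_input]}, indent=2))
```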
16
Q

I want to get an SNS message when a Data Pipeline activity fails. How can I achieve this?

A

You can configure SNS to send a message when a Data Pipeline activity fails.
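
One way this might look in a pipeline definition, sketched as Python dictionaries: an `SnsAlarm` object referenced from the activity's `onFail` field. The topic ARN, ids, and message text are placeholders, and the activity is stripped down to the fields relevant here (its input, output, and resource are omitted), so treat this as an illustration rather than a complete definition.

```python
import json

# Hypothetical alarm: the topic ARN and texts are placeholders.
failure_alarm = {
    "id": "FailureAlarm",
    "type": "SnsAlarm",
    "topicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
    "subject": "Data Pipeline activity failed",
    "message": "An activity in the example pipeline failed.",
}

# The activity refers to the alarm through its onFail field.
copy_activity = {
    "id": "CopyData",
    "type": "CopyActivity",
    "onFail": {"ref": failure_alarm["id"]},
}

print(json.dumps({"objects": [failure_alarm, copy_activity]}, indent=2))
```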

17
Q

What is a Data Pipeline activity?

A

It is something that works on your data, for example:

  • EMR Job
  • Pig
  • Hive
  • Hadoop
  • Shell command
  • SQL
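
As a sketch of the shell command case, the snippet below pairs a `ShellCommandActivity` with an `Ec2Resource` that it runs on. The ids, command, instance type, and `terminateAfter` value are illustrative assumptions, and other fields a working pipeline needs are omitted.

```python
import json

# Hypothetical compute resource that the activity runs on.
worker = {
    "id": "WorkerInstance",
    "type": "Ec2Resource",
    "instanceType": "t2.micro",
    "terminateAfter": "1 Hour",
}

# Hypothetical activity: runs a shell command on the resource above.
shell_step = {
    "id": "EchoStep",
    "type": "ShellCommandActivity",
    "command": "echo hello",
    "runsOn": {"ref": worker["id"]},
}

print(json.dumps({"objects": [worker, shell_step]}, indent=2))
```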
18
Q

What is the schedule?

A

It is when the Data Pipeline should run. It is part of the Data Pipeline definition, and you can define parameters like:
- Start time
- Interval
- End
Example: start today at noon and run for 1 hour.
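
The example can be sketched as a `Schedule` object, assuming the fields `startDateTime`, `period`, and `endDateTime`. The helper `build_noon_schedule`, the id, the date, and the 15-minute interval are all hypothetical choices made for illustration.

```python
import json
from datetime import date, datetime, time, timedelta

def build_noon_schedule(day, interval="15 minutes"):
    """Build a schedule that starts at noon on `day` and ends one hour later."""
    start = datetime.combine(day, time(12, 0))
    end = start + timedelta(hours=1)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return {
        "id": "NoonSchedule",          # placeholder id
        "type": "Schedule",
        "startDateTime": start.strftime(fmt),
        "period": interval,            # how often to run within the window
        "endDateTime": end.strftime(fmt),
    }

schedule = build_noon_schedule(date(2024, 1, 15))
print(json.dumps(schedule, indent=2))
```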

19
Q

How do I set up a cluster for Data Pipeline to run?

A

You do not set up a cluster; Data Pipeline is a serverless product.

20
Q

What is AWS Data Pipeline?

A

A service to automate and schedule regular data movement and data processing activities in AWS.

21
Q

Where can the data sources be located?

A
  • On-premises
  • AWS

22
Q

Can I set up a schedule for the Data Pipeline to run?

A

Yes. As part of the Data Pipeline definition you can set up a schedule, for example every hour or at a specific date and time. There are many options.

23
Q

When would I use Data Pipeline vs AWS Glue?

A

TBD