Designing data pipelines Flashcards

Chapter 3

1
Q

What is a graph in the context of DAGs?

A

A graph is a set of nodes linked by edges.
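
A minimal sketch of this idea, assuming hypothetical step names: the graph is held as a mapping from each node to the nodes it depends on. Python's standard `graphlib` module orders the nodes and rejects cycles, which is the property that makes a directed graph a DAG (directed acyclic graph).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline steps: each key is a node, each value is the set of
# nodes it depends on (one edge per dependency).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_order(deps):
    """Return the nodes in an order that respects every edge."""
    return list(TopologicalSorter(deps).static_order())

def is_dag(deps):
    """A directed graph is a DAG only if it contains no cycle."""
    try:
        run_order(deps)
        return True
    except CycleError:
        return False

order = run_order(dependencies)
```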

2
Q

What are three common types of data pipelines?

A
  1. Data warehousing pipelines
  2. Stream processing pipelines
  3. Machine learning pipelines
3
Q

How are transformations written when using GCP Cloud Dataproc?

A

With Cloud Dataproc, transformations can be written in any language supported by Spark or Hadoop.

4
Q

How are transformations written when using GCP Cloud Dataflow?

A

With Cloud Dataflow, you write transformations using the Apache Beam model, which provides a unified model for batch and stream processing.

5
Q

Apache Beam has explicit support for pipeline constructs including:

A
  1. Pipeline
  2. PCollection
  3. PTransform
6
Q

What is a streaming window and what are its types?

A

A window is a set of consecutive data points in a stream. Windows have a fixed width and a way of advancing. Windows that advance by a number of data points less than the width of the window are called sliding windows; windows that advance by the length of the window are tumbling windows.
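
The difference between the two types can be sketched with a single helper whose `step` argument controls how far the window advances. The function name and sample stream are illustrative assumptions, not part of any streaming library.

```python
def windows(points, width, step):
    """Yield consecutive windows of `width` points, advancing by `step`."""
    for start in range(0, len(points) - width + 1, step):
        yield points[start:start + width]

stream = [1, 2, 3, 4, 5, 6]

# Sliding: the step is smaller than the width, so windows overlap.
sliding = list(windows(stream, width=3, step=1))

# Tumbling: the step equals the width, so windows never overlap.
tumbling = list(windows(stream, width=3, step=3))
```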

7
Q

When are sliding windows used?

A

Sliding windows are used when you want to show how an aggregate, such as the average of the last three values, changes over time.
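
As a rough illustration of the "average of the last three values" case, the sketch below keeps the most recent values in a bounded `deque` and emits one average per new data point once the window is full. The readings are made-up sample data.

```python
from collections import deque

def moving_average(values, width=3):
    """Average of the last `width` values, recomputed as each value arrives."""
    window = deque(maxlen=width)  # the oldest value drops out automatically
    averages = []
    for value in values:
        window.append(value)
        if len(window) == width:
            averages.append(sum(window) / width)
    return averages

readings = [10, 20, 30, 40, 50]  # hypothetical sensor readings
trend = moving_average(readings)
```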

8
Q

When would you use tumbling windows?

A

Tumbling windows are used when you want to aggregate data over a fixed period of time, for example, for the last minute.
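
A minimal sketch of a per-minute tumbling aggregate, assuming each event is a hypothetical `(timestamp_in_seconds, value)` pair: every event falls into exactly one 60-second bucket, keyed by the second at which that bucket starts.

```python
from collections import defaultdict

def sum_per_minute(events):
    """Sum values in non-overlapping 60-second windows."""
    totals = defaultdict(int)
    for timestamp, value in events:
        window_start = timestamp - timestamp % 60  # start of this minute
        totals[window_start] += value
    return dict(totals)

events = [(0, 1), (59, 2), (60, 3), (125, 4)]  # made-up sample events
totals = sum_per_minute(events)
```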

9
Q

GCP has several services that are commonly used components of pipelines, including:

A
  1. Cloud Pub/Sub
  2. Cloud Dataflow
  3. Cloud Dataproc
  4. Cloud Composer
10
Q

What are messaging queues?

A

Messaging queues are used in distributed systems to decouple services in a pipeline. This allows one service to produce more output than the consuming service can process without adversely affecting the consuming service.
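
The decoupling can be sketched in a single process with Python's standard `queue` module standing in for a real messaging service. The producer, consumer, and sleep-based "slow processing" are all illustrative assumptions. The producer finishes without waiting for the consumer, and the queue holds the backlog until the consumer catches up.

```python
import queue
import threading
import time

buffer = queue.Queue()  # stands in for the message queue between services
processed = []

def producer(count):
    """Fast service: enqueue messages without waiting for the consumer."""
    for i in range(count):
        buffer.put(i)
    buffer.put(None)  # sentinel: no more messages

def consumer():
    """Slow service: drain the queue at its own pace."""
    while True:
        item = buffer.get()
        if item is None:
            break
        time.sleep(0.001)  # simulate slow processing
        processed.append(item)

def run(count=20):
    processed.clear()
    workers = [
        threading.Thread(target=producer, args=(count,)),
        threading.Thread(target=consumer),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return list(processed)

result = run()
```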
