Lecture 9 Flashcards

1
Q

Black box approach for big data analysis

A
  • Users issue analysis queries with real-time semantics
  • Streams of data updates, time-varying rates, generated in real-time
  • Streams of result data
  • Processing in near real-time
2
Q

What is a Distributed Stream Processing System?

A
  • Queries consist of operators
  • Operators form graphs
  • Operators process streams of tuples on-the-fly
  • Operators span nodes
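
The operator-graph idea above can be sketched in a few lines. This is a minimal single-process illustration, assuming Python generators stand in for streams; the operator names (`source`, `filter_op`, `map_op`), the `run_query` helper, and the sensor data are hypothetical and not taken from the lecture. A real system would place the operators on different nodes.

```python
from typing import Callable, Iterable, Iterator


def source(readings: Iterable[dict]) -> Iterator[dict]:
    """Source operator: emits tuples one at a time."""
    for reading in readings:
        yield reading


def filter_op(stream: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Stateless filter: forwards only tuples that satisfy the predicate."""
    for tup in stream:
        if predicate(tup):
            yield tup


def map_op(stream: Iterable[dict], fn: Callable[[dict], dict]) -> Iterator[dict]:
    """Stateless map: transforms each tuple as it arrives."""
    for tup in stream:
        yield fn(tup)


def run_query(readings: Iterable[dict]) -> Iterator[dict]:
    """A query is a graph of operators: source -> filter -> map."""
    hot = filter_op(source(readings), lambda t: t["temp"] > 30)
    return map_op(hot, lambda t: {"sensor": t["sensor"], "alert": True})


if __name__ == "__main__":
    data = [{"sensor": "a", "temp": 25}, {"sensor": "b", "temp": 35}]
    for result in run_query(data):
        print(result)
```

Because each operator is a generator, a tuple flows through the whole graph before the next one is read, which mirrors on-the-fly processing.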
3
Q

How do we build a stream processing platform in the cloud?

A

Intra-query parallelism:

  • Provisioning for workload peaks is unnecessarily conservative
  • Dynamic scale out: increase resources when peaks appear

Failure resilience:

  • Active fault-tolerance needs 2x resources
  • Passive fault-tolerance leads to long recovery times
  • Hybrid fault-tolerance: low resource overhead with fast recovery

Both mechanisms (scale out and fault-tolerance) must support stateful operators

4
Q

Stateless vs. stateful operators: for which are failure recovery and scale out easy?

A

Stateless (easy)

  ✓ Failure recovery
  ✓ Scale out

Stateful (hard)

  ✗ Failure recovery
  ✗ Scale out

5
Q

Diagrams for processing state, routing state, buffer state

A
6
Q

What is Checkpoint?

A

Takes snapshot of state and makes it externally available
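
As a rough illustration of a checkpoint, the sketch below snapshots the state of a word-counting operator. The `WordCount` class and its `restore` method are hypothetical helpers added for the example, and it assumes the state fits in an in-memory dict; it is not the lecture's implementation.

```python
import copy


class WordCount:
    """Stateful operator: keeps a running count per word."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def process(self, word: str) -> tuple[str, int]:
        """Update the state and emit the new count for this word."""
        self.counts[word] = self.counts.get(word, 0) + 1
        return word, self.counts[word]

    def checkpoint(self) -> dict[str, int]:
        """Return an independent snapshot that can be stored externally."""
        return copy.deepcopy(self.counts)

    def restore(self, snapshot: dict[str, int]) -> None:
        """Hypothetical helper: load state from a snapshot."""
        self.counts = copy.deepcopy(snapshot)


if __name__ == "__main__":
    op = WordCount()
    op.process("stream")
    snapshot = op.checkpoint()
    op.process("stream")  # later updates do not change the snapshot
    print(snapshot)
```

The deep copy matters: the snapshot must not alias the live state, otherwise later tuples would silently modify it.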

7
Q

What is Backup?

A

Moves copy of state from one operator to another

8
Q

What is Partition?

A

Splits state in semantically correct fashion for parallel processing
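
One common way to split keyed state, shown here only as a sketch, is to hash each key to a partition so that every key's state lives in exactly one place. The function names `partition_of` and `partition_state` are hypothetical, and the choice of CRC32 as the hash is an assumption made so the result is stable across runs.

```python
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (assumed: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_state(state: dict[str, int], num_partitions: int) -> list[dict[str, int]]:
    """Split keyed state so each key lands in exactly one partition."""
    parts: list[dict[str, int]] = [{} for _ in range(num_partitions)]
    for key, value in state.items():
        parts[partition_of(key, num_partitions)][key] = value
    return parts


if __name__ == "__main__":
    counts = {"alpha": 3, "beta": 1, "gamma": 7, "delta": 2}
    for index, part in enumerate(partition_state(counts, 2)):
        print(index, part)
```

Routing incoming tuples with the same `partition_of` function keeps each tuple together with the state for its key, which is what makes the split safe for parallel processing.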
