Stream processing Flashcards
some examples of streaming data (3)
- log files generated by customers using a mobile application
- social network activity
- e-commerce purchases
what is stream processing
a processing mode where individual records or a small set of records are processed continuously, producing a simple response
can streaming data be processed by batch processing?
yes
what is bounded data?
datasets that are finite in size
what is unbounded data?
datasets that are (at least theoretically) infinite in size and new data can arrive and be made available at any point of time
What are streaming systems designed with in mind?
Unbounded data
What is a data surge?
a sudden and significant increase in the volume of data flowing through a streaming data processing system.
For real-time systems, why is failing to produce a processing result within a time window as bad as not producing
a result at all?
The events may become “insignificant” and the insights or trends produced may no longer be valid or accurate
Examples of streaming data (4)
- Messages from social platforms (e.g. Twitter)
- Internet traffic going through a network device such as a switch
- Readings from an IoT device
- Interactions of users with a web application
Frameworks for the ingestion of unbounded data (7)
- Apache Kafka
- Apache Flume
- Amazon Kinesis Firehose
- AWS IoT Events
- Azure Event Hub
- IoT Hub
- Google Pub/Sub
What are streams
sequences of immutable records that arrive at some point in time
Other phrases for streams (3)
- event streams
- event logs
- message queues
What type of dataset are streams?
datasets in motion
What type of dataset are tables?
datasets at rest
what are the components of processing elements (PE)? (3)
- input queue
- computing element
- output queue
why are input queues needed? (3)
- to maintain the order of the incoming events/data
- to signal systems to slow down when the rate at which data is arriving might exceed the processing capacity of the stream processing system
- to decouple the data source from the processing components allowing for more flexibility in management
What is a spout in Apache Storm?
Elements that generate streams from external sources
What is a stream in Apache Storm?
Unbounded stream of tuples
What is a bolt in Apache Storm?
A processing element that consumes and generates streams
What is a topology in Apache Storm?
a flow of spouts, streams and bolts
what does the order of tuples in a stream represent?
the time at which they arrive at the streaming system
what does the term at-least-once processing mean?
a guarantee that each message or event in a system will be processed at least once, but potentially more than once - ensuring that no data is lost in the event of failure
what is event time?
the time at which one event has been generated by a source
what is processing time?
the time at which events are seen by the stream processing system