What are two common approaches for processing data?
Batch and stream processing.
What is batch processing?
Processing data in large, discrete blocks typically on an interval or after meeting some threshold.
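As a rough illustration of the threshold case, the sketch below buffers records and hands them off as a block once a size limit is reached. The names `process_in_batches`, `batch_size`, and `handle_batch` are made up for this example and do not come from any particular framework.

```python
from typing import Callable, Iterable, List


def process_in_batches(
    records: Iterable[int],
    batch_size: int,
    handle_batch: Callable[[List[int]], None],
) -> None:
    """Collect records until batch_size is reached, then process them together."""
    buffer: List[int] = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:  # threshold met: process the whole block
            handle_batch(buffer)
            buffer = []
    if buffer:  # input ended: process whatever is left over
        handle_batch(buffer)


# Hypothetical usage: sum each batch of up to three records.
batch_totals: List[int] = []
process_in_batches(
    range(1, 8),
    batch_size=3,
    handle_batch=lambda batch: batch_totals.append(sum(batch)),
)
print(batch_totals)  # [6, 15, 7]
```

Nothing is processed until a batch fills up (or the input ends), which is where the delay in batch results comes from.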
What are two characteristics of batch processing?
Latency and throughput. Batch processing generally introduces latency (results wait while the data is collected) but achieves high throughput.
What are two pros of batch processing?
Efficiency and simplicity.
It can be resource efficient for systems with large volumes of data (batches can be better optimized) and is generally simpler to implement than stream processing.
What are the two major cons of batch processing?
Delay in insights and inflexibility.
Since a batch typically needs some amount of data to accumulate before processing, there’s usually a delay in results, making it less practical for real-time scenarios. It also typically isn’t flexible enough to react to immediate changes or to adapt based on the data.
What is stream processing?
Stream processing involves continually processing data as soon as it arrives.
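As a minimal sketch of this idea, the generator below updates a running average every time a new value arrives instead of waiting for a full block. The name `running_average` and the sample values are assumptions chosen for illustration only.

```python
from typing import Iterable, Iterator


def running_average(events: Iterable[float]) -> Iterator[float]:
    """Yield an updated average immediately after each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # a result is available per event, not per batch


# Hypothetical usage: a list stands in for a live event source.
stream_results = list(running_average([10.0, 20.0, 30.0]))
print(stream_results)  # [10.0, 15.0, 20.0]
```

Each event produces a result right away, at the cost of keeping state (`count`, `total`) alive for as long as the stream runs.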
What are two characteristics of stream processing?
Immediate processing and real-time suitability.
What are the two pros of stream processing?
Real-time analysis and dynamic data handling.
Since data is processed in real time, systems can provide insights and take action immediately. It’s also more adaptable to changing data and conditions.
What are the two cons of stream processing?
Complexity and resource-intensity.
Stream processing is generally more complex than batch processing and can require significantly more resources to process data as it arrives.
When might you use batch processing? What about stream processing? Can you provide some real-world examples?
Batch processing is preferred in scenarios where all of the data is already available, such as periodic financial reporting (e.g. daily or weekly).
Stream processing is preferred in scenarios where real-time insights are required, such as fraud detection and analytics.