Selecting appropriate storage technologies Flashcards
Chapter 1
Four stages of data lifecycle
- Ingest
- Store
- Process and analyze
- Explore and visualize
Define ingestion stage
Acquiring data and bringing data into GCP
Define storage stage
Persisting data in a storage system from which it can be accessed in later stages of the data lifecycle
Define process and analyze stage
Transforming data into a usable format for analysis applications
Define explore and visualize stage
Deriving insights from analysis and presenting them in tables, charts, and other visualizations for use by others
Three broad ingestion modes
- Application data
- Streaming data
- Batch data
How is application data generated?
Generated by applications, including mobile apps, and pushed to backend services
What does application data include?
- user-generated data (e.g. name, address)
- data generated by the app (e.g. logs)
- event data (e.g. clickstream)
Examples of services that can ingest application data
- Compute Engine
- Kubernetes Engine
- App Engine
Examples of locations that application data can be written to
- Stackdriver Logging
- Managed databases such as Cloud SQL or Cloud Datastore
What are examples of types of streaming data?
- sensor data
- event data
What is the event time?
The timestamp, often included in streaming data, that indicates when the data was generated
What is the process time?
When streaming data, some applications will also track the time that data arrives at the beginning of the ingestion pipeline. This is the process time.
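The difference between the two timestamps can be sketched in a few lines of Python. This is only an illustration: the `Reading` class, the `ingest` function, and the field names are hypothetical and do not come from any GCP library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Reading:
    """A hypothetical streaming record (names are illustrative only)."""
    sensor_id: str
    value: float
    event_time: datetime                      # set at the source, when the data was generated
    process_time: Optional[datetime] = None   # set when the record enters the pipeline


def ingest(reading: Reading, now: datetime) -> Reading:
    """Stamp the record with the time it arrived at the start of the pipeline."""
    reading.process_time = now
    return reading


def ingestion_lag(reading: Reading) -> timedelta:
    """How long the record took to reach the pipeline after it was generated."""
    if reading.process_time is None:
        raise ValueError("reading has not been ingested yet")
    return reading.process_time - reading.event_time


generated_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
arrived_at = generated_at + timedelta(seconds=3)
record = ingest(Reading("sensor-1", 21.5, generated_at), now=arrived_at)
print(ingestion_lag(record))
```

Passing `now` in explicitly, rather than reading the clock inside `ingest`, keeps the sketch deterministic.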
Why may time-series data require some additional processing early in the ingestion process?
If a stream of data needs to be in time order for processing, then late arriving data will need to be inserted in the correct position in the stream. This can require buffering of data for a short period of time in case the data arrives out of order.
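One way to picture this buffering is a small reorder buffer keyed on event time. The sketch below is a simplified illustration, not how any particular GCP service implements it. The function name `reorder_by_event_time` and the `max_delay` parameter are assumptions made for the example, and it assumes no record arrives more than `max_delay` seconds late.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (event_time_in_seconds, payload)


def reorder_by_event_time(events: Iterable[Event], max_delay: float) -> Iterator[Event]:
    """Yield events in event-time order, holding each back for up to max_delay seconds."""
    buffer: List[Event] = []          # min-heap ordered by event time
    latest_seen = float("-inf")       # highest event time observed so far
    for event in events:
        heapq.heappush(buffer, event)
        latest_seen = max(latest_seen, event[0])
        # Release events old enough that, under the max_delay assumption,
        # nothing earlier is still expected to arrive.
        while buffer and buffer[0][0] <= latest_seen - max_delay:
            yield heapq.heappop(buffer)
    # Input exhausted: flush whatever is left, still in event-time order.
    while buffer:
        yield heapq.heappop(buffer)


# Arrival order differs from event-time order.
arrivals = [(1.0, "a"), (3.0, "c"), (2.0, "b"), (5.0, "e"), (4.0, "d"), (8.0, "f")]
ordered = list(reorder_by_event_time(arrivals, max_delay=2.0))
print([payload for _, payload in ordered])
```

A larger `max_delay` tolerates later arrivals at the cost of holding data longer before it is released.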
What is Google Cloud Pub/Sub?
A fully managed, scalable, global, and secure messaging service that allows you to send and receive messages among applications and services
How does Cloud Pub/Sub ingestion aid streaming data?
Streaming data is well suited for Cloud Pub/Sub ingestion because Pub/Sub can buffer data while applications process it.
What happens during streaming data spikes when application instances cannot keep up with the rate at which data is arriving?
When this happens, the data can be preserved in a Cloud Pub/Sub topic and processed later, after application instances have had a chance to catch up.
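The effect of a spike can be illustrated with a toy simulation. This sketch uses an in-memory `deque` as a stand-in for a topic and does not use the Pub/Sub client library. The function `simulate_backlog` and its parameters are hypothetical names chosen for the example.

```python
from collections import deque
from typing import Deque, List


def simulate_backlog(arrivals_per_tick: List[int], capacity_per_tick: int) -> List[int]:
    """Return the number of buffered messages left after each tick."""
    topic: Deque[int] = deque()   # stand-in for messages retained in a topic
    backlog: List[int] = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # Publishers add messages regardless of how busy the consumers are.
        for _ in range(arrivals):
            topic.append(next_id)
            next_id += 1
        # Consumers process at most capacity_per_tick messages, oldest first.
        for _ in range(min(capacity_per_tick, len(topic))):
            topic.popleft()
        backlog.append(len(topic))
    return backlog


# A spike in ticks 2 and 3 exceeds the consumers' capacity of 4 per tick.
history = simulate_backlog([2, 10, 10, 1, 1, 0], capacity_per_tick=4)
print(history)  # [0, 6, 12, 9, 6, 2]
```

The backlog grows during the spike and drains once arrivals fall below capacity, and no messages are dropped in between.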
How is Cloud Pub/Sub set up in a way that is accessible and scalable?
Cloud Pub/Sub has global endpoints and uses GCP’s global frontend load balancer to support ingestion. The messaging service scales automatically to meet the demands of the current workload.