Exam Preparation Flashcards
What are the four stages of the data lifecycle?
Ingest, store, process and analyze, and explore and visualize.
What is streaming data?
Streaming data is data sent in small messages that are transmitted continuously from the data source. It may include telemetry data, which is generated at regular intervals, and event data, which is generated in response to a particular event. Stream ingestion services need to deal with potentially late and missing data. Streaming data is often ingested using Cloud Pub/Sub.
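A minimal sketch of streaming ingestion with the Cloud Pub/Sub Python client; the project ID "my-project" and topic name "telemetry" are placeholder values:

    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub
    import json, time

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "telemetry")  # placeholder project/topic

    # A small telemetry message; Pub/Sub message data must be bytes
    payload = json.dumps({"sensor_id": "s-42", "temp_c": 21.7, "ts": time.time()}).encode("utf-8")
    future = publisher.publish(topic_path, payload)
    print(future.result())  # message ID once the publish is acknowledged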
What is batch data?
Batch data is ingested in bulk, typically in files. Examples of batch data ingestion include uploading files of data exported from one application to be processed by another. Both batch and streaming data can be transformed and processed using Cloud Dataflow.
What technical factors should you consider when choosing a data store?
These factors include the volume and velocity of the data, how the data is structured, access control requirements, and data access patterns.
Know the three levels of structure of data.
Unstructured, semi-structured and structured.
What products store structured data in GCP?
Cloud SQL and Cloud Spanner for transactional workloads.
BigQuery for analytical workloads.
What products store semi-structured data in GCP?
Cloud Datastore if data access requires full indexing; Cloud Bigtable otherwise.
What products store unstructured data in GCP?
Cloud Storage
What are the four types of NoSQL databases?
Four types of NoSQL databases are key-value, document, wide-column, and graph databases
What are some concerns about streaming data?
Stream ingestion services need to deal with potentially late and missing data
What tool can transform batch and streaming data?
Both batch and streaming data can be transformed and processed using Cloud Dataflow.
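A minimal Apache Beam pipeline sketch (Beam is the SDK that Cloud Dataflow runs); the bucket paths are placeholders, and adding --runner=DataflowRunner would execute it on Cloud Dataflow:

    import apache_beam as beam  # pip install 'apache-beam[gcp]'

    # Reads lines, parses them, filters out empty records, and writes the results.
    # The gs://my-bucket/... paths are placeholder values.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.csv")
            | "Parse" >> beam.Map(lambda line: line.split(","))
            | "DropEmpty" >> beam.Filter(lambda fields: fields[0] != "")
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output")
        )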
Which database engines does Cloud SQL support?
Cloud SQL supports MySQL, PostgreSQL, and SQL Server (beta).
How is Cloud SQL initially set up for availability?
Cloud SQL instances are created in a single zone by default, but they can be created for high availability and use instances in multiple zones
How can you improve read performance in Cloud SQL?
Use read replicas.
What is Cloud Spanner?
Cloud Spanner is a horizontally scalable relational database that automatically replicates data
What are the three types of replicas in Cloud Spanner?
Three types of replicas are read-write replicas, read-only replicas, and witness replicas.
How can you avoid hotspotting in Cloud Spanner?
Avoid hotspots by not using consecutive values for primary keys.
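A minimal sketch with the Cloud Spanner Python client, assuming a hypothetical Orders table; a random UUIDv4 primary key spreads writes across splits instead of concentrating them the way a sequential order number would:

    from google.cloud import spanner  # pip install google-cloud-spanner
    import uuid

    client = spanner.Client()
    instance = client.instance("my-instance")    # placeholder instance ID
    database = instance.database("my-database")  # placeholder database ID

    def insert_order(transaction):
        # Using uuid4 instead of a monotonically increasing value avoids
        # hotspotting one split with consecutive primary key values.
        transaction.insert(
            table="Orders",  # hypothetical table
            columns=("OrderId", "CustomerId", "Amount"),
            values=[(str(uuid.uuid4()), "cust-123", 42.50)],
        )

    database.run_in_transaction(insert_order)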
What kind of configuration does Cloud Spanner have?
Cloud Spanner is configured as regional or multi-regional instances
What is Bigtable?
Cloud Bigtable is a wide-column NoSQL database used for high-volume workloads that require sub-10 ms latency and fast writes.
What use cases are there for Bigtable?
Cloud Bigtable is used for IoT, time-series, finance, and similar applications.
How do you make Bigtable highly available?
For multi-regional high availability, you can create a replicated cluster in another region. All data is replicated between clusters.
How is data stored in Bigtable?
Data is stored in Bigtable lexicographically by row key, which is the only indexed column in a Bigtable table.
How do you improve reads in Bigtable?
Keeping related data in adjacent rows can help make reads more efficient.
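A small sketch of a row-key design that keeps related readings adjacent; the sensor-ID prefix and reverse-timestamp scheme here are illustrative assumptions, not a prescribed format:

    # Row keys sort lexicographically, so a shared prefix keeps related rows
    # adjacent, and a zero-padded reverse timestamp orders newest-first within it.
    MAX_TS_MILLIS = 10**13  # arbitrary upper bound on millisecond timestamps

    def make_row_key(sensor_id: str, ts_millis: int) -> bytes:
        reverse_ts = MAX_TS_MILLIS - ts_millis
        return f"{sensor_id}#{reverse_ts:013d}".encode("utf-8")

    print(make_row_key("sensor-42", 1700000000000))
    # b'sensor-42#8300000000000' -- all sensor-42 rows fall in one contiguous range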
What is Cloud Firestore?
Cloud Firestore is a document database that is replacing Cloud Datastore as the managed document database.
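A minimal sketch with the Cloud Firestore Python client; the "users" collection and the document fields are placeholder values:

    from google.cloud import firestore  # pip install google-cloud-firestore

    db = firestore.Client()

    # Documents (schemaless maps of fields) are grouped into collections
    doc_ref = db.collection("users").document("ada")  # placeholder names
    doc_ref.set({"first": "Ada", "last": "Lovelace", "born": 1815})

    snapshot = doc_ref.get()
    print(snapshot.to_dict())  # {'first': 'Ada', 'last': 'Lovelace', 'born': 1815}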