Exploring Data Transformation with Google Cloud Flashcards
(16 cards)
Structured data
Is highly organized and well-defined.
Is typically stored in a table.
Includes spreadsheets and databases.
Is easy to search, query, and analyze.
Unstructured data
Doesn’t have a predefined data model.
Isn’t organized in a predefined manner.
Categories:
Text, like documents and presentations
Data files, like images, audio, and video
Semi-structured data
Is organized into a hierarchy.
Isn’t fully differentiated or ordered the way tabular data is.
Examples include emails, HTML, JSON, and XML.
Doesn’t conform to a fixed, formal schema.
Contains tags or keys that label fields, which makes analysis easier.
Google Cloud databases for it: Firestore and Cloud Bigtable
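A minimal Python sketch of semi-structured data, assuming two hypothetical email-like records stored as JSON. The keys act as tags, the nested "headers" object gives the hierarchy, and the two records do not share a fixed schema.

```python
import json

# Two hypothetical email-like records. Keys label each field ("tags"),
# nested objects form a hierarchy, and the shapes differ: only the second
# record has an "attachments" field.
raw = """
[
  {"from": "ana@example.com", "subject": "Q3 report",
   "headers": {"priority": "high"}},
  {"from": "raj@example.com", "subject": "Photos",
   "headers": {"priority": "normal"},
   "attachments": ["beach.jpg", "team.png"]}
]
"""

messages = json.loads(raw)

for msg in messages:
    # Tags make fields easy to locate even though records differ in shape.
    priority = msg["headers"]["priority"]
    # Optional fields must be handled explicitly because there is no schema.
    attachments = msg.get("attachments", [])
    print(msg["subject"], priority, len(attachments))
```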
Relational database
Stores data points in tables (rows and columns) with a clearly defined schema.
Is highly consistent and reliable.
Is suited for large amounts of structured data.
Is designed for business data processing.
Is designed for storing online transactional data (OLTP).
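To make the relational properties concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a managed relational engine such as Cloud SQL. The orders table and its columns are hypothetical. It shows a declared schema, a constraint that rejects bad data, and a transaction that is rolled back as a unit.

```python
import sqlite3

# In-memory SQLite stands in for a relational engine such as Cloud SQL.
# The "orders" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount_usd REAL NOT NULL CHECK (amount_usd >= 0)
    )
    """
)

insert = "INSERT INTO orders (customer, amount_usd) VALUES (?, ?)"

# Using the connection as a context manager commits on success.
with conn:
    conn.execute(insert, ("ana", 42.5))
    conn.execute(insert, ("raj", 10.0))

# The second row violates the CHECK constraint, so the context manager rolls
# back the whole transaction: the valid "lee" row is discarded as well.
try:
    with conn:
        conn.execute(insert, ("lee", 7.0))
        conn.execute(insert, ("bad", -5.0))
except sqlite3.IntegrityError as err:
    print("transaction rolled back:", err)

count, total = conn.execute("SELECT COUNT(*), SUM(amount_usd) FROM orders").fetchone()
print(count, total)  # 2 52.5
```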
Non-relational database
Doesn’t use a tabular format.
Follows a flexible data model.
Is ideal for data whose organization changes over time.
Is ideal for applications with diverse data types.
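By contrast, a non-relational document store lets each record carry its own fields. The sketch below models a collection as a plain Python dictionary; it illustrates the flexible data model only, is not the Firestore or Bigtable client API, and the users collection is hypothetical.

```python
# A toy, in-memory "collection" keyed by document ID. It mimics the flexible
# data model of a document database; it is not a real database client.
users: dict[str, dict] = {}

# Documents in the same collection can have different fields.
users["u1"] = {"name": "Ana", "email": "ana@example.com"}
users["u2"] = {"name": "Raj", "phones": ["+1-555-0100"], "prefs": {"theme": "dark"}}

# The organization can change over time without a schema migration:
# simply start writing a new field on the documents that need it.
users["u1"]["loyalty_tier"] = "gold"

# Readers must tolerate missing fields.
tiers = {uid: doc.get("loyalty_tier", "none") for uid, doc in users.items()}
print(tiers)  # {'u1': 'gold', 'u2': 'none'}
```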
What are GCP relational database services?
Cloud SQL, Cloud Spanner, AlloyDB for PostgreSQL, and Bare Metal Solution (for Oracle workloads)
What are GCP non-relational database services?
Firestore, Firebase Realtime Database, Bigtable, Datastore (now Firestore in Datastore mode), and Memorystore (managed in-memory Redis and Memcached)
What is GCP’s data warehouse?
BigQuery. Stores structured, cleaned, and processed data for fast analytics and reporting.
What is GCP’s data lake solution?
Cloud Storage as the storage layer, processed with Dataproc, Dataflow, or BigQuery.
Stores all types of data: raw, unstructured, semi-structured, and structured.
What are Cloud Storage’s 4 primary storage classes?
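Standard storage: Hot data, accessed frequently (no minimum storage duration)
Nearline storage: Data accessed about once per month (30-day minimum storage duration)
Coldline storage: Data accessed about once every 90 days (90-day minimum storage duration)
Archive storage: Data accessed less than once a year (365-day minimum storage duration)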
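The four classes above can be read as a rule of thumb keyed on how often data is read. This hypothetical helper encodes that rule in Python; the cut-offs mirror the card's monthly, quarterly, and yearly access patterns and are not an official Google formula. The returned strings match the storage class names used by the Cloud Storage API.

```python
def suggest_storage_class(days_between_reads: float) -> str:
    """Map an expected interval between reads to a Cloud Storage class.

    Hypothetical helper: thresholds follow the access patterns on the card,
    not an official Google sizing rule.
    """
    if days_between_reads < 30:
        return "STANDARD"   # hot data, read often
    if days_between_reads < 90:
        return "NEARLINE"   # read about once a month
    if days_between_reads < 365:
        return "COLDLINE"   # read about once a quarter
    return "ARCHIVE"        # read about once a year or less


for days in (7, 45, 120, 400):
    print(days, suggest_storage_class(days))
```

Real costs also depend on retrieval fees, which apply to Nearline, Coldline, and Archive but not to Standard.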
What is GCP Autoclass?
Autoclass automatically transitions objects in a Cloud Storage bucket to the appropriate storage class based on each object’s access pattern.
Explain the difference between Cloud SQL and Cloud Spanner
Use Cloud SQL if you want a familiar database engine (MySQL, PostgreSQL, or SQL Server) with managed operations for small to medium apps. Use Cloud Spanner if you need global scale, high availability, and strong consistency at enterprise scale, and can justify the cost and complexity.
What is BigQuery?
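Is a fully managed, serverless data warehouse that acts as both a storage and an analytics service. Data in BigQuery is encrypted at rest by default.
BigQuery integrates with a broad partner ecosystem.
BigQuery works in a multicloud environment (BigQuery Omni can analyze data stored in AWS and Azure).
BigQuery has built-in machine learning features (BigQuery ML lets you create and run models using SQL).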
Explain the 2 managed database migration services
Database Migration Service (DMS): Easily migrate your databases to Google Cloud.
Datastream: A serverless change data capture (CDC) and replication service used to synchronize data across databases, storage systems, and applications.
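Datastream keeps systems in sync with change data capture (CDC): it reads inserts, updates, and deletes from the source database’s log and streams them to a destination. The Python sketch below replays such changes into an in-memory replica; the ChangeEvent format and the apply_changes helper are hypothetical and do not reflect Datastream’s actual output schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class ChangeEvent:
    """One change captured from a source database log (hypothetical format)."""
    op: Literal["insert", "update", "delete"]
    key: int
    row: Optional[dict] = None


def apply_changes(replica: dict[int, dict], events: list[ChangeEvent]) -> None:
    """Replay change events in order so the replica converges to the source."""
    for ev in events:
        if ev.op == "delete":
            replica.pop(ev.key, None)
        else:
            # Inserts and updates both upsert the latest full row image.
            assert ev.row is not None, "insert/update events must carry a row"
            replica[ev.key] = dict(ev.row)


replica: dict[int, dict] = {}
apply_changes(replica, [
    ChangeEvent("insert", 1, {"name": "Ana", "city": "Lima"}),
    ChangeEvent("insert", 2, {"name": "Raj", "city": "Pune"}),
    ChangeEvent("update", 1, {"name": "Ana", "city": "Quito"}),
    ChangeEvent("delete", 2),
])
print(replica)  # {1: {'name': 'Ana', 'city': 'Quito'}}
```

Order matters: replaying the same events out of order could leave the replica with a stale row, which is why CDC pipelines preserve the source log’s ordering per key.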
What is Pub/Sub?
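Is an asynchronous messaging service that ingests event streams and delivers them to subscribers. It can ingest hundreds of millions of events per second.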
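Pub/Sub follows the publish/subscribe pattern: publishers send messages to a topic, and every subscription attached to that topic receives its own copy. The toy, single-process Python model below illustrates that fan-out; ToyTopic and its methods are hypothetical and are not the google-cloud-pubsub client API.

```python
from collections import deque


class ToyTopic:
    """In-memory stand-in for a topic (not the google-cloud-pubsub API).

    Every subscription gets its own queue, so each one receives a copy of
    every published message and publishers never need to know the consumers.
    """

    def __init__(self) -> None:
        self.subscriptions: dict[str, deque] = {}

    def subscribe(self, name: str) -> None:
        self.subscriptions[name] = deque()

    def publish(self, data: bytes) -> None:
        # Fan out: append the message to every attached subscription.
        for queue in self.subscriptions.values():
            queue.append(data)

    def pull(self, name: str, max_messages: int = 10) -> list[bytes]:
        # Remove and return up to max_messages from one subscription.
        queue = self.subscriptions[name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


topic = ToyTopic()
topic.subscribe("analytics")
topic.subscribe("alerts")
for event in (b"click", b"view", b"purchase"):
    topic.publish(event)

analytics = topic.pull("analytics")
alerts_first = topic.pull("alerts", max_messages=1)
print(analytics)     # [b'click', b'view', b'purchase']
print(alerts_first)  # [b'click']
```

Unlike this sketch, which drops a message as soon as it is pulled, Pub/Sub redelivers messages that a subscriber does not acknowledge.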
What is Dataflow?
Is a fully managed service for running Apache Beam pipelines. It unifies streaming and batch data processing in cohesive data pipelines.
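Dataflow pipelines are written with Apache Beam, whose central idea is that one pipeline definition can process both bounded (batch) and unbounded (streaming) input. The plain-Python sketch below imitates that idea with a single generator applied to a finite list and to an incremental event source; it does not use the Beam SDK, and the comma-separated "user,amount" record format is an assumption.

```python
from typing import Iterable, Iterator


def parse_and_filter(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """One pipeline definition that works on bounded or unbounded input."""
    for line in lines:
        user, amount = line.split(",")
        value = int(amount)
        if value > 0:  # drop refunds and zero-value rows
            yield user, value


# Batch: a bounded, finite dataset (for example, a file already at rest).
batch = ["ana,30", "raj,-5", "lee,12"]
batch_results = list(parse_and_filter(batch))
print(batch_results)  # [('ana', 30), ('lee', 12)]


# Streaming: a source that yields records one at a time as they "arrive".
def event_stream() -> Iterator[str]:
    yield "ana,7"
    yield "raj,0"
    yield "lee,3"


for record in parse_and_filter(event_stream()):
    print("processed", record)
```

A real streaming pipeline also needs windowing and triggers to decide when to emit aggregates over never-ending input; this sketch only shows per-element processing.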