Exploring Data Transformation with Google Cloud Flashcards

(16 cards)

1
Q

Structured data

A

Is highly organized and well-defined.
Is typically stored in a table.
Includes spreadsheets and databases.
Is easy to analyze.
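Because structured data follows a fixed set of columns, it can be analyzed directly. Below is a minimal sketch that assumes a small hypothetical CSV of sales records. It reads the records with Python's standard `csv` module. The column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical sample table: every row has the same three columns.
SALES_CSV = """region,product,units
north,widget,10
south,widget,4
north,gadget,7
"""

def units_by_region(text: str) -> dict[str, int]:
    """Sum the 'units' column per region; the fixed header makes this a single loop."""
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["units"])
    return totals

print(units_by_region(SALES_CSV))  # {'north': 17, 'south': 4}
```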

2
Q

Unstructured data

A

Doesn’t have a predefined data model.
Isn’t organized in a predefined manner.
Categories:
Text, like documents and presentations
Data files, like images, audio, and video
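Unstructured text has no fields to query, so any structure has to be derived from the content itself. Below is a minimal sketch that assumes a hypothetical snippet of document text. It uses simple word counting as the extraction step, which is one of many possible techniques.

```python
import re
from collections import Counter

# Hypothetical free-form text: no columns, keys, or schema.
DOCUMENT = "Cloud Storage stores objects. Objects can be images, audio, or video."

def word_counts(text: str) -> Counter:
    """Derive a simple structure (word frequencies) from raw text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)

counts = word_counts(DOCUMENT)
print(counts.most_common(1))  # [('objects', 2)]
```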

3
Q

Semi-structured data

A

Is often organized into a hierarchy (nested fields).
Lacks full differentiation or order.
Includes examples like emails, HTML, JSON, and XML.
Doesn’t follow a rigid, tabular schema.
Contains tags that make it easier to analyze.
GCP examples: Firestore and Cloud Bigtable.
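Semi-structured data such as JSON carries its own tags (keys) and nesting, but its records need not share the same fields. Below is a minimal sketch that assumes two hypothetical customer records. It parses them with Python's `json` module.

```python
import json

# Hypothetical records: the first nests an address object,
# and the second omits both "email" and "address".
RAW = """[
  {"id": 1, "name": "Ana", "email": "ana@example.com",
   "address": {"city": "Lisbon"}},
  {"id": 2, "name": "Raj"}
]"""

records = json.loads(RAW)

for rec in records:
    # Keys act as tags; .get() handles fields that a record may not have.
    city = rec.get("address", {}).get("city", "unknown")
    print(rec["name"], rec.get("email", "no email"), city)
```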

4
Q

Relational database

A

Stores data in tables made of rows and columns, with a clearly defined schema.
Is highly consistent and reliable.
Is suited for large amounts of structured data.
Is designed for business data processing.
Is designed for storing online transactional data.
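A relational database enforces a declared schema and groups writes into transactions. Below is a minimal sketch that uses Python's built-in `sqlite3` module as a local stand-in. SQLite is not one of the GCP services on these cards, and the `accounts` table is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a relational service.
conn = sqlite3.connect(":memory:")

# The schema is declared up front: every row must fit these typed columns.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance INTEGER NOT NULL)"
)
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )

# A transfer as one transaction: both updates commit together, or neither does.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

# The schema rejects rows that break it (here, a missing owner).
try:
    with conn:
        conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (3, NULL, 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT owner, balance FROM accounts ORDER BY id").fetchall()
print(rows)      # [('alice', 70), ('bob', 80)]
print(rejected)  # True
```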

5
Q

Non-relational database

A

Doesn’t use a tabular format.
Follows a flexible data model.
Ideal for data whose structure changes over time.
Ideal for applications with diverse data types.
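A non-relational document store lets records in the same collection have different shapes. Below is a minimal in-memory sketch that uses a plain Python dict as a hypothetical stand-in for a document collection. It is not the Firestore client API, and the `put` helper is invented for illustration.

```python
from typing import Any

# Hypothetical collection: document ID -> document (a dict of fields).
products: dict[str, dict[str, Any]] = {}

def put(doc_id: str, doc: dict[str, Any]) -> None:
    """Store a document as-is; no schema is checked."""
    products[doc_id] = doc

# Documents in the same collection carry different fields and types.
put("p1", {"name": "T-shirt", "sizes": ["S", "M", "L"]})
put("p2", {"name": "E-book", "file_mb": 3.5, "drm": False})

# Queries must tolerate fields that some documents lack.
with_sizes = [doc_id for doc_id, doc in products.items() if "sizes" in doc]
print(with_sizes)  # ['p1']
```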

6
Q

What are GCP relational database services?

A

Cloud SQL, Cloud Spanner, AlloyDB, Bare Metal Solution (for Oracle workloads)

7
Q

What are GCP non-relational database services?

A

Firestore, Firebase Realtime DB, Bigtable, Datastore, Memorystore

8
Q

What is GCP’s data warehouse?

A

BigQuery. Stores structured, cleaned, and processed data for fast analytics and reporting.

9
Q

What makes up a data lake on GCP?

A

Cloud Storage holds the data; Dataproc, Dataflow, and BigQuery process and analyze it.

Stores all types of data: raw, unstructured, semi-structured, and structured.

10
Q

What are Cloud Storage’s 4 primary storage classes?

A

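Standard storage: Hot data, accessed frequently
Nearline storage: Data accessed about once per month or less
Coldline storage: Data accessed about once every 90 days or less
Archive storage: Data accessed less than once a year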
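The four classes map roughly to how often data is read. Below is a minimal sketch of a hypothetical helper, `suggest_storage_class`, that picks a class from the expected number of days between accesses. It uses the thresholds on this card. The returned names match the identifiers Cloud Storage uses (`STANDARD`, `NEARLINE`, `COLDLINE`, `ARCHIVE`), but the helper itself is only illustrative.

```python
def suggest_storage_class(days_between_access: float) -> str:
    """Hypothetical rule of thumb based on the expected access interval in days."""
    if days_between_access < 30:
        return "STANDARD"   # hot data, read often
    if days_between_access < 90:
        return "NEARLINE"   # roughly once a month
    if days_between_access < 365:
        return "COLDLINE"   # roughly once every 90 days
    return "ARCHIVE"        # less than once a year

for days in (1, 30, 120, 400):
    print(days, suggest_storage_class(days))
```

A real choice should also weigh the retrieval fees and minimum storage durations that apply to the colder classes.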

11
Q

What is Cloud Storage Autoclass?

A

Autoclass automatically transitions each object in a Cloud Storage bucket to the appropriate storage class based on how often the object is accessed.
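Autoclass bases each object's class on how long the object has gone without being accessed. Below is a simplified simulation of that idea. It assumes these inactivity thresholds: 30 days to Nearline, 90 days to Coldline, and 365 days to Archive. It also assumes a configurable terminal class that defaults to Nearline. Verify these numbers against the current Cloud Storage documentation. The function `autoclass_storage_class` is hypothetical and is not part of the Cloud Storage API.

```python
# Simplified model of Autoclass transitions (not the Cloud Storage API).
# Each pair is (class, days without access before an object moves into it),
# ordered from hottest to coldest. The thresholds are assumptions to verify.
THRESHOLDS = [("STANDARD", 0), ("NEARLINE", 30), ("COLDLINE", 90), ("ARCHIVE", 365)]
ORDER = [name for name, _ in THRESHOLDS]

def autoclass_storage_class(days_since_access: int, terminal: str = "NEARLINE") -> str:
    """Return the class an object would sit in after `days_since_access` idle days."""
    current = "STANDARD"
    for name, min_idle_days in THRESHOLDS:
        if days_since_access >= min_idle_days:
            current = name
    # Objects never move colder than the bucket's terminal storage class.
    if ORDER.index(current) > ORDER.index(terminal):
        current = terminal
    return current

# Reading an object resets its idle time, so in this model it returns to STANDARD.
for idle in (0, 45, 120, 400):
    print(idle, autoclass_storage_class(idle), autoclass_storage_class(idle, terminal="ARCHIVE"))
```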

12
Q

Explain the difference between Cloud SQL and Cloud Spanner

A

Use Cloud SQL if you want a familiar database (MySQL, PostgreSQL, or SQL Server) with managed operations for small to medium apps. Use Cloud Spanner if you need global scale, high availability, and performance at enterprise scale, and can justify the cost and complexity.

13
Q

What is BigQuery?

A

Is a fully managed data warehouse that acts as both a storage and an analytics service.
Encrypts data at rest by default.
Integrates seamlessly with the partner ecosystem.
Works in a multicloud environment.
Has built-in machine learning features.

14
Q

Explain the 2 managed database migration services

A

Database Migration Service (DMS): Easily migrates your databases to Google Cloud.

Datastream: A serverless change data capture (CDC) and replication service used to synchronize data across databases, storage systems, and applications.
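Datastream relies on change data capture: it reads changes from a source database and delivers them to a destination. Below is a minimal sketch of the CDC idea that applies a hypothetical stream of insert, update, and delete events to an in-memory replica. The event format is invented for illustration and is not Datastream's output format.

```python
from typing import Any

# Hypothetical change events captured from a source table, in commit order.
EVENTS: list[dict[str, Any]] = [
    {"op": "insert", "key": 1, "row": {"name": "Ana", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Raj", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ana", "plan": "pro"}},
    {"op": "delete", "key": 2},
]

def apply_changes(replica: dict[int, dict[str, Any]], events: list[dict[str, Any]]) -> None:
    """Replay change events in order so that the replica converges to the source state."""
    for event in events:
        if event["op"] in ("insert", "update"):
            replica[event["key"]] = event["row"]
        elif event["op"] == "delete":
            replica.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {event['op']}")

replica: dict[int, dict[str, Any]] = {}
apply_changes(replica, EVENTS)
print(replica)  # {1: {'name': 'Ana', 'plan': 'pro'}}
```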

15
Q

What is Pub/Sub?

A

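A messaging service that ingests hundreds of millions of events per second. Publishers send messages to topics, and subscribers receive them through subscriptions.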
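Pub/Sub decouples senders from receivers: publishers send messages to a topic, and each subscription attached to that topic gets its own copy. Below is a minimal in-memory sketch of that fan-out pattern using hypothetical `Topic` and `Subscription` classes. It is not the `google-cloud-pubsub` client library, and it omits acknowledgements, retries, and ordering.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Subscription:
    name: str
    queue: deque = field(default_factory=deque)

    def pull(self) -> list[bytes]:
        """Drain and return all pending messages for this subscriber."""
        messages = list(self.queue)
        self.queue.clear()
        return messages

@dataclass
class Topic:
    name: str
    subscriptions: list[Subscription] = field(default_factory=list)

    def publish(self, data: bytes) -> None:
        """Fan the message out: every attached subscription receives a copy."""
        for sub in self.subscriptions:
            sub.queue.append(data)

orders = Topic("orders")
billing = Subscription("billing")
shipping = Subscription("shipping")
orders.subscriptions += [billing, shipping]

orders.publish(b"order-1")
orders.publish(b"order-2")
billing_msgs = billing.pull()
shipping_msgs = shipping.pull()
print(billing_msgs)   # [b'order-1', b'order-2']
print(shipping_msgs)  # [b'order-1', b'order-2']
```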

16
Q

What is Dataflow?

A

A fully managed service, based on Apache Beam, that unifies streaming and batch data processing and lets you build cohesive data pipelines.
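Dataflow runs Apache Beam pipelines, in which the same transforms apply to both bounded (batch) and unbounded (streaming) input. Below is a minimal plain-Python sketch of that idea. It does not use the Apache Beam SDK. It assumes hypothetical `parse` and `keep_errors` steps and uses a generator to stand in for a stream.

```python
from typing import Iterable, Iterator

def parse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Split 'LEVEL message' lines into (level, message) pairs."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield level, message

def keep_errors(records: Iterable[tuple[str, str]]) -> Iterator[str]:
    """Keep only ERROR messages."""
    for level, message in records:
        if level == "ERROR":
            yield message

def pipeline(source: Iterable[str]) -> Iterator[str]:
    """One pipeline definition, reused for any iterable source."""
    return keep_errors(parse(source))

# Batch: a finite list that is fully available up front.
batch = ["INFO started", "ERROR disk full", "ERROR timeout"]
print(list(pipeline(batch)))  # ['disk full', 'timeout']

# Streaming stand-in: a generator yields lines one at a time as they "arrive".
def live_feed() -> Iterator[str]:
    yield "INFO heartbeat"
    yield "ERROR link down"

for message in pipeline(live_feed()):
    print("alert:", message)
```

Because each step is a lazy generator, elements flow through the pipeline one at a time. The same definition therefore works whether or not the input ever ends.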