Selecting appropriate storage technologies Flashcards

Question 1

Q

Four stages of data lifecycle

Answer

A

Ingest
Store
Process and analyze
Explore and visualize

Question 2

Q

Define ingestion stage

Answer

A

Acquiring data and bringing data into GCP

Question 3

Q

Define storage stage

Answer

A

Persisting data into a storage system from which it can be accessed for later stages of the data lifecycle

Question 4

Q

Define process and analyze stage

Answer

A

Transforming data into a usable format for analysis applications

Question 5

Q

Explore and visualize

Answer

A

Insights are derived from analysis and presented in tables, charts, and other visualizations for use by others.

Question 6

Q

Three broad ingestion modes

Answer

A

Application data
Streaming data
Batch data

Question 7

Q

How is application data generated?

Answer

A

Generated by applications, including mobile apps, and pushed to backend services

Question 8

Q

What does application data include?

Answer

A

User-generated data (e.g. name, address),
data generated by the app (e.g. logs),
event data (e.g. clickstream)

Question 9

Q

Examples of services that can ingest application data

Answer

A

Compute Engine
Kubernetes Engine
App Engine

Question 10

Q

Examples of locations that Application data can be written to:

Answer

A

Stackdriver Logging
Managed databases such as: Cloud SQL or Cloud Datastore

Question 11

Q

What are examples of types of streaming data?

Answer

A

sensor data,
event data

Question 12

Q

What is the event time?

Answer

A

Streaming data often includes a timestamp indicating the time that the data was generated. This is the event time.

Question 13

Q

What is the process time?

Answer

A

Some applications also track the time that streaming data arrives at the beginning of the ingestion pipeline. This is the process time.

Question 14

Q

Why may time-series data require some additional processing early in the ingestion process?

Answer

A

If a stream of data needs to be in time order for processing, then late arriving data will need to be inserted in the correct position in the stream. This can require buffering of data for a short period of time in case the data arrives out of order.
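
The buffering described above can be sketched as follows. This is a minimal illustration, not a GCP API: the function name `reorder_by_event_time` and the `max_delay` parameter are hypothetical, and events are assumed to be `(event_time, payload)` pairs.

```python
import heapq

def reorder_by_event_time(events, max_delay):
    """Yield (event_time, payload) pairs in event-time order.

    An event is held in the buffer until an event at least `max_delay`
    time units newer has been seen, giving late data a chance to arrive.
    """
    buffer = []  # min-heap ordered by event time
    latest_seen = float("-inf")
    for seq, (event_time, payload) in enumerate(events):
        # seq breaks ties so payloads are never compared directly.
        heapq.heappush(buffer, (event_time, seq, payload))
        latest_seen = max(latest_seen, event_time)
        # Release events old enough that stragglers are no longer expected.
        while buffer and buffer[0][0] <= latest_seen - max_delay:
            t, _, p = heapq.heappop(buffer)
            yield t, p
    # End of stream: flush whatever is still buffered, in order.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        yield t, p

# Events listed in arrival order; "b" and "d" arrive late.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
ordered = list(reorder_by_event_time(arrivals, max_delay=2))
```

Data that arrives later than `max_delay` would still be emitted out of order in this sketch, so the buffer length is a trade-off between latency and ordering.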

Question 15

Q

What is Google Cloud Pub/Sub?

Answer

A

Cloud Pub/Sub is a fully managed, scalable, global, and secure messaging service that allows you to send and receive messages among applications and services.

Question 16

Q

How does Cloud Pub/Sub ingestion aid streaming data?

Answer

A

Streaming data is well suited for Cloud Pub/Sub ingestion because it can buffer data while applications process the data.

Question 17

Q

What happens during streaming data spikes when application instances cannot keep up with the rate at which data is arriving?

Answer

A

When this happens, the data can be preserved in a Cloud Pub/Sub topic and processed later, after application instances have had a chance to catch up.
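
The catch-up behavior can be illustrated with a small simulation. This is a sketch only: the `deque` stands in for messages retained by a topic and is not the Pub/Sub client API, and `simulate_spike` and its parameters are hypothetical names.

```python
from collections import deque

def simulate_spike(arrivals_per_tick, capacity_per_tick):
    """Return the backlog size after each tick.

    `backlog` stands in for messages retained by the topic while
    application instances process at a fixed rate per tick.
    """
    backlog = deque()
    history = []
    next_id = 0
    for arriving in arrivals_per_tick:
        # New messages are buffered rather than dropped.
        for _ in range(arriving):
            backlog.append(next_id)
            next_id += 1
        # Instances process at most `capacity_per_tick` messages.
        for _ in range(min(capacity_per_tick, len(backlog))):
            backlog.popleft()
        history.append(len(backlog))
    return history

# A spike in ticks 2 and 3, then the instances catch up.
backlog_sizes = simulate_spike([2, 10, 10, 1, 1, 0], capacity_per_tick=4)
```

The backlog grows during the spike and drains once the arrival rate falls below the processing capacity.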

Question 18

Q

How is Cloud Pub/Sub set up in a way that is accessible and scalable?

Answer

A

Cloud Pub/Sub has global endpoints and uses GCP’s global frontend load balancer to support ingestion. The messaging service scales automatically to meet the demands of the current workload.

Question 19

Q

How is batch data ingested?

Answer

A

Batch data is ingested in bulk, typically in files.

Question 20

Q

What GCP services are often used for batch uploads?

Answer

A

Google Cloud Storage is typically used for batch uploads. It may also be used in conjunction with Cloud Transfer Service and Transfer Appliance when uploading large volumes of data.

Question 21

Q

What are the minimum three things that should be considered when choosing a storage system?

Answer

A

How data is accessed
How access controls need to be implemented
How long data will be stored

Question 22

Q

Examples of databases to use when you need to query for specific records using a set of filtering parameters

Answer

A

Cloud SQL, Cloud Datastore

Question 23

Q

Examples of options when needing to access data in bulk

Answer

A

Cloud Storage

Question 24

Q

Options for when you need to access files using filesystem operations

Answer

A

Cloud Filestore

Question 25

Q

What is nearline storage?

Answer

A

Nearline storage is used for data that is accessed less than once per 30 days.

Question 26

Q

What is Coldline storage?

Answer

A

Coldline storage is used to store data accessed less than once per year.
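
The two thresholds from the Nearline and Coldline cards can be written as a simple rule. This is a sketch based only on those cards: the function name is hypothetical, and the "Standard" fallback for frequently accessed data is an assumption that the cards do not state.

```python
def suggest_storage_class(days_between_accesses):
    """Map a typical access interval (in days) to a storage class.

    Thresholds follow the flashcards: less than once per year means
    Coldline, less than once per 30 days means Nearline.
    """
    if days_between_accesses > 365:
        return "Coldline"
    if days_between_accesses > 30:
        return "Nearline"
    # Assumption: anything accessed more often stays in a standard class.
    return "Standard"

monthly_report = suggest_storage_class(45)    # accessed every 45 days
old_archive = suggest_storage_class(730)      # accessed every two years
```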

Question 27

Q

What is a service that is suited to transform both stream and batch data?

Answer

A

Cloud Dataflow

Question 28

Q

What are services that are useful for data analysis?

Answer

A

Cloud Dataflow
Cloud Dataproc
BigQuery
Cloud ML Engine

Question 29

Q

What is Cloud Datalab?

Answer

A

Cloud Datalab, which is based on Jupyter Notebooks, is a GCP tool for exploring, analyzing, and visualizing data sets.

Question 30

Q

What are the 5 technical aspects of data?

Answer

A

Volume
Velocity
Variation
Access
Security

Question 31

Q

An individual item in Cloud Storage can be up to ___ TB.

Question 32

Q

Cloud Bigtable can store up to ___ TB per node when using a hard disk drive and ___ TB per node when using SSDs.

Answer

A

8 TB (hard disk drive), 2.5 TB (SSD)
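
As a worked example of these limits, the minimum node count for a given data volume can be estimated with a ceiling division. This is a rough sketch that considers storage capacity only; the function name is hypothetical, and the per-node figures are the ones quoted on this card.

```python
import math

# Per-node storage limits as quoted on the card, in TB.
TB_PER_NODE = {"hdd": 8.0, "ssd": 2.5}

def min_nodes_for_storage(data_tb, disk_type):
    """Smallest node count whose combined storage limit covers `data_tb`."""
    return max(1, math.ceil(data_tb / TB_PER_NODE[disk_type]))

hdd_nodes = min_nodes_for_storage(20, "hdd")  # 20 / 8   = 2.5, rounded up
ssd_nodes = min_nodes_for_storage(20, "ssd")  # 20 / 2.5 = 8.0
```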

Question 33

Q

In general, Cloud SQL is a good choice for applications that need:

Answer

A

A relational database
To serve requests in a single region

Question 34

Q

What is velocity of data?

Answer

A

Velocity of data is the rate at which it is sent to and processed by an application

Question 35

Q

What are examples of low-velocity and high-velocity data?

Answer

A

low velocity: human entered data

high velocity: machine generated data such as IoT

Question 36

Q

What is structured data?

Answer

A

Structured data has a fixed set of attributes that can be modeled in a table of rows and columns

Question 37

Q

What is semi-structured data?

Answer

A

Semi-structured data has attributes like structured data, but the set of attributes can vary from one instance to another.
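
The difference can be shown with plain Python dictionaries. The records and the helper `has_fixed_attributes` below are made up for illustration.

```python
# Structured: every record has the same attributes, so it fits a table.
structured_rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Lin", "city": "Taipei"},
]

# Semi-structured: the set of attributes varies between records.
semi_structured_docs = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Lin", "phone": "555-0100", "tags": ["vip"]},
]

def has_fixed_attributes(records):
    """True when every record has exactly the same attribute names."""
    key_sets = {frozenset(record) for record in records}
    return len(key_sets) <= 1

fits_table = has_fixed_attributes(structured_rows)
varies = not has_fixed_attributes(semi_structured_docs)
```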

Question 38

Q

Examples of row oriented storage

Answer

A

Cloud SQL and Cloud Spanner

Question 39

Q

How do wide-column databases organize information?

Answer

A

Rather than using indexes to allow efficient lookup of rows with needed data, wide-column databases organize data so that rows with similar row keys are close together.
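
The effect of keeping similar row keys together can be sketched with a sorted list and a binary search. This is an in-memory illustration, not the Bigtable API; the key format `sensor-id#timestamp` and the function `scan_prefix` are assumptions made for the example.

```python
import bisect

# Rows kept sorted by row key, so related keys sit next to each other.
rows = sorted([
    ("sensor-002#2024-01-01T00:00", 19.5),
    ("sensor-001#2024-01-01T00:05", 21.3),
    ("sensor-001#2024-01-01T00:00", 21.0),
    ("sensor-003#2024-01-01T00:00", 18.2),
])
row_keys = [key for key, _ in rows]

def scan_prefix(prefix):
    """Return all rows whose key starts with `prefix` as one contiguous slice."""
    start = bisect.bisect_left(row_keys, prefix)
    # "\uffff" sorts after any character used in these keys.
    end = bisect.bisect_left(row_keys, prefix + "\uffff")
    return rows[start:end]

sensor_1_rows = scan_prefix("sensor-001#")
```

Because the rows for one sensor are adjacent, a lookup by key prefix reads one contiguous range instead of consulting a secondary index.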

Question 40

Q

Wide column databases are used for use cases with the following:

Answer

A

High volumes of data
Need for low-latency writes
More write operations than read operations
Limited range of queries - in other words no ad hoc queries
Look up by a single key

Question 41

Q

What are the four types of NoSQL databases available in GCP?

Answer

A

key-value
Document
Wide column
Graph

Question 42

Q

What is a key-value data store?

Answer

A

Databases that use associative arrays or dictionaries as the basic data type.

Question 43

Q

When to use key value data store and when to use document database?

Answer

A

A key-value data store works well when values are only looked up by key. In situations where items in the JSON structure should be searchable, a document database would be a better option.
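
A short sketch of why searchability matters: in a key-value store the value is opaque, so finding records by a field inside the JSON means reading every value. The store below is a plain dictionary with made-up data, and `kv_get` and `find_by_field` are hypothetical helpers.

```python
import json

# Key-value store: each value is an opaque JSON string.
kv_store = {
    "user:1": json.dumps({"name": "Ada", "city": "London"}),
    "user:2": json.dumps({"name": "Lin", "city": "Taipei"}),
    "user:3": json.dumps({"name": "Sam", "city": "London"}),
}

def kv_get(key):
    """Lookup by key: the operation a key-value store is built for."""
    return kv_store.get(key)

def find_by_field(field, value):
    """Search inside the values: requires parsing every stored item."""
    matches = []
    for key, blob in kv_store.items():
        if json.loads(blob).get(field) == value:
            matches.append(key)
    return sorted(matches)

londoners = find_by_field("city", "London")
```

A document database is designed to answer the second kind of query without the full scan shown here.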

Question 44

Q

What is the most significant difference between a wide-column database and relational tables?

Answer

A

Wide column databases are often sparse, with the exception of IoT and other time series databases that have few columns that are almost always used.

Question 45

Q

What is a Graph database?

Answer

A

A graph database is based on modeling entities and relationships as nodes and links in a graph or network. Social networks are a good example of a use case for graph databases.
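
The social network example can be sketched with an adjacency map, where each person is a node and each friendship is a link. The names and the `friends_of_friends` helper are made up for illustration.

```python
# Nodes are people; links are (symmetric) friendships.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana"},
    "dee": {"ben"},
}

def friends_of_friends(person):
    """People two links away who are not already direct friends."""
    direct = friends.get(person, set())
    reachable = set()
    for friend in direct:
        reachable |= friends.get(friend, set())
    return reachable - direct - {person}

suggestions = friends_of_friends("ana")
```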

Question 46

Q

Which Google Cloud storage services are used with the different structure types?

Answer

A

Structured data is stored in Cloud SQL and Cloud Spanner if it is used with a transaction processing system.

BigQuery is used for analytical applications of structured data.

Semi-structured data is stored in Cloud Datastore if data access requires full indexing; otherwise, it can be stored in Bigtable.
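
The rules on this card can be collected into one lookup function. This is a study aid that encodes only what the card states; the function name and parameter names are hypothetical.

```python
def suggest_services(structure, workload=None, needs_full_indexing=False):
    """Return the services this card associates with a kind of data.

    structure: "structured" or "semi-structured"
    workload: "transactional" or "analytical" (structured data only)
    """
    if structure == "structured":
        if workload == "transactional":
            return ["Cloud SQL", "Cloud Spanner"]
        if workload == "analytical":
            return ["BigQuery"]
        raise ValueError("structured data needs a workload")
    if structure == "semi-structured":
        return ["Cloud Datastore"] if needs_full_indexing else ["Bigtable"]
    raise ValueError(f"unknown structure: {structure}")

oltp = suggest_services("structured", workload="transactional")
events = suggest_services("semi-structured", needs_full_indexing=False)
```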
