Storage & Databases Flashcards
What is the difference between Block Storage and Object Storage?
- Block storage is fixed-size raw storage capacity
- Block storage stores data in volumes that can be shared and mounted; examples include SAN, iSCSI, and local disks
- Block storage is most common for applications and databases
- Object storage does not require a guest OS and is accessed via APIs
- Object storage grows as needed
- Object storage is redundant and can be replicated
- Unstructured data like music, images, and video
- Log files and database dumps
- Large data sets
- Archive files
What are all the Google Cloud ‘Storage Options’?
- Cloud Storage - not structured, no mobile SDK
- Cloud Storage for Firebase - not structured, needs mobile SDK
- BigQuery - structured, analytics, read-only
- Cloud Bigtable - structured, analytics, updates with low latency
- Cloud Datastore - structured, not analytics, non-relational, no mobile SDK
- Cloud Firestore for Firebase - structured, not analytics, non-relational, needs mobile SDK
- Cloud SQL - structured, not analytics, relational, no horizontal scaling
- Cloud Spanner - structured, not analytics, relational, needs horizontal scaling
What are the three blocks that the Internet Assigned Numbers Authority (IANA) has reserved for private internets?
- 10.0.0.0 - 10.255.255.255 (10/8 prefix)
- 172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
- 192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
What is Persistent Disk, its features, and what is it good/used for?
- Fully managed block storage for VMs and containers
- Good for Compute Engine and Kubernetes Engine
- Good for snapshots for data backup
- Used for VM disks
- Used for sharing read-only data across multiple VMs
Features:
- Durable, independent volumes, 64 TB size, online resize
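For example, a minimal sketch of creating and then resizing a zonal persistent disk with gcloud (the disk name, sizes, type, and zone are placeholders):
gcloud compute disks create my-data-disk --size=200GB --type=pd-ssd --zone=us-central1-a
gcloud compute disks resize my-data-disk --size=500GB --zone=us-central1-a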
What is Cloud Storage and what is it good for?
- A scalable, fully-managed, highly reliable, and cost-efficient object / blob store.
- Good for: Images, pictures, and videos, Objects and blobs, Unstructured data
- Workloads: storing and streaming multimedia, storing data for custom data analytics pipelines
- Archive, backup, and disaster recovery
What are the different storage classes for Cloud Storage and what are they good for?
- Multi-Regional: across geographic regions
- Regional: ideal for compute, analytics, and ML workloads in a particular region
- Nearline: backups, low-cost, once a month access
- Coldline: archive, lowest-cost, once a year access
What is Bigtable?
- Massively scalable NoSQL
- Single table that can scale to billions of rows and thousands of columns
- Stores terabytes or petabytes of data
- Ideal for single-keyed data with very low latency
- Ideal data source for MapReduce operations
What is Bigtable good for?
Cloud Bigtable is ideal for applications that need very high throughput and scalability for non-structured key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.
You can use Cloud Bigtable to store and query all of the following types of data:
- Marketing data such as purchase histories and customer preferences.
- Financial data such as transaction histories, stock prices, and currency exchange rates.
- Internet of Things data such as usage reports from energy meters and home appliances.
- Time-series data such as CPU and memory usage over time for multiple servers.
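A minimal sketch using the cbt CLI, assuming placeholder project, instance, table, column family, and row key names:
cbt -project=my-project -instance=my-instance createtable metrics
cbt -project=my-project -instance=my-instance createfamily metrics stats
cbt -project=my-project -instance=my-instance set metrics server1#1717000000 stats:cpu=0.75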
What is Cloud Spanner?
- Fully managed, horizontally distributed relational database service
- Handles massive transactional loads
- Uses the Paxos algorithm to shard and replicate data across hundreds of data centers
- Mission-critical relational database service with transactional consistency, global scale, and high availability
- Cloud Spanner is ideal for relational, structured, and semi-structured data that requires high availability, strong consistency, and transactional reads and writes.
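For example, a sketch of creating an instance and a database with gcloud (the instance name, config, and database name are placeholders):
gcloud spanner instances create test-instance --config=regional-us-central1 --description="Test instance" --nodes=1
gcloud spanner databases create example-db --instance=test-instance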
What is Cloud Datastore?
- highly scalable NoSQL ‘document’ database for your applications
- non-relational
- automatic sharding and replication
- highly-available and durable, scales automatically to handle load
- ACID, SQL-like queries, indexes, etc
- RESTful interfaces
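For example, a sketch of a composite index definition deployed with gcloud (the Task kind and its properties are hypothetical):
index.yaml:
indexes:
- kind: Task
  properties:
  - name: done
  - name: priority
    direction: desc
gcloud datastore indexes create index.yaml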
What is Dataproc?
Dataproc is Google Cloud's fully managed service for running open source data processing frameworks such as Apache Hadoop and Apache Spark.
Apache Hadoop software is an open source framework that allows for the distributed storage and processing of large datasets across clusters of computers using simple programming models. Hadoop is designed to scale up from a single computer to thousands of clustered computers, with each machine offering local computation and storage. In this way, Hadoop can efficiently store and process large datasets ranging in size from gigabytes to petabytes of data.
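For example, a sketch of creating a Dataproc cluster with gcloud (the cluster name and region are placeholders):
gcloud dataproc clusters create example-cluster --region=us-central1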
What are the different storage classes for GCP?
Standard, Nearline, Coldline, Archive
Why use nearline?
Nearline storage is a low-cost, highly durable storage service for storing infrequently accessed data. Nearline storage is a better choice than Standard storage in scenarios where slightly lower availability, a 30-day minimum storage duration, and costs for data access are acceptable trade-offs for lowered at-rest storage costs.
Nearline storage is ideal for data you plan to read or modify on average once per month or less. For example, if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline storage is a great choice.
Why use regional over multi-regional?
Lower cost.
To comply with specific legal restrictions.
Only needs to be read by a specific VM in a region.
Note: multi-regional actually offers higher availability (99.95% vs 99.9% SLA), so choose regional when slightly lower availability is an acceptable trade-off.
T/F: You cannot change a bucket to regional from multi-regional.
True. You permanently set a geographic location for storing your object data when you create a bucket.
- You cannot change a bucket’s location after it’s created, but you can move your data to a bucket in a different location.
You can select from the following location types:
A region is a specific geographic place, such as São Paulo.
A dual-region is a specific pair of regions, such as Tokyo and Osaka.
A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.
All Cloud Storage data is redundant across at least two zones within at least one geographic place as soon as you upload it.
Additionally, objects stored in a multi-region or dual-region are geo-redundant. Objects that are geo-redundant are stored redundantly in at least two separate geographic places separated by at least 100 miles.
Default replication is designed to provide geo-redundancy for 99.9% of newly written objects within a target of one hour. Newly written objects include uploads, rewrites, copies, and compositions.
Turbo replication provides geo-redundancy for all newly written objects within a target of 15 minutes. Applicable only for dual-region buckets.
Cloud Storage stores object data in the selected location in accordance with the Service Specific Terms.
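For example, a sketch of creating one regional and one multi-regional bucket (the bucket names are placeholders):
gcloud storage buckets create gs://my-regional-bucket --location=us-central1
gcloud storage buckets create gs://my-multi-region-bucket --location=US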
How do you change the storage class of an object?
The storage class set for an object affects the object’s availability and pricing model.
- You can change the storage class of an existing object either by rewriting the object or by using Object Lifecycle Management.
- gsutil rewrite -s nearline -k -r gs://bucket
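The Object Lifecycle Management route uses a JSON config applied with gsutil; a minimal sketch, assuming a 30-day age condition and a placeholder bucket name:
lifecycle.json: {"rule": [{"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}}]}
gsutil lifecycle set lifecycle.json gs://bucket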
How do you set the default storage class of a bucket?
What if you don’t set it?
When you create a bucket, you can specify a default storage class for the bucket. When you add objects to the bucket, they inherit this storage class unless explicitly set otherwise.
If you don’t specify a default storage class when you create a bucket, that bucket’s default storage class is set to Standard storage.
Changing the default storage class of a bucket does not affect any of the objects that already exist in the bucket.
How to change the default storage class for a bucket: when you upload an object to the bucket and don't specify a storage class for the object, the object is assigned the bucket's default storage class.
Two ways to do it: gcloud and gsutil.
Use the gcloud storage buckets update command:
gcloud storage buckets update gs://BUCKET_NAME --default-storage-class=STORAGE_CLASS
Use the gsutil defstorageclass set command:
gsutil defstorageclass set STORAGE_CLASS gs://BUCKET_NAME
Example: gsutil defstorageclass set nearline gs://help_bucket
Where:
- STORAGE_CLASS is the new storage class you want for your bucket. For example, nearline.
- BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
The response looks like the following example:
Setting default storage class to “nearline” for bucket gs://my-bucket
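You can also set the default storage class when you create the bucket; a sketch with placeholder names (gsutil and gcloud variants):
gsutil mb -c nearline -l us-central1 gs://my-bucket
gcloud storage buckets create gs://my-bucket --default-storage-class=nearline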
Can you share a disk between VMs?
You can attach an SSD persistent disk in multi-writer mode to up to two N2 virtual machine (VM) instances simultaneously so that both VMs can read and write to the disk.
To enable multi-writer mode for new persistent disks, create a new persistent disk and specify the --multi-writer flag in the gcloud CLI or the multiWriter property in the Compute Engine API.
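A minimal sketch with placeholder disk and VM names (depending on your gcloud version, the --multi-writer flag may require the beta command track):
gcloud beta compute disks create shared-disk --size=100GB --type=pd-ssd --multi-writer --zone=us-central1-a
gcloud compute instances attach-disk vm-1 --disk=shared-disk --zone=us-central1-a
gcloud compute instances attach-disk vm-2 --disk=shared-disk --zone=us-central1-a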
What are some of the different storage options for Compute Engine instances?
- Zonal persistent disk: Efficient, reliable block storage.
- Regional persistent disk: Regional block storage replicated in two zones.
- Local SSD: High performance, transient, local block storage.
- Cloud Storage buckets: Affordable object storage.
- Filestore: High performance file storage for Google Cloud users.
If you are not sure which option to use, the most common solution is to add a persistent disk to your instance.
When you configure a persistent disk, you can select one of the following disk types.
- Standard persistent disks (pd-standard) are backed by standard hard disk drives (HDD).
- Balanced persistent disks (pd-balanced) are backed by solid-state drives (SSD). They are an alternative to SSD persistent disks that balance performance and cost.
- SSD persistent disks (pd-ssd) are backed by solid-state drives (SSD).
- Extreme persistent disks (pd-extreme) are backed by solid-state drives (SSD). With consistently high performance for both random access workloads and bulk throughput, extreme persistent disks are designed for high-end database workloads. Unlike other disk types, you can provision your desired IOPS. For more information, see Extreme persistent disks.
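For example, a sketch of provisioning an extreme persistent disk with a chosen IOPS value (the disk name, size, and IOPS figure are placeholders):
gcloud compute disks create my-extreme-disk --type=pd-extreme --size=500GB --provisioned-iops=10000 --zone=us-central1-a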
How can you share a persistent disk across VMs?
Share a zonal persistent disk between VM instances
- Connect your instances to Cloud Storage.
- Connect your instances to Filestore.
- Create a network file server on Compute Engine.
- Create a persistent disk with multi-writer mode enabled and attach it to up to two instances.
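For the Cloud Storage option, one common approach is mounting a bucket with Cloud Storage FUSE; a sketch with a placeholder bucket and mount point:
gcsfuse my-bucket /mnt/my-bucket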
How do you create an HA file server with two GCE instances and regional disks?
Database HA configurations typically have at least two VM instances. Preferably these instances are part of one or more managed instance groups:
- A primary VM instance in the primary zone
- A standby VM instance in a secondary zone
A primary VM instance has at least two persistent disks: a boot disk, and a regional persistent disk. The regional persistent disk contains database data and any other mutable data that should be preserved to another zone in case of an outage.
A standby VM instance requires a separate boot disk to be able to recover from configuration-related outages, which could result from an operating system upgrade, for example. You cannot force attach a boot disk to another VM during a failover.
The primary and standby VM instances are configured to use a load balancer with the traffic directed to the primary VM based on health check signals. This configuration is also known as a hot standby.
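A sketch of the two key gcloud steps, assuming placeholder names: create the regional persistent disk with replica zones, then force-attach it to the standby VM during failover:
gcloud compute disks create ha-data-disk --region=us-central1 --replica-zones=us-central1-a,us-central1-b --size=200GB --type=pd-ssd
gcloud compute instances attach-disk standby-vm --disk=ha-data-disk --disk-scope=regional --force-attach --zone=us-central1-b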