Scaling NoSQL Databases Flashcards

1
Q

What is HBase based on?

A

Google Bigtable

2
Q

In one sentence, what is HBase?

A

HBase is a column-oriented, non-relational database management system that runs on top of the Hadoop Distributed File System (HDFS).

3
Q

How does HBase scale?

A

HBase is designed to scale linearly: adding region servers increases storage and throughput capacity roughly in proportion.

4
Q

What query technology does HBase work well with?

A

HBase works well with Hive, a query engine for batch processing of big data, to enable fault-tolerant big data applications. This is because HBase consists of a set of standard tables with rows and columns, much like a traditional database.

5
Q

What is the structure of an HBase key-value object?

A
  • HBase keys are byte arrays.
  • The key is composed of a row key, column family, column qualifier, timestamp, and a delete marker.
    1. The row key is the primary identifier and is used to uniquely identify a row in an HBase table.
    2. The column family is a way to group related columns together.
    3. The column qualifier is the specific column within the column family.
    4. The timestamp is associated with each version of a cell to support versioning.
    5. The delete marker is used to mark a cell for deletion.
  • HBase values are also byte arrays.
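
The key layout above can be sketched as a small Python model. This is an illustrative sketch only, not the HBase client API: the class name `CellKey`, its fields, and the sample rows are hypothetical, and the `is_delete` flag merely stands in for the delete marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellKey:
    """Hypothetical model of the coordinates that identify one cell version."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int           # version of the cell
    is_delete: bool = False  # stands in for the delete marker

    def sort_key(self):
        # Row, family and qualifier sort ascending by bytes;
        # timestamps sort descending so the newest version comes first.
        return (self.row, self.family, self.qualifier, -self.timestamp)


# Keys and values are both plain byte arrays (sample data is made up).
cells = {
    CellKey(b"user#42", b"info", b"email", 1000): b"old@example.com",
    CellKey(b"user#42", b"info", b"email", 2000): b"new@example.com",
    CellKey(b"user#07", b"info", b"email", 1500): b"a@example.com",
}

ordered = sorted(cells, key=CellKey.sort_key)
newest = next(k for k in ordered if k.row == b"user#42")
```

Sorting by row key first is what lets a table be split into contiguous row-key ranges, and sorting timestamps in descending order means a read for the latest version finds it first.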
6
Q

What is an HBase cluster and how does it work?

A

In an HBase cluster, data is horizontally partitioned into regions based on row keys, and each region is served by exactly one region server at a time.
The cluster consists of multiple region servers, each responsible for serving a subset of the data (usually several regions).
Apache ZooKeeper coordinates the cluster: it maintains metadata, handles failover, and monitors server health.
The master manages the overall cluster, assigns regions to region servers, handles schema changes and coordinates administrative tasks.
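
The row-key partitioning described above can be sketched as a lookup over sorted region start keys. This is a minimal sketch under stated assumptions: the split points, the server names, and the functions `locate_region` and `locate_server` are all hypothetical, and a real client resolves regions through cluster metadata rather than a local list.

```python
import bisect

# Hypothetical layout: each region starts at a row key (inclusive)
# and runs up to the next region's start key (exclusive).
region_start_keys = [b"", b"g", b"n", b"t"]
# One region server may host several regions.
region_servers = ["rs1", "rs2", "rs1", "rs3"]


def locate_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


def locate_server(row_key: bytes) -> str:
    """Return the (hypothetical) region server currently serving row_key."""
    return region_servers[locate_region(row_key)]
```

For example, `locate_server(b"apple")` falls in the first region and `locate_server(b"zebra")` in the last one.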

7
Q

What is Cassandra based on?

A

Google Bigtable (for the data model), combined with Amazon Dynamo (for the distribution and replication design)

8
Q

What is Cassandra?

A

High-performance, column-based (wide-column) database

9
Q

How does Cassandra achieve its performance?

A

Performance is reached by manually defining each table, including how its data is stored (for example its partition and clustering keys), at creation time.

10
Q

How does Cassandra redundancy differ from HBase?

A

HBase relies on Hadoop HDFS to replicate blocks, while Cassandra handles replication itself according to a configured replication factor, with nodes exchanging cluster state via the gossip protocol.

11
Q

How does Cassandra handle redundancy?

A

Cassandra achieves redundancy and replication by distributing data across nodes, storing multiple copies (replicas) on different nodes using consistent hashing. The replication factor sets how many copies are stored; a write is acknowledged once the number of replicas required by the configured consistency level have confirmed it.
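
Replica placement by consistent hashing can be sketched with a simple token ring. This is a simplified sketch, not Cassandra's implementation: it assumes one token per node, uses SHA-256 purely for illustration in place of Cassandra's partitioner, and the names `token`, `Ring`, and the node labels are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (SHA-256 is used here only for illustration)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        # Assumption: one token per node, so ring entries are distinct nodes.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and collect the next nodes."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")  # three distinct nodes holding this key
```

Because placement depends only on the key's hash and the ring layout, any node can work out which replicas own a key without consulting a central directory.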

12
Q

How does MongoDB handle scaling?

A

Sharding enables horizontal scaling, dividing data across multiple servers.
The WiredTiger storage engine efficiently manages data storage and retrieval, while the MongoDB query language facilitates flexible data querying.
Indexes enhance query performance.
Replica sets ensure data availability and fault tolerance through data replication across multiple nodes.

13
Q

What are the strengths and weaknesses of MongoDB replica sets?

A

Good: great for read redundancy, since every member holds a full copy of the data.
Bad: potential issues under heavy write load, because every write must be propagated (replicated) to the other members.

14
Q

In short, what is MongoDB?

A

Document-based NoSQL database
Data format: JSON-like documents (stored internally as BSON)

MongoDB’s architecture consists of databases, collections, and documents. Data is organized into flexible, JSON-like documents within collections, and collections are grouped into databases.

15
Q

What is Redis?

A

Very fast in-memory key-value NoSQL database

16
Q

What are 4 key features of Redis?

A

Transactions
Pub/Sub
Keys with a limited time-to-live
Automatic failover
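
The idea of keys with a limited time-to-live can be illustrated with a toy in-memory store. This is a sketch of the concept only, not the Redis API or its internals: the class `TTLStore`, its methods, and the injectable `clock` parameter are hypothetical, and expired keys are simply dropped lazily when they are next read.

```python
import time


class TTLStore:
    """Toy in-memory key-value store with per-key expiry (not the Redis API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._data = {}      # key -> (value, expiry time or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired key on access
            return None
        return value
```

A typical use is a session entry such as `store.set("session", "abc", ttl_seconds=10)`, which disappears on its own once the ten seconds have passed.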

17
Q

What is a typical use case for Redis?

A

Often used to synchronize states between Kubernetes Pods as it is fast, resilient, and configurable.

18
Q

Is a Redis database scalable?

A

Yes. It can be deployed in several ways, including:
- A simple, single DB
- An HA DB that has one or more replicas
- A clustered DB that consists of several partitioned DBs
- An HA clustered DB that has replicas of the partitions.
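
How a clustered Redis deployment decides which partition owns a key can be sketched as follows. Redis Cluster maps each key to one of 16384 hash slots using CRC16 of the key, hashing only the part inside `{...}` when a non-empty hash tag is present. The function name `key_slot` is hypothetical, and the sketch assumes Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16 variant.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: bytes) -> int:
    """Compute the cluster hash slot for a key, honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash only the non-empty tag
    # crc_hqx with initial value 0 computes CRC-16 with polynomial 0x1021
    return binascii.crc_hqx(key, 0) % HASH_SLOTS
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot and therefore on the same partition.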

19
Q

What file formats are stored in the HDFS when using HBase?

A

When using HBase, table data is always stored in HFile format, so you don’t need to worry about whether to use Parquet or Avro.