(F) NoSQL/NewSQL Databases Flashcards

1
Q

What are the main concepts of NoSQL and NewSQL?

A

NoSQL: a type of database designed to store and retrieve large volumes of unstructured or semi-structured data without requiring a fixed schema, offering scalability and flexibility.
Common data models are key-value, document, wide-column, and graph; all aim for high scalability and performance on large volumes of unstructured data.

NoSQL characteristics:
- Flexible, easy to scale, and designed for high performance. Does not require fixed table schemas and usually avoids join operations.

Examples of NoSQL:
- Key-value stores like DynamoDB
- Document stores like MongoDB
- Graph databases like Neo4j
- Wide-column stores like Cassandra
============================
NewSQL: NewSQL databases combine the horizontal scalability of NoSQL systems for online transaction processing (OLTP) workloads with the ACID guarantees of a traditional relational database system.

Examples of NewSQL:
Google Spanner and CockroachDB

NewSQL characteristics:
- Designed to handle the large transaction volumes typical of big data applications while providing strong consistency and durability.
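
To make the ACID guarantee concrete, here is a minimal sketch of an atomic transfer. It uses Python's built-in sqlite3 module purely as a stand-in (SQLite is not a NewSQL system); the `accounts` table and the `transfer` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for an ACID-compliant store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The failed transfer leaves both balances untouched, which is the "all or nothing" behavior that NewSQL systems aim to keep while scaling out.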

2
Q

Architectures of NoSQL and NewSQL

A

NoSQL architecture: designed for flexibility and easy scaling. It uses a distributed architecture, meaning data is partitioned (sharded) across multiple servers.
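
A minimal sketch of partitioning, assuming simple hash-based sharding (one of several possible schemes). Three in-process dictionaries stand in for three servers; `shard_for`, `put`, and `get` are hypothetical helper names.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Three dicts stand in for three servers.
shards = [dict() for _ in range(3)]

def put(key: str, value) -> None:
    shards[shard_for(key, len(shards))][key] = value

def get(key: str):
    return shards[shard_for(key, len(shards))].get(key)

put("user:1", {"name": "Ada"})
put("user:2", {"name": "Lin"})
```

Every read and write for a key goes to the one shard that owns it, so adding servers spreads the load. Plain modulo hashing moves many keys when the shard count changes, which is why real systems often use consistent hashing instead.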

NewSQL architecture: designed for high availability and strong consistency across all nodes. It combines ideas from NoSQL and relational databases, using advanced transaction mechanisms and a query optimizer to deliver high performance and scalability.

3
Q

Challenges with distribution

A

The primary challenge in distributed databases is maintaining a balance between consistency, availability, and partition tolerance (CAP theorem).

NoSQL systems often prioritize scalability and availability over consistency. Challenges include managing data replication, dealing with network partitions, and ensuring data consistency across nodes.

====================
In NewSQL databases, the challenges with distribution largely revolve around bridging the gap between the high availability and scalability of NoSQL systems and the strong consistency and transactional guarantees of traditional SQL databases.

4
Q

What are the advantages/disadvantages of NoSQL and NewSQL?

A

Advantage of NoSQL:
- The major advantage of NoSQL databases is their ability to scale horizontally and handle large volumes of data efficiently, without the overhead of complex transactions or a rigid schema.

Disadvantage of NoSQL:
- They generally lack the strong consistency and transactional support of traditional SQL databases, so complex transactions can be problematic.

========================
Advantage of NewSQL:
- Combines the scalability of NoSQL with the strong consistency and transactional integrity of SQL.

Disadvantage of NewSQL:
- Can be more complex to manage and deploy, and may not achieve the same level of operational simplicity as NoSQL in some cases.

5
Q

What is CAP theorem?

A

The CAP theorem asserts that a distributed system cannot guarantee consistency, availability, and partition tolerance all at the same time; it can provide at most two of the three. Because network partitions cannot be ruled out in practice, the real choice is usually between consistency and availability while a partition lasts. System designers must therefore make trade-offs based on which properties are most critical to their application's needs.

CAP names three desirable properties of a distributed system:
- Consistency: all nodes see the same data values at the same time.
- Availability: every request receives a response, even if some parts of the system are experiencing failures.
- Partition tolerance: the system keeps functioning when the network between nodes fails.

Sidenote: Partition tolerance refers to the ability of a distributed system to continue operating and providing services even if individual components (nodes) in the system fail or become unreachable due to network partitions or communication failures
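
The trade-off can be sketched with a toy two-replica store. This is an illustration under simplifying assumptions, not how any real database is implemented; the class names and the "CP"/"AP" mode labels are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}

class TwoNodeStore:
    """Toy two-replica store; `mode` is "CP" or "AP" (illustrative labels)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True while the replicas cannot communicate

    def write(self, node: Replica, key, value) -> bool:
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False        # refuse the write: consistent but unavailable
            node.data[key] = value  # AP: accept locally, replicas now diverge
            return True
        node.data[key] = value
        other.data[key] = value     # no partition: replicate synchronously
        return True

cp = TwoNodeStore("CP")
cp.partitioned = True
cp_ok = cp.write(cp.a, "x", 1)

ap = TwoNodeStore("AP")
ap.partitioned = True
ap_ok = ap.write(ap.a, "x", 1)
```

During the partition, the CP store rejects the write and its replicas stay identical, while the AP store accepts it and its replicas temporarily disagree.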

6
Q

Sync types

A

Eventual consistency: updates are propagated to all nodes eventually. This allows for higher availability and partition tolerance, but reads may return stale data until the replicas converge.

Strong consistency: All nodes see the same data values at the same time (single coherent system)
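
A minimal sketch of the difference, assuming a toy store where writes land on one replica and reach the others only when `sync()` runs. The class and method names are hypothetical.

```python
class EventuallyConsistentStore:
    def __init__(self, num_replicas: int = 3):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet delivered

    def write(self, key, value) -> None:
        self.replicas[0][key] = value           # applied on one replica right away
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))  # queued for the others

    def read(self, replica_index: int, key):
        return self.replicas[replica_index].get(key)

    def sync(self) -> None:
        """Deliver all queued updates so the replicas converge."""
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("cart", ["book"])
stale = store.read(1, "cart")  # replica 1 has not seen the write yet
store.sync()
fresh = store.read(1, "cart")  # after propagation, all replicas agree
```

A strongly consistent store would behave as if `sync()` ran inside every `write()`, so no read could ever observe the stale value.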

7
Q

What are in-memory DBs?

A

In-memory databases provide fast data processing by storing all data in RAM. They are designed for high-speed transactions and real-time analytics but require enough memory to hold the entire dataset, which limits their use to scenarios where speed is prioritized over storage cost.
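
As a small illustration, Python's built-in sqlite3 module can open a database that lives entirely in RAM. SQLite is only a convenient stand-in here, not a dedicated in-memory system, and the `ticks` table is a made-up example.

```python
import sqlite3

# ":memory:" creates a database held entirely in RAM; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?)",
    [("ABC", 10.0), ("ABC", 12.0), ("XYZ", 5.0)],
)
avg_abc = conn.execute(
    "SELECT AVG(price) FROM ticks WHERE symbol = ?", ("ABC",)
).fetchone()[0]

# Closing the connection discards all data, which shows the durability trade-off.
conn.close()
```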

8
Q

What are in-column (column-oriented) DBs?

A

In-column (column-oriented) databases store data by column rather than by row, which is optimal for analytics and other operations that need to access large volumes of data in a single column. They provide efficient data compression and fast query performance by loading only the necessary columns into memory.
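
A minimal sketch of the two layouts, using made-up sample data. Run-length encoding is shown as one example of a compression scheme that works well on sorted columns; it is an assumption for illustration, not a claim about any particular product.

```python
# Row-oriented layout: one record per row.
rows = [
    {"id": 1, "country": "DE", "amount": 20.0},
    {"id": 2, "country": "FR", "amount": 35.5},
    {"id": 3, "country": "DE", "amount": 12.5},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column touches only that column's values.
total_amount = sum(columns["amount"])

def run_length_encode(values):
    """Compress consecutive repeats as [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

# Values within one column are similar, so they tend to compress well.
encoded_country = run_length_encode(sorted(columns["country"]))
```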

9
Q

1st of 5. Categories of data store (like key-value, document, tabular…)

A

Key-value
- Advantage: optimized for high performance and scalability; simple architecture ideal for quick reads and writes
- Disadvantage: lacks the ability to perform complex queries; relationships between data are not natively handled
- Example systems: DynamoDB, SimpleDB, Cassandra (also commonly classed as a wide-column store)
- Requirements: access by key, flexibility (no schema)

10
Q

2nd. Categories of data store (like key-value, document, tabular…)

A

Document
- Advantage: flexibility with schema-less data models; ideal for storing and retrieving semi-structured data
- Disadvantage: limited support for complex transactions and joins, which makes it challenging to maintain data consistency
- Example systems: MongoDB, CouchDB
- Requirements: access by key, flexibility (no schema), very high scalability
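
A minimal sketch of the document model, using plain Python dictionaries in place of a real document store. The `find` helper is hypothetical and only mimics the idea of querying by field; it is not the API of MongoDB or CouchDB.

```python
# Documents in the same collection may have different fields (no fixed schema).
documents = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}},
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# Access by key: an index from _id to document.
by_id = {doc["_id"]: doc for doc in documents}

matches = find(documents, name="Lin")
```

Note that the second document nests an address and carries tags while the first has neither; a relational table would need extra columns or a join for the same data.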

11
Q

3rd. Categories of data store (like key-value, document, tabular…)

A

Tabular
- Advantage: efficient for structured data with dynamic partitioning; suitable for large datasets like those in big data applications
- Disadvantage: lacks advanced query capabilities like those found in SQL, which may necessitate additional processing
- Example systems: BigTable
- Requirements: very big collections, scalability
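
A rough sketch of the tabular model, assuming the BigTable-style view of a table as a sparse map from (row key, column key, timestamp) to a value. The helper names and the sample row are hypothetical, and real systems add sorting, column families, and distribution on top.

```python
from collections import defaultdict

# table[row_key][column_key] -> list of (timestamp, value), newest first
table = defaultdict(lambda: defaultdict(list))

def put_cell(row_key, column_key, timestamp, value):
    """Store a new version of a cell, keeping versions ordered newest first."""
    cells = table[row_key][column_key]
    cells.append((timestamp, value))
    cells.sort(key=lambda cell: cell[0], reverse=True)

def get_latest(row_key, column_key):
    """Return the most recent value of a cell, or None if it does not exist."""
    cells = table.get(row_key, {}).get(column_key, [])
    return cells[0][1] if cells else None

put_cell("com.example/index", "contents:html", 1, "<v1>")
put_cell("com.example/index", "contents:html", 2, "<v2>")
latest = get_latest("com.example/index", "contents:html")
```

Rows only store the columns they actually use, and lookups are by row key rather than by arbitrary SQL predicates, which matches the disadvantage listed above.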

12
Q

4th. Categories of data store (like key-value, document, tabular…)

A

Graph
- Advantage: efficient at traversing and querying highly connected data, such as complex networks
- Disadvantage: may require more resources and expertise to optimize and manage, especially in distributed setups
- Example systems: Neo4j, Sparksee (from Sparsity Technologies)
- Requirements: efficient storage and management of large graphs
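
A minimal sketch of the kind of traversal a graph store is built for, using an adjacency list and breadth-first search. The `follows` data and the `shortest_path` helper are made up for illustration; graph databases expose this through their own query languages.

```python
from collections import deque

# Adjacency list: who follows whom in a tiny social network.
follows = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan"], "dan": []}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(follows, "ann", "dan")
```

In a relational database the same question would need one self-join per hop, which is why highly connected data is a natural fit for graph stores.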

13
Q

5th. Categories of data store (like key-value, document, tabular…)

A

NewSQL
- Advantage: merges the scalability of NoSQL with the transactional consistency of SQL databases
- Disadvantage: can be complex and costly to implement, often requiring expertise to manage the blend of old and new database technologies
- Example systems: CockroachDB, VoltDB
- Requirements: ACID transactions, SQL support, flexibility, and scalability
