U5 Flashcards
(421 cards)
As its name suggests, “big data” is huge and fast-growing data. Big data, initially:
attributed to search engines and social networks, is now making its way into enterprises.
There are several challenges when working with big data, including?
how to store it and how to process it.
Among these challenges is enabling databases to meet the needs of?
high concurrent reading and writing with low latency.
There is an immense need to lower the costs of storing big data?
Because, with the dramatic increase in data, database costs, e.g., hardware, software, and operating costs, have accordingly increased.
The traditional relational databases, e.g., structured query language (SQL) databases, are a?
collection of data items with pre-defined relationships between them.
These items are organized as a set of tables with:
columns and rows.
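A minimal sketch using Python's built-in sqlite3 module, with hypothetical table and column names, showing data items held in tables of columns and rows with a pre-defined relationship between them:

```python
import sqlite3

# In-memory relational database (hypothetical schema for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "  id INTEGER PRIMARY KEY,"
    "  user_id INTEGER REFERENCES users(id),"  # pre-defined relationship to users
    "  amount REAL)"
)

# Rows are inserted into fixed columns
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# A multi-table correlation (join) across the pre-defined relationship
for row in conn.execute(
    "SELECT users.name, orders.amount FROM orders JOIN users ON orders.user_id = users.id"
):
    print(row)  # ('Alice', 25.0)
```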
Unfortunately, these relational databases have some inherent limitations which emerge with:
the rapid growth of data.
In these cases, relational databases are:
highly prone to deadlocks and other concurrency issues.
These situations lead to rapid declines in?
the efficiency of reading and writing.
Furthermore, the multi-table correlation mechanism that exists in ________ represents a major limitation of database scalability. To overcome these problems, ________ databases were proposed instead of the traditional database. NoSQL is an ________ term for ________ databases which do not use the SQL structure.
relational database
NoSQL
umbrella
non-relational
NoSQL databases are useful for ?
applications that deal with very large semi-structured and unstructured data.
Unlike relational databases, NoSQL databases are designed to ?
scale horizontally and can be hosted on a cluster of processors.
In most of these databases, each row is a ?
key-value pair.
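A minimal sketch of the key-value idea, using a plain Python dict as a stand-in for a NoSQL store; the keys and document fields are hypothetical:

```python
# Each "row" is a key-value pair: the key identifies the record, and the
# value is an arbitrary (often semi-structured) document.
store = {}

# Writes: associate a key with a schema-less value
store["user:1001"] = {"name": "Alice", "tags": ["admin"], "last_login": "2024-01-01"}
store["user:1002"] = {"name": "Bob"}  # no fixed set of columns required

# Reads: look up by key rather than by joining tables
print(store["user:1001"]["name"])  # Alice
```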
NoSQL databases include truly elastic databases, e.g., MongoDB and Cassandra, which allow?
the addition/removal of nodes to/from a cluster without any observable down-time for the clients.
To this end, routing algorithms are used to decide when to move the inter-related data chunks, for instance, ?
when data must be moved to a newly added node B. During the copying process, the data is served from the original node A. When the new node B has an up-to-date version of the data, the routing processes start to send requests to node B.
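An illustrative sketch of this handoff (a simplified model, not any specific database's routing algorithm): node A keeps serving the data while it is copied to the newly added node B, and the router switches to B only once B holds an up-to-date copy.

```python
import copy

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Router:
    """Decides which node serves requests for a data chunk."""
    def __init__(self, owner):
        self.owner = owner  # node currently serving requests

    def get(self, key):
        return self.owner.data.get(key)

def add_node_and_migrate(router, node_a, node_b):
    # 1. Copy the chunk to the new node B while A keeps serving reads.
    node_b.data = copy.deepcopy(node_a.data)
    # 2. Once B has an up-to-date version, route new requests to B.
    router.owner = node_b

# Usage
a, b = Node("A"), Node("B")
a.data = {"k1": "v1", "k2": "v2"}
router = Router(owner=a)
print(router.get("k1"), "served by", router.owner.name)  # v1 served by A
add_node_and_migrate(router, a, b)
print(router.get("k1"), "served by", router.owner.name)  # v1 served by B
```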
In general, there are some important aspects related to distributed databases that need to be thoroughly addressed, including:
scalability, availability, and consistency.
First, scaling is typically achieved through?
“sharding” to meet the data volume.
Sharding is ?
a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts, referred to as data shards.
NoSQL databases support an auto-sharding mode in which?
the shards are automatically balanced across the nodes on a cluster.
Additional nodes can be easily added as ?
necessary to the cluster to align with data volume.
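A minimal sketch of hash-based sharding, assuming a simple modulo placement rule (real auto-sharding balances shards with more sophisticated schemes, such as key ranges or consistent hashing):

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to one of num_shards partitions via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Partition a large keyspace into smaller, more manageable shards
num_shards = 4
shards = {i: {} for i in range(num_shards)}
for key in ("user:1", "user:2", "order:17", "order:42"):
    shards[shard_for(key, num_shards)][key] = {"payload": "..."}

# Adding a node to the cluster means adding shards (or reassigning them),
# after which keys are rebalanced across the nodes.
for shard_id, data in shards.items():
    print(shard_id, list(data))
```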
Second, availability can be achieved via replication, i.e., ?
master-slave replication or peer-to-peer replication.
With master-slave replication, two types of nodes are typically implemented, including:
a master node, where all the write operations go, and slave nodes that replicate data from the master.
Data can be read from any node, either a ________. If a master node goes down, a slave node gets promoted to a ________, and continues to replicate to the ________.
master or a slave.
master node
third node
When a failed master node is resurrected, it joins the cluster as a slave. Alternatively,?
peer-to-peer replication is slightly more complex, in which all the nodes receive read/write requests.
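An illustrative sketch of master-slave replication with failover, assuming a simplified three-node cluster: writes go to the master and are replicated to the slaves, reads can hit any node, and a slave is promoted if the master fails.

```python
class ReplicaNode:
    def __init__(self, name, role):
        self.name, self.role, self.data, self.up = name, role, {}, True

class Cluster:
    def __init__(self, master, slaves):
        self.master, self.slaves = master, slaves

    def write(self, key, value):
        # All write operations go to the master, then replicate to slaves.
        self.master.data[key] = value
        for s in self.slaves:
            s.data[key] = value

    def read(self, key, node):
        # Data can be read from any node, either a master or a slave.
        return node.data.get(key)

    def fail_master(self):
        # If the master goes down, promote a slave to master; the old
        # master can later rejoin the cluster as a slave.
        self.master.up = False
        old_master, new_master = self.master, self.slaves.pop(0)
        new_master.role, old_master.role = "master", "slave"
        self.master, self.slaves = new_master, self.slaves + [old_master]

# Usage
m = ReplicaNode("n1", "master")
s1, s2 = ReplicaNode("n2", "slave"), ReplicaNode("n3", "slave")
cluster = Cluster(m, [s1, s2])
cluster.write("k", "v")
print(cluster.read("k", s2))   # v (read from a slave)
cluster.fail_master()
print(cluster.master.name)     # n2 promoted to master
```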