Distributed Databases Flashcards

Question 1

Q

Distributed Databases

Answer

A

A distributed DBMS provides access to data at all sites.
For example, say we have one store in Liverpool. The business might eventually expand to Manchester, London, etc., and each store would hold its own data.

The concrete definition of a distributed database is a collection of multiple logically interrelated databases distributed over a computer network.

Question 2

Q

Advantages of distributed databases

Answer

A

helps provide access to data held at different sites
we don't have to specify where the data is stored; we can just retrieve it from wherever it is
can give many users access to large datasets
answers queries faster by distributing tasks over the nodes
easier to scale (just add a new node)

Question 3

Q

Fragmentation

Answer

A

Split the database into different parts (fragments), which can be stored at different nodes.

Question 4

Q

Horizontal Fragmentation

Answer

A

Splitting a relation into subsets of rows, for example by a condition on one attribute.
Each fragment stores complete tuples.
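A minimal Python sketch of horizontal fragmentation, assuming a relation is a list of dicts and fragmenting on a hypothetical `city` attribute (one fragment per store location). Each fragment keeps complete tuples, so taking the union of the fragments gives back the entire relation. The function names are illustrative only.

```python
# Hypothetical relation: one row per store.
stores = [
    {"store_id": 1, "city": "Liverpool", "manager": "Ann"},
    {"store_id": 2, "city": "Manchester", "manager": "Bob"},
    {"store_id": 3, "city": "London", "manager": "Cat"},
    {"store_id": 4, "city": "Liverpool", "manager": "Dan"},
]

def horizontal_fragments(relation, attribute):
    """Split a relation into row subsets, one per value of `attribute`."""
    fragments = {}
    for row in relation:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments

def reconstruct(fragments):
    """The entire relation is the union of its horizontal fragments."""
    return [row for fragment in fragments.values() for row in fragment]

# Each fragment could be stored at the site in its city.
by_city = horizontal_fragments(stores, "city")
```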

Question 5

Q

Vertical Fragmentation

Answer

A

Splitting a relation into subsets of columns (attributes).
Each fragment is stored at a different site and should include the primary key, so that the original rows can be reconstructed.
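A minimal Python sketch of vertical fragmentation, assuming rows are dicts and that the primary key is a hypothetical `emp_id` column copied into every fragment. Unlike horizontal fragments, vertical fragments are put back together by joining them on that key rather than by a plain union.

```python
# Hypothetical relation with primary key "emp_id".
employees = [
    {"emp_id": 1, "name": "Ann", "salary": 30000},
    {"emp_id": 2, "name": "Bob", "salary": 32000},
]

def vertical_fragment(relation, key, columns):
    """Project the relation onto the key plus the given columns."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in relation]

def join_on_key(left, right, key):
    """Rebuild full rows by joining two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]

names = vertical_fragment(employees, "emp_id", ["name"])       # e.g. stored at site A
salaries = vertical_fragment(employees, "emp_id", ["salary"])  # e.g. stored at site B
rebuilt = join_on_key(names, salaries, "emp_id")
```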

Question 6

Q

Fragmentation Transparency

Answer

A

The user does not see this fragmentation, just the full relations.

Question 7

Q

Entire Relation

Answer

A

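Union of the fragments (for horizontal fragments). Vertical fragments are instead joined on the primary key to rebuild the entire relation.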

Question 8

Q

Fragmentation Advantage

Answer

A

Fragmentation can improve resilience: if one store (site) fails, the fragments held at the other stores remain available.

Question 9

Q

Types of replication

Answer

A

full replication
no replication
partial replication

Question 10

Q

Full Replication

Answer

A

Each fragment is stored at every site.

Question 11

Q

No Replication

Answer

A

Each fragment is stored at exactly one site.

Question 12

Q

Partial Replication

Answer

A

Only some fragments are replicated, and the number of copies of each fragment is limited.
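The three replication types can be compared with a small Python sketch that places fragments on sites. It assumes each fragment is given a number of copies and that sites are picked round-robin; the `place` function, fragment names, and the round-robin rule are all hypothetical choices for illustration.

```python
def place(copies_per_fragment, sites):
    """Map each fragment to a list of distinct sites holding a copy.

    copies_per_fragment: {fragment_name: number_of_copies}
    Sites are chosen round-robin so copies are spread out.
    """
    placement = {}
    next_site = 0
    for fragment, copies in copies_per_fragment.items():
        if not 1 <= copies <= len(sites):
            raise ValueError(f"{fragment}: need between 1 and {len(sites)} copies")
        # Consecutive indices modulo len(sites) are distinct while copies <= len(sites).
        placement[fragment] = [sites[(next_site + i) % len(sites)] for i in range(copies)]
        next_site = (next_site + 1) % len(sites)
    return placement

sites = ["Liverpool", "Manchester", "London"]
full = place({"F1": 3, "F2": 3}, sites)     # full replication: every fragment at every site
none = place({"F1": 1, "F2": 1}, sites)     # no replication: each fragment at one site
partial = place({"F1": 2, "F2": 1}, sites)  # partial replication: only F1 is replicated
```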

Question 13

Q

Types of transparency

Answer

A

fragmentation transparency
replication transparency
location transparency
naming transparency

Question 14

Q

Fragmentation Transparency

Answer

A

Fragmentation is hidden from users: they query whole relations without knowing how they are fragmented.

Question 15

Q

Replication Transparency

Answer

A

Data items can be copied (replicated) at different sites, but the replication is hidden from users, who see a single logical copy.

Question 16

Q

Location Transparency

Answer

A

The location where data is stored is transparent to users.

Question 17

Q

Naming Transparency

Answer

A

A given name has the same meaning everywhere in the system; for example, a relation name must refer to the same relation at every site.

Question 18

Q

Concurrency Control in DDBMS

Answer

A

Locking is the main mechanism used for concurrency control.

Question 19

Q

Types of locking distributions for DDBMS

Answer

A

one computer grants all locks
one computer grants all locks, but with backups
many computers with different authorities
many computers with different authorities but with backups

Question 20

Q

One computer grants all locks

Answer

A

If the computer fails, everything that is running has to be restarted, since there are no backups.
It can also become a bottleneck, since there may be too many transactions for one computer to handle.
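A minimal Python sketch of the "one computer grants all locks" scheme. The `CentralLockManager` class is hypothetical, and it simplifies heavily: only exclusive locks, and a request for a lock that is already held is refused rather than queued.

```python
class CentralLockManager:
    """Single site that grants every lock in the system (no backups)."""

    def __init__(self):
        self.holders = {}  # item -> transaction currently holding its lock

    def lock(self, txn, item):
        """Grant the lock if the item is free or already held by `txn`."""
        holder = self.holders.get(item)
        if holder is None or holder == txn:
            self.holders[item] = txn
            return True
        return False

    def unlock(self, txn, item):
        """Release the lock only if `txn` actually holds it."""
        if self.holders.get(item) == txn:
            del self.holders[item]

manager = CentralLockManager()
```

Every lock request in the system has to pass through this one object, which is exactly why it becomes a bottleneck; and because `holders` exists only here, losing this site loses every lock it recorded.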

Question 21

Q

One computer grants all locks, but with backups

Answer

A

Solves the restarting problem, since a backup can take over, but the backups must be kept in sync with the main computer.

Question 22

Q

Many computers work together to grant locks

Answer

A

It is not clear which computer to ask for which lock.

Question 23

Q

Many computers work together to grant locks, but with backups

Answer

A

The previous problem (knowing which computer to ask for which lock) still remains, and now the backups must also be kept in sync.

Question 24

Q

Voting

Answer

A

1) Each site with a copy of an item has a local lock for that item, which it can grant to transactions.
2) If a transaction obtains the local locks from more than half of the sites holding a copy of the item, it receives a global lock on that item. It must then tell all the sites with a copy that it has the global lock.
3) If the transaction takes too long to obtain the global lock, it must stop trying.
Drawback: voting requires much more communication, since each lock request involves messages to and from many sites.
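A minimal Python sketch of the majority rule in steps 1 and 2. The `Site` class and `request_global_lock` function are hypothetical; the timeout in step 3 and the notification of copy holders are not modelled, and local locks are never released here.

```python
class Site:
    """A site holding a copy of an item; it can grant one local lock per item."""

    def __init__(self, name):
        self.name = name
        self.local_locks = {}  # item -> transaction holding the local lock

    def grant(self, txn, item):
        """Grant the local lock if it is free or already held by `txn`."""
        if self.local_locks.get(item, txn) == txn:
            self.local_locks[item] = txn
            return True
        return False

def request_global_lock(txn, item, sites_with_copy):
    """Ask every site with a copy; a strict majority of local locks gives the global lock."""
    votes = sum(1 for site in sites_with_copy if site.grant(txn, item))
    return votes > len(sites_with_copy) // 2
```

Because any two majorities of the same set of sites overlap in at least one site, and that site grants its local lock to only one transaction, at most one transaction can hold the global lock on an item. Each request also costs one exchange per copy, which is the extra communication mentioned above.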

Question 25

Q

Recovery in DDBMS

Answer

A

Global transactions are vulnerable to node failures: if one node fails partway through, some nodes may already have committed their part while others have not. The whole transaction would then need to be rolled back, which may not be possible, breaking atomicity.
Therefore we use distributed commit protocols.

Question 26

Q

Two-phase commit protocol

Answer

A

Uses a coordinator, which runs at some node and decides if and when the local transactions can commit.
Each node keeps a local log of the messages it sends to and receives from other nodes.

Question 27

Q

Phases in TPCP

Answer

A

Phase 1 -> Decide whether we commit or abort.
Phase 2 -> Either commit or abort.

Question 28

Q

Phase 1

Answer

A

The coordinator sends the message prepare T to every node. Each node then decides individually whether it is ready to commit its part of T:
- if it is ready, it goes into a pre-committed state and sends back ready T
- if it must abort, it sends back don't commit T to the coordinator and aborts its local transaction

Question 29

Q

Phase 2

Answer

A

If every node replied ready T in phase 1, the coordinator sends the message commit T to all nodes.
If the coordinator receives at least one don't commit T message, or hears nothing from some node for a certain time, it sends abort T to all nodes.
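The decision rules of the two phases can be sketched in Python as follows. The `Vote` enum and both function names are hypothetical; a node missing from `votes` stands in for a node that did not reply before the timeout.

```python
from enum import Enum

class Vote(Enum):
    READY = "ready T"
    DONT_COMMIT = "don't commit T"

def participant_vote(can_commit):
    """Phase 1 at one node: vote ready (pre-committed) or don't commit (abort locally)."""
    return Vote.READY if can_commit else Vote.DONT_COMMIT

def coordinator_decision(nodes, votes):
    """Phase 2: commit only if every node answered ready T."""
    for node in nodes:
        # A don't commit T vote, or no reply at all (timeout), forces an abort.
        if votes.get(node) is not Vote.READY:
            return "abort T"
    return "commit T"
```

For example, with nodes `["A", "B", "C"]`, the decision is commit T only when all three vote `Vote.READY`; a single `Vote.DONT_COMMIT`, or a node that never replies, gives abort T.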

Question 30

Q

Phase 1 Logging

Answer

A

The stages:
- the coordinator logs <prepare T>
- the coordinator sends the message prepare T to every node
Each node can then reply either don't commit T or ready T.
- if the node replies don't commit, it first logs <don't commit T> and then sends don't commit T to the coordinator, which will eventually instruct the node to abort T.
- if the node is ready, it logs <ready T> and then sends ready T to the coordinator.

Question 31

Q

Phase 2 Logging

Answer

A

If some nodes don't respond, or at least one has responded don't commit T, the coordinator logs <abort T> and sends abort T to the nodes.
Otherwise, it logs <commit T> and sends commit T to the nodes.
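The logging order from both phases can be traced with a small Python sketch in which logs are plain lists of strings. The function name and data layout are hypothetical, and message sending is only indicated by comments. The key point is that each record is written before the matching message is sent.

```python
def two_phase_commit(coordinator_log, node_logs, ready):
    """Run 2PC for transaction T, appending log records in protocol order.

    node_logs: {node: list of log records}
    ready: {node: bool}; a node absent from `ready` simulates no reply.
    """
    # Phase 1: the coordinator logs <prepare T> before sending prepare T.
    coordinator_log.append("<prepare T>")
    replies = {}
    for node, log in node_logs.items():
        if node not in ready:
            continue  # node is down, so no reply arrives
        if ready[node]:
            log.append("<ready T>")  # log first, then send ready T
            replies[node] = "ready T"
        else:
            log.append("<don't commit T>")  # log first, then send don't commit T
            replies[node] = "don't commit T"
    # Phase 2: the coordinator logs its decision, then sends it to the nodes.
    all_ready = len(replies) == len(node_logs) and all(
        r == "ready T" for r in replies.values())
    decision = "commit" if all_ready else "abort"
    coordinator_log.append(f"<{decision} T>")
    return decision
```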

Question 32

Q

Three-phase commit protocol

Answer

A

The issue with two-phase commit is that the coordinator and some node may crash while all the other nodes are in the pre-committed state. The surviving nodes cannot tell what was decided.
Depending on whether they commit or abort, we could break the ACID properties. Therefore, we split phase 2 into two parts:
- Phase 2(a) -> prepare to commit. The coordinator sends its decision to all nodes, and the nodes go into a prepare-to-commit state.
- Phase 2(b) -> the old phase 2.
Essentially, if the coordinator goes down before the nodes have received prepare to commit, they know the decision cannot have been commit, and therefore they know to abort.
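A simplified Python sketch of how the surviving nodes could decide T's outcome after the coordinator fails in three-phase commit. It assumes crashed nodes only (no network partitions), and the state names are hypothetical labels for the states described above. The commit branch goes slightly beyond the card: it covers the case where some survivor has already reached prepare to commit.

```python
def recover_after_coordinator_failure(surviving_states):
    """Decide T's outcome from the states of the surviving nodes.

    States: "ready" (voted ready in phase 1), "prepared-to-commit"
    (received phase 2(a)), "committed", "aborted".
    """
    if "aborted" in surviving_states:
        return "abort"
    if "committed" in surviving_states or "prepared-to-commit" in surviving_states:
        # Phase 2(a) is only sent once everyone voted ready, so commit is safe.
        return "commit"
    # Nobody received prepare to commit, so the coordinator cannot have committed.
    return "abort"
```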

Question 33

Q

Query Processing

Answer

A

The issue with query processing in a distributed database is that the data our query needs may only be found in other databases. We must therefore send requests and transfer data over the network, which makes queries slow.
So instead, we use joins to reduce the amount of data transferred, specifically a semi-join.
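A minimal Python sketch of the semi-join strategy, assuming relation R (hypothetical `orders`) is at site 1 and relation S (hypothetical `customers`) is at site 2. Site 1 ships only its join-attribute values, and site 2 ships back only the rows of S that will actually join. The `shipped` count simply adds up the values and rows sent, as a rough stand-in for network cost.

```python
def semijoin_strategy(r_rows, s_rows, attr):
    """Join R (at site 1) with S (at site 2) while shipping as little as possible."""
    # Step 1: site 1 sends only the distinct join-attribute values of R.
    r_keys = {row[attr] for row in r_rows}
    # Step 2: site 2 keeps only the S rows that will join and ships them back.
    s_reduced = [row for row in s_rows if row[attr] in r_keys]
    # Step 3: site 1 finishes the join locally.
    result = [{**r, **s} for r in r_rows for s in s_reduced if r[attr] == s[attr]]
    shipped = len(r_keys) + len(s_reduced)
    return result, shipped

orders = [{"cust_id": 1, "order_id": "o1"}, {"cust_id": 2, "order_id": "o2"}]  # site 1
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(1, 101)]              # site 2
result, shipped = semijoin_strategy(orders, customers, "cust_id")
```

Here only 2 keys and 2 customer rows cross the network instead of all 100 customer rows. The saving shrinks when most rows of S match, so the strategy pays off mainly when the join is selective.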

Question 34

Q

Semi-join

Answer

A

Symbol: ⋉
Returns the rows of one table that would take part in a join with the other table, keeping only that table's columns.
Which table depends on whether we use a left or right semi-join.

Question 35

Q

Left semi-join

Answer

A

R ⋉ S
Returns all rows of R that would join with some row in S.

Question 36

Q

Right semi-join

Answer

A

R ⋊ S
Returns all rows of S that would join with some row in R.
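The left and right semi-joins can be sketched in Python, assuming rows are dicts and that R and S join on a single shared attribute (a hypothetical `a`). Each result keeps only the columns of the table being filtered, and each row appears at most once even if it matches several rows of the other table.

```python
def left_semijoin(r, s, attr):
    """R ⋉ S: rows of R that would join with at least one row of S."""
    s_values = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_values]

def right_semijoin(r, s, attr):
    """R ⋊ S: rows of S that would join with at least one row of R."""
    return left_semijoin(s, r, attr)

R = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}]
S = [{"a": 2, "y": "s2"}, {"a": 3, "y": "s3"}]
```

With these tables, `left_semijoin(R, S, "a")` keeps only `{"a": 2, "x": "r2"}`, and `right_semijoin(R, S, "a")` keeps only `{"a": 2, "y": "s2"}`.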
