Distributed Databases Flashcards

1
Q

Distributed Databases

A

A distributed DBMS provides access to data at all sites.
Let's say we have one store in Liverpool. This store might eventually spread to Manchester, London, etc.

The concrete definition of a distributed database is a collection of multiple logically interrelated databases that are distributed over a computer network.

2
Q

Advantages of distributed databases

A
  • help provide us access to these different data sites
  • we don’t have to specify where the data is from and we can just grab it from wherever
  • can give many users access to large datasets
  • answer queries faster by distributing tasks over the nodes
  • easier to scale (just add new node)
3
Q

Fragmentation

A

Split the database into parts that can be stored at different nodes.

4
Q

Horizontal Fragmentation

A

Fragmenting a relation by rows.
Each fragment stores a subset of the tuples (whole rows).
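
As a rough illustration, horizontal fragmentation can be sketched in Python as grouping whole rows by some predicate. The `stores` relation, its columns, and the `horizontal_fragments` helper are all hypothetical names chosen for this sketch, not part of any real DBMS.

```python
# Hypothetical relation: each tuple is (store_id, city, product).
stores = [
    (1, "Liverpool", "tea"),
    (2, "Manchester", "coffee"),
    (3, "Liverpool", "milk"),
    (4, "London", "bread"),
]

def horizontal_fragments(relation, key):
    """Group whole tuples into fragments according to key(row)."""
    fragments = {}
    for row in relation:
        fragments.setdefault(key(row), []).append(row)
    return fragments

# One fragment per city; each fragment could be stored at that city's node.
by_city = horizontal_fragments(stores, key=lambda row: row[1])

# The entire relation is recovered as the union of the fragments.
rebuilt = [row for fragment in by_city.values() for row in fragment]
```

Every row lands in exactly one fragment here, so the union contains each tuple once.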

5
Q

Vertical Fragmentation

A

Fragmenting a relation by columns.
Each fragment stores a subset of the columns, and the fragments are stored in different databases.
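
A minimal Python sketch of vertical fragmentation follows. It assumes each fragment keeps a shared key column (`id` here) so that rows can be matched up again; the `employees` relation and the helper names are hypothetical.

```python
# Hypothetical relation as a list of dicts; "id" is assumed to be the key.
employees = [
    {"id": 1, "name": "Ann", "salary": 30000},
    {"id": 2, "name": "Bob", "salary": 28000},
]

def vertical_fragment(relation, key, columns):
    """Project the relation onto the key plus the given columns."""
    return [{c: row[c] for c in (key, *columns)} for row in relation]

# Two column-wise fragments that could live at different sites.
names = vertical_fragment(employees, "id", ["name"])
salaries = vertical_fragment(employees, "id", ["salary"])

def rejoin(left, right, key):
    """Rebuild full rows by joining two fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

rebuilt = rejoin(names, salaries, "id")
```

Unlike the horizontal case, the full relation is rebuilt here with a join on the key rather than a union.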

6
Q

Fragmentation Transparency

A

The user does not see this fragmentation, just the full relations.

7
Q

Entire Relation

A

Union of the fragments (for horizontal fragments; vertical fragments are recombined with a join).

8
Q

Fragmentation Advantage

A

Fragmentation can improve resilience: if one store fails, the other stores still hold their fragments of the database.

9
Q

Types of replication

A
  • full replication
  • no replication
  • partial replication
10
Q

Full Replication

A

Each fragment is stored at every site.

11
Q

No Replication

A

Each fragment is stored at a unique site.

12
Q

Partial Replication

A

Only some fragments are replicated, and the number of copies of each fragment is limited.
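
The three replication types can be told apart from a placement map. The sketch below is only an illustration: `replication_kind`, the fragment names, and the site names are hypothetical.

```python
def replication_kind(placement, sites):
    """Classify a placement map {fragment: set of sites holding a copy}."""
    copies = list(placement.values())
    if all(held == set(sites) for held in copies):
        return "full"      # every fragment is stored at every site
    if all(len(held) == 1 for held in copies):
        return "none"      # every fragment is stored at exactly one site
    return "partial"       # some fragments have extra copies, but not everywhere

sites = {"Liverpool", "Manchester", "London"}

full = {"F1": set(sites), "F2": set(sites)}
none = {"F1": {"Liverpool"}, "F2": {"London"}}
partial = {"F1": {"Liverpool", "London"}, "F2": {"London"}}
```

Calling `replication_kind(partial, sites)` returns `"partial"` because F1 has two copies while F2 has only one.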

13
Q

Types of transparency

A
  • fragmentation transparency
  • replication transparency
  • location transparency
  • naming transparency
14
Q

Fragmentation Transparency

A

Fragmentation is hidden from users, who see only the full relations.

15
Q

Replication Transparency

A

The ability to keep copies of data items at different sites, with the replication hidden from users.

16
Q

Location Transparency

A

The location where data is stored is transparent to users.

17
Q

Naming Transparency

A

A given name has the same meaning everywhere in the system. For example, relation names must be the same everywhere.

18
Q

Concurrency Control in DDBMS

A

Locks are the main mechanism for concurrency control.

19
Q

Types of locking distributions for DDBMS

A
  • one computer grants all locks
  • one computer grants many locks but with backups
  • many computers with different authorities
  • many computers with different authorities but with backups
20
Q

One computer grants all locks

A

If the computer fails, we have to restart everything that is running, since we do not have backups.
A single computer can also become a bottleneck if there are too many transactions for it to handle.

21
Q

One computer grants many locks but with backups

A

Solves the restarting problem since we have backups, but we need to keep everything synced.

22
Q

Many computers work together to grant locks

A

It is unclear which computer to ask for which lock.

23
Q

Many computers work together to grant locks, but with backups

A

It is still unclear which computer to ask for which lock, and now the backups also have to be kept in sync.

24
Q

Voting

A

1) Each site with a copy of an item has a local lock that it can grant transactions for that item.
2) If a transaction gets over half the local locks for an item (since sites can hold the same item/lock), it receives a global lock on that item. If it does get this lock, it must tell the sites with a copy that it has the global lock.
3) If the transaction takes too long to receive the global lock, it must stop trying to get it.
4) The only drawback is that it requires a lot more communication.
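
The voting steps above can be sketched in Python as follows. This is a simplified single-process model under stated assumptions: the `Site` class and `acquire_global_lock` function are hypothetical, the timeout in step 3 is modelled as giving up after one round, and the notification to the sites in step 2 is left out.

```python
class Site:
    """A site holding a copy of the item, with one local lock for it."""

    def __init__(self, name):
        self.name = name
        self.holder = None  # transaction currently holding the local lock

    def request_local_lock(self, txn):
        # Grant the local lock if it is free (or already held by txn).
        if self.holder is None:
            self.holder = txn
        return self.holder == txn

    def release(self, txn):
        if self.holder == txn:
            self.holder = None

def acquire_global_lock(txn, sites):
    """Return True if txn obtains more than half of the local locks."""
    granted = [site for site in sites if site.request_local_lock(txn)]
    if len(granted) > len(sites) / 2:
        return True
    # No majority: stop trying and give back the local locks obtained.
    for site in granted:
        site.release(txn)
    return False
```

Because two transactions cannot both hold more than half of the local locks, at most one of them can hold the global lock at a time.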

25
Q

Recovery in DDBMS

A

Global transactions are vulnerable to node faults: a fault at one node could mean the whole transaction has to be rolled back at every node, and if that rollback cannot happen everywhere, atomicity is broken.
Therefore we use distributed commit protocols.

26
Q

Two-phase commit protocol

A

Uses a coordinator that executes at some node and decides if and when local transactions can commit.
Logging is used here to log messages sent to and received from other nodes at each node locally.

27
Q

Phases in TPCP

A

Phase 1 -> Decide whether we commit or abort.
Phase 2 -> Either commit or abort.

28
Q

Phase 1

A

After receiving a message from the coordinator, let’s say prepare T, each node individually has to decide whether it is ready to commit or not:
- if the node is ready, it goes into a pre-committed state and sends back ready T
- if the node has to abort, it sends back don’t commit T to the coordinator and aborts the local transaction

29
Q

Phase 2

A

If phase 1 is successful (every node replied ready T), the coordinator sends the message commit T to all nodes.
If the coordinator receives at least one don’t commit message, or nothing at all from one node for some time, it sends a message to all nodes saying abort T.

30
Q

Phase 1 Logging

A

The stages:
- the coordinator writes <prepare T> to its log
- the coordinator sends prepare T as a message to the nodes
Each node can then send either don't commit or ready.
- if the node sends don't commit, it first logs <don't commit T> and then sends the message to the coordinator, which will eventually instruct the node to abort T.
- if the node is ready, it logs <ready T> and sends ready T to the coordinator.

31
Q

Phase 2 Logging

A

If some nodes don’t respond, or have responded don’t commit, the coordinator logs <abort T> and sends abort T to the nodes.
Otherwise, it logs <commit T> and sends commit T to the nodes.
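
Putting both phases and their logging together, a single-process Python sketch might look like this. It is an illustration under stated assumptions, not a real implementation: `Node` and `two_phase_commit` are hypothetical names, messages are plain function calls, and a missing reply (timeout) is modelled as `None`.

```python
class Node:
    """A participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, can_commit=True, responds=True):
        self.can_commit = can_commit
        self.responds = responds
        self.log = []

    def on_prepare(self, txn):
        if not self.responds:
            return None  # the coordinator sees this as a timeout
        vote = "ready" if self.can_commit else "don't commit"
        self.log.append(f"<{vote} {txn}>")  # log first, then reply
        return vote

    def on_decision(self, decision, txn):
        self.log.append(f"<{decision} {txn}>")

def two_phase_commit(txn, nodes):
    """Run both phases; return the decision and the coordinator's log."""
    # Phase 1: log <prepare T>, then ask every node to vote.
    log = [f"<prepare {txn}>"]
    votes = [node.on_prepare(txn) for node in nodes]
    # Phase 2: commit only if every node answered ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    log.append(f"<{decision} {txn}>")  # log the decision before sending it
    for node in nodes:
        if node.responds:
            node.on_decision(decision, txn)
    return decision, log
```

For example, `two_phase_commit("T", [Node(), Node(can_commit=False)])` decides abort, since one node voted don't commit.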

32
Q

Three-phase commit protocol

A

The issue with two-phase commit is that the coordinator and some node can crash while everyone else is in the pre-committed state, so the remaining nodes cannot tell what the decision was.
Whether they then commit or abort, they could break the ACID properties. Therefore, we split phase 2 into 2 parts:
- Phase 2(a) -> prepare to commit. Send the decision to all nodes, and nodes go into the prepare to commit state.
- Phase 2(b) -> the old phase 2.
Essentially, if the coordinator goes down before phase 2(a), then the nodes will receive nothing, and therefore they will know to abort.
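
One way to picture the recovery rule is a small Python function that the surviving nodes could apply once the coordinator is gone. This is a simplified sketch: the function name and the state labels (`"ready"`, `"prepare-to-commit"`, `"committed"`, `"aborted"`) are assumptions made for illustration.

```python
def recover_without_coordinator(surviving_states):
    """Decide the outcome from the states of the nodes that are still up."""
    if "aborted" in surviving_states:
        return "abort"
    if "committed" in surviving_states or "prepare-to-commit" in surviving_states:
        # Some node saw the commit decision in phase 2(a), so commit is safe.
        return "commit"
    # Nobody received the prepare-to-commit message: no node can have committed.
    return "abort"
```

The extra phase is what makes the last case safe: a node only commits after phase 2(a) has reached the nodes.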

33
Q

Query Processing

A

The issue with query processing is that in a distributed database, the data we want in our query may only be found in other databases. Therefore we have to send requests over the network, resulting in slow queries.
To reduce the amount of data sent between sites, we use joins to help us, specifically a semi-join.
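
A rough Python sketch of how a semi-join can cut down the data shipped between two sites is given below. The relations, their columns, and the site names are hypothetical; the network is only simulated by passing values between variables.

```python
# Site A holds R(customer_id, name); site B holds S(order_id, customer_id).
R_at_site_a = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
S_at_site_b = [(10, 1), (11, 1), (12, 3)]

# Step 1: site B ships only the distinct join-column values to site A.
keys_from_b = {customer_id for _, customer_id in S_at_site_b}

# Step 2: site A keeps only the rows of R that will join, and ships those.
reduced_r = [row for row in R_at_site_a if row[0] in keys_from_b]

# Step 3: site B finishes the join locally on the reduced relation.
joined = [
    (name, order_id)
    for customer_id, name in reduced_r
    for order_id, s_customer_id in S_at_site_b
    if customer_id == s_customer_id
]
```

Here the row for Bob never crosses the network, because no order refers to customer 2.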

34
Q

Semi-join

A


Returns the rows of one table that would take part in a join with the other table.
Which table depends on whether we use a left or right semi-join.

35
Q

Left semi-join

A

R ⋉ S
Means that all rows in R that would have joined with some row in S are displayed.

36
Q

Right semi-join

A

R ⋊ S
Means that all rows in S that would have joined with some row in R are displayed.
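
Both semi-joins can be sketched in Python over lists of tuples. The function names and the key-extractor parameters (`r_key`, `s_key`) are hypothetical, and an equality join condition is assumed.

```python
def left_semi_join(r, s, r_key, s_key):
    """R ⋉ S: rows of r whose key matches the key of at least one row of s."""
    s_keys = {s_key(row) for row in s}
    return [row for row in r if r_key(row) in s_keys]

def right_semi_join(r, s, r_key, s_key):
    """R ⋊ S: rows of s whose key matches the key of at least one row of r."""
    return left_semi_join(s, r, s_key, r_key)

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (2, "y"), (3, "z")]
first = lambda row: row[0]  # join on the first column of each table

left = left_semi_join(R, S, first, first)
right = right_semi_join(R, S, first, first)
```

Note that a semi-join returns each matching row once and only keeps the columns of one table, unlike a full join.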