Advanced Databases COPY Flashcards

(101 cards)

1
Q

What does reliability in a system refer to?

A

The ability of a system to maintain data integrity and consistency and to recover from failures.

2
Q

What is Durability in the context of transactions?

A

Ensures that once a transaction is committed, the changes to the database are persistent.

3
Q

Define Atomicity.

A

Requires that for any transaction, either all operations are executed successfully, or none are.

4
Q

Why are logging and recovery mechanisms important?

A

They are vital for achieving durability and atomicity.

5
Q

What does <start T> indicate?

A

Transaction T has started execution.

6
Q

What does <commit T> mean?

A

Transaction T has completed successfully and will make no further changes to database items.

7
Q

What does <abort T> signify?

A

Transaction T could not complete successfully. No changes made by T will be copied to disk.
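As a rough illustration of how the <start T>, <commit T> and <abort T> records from the cards above can be read back, the sketch below classifies each transaction from a list of log records. The function name `transaction_status`, the tuple encoding of records and the status labels are all assumptions made for this example.

```python
def transaction_status(log):
    """Classify each transaction from a list of (record, transaction) pairs.

    The strings "start", "commit" and "abort" stand in for the
    <start T>, <commit T> and <abort T> log records. The status labels
    returned here are hypothetical names chosen for the example.
    """
    status = {}
    for record, txn in log:
        if record == "start":
            status[txn] = "active"      # started, no outcome logged yet
        elif record == "commit":
            status[txn] = "committed"   # completed successfully
        elif record == "abort":
            status[txn] = "aborted"     # could not complete
    return status


log = [("start", "T1"), ("start", "T2"), ("commit", "T1"),
       ("start", "T3"), ("abort", "T3")]
print(transaction_status(log))
```

Here T2 has a start record but no outcome record, so the sketch reports it as still active.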

8
Q

What is the role of the coordinator in 2PC?

A

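The coordinator drives the protocol: it asks one or more worker nodes to prepare, collects their votes, and makes the global commit or abort decision, ensuring atomicity of the transaction across all nodes.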

9
Q

Describe the Voting Phase in 2PC.

A

Coordinator sends a ‘prepare T’ message to all worker nodes, which execute their part and send back a ‘vote-commit T’ or ‘vote-abort T’ message.

10
Q

What happens in the Decision Phase of 2PC?

A

Coordinator analyzes votes from workers and makes a commit or abort decision.

11
Q

What is a Commit Decision?

A

If all workers vote-commit, the coordinator commits the transaction and sends commit messages to all workers.

12
Q

What is an Abort Decision?

A

If any worker votes-abort, the coordinator aborts the transaction and informs all workers.
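The commit and abort decisions described in the cards above can be sketched as a single function. This is a minimal illustration rather than a full protocol implementation: the names `Vote` and `decide` are assumptions, and votes are passed in as an in-memory dictionary instead of arriving as network messages.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "vote-commit"
    ABORT = "vote-abort"


def decide(votes):
    """Return the coordinator's global decision for one transaction.

    `votes` maps a (hypothetical) worker name to the Vote it sent back
    in the voting phase. The decision is "commit" only if every worker
    voted to commit; a single abort vote forces "abort". With no votes
    at all, this sketch chooses "abort".
    """
    if votes and all(v is Vote.COMMIT for v in votes.values()):
        return "commit"
    return "abort"


print(decide({"w1": Vote.COMMIT, "w2": Vote.COMMIT}))
print(decide({"w1": Vote.COMMIT, "w2": Vote.ABORT}))
```

In a real system the coordinator would then send the decision to every worker; that messaging step is left out here.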

13
Q

What triggers the Termination Protocol?

A

Activated when a timeout occurs.

14
Q

What is the Cooperative Termination Protocol?

A

Assumes participants are aware of each other and tries to find out the coordinator’s decision after a timeout.

15
Q

What is the Recovery Protocol?

A

Initiated when a coordinator or participant restarts after a crash.

16
Q

Define Presumed-Abort variant of 2PC.

A

Allows the coordinator to forget about transactions if the global decision is to abort.

17
Q

What is the Presumed-Commit variant?

A

Assumes that if no information about a transaction is in memory, it must have been committed.

18
Q

Fill in the blank: A log is a _______.

A

persistent, append-only record of changes.

19
Q

True or False: The Voting Phase is the second phase of 2PC.

A

False. The Voting Phase is the first phase; the Decision Phase is the second.

20
Q

What does the <ready> log record indicate?

A

It indicates that a worker is ready to commit or abort changes locally.

21
Q

What is a distributed system?

A

A system consisting of multiple machines that are far away from each other, controlled by the same organization, typically in different data centers.

22
Q

What is a characteristic of a distributed system?

A

Characteristics include:
* Multiple machines (>50)
* Homogeneous data format (relational)
* Same hardware across machines
* No reliance on a central site.

23
Q

What is the goal of fragmentation in a distributed system?

A

To break down large databases into smaller, more manageable units.

24
Q

What are the requirements for fragments in fragmentation?

A

The fragments should be:
* Disjoint
* Complete.

25
What does disjoint mean in the context of fragmentation?
No tuple of the global relation appears in more than one fragment.
26
What does complete mean in the context of fragmentation?
Every tuple in the original global relation must belong to at least one of the fragments.
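The two requirements above can be checked mechanically. The sketch below assumes a relation and its fragments are modelled as Python sets of tuples (which fits fragments that split a relation by rows); the function name `check_fragmentation` is made up for the example.

```python
def check_fragmentation(relation, fragments):
    """Return (disjoint, complete) for a list of fragments.

    relation:  set of tuples (the global relation)
    fragments: list of sets of tuples
    """
    seen = set()
    disjoint = True
    for frag in fragments:
        if seen & frag:          # some tuple already appeared in an earlier fragment
            disjoint = False
        seen |= frag
    complete = relation <= seen  # every tuple belongs to at least one fragment
    return disjoint, complete


employees = {(1, "Ann"), (2, "Bob"), (3, "Cy")}
print(check_fragmentation(employees, [{(1, "Ann")}, {(2, "Bob"), (3, "Cy")}]))
```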
27
What are the goals of distributing a system?
Goals include:
* Reduce overall execution time
* Balance I/O load evenly across nodes.
28
What are the 2 approaches for fragmentation?
* Horizontal fragmentation
* Vertical fragmentation
29
What factors should be considered when distributing a system?
Factors include:
* Number and capacity of available nodes
* Query workload
* Network topology and distance.
30
What is a reconstruction program?
31
What is localisation, in the context of distributed query processing?
It deals with how a query formulated against a global, non-distributed database schema operates on fragments of data across different nodes.
32
What is a naive approach in localisation, in the context of distributed query processing?
Involves substituting localization programs for relations in the original query, potentially leading to expensive operations.
33
What are reduction techniques in distributed databases?
Techniques employed to optimize queries by reducing the amount of data transferred across the network.
34
What reduction techniques are used for horizontal fragmentation?
* Reduction with selection
* Reduction with join
35
What reduction techniques are used for vertical fragmentation?
Reduction with projection
36
What is selection reduction?
Applied when a query includes a selection operation (WHERE), identifying irrelevant fragments to avoid accessing them.
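A small sketch of the idea, assuming each horizontal fragment is described by an inclusive value range on the attribute used in the WHERE clause. The fragment names, the range encoding and the function `relevant_fragments` are hypothetical.

```python
def relevant_fragments(fragment_ranges, lo, hi):
    """Return the fragments that may hold rows with lo <= value <= hi.

    fragment_ranges maps a fragment name to the inclusive (min, max)
    range of the fragmentation attribute stored in that fragment.
    Fragments whose range cannot overlap the selection are skipped.
    """
    return [name
            for name, (fmin, fmax) in fragment_ranges.items()
            if fmin <= hi and lo <= fmax]


ranges = {"F1": (0, 99), "F2": (100, 199), "F3": (200, 299)}
# WHERE value BETWEEN 150 AND 160 only needs F2.
print(relevant_fragments(ranges, 150, 160))
```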
37
What is join reduction?
Optimizes join operations by identifying pairs of fragments whose join will be empty and avoiding unnecessary joins.
38
What is projection reduction?
Aims to reduce data processed and transferred by eliminating unnecessary fragments for a query involving a projection operation.
39
For distributed joins, where do we perform the join?
40
What is semijoin reduction?
Moves only the part of one relation needed for the join.
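To make this concrete, the sketch below assumes relation R is stored at one site and relation S at another, with rows modelled as dictionaries. Only the join-key values of S are "sent" to R's site, and only the matching rows of R would then be shipped for the final join. The function name `semijoin_reduce` and the sample data are assumptions.

```python
def semijoin_reduce(r_rows, s_rows, key):
    """Return the rows of R that have a join partner in S.

    Step 1: project S onto the join key (a small message to R's site).
    Step 2: keep only the R rows whose key appears in that projection.
    """
    s_keys = {row[key] for row in s_rows}
    return [row for row in r_rows if row[key] in s_keys]


r = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
s = [{"id": 2, "y": "p"}, {"id": 3, "y": "q"}, {"id": 3, "y": "r"}]
print(semijoin_reduce(r, s, "id"))
```

The row of R with id 1 has no partner in S, so it never needs to cross the network.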
41
Fill in the blank: The machines in a distributed system should be _______.
self-sufficient.
42
True or False: In a distributed system, there should be reliance on a central site.
False.
43
What is the purpose of a localization program in distributed databases?
To reconstruct global relations using expressions derived from fragments.
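As an illustration, the sketch below rebuilds a relation from its fragments in two ways: by a union of row-wise (horizontal) fragments, and by joining column-wise (vertical) fragments on a shared key. Rows are modelled as dictionaries, and both function names are assumptions made for the example.

```python
def reconstruct_horizontal(fragments):
    """Union of horizontal fragments: concatenate their rows."""
    rows = []
    for frag in fragments:
        rows.extend(frag)
    return rows


def reconstruct_vertical(fragments, key):
    """Join vertical fragments on `key`, which every fragment must carry."""
    merged = {}
    for frag in fragments:
        for row in frag:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


names = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
cities = [{"id": 1, "city": "Leeds"}, {"id": 2, "city": "York"}]
print(reconstruct_vertical([names, cities], "id"))
```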
44
What does Two-Phase Locking (2PL) guarantee in distributed databases?
Serialisability, the highest isolation level.
Footnote: 2PL is essential for preserving isolation in distributed DBs.
45
What are the two types of Two-Phase Locking?
* Centralised 2PL (C2PL)
* Distributed 2PL (D2PL)
Footnote: C2PL uses a single site for lock management, while D2PL has lock managers at each site.
46
How does Centralised 2PL (C2PL) manage lock requests?
The transaction manager sends a lock request to the central lock manager.
Footnote: The central lock manager decides whether to grant the lock.
47
What is a major drawback of Centralised 2PL?
Single point of failure and potential bottleneck.
Footnote: Every lock request and release must go through the central lock manager.
48
In Distributed 2PL (D2PL), where are lock requests sent?
To the local lock manager at each participant site.
Footnote: Each participant site has its own lock manager.
49
What occurs once a data processor finishes executing an operation in D2PL?
It sends an 'end of operation' message back to the transaction manager.
Footnote: This informs the transaction manager that the operation is complete.
50
What is deadlock in the context of distributed systems?
Occurs when 2 or more transactions are waiting for each other to release a lock on an item.
Footnote: Deadlocks can severely impact system performance.
51
What are the three conditions for deadlock occurrence?
* Concurrency
* Hold
* Wait
Footnote: These conditions must be met for a deadlock to occur.
52
What is a Wait-For Graph (WFG)?
A directed graph that represents the dependencies between transactions.
Footnote: It helps in detecting deadlocks.
53
What indicates the presence of a deadlock in a Wait-For Graph?
The presence of a cycle.
Footnote: A cycle in the WFG signifies that transactions are waiting indefinitely.
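Cycle detection on a Wait-For Graph can be sketched with a depth-first search. The representation (a dictionary mapping each transaction to the set of transactions it waits for) and the function name `has_deadlock` are assumptions made for this example.

```python
def has_deadlock(wfg):
    """Return True if the wait-for graph contains a cycle.

    wfg maps a transaction to the set of transactions it is waiting for.
    A transaction reached again while it is still on the current DFS
    path (GREY) closes a cycle, which indicates a deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in wfg.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK  # fully explored, no cycle through txn
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in list(wfg))


print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # T1 and T2 wait for each other
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}}))  # a chain, no cycle
```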
54
What are the two types of Wait-For Graphs?
* Local WFG
* Global WFG
Footnote: Local WFGs are per site, while Global WFGs consider all transactions.
55
What is one technique for deadlock prevention?
Transaction pre-declaration.
Footnote: Transactions declare all accessed data items in advance. The TM only locks if all items are available.
56
What is resource ordering in deadlock avoidance?
Requiring transactions to always access resources in a predefined order.
Footnote: This can be challenging in dynamic databases.
57
What does transaction prioritisation involve?
Using timestamps to decide which transaction to abort when a lock request is denied.
Footnote: Rules like WAIT-DIE and WOUND-WAIT are examples of this approach.
58
What happens in the WAIT-DIE rule?
An older transaction waits for a younger one, while a younger one is aborted if it requests a lock held by an older one.
Footnote: This helps to manage transaction priorities.
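The WAIT-DIE rule comes down to one timestamp comparison. In the sketch below a smaller timestamp means an older transaction; the function name `wait_die` and the returned strings are assumptions made for the example.

```python
def wait_die(requester_ts, holder_ts):
    """Decide what a transaction does when its lock request is denied.

    requester_ts: timestamp of the transaction asking for the lock
    holder_ts:    timestamp of the transaction holding the lock
    An older requester (smaller timestamp) waits; a younger one dies,
    i.e. it is aborted.
    """
    if requester_ts < holder_ts:
        return "wait"
    return "die"


print(wait_die(requester_ts=1, holder_ts=5))  # older requester
print(wait_die(requester_ts=5, holder_ts=1))  # younger requester
```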
59
What is Centralised Deadlock Detection?
One site acts as the deadlock detector, checking for cycles in the Global Wait-For Graph.
Footnote: Each site sends its Local WFG to the central detector.
60
What is one drawback of Centralised Deadlock Detection?
Single point of failure.
Footnote: If the deadlock detector fails, deadlocks may go undetected.
61
What is the function of Hierarchical Deadlock Detection?
Organizing deadlock detectors in a hierarchy for monitoring.
Footnote: Local detectors report to higher-level detectors.
62
What is Distributed Deadlock Detection?
Responsibility for deadlock detection is shared among sites.
Footnote: This allows for a more robust detection mechanism.
63
What is a simple method for deadlock resolution?
Using timeouts to abort transactions waiting too long for a resource.
Footnote: This method assumes that long wait times indicate a deadlock.
64
What is replication in the context of data management?
Replication is an extension of the fragmentation problem that involves creating multiple copies of data across different geographical locations.
65
List the reasons for using replication in data management.
* Latency reduction
* Availability
* Resilience
* Performance
66
How does replication reduce latency?
By storing copies of the data closer to users in different geographical areas.
67
What is the impact of replication on system availability?
It increases availability by having multiple copies; if one replica fails, data can be accessed from other replicas.
68
Explain how replication contributes to resilience.
If a node containing a replica fails, transactions can be rerouted to other nodes with copies of the required data.
69
What benefit does replication provide in terms of performance?
It balances the read workload across multiple replicas, reducing bottlenecks and increasing throughput.
70
What is a key feature of each replica in a replicated system?
Each replica has its own transaction management system.
71
Define Strong Mutual Consistency.
All copies of an item have the same value after the execution of an update transaction.
72
Define Weak Mutual Consistency.
All copies of an item will eventually have the same value after the execution of an update transaction.
73
What does the term 'epsilon' refer to in data consistency?
Epsilon defines a bound on the allowed inconsistency, specifically the number of missing writes.
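One way to picture the bound is to count writes with version numbers. The sketch below assumes each write increments a version counter, so the difference between the latest version and a replica's version is the number of writes that replica has missed. The function name `read_allowed` and the version-counter model are assumptions made for the example.

```python
def read_allowed(latest_version, replica_version, epsilon):
    """Allow a read if the replica has missed at most `epsilon` writes."""
    missed_writes = latest_version - replica_version
    return missed_writes <= epsilon


print(read_allowed(latest_version=5, replica_version=4, epsilon=1))  # 1 write missed
print(read_allowed(latest_version=5, replica_version=3, epsilon=1))  # 2 writes missed
```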
74
True or False: Inconsistent reads are never allowed in a replicated system.
False
75
Fill in the blank: The first read is made before A is updated in the right DB but is allowed because only _______ write missed.
1
76
What is the Time Bound consistency criterion?
It allows reads as long as they are within a defined bound of time units.
77
What is Time Drift in the context of data consistency?
It considers the average/combined temporal difference across multiple items accessed in the same transaction.
78
What is the significance of mutual consistency not being equal to serialisability?
Mutual consistency means replicas have the same value, but does not ensure that updates occurred in a single, step-by-step order.
79
Define Conflict-Free Replicated Data Types (CRDT).
A type of data object that can be used in replicated systems to facilitate lazy distribution of updates without leading to conflicts.
80
List the three key properties that make operations on CRDTs conflict-free.
* Associative
* Commutative
* Idempotent
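These three properties can be seen in a grow-only counter, a commonly used CRDT chosen here purely as an example. Each replica keeps its own count, and two states are merged by taking the per-replica maximum. The function names `merge` and `value` and the dictionary representation are assumptions.

```python
def merge(a, b):
    """Merge two counter states (replica id -> count) by per-replica maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(state):
    """The counter's value is the sum of all per-replica counts."""
    return sum(state.values())


a = {"r1": 2}
b = {"r1": 1, "r2": 3}
c = {"r3": 1}

print(merge(a, b) == merge(b, a))                      # commutative
print(merge(merge(a, b), c) == merge(a, merge(b, c)))  # associative
print(merge(a, a) == a)                                # idempotent
```

Because the merge has these properties, replicas can exchange states in any order, and more than once, and still arrive at the same value.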
81
How do CRDTs benefit replicated systems?
Updates can be lazily distributed without immediate propagation, ensuring the final state is consistent regardless of update order.
82
What is the advantage of lazy distributed replication?
It is practical for systems with network latency or intermittent connectivity.
83
What is churn in the context of decentralised systems?
The dynamic rate of participation of nodes within the network.
84
How does churn affect decentralised systems?
It makes fragmentation and replication of data more challenging.
85
What characterizes unstructured P2P networks?
Extreme autonomy with no control of the network topology.
86
What is a key design principle of unstructured P2P networks?
The lack of a central overseer.
87
List two advantages of unstructured P2P networks.
* Very resilient
* Supports maximum autonomy
88
List two disadvantages of unstructured P2P networks.
* Unpopular items are not replicated enough
* Enormous communication cost
89
What is the purpose of a routing table in decentralised systems?
To manage data and queries.
90
What does a routing table describe?
The range of hash keys stored in a specific node.
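A minimal sketch of such a table, assuming it is kept as a sorted list of (range start, node) pairs in which each node stores the hash keys from its start up to the next entry's start. The node names, the ranges and the function `lookup` are hypothetical.

```python
import bisect


def lookup(routing_table, key_hash):
    """Return the node responsible for key_hash.

    routing_table is a list of (range_start, node) pairs sorted by
    range_start; the first range is assumed to start at 0 and key_hash
    is assumed to be non-negative.
    """
    starts = [start for start, _ in routing_table]
    index = bisect.bisect_right(starts, key_hash) - 1
    return routing_table[index][1]


table = [(0, "node-A"), (100, "node-B"), (200, "node-C")]
print(lookup(table, 150))
```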
91
What is the challenge of maintaining a full routing table at each node?
It becomes expensive due to the need for synchronisation.
92
What is an overlay network?
A network where nodes connect to virtual nodes to control topology.
93
Name three overlay network geometries.
* Tree
* Hypercube
* Ring
94
What is the Pastry Algorithm?
Each node maintains a routing table that stores the address of one node representative of a different prefix.
95
What is a Sybil attack?
Creation of multiple identities by an attacker to manipulate voting processes.
96
How does Proof of Work (PoW) counter Sybil attacks?
By shifting validation from identity count to computational effort.
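The "computational effort" can be illustrated with a toy hash puzzle: find a nonce such that the SHA-256 hash of the data plus the nonce starts with a given number of zero hex digits. This is a simplified sketch, not the scheme of any particular blockchain; the function names `mine` and `verify` and the puzzle format are assumptions.

```python
import hashlib


def puzzle_hash(data, nonce):
    """SHA-256 hex digest of the data followed by the nonce."""
    return hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()


def mine(data, difficulty):
    """Try nonces in order until the digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not puzzle_hash(data, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(data, nonce, difficulty):
    """Checking a solution costs a single hash."""
    return puzzle_hash(data, nonce).startswith("0" * difficulty)


nonce = mine("block", 2)
print(verify("block", nonce, 2))
```

Creating many identities does not help here: each extra unit of influence costs extra hashing work, while checking a solution stays cheap.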
97
What is a drawback of Proof of Work?
The puzzle difficulty is hard to tune, and it affects transaction confirmation speed.
98
What is Proof of Stake (PoS)?
A system where participants stake their own economic value to become validators.
99
What happens to validators in PoS if they are found cheating?
They lose their stake.
100
What are some requirements of Proof of Stake?
* Requires validators to reveal identity
* Openly auditable transactions
101
What is needed to mitigate channel attacks in PoS?
Cryptography.