Compare the concept of a transaction with that of a coarse-grained atomic action. What are their differences and similarities?
A transaction and a coarse-grained atomic action both consist of several operations, but these are treated as belonging together, and hence both are atomic. They both give the appearance of indivisibility. The difference is that a transaction also has the property of failure atomicity:
- if there is a failure part-way through the execution of the transaction, all the partly done operations are undone, so that it is as if nothing has happened;
- if the transaction completes, the results of the transaction are made durable (stored in persistent storage).
What are the desirable properties for transactions?
There are four desirable (ACID) properties:
- atomicity – the transaction is all or nothing: either it completes in its entirety or it has no effect at all;
- consistency – the transaction moves the data from one consistent state to another;
- isolation – the intermediate effects of a transaction are not visible to other transactions;
- durability – once a transaction has committed, its results survive subsequent failures.
What are the transaction states in the transaction state model?
The transaction model specifies the following transaction states:
- Active – the transaction is executing its operations;
- Partially Committed – the final operation has completed, but the results have not yet been made durable;
- Committed – the transaction has completed and its results have been made durable;
- Failed – normal execution can no longer continue;
- Aborted – the effects of the transaction have been rolled back.
When a transaction is in the Partially Committed state, it has not fully completed. What are the possible next states? Describe the circumstances under which it would progress to each of them.
When a transaction is Partially Committed it can progress to the Committed state in the case where all the data involved in the transaction is written to disk and made durable.
On the other hand, if there is a problem in making the data durable, the transaction is said to have Failed. The effects of the transaction are then undone, i.e. rolled back, and only then is the transaction completely finished when it enters the Aborted state.
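The state transitions described above can be sketched as a small state machine. This is a minimal illustration, not code from the text; the enum and method names are my own.

```java
import java.util.EnumSet;
import java.util.Map;

// A sketch of the transaction state model: Active -> Partially Committed ->
// Committed on success, or through Failed -> Aborted when something goes wrong.
public class TxStateModel {
    enum State { ACTIVE, PARTIALLY_COMMITTED, COMMITTED, FAILED, ABORTED }

    static final Map<State, EnumSet<State>> NEXT = Map.of(
        State.ACTIVE, EnumSet.of(State.PARTIALLY_COMMITTED, State.FAILED),
        State.PARTIALLY_COMMITTED, EnumSet.of(State.COMMITTED, State.FAILED),
        State.FAILED, EnumSet.of(State.ABORTED),
        State.COMMITTED, EnumSet.noneOf(State.class),   // terminal state
        State.ABORTED, EnumSet.noneOf(State.class));    // terminal state

    static boolean canMove(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    public static void main(String[] args) {
        // Durable write succeeded: Partially Committed -> Committed.
        System.out.println("to Committed: " + canMove(State.PARTIALLY_COMMITTED, State.COMMITTED));
        // Problem making the data durable: Partially Committed -> Failed.
        System.out.println("to Failed: " + canMove(State.PARTIALLY_COMMITTED, State.FAILED));
        // A committed transaction can never be aborted.
        System.out.println("Committed to Aborted: " + canMove(State.COMMITTED, State.ABORTED));
    }
}
```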
What is the role of a TP system?
The task of the transaction processing system (also known as a TP system or a TP monitor) is to manage the correct and efficient execution of transactions.
What is a serial schedule?
A serial schedule is one in which transactions execute strictly one after the other, without any interleaving of operations.
Give the definition of serialisability.
A schedule for a group of transactions is serialisable if, and only if, it produces the same results as if the transactions had executed in some serial order.
When executing a number of transactions, several serial schedules may be possible. These serial schedules do not necessarily all give the same final results for the data objects involved. Explain whether or not you think this is problematic.
This is not problematic: each serial schedule corresponds to a legitimate order in which the transactions could have been submitted, and each leaves the data objects in a consistent state. The main concern is that the data objects end up in some consistent state, not in one particular state.
How is the notion of conflicting operations used to determine whether a schedule is serialisable?
The notion of conflicting operations allows us to determine in what order transactions execute each pair of conflicting operations.
Then, if for all the pairs of conflicting operations in a schedule the order of execution by transactions is the same, we know that the schedule is equivalent to a serial schedule.
Precedence graphs can be used as a notation for transactions and the conflicting operations of those transactions. Explain under which circumstances such graphs could contain a cycle.
A precedence graph contains a cycle if, and only if, the schedule it represents is not serialisable. A cycle arises when, for some set of transactions, each must come before another in any equivalent serial order, so that no consistent serial order exists.
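The idea can be sketched in code: build an edge for every pair of conflicting operations (same object, different transactions, at least one write), then look for a cycle. This is an illustrative sketch, not an algorithm from the text.

```java
import java.util.*;

// Build a precedence graph from a schedule of read/write operations and
// check it for a cycle with a depth-first search.
public class PrecedenceGraph {
    record Op(int tx, char kind, String obj) {}   // kind: 'r' or 'w'

    static boolean hasCycle(List<Op> schedule) {
        Map<Integer, Set<Integer>> edges = new HashMap<>();
        for (int i = 0; i < schedule.size(); i++)
            for (int j = i + 1; j < schedule.size(); j++) {
                Op a = schedule.get(i), b = schedule.get(j);
                // Conflicting pair: same object, different transactions, one is a write.
                if (a.tx() != b.tx() && a.obj().equals(b.obj())
                        && (a.kind() == 'w' || b.kind() == 'w'))
                    edges.computeIfAbsent(a.tx(), k -> new HashSet<>()).add(b.tx());
            }
        for (Integer start : edges.keySet())
            if (dfs(start, edges, new HashSet<>())) return true;
        return false;
    }

    static boolean dfs(int node, Map<Integer, Set<Integer>> edges, Set<Integer> path) {
        if (!path.add(node)) return true;          // revisited a node on the current path
        for (int next : edges.getOrDefault(node, Set.of()))
            if (dfs(next, edges, path)) return true;
        path.remove(node);
        return false;
    }

    public static void main(String[] args) {
        // T1 writes x before T2, but T2 writes y before T1: T1 -> T2 and T2 -> T1.
        List<Op> bad = List.of(new Op(1, 'w', "x"), new Op(2, 'w', "x"),
                               new Op(2, 'w', "y"), new Op(1, 'w', "y"));
        // Every conflict puts T1 before T2, so this one is serialisable.
        List<Op> good = List.of(new Op(1, 'w', "x"), new Op(2, 'r', "x"),
                                new Op(1, 'w', "y"), new Op(2, 'r', "y"));
        System.out.println("cycle in bad: " + hasCycle(bad));
        System.out.println("cycle in good: " + hasCycle(good));
    }
}
```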
What are the two phases of two-phase locking?
The two phases are acquire and release.
In the acquire phase, the transaction gradually gets hold of all the locks it needs, as and when it needs them, but does not release any locks.
In the release phase, the transaction acquires no more locks but lets go of the ones it holds when it has completed work on the locked objects.
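The two-phase rule for a single transaction can be sketched as follows (class and method names are illustrative): once any lock has been released, no further lock may be acquired.

```java
import java.util.HashSet;
import java.util.Set;

// A sketch of the two-phase locking rule: all acquires must precede all releases.
public class TwoPhaseLocking {
    private final Set<String> held = new HashSet<>();
    private boolean releasing = false;   // true once the release phase has begun

    void acquire(String obj) {
        if (releasing)
            throw new IllegalStateException("2PL violated: acquire after a release");
        held.add(obj);
        System.out.println("acquired " + obj);
    }

    void release(String obj) {
        releasing = true;                // entering the release phase
        held.remove(obj);
        System.out.println("released " + obj);
    }

    public static void main(String[] args) {
        TwoPhaseLocking tx = new TwoPhaseLocking();
        tx.acquire("a");                 // acquire phase: locks taken as and when needed
        tx.acquire("b");
        tx.release("a");                 // release phase begins: no more locks may be taken
        try {
            tx.acquire("c");             // violates the two-phase rule
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```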
What is a shared lock and why might we want to use shared locks?
A shared lock is a lock that more than one transaction can hold on the same object at the same time. It is useful when non-conflicting operations (for example, two reads) are using the same object, and it is therefore safe for both to use the object. Where a shared lock can be used instead of an exclusive one, it increases concurrency.
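The JDK has a ready-made example of this distinction: the read lock of a `ReentrantReadWriteLock` is a shared lock, while its write lock is exclusive.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Shared vs exclusive locks: the read lock can be held multiple times at once
// (here reentrantly, by the same thread), but the write lock cannot be taken
// while any shared lock is held.
public class SharedLockDemo {
    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();                          // first shared lock
        boolean secondReadOk = lock.readLock().tryLock();
        System.out.println("second shared lock granted: " + secondReadOk);
        // An exclusive (write) lock conflicts with the shared locks still held.
        boolean writeOk = lock.writeLock().tryLock();
        System.out.println("exclusive lock granted while shared held: " + writeOk);
    }
}
```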
What is a simple alternative to deadlock detection?
A simple alternative is to use timeouts: if a transaction has been waiting for a lock for longer than some chosen period, it is assumed to be involved in a deadlock and is aborted.
What are cascading aborts and what causes them?
Cascading aborts occur when one transaction has to abort and this forces other transactions to abort as well, typically because they have read data written by the aborted transaction. This is bad for the performance of a system, as much work may have to be repeated.
What are the advantages and disadvantages of TSO?
The advantages of time-stamp ordering (TSO) are:
- simple and efficient implementation;
- concurrency control data is held with individual objects and not centrally;
- objects are not locked for longer than the duration of an operation and so circular waits and hence deadlocks cannot occur.
The main disadvantages of TSO are:
- the loss of flexibility, in that it imposes one serial schedule and excludes other possibilities;
- it is more susceptible to transaction aborts than other approaches, due to lack of isolation, and may possibly lead to cascading aborts.
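Basic TSO can be sketched as follows (the class is illustrative, not from the text): each object records the largest timestamps that have read and written it, and an operation arriving from an older transaction than those timestamps causes that transaction to abort.

```java
import java.util.HashMap;
import java.util.Map;

// A sketch of basic time-stamp ordering. Concurrency control data (readTs,
// writeTs) is held with each individual object, not centrally.
public class TimestampOrdering {
    static class Obj { long readTs = 0, writeTs = 0, value = 0; }
    private final Map<String, Obj> store = new HashMap<>();

    Obj obj(String name) { return store.computeIfAbsent(name, k -> new Obj()); }

    boolean read(long ts, String name) {
        Obj o = obj(name);
        if (ts < o.writeTs) return false;      // a younger transaction already wrote: abort
        o.readTs = Math.max(o.readTs, ts);
        return true;
    }

    boolean write(long ts, String name, long value) {
        Obj o = obj(name);
        if (ts < o.readTs || ts < o.writeTs) return false;  // arrived too late: abort
        o.writeTs = ts;
        o.value = value;
        return true;
    }

    public static void main(String[] args) {
        TimestampOrdering tso = new TimestampOrdering();
        System.out.println("T2 writes x: " + tso.write(2, "x", 10)); // in timestamp order
        System.out.println("T3 reads x:  " + tso.read(3, "x"));      // in timestamp order
        // T1 is older than both timestamps recorded on x, so its write must abort.
        System.out.println("T1 writes x: " + tso.write(1, "x", 99));
    }
}
```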
Why is the optimistic concurrency control approach so named?
This approach is called optimistic because it expects success, rather than failure. That is, it works away, confident that things will probably turn out for the best, and only performs a final check at the very end.
Distinguish between volatile memory, persistent storage and stable storage.
Volatile memory holds data that would be lost if the device is switched off (see Section 2),
whereas data stored in persistent storage remains even when the device is switched off, or the program that created the data stops executing.
Stable storage is more secure still, because the data is stored in several places; if a disk fails in one location, the data usually survives in another.
For crash resilience, what must be done when a transaction aborts?
When a transaction aborts, it must be as though the transaction had never been invoked. Therefore all the effects of the aborted transaction must be undone before it can be restarted.
Which of the ACID properties of transactions are addressed through the provision of crash resilience?
(Failure) atomicity and durability. Transactions have to be all or nothing, i.e. failure atomic, and therefore crash-resilience mechanisms are put in place to ensure that a half-completed transaction can be undone should a crash occur. A crash-resilience mechanism also ensures that, if the transaction had committed, its results are definitely stored in persistent store, i.e. durability.
What is meant by rolling back?
Rolling back means restoring the persistent store to the state that existed before the start of a transaction that was aborted due to a system crash.
In logging, what might happen if log information is written to the persistent store after the data updated by the atomic operation is written there?
In logging, the information in the log is used to roll back the persistent store to its state at the start of the transaction in which the crash occurred, by using the old values recorded in the log. If a crash occurs after the data updated by the transaction is written to the permanent store but before the log information is written there (i.e. it is not a write-ahead log), then there will be no record of the original state to facilitate rollback.
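The write-ahead discipline just described can be sketched in a few lines (the classes here are illustrative stand-ins for a persistent store and log): the old value goes into the log before the store is updated, so rollback always has the information it needs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A sketch of write-ahead logging: log the old value first, update second, so
// an unfinished transaction can always be rolled back after a crash.
public class WriteAheadLog {
    record Undo(String key, Integer oldValue) {}      // null oldValue: key did not exist

    private final Map<String, Integer> store = new HashMap<>();  // stands in for the persistent store
    private final Deque<Undo> log = new ArrayDeque<>();          // stands in for the persistent log

    void update(String key, int newValue) {
        log.push(new Undo(key, store.get(key)));  // step 1: record the old value in the log
        store.put(key, newValue);                 // step 2: only then update the store
    }

    void rollback() {                             // recovery: undo in reverse order
        while (!log.isEmpty()) {
            Undo u = log.pop();
            if (u.oldValue() == null) store.remove(u.key());
            else store.put(u.key(), u.oldValue());
        }
    }

    public static void main(String[] args) {
        WriteAheadLog wal = new WriteAheadLog();
        wal.update("balance", 100);
        wal.update("balance", 250);
        // Crash before commit: recovery undoes both updates using the logged old values.
        wal.rollback();
        System.out.println("balance after rollback: " + wal.store.get("balance"));
    }
}
```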
What is shadowing? What essential feature is required to ensure that shadowing is implemented successfully?
Shadowing is where the results of a transaction are built up in a structure that mirrors part of the persistent store but the persistent store is not updated until the transaction commits. Once the transaction has committed, the updated copy replaces the relevant part of the persistent store. The essential feature of shadowing is that the replacement of the relevant part of the persistent store by the updated copy should occur in a single operation (e.g. by setting a single pointer). This avoids problems that would occur if a crash happened during the replacement.
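The essential single-operation replacement can be illustrated with an atomic reference swap (a sketch, assuming an in-memory stand-in for the persistent store):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// A sketch of shadowing: updates accumulate in a shadow copy, and commit is one
// atomic pointer replacement, so a crash sees either the old state or the new
// state, never a mixture.
public class Shadowing {
    // The "persistent store" is whatever this single reference points at.
    private final AtomicReference<Map<String, Integer>> current =
        new AtomicReference<>(Map.of("x", 1, "y", 2));

    void runTransaction() {
        // Build the results in a shadow copy; the store is untouched meanwhile.
        Map<String, Integer> shadow = new java.util.HashMap<>(current.get());
        shadow.put("x", 100);
        shadow.put("y", 200);
        current.set(Map.copyOf(shadow));   // commit: a single pointer swap
    }

    public static void main(String[] args) {
        Shadowing s = new Shadowing();
        s.runTransaction();
        System.out.println("x = " + s.current.get().get("x"));
        System.out.println("y = " + s.current.get().get("y"));
    }
}
```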
What does the term relation refer to in relational database models?
Relation refers to the logical grouping of the data, which is often represented as a table, with rows and columns. Each column contains data of a certain type, and each row, also known as a record, contains the details for an object in the table. The relation is the table as a whole.
What are the core features of a DBMS?
The core features of a database management system (DBMS) are:
- a modelling language to model the data;
- data structures for storage of data;
- a query language to retrieve and update data;
- a transaction concurrency control mechanism.
What is the purpose of JDBC?
JDBC originally stood for Java Database Connectivity, but is now not supposed to be an acronym at all (much to everyone’s puzzlement).
The purpose of JDBC is to provide a standard interface for communication with a large number of different databases.
What are the steps involved in setting up a connection between a Java application and a database, when using JDBC?
The steps involved in setting up the communication are:
1 the application causes the database driver to be loaded;
2 the application requests a connection for a particular database, and the driver manager asks each registered driver whether it can supply a connection for that database;
3 the driver manager returns a connection to the application;
4 the application communicates with the database through the JDBC connection.
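The steps above can be sketched with the `java.sql` API. The JDBC URL below is hypothetical; since no driver is on the classpath here, step 2 fails and the resulting `SQLException` shows the driver manager found no driver able to respond.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// A sketch of the JDBC connection steps (the URL is invented for illustration).
public class JdbcConnectSketch {
    public static void main(String[] args) {
        try {
            // Step 1: modern drivers self-register when present on the classpath
            // (older code called Class.forName("some.vendor.Driver") explicitly).
            // Step 2: ask the driver manager for a connection to a particular database.
            Connection conn =
                DriverManager.getConnection("jdbc:hypothetical://localhost/testdb");
            // Step 3: the driver manager has returned a connection...
            // Step 4: ...through which the application talks to the database.
            conn.close();
        } catch (SQLException e) {
            System.out.println("no registered driver could connect: " + e.getMessage());
        }
    }
}
```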
Why is a DBMS not able to offer concurrency control for distributed transactions?
Each DBMS has one resource manager. A distributed transaction involves objects residing in several locations, held in several databases, and therefore requires more than one resource manager to manage the concurrency control, which no single DBMS can provide on its own.
What is the default setting for transaction processing when executing SQL statements?
The default setting of the Connection object is that each SQL statement is treated as a transaction.
How can the default setting be altered?
The default setting can be altered by calling setAutoCommit(false) on the Connection object, after which all subsequent statements are treated as forming part of one transaction, which finishes as soon as the commit or rollback method is called.
Explain how we can deal with possible SQLExceptions occurring during the execution of SQL statements arriving from a Java application.
The SQL statements that form one transaction should be grouped in a try–catch clause. If an exception occurs during one of the statements being executed, this can be caught, and the transaction can then be rolled back using the rollback method.
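The pattern can be sketched as follows. Since a real database is not assumed here, a minimal stand-in class takes the place of a JDBC `Connection` (the real `java.sql.Connection` offers the same `setAutoCommit`/`commit`/`rollback` methods); the second statement is made to fail so that the catch clause rolls the work back.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// The try-catch transaction pattern, sketched against a stand-in connection.
public class TxRollbackSketch {
    static class FakeConnection {
        List<String> committed = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        void setAutoCommit(boolean on) { /* off: statements accumulate until commit */ }
        void execute(String sql) throws SQLException {
            if (sql.contains("bad")) throw new SQLException("constraint violated");
            pending.add(sql);
        }
        void commit()   { committed.addAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection();
        conn.setAutoCommit(false);           // statements now form one transaction
        try {
            conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1");
            conn.execute("bad statement");   // fails part-way through the transaction
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();                 // undo everything done so far
            System.out.println("rolled back after: " + e.getMessage());
        }
        System.out.println("statements committed: " + conn.committed.size());
    }
}
```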
What is the point of having transaction isolation levels?
For some applications it may be appropriate to fine-tune the concurrency control, based on allowing certain types of read to happen.
Explain the different forms of read: dirty read, non-repeatable read and phantom read.
A dirty read occurs if a transaction reads values that are written by another transaction that hasn’t committed yet.
A non-repeatable read occurs if a transaction reads the same object twice during execution and finds a different value the second time, although the transaction has not changed the value in the meantime.
A phantom read occurs when a transaction re-executes a query, returning a set of data that satisfies the condition in the query, and finds that the set of data has changed as a result of another recently committed transaction.
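JDBC names a standard isolation level for each trade-off; the table below, printed from the `java.sql.Connection` constants, shows which of the three read phenomena each level still permits.

```java
import java.sql.Connection;
import java.util.LinkedHashMap;
import java.util.Map;

// The standard JDBC isolation levels and the read phenomena each one permits.
public class IsolationLevels {
    public static void main(String[] args) {
        Map<String, String> permits = new LinkedHashMap<>();
        permits.put("TRANSACTION_READ_UNCOMMITTED (" + Connection.TRANSACTION_READ_UNCOMMITTED
                    + ")", "dirty, non-repeatable, phantom");
        permits.put("TRANSACTION_READ_COMMITTED ("   + Connection.TRANSACTION_READ_COMMITTED
                    + ")", "non-repeatable, phantom");
        permits.put("TRANSACTION_REPEATABLE_READ ("  + Connection.TRANSACTION_REPEATABLE_READ
                    + ")", "phantom");
        permits.put("TRANSACTION_SERIALIZABLE ("     + Connection.TRANSACTION_SERIALIZABLE
                    + ")", "none");
        permits.forEach((level, reads) -> System.out.println(level + " permits: " + reads));
    }
}
```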
Give two strategies for the replication of data.
Two strategies for the replication of data are synchronous and asynchronous replication.
In synchronous replication, all replicas of a data item are updated at the same time as the data item itself. A transaction cannot commit until all the replicas have been updated.
In asynchronous replication the replicas are updated after the source has been updated.
Differentiate between three forms of ownership for the replication of data.
Three forms of ownership for the replication of data are:
- master–slave ownership – only the master site is permitted to make updates;
- workflow ownership – permission to update the data moves along a certain path;
- update anywhere ownership – permission is equally shared between peers.
What do we mean by transparent management of a distributed DBMS? How does this apply to fragmentation and replication?
Transparent management is an important aim of a distributed DBMS, meaning that to the user of such a system it should appear as if they are dealing with just one database.
The management of the distribution involves hiding the low-level details of where everything is stored exactly (fragmentation) and whether there are several copies of each data item (replication).
What does the 2PC protocol aim to achieve?
The aim of two-phase commit (2PC) is that the committing of a distributed transaction is done atomically – that is, either the entire transaction takes place and its effects are made durable, or it is as if nothing has happened.
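The two phases can be sketched as follows (a simplified illustration with invented names, ignoring timeouts and coordinator failure): the coordinator first collects a vote from every participant, and only if all vote yes is the commit decision broadcast; a single no-vote aborts everyone.

```java
import java.util.List;

// A sketch of two-phase commit: phase 1 collects votes, phase 2 broadcasts
// the global decision.
public class TwoPhaseCommit {
    interface Participant { boolean prepare(); }   // the vote: can you commit?

    static String run(List<Participant> participants) {
        // Phase 1: voting. Any "no" means the whole transaction must abort.
        for (Participant p : participants)
            if (!p.prepare()) return "global abort";
        // Phase 2: all voted yes, so the commit decision is sent to everyone.
        return "global commit";
    }

    public static void main(String[] args) {
        System.out.println("all ready: " + run(List.of(() -> true, () -> true)));
        System.out.println("one fails: " + run(List.of(() -> true, () -> false)));
    }
}
```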