Managing and querying DB Flashcards Preview

Big Data > Managing and querying DB > Flashcards

Flashcards in Managing and querying DB Deck (74)
Loading flashcards...

Concurrency control schemes

Control the interactions among the concurrent transactions in order to prevent them from destroying the consistency of the database


ACID transactions

  • Atomic: Either the whole transaction is run, or nothing
  • Consistent: Database constrainst are preserved
  • Isolated: Different transactions may not interact with each other
  • Durable: Effects of a transactions are not lost in case of a system crash


Transaction concept

  • A transaction is a unit of pgrogram execution that accesses and possibly updates various data items.
  • A transaction must see a consistent databse
  • During transaction execution the database may be inconsistent
  • When the transaction is committed, the database must be consistent
  • Two main issues to deal with
    • Failures of various kinds such as hardware failures and system crashes
    • Concurrent exection of multiple transaction


Implicit schema

  • document can be stored as it is without having to define a schema for it 
  • without checking that it conforms to a schema.
  • An explicit schema requires the schema to be defined and that documents are compliant before they can be stored successfull


Charactaristics of NoSQL

  • Non-relational
  • Cluster-friendly
  • schema-less


Impedance mismatch problem



  • The programming model of data doesn’t match the database model of
  • This led to the rise of object-oriented databases in the 1990s.





  • Nested hierarchical structure
  • Aggregate orientation fits naturally with clusters
  • Can store a whole aggreagte on a single node
  • For applications where we need to slice and dice data in diferent ways, RDBMS and graph databases are more appropriate


Which of the NoSQL is most different ot the others ?

  • Graph database
  • Other are aggregate oriented (Column-family, Document databases, and key-value)


polyglot persistence



  • Different kinds of data are best dealt with different data storage technologies.



  • Taking one copy of the data and splitting it across many machines
  • Nothing is shared


Technical debt

Implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer.


CAP theorem

it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

  1. Consistency: Every read receives the most recent write or an error
  2. Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write
  3. Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes


  • Single-server RDBMS is a "CA" system


Difference between relation and relationship

Relation refers to a table in a relational model based database while relationship refers to how two tables are connected


Relational DBMS

  • Tabular structure
  • Foundation in set theory


Database implementation topics

  • Database sytem architecutre
  • Block buffer
  • Transactions and concurrency
  • Recovery
  • Indexes
  • Query processing and optimisation


Three-level schema

  1. External:
  2. Conceptual: Logical level
  3. Interal: Storage/physical level


Data independence (Logical and physical)

  • Logical data independence
    • Conceptual schema must be able to evolve without having to change external application program
  • Physical data independence
    • Must be able to substitute a different physical sotrage schema for an existing one without having to change external application programs


Data models

  • "No data model"
    • Flat files
  • "Classical" data models
    • Hierarchical
    • Network
    • Relational
  • Semantic data models
    • Entity-Relatoinship model
    • Functional dat model
    • SDM
  • Objet oriented
  • NoSQL


Failure classifcation in DB

  • Transaction failure
    • Logical errors: transaction cannot complete due to some internal error condition
    • System errors:  the DB system must terminate an active transaction due to an error condition (e.g deadlock)
  • System crash: a power failure or other hardware or software failure causes the system to crash
    • fail-stop assumption: non-volatile storage contens are assumed to not be corrupted by system crash
  • Disk failure: a head crash or similiar disk failre destroy all or part of disk storage


D. input() and output()


C. Main memory



  • A lock is a mechanism to control concurrent access to a data item.
  • There is potential for deadlock; transactions(s) must be rolled back to release lock(s) and resolve the deadlock
  • Starvation is also possible e.g
    • A transaction may be waiting for an exclusive lock on an item, while a sequence of other transactions request and are granted shared locks on the same item.


Database system architecture


B. False


Recovery algorithms

Techniques to ensure database consistency and transaction atomicity and durability despite failures. Contains two parts

  1. Actions taken during normal transcation processing to ensure enough information eists to recover from failures
  2. Actions taken after a failure to recover the database contents to a state that ensure atomicity, consistency and durability.


Storage structure

  • Volatile storage
    • Does not survive system crashes
    • examples: main memory, cache memory
  • Non-volatie storage
    • Survive system crashes
    • examples: disk, tape, flash meomry
  • Stable storage
    • a "mythical" form of storage that survives all failures
    • Approximated by maintining copies on distinct non-volative media; copies can be at remote sites (to protect against fire, food etc..)


Log-based recovery

  • The log is a sequence of records. Log of each transaction is maintained in some stable storage so that if any failure occurs, then it can be recovered from there.
  • For a transaction T_i write following to vlog
    • Transaction start:
    • before write: let V1 be value of X before write and V2 the value to be written to X
    • Transaction finishes: When transaction T_i it last statment, write commit log


We assume that log records are written directly to stable storage


There are two approaches to modify the database:

  1. Deferred database modification
  2. immediate database modifcation


Deferred database modification

  • Transactions operations do not immediately update the physical database
  • Only transaction log is update
  • Physical database is only updated after transcation reaches commit


Immediate database modification

  • approache to modify the database:
  • Database is immediately updated by the transaction operation during the execution of the transcation even before it reaches the commit point
  • Update log record must be written before database item is written.


Remote backup systems

  • Detection of failure
  • Transfer of control
  • Time to recover
  • Hot-spare