Main Flashcards

1
Q

What is the Entity-Relationship Model?

A

The entity-relationship (ER) data model uses a collection of basic objects, called entities, and relationships among these objects.

An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the languages of Databases?

A
  1. Data-definition language (Specify database schema)
  2. Data-manipulation language (Database queries)

In practice: both are in the same database language, e.g. SQL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a Database System?

A

A Database Management System and a Database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the 3-layer schema architecture?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a conceptual schema

A

The conceptual schema defines the ontology (the formal naming and definition of the types, properties, and interrelationships of the entities) of the concepts as the users think of them and talk about them.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is an external schema?

A

The external schema defines the view of the data presented to the application programs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is an internal schema?

A

Describes the internal formats of the data stored in the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is Physical data independence?

A

Changes regarding file structure and access paths (physical layer) have no influence on the conceptual schema (logical layer).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are some examples of changes on the physical layer?

A
  • New hard disk added
  • New processor added
  • Files are split into multiple files
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is Logical data independence?

A

Changes on the logical layer have no infuence on external schemas and applications.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are some examples of changes on the logical layer?

A
  • Adding another attribute to the conceptual schema
  • Changing the name of an attribute on the conceptual schema
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a Mini-world?

A

Some part of the real world about which information is stored

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are data/information?

A

Known facts about the mini-world that can be recorded and have an implicit meaning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a database (DB)?

A

A collection of related data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are Database Management Systems (DBMS)?

A

A software package to facilitate the creation and maintenance of a database

Examples: MySQL, PostgreSQL, Microsoft SQL Server, Oracle, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is a database instance?

A

The content of a DB at a particular time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Name some of the problems that can occur when managing data without a DBS

A
  • Data can get lost.
  • Query evaluation might be very slow, especially for large amounts of data.
  • Access rights are hard to control.
  • Extending the data with additional attributes is difficult.
  • Inconsistent data might be stored.
  • Concurrent access to the data cannot be handled efficiently.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Definition of a relation

A

Assume D1, D2,…,Dn are domains

R ⊆ D1 x … x Dn

  • Example: telephoneBook ⊆ string x string x integer
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Relational schema

A

Defines the structure of stored data

  • Is denoted as: sch(R) or R
  • Notation: R(A1 : D1, A2 : D2, …) with Ai denoting attributes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

In what order are tuples in a relation stored?

A

Tuples in a relation are unordered.

These relations have the same information content:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

In what order are attributes in a touple/relation stored?

A

In accordance with the mathematical definition of tuples, attributes in a tuple/relation are ordered.

These relations have different information content:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Values in a tuple

A
  • Values in a tuple are atomic (indivisible)
  • A value cannot be a structure, a record, a collection type or a relation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Null values

A

A special null value is used to represent values that are unknown or inapplicable to certain tuples.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

When are duplicates not allowed?

A

A relation adheres to the mathematical definition of a set.

No two tuples in a relation may have identical values for all attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What does union-compatible mean?

A

Two relations are union-compatible if:

  • They have the same number of attributes (they have the same arity)
  • The domain of each attribute in column order is the same in both relations
    (the i’th attribute of both R and S have the same domain)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Relational algebra operator: π(pi)

A

Projection: The result is a relation of n columns obtained by removing the columns that are not specified.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Relational algebra operator: σ(sigma)

A

Selection: Selects/filters rows of a table according to the selection predicate

  • Notation: σF
  • Selection predicate F consists of:
    • ​Logic operators: \/(or), /(and), ¬(not)
    • Arithmetic comparison operators: <, ≤, =, >, ≥, ≠
    • Attribute names of the argument relations or constants as operands
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

Relational algebra operator: ρ(rho)

A

Rename:

  • Renaming relation R to S: ρS(R)
  • Renaming attribute B to A: ρA←B(R)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Relational algebra operator: X

A

Cartesian product (aka. cross product)

The cartesian product (R x S) between relations R and S consists of all possible combinations (|R| * |S| pairs) of tuples from both relations.

Attributes in the result are referenced as R.A or S.A to resolve ambiguity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Relational algebra operator: U

A

Union: The result of a union (R U S) between two relations R and S contains all tuples from both relations without duplicates.

R and S have to be union-compatible.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Relational algebra operator: —

A

Difference: The difference (R — S or R \ S) of two relations R and S removes all tuples from the first relation that are also contained in the second relation.

R and S have to be union-compatible.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Relational algebra operator: ∩

A

Intersection: The intersection (R ∩ S) of two relations R and S consists of a set of tuples that occur in both relations.

R and S have to be union-compatible.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Relational algebra operator: ⋈

A

Natural join: The natural join combines two relations via common attributes (same names and domains) by combining only tuples with the same values for common attributes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Relational algebra operator: ⟕

A

Left outer join: keep dangling tuples in the left operand relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Relational algebra operator: ⟖

A

Right outer join: keep danging tuples in the right operand relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Relational algebra operator: ⟗

A

(Full) outer join: keep dangling tuples of both operand relations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

Relational algebra operator: ⋉ and ⋊

A

Left semi join:Right semi join:

Find all tuples in a relation for which there are matching tuples in the other relation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

Relational algebra operator: γ (gamma)

A

Grouping: tuples with the same attribute values (for a specific list of attributes) are grouped. An aggregate function is applied to each group (computing one value for each group).

Typical aggregate functions:

  • count - number of tuples in a group
  • sum - sum of attribute values in a group
  • min, max, avg - minimum, maximum and average of attribute values in a group
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

Relational algebra operator: ÷

A

Division

Example: Find all wines that have been reviewed by both Parker and Clarke

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

θ-join (Theta join)

A

To combine tuples from two relations where the combination condition is not simply the equality of shared attributes then it is convenient to have a more general form of join operator, which is the θ-join.

You can use any binary operator in the set {<, ≤, =, >, ≥} in between the two variables,

Example: price < myMoney or price < 800

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

When is a variable free in domain relational calculus?

A

A variable is free if:

  • It is not quantified by a universal quantifier
  • It is not quantified by an extential quantifier
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

Universal quantifier

A

A universal quantification is a type of quantifier, a logical constant, which is interpreted as (∀) or (⇒).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

Existential quantifier

A

An existential quantification is a type of quantifier, a logical constant, which is interpreted as “there exists” (∃)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

Universal quantifier: ⇒

A

P ⇒ Q means “P implies Q”, i.e., “if P is true, then Q must be true”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

Domain relational calculus: When is a query safe?

A

When it returns only a finite number of tuples

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

What is the expressiveness of relational algebra, tuple relational calculus, and domain relational calculus?

A

All three languages are equal regarding expressiveness (relationally complete)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

How is an attribute marked as a primary key?

A

By having a line under it

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

ER model: Entities and entity types

A

Entities are objects of the real world about which we want to store information

Entities are grouped into entity types (represented as a square).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

ER model: Entity set

A

An entity set is a set of entities of the same type (e.g., all persons having an account at a bank).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

ER model: Attributes

A

Attributes model characteristics of entities or relationships.

  • All entities of an entity type have the same characteristics
  • Attributes are declared for entity types
  • Attributes have a domain or value set
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

ER model: Single-valued vs multi-valued attributes

A

Ex: A person might have multiple phone numbers (or a single one)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

ER model: Simple attributes vs composite attributes

A

Ex: An address can be modeled as a string or composed of street and city

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

ER model: Stored attributes vs derived attributes

A

Ex: Birthday and age.

Age is computed several times using birthday.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

Functional dependency

A
  • A functional dependency is a constraint between two sets of attributes in a relation from a database.
    • In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
  • Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written X → Y) if, and only if, each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency X → Y.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
55
Q

ER model: Primary keys

A

Primary key attributes are marked by underlining.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
56
Q

Super key

A

A set of attributes which can uniquely identify an entity. Example:

  1. Finding the super key of a Person(name, address, licencePlateNumber, phoneNumber)
  2. We consider that the same phoneNumber can appear in several countries so phone number is not enough to uniquely identify a person
  3. But the key: (address, phoneNumber, name) is a super key

Remember that super keys are not necessarily minimal.
Both: (address, phoneNumber) and (address, phoneNumber, name) are super keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
57
Q

Candidate key

A

A canditate key is a minimal super key (if you remove any attribute from the super key, then you cannot uniquely identify the entity)

One of the candidate keys for an entity is chosen as the primary key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
58
Q

ER model: Relationships and relationship types

A

Relationships describe connections between entities.
Relationships between entities are grouped into relationship types.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
59
Q

ER model: Relationship instance and relationship set

A

An association between two or more entities is called relationship (instance). A relationship set is a collection of relationship instances.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
60
Q

ER model: Role names

A

Role names are optional and used to characterize a relationship type.

Especially useful for recursive relationship types, i.e., an entity type is participating multiple times in a relationship type.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
61
Q

ER model: Descriptive attributes

A

Relationship types can also have (descriptive) attributes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
62
Q

Characteristics of relationship types: Degree

A
  • Number of participating entity types
  • Mostly: binary
  • Rarely: ternary
  • In general: n-ary or n-way (multiway relationship types)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
63
Q

Characteristics of relationship types:

  • Cardinality ratio
  • Cardinality limits
  • Participation constraint
A
  • Cardinality ratio (Chen notation): 1:1, 1:N, N:M
  • Cardinality limits ([min,max] notation): [min,max]
  • Participation constraint: partial or total
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
64
Q

Total participation constraint

A

Each entity of an entity type must participate in a relationship, i.e., it cannot exist without any participation.

  • Double line means that it has to participate
  • Example:
    • Wine has to be produced
    • A producer does not have to produce
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
65
Q

Partial participation constraint

A

Each entity of an entity type can participate in a relationship, i.e., it can exist without any participation.

  • Single line means partial
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
66
Q

ER model: Weak entity types

A

A weak entity is an entity that cannot be uniquely identified by its attributes alone; therefore, it must use a foreign key in conjunction with its attributes to create a primary key. The foreign key is typically a primary key of an entity it is related to.

  • The weak entity type’s key attributes are marked by underlining with a dashed line (partial key, discriminator).​
  • The weak entity is marked by a double line around it
  • Total participation of the weak entity type.
  • Only in combination with 1:N (N:1) (or rarely also 1:1) relationship types
  • The strong entity type is always on the ‘1’-side
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
67
Q

ER model: ISA relationship type

A

Specialization and generalization is expressed by the isa relationship type (inheritance).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
68
Q

Mapping relationship types to relations: M:N relationship type

A
  • New relation with relationship type’s attributes
  • Add attributes referencing the primary keys of the involved entity type relations
  • Primary key: set of foreign keys
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
69
Q

Mapping relationship types to relations: 1:N relationship type

A
  • Add information to the entity type relation of the “N”-side:
    • Add foreign key referencing the primary key of the “1”-side entity type relation
    • Add attributes of the relationship type
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
70
Q

Mapping relationship types to relations: 1:1 relationship type

A
  • Add information to one of the involved entity type relations:
    • Add foreign key referencing the primary key of the other entity type relation
    • Add attributes of the relationship type
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
71
Q

Mapping relationship types to relations: N-ary relationship type

A

All participating entity types are mapped according to the standard rules.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
72
Q

Mapping ERD to relations: How do we handle multi-valued attributes?

A

Create a separate relation for each multi-valued attribute.

Example:
person(PID, name)
phoneNumber: (PID –> person, number)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
73
Q

Mapping ERD to relations: How do we handle composite attributes?

A

Include the component attributes in the relation

Ex: Person(PID, name, street, city)

74
Q

Mapping ERD to relations: How do we handle derived attributes?

A

Ignore them during mapping to relations, can be added later by using views.

75
Q

What is an update anomaly?

A

Ex: A teacher moves from office 226 to office 338.

There is an update anomaly if only some tuples are changed but not all.

76
Q

What is an insert anomaly?

A

When you cannot insert a new row without using null values.

77
Q

What is a deletion anomaly?

A

Ex: When a course is created a teacher is assigned to the course. When the course ends all information about the teacher may also disappear.

78
Q

When is a functional dependency trivial?

A

The functional dependency α → β is called trivial if β ⊆ α.

Example:
α = {A, B, C}, β = {A, C} then α → β is trivial

79
Q

Full functional dependency

A

A full functional dependency occurs when you already meet the requirements for a functional dependency and the set of attributes on the left side of the functional dependency statement cannot be reduced any further.

Examples: “{CPR, age} → name” is a functional dependency, but it is not a full functional dependency because you can remove age from the left side of the statement without impacting the dependency relationship.

80
Q

Partial functional dependency

A

The opposite of full functional dependency. If you remove an attribute from the left-hand side the FD still holds.

Eg. if both {A,B} → {C} and {A} → {C} then {C} is partially dependent on {A,B}.

81
Q

Functional dependencies: F+

A

Given a set of FDs F we can derive additional FDs

  • F+ contains all FDs that can be derived from F, i.e., all FDs logically implied by dependencies in F.
  • F+ is F’s closure
  • Inference rules (Armstrong axioms) help computing F+
82
Q

Closure of a set of attributes

A

The closure of a set of attributes (α+) with respect to a set of FDs F and a set of attributes α is:

α+ = { A | α → A ∈ F+ }

Observation:
If α → β is in F+, then β is in α+

83
Q

Armstrong axioms (3 essential rules)

A

α, β, γ, δ are subsets of attributes in ℛ

  • Reflexivity:
    • If β ⊆ α then α → β
  • Augmentation
    • If α → β then αγ → β
    • That is adding attributes in dependencies, does not change the basic dependencies
  • Transitivity
    • If α → β and β → γ then α → γ
84
Q

Armstrong axioms (3 additional rules)

A

α, β, γ, δ are subsets of attributes in ℛ

  • Union
    • If α → β and α → γ then α → βγ
  • ​Decomposition
    • ​If α → βγ then α → β and α → γ
  • ​Pseudotransitivity
    • ​If α → β and ​βγ → δ then ​α​γ → δ
85
Q

When are two sets of FDs F and G equivalent? (F ≡ G)

A

F ≡ G if:

  • Their closures are the same, i.e., F+ = G+
  • Both sets allow for deriving the same set of FDs
86
Q

Computing a canonical cover (minimal cover)

A
  1. Make sure that the right-hand side only contains singletons
    1. Ex: AB → CD turns into AB → C and AB → D
  2. Remove extraneous attributes on the left-hand side
    1. Ex: If B+ contains A then AB → C turns into B → C
  3. Remove redundant FDs
    1. For each FD
      1. Pretend the FD doesn’t exist
      2. If α+ (left-hand side) contains the right-hand side then there is another way to reach the right-hand side and the FD can be removed.
  4. ​​Combine all FDs with the same left-hand side
87
Q

What is normalization by decomposition?

A

Normalization involves decomposing a table into less redundant (and smaller) tables without losing information, and then linking the data back together by defining foreign keys in the old table referencing the primary keys of the new ones.

The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.

88
Q

What is a valid decomposition?

A

A decomposition is valid if ℛ = ℛ1 U ℛ2, i.e., no attributes in ℛ get lost

89
Q

What is a lossless decomposition?

A

A decomposition of R into R1 and R2 is lossless if R = R1 ⋈ R2

90
Q

What is a lossy decomposition

A

Means that the reconstruction leads to additional tuples.

91
Q

When is a decomposition dependency preserving?

A

All functional dependencies that hold for R must be verifiable in the new schemas R1, …, Rn

  • We can check all dependencies locally on R1, …, Rn
  • We avoid the alternative: computing the join R1 ⋈ … ⋈ Rn to test if a FD is violated

A decomposition is dependency preserving if:

  • FR ≡ (FR1 ∪ … ∪ FRn) i.e,
  • FR+ = (FR1 ∪ … ∪ FRn)+
92
Q

Prime attribute and non-prime attribute

A

Prime attribute: an attribute which is in at least one candiate key.

Non-prime attribute: an attribute which is not in any candiate key.

Example: Person(name, phoneNumber, address, age)

  • Prime attributes: phoneNumber and address
  • Non-prime attributes: name and age
93
Q

First normal form (1NF)

A

A relation ℛ is in 1NF if the domains of all its attributes are atomic (no composite or set-valued domains).

Example:

94
Q

Second normal form (2NF)

A

A relational schema is in 2NF if each non-prime attribute is fully functionally dependent on each candidate key

Cheatsheet:

  • Any table which does not have composite primary keys are automatically in 2NF
95
Q

Third normal form (3NF)

A

A relation schema R is in 3NF if at least one of the following conditions holds for each of its functional dependencies α → β:

  • α → β is a trivial functional dependency.
  • α is a superkey of R.
  • β is prime (part of a candidate key)
96
Q

Boyce Codd normal form (BCNF)

A

BCNF is a stricter form of 3NF. To be in BCNF there are only the two first criteria from 3NF.

  • α → β is a trivial functional dependency.
  • α is a superkey of R.
97
Q

SQL: SELECT

A

Retrieves information

SELECT * FROM tableName;

98
Q

SQL: FROM

A

Creates a cartesian product of the tables listed

99
Q

SQL: GROUP BY

A

GROUP BY is used in collaboration with the SELECT statement to group together those rows in a table that have identical data. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.

100
Q

SQL: LIKE

A
  • ‘abc’ LIKE ‘abc’ true (must match exactly)
  • ‘abc’ LIKE ‘a%’ true (must start with ‘a’ and can be followed by anything)
  • ‘abc’ LIKE ‘_b_’ true (three letters, the middle one is ‘b’)
  • ‘abc’ LIKE ‘c’ false

  • *Syntax:** SELECT tname FROM Trainer WHERE tname LIKE ‘%s’
  • *Returns:** every trainer whose name ends on an ‘s’
101
Q

SQL: Join

A

Syntax:
SELECT empId FROM Company LEFT OUTER JOIN Department ON Company.Id = Department.empId

The different joins:

102
Q

SQL: Dirty data

A

Ex: The “age” field is set to -20 or dateOfDeath happened before dateOfBirth

103
Q

SQL: Correlated subquery

A

A correlated subquery is a subquery (a query nested inside another query) that uses values from the outer query.

Because the subquery is evaluated once for each row processed by the outer query, it can be inefficient.

104
Q

How are transactions ended?

A
  • Commit = success = “going forward”
  • Rollback = failure = “in reverse”
105
Q

Name the four ACID properties of transactions

A
  • Atomicity
  • Consistency
  • Isolation
  • Durability
106
Q

ACID: Atomicity

A

Atomicity requires that each transaction be “all or nothing”: if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged.

  • An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes.
  • To the outside world, a committed transaction appears (by its effects on the database) to be indivisible (“atomic”), and an aborted transaction does not happen.
107
Q

ACID: Consistency

A

The consistency property ensures that any transaction will bring the database from one valid state to another.

Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors cannot result in the violation of any defined rules.

108
Q

ACID: Durability

A

The database should be durable enough to hold all its latest updates even if the system fails or restarts.

  • If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data.
  • If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
109
Q

ACID: Isolation

A

In a database system where more than one transaction are being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if it is the only transaction in the system.

No transaction will affect the existence of any other transaction.

110
Q

What is a schedule?

A

A sequence of operations from one or more transactions.

In concurrent transactions, the operations are interleaved.

111
Q

Serial schedule

A

The transactions are executed non-interleaved i.e., a serial schedule is one in which no transaction starts until a running transaction has ended.

112
Q

What operations can a schedule consist of?

A
  • read(Q, q)
    • Read the value of the database item Q and store in the local variable q.
  • write(Q, q)
    • Store the local variable q in the database item Q
  • Arithmetic operations
  • commit
  • rollback
113
Q

Conflicting operations

A

We say two operations performed by different transactions on the same resource conflict if the resulting state depends on the order in which they are executed

Conflicting operations:

  • Read / Write
  • Write / Read
  • Write / Write
114
Q

The Concurrency Problem: What is meant by correctness?

A

Definition1: Concurrent execution of transactions must leave the database in a consistent state.

Definition2: Concurrent execution of transactions must be (result) equivalent to some serial execution of the transactions. (Result equivalent means final database states must be identical.)

115
Q

Definition: View Equivalent Schedules

A

Two schedules would be view equivalence if the transactions in both the schedules perform similar actions in a similar manner.

Conditions:

  1. If T reads the initial data (data which has not had a write operation performed on it yet) in S1, then it also reads the initial data in S2.
  2. If T reads the value written by J in S1, then it also reads the value written by J in S2.
  3. If T performs the final write on the data value in S1, then it also performs the final write on the data value in S2.
116
Q

Definition: View Serializability

A

A schedule is view serializable iff it is view equivalent to a serial schedule

117
Q

Definition: Conflict Equivalent Schedules

A

If two sechedules S1 and S2 perform the same conflicting operations (W/R, R/W, W/W) they are conflict equivalent

Example below (S1 and S2 are conflict equivalent):

118
Q

Definition: Conflict serializable

A

A schedule is conflict serializable if it is conflict equivalent to a serial schedule.

A conflict serializable schedule ensures that after execution the database is in a consistent state.

119
Q

What does blind writes mean?

A

The transactions T1 and T2 have write instructions without read instructions, termed
blind writes.

120
Q

How do we determine if a schedule is conflict serializable using a conflict graph?

A

A schedule is conflict serializable if its conflict graph is acyclic (no loops).

121
Q

What is a recoverable schedule?

A

For each pair of transactions T1 and T2 where T2 reads data items written by T1 , T1 must commit before T2 commits.

122
Q

What is a cascadeless schedule?

A

If T2 reads data items written by T1, the commit operation of T1 must appear before the read done by T2.

123
Q

What does a locking protocol do?

A

Ensures serializable schedules by delaying transactions that could violate serializability.

124
Q

What are the different locking operations?

A
  • Two types of locks can be set on an item Q:
    • Exclusive lock: LX Q
      • An exclusive lock gives write or read access on an item
    • Shared lock: LS Q
      • A shared lock gives read access request on an item
  • Locks can be released.
    • Unlock: UN Q
125
Q

Difference between strict and rigorous two-phase locking?

A
  • In strict two-phase locking, exclusive locks are not released until after commit time.
    • E.g. shared locks can be unlocked before commit
  • In rigorous two-phase locking, no locks are released until after commit time.
126
Q

Deadlock: Wait-die

A

When T1 requests a conflicting lock from T2:

  • If T1 is older than T2
    • Then T1 waits until the data-item is available.
  • If T1 is younger than T2
    • ​Then T1 dies.
    • T1 is restarted later with a random delay but with the same timestamp.

This scheme allows the older transaction to wait but kills the younger one.

127
Q

Deadlock: Wound-wait

A

When T1 requests a conflicting lock from T2:

  • If T1 is older than T2
    • Then T2 is rolled back.
    • T2 is restarted later with a random delay but with the same timestamp.
  • If T1 is younger than T2
    • Then T1 waits until the resource is available.

This scheme, allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.

128
Q

What are conflict graphs also called?

A

Precedence graphs

129
Q

Storage types

A
  • Volatile storage: does not survive system crashes
    • Main memory
    • Cache memory
  • Nonvolatile storage: survives system crashes
    • Disk, both magnetic and solid-state
    • Tape
  • Stable storage: a mythical form of storage that survives all failures
    • Approximated by maintaining multiple copies of distinct nonvolatile media with independent failure modes.
130
Q

Undo/redo algorithm

A

Following a failure, the following is done.

  • Undo all transactions for which the log has “start” entry but no “commit” entry.
  • Redo all transactions for which the log has both “start” and “commit” entries.
131
Q

No-undo/no-redo algorith

A

Algorithm:

  • No-undo → do not change the database during a transaction
  • No-redo → on commit, write changes to the database in a single atomic action
132
Q

Checkpointing

A

A checkpoint identifies the state of the database at a point in time.

  • Modifications to database are performed in memory and are not necessarily written to disk after every update.
  • Therefore, periodically, the database system must perform a checkpoint to write these updates which are held in-memory to the storage disk.
  • A checkpoint creates a known good point from which the database system can start applying changes contained in the log during recovery after an unexpected shutdown or crash.
133
Q

How is the database stored?

A
  • The database is stored in a collection of files
  • A file is a split into a set of fixed size blocks
  • A block contains a number of records
  • A record contains a number of fields

Note: Files reside in secondary memory (usually a hard disk).

134
Q

Clustered, and Non-clustered indexes

A

Clustered index (sometimes primary index):

  • Data is stored in the order of the clustered index
    • Example: Phonebook, the entire book is an index; The phone number is stored next to the name
  • Only one clustered index per table
  • Usually the primary key is what is sorted by

Non-clustered index (sometimes secondary index):

  • Like the index of a text book; The index in the back of the book points to which pages the information is stored
135
Q

Heap

A

Main idea: Records are stored in blocks in no particular order

136
Q

Sequential storage

A
  • Main idea: Records are stored in blocks in one particular order
  • Can do binary search because data is sorted
137
Q

Sparse vs Dense indexes

A

Ordered Indexing is of two types: sparse or dense

In dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself.

In sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach at the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.

138
Q

B+ tree

A

A B+ tree is a balanced binary search tree that follows a multi-level index format.

The leaf nodes of a B+ tree denote actual data pointers.
B+ tree ensures that all leaf nodes remain at the same height, thus balanced.
Additionally, the leaf nodes are linked using a link list; therefore, a B+ tree can support random access as well as sequential access.

139
Q

Properties of a B+ tree

A

Properties:

  • Balanced, i.e., all path from root to leaf has the same length
    • Nice because predictable performance
  • Bushy, each node has between ceiling(d/2) and d children
    • Except the root node that has between 2 and d children
    • Leaf nodes have between ceiling(d/2) and d - 1 children
    • Nice because each level adds extra I/O
  • Ordered
    • The key values are sorted in each node, i.e, kx < ky , if x < y
140
Q

Trade-off: improving queries vs improving modifications

A

If queries are sped up it is usually at the expense of slowing down modifications, and vice-versa.

141
Q

The four selection strategies

A
  • Linear search
    • Expensive, but always applicable
  • Binary search
    • Applicable only when the file is appropriately ordered
  • Hash table search
    • Single record retrieval; does not work for range queries
    • Retrieval of multiple records
  • Secondary index search
    • Implemented with multiple pointers, each to a single record.
142
Q

How does a nested-loop join work?

A

If we: A ⋈ B

Then for each tuple of A, we scan the entirety of B.

  • Can be used for all join types
  • Can be quite expensive

Reason it is called ‘nested’:
for each (tuple of A)
for each (tuple of B)
compare common attributes of A and B

143
Q

Sort-merge join

A
  • The relations R & S have to be sorted on the join attributes (the foreign key between the two relations)
  • Then both relations are merged, starting with a pointer at the top of each relation
    • When the two foreign keys at the pointers are equal the joined product is added to the new relation, and R’s pointer is moved to the next entry
    • If they are still equal another join product is added. If they aren’t equal S’s pointer is moved to the next entry.
144
Q

Hash join

A

A hash join is performed by hashing one data set into memory based on join columns and reading the other one and probing the hash table for matches.

The hash join is very low cost when the hash table can be held entirely in memory, with the total cost amounting to very little more than the cost of reading the data sets.
The cost rises if the hash table has to be spilled to disk in a one-pass sort, and rises considerably for a multipass sort.

145
Q

The three rules of heuristic query optimization

A
  • Push down selections
  • Push down projections
  • Avoid cross products
146
Q

Arity of a relation

A

The arity of a relation is its number of columns (attributes)

  • The arity of Person(name, cpr, age, height) is 4
  • The arity of Job(name, title, salary) is 3
  • Then the arity of Person x Job is 4 + 3 = 7
147
Q

Intersection: R ∩ S broken down

A

The intersection (R ∩ S) of two relations R and S consists of a set of tuples that occur in both relations.

Intersection can be expressed as dffierence:

(R ∩ S = R — (R — S))

148
Q

Natural join: ⋈ broken down

A

The natural join can be expressed as a cartesian product followed by selections and projections

R⋈S = πA1,…,Am,R.B1,…R.Bk,C1,…,Cn(σR.B1 = S.B1Λ…ΛR.Bk = S.Bk(R X S))

149
Q

Left/Right semi join: ⋉ / ⋊ broken down

A

L ⋉ R = πL(L⋈R)

L ⋊ R = R ⋉ L = π<span>R</span>(L⋈R)

150
Q

Division: ÷ broken down

A

We don’t need to know what makes up the division operator.

All we need to know at the exam is that an alternative way to describe it exists.

151
Q

Commutative join

A

A binary operation is commutative if changing the order of the operands does not change the result.

The following joins at not commutative:

  • Natural join
  • Left outer join
  • Right outer join
  • Semi join
152
Q

What kind of languages are:

  1. Tuple relational calculus?
  2. Domain relational calculus?
A

They are:

  1. Declarative language (also called non-prodecural)
  2. Declarative language (also called non-prodecural)
153
Q

ER model: which objects can an attribute be associated with?

A
  • A relationship type
  • An entity type
  • A composite attribute
  • A weak entity type
154
Q

ER model: An identifying relationship

A

An identifying relationship means that the child table cannot be uniquely identified without the parent.

Example:

  • Account (AccountID, AccountNum, AccountTypeID)
  • PersonAccount (AccountID, PersonID, Balance)
  • Person(PersonID, Name)

The Account to PersonAccount relationship and the Person to PersonAccount relationship are identifying because the child row (PersonAccount) cannot exist without having been defined in the parent (Account or Person). In other words: there is no PersonAccount when there is no Person or when there is no Account.

155
Q

A problem with a normalized database schema is…?

A

… that the number of relations becomes larger.

156
Q

Two benefits of normalizing a relational design is…

A
  1. Modification anomalies are avoided
  2. The information redundancy in the schema is minimized
157
Q

ACID: How can atomicity be ensured?

A

Logging

158
Q

ACID: How can durability be ensured?

A

Logging

159
Q

ACID: How can isolation be ensured?

A

Locking

160
Q

Two Phase Locking (2PL)

A

2PL is a concurrency control method that guarantees serializability. The protocol utilizes locks, applied by a transaction to data, which may block other transactions from accessing the same data during the transaction’s life.

By the 2PL protocol locks are applied and removed in two phases:

  1. Expanding phase: locks are acquired and no locks are released.
  2. Shrinking phase: locks are released and no locks are acquired.
161
Q

Recovery is typically used to implement which ACID properties?

A

Durability and atomicity

162
Q

A B+ tree may be used to index which SQL data types?

A
  1. NUMBER
  2. TIME
  3. VARCHAR
  4. DATETIME
  5. DATE
  6. CHAR
163
Q

A B+ tree may speed up the execution of which SELECT statements?

A
  • A point query, e.g., SELECT name FROM employee WHERE id = 45
  • A range query, e.g., SELECT name FROM employee WHERE id BETWEEN 120 AND 130
164
Q

Adding an index to a table has in general which consequences?

A

Speeding up queries and slowing down modification

165
Q

A hash index may speed up the execution of which SELECT statements?

A

A point query, e.g., SELECT name FROM employee WHERE id = 45

166
Q

Search key

A

Any attribute (combination of attributes) of a table

167
Q

Difference between a query plan and a query execution plan

A
  • A query plan describes the logic operations whose executions will deliver the correct result.
  • A query execution plan describes the algorithms whose executions will deliver the correct result.
168
Q

Disjunctive and conjunctive queries

A

A disjunctive query is simply two or more predicates joined by ‘OR’, whereas a conjunctive query would be two or more predicates joined by ‘AND’.

169
Q

ER model: What does an arrow mean?

A

An arrow is at the ‘1’ side of a 1:N or N:1 relationship and at both sides at a 1:1 relationship

Example:

A grandma takes care of exactly one cat

170
Q

Chen notation

A

The cardinality ratio is attached to the entity which it governs.

N-amount of people can have only 1 location as their birthplace

171
Q

Min/Max notation

A

The cardinality limit is the minimum and maximum values allowed.

  • How many times can a person participate in the relationship “Birthplace”?
    • One time.
  • How many times can a location be the birthplace of people?
    • Zero to N times.
172
Q

SQL: rename the result columns

A

Syntax:

SELECT person.Name AS nameOfPerson, person.money AS money FROM person

173
Q

SQL: BETWEEN

A

There is a requirement that the argument to the left of AND be less than or equal to the argument on the right

Syntax:

a BETWEEN x AND y

is equivalent to:

a >= x AND a <= y

174
Q

SQL: HAVING

A

Syntax:

SELECT city, max(temp)
FROM weather
GROUP BY city
HAVING max(temp) < 40;

175
Q

Dense index

A

A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a recordin the sorted data file. In clustered indices with duplicate keys, the dense index points to the first record with that key.

176
Q

Sparse index

A

A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the blockin the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.

177
Q

Conflict graphs are used to…

A

decide if a schedule is serializable.

178
Q

In a non-clustered index:

A
  • The physical order of the rows is not the same as the index order.
  • The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses.
179
Q

In a clustered index:

A
  • Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order.
  • Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.
180
Q

Construct B+ tree

A

When you get a set of key values and a dimension (number of pointers per block).

  • number of keys on non-leafs to keep on the old leaf when splitting: celing(d / 2) - 1
  • number of keys to keep on the old leaf when splitting: celing( (d-1) / 2)
181
Q
A