Main Flashcards

(181 cards)

1
Q

What is the Entity-Relationship Model?

A

The entity-relationship (ER) data model uses a collection of basic objects, called entities, and relationships among these objects.

An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the languages of Databases?

A
  1. Data-definition language (Specify database schema)
  2. Data-manipulation language (Database queries)

In practice: both are in the same database language, e.g. SQL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a Database System?

A

A Database Management System and a Database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the 3-layer schema architecture?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a conceptual schema

A

The conceptual schema defines the ontology (the formal naming and definition of the types, properties, and interrelationships of the entities) of the concepts as the users think of them and talk about them.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is an external schema?

A

The external schema defines the view of the data presented to the application programs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is an internal schema?

A

Describes the internal formats of the data stored in the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is Physical data independence?

A

Changes regarding file structure and access paths (physical layer) have no influence on the conceptual schema (logical layer).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are some examples of changes on the physical layer?

A
  • New hard disk added
  • New processor added
  • Files are split into multiple files
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is Logical data independence?

A

Changes on the logical layer have no infuence on external schemas and applications.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are some examples of changes on the logical layer?

A
  • Adding another attribute to the conceptual schema
  • Changing the name of an attribute on the conceptual schema
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a Mini-world?

A

Some part of the real world about which information is stored

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are data/information?

A

Known facts about the mini-world that can be recorded and have an implicit meaning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a database (DB)?

A

A collection of related data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are Database Management Systems (DBMS)?

A

A software package to facilitate the creation and maintenance of a database

Examples: MySQL, PostgreSQL, Microsoft SQL Server, Oracle, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is a database instance?

A

The content of a DB at a particular time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Name some of the problems that can occur when managing data without a DBS

A
  • Data can get lost.
  • Query evaluation might be very slow, especially for large amounts of data.
  • Access rights are hard to control.
  • Extending the data with additional attributes is difficult.
  • Inconsistent data might be stored.
  • Concurrent access to the data cannot be handled efficiently.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Definition of a relation

A

Assume D1, D2,…,Dn are domains

R ⊆ D1 x … x Dn

  • Example: telephoneBook ⊆ string x string x integer
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Relational schema

A

Defines the structure of stored data

  • Is denoted as: sch(R) or R
  • Notation: R(A1 : D1, A2 : D2, …) with Ai denoting attributes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

In what order are tuples in a relation stored?

A

Tuples in a relation are unordered.

These relations have the same information content:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

In what order are attributes in a touple/relation stored?

A

In accordance with the mathematical definition of tuples, attributes in a tuple/relation are ordered.

These relations have different information content:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Values in a tuple

A
  • Values in a tuple are atomic (indivisible)
  • A value cannot be a structure, a record, a collection type or a relation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Null values

A

A special null value is used to represent values that are unknown or inapplicable to certain tuples.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

When are duplicates not allowed?

A

A relation adheres to the mathematical definition of a set.

No two tuples in a relation may have identical values for all attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
What does union-compatible mean?
Two relations are union-compatible if: * They have the same number of attributes (they have the same arity) * The domain of each attribute in column order is the same in both relations (the i'th attribute of both R and S have the same domain)
26
Relational algebra operator: π(pi)
**Projection:** The result is a relation of n columns obtained by removing the columns that are not specified.
27
Relational algebra operator: σ(sigma)
**Selection:** Selects/filters rows of a table according to the selection predicate * Notation: σF * Selection predicate F consists of: * ​Logic operators: \/(or), /\(and), ¬(not) * Arithmetic comparison operators: \<, ≤, =, \>, ≥, ≠ * Attribute names of the argument relations or constants as operands
28
Relational algebra operator: ρ(rho)
Rename: * Renaming relation R to S: ρS(R) * Renaming attribute B to A: ρA←B(R)
29
Relational algebra operator: X
Cartesian product (aka. cross product) ## Footnote The cartesian product (R x S) between relations R and S consists of all possible combinations (|R| \* |S| pairs) of tuples from both relations. Attributes in the result are referenced as R.A or S.A to resolve ambiguity.
30
Relational algebra operator: U
**Union:** The result of a union (R U S) between two relations R and S contains all tuples from both relations without duplicates. ## Footnote R and S have to be union-compatible.
31
Relational algebra operator: —
**Difference:** The difference (R — S or R \ S) of two relations R and S removes all tuples from the first relation that are also contained in the second relation. ## Footnote R and S have to be union-compatible.
32
Relational algebra operator: ∩
**Intersection:** The intersection (R ∩ S) of two relations R and S consists of a set of tuples that occur in both relations. ## Footnote R and S have to be union-compatible.
33
Relational algebra operator: ⋈
**Natural join:** The natural join combines two relations via **common attributes** (same names and domains) by combining only tuples with the same values for common attributes.
34
Relational algebra operator: ⟕
**Left outer join:** keep dangling tuples in the left operand relation
35
Relational algebra operator: ⟖
**Right outer join:** keep danging tuples in the right operand relation
36
Relational algebra operator: ⟗
**(Full) outer join:** keep dangling tuples of both operand relations
37
Relational algebra operator: ⋉ and ⋊
**Left semi join:** ⋉ **Right semi join:** ⋊ Find all tuples in a relation for which there are **matching tuples** in the other relation.
38
Relational algebra operator: γ (gamma)
**Grouping:** tuples with the same attribute values (for a specific list of attributes) are grouped. An aggregate function is applied to each group (computing one value for each group). Typical aggregate functions: * count - number of tuples in a group * sum - sum of attribute values in a group * min, max, avg - minimum, maximum and average of attribute values in a group
39
Relational algebra operator: ÷
Division ## Footnote Example: Find all wines that have been reviewed by both Parker and Clarke
40
θ-join (Theta join)
To combine tuples from two relations where the combination condition is not simply the equality of shared attributes then it is convenient to have a more general form of join operator, which is the θ-join. You can use any binary operator in the set {\<, ≤, =, \>, ≥} in between the two variables, Example: price \< myMoney **or** price \< 800
41
When is a variable free in domain relational calculus?
A variable is free if: * It is not quantified by a universal quantifier * It is not quantified by an extential quantifier
42
Universal quantifier
A universal quantification is a type of quantifier, a logical constant, which is interpreted as (∀) or (⇒).
43
Existential quantifier
An existential quantification is a type of quantifier, a logical constant, which is interpreted as "there exists" (∃)
44
Universal quantifier: ⇒
P ⇒ Q means "P implies Q", i.e., "if P is true, then Q must be true"
45
Domain relational calculus: When is a query safe?
When it returns only a finite number of tuples
46
What is the expressiveness of relational algebra, tuple relational calculus, and domain relational calculus?
All three languages are equal regarding expressiveness (relationally complete)
47
How is an attribute marked as a primary key?
By having a line under it
48
ER model: Entities and entity types
Entities are objects of the real world about which we want to store information Entities are grouped into entity types (represented as a square).
49
ER model: Entity set
An entity set is a set of entities of the same type (e.g., all persons having an account at a bank).
50
ER model: Attributes
Attributes model characteristics of entities or relationships. * All entities of an entity type have the same characteristics * Attributes are declared for entity types * Attributes have a domain or value set
51
ER model: Single-valued vs multi-valued attributes
Ex: A person might have multiple phone numbers (or a single one)
52
ER model: Simple attributes vs composite attributes
Ex: An address can be modeled as a string or composed of street and city
53
ER model: Stored attributes vs derived attributes
Ex: Birthday and age. ## Footnote Age is computed several times using birthday.
54
Functional dependency
* A functional dependency is a constraint between two sets of attributes in a relation from a database. * In other words, functional dependency is a constraint that describes the relationship between attributes in a relation. * Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written X → Y) if, and only if, each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency X → Y.
55
ER model: Primary keys
Primary key attributes are marked by underlining.
56
Super key
A set of attributes which can uniquely identify an entity. Example: 1. Finding the super key of a *Person(name, address, licencePlateNumber, phoneNumber)* 2. We consider that the same phoneNumber can appear in several countries so phone number is not enough to uniquely identify a person 3. But the key: *(address, phoneNumber, name)* is a super key Remember that super keys are not necessarily minimal. Both: *(address, phoneNumber)* and *(address, phoneNumber, name)* are super keys
57
Candidate key
A canditate key is a minimal super key (if you remove any attribute from the super key, then you cannot uniquely identify the entity) One of the candidate keys for an entity is chosen as the primary key.
58
ER model: Relationships and relationship types
Relationships describe connections between entities. Relationships between entities are grouped into relationship types.
59
ER model: Relationship instance and relationship set
An association between two or more entities is called relationship (instance). A relationship set is a collection of relationship instances.
60
ER model: Role names
Role names are optional and used to characterize a relationship type. ## Footnote Especially useful for recursive relationship types, i.e., an entity type is participating multiple times in a relationship type.
61
ER model: Descriptive attributes
Relationship types can also have (descriptive) attributes.
62
Characteristics of relationship types: Degree
* Number of participating entity types * Mostly: binary * Rarely: ternary * In general: n-ary or n-way (multiway relationship types)
63
Characteristics of relationship types: * Cardinality ratio * Cardinality limits * Participation constraint
* Cardinality ratio (Chen notation): 1:1, 1:N, N:M * Cardinality limits ([min,max] notation): [min,max] * Participation constraint: partial or total
64
Total participation constraint
Each entity of an entity type must participate in a relationship, i.e., it cannot exist without any participation. * Double line means that it has to participate * Example: * Wine has to be produced * A producer does not have to produce
65
Partial participation constraint
Each entity of an entity type ***can*** participate in a relationship, i.e., it can exist without any participation. * Single line means partial
66
ER model: Weak entity types
A weak entity is an entity that cannot be uniquely identified by its attributes alone; therefore, it must use a foreign key in conjunction with its attributes to create a primary key. The foreign key is typically a primary key of an entity it is related to. * The weak entity type's key attributes are marked by underlining with a dashed line (partial key, discriminator).​ * The weak entity is marked by a double line around it * Total participation of the weak entity type. * Only in combination with 1:N (N:1) (or rarely also 1:1) relationship types * The strong entity type is always on the '1'-side
67
ER model: ISA relationship type
Specialization and generalization is expressed by the isa relationship type (inheritance).
68
Mapping relationship types to relations: M:N relationship type
* New relation with relationship type's attributes * Add attributes referencing the primary keys of the involved entity type relations * Primary key: set of foreign keys
69
Mapping relationship types to relations: 1:N relationship type
* Add information to the entity type relation of the "N"-side: * Add foreign key referencing the primary key of the "1"-side entity type relation * Add attributes of the relationship type
70
Mapping relationship types to relations: 1:1 relationship type
* Add information to one of the involved entity type relations: * Add foreign key referencing the primary key of the other entity type relation * Add attributes of the relationship type
71
Mapping relationship types to relations: N-ary relationship type
All participating entity types are mapped according to the standard rules.
72
Mapping ERD to relations: How do we handle multi-valued attributes?
Create a separate relation for each multi-valued attribute. ## Footnote **Example:** person(_PID_, name) phoneNumber: (_PID --\> person, number)_
73
Mapping ERD to relations: How do we handle composite attributes?
Include the component attributes in the relation Ex: Person(_PID_, name, street, city)
74
Mapping ERD to relations: How do we handle derived attributes?
Ignore them during mapping to relations, can be added later by using views.
75
What is an update anomaly?
Ex: A teacher moves from office 226 to office 338. There is an update anomaly if only some tuples are changed but not all.
76
What is an insert anomaly?
When you cannot insert a new row without using null values.
77
What is a deletion anomaly?
Ex: When a course is created a teacher is assigned to the course. When the course ends all information about the teacher may also disappear.
78
When is a functional dependency trivial?
The functional dependency α → β is called trivial if β ⊆ α. ## Footnote **Example:** α = {A, B, C}, β = {A, C} then α → β is trivial
79
Full functional dependency
A full functional dependency occurs when you already meet the requirements for a functional dependency and the set of attributes on the left side of the functional dependency statement cannot be reduced any further. ## Footnote Examples: “{CPR, age} → name” is a functional dependency, but it is not a full functional dependency because you can remove age from the left side of the statement without impacting the dependency relationship.
80
Partial functional dependency
The opposite of full functional dependency. If you remove an attribute from the left-hand side the FD still holds. ## Footnote Eg. if both {A,B} → {C} and {A} → {C} then {C} is **partially dependent** on {A,B}.
81
Functional dependencies: F+
Given a set of FDs F we can derive additional FDs * F+ contains all FDs that can be derived from F, i.e., all FDs **logically implied** by dependencies in F. * F+ is F's **closure** * Inference rules (Armstrong axioms) help computing F+
82
Closure of a set of attributes
The closure of a set of attributes (α+) with respect to a set of FDs F and a set of attributes α is: α+ = { A | α → A ∈ F+ } Observation: If α → β is in F+, then β is in α+
83
Armstrong axioms (3 essential rules)
α, β, γ, δ are subsets of attributes in ℛ * Reflexivity: * If β ⊆ α then α → β * Augmentation * If α → β then αγ → β * That is adding attributes in dependencies, does not change the basic dependencies * Transitivity * If α → β and β → γ then α → γ
84
Armstrong axioms (3 additional rules)
α, β, γ, δ are subsets of attributes in ℛ * Union * If α → β and α → γ then α → βγ * ​Decomposition * ​If α → βγ then α → β and α → γ * ​Pseudotransitivity * ​If α → β and ​βγ → δ then ​α​γ → δ
85
When are two sets of FDs F and G equivalent? (F ≡ G)
F ≡ G if: * Their closures are the same, i.e., F+ = G+ * Both sets allow for deriving the same set of FDs
86
Computing a canonical cover (minimal cover)
1. Make sure that the right-hand side only contains singletons 1. **Ex:** AB → CD turns into AB → C and AB → D 2. Remove extraneous attributes on the left-hand side 1. **Ex:** If B+ contains A then AB → C turns into B → C 3. Remove redundant FDs 1. For each FD 1. Pretend the FD doesn't exist 2. If α+ (left-hand side) contains the right-hand side then there is another way to reach the right-hand side and the FD can be removed. 4. ​​Combine all FDs with the same left-hand side
87
What is normalization by decomposition?
Normalization involves decomposing a table into less redundant (and smaller) tables without losing information, and then linking the data back together by defining **foreign keys** in the old table referencing the **primary keys** of the new ones. ## Footnote The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just **one table** and then propagated through the rest of the database using the defined **foreign keys**.
88
What is a valid decomposition?
A decomposition is valid if ℛ = ℛ1 U ℛ2, i.e., no attributes in ℛ get lost
89
What is a lossless decomposition?
A decomposition of R into R1 and R2 is lossless if R = R1 ⋈ R2
90
What is a lossy decomposition
Means that the reconstruction leads to additional tuples.
91
When is a decomposition dependency preserving?
All functional dependencies that hold for R must be verifiable in the new schemas R1, ..., Rn * We can check all dependencies locally on R1, ..., Rn * We avoid the alternative: computing the join R1 ⋈ ... ⋈ Rn to test if a FD is violated A decomposition is dependency preserving if: * FR ≡ (FR1 ∪ ... ∪ FRn) i.e, * FR+ = (FR1 ∪ ... ∪ FRn)+
92
Prime attribute and non-prime attribute
**Prime attribute:** an attribute which is in at least one candiate key. **Non-prime attribute:** an attribute which is not in **any** candiate key. Example: *Person(name, _phoneNumber, address_, age)* * Prime attributes: *phoneNumber* and *address* * Non-prime attributes: *name* and *age*
93
First normal form (1NF)
A relation ℛ is in 1NF if the domains of all its attributes are atomic (no composite or set-valued domains). ## Footnote Example:
94
Second normal form (2NF)
A relational schema is in 2NF if each non-prime attribute is fully functionally dependent on each candidate key Cheatsheet: * Any table which does not have composite primary keys are automatically in 2NF
95
Third normal form (3NF)
A relation schema R is in 3NF if at least one of the following conditions holds for each of its functional dependencies α → β: * α → β is a trivial functional dependency. * α is a superkey of R. * β is prime (part of a candidate key)
96
Boyce Codd normal form (BCNF)
BCNF is a stricter form of 3NF. To be in BCNF there are only the two first criteria from 3NF. * α → β is a trivial functional dependency. * α is a superkey of R.
97
SQL: SELECT
Retrieves information SELECT \* FROM tableName;
98
SQL: FROM
Creates a cartesian product of the tables listed
99
SQL: GROUP BY
GROUP BY is used in collaboration with the SELECT statement to group together those rows in a table that have identical data. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.
100
SQL: LIKE
* 'abc' LIKE 'abc' true (must match exactly) * 'abc' LIKE 'a%' true (must start with 'a' and can be followed by anything) * 'abc' LIKE '\_b\_' true (three letters, the middle one is 'b') * 'abc' LIKE 'c' false ## Footnote * *Syntax:** SELECT tname FROM Trainer WHERE tname LIKE '%s' * *Returns:** every trainer whose name ends on an 's'
101
SQL: Join
**Syntax:** SELECT empId FROM Company LEFT OUTER JOIN Department ON Company.Id = Department.empId ## Footnote **The different joins:**
102
SQL: Dirty data
Ex: The "age" field is set to -20 or dateOfDeath happened before dateOfBirth
103
SQL: Correlated subquery
A correlated subquery is a subquery (a query nested inside another query) that uses values from the outer query. ## Footnote Because the subquery is evaluated once for each row processed by the outer query, it can be inefficient.
104
How are transactions ended?
* Commit = success = "going forward" * Rollback = failure = "in reverse"
105
Name the four ACID properties of transactions
* Atomicity * Consistency * Isolation * Durability
106
ACID: Atomicity
Atomicity requires that each transaction be "all or nothing": if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged. * An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. * To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
107
ACID: Consistency
The consistency property ensures that any transaction will bring the database from one valid state to another. ## Footnote Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors cannot result in the violation of any defined rules.
108
ACID: Durability
The database should be durable enough to hold all its latest updates even if the system fails or restarts. * If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. * If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
109
ACID: Isolation
In a database system where more than one transaction are being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if it is the only transaction in the system. ## Footnote No transaction will affect the existence of any other transaction.
110
What is a schedule?
A sequence of operations from one or more transactions. In concurrent transactions, the operations are interleaved.
111
Serial schedule
The transactions are executed non-interleaved i.e., a serial schedule is one in which no transaction starts until a running transaction has ended.
112
What operations can a schedule consist of?
* read(Q, q) * Read the value of the database item Q and store in the local variable q. * write(Q, q) * Store the local variable q in the database item Q * Arithmetic operations * commit * rollback
113
Conflicting operations
We say two operations performed by different transactions on the same resource conflict if the resulting state depends on the order in which they are executed **Conflicting operations:** * Read / Write * Write / Read * Write / Write
114
The Concurrency Problem: What is meant by correctness?
**Definition1**: Concurrent execution of transactions must leave the database in a consistent state. **Definition2**: Concurrent execution of transactions must be (result) equivalent to some serial execution of the transactions. (*Result equivalent means final database states must be identical.*)
115
Definition: View Equivalent Schedules
Two schedules would be view equivalence if the transactions in both the schedules perform similar actions in a similar manner. **Conditions:** 1. If T reads the initial data (data which has not had a write operation performed on it yet) in S1, then it also reads the initial data in S2. 2. If T reads the value written by J in S1, then it also reads the value written by J in S2. 3. If T performs the final write on the data value in S1, then it also performs the final write on the data value in S2.
116
Definition: View Serializability
A schedule is view serializable iff it is view equivalent to a serial schedule
117
Definition: Conflict Equivalent Schedules
If two sechedules S1 and S2 perform the same conflicting operations (W/R, R/W, W/W) they are conflict equivalent ## Footnote **Example below (S1 and S2 are conflict equivalent):**
118
Definition: Conflict serializable
A schedule is conflict serializable if it is conflict equivalent to a serial schedule. ## Footnote A conflict serializable schedule ensures that after execution the database is in a consistent state.
119
What does blind writes mean?
The transactions T1 and T2 have write instructions without read instructions, termed **blind writes**.
120
How do we determine if a schedule is conflict serializable using a conflict graph?
A schedule is conflict serializable if its conflict graph is acyclic (no loops).
121
What is a recoverable schedule?
For each pair of transactions T1 and T2 where T2 reads data items written by T1 , T1 must commit before T2 commits.
122
What is a cascadeless schedule?
If T2 reads data items written by T1, the commit operation of T1 must appear before the read done by T2.
123
What does a locking protocol do?
Ensures serializable schedules by delaying transactions that could violate serializability.
124
What are the different locking operations?
* Two types of locks can be set on an item Q: * Exclusive lock: LX Q * An exclusive lock gives write or read access on an item * Shared lock: LS Q * A shared lock gives read access request on an item * Locks can be released. * Unlock: UN Q
125
Difference between **strict** and **rigorous** two-phase locking?
* In **strict** two-phase locking, exclusive locks are not released until after commit time. * E.g. shared locks can be unlocked before commit * In **rigorous** two-phase locking, no locks are released until after commit time.
126
Deadlock: Wait-die
When T1 requests a conflicting lock from T2: * If T1 is older than T2 * Then T1 waits until the data-item is available. * If T1 is younger than T2 * ​Then T1 dies. * T1 is restarted later with a random delay but with the same timestamp. This scheme allows the older transaction to wait but kills the younger one.
127
Deadlock: Wound-wait
When T1 requests a conflicting lock from T2: * If T1 is older than T2 * Then T2 is rolled back. * T2 is restarted later with a random delay but with the same timestamp. * If T1 is younger than T2 * Then T1 waits until the resource is available. This scheme, allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
128
What are conflict graphs also called?
Precedence graphs
129
Storage types
* **Volatile** storage: does not survive system crashes * Main memory * Cache memory * **Nonvolatile** storage: survives system crashes * Disk, both magnetic and solid-state * Tape * **Stable** storage: a mythical form of storage that survives all failures * Approximated by maintaining multiple copies of distinct nonvolatile media with independent failure modes.
130
Undo/redo algorithm
Following a failure, the following is done. * Undo all transactions for which the log has “start” entry but no “commit” entry. * Redo all transactions for which the log has both “start” and “commit” entries.
131
No-undo/no-redo algorith
Algorithm: * No-undo → do not change the database during a transaction * No-redo → on commit, write changes to the database in a single atomic action
132
Checkpointing
A checkpoint identifies the state of the database at a point in time. * Modifications to database are performed in memory and are not necessarily written to disk after every update. * Therefore, periodically, the database system must perform a checkpoint to write these updates which are held in-memory to the storage disk. * A checkpoint creates a known good point from which the database system can start applying changes contained in the log during recovery after an unexpected shutdown or crash.
133
How is the database stored?
* The database is stored in a collection of **files** * A file is a split into a set of fixed size **blocks** * A block contains a number of **records** * A record contains a number of **fields** ## Footnote Note: Files reside in secondary memory (usually a hard disk).
134
Clustered, and Non-clustered indexes
**Clustered index (sometimes primary index):** * Data is stored in the order of the clustered index * Example: Phonebook, the entire book is an index; The phone number is stored next to the name * Only one clustered index per table * Usually the primary key is what is sorted by **Non-clustered index (sometimes secondary index):** * Like the index of a text book; The index in the back of the book points to which pages the information is stored
135
Heap
Main idea: Records are stored in blocks in no particular order
136
Sequential storage
* Main idea: Records are stored in blocks in one particular order * Can do binary search because data is sorted
137
Sparse vs Dense indexes
Ordered Indexing is of two types: sparse or dense In **dense index**, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself. In **sparse index**, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach at the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.
138
B+ tree
A B+ tree is a balanced binary search tree that follows a multi-level index format. ## Footnote The leaf nodes of a B+ tree denote actual data pointers. B+ tree ensures that all leaf nodes remain at the same height, thus balanced. Additionally, the leaf nodes are linked using a link list; therefore, a B+ tree can support random access as well as sequential access.
139
Properties of a B+ tree
Properties: * **Balanced**, i.e., all path from root to leaf has the same length * Nice because predictable performance * **Bushy**, each node has between ceiling(d/2) and d children * Except the root node that has between 2 and d children * Leaf nodes have between ceiling(d/2) and d - 1 children * Nice because each level adds extra I/O * **Ordered** * The key values are sorted in each node, i.e, kx \< ky , if x \< y
140
Trade-off: improving queries vs improving modifications
If queries are sped up it is usually at the expense of slowing down modifications, and vice-versa.
141
The four selection strategies
* Linear search * Expensive, but always applicable * Binary search * Applicable only when the file is appropriately ordered * Hash table search * Single record retrieval; does not work for range queries * Retrieval of multiple records * Secondary index search * Implemented with multiple pointers, each to a single record.
142
How does a nested-loop join work?
If we: A ⋈ B Then for each tuple of A, we scan the entirety of B. * Can be used for all join types * Can be quite expensive **Reason it is called 'nested':** for each (tuple of A) for each (tuple of B) compare common attributes of A and B
143
Sort-merge join
* The relations R & S have to be sorted on the join attributes (the foreign key between the two relations) * Then both relations are merged, starting with a pointer at the top of each relation * When the two foreign keys at the pointers are equal the joined product is added to the new relation, and R's pointer is moved to the next entry * If they are still equal another join product is added. If they aren't equal S's pointer is moved to the next entry.
144
Hash join
A hash join is performed by hashing one data set into memory based on join columns and reading the other one and probing the hash table for matches. ## Footnote The hash join is very low cost when the hash table can be held entirely in memory, with the total cost amounting to very little more than the cost of reading the data sets. The cost rises if the hash table has to be spilled to disk in a one-pass sort, and rises considerably for a multipass sort.
145
The three rules of heuristic query optimization
* Push down selections * Push down projections * Avoid cross products
146
Arity of a relation
The arity of a relation is its number of columns (attributes) * The arity of *Person(name, cpr, age, height)* is 4 * The arity of *Job(name, title, salary)* is 3 * Then the arity of Person x Job is 4 + 3 = 7
147
Intersection: R ∩ S broken down
The intersection (R ∩ S) of two relations R and S consists of a set of tuples that occur in both relations. Intersection can be expressed as dffierence: (R ∩ S = R — (R — S))
148
Natural join: ⋈ broken down
The natural join can be expressed as a cartesian product followed by selections and projections R⋈S = πA1,...,Am,R.B1,...R.Bk,C1,...,Cn(σR.B1 = S.B1Λ...ΛR.Bk = S.Bk(R X S))
149
Left/Right semi join: ⋉ / ⋊ broken down
L ⋉ R = πL(L⋈R) L ⋊ R = R ⋉ L = πR(L⋈R)
150
Division: ÷ broken down
We don't need to know what makes up the division operator. All we need to know at the exam is that an alternative way to describe it exists.
151
Commutative join
A binary operation is commutative if changing the order of the operands does not change the result. The following joins at **not** commutative: * Natural join * Left outer join * Right outer join * Semi join
152
What kind of languages are: 1. Tuple relational calculus? 2. Domain relational calculus?
They are: 1. Declarative language (also called non-prodecural) 2. Declarative language (also called non-prodecural)
153
ER model: which objects can an attribute be associated with?
* A relationship type * An entity type * A composite attribute * A weak entity type
154
ER model: An identifying relationship
An identifying relationship means that the child table cannot be uniquely identified without the parent. Example: * Account (AccountID, AccountNum, AccountTypeID) * PersonAccount (AccountID, PersonID, Balance) * Person(PersonID, Name) The *Account* to *PersonAccount* relationship and the *Person* to *PersonAccount* relationship are identifying because the child row (*PersonAccount*) cannot exist without having been defined in the parent (*Account* or *Person*). In other words: there is no *PersonAccount* when there is no *Person* or when there is no *Account*.
155
A problem with a normalized database schema is...?
... that the number of relations becomes larger.
156
Two benefits of normalizing a relational design is...
1. Modification anomalies are avoided 2. The information redundancy in the schema is minimized
157
ACID: How can atomicity be ensured?
Logging
158
ACID: How can durability be ensured?
Logging
159
ACID: How can isolation be ensured?
Locking
160
Two Phase Locking (2PL)
2PL is a concurrency control method that guarantees serializability. The protocol utilizes locks, applied by a transaction to data, which may block other transactions from accessing the same data during the transaction's life. By the 2PL protocol locks are applied and removed in two phases: 1. Expanding phase: locks are acquired and no locks are released. 2. Shrinking phase: locks are released and no locks are acquired.
161
Recovery is typically used to implement which ACID properties?
Durability and atomicity
162
A B+ tree may be used to index which SQL data types?
1. NUMBER 2. TIME 3. VARCHAR 4. DATETIME 5. DATE 6. CHAR
163
A B+ tree may speed up the execution of which SELECT statements?
* A point query, e.g., SELECT name FROM employee WHERE id = 45 * A range query, e.g., SELECT name FROM employee WHERE id BETWEEN 120 AND 130
164
Adding an index to a table has in general which consequences?
Speeding up queries and slowing down modification
165
A hash index may speed up the execution of which SELECT statements?
A point query, e.g., SELECT name FROM employee WHERE id = 45
166
Search key
Any attribute (combination of attributes) of a table
167
Difference between a query plan and a query execution plan
* A query plan describes the logic operations whose executions will deliver the correct result. * A query execution plan describes the algorithms whose executions will deliver the correct result.
168
Disjunctive and conjunctive queries
A **disjunctive** query is simply two or more predicates joined by 'OR', whereas a **conjunctive** query would be two or more predicates joined by 'AND'.
169
ER model: What does an arrow mean?
An arrow is at the '1' side of a 1:N or N:1 relationship and at both sides at a 1:1 relationship ## Footnote **Example:** A grandma takes care of exactly one cat
170
Chen notation
The cardinality ratio is attached to the entity which it governs. N-amount of people can have only 1 location as their birthplace
171
Min/Max notation
The cardinality limit is the minimum and maximum values allowed. * How many times can a person participate in the relationship "Birthplace"? * One time. * How many times can a location be the birthplace of people? * Zero to N times.
172
SQL: rename the result columns
**Syntax:** ## Footnote SELECT person.Name AS nameOfPerson, person.money AS money FROM person
173
SQL: BETWEEN
There is a requirement that the argument to the left of AND be less than or equal to the argument on the right ## Footnote **Syntax:** a BETWEEN x AND y **is equivalent to:** a \>= x AND a \<= y
174
SQL: HAVING
**Syntax:** SELECT city, max(temp) FROM weather GROUP BY city HAVING max(temp) \< 40;
175
Dense index
A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a recordin the sorted data file. In clustered indices with duplicate keys, the dense index points to the first record with that key.
176
Sparse index
A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the blockin the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.
177
Conflict graphs are used to...
decide if a schedule is serializable.
178
In a non-clustered index:
* The physical order of the rows is not the same as the index order. * The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses.
179
In a clustered index:
* Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. * Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.
180
Construct B+ tree
When you get a set of key values and a dimension (number of pointers per block). * number of keys on non-leafs to keep on the old leaf when splitting: celing(d / 2) - 1 * number of keys to keep on the old leaf when splitting: celing( (d-1) / 2)
181