databases Flashcards

(230 cards)

1
Q

What is an entity set?

A
  • set of entities of the same type sharing the same set of attributes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a relationship set?

A
  • set of relationships between one or many entity sets
  • an attribute can also be a property of a relationship set
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is an n-ary relationship?

A
  • involving n entity sets
  • relationships involving more than two entity sets (binary relationship)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the multiplicity constraints?

A
  • participation
  • cardinality
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is participation?

A
  • determines if an entity is require or just allowed to be part of a relationship
  • total: represented with a double line
  • partial: represented with a single line
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is cardinality?

A
  • the maximum amount an entity can take part in a relationship set
  • one to one
  • one to many
  • many to one
  • many to many
  • a directed line (arrow) represents “one”
  • an undirected line represents “many”
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a candidate key?

A
  • minimal set of attributes that uniquely identifies each entity in an entity set
  • primary key is the candidate key chosen to uniquely identify each entity
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a weak entity set?

A
  • an entity set with no candidate keys
  • relies on a strong or identifying entity set for context and meaning
  • have total participation with identifying sets
  • represented with a double lined diamond
  • one to many relationship from each identifying set to the weak entity set
    primary key is the identifying set with the weak set’s discriminator
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a discriminator (aka a partial key)?

A
  • set of attributes that distinguishes among all the entities of a weak entity set
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What shape is used to represent an entity set?

A
  • rectangles
  • divided in two parts
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is used to indicate a primary key in an ER diagram?

A
  • underlining
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What shape is used to represent a relationship set?

A
  • diamonds
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How is a composite attribute represented?

A

-indentation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How is a derived attribute represented?

A
  • the derived attribute is followed by parentheses()
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How is a multi-valued represented?

A
  • {enclosed in braces}
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is generalisation (also called specialisation)?

A
  • referred to as superclass-subclass relationship or inheritance
  • lower level entity set inherits all the attributes and relationship participation of the higher level entity set to which it is linked
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are the inheritance constraints?

A
  • total: entity must belong to one of the subclass entity sets. Represented with a keyword total.
  • partial: entity need not belong to one of the subclass entity sets. Default unless total is specified.
  • overlapping: an entity can only belong to more than one subclass entity set (multiple arrows to the superclass)
  • disjoint: an entity can belong to at most one subclass entity set (single arrow to the superclass, line splits to subclass)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Is an attribute better placed on an entity set or a relationship set?

A

– An attribute of a many-to-many relationship set must be placed on the relationship set
– An attribute of a one-to-many relationship set can be placed on the relationship set or the many-side entity set
– An attribute of a one-to-one relationship set can be placed on the relationship set or in either entity set
– An attribute of a many-to-many relationship set must be placed on the relationship set

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you change a weak entity set to a strong entity set?

A
  • adding an artificial key
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

When to use inheritance or not?

A
  • If entities share attributes there may or may not be a specialisation/generalisation relationship
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is the simple modelling methodology?

A
  1. Identify entity sets
    – Use nouns to determine what needs representation, discarding inappropriate candidates.
  2. Define attributes
    – Establish primary keys, other stored information, and any derived attributes.
  3. Determine relationships
    – Use verbs or entity names, considering cardinality and participation.
  4. Assess generalisation/specialisation
    – Apply if it reduces redundancy or makes logical sense.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is required of attributes?

A

To be atomic.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is the set of allowed values for each attribute?

A

The domain.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is a relation schema and how does it relate to attributes and domains?

A
  • Attributes ({A1, A2, …, An}): Names of properties in a database.
  • Domains ({D1, D2, …, Dn}): Defines possible values for each attribute.
  • Relation Schema (R = (A1, A2, …, An)): A template of a table consisting of attributes.
  • Relation: A set of n-tuples, where each tuple is an ordered list of n values (a1, a2, …, an) forming a row.
  • Each value ai must be an element of domain Di, corresponding to attribute Ai.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
What is a relation r?
- a subset of D1 x D2 x ... x Dn - x is the cartesian product
26
How are the current values of a relation represented?
- The relation instance (current values of a relation) is often displayed as a table. - Each element t of r is a tuple, represented as a row in the table.
27
What are superkeys, candidate keys, primary keys, and foreign key constraints?
- Superkey (K): A set of attributes sufficient to uniquely identify a tuple in relation r(R). - Candidate Key: A minimal superkey (no unnecessary attributes). - Primary Key: One candidate key is chosen as the main identifier. - Foreign Key Constraint: A value in one relation must appear in another.
28
What is the relational schema notation?
() - Primary keys are underlined - Foreign keys are followed by an asterisk - Attributes are optionally followed by domain information
29
What is the select operation in relational algebra?
- The Select (σ) operation retrieves rows from a relation that satisfy a given condition. - Notation: σ_condition(Relation) - Purpose: Filters tuples based on specified criteria, keeping only relevant data. - Operates horizontally (filters rows, not columns).
30
What are key relational algebra operations for retrieving and combining data?
- Π (Project): Returns all records but only specified fields.- SQL Equivalent: SELECT column_name FROM table_name - σ (Select): Retrieves all records that satisfy a condition.- SQL Equivalent: SELECT * FROM table_name WHERE condition - × (Cartesian Product): Combines two tables, producing all possible record combinations. - ⨝ (Natural Join): A Cartesian product that selects records with matching values in same-named fields, eliminating redundant columns.
31
What are some key relational algebra operations and their symbols?
- Selection (σ) → Retrieves rows that match a condition. - Projection (Π) → Selects specific columns from a relation. - Renaming (ρ) → Renames attributes or relations for clarity. - Difference (−) → Returns tuples in one relation but not in another. - Cartesian Product (×) → Combines all possible tuples from two relations. - Intersection (∩) → Retrieves tuples common to both relations.
32
How do you convert a strong entity set into a relational model?
A strong entity set reduces to a schema with the same attributes – module ( module_code, title, credits )
33
How do you convert a weak entity set into a relational model?
A weak entity set reduces to a schema that includes foreign key column(s) for the primary key of the identifying entity set – section ( module_code*, name, semester, year, description )
34
How do you convert a many-to-many set into a relational model?
A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set. – Also those primary keys will be foreign keys
35
How do you convert a many-to-one and one-to-many set into a relational model?
Many-to-one and one-to-many relationship sets can be represented by adding a foreign key attribute to the side whose entities can only participate once in the relationship set -> “many side” – If relationship is partial on that side this could result in null values
36
How do you convert an identifying relationship set (one which relates a weak entity set to its identifying entity set) into a relational model?
- does not require a schema
37
How do you convert a one-to-one set into a relational model?
For one-to-one relationship sets, either side can be chosen for the foreign key attribute – Extra attribute can be added to the schema for either of the two entity sets (but not both)
38
How do you convert a Composite Attributes into a relational model?
Composite attributes are flattened out by creating a separate attribute for each component attribute
39
How do you convert a Multivalued Attributes into a relational model?
A multivalued attribute M of an entity set E is represented by a separate schema EM – Schema EM has attributes corresponding Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM
40
How do you convert a Derived Attributes into a relational model?
* Derived attributes should be ignored in the relational schemata
41
How do you convert a Specialisation / Generalisation into a relational model?
* If inheritance is total, no schema for the superclass (person) is required. Can be defined as a “view” relation containing union of subclass relations * Form a schema for each lower level entity set with all local and inherited attributes * But if inheritance is total and the superclass has a relationship with other entity set, we will need: – Form a schema for the superclass – Form a schema for each subclass, with local attributes and a foreign key to the superclass * Also, if inheritance is total and the superclass has a multivalued attribute, and that multivalued attribute needs a schema, then we will need: – Form a schema for the superclass – Form a schema for each subclass, with local attributes and a foreign key to the superclass * Also, if inheritance is total and the superclass has a relationship with weak entity set, then we will need: – Form a schema for the superclass – Form a schema for each subclass, with local attributes and a foreign key to the superclass
42
What is functional dependency in a relational database?
- A functional dependency (FD) describes a relationship between attributes. - Notation: X → Y (If X is known, Y is uniquely determined).
43
What is partial dependency in database normalization?
- A partial dependency exists when a non-prime attribute is functionally dependent on part of a composite key, instead of the entire key. - Occurs in 2NF (eliminated in 3NF). Example: In relation (Student_ID, Course_ID → Course_Name, Student_Name), if Student_Name depends only on Student_ID, it’s a partial dependency.
44
What is a trivial functional dependency?
Back: - A functional dependency X → Y is trivial if Y is a subset of X. - Example: {A, B} → A is trivial because A is part of {A, B}. - Trivial FDs always hold and do not provide additional constraints.
45
What is full functional dependency?
- A full functional dependency occurs when an attribute is dependent on the entire primary key, not just part of it. - Example: In relation (Student_ID, Course_ID → Grade), Grade depends on both attributes together, making it a full dependency.
46
What are transitive dependencies?
- A transitive dependency occurs when an attribute depends on another non-key attribute, instead of the primary key directly. - Occurs in 3NF (eliminated in BCNF). - Example: In Student(ID → Dept_ID, Dept_Name), if Dept_Name depends on Dept_ID, which in turn depends on Student_ID, it’s a transitive dependency.
47
How are keys identified using functional dependencies?
- A candidate key is a minimal set of attributes that uniquely identifies tuples. - Steps: 1. Find all functional dependencies. 2. Determine which attributes uniquely identify all other attributes. 3. Ensure the key is minimal (removing any unnecessary attributes).
48
What are Keys & Functional Dependencies in relational databases?
- Superkey: A set of attributes that uniquely identifies a tuple. - Candidate Key: A minimal superkey. - Primary Key: A chosen candidate key. - Functional Dependency: A rule where an attribute depends on another (e.g., ID → Name).
49
What is closure in functional dependency analysis?
- The closure of an attribute set X (denoted X⁺) is the set of all attributes that can be functionally determined from X. - Used to find keys and verify dependencies. - Steps: 1. Start with X. 2. Apply FD rules to expand the set. 3. Continue until no more attributes can be added.
50
What are Armstrong’s Axioms for functional dependencies?
- Reflexivity: If Y ⊆ X, then X → Y (trivial FD). - Augmentation: If X → Y, then XZ → YZ (adding attributes preserves dependency). - Transitivity: If X → Y and Y → Z, then X → Z. Used to infer all valid functional dependencies in a relation.
51
What are additional rules for functional dependencies?
- Union: If X → Y and X → Z, then X → YZ. - Decomposition: If X → YZ, then X → Y and X → Z. - Pseudo Transitivity: If X → Y and WY → Z, then XW → Z.
52
What is closure of attribute sets in relational algebra?
- Closure of attribute sets helps determine all possible attributes that can be inferred. - Process:- Start with given attributes. - Apply all known functional dependencies. - Continue until no further attributes can be derived. Useful for finding candidate keys in database design
53
What is the difference between F⁺ vs α⁺ in functional dependency analysis?
- F⁺ (Closure of FD set): The set of all functional dependencies derivable from F using inference rules. - α⁺ (Attribute Closure): The set of all attributes functionally implied by α. - Use case:- F⁺ helps find all implied FDs in a relation. - α⁺ is used to check candidate keys for attributes.
54
What is the purpose of normalization in databases?
- Eliminates data redundancy and minimizes anomalies. - Organizes data efficiently while preserving relationships. - Ensures data integrity and simplifies queries.
55
What are the steps in the process of normalization?
- Identify repeating groups → Apply 1NF. - Eliminate partial dependencies → Apply 2NF. - Remove transitive dependencies → Apply 3NF. - Handle remaining anomalies → Apply BCNF or higher.
56
What is lossless decomposition in relational databases?
- Lossless decomposition ensures no data is lost when breaking a relation into smaller tables. - The original relation can be perfectly reconstructed from the decomposed tables.
57
What is lossless-join decomposition?
- A decomposition that allows the original relation to be recovered via joins without introducing spurious tuples. - Ensures data integrity when breaking tables apart.
58
What is lossy decomposition, and why is it problematic?
- A lossy decomposition leads to data loss or the inability to reconstruct the original relation. - Joining decomposed tables may introduce incorrect or missing data.
59
What is First Normal Form (1NF)?
- Eliminates repeating groups and ensures that all attributes contain atomic values. - Each column has single, indivisible values (no lists or arrays).
60
What is Second Normal Form (2NF)?
- Removes partial dependencies (where non-key attributes depend on part of a composite key). - Requires the table to be in 1NF. - Non-key attributes must depend entirely on the primary key.
61
What is Third Normal Form (3NF)?
- Eliminates transitive dependencies (non-key attributes depending on other non-key attributes). - Requires the table to be in 2NF. - No attribute should depend on anything other than the primary key.
62
What is Boyce-Codd Normal Form (BCNF)?
- A stronger version of 3NF, ensuring every determinant is a candidate key. - Requires the table to be in 3NF. - Fixes anomalies that 3NF may not resolve.
63
How do you convert between normal forms?
- Convert to 1NF → Remove repeating groups, ensure atomicity. - Convert to 2NF → Remove partial dependencies. - Convert to 3NF → Eliminate transitive dependencies. - Convert to BCNF → Ensure every determinant is a candidate key.
64
What are domain types in SQL?
- Define data types allowed for each column. - Common types:- INTEGER, FLOAT, DECIMAL → Numeric values. - CHAR, VARCHAR, TEXT → Strings. - DATE, TIME, TIMESTAMP → Date/time values. - BOOLEAN → True/False values.
65
How do you create a table in SQL?
CREATE TABLE defines a new table structure CREATE TABLE table_name ( column1 datatype constraints, column2 datatype constraints ); CREATE TABLE Students ( ID INT PRIMARY KEY, Name VARCHAR(100), Age INT );
66
How do you alter a table in SQL?
ALTER TABLE modifies an existing table structure ALTER TABLE Students ADD COLUMN Address VARCHAR(255);
67
How do you drop a table in SQL?
DROP TABLE permanently removes a table from the database. DROP TABLE table_name; DROP TABLE Students;
68
What is a Cartesian product in SQL?
Occurs when no join condition is specified Returns all possible combinations of rows from two tables. SELECT * FROM TableA, TableB;
69
What are joins in SQL?
- Combines data from multiple tables. - Types:- INNER JOIN → Matches rows from both tables. - LEFT JOIN → Includes all rows from the left table, matching from the right. - RIGHT JOIN → Includes all rows from the right table, matching from the left. - FULL JOIN → Includes all rows from both tables.
70
What is the rename operation in SQL?
Changes the name of a column or table. ALTER TABLE Students RENAME TO Learners;
71
What are string operations in SQL?
Common string functions: 1. TRIM() → Removes spaces. 2. LENGTH() → Finds string length. 3. UPPER() / LOWER() → Changes case. 4.CONCAT() → Combines strings.
72
How do you sort tuples in SQL?
ORDER BY sorts results in ascending (ASC) or descending (DESC) order. SELECT Name, Age FROM Students ORDER BY Age DESC;
73
What are NULL values in SQL?
- NULL represents missing or unknown data. - NULL is not the same as zero or empty string.
74
What is Three-Valued Logic in SQL?
SQL uses TRUE, FALSE, and UNKNOWN (NULL) logic. Comparisons involving NULL return UNKNOWN. SELECT * FROM Students WHERE Age = NULL; -- Incorrect! SELECT * FROM Students WHERE Age IS NULL;
75
What are Aggregate Functions in SQL?
Aggregate functions perform calculations on a set of values and return a single result. Common functions: COUNT() → Returns the number of rows. SUM() → Adds numeric values. AVG() → Computes average of values. MAX() / MIN() → Finds the highest or lowest value. SELECT AVG(Salary) FROM Employees;
76
What is GROUP BY in SQL?
GROUP BY is used with aggregate functions to group data into categories. SELECT column, aggregate_function(column) FROM table GROUP BY column; SELECT Department, COUNT(*) FROM Employees GROUP BY Department;
77
What is the HAVING clause in SQL?
HAVING filters groups based on aggregate conditions. Works with GROUP BY, unlike WHERE, which filters rows SELECT Department, AVG(Salary) FROM Employees GROUP BY Department HAVING AVG(Salary) > 50000;
78
How does Modification work in SQL?
- SQL modification statements allow updating, inserting, and deleting data. - INSERT INTO → Adds new records. - UPDATE → Modifies existing records. - DELETE → Removes records.
79
How do you insert data in SQL?
INSERT INTO adds new rows to a table. INSERT INTO table_name (column1, column2) VALUES (value1, value2); INSERT INTO Students (ID, Name) VALUES (101, 'Alice');
80
How do you update records in SQL?
UPDATE modifies existing values in a table UPDATE table_name SET column1 = value1 WHERE condition; UPDATE Employees SET Salary = Salary + 5000 WHERE Department = 'HR';
81
How do you delete records in SQL?
DELETE removes rows from a table. DELETE FROM table_name WHERE condition; DELETE FROM Students WHERE ID = 101;
82
What is the CASE statement used for in conditional updates?
CASE allows conditional logic within an SQL statement. Used in UPDATE statements for selective modifications UPDATE Employees SET Salary = CASE WHEN Department = 'HR' THEN Salary + 5000 WHEN Department = 'IT' THEN Salary + 3000 ELSE Salary END;
83
What are Integrity Constraints in SQL?
Rules that maintain data accuracy and consistency. NOT NULL → Prevents empty values. UNIQUE → Ensures column values are distinct. CHECK → Enforces specific conditions FOREIGN KEY → Ensures referential integrity between tables.
84
What are Integrity Constraints on a Single Relation?
Constraints applied to a single table: 1. NOT NULL → Column cannot have NULL values. CREATE TABLE Students ( ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL ); 2. UNIQUE → Ensures distinct values. CREATE TABLE Users ( Email VARCHAR(255) UNIQUE ); 3. CHECK → Restricts column values based on a condition. CREATE TABLE Employees ( Salary INT CHECK (Salary > 30000) );
85
What is Referential Integrity (Foreign Keys)?
Ensures data consistency between related tables. A foreign key references a primary key in another table. CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(ID) );
86
How are Integrity Constraints defined in CREATE TABLE?
Constraints are added during table creation. CREATE TABLE Students ( ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL, Age INT CHECK (Age >= 18) );
87
What is a Referential Integrity Violation in SQL? What are the solutions to this?
Occurs when a foreign key references a non-existent primary key. INSERT INTO Orders (OrderID, CustomerID) VALUES (101, 999); -- Fails if CustomerID 999 doesn't exist in Customers table. Solutions: 1. Ensure referenced value exists. 2. Use ON DELETE CASCADE to automatically remove dependent rows.
88
What are Assertions in SQL?
Ensures complex conditions hold globally in the database. Constraints that apply across multiple tables. CREATE ASSERTION CheckSalary CHECK (NOT EXISTS ( SELECT * FROM Employees WHERE Salary < 30000 ));
89
What are Naming Constraints in SQL?
Assigns names to constraints for easy management. CREATE TABLE Students ( ID INT CONSTRAINT ID_PK PRIMARY KEY, Age INT CONSTRAINT Age_Check CHECK (Age >= 18) );
90
What are Nested Subqueries in SQL?
A subquery inside another SELECT, FROM, or WHERE clause. Allows advanced filtering and calculations SELECT Name FROM Students WHERE Age > (SELECT AVG(Age) FROM Students);
91
What do IN and NOT IN do in SQL?
IN checks if a value exists within a set. NOT IN excludes specific values SELECT Name FROM Employees WHERE Department IN ('HR', 'IT');
92
What is Set Comparison (SOME and ALL) in SQL?
SOME → Returns TRUE if at least one comparison is true. ALL → Returns TRUE if all comparisons are true. SELECT Name FROM Employees WHERE Salary > ALL ( SELECT Salary FROM Interns );
93
How do you test for empty relations in SQL?
Use EXISTS or NOT EXISTS to check for data presence. SELECT 'No Employees Found' WHERE NOT EXISTS (SELECT * FROM Employees);
94
What are Correlation Variables in SQL?
Variables that link an inner subquery to the outer query. SELECT E.Name FROM Employees E WHERE EXISTS ( SELECT * FROM Projects P WHERE P.Manager_ID = E.ID );
95
How do you test for duplicate tuples in SQL?
Use GROUP BY and HAVING COUNT() > 1 to find duplicates SELECT Name, COUNT(*) FROM Students GROUP BY Name HAVING COUNT(*) > 1;
96
What are Subqueries in the FROM Clause?
Subqueries used as temporary tables for further selection. SELECT AVG(Age) FROM ( SELECT Age FROM Students WHERE Gender = 'Female' ) AS FemaleStudents;
97
What is a Scalar Subquery in SQL?
A subquery that returns a single value. Used in SELECT, WHERE, and HAVING clauses. SELECT Name, (SELECT MAX(Salary) FROM Employees) AS MaxSalary FROM Employees;
98
What are Joined Relations in SQL?
- Combines data from multiple tables based on a common attribute. - Used to avoid duplication and ensure consistency.
99
What are Join Conditions in SQL?
Specifies how tables are related when joining. SELECT Students.Name, Courses.CourseName FROM Students JOIN Enrollments ON Students.ID = Enrollments.Student_ID;
100
What are Join Types in SQL?
- INNER JOIN → Matches records in both tables. - OUTER JOIN → Includes unmatched records. - LEFT JOIN → All records from the left table + matches from the right. - RIGHT JOIN → All records from the right table + matches from the left. - FULL JOIN → All records from both tables, filling in missing values. Joins are operations that combine related tables based on shared attributes.
101
How are Joined Relations used in SQL?
- Joined relations combine data logically from separate tables. - Helps maintain data integrity and efficiency in queries.
102
What are User-Defined Types (UDTs) in SQL
UDTs allow customization of data types for specific use cases. Helps enforce structured data storage. CREATE TYPE PhoneNumber AS VARCHAR(15);
103
What are time-related types in SQL?
- DATE → Stores date values (YYYY-MM-DD). - TIME → Stores time values (HH:MM:SS). - TIMESTAMP → Stores both date and time. - INTERVAL → Represents time differences.
104
What are domains in SQL?
Domains restrict valid values for attributes. CREATE DOMAIN AgeDomain AS INT CHECK (VALUE >= 18);
105
What are large-object types in SQL?
- LOBs store large binary/text data, like images or documents. - Types: 1. BLOB → Binary Large Object. 2. CLOB → Character Large Object. - Used for media storage in databases.
106
What is a view in SQL?
A virtual table based on an SQL query. Does not store data, just a saved query definition CREATE VIEW EmployeeView AS SELECT Name, Salary FROM Employees;
107
Can views be updated in SQL?
- Views can be updated if they follow simple rules: 1. Based on a single table. 2. No aggregations or derived columns. UPDATE EmployeeView SET Salary = 50000 WHERE Name = 'Alice';
108
What is a materialized view in SQL?
A saved query result that stores actual data. Improves performance for frequently accessed data. CREATE MATERIALIZED VIEW SalesSummary AS SELECT Category, SUM(Revenue) FROM Sales GROUP BY Category;
109
What is an index in SQL?
Speeds up search queries by organizing data efficiently. Types: Clustered, Non-clustered, Unique, Full-text. CREATE INDEX idx_employee_name ON Employees(Name);
110
What is authorisation in SQL?
- Controls database access and privileges. - Users are granted specific permissions (SELECT, INSERT, DELETE).
111
Who can grant privileges in SQL?
DB admin (DBA) or users with GRANT OPTION can grant privileges. GRANT SELECT ON Employees TO User1 WITH GRANT OPTION;
112
How do you grant privileges in SQL?
Grants permission to users. GRANT INSERT, UPDATE ON Orders TO User2;
113
How do you revoke privileges in SQL?
Removes previously granted permissions. REVOKE SELECT ON Employees FROM User1;
114
What are functions and procedures in SQL?
Functions return a single value. Procedures perform actions without returning values. CREATE FUNCTION GetTotalSales() RETURNS INT AS BEGIN RETURN (SELECT SUM(SalesAmount) FROM Sales); END;
115
What are triggers in SQL?
Automatic execution of SQL code when an event occurs. Types: BEFORE, AFTER, INSTEAD OF triggers. CREATE TRIGGER UpdateInventory AFTER INSERT ON Orders FOR EACH ROW BEGIN UPDATE Inventory SET Stock = Stock - NEW.Quantity WHERE ProductID = NEW.ProductID; END;
116
How are triggers used in MariaDB?
- Similar to standard SQL triggers, but MariaDB lacks INSTEAD OF triggers. - Uses BEFORE and AFTER triggers for automation.
117
What events trigger actions in SQL?
- Common trigger events:- INSERT → Fires when a new row is added. - UPDATE → Fires when a row is changed. - DELETE → Fires when a row is removed.
118
When should triggers be avoided in SQL?
- When performance is impacted (triggers slow bulk operations). - When business logic should be managed at the application level. - Complex cascading triggers can cause unintended effects.
119
What is SQL Injection, and how does it work? How can we prevent it?
A security vulnerability that allows attackers to manipulate SQL queries. Occurs when user input is improperly sanitized in SQL commands. Prevention: Use prepared statements and proper input validation. SELECT * FROM Users WHERE Username = '$input';
120
How do prepared statements prevent SQL injection?
Separates SQL syntax from user input, preventing manipulation. $stmt = $conn->prepare("SELECT * FROM Users WHERE Username = ?"); $stmt->bind_param("s", $username); $stmt->execute();
121
What are metadata features in databases?
- Metadata stores information about tables, columns, indexes, and relationships. - Helps in query optimization and schema management. Example using JDBC: DatabaseMetaData metaData = connection.getMetaData(); System.out.println(metaData.getDatabaseProductName());
122
What is JDBC, and why is it used?
Java Database Connectivity (JDBC) enables Java applications to interact with databases. Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password");
123
How does transaction control work in JDBC?
Uses commit() and rollback() for managing transactions. conn.setAutoCommit(false); Statement stmt = conn.createStatement(); stmt.executeUpdate("UPDATE Accounts SET balance = balance - 100 WHERE ID = 1"); conn.commit();
124
How does Python interact with MariaDB?
import mariadb conn = mariadb.connect(user="user", password="password", database="testdb")
125
What is PHP, and how is it used?
A server-side scripting language used for web development. Embedded within HTML.
126
How does PHP send data to the web browser?
Uses echo or print statements. echo "

Welcome to My Website

";
127
How are variables defined in PHP?
Uses $ before variable names. $name = "Alice"; echo "Hello, " . $name;
128
How do you define constants in PHP?
Uses define() function. define("SITE_NAME", "MyWebsite"); echo SITE_NAME;
129
How do you connect PHP to a database?
* Define required values: – define ( 'DB_USER', 'rr71' ); – define ( 'DB_PASSWORD', 'abc32d0' ); – define ( 'DB_HOST', 'localhost' ); – define ( 'DB_NAME', 'rr71_db' ); * Connecting to MariaDB server: – $dbc = mysqli_connect( DB_HOST, DB_USER, DB_PASSWORD, DB_NAME ); * Changing database: – mysqli_select_db( $dbc, 'rr71_other_db' );
130
How do you handle errors in PHP?
Use try-catch for exception handling. try { $conn = new mysqli("localhost", "user", "password", "database"); } catch (Exception $e) { echo "Connection failed: " . $e->getMessage(); } * mysqli_error ( $dbc ) – Returns last error returned by MariaDB * mysqli_errno ( $dbc ) – Returns error number corresponding to an error returned by MariaDB * @ – If prefixed to a function name, suppresses any error messages or warnings that might be invoked * die – Terminates execution of script and sends parameter to client browser
131
How does PHP execute SQL queries?
Using mysqli_query(). $result = $conn->query("SELECT * FROM Users");
132
How does PHP retrieve SQL query results?
* For queries which return results $result = mysqli_query( $dbc, $query_str); while ($row = mysqli_fetch_array($result)) { // process $row } * Optional extra parameter to indicate indexing – MYSQL_NUM (for example, $row[1] ) * equivalent to using mysqli_fetch_row($result) instead – MYSQL_ASSOC (for example, $row[id] ) * equivalent to using mysqli_fetch_assoc($result) instead – MYSQL_BOTH (can use both forms)
133
How does Node.js connect to databases?
Uses MySQL or PostgreSQL modules. const mysql = require("mysql"); const conn = mysql.createConnection({ host: "localhost", user: "root", password: "", database: "testdb" }); conn.connect();
134
How do you execute queries in Node.js?
Uses the .query() function. conn.query("SELECT * FROM Users", (err, results) => { if (err) throw err; console.log(results); });
135
How does Node.js pass parameters securely to SQL queries?
Uses prepared statements to prevent injection. conn.query("SELECT * FROM Users WHERE Username = ?", [username], (err, results) => { if (err) throw err; console.log(results); });
136
What is the storage hierarchy in databases?
* Primary storage – Fast, Expensive, Volatile – Holds data to be operated on * Secondary storage – Moderately fast, Cheaper, Non-volatile – Holds data not currently being operated on * Tertiary storage – Slow, Cheap, Non-volatile, Durable – Generally holds archival data
137
What are blocks in database storage?
- Small units of storage used for reading/writing data. - Data is stored and transferred in fixed-size blocks. - Optimized for disk I/O operations.
138
How is data stored in databases?
Database is stored as a collection of files. – Each file is a sequence of records. * File may or may not correspond to a single database table – Each record is a sequence of fields. * Files are maintained by the underlying OS – stored permanently on disks. * Files are mapped onto disk blocks – Can be specified when a database instance is created * Typically, 4 to 8 kilobytes by default
139
What are fixed-length records in database storage?
- Every record has a fixed size, simplifying storage management. - Used for structured, predictable data formats. For example: * Consider a file of instructor records defined in pseudocode as follows: type instructor = record ID varchar (5); name varchar (20); dept_name varchar (20); salary numeric (8,2); end * Let each character use 1 byte and numeric (8,2) 8 bytes, then max no of bytes for each record is 53 bytes. – Block size unlikely to be a multiple of 53
140
What assumptions apply to fixed-length records
– Record size is fixed – Each file has records of one particular type only – Different files are used for different relations – Each record is entirely contained in a single block – No record is larger than a block
141
How are fixed-length records deleted?
- Deleted records leave gaps in storage. - Solutions:- Use a special marker for deleted records. - Maintain a free list for reusing space. Steps: 1. Compact records i.e. shift records left 2. Move the last record into the gap 3. Do not move the records but fill the gap on next insertion
142
What are Free Lists?
- A list tracking unused or deleted record slots. - Helps reallocate space efficiently without fragmentation. * Store the address of the first deleted record in the file header * Use this first record to store the address of the second deleted record – Effectively “hide” this address in unused space – No overhead * Can think of these stored addresses as “pointers” * More space-efficient – Extra information stored in unused space
143
What are variable-length records in databases?
- Records do not have a fixed size, leading to flexible storage. - Two structures:- Single-record structure → Each record has length metadata. - Block structure → Records are grouped in blocks with pointers.
144
How are records organized in files?
- Sequential organization → Sorted records for fast retrieval: 1. Records are stored in sorted order. 2. Efficient for batch processing, but slow updates. - Heap organization → Unordered storage, fast inserts: 1. Records are stored in any order. 2. Fast inserts, but slower searches. - Clustered organization → Groups related records together: 1. Stores multiple related tables together for efficient access. 2. Reduces disk I/O when retrieving linked data. - Hashing: a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed
145
What is hashing?
- a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed
146
What is a data dictionary in databases?
- Stores metadata about database objects. - Includes information on tables, columns, indices, and constraints.
147
How does storage access work in databases?
* A database file is partitioned into fixed-length storage units called blocks. * Blocks are units of both storage allocation and data transfer. * Database systems seek to minimise the number of block transfers between the disk and memory. – We can reduce the number of disk accesses by keeping as many blocks as possible in main memory. * Buffer – portion of main memory available to store copies of disk blocks. * Buffer manager – subsystem responsible for allocating buffer space in main memory
148
What is the role of a buffer manager?
* Programs call on the buffer manager when they need a block from disk. – If the block is already in the buffer, buffer manager returns the address of the block in main memory – If the block is not in the buffer, the buffer manager * Allocates space in the buffer for the block – Replacing (throwing out) some other block, if required, to make space for the new block. – Replaced block written back to disk only if it was modified since the most recent time that it was written to/fetched from the disk. * Reads the block from the disk to the buffer, and returns the address of the block in main memory to requester.
149
What are common buffer-replacement policies?
* Most operating systems replace with the least recently used (LRU) strategy * Idea behind LRU is to use past pattern of block references as a predictor of future references * However, queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user’s query to predict future references * LRU can be a bad strategy for certain access patterns involving repeated scans of data FIFO → Removes oldest pages. MRU → Prefers removing most recently used pages.
150
What is an index in databases?
- Speeds up searches by mapping key values to records. - Types: Primary, Secondary, Dense, Sparse, Multi-level.
151
What is the difference between primary and secondary indexing?
- Primary index → Built on primary keys, sorted order. - Secondary index → Built on non-key attributes for faster searches.
152
What is the difference between dense and sparse indexing?
- Dense index → Stores an index entry for every record. - Sparse index → Stores entries for selected records, reducing space usage.
153
How does multi-level indexing improve search efficiency?
- Creates hierarchical index levels. - Higher levels store pointers to lower index levels, reducing search time.
154
What is a B+ Tree index in databases?
- A balanced multi-level index structure. - Leaf nodes contain actual data for fast retrieval. * Advantage: – automatically reorganises itself with small, local changes * Minor Disadvantage: – insertions and deletions are costlier – space overhead * Advantages outweigh disadvantages: – B+ trees are used extensively * Structure: – balanced tree – multi-level index
155
How is deletion handled in sparse indices?
If a key is deleted, adjacent entries must be updated to maintain search efficiency.
156
What are key properties of B+ Trees?
- Balanced structure ensures efficient searching. - Leaf nodes store actual data while non-leaf nodes store pointers. - Supports efficient range queries.
157
How does lookup work in indexing?
1. Start with the root node 1a. Examine the node for the smallest search-key value Kj such that k < Kj 1b. If such a value exists, follow pointer Pj to the child node 1c. Otherwise Km-1 ≤ k, where m is the number of pointers in the node. Follow Pm to the child node 2. Repeat the 3 sub-steps above until a leaf node is found 3. If for some i, key Ki = k * Follow pointer Pi to the desired record or bucket. 4. Otherwise no record with search-key value k exists
158
How does insertion in a B+-tree handle non-leaf node splitting?
- If a non-leaf node exceeds the max keys, it splits into two. - The middle key is promoted to the parent node. - Ensures the tree remains balanced for efficient search Steps: 1. Take the n search key values listed in order * n − 1 in node already, plus new one 2. Put the first n/2 in existing node 3. Put the rest in a new leaf node p 4. Insert p after existing node by updating the pointers between leaf nodes 5. Insert a search-key/pointer pair (k,p ) in the parent of the node being split * where k is the least key in p 6. If the parent is full, split it and propagate the split further up
159
How is write optimization achieved in database indexing?
Performance of B+-trees can be poor for write-intensive workloads * Inserting entries one-at-a-time into a B+-tree requires ³ 1 IO per entry – Inefficient for inserting large number of entries at a time or for write-intensive workloads * Efficient alternative 1: – Sort entries first – Insert in sorted order * Insertion will go to existing page (or cause a split) * Much improved IO performance, but most leaf nodes half full * Efficient alternative 2 (AKA Bottom-up B+-tree construction) – Sort entries first – Create tree layer-by-layer, starting with leaf level – Implemented as part of bulk-load utility by most database systems
160
How does bottom-up B+-tree construction work?
1) Sort the values 2) Create the linked list representing the leaf nodes 3) Create the next level of the tree – Build nodes left to right – Split overfull nodes using the normal method for splitting non-leaf nodes * Propagate up as necessary
161
What are write-optimized indices in databases?
- Optimized to handle high write throughput efficiently. - Techniques include: 1. LSM Trees (log-based indexing). 2. Buffer Trees (delayed updates for fewer disk writes).
162
What is an LSM Tree?
- Optimized for write-heavy workloads. - Data is first written to memory, then periodically merged to disk. - Improves write speed but requires efficient merging policies.
163
How do deletion and updates work in LSM Trees?
* Deletion handled by adding special “delete” entries – Lookups find both original entry and “delete” entry * Return only entries that do not have matching “delete” entry – When trees are merged, omit entries with matching “delete” entry * Update handled using delete of original entry and insert of updated entry
164
What are the benefits and drawbacks of LSM Trees
Back: ✅ Benefits: - Faster writes due to memory buffering. - Efficient bulk insertions. - Handles high update throughput. ❌ Drawbacks: - Merge overhead increases read latency. - Deletes remain until compaction occurs.
165
What are variants of LSM Trees?
* Stepped-merge index – Variant of LSM tree with multiple trees at each level – Reduces write cost compared to LSM tree – But queries are even more expensive
166
What are buffer trees?
- Delays updates in buffers before applying them to disk. - Reduces disk write overhead by grouping operations. - Efficient for batch inserts and deletes.
167
What are the benefits and drawbacks of Buffer Trees?
✅ Benefits: - Minimizes disk writes with batch processing. - Improves indexing for write-heavy workloads. ❌ Drawbacks: - Requires efficient buffer management. - Delays updates, making real-time queries harder.
168
What is hashing?
A bucket is a unit of storage containing one or more entries (a bucket is typically a disk block). – we obtain the bucket of an entry from its search-key value using a hash function * Hash function ℎ is a function from the set of all search-key values K to the set of all bucket addresses B. * Hash function is used to locate entries for access, insertion and deletion. * Entries with different search-key values may be mapped to the same bucket – Thus, the entire bucket must be searched sequentially to locate an entry. * In a hash index buckets store entries with pointers to records * In a hash file-organization buckets store records
169
What are hash functions?
Worst hash function maps all search-key values to the same bucket An ideal hash function is uniform An ideal hash function is pseudo-random Typical hash functions perform computation on the internal binary representation of the search-key
170
What is static hashing?
Hash function is static (mapping search key values to a fixed set of bucket addresses) * Effect of unsuitable set size (number of buckets): – too large: space is wasted, too many buckets – too small: too many keys map to a given bucket (overflow buckets are created) performance suffers Rule of thumb – choose bucket size to be 1 block – choose number of buckets to be nb = nr/fr – nr is (expected or actual) number of records – fr is number of records that fits in a bucket * Try to limit waste while reducing chance of overflow – choose number of buckets to be nb = (nr/fr)(1+d) – d is a ‘fudge’ factor e.g. 0.2 to reduce risk of overflow by allowing for 20% waste
171
What is bucket overflow?
* Bucket overflow can occur because of – Insufficient buckets – Skew in distribution of records because * multiple records have same search-key value * chosen hash function produces non-uniform distribution of key values * Overflow can be handled by using overflow buckets 12: – Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list
172
What are some Dynamic Hash Functions?
Extendable hashing: – Allows the database to grow and shrink by splitting and coalescing buckets, thus retaining space efficiency – Reorganisation performed one bucket at a time, so overhead is incremental
173
What is extendable hashing?
✅ Extendable Hashing is a dynamic hashing technique that grows and shrinks based on the number of entries, optimizing storage efficiency. Key Features: - Uses a directory of bucket pointers. - Binary representation of keys determines bucket placement. - Expands dynamically by doubling directory size when buckets overflow. - Prevents excessive collisions and minimizes reorganization costs. ✅ Advantages: - Efficient space utilization. - Fast lookup, insertion, and deletion. - Flexible structure adapts to data growth.
174
How to locate bucket containing search-key Kj?
1. Compute ℎ(Kj)) = X 2. Use the first i bits of X as a decimal index into bucket address table, and follow the pointer to appropriate bucket, then do a sequential search for the record
175
Hash Insertion
* To insert record with search key value Km – lookup as before to find bucket, say j – if enough room in bucket, add record to bucket – if no room in the bucket it must be split and entries redistributed between the original and new bucket
176
What is insertion case 1?
Case 1: ij < i: * More than one cell in bucket address table points to bucket j * Allocate new bucket z *Rehash all records in bucket j and take the first ij bits of their hash
177
What is insertion case 2?
Only one cell in bucket address table points to bucket j Re-try the insertion of the record from the start!
178
What is deletion in hashing?
* Lookup to find bucket * Delete bucket if now empty * Periodic reorganisation possible but not necessary benefits: Hash performance does not degrade with growth of file minimal space overhead rehashing cons: Extra level of indirection to find desired record Bucket address table may itself become very big (larger than memory)
179
What is query processing, and why does execution cost vary?
- Multiple ways to execute a query, each with different costs. - Disk transfer times significantly affect performance. - Optimizing execution reduces resource consumption. - Even single-use queries can benefit from optimization.
180
How does SQL allow multiple interpretations of a query?
- Different SQL queries may produce equivalent results. - Users may not write the most efficient query. - The system rewrites queries into more efficient equivalents. - This process is called Query Optimisation.
181
What are the three main steps in query processing?
1. Parser & Translator:- Checks syntax & table references. - Translates into relational algebra. 2. Optimiser:- Converts query into a more efficient form. - Creates a query execution plan. 3. Evaluation:- Executes the query evaluation plan. - Returns final results.
182
How does query optimisation work?
- Generates multiple execution plans for evaluation. - Converts queries into equivalent relational algebra expressions. - Different algorithms affect execution speed. - Estimates costs based on database statistics. - Chooses the lowest-cost execution plan.
183
How is relational algebra used in query optimisation?
- Queries are translated into relational algebra expressions. - Multiple equivalent expressions exist for the same query. - The system finds the most efficient execution order.
184
What statistical information helps in query optimisation?
- Number of tuples per table. - Size of tuples (storage space per row). - Number of distinct values per attribute. - Intermediate result size estimations. - Cost estimations of different execution algorithms.
185
How does the system choose the lowest-cost query execution plan?
- Estimates query execution cost for various plans. - Uses data dictionary statistics to make predictions. - Selects the plan with the least resource consumption.
186
What is the role of parsing and translation in query processing?
- Parsing → Validates SQL syntax and ensures query correctness. - Translation → Converts SQL query into relational algebra for optimization. - Ensures queries are interpretable by the database engine.
187
What does the parser do in SQL query processing?
- Checks for syntax errors in SQL queries. - Verifies referenced tables and attributes exist in the schema. - Generates a parse tree representing query structure.
188
Why is SQL translated into relational algebra?
- Relational algebra is structured for optimization. - Helps the system generate equivalent, efficient execution plans. - Converts SQL into basic operations (selection, projection, join, etc.)
189
What is a query parse tree, and why is it important?
Hierarchical representation of query structure. Helps optimizer understand dependencies among operations. SELECT ├── FROM (Students) ├── WHERE (Age > 18)
190
How does parsing & translation fit into query execution?
- Parse SQL query → Validate syntax & semantics. - Translate to relational algebra → Express query using logical operations. - Pass expression to query optimizer for efficiency evaluation.
191
What are the ACID properties in database transactions?
- Atomicity → The transaction is all-or-nothing; either all operations succeed or none do. - Consistency → Transactions ensure the database remains valid and follows integrity rules. - Isolation → Each transaction executes as if it were alone, unaffected by other concurrent transactions. - Durability → Once committed, changes persist, even after system failures.
192
What are the different transaction states in a database?
- Active → Initial state; transaction is executing. - Partially Committed → Final statement has executed but not yet fully committed. - Failed → Execution cannot proceed due to an error. - Aborted → Transaction rolled back, database restored. Restart if no logical error. Kill and report failure if needed. - Committed → Transaction successfully completed and changes are permanent.
193
Why is concurrent execution important in large databases?
- Multiple users need simultaneous access. - Database must maintain isolation illusion (each user feels sole access). - Improves system performance while ensuring consistency. - Requires concurrency control to prevent interference
194
What is a schedule in database transactions?
- Defines the order of transaction execution. - Includes all instructions for a set of transactions. - Preserves instruction order within each transaction. - Helps enforce isolation without relying on specific database designs.
195
What is a serial schedule in transaction processing?
- Transactions are executed one after another (no interleaving). - Ensures consistent results without transient inconsistencies. - Example: Transfer £50 from Account A to Account B. Transfer 10% of Account A's funds to Account B.
196
How does transaction order impact results in a serial schedule?
- Different serial schedules may yield different results. - Example: Start balance: £1000 (A) & £2000 (B). Schedule 1 → Transfer first, then percentage: Result: £855 (A), £2145 (B). - Schedule 2 → Percentage first, then transfer: Result: £850 (A), £2150 (B). - Total balance remains the same in both cases.
197
What is a non-serial schedule in databases?
- Transactions execute concurrently with interleaved instructions. - Some non-serial schedules are correct, meaning they maintain consistency. - Others may produce unexpected results if control mechanisms aren’t applied.
198
What makes a transaction schedule incorrect?
- Each transaction individually maintains consistency, but: Write operations may be overwritten by other transactions. Sum of values may not be preserved correctly. - Temporary inconsistencies can occur during execution. - To be correct, the schedule must be: Serial (executed one-by-one). Or equivalent to a serial schedule in net effect.
199
How do we determine if two schedules are equivalent?
- Internal calculations are ignored—only read/write operations matter. - If two consecutive operations refer to different data items, they can swap without affecting results. - If they refer to the same data item, their execution order may impact correctness.
200
How do transactions behave when operating on the same data item?
- Case 1: Both transactions read → Order does not matter. - Case 2: One reads, one writes → Order matters (read must occur before the write). - Case 3: One writes, one reads → Order matters (read depends on the previous write). - Case 4: Both transactions write → Order matters (the first value written is lost).
201
What is conflict equivalence between schedules?
- Two schedules are conflict equivalent if one can be transformed into the other by swapping non-conflicting operations. - Conflicting operations involve: Reads/Writes on the same data item. Writes that overwrite values. - A schedule is conflict serialisable if: It is equivalent to some serial schedule, meaning it preserves isolation
202
How do you test for conflict serialisability?
- Construct a directed precedence graph: Vertices → Represent transactions. Edges → Add an edge T₁ → T₂ if: 1. T₁ writes before T₂ reads (read-after-write). 2. T₁ reads before T₂ writes (write-after-read). 3. T₁ writes before T₂ writes (write-after-write). - Significance of edges: If T₁ → T₂ exists, then T₁ must execute before T₂ in any equivalent serial schedule. - Conflict serialisability: If the precedence graph has no cycles, the schedule is conflict serialisable.
203
How do you interpret a precedence graph in transaction scheduling?
- Depends on cycles: No cycles → Conflict serialisable (can be transformed into a serial schedule). Cycles → Not conflict serialisable (some operations interfere). - Test for conflict serialisability: Construct a precedence graph. Check for cycles.
204
Why is conflict serialisability important but not always sufficient?
- Ensures that schedules are logically equivalent to serial execution. - However, it does not fully guarantee isolation at the application level.
205
What is a lock-based protocol, and why is it needed?
- Locks control concurrent access to data items. - A transaction must acquire a lock before modifying an item. - Two types of locks: Exclusive (x-lock) → Allows reading and writing. Shared (s-lock) → Allows only reading. - Managed by a concurrency-control manager.
206
What are common problems in lock-based concurrency control?
❌ Deadlock → Transactions wait indefinitely for each other’s locks. - Solution: Rollback one transaction and release locks. ❌ Starvation → A transaction is repeatedly rolled back due to deadlocks. - Solution: Use timestamps or priority scheduling.
207
What is the Two-Phase Locking (2PL) protocol, and how does it ensure serialisability?
- Ensures conflict-serialisable schedules by dividing execution into two phases: 1. Growing Phase: Transactions can acquire locks but cannot release them. 2. Shrinking Phase: Transactions can release locks but cannot acquire new ones. - Guarantees serialisability, but deadlocks can still occur.
208
How can deadlocks be detected in Two-Phase Locking (2PL)?
- Use a Wait-For Graph (WFG): Vertices → Represent active transactions. Edges → Directed edges T₁ → T₂ indicate T₁ is waiting for T₂'s lock. - Deadlock exists if the WFG has cycles.
209
How is deadlock recovery handled in transactions?
- Rollback one transaction (victim) to break deadlock. - Choose victim with minimum cost (less data loss). - Rollback strategies: 1. Total rollback → Abort transaction and restart. 2. Partial rollback → Rollback only enough to resolve deadlock. - Avoid starvation by tracking rollback counts.
210
What are common deadlock prevention techniques?
- Transactions use timestamps to determine wait behavior. - Strategies include: 1. Wait-Die: Older transactions wait, younger ones are rolled back. 2. Wound-Wait: Older transactions force rollback of younger ones. 3. Timeout-Based: Transactions wait only for a limited time, then rollback.
211
Why is failure handling important in databases?
- Database contents are valuable and must be protected. - Computer systems can fail, so consistency must be ensured. - Two key concerns: Data safety → Prevent loss or corruption. Data consistency → Ensure integrity even after failures.
212
How does a database recover from failures?
- Restores database to a consistent state. - Ensures atomicity: No updates from uncommitted transactions. - Ensures durability: Updates from committed transactions persist even after failure.
213
What are common types of database failures?
- Transaction Failure: Logical errors (incorrect input, out of memory). System errors (deadlock, scheduler failures). - System Crash: Power failure, hardware issues, OS errors. - Disk Failure: Corrupt storage, disk malfunction.
214
Why does atomicity matter during recovery?
- Partial updates may leave the database inconsistent. - Goal → Ensure all updates are applied or none at all. - Solution → Store transaction changes on stable storage before modifying the database.
215
What is log-based recovery, and how does it work?
- Logs track database modifications for recovery. - Transactions write logs before executing changes. - Log format: - T → Transaction ID. X → Data item. Old & new values → Before and after modification.
216
How does deferred database modification work
- Modifications are recorded in logs but only applied after commit. - No changes are made to the database until a transaction is fully committed. - If a crash occurs: Redo operations apply changes if and are in the log.
217
How does immediate database modification work
- Database is updated immediately, before transaction commits. - Undo required in case of failure. - Logs must contain both old and new values. - Recovery steps: Redo committed transactions. Undo uncommitted transactions.
218
What is the purpose of checkpoints in log-based recovery?
- Prevents unbounded log growth, reducing recovery time. - Periodically flushes all log records to disk. - Writes a checkpoint record to the log. - Removes completed transactions from the log. - During recovery: Only considers transactions active at the last checkpoint.
219
How does log record buffering optimize storage performance
- Buffers log records in memory instead of writing to storage immediately. - Logs are flushed when the buffer fills up or when log force is triggered. - Log force: Ensures all logs for a transaction (including commit log) are stored. - Reduces I/O costs by batching write operations.
220
What are the key rules for buffered log records
- Logs must be written to stable storage in order of creation. - Transactions commit only after the commit log record is stored. - Before database blocks are written, all related log records must be flushed first. This is called the Write-Ahead Logging (WAL) rule.
221
How can the order of joins affect query performance?
- Swapping join order minimizes intermediate result size, improving efficiency. - Uses system catalog statistics, including: Number of tuples in each relation. Tuple sizes. Number of distinct values per attribute. - Optimized strategy:- Join smaller relations first to reduce intermediate result size.
222
What are key equivalence rules for selection and projection operations?
- Selections are commutative → Order doesn't affect results. - Selections distribute over joins → Applies filters earlier to reduce intermediate results. - Only the last projection in a sequence is needed → Avoids redundant computations. - Projections distribute over joins → Apply projection before joining if possible.
223
How do equivalence rules simplify join operations?
- Natural and theta joins are commutative:- A ⋈ B ≡ B ⋈ A - Joins are associative:- (A ⋈ B) ⋈ C ≡ A ⋈ (B ⋈ C) - Theta joins are associative when constraints affect only certain relations.
224
What factors influence query result size estimation?
- Number of tuples per relation (stored in system catalog). - Distribution of values (e.g., how many customers live in a specific city). - Distinct attribute values in a relation (helps predict selection and join results).
225
How do we estimate result size for selection operations?
- Formula: num_tuples_in_result ≈ (num_tuples_in_relation) / (num_distinct_values_for_attribute) - Assumes uniform distribution of values.
226
How do we estimate the size of a Cartesian product?
- Total number of tuples: |R × S| = |R| * |S| - Bytes per tuple: Tuple size = Size(R) + Size(S) - Total result size: Total size = (|R| * |S|) * (Size(R) + Size(S))
227
How do we estimate natural join size?
- If common attribute is a key in one relation: Each tuple in R₂ joins with at most one tuple in R₁ → Result size is R₂. - If common attribute is not a key:- Assume uniform distribution: Result size ≈ (|R₁| * |R₂|) / (num_distinct_values_of_common_attribute)
228
What factors contribute to query execution cost?
- Disk access cost dominates overall performance. - Query cost is measured using: Number of seeks × average seek cost. Number of blocks read × average read cost. Number of blocks written × average write cost. - Index height affects lookup cost (e.g., B+ Tree height).
229
How does index availability affect equality selection cost?
- No index: Scan entire relation → Total cost = Num_blocks + Seek_time. - Primary (clustering) index on key: Retrieve one record efficiently. Cost = Index height + 1 seek + 1 transfer. - Primary (clustering) index on non-key: Retrieve multiple records on consecutive blocks. Cost = Index height + seeks + transfers for blocks with matching records. - Secondary index on key or non-key: May require multiple seeks and reads, increasing cost.
230
How is query cost affected when using comparison operations?
- No index: Scan full dataset → Expensive. - Primary index:- A >= v → Use index to find first tuple ≥ v, then scan sequentially. A ≤ v → Scan sequentially from start. - Secondary index: A >= v → Scan index for values ≥ v and retrieve pointers. A ≤ v → Scan index for values ≤ v and retrieve pointers. Records may be spread across multiple blocks, increasing cost