databases Flashcards

Question

What is a relation r?

Answer 1

- a subset of D1 x D2 x ... x Dn - x is the cartesian product

Answer 2

- The relation instance (current values of a relation) is often displayed as a table. - Each element t of r is a tuple, represented as a row in the table.

Answer 3

- Superkey (K): A set of attributes sufficient to uniquely identify a tuple in relation r(R). - Candidate Key: A minimal superkey (no unnecessary attributes). - Primary Key: One candidate key is chosen as the main identifier. - Foreign Key Constraint: A value in one relation must appear in another.

Answer 4

() - Primary keys are underlined - Foreign keys are followed by an asterisk - Attributes are optionally followed by domain information

Answer 5

- The Select (σ) operation retrieves rows from a relation that satisfy a given condition. - Notation: σ_condition(Relation) - Purpose: Filters tuples based on specified criteria, keeping only relevant data. - Operates horizontally (filters rows, not columns).

Answer 6

- Π (Project): Returns all records but only specified fields.- SQL Equivalent: SELECT column_name FROM table_name - σ (Select): Retrieves all records that satisfy a condition.- SQL Equivalent: SELECT * FROM table_name WHERE condition - × (Cartesian Product): Combines two tables, producing all possible record combinations. - ⨝ (Natural Join): A Cartesian product that selects records with matching values in same-named fields, eliminating redundant columns.

Answer 7

- Selection (σ) → Retrieves rows that match a condition. - Projection (Π) → Selects specific columns from a relation. - Renaming (ρ) → Renames attributes or relations for clarity. - Difference (−) → Returns tuples in one relation but not in another. - Cartesian Product (×) → Combines all possible tuples from two relations. - Intersection (∩) → Retrieves tuples common to both relations.

Answer 8

A strong entity set reduces to a schema with the same attributes – module ( module_code, title, credits )

Answer 9

A weak entity set reduces to a schema that includes foreign key column(s) for the primary key of the identifying entity set – section ( module_code*, name, semester, year, description )

Answer 10

A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set. – Also those primary keys will be foreign keys

Answer 11

Many-to-one and one-to-many relationship sets can be represented by adding a foreign key attribute to the side whose entities can only participate once in the relationship set -> “many side” – If relationship is partial on that side this could result in null values

Answer 12

- does not require a schema

Answer 13

For one-to-one relationship sets, either side can be chosen for the foreign key attribute – Extra attribute can be added to the schema for either of the two entity sets (but not both)

Answer 14

Composite attributes are flattened out by creating a separate attribute for each component attribute

Answer 15

A multivalued attribute M of an entity set E is represented by a separate schema EM – Schema EM has attributes corresponding Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM

Answer 16

* Derived attributes should be ignored in the relational schemata

Answer 17

* If inheritance is total, no schema for the superclass (person) is required. Can be defined as a “view” relation containing union of subclass relations * Form a schema for each lower level entity set with all local and inherited attributes * But if inheritance is total and the superclass has a relationship with other entity set, we will need: – Form a schema for the superclass – Form a schema for each subclass, with local attributes and a foreign key to the superclass * Also, if inheritance is total and the superclass has a multivalued attribute, and that multivalued attribute needs a schema, then we will need: – Form a schema for the superclass – Form a schema for each subclass, with local attributes and a foreign key to the superclass * Also, if inheritance is total and the superclass has a relationship with weak entity set, then we will need: – Form a schema for the superclass – Form a schema for each subclass, with local attributes and a foreign key to the superclass

Answer 18

- A functional dependency (FD) describes a relationship between attributes. - Notation: X → Y (If X is known, Y is uniquely determined).

Answer 19

- A partial dependency exists when a non-prime attribute is functionally dependent on part of a composite key, instead of the entire key. - Occurs in 2NF (eliminated in 3NF). Example: In relation (Student_ID, Course_ID → Course_Name, Student_Name), if Student_Name depends only on Student_ID, it’s a partial dependency.

Answer 20

Back: - A functional dependency X → Y is trivial if Y is a subset of X. - Example: {A, B} → A is trivial because A is part of {A, B}. - Trivial FDs always hold and do not provide additional constraints.

Answer 21

- A full functional dependency occurs when an attribute is dependent on the entire primary key, not just part of it. - Example: In relation (Student_ID, Course_ID → Grade), Grade depends on both attributes together, making it a full dependency.

Answer 22

- A transitive dependency occurs when an attribute depends on another non-key attribute, instead of the primary key directly. - Occurs in 3NF (eliminated in BCNF). - Example: In Student(ID → Dept_ID, Dept_Name), if Dept_Name depends on Dept_ID, which in turn depends on Student_ID, it’s a transitive dependency.

Answer 23

- A candidate key is a minimal set of attributes that uniquely identifies tuples. - Steps: 1. Find all functional dependencies. 2. Determine which attributes uniquely identify all other attributes. 3. Ensure the key is minimal (removing any unnecessary attributes).

Answer 24

- Superkey: A set of attributes that uniquely identifies a tuple. - Candidate Key: A minimal superkey. - Primary Key: A chosen candidate key. - Functional Dependency: A rule where an attribute depends on another (e.g., ID → Name).

Answer 25

- The closure of an attribute set X (denoted X⁺) is the set of all attributes that can be functionally determined from X. - Used to find keys and verify dependencies. - Steps: 1. Start with X. 2. Apply FD rules to expand the set. 3. Continue until no more attributes can be added.

Answer 26

- Reflexivity: If Y ⊆ X, then X → Y (trivial FD). - Augmentation: If X → Y, then XZ → YZ (adding attributes preserves dependency). - Transitivity: If X → Y and Y → Z, then X → Z. Used to infer all valid functional dependencies in a relation.

Answer 27

- Union: If X → Y and X → Z, then X → YZ. - Decomposition: If X → YZ, then X → Y and X → Z. - Pseudo Transitivity: If X → Y and WY → Z, then XW → Z.

Answer 28

- Closure of attribute sets helps determine all possible attributes that can be inferred. - Process:- Start with given attributes. - Apply all known functional dependencies. - Continue until no further attributes can be derived. Useful for finding candidate keys in database design

Answer 29

- F⁺ (Closure of FD set): The set of all functional dependencies derivable from F using inference rules. - α⁺ (Attribute Closure): The set of all attributes functionally implied by α. - Use case:- F⁺ helps find all implied FDs in a relation. - α⁺ is used to check candidate keys for attributes.

Answer 30

- Eliminates data redundancy and minimizes anomalies. - Organizes data efficiently while preserving relationships. - Ensures data integrity and simplifies queries.

Answer 31

- Identify repeating groups → Apply 1NF. - Eliminate partial dependencies → Apply 2NF. - Remove transitive dependencies → Apply 3NF. - Handle remaining anomalies → Apply BCNF or higher.

Answer 32

- Lossless decomposition ensures no data is lost when breaking a relation into smaller tables. - The original relation can be perfectly reconstructed from the decomposed tables.

Answer 33

- A decomposition that allows the original relation to be recovered via joins without introducing spurious tuples. - Ensures data integrity when breaking tables apart.

Answer 34

- A lossy decomposition leads to data loss or the inability to reconstruct the original relation. - Joining decomposed tables may introduce incorrect or missing data.

Answer 35

- Eliminates repeating groups and ensures that all attributes contain atomic values. - Each column has single, indivisible values (no lists or arrays).

Answer 36

- Removes partial dependencies (where non-key attributes depend on part of a composite key). - Requires the table to be in 1NF. - Non-key attributes must depend entirely on the primary key.

Answer 37

- Eliminates transitive dependencies (non-key attributes depending on other non-key attributes). - Requires the table to be in 2NF. - No attribute should depend on anything other than the primary key.

Answer 38

- A stronger version of 3NF, ensuring every determinant is a candidate key. - Requires the table to be in 3NF. - Fixes anomalies that 3NF may not resolve.

Answer 39

- Convert to 1NF → Remove repeating groups, ensure atomicity. - Convert to 2NF → Remove partial dependencies. - Convert to 3NF → Eliminate transitive dependencies. - Convert to BCNF → Ensure every determinant is a candidate key.

Answer 40

- Define data types allowed for each column. - Common types:- INTEGER, FLOAT, DECIMAL → Numeric values. - CHAR, VARCHAR, TEXT → Strings. - DATE, TIME, TIMESTAMP → Date/time values. - BOOLEAN → True/False values.

Answer 41

CREATE TABLE defines a new table structure CREATE TABLE table_name ( column1 datatype constraints, column2 datatype constraints ); CREATE TABLE Students ( ID INT PRIMARY KEY, Name VARCHAR(100), Age INT );

Answer 42

ALTER TABLE modifies an existing table structure ALTER TABLE Students ADD COLUMN Address VARCHAR(255);

Answer 43

DROP TABLE permanently removes a table from the database. DROP TABLE table_name; DROP TABLE Students;

Answer 44

Occurs when no join condition is specified Returns all possible combinations of rows from two tables. SELECT * FROM TableA, TableB;

Answer 45

- Combines data from multiple tables. - Types:- INNER JOIN → Matches rows from both tables. - LEFT JOIN → Includes all rows from the left table, matching from the right. - RIGHT JOIN → Includes all rows from the right table, matching from the left. - FULL JOIN → Includes all rows from both tables.

Answer 46

Changes the name of a column or table. ALTER TABLE Students RENAME TO Learners;

Answer 47

Common string functions: 1. TRIM() → Removes spaces. 2. LENGTH() → Finds string length. 3. UPPER() / LOWER() → Changes case. 4.CONCAT() → Combines strings.

Answer 48

ORDER BY sorts results in ascending (ASC) or descending (DESC) order. SELECT Name, Age FROM Students ORDER BY Age DESC;

Answer 49

- NULL represents missing or unknown data. - NULL is not the same as zero or empty string.

Answer 50

SQL uses TRUE, FALSE, and UNKNOWN (NULL) logic. Comparisons involving NULL return UNKNOWN. SELECT * FROM Students WHERE Age = NULL; -- Incorrect! SELECT * FROM Students WHERE Age IS NULL;

Answer 51

Aggregate functions perform calculations on a set of values and return a single result. Common functions: COUNT() → Returns the number of rows. SUM() → Adds numeric values. AVG() → Computes average of values. MAX() / MIN() → Finds the highest or lowest value. SELECT AVG(Salary) FROM Employees;

Answer 52

GROUP BY is used with aggregate functions to group data into categories. SELECT column, aggregate_function(column) FROM table GROUP BY column; SELECT Department, COUNT(*) FROM Employees GROUP BY Department;

Answer 53

HAVING filters groups based on aggregate conditions. Works with GROUP BY, unlike WHERE, which filters rows SELECT Department, AVG(Salary) FROM Employees GROUP BY Department HAVING AVG(Salary) > 50000;

Answer 54

- SQL modification statements allow updating, inserting, and deleting data. - INSERT INTO → Adds new records. - UPDATE → Modifies existing records. - DELETE → Removes records.

Answer 55

INSERT INTO adds new rows to a table. INSERT INTO table_name (column1, column2) VALUES (value1, value2); INSERT INTO Students (ID, Name) VALUES (101, 'Alice');

Answer 56

UPDATE modifies existing values in a table UPDATE table_name SET column1 = value1 WHERE condition; UPDATE Employees SET Salary = Salary + 5000 WHERE Department = 'HR';

Answer 57

DELETE removes rows from a table. DELETE FROM table_name WHERE condition; DELETE FROM Students WHERE ID = 101;

Answer 58

CASE allows conditional logic within an SQL statement. Used in UPDATE statements for selective modifications UPDATE Employees SET Salary = CASE WHEN Department = 'HR' THEN Salary + 5000 WHEN Department = 'IT' THEN Salary + 3000 ELSE Salary END;

Answer 59

Rules that maintain data accuracy and consistency. NOT NULL → Prevents empty values. UNIQUE → Ensures column values are distinct. CHECK → Enforces specific conditions FOREIGN KEY → Ensures referential integrity between tables.

Answer 60

Constraints applied to a single table: 1. NOT NULL → Column cannot have NULL values. CREATE TABLE Students ( ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL ); 2. UNIQUE → Ensures distinct values. CREATE TABLE Users ( Email VARCHAR(255) UNIQUE ); 3. CHECK → Restricts column values based on a condition. CREATE TABLE Employees ( Salary INT CHECK (Salary > 30000) );

Answer 61

Ensures data consistency between related tables. A foreign key references a primary key in another table. CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(ID) );

Answer 62

Constraints are added during table creation. CREATE TABLE Students ( ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL, Age INT CHECK (Age >= 18) );

Answer 63

Occurs when a foreign key references a non-existent primary key. INSERT INTO Orders (OrderID, CustomerID) VALUES (101, 999); -- Fails if CustomerID 999 doesn't exist in Customers table. Solutions: 1. Ensure referenced value exists. 2. Use ON DELETE CASCADE to automatically remove dependent rows.

Answer 64

Ensures complex conditions hold globally in the database. Constraints that apply across multiple tables. CREATE ASSERTION CheckSalary CHECK (NOT EXISTS ( SELECT * FROM Employees WHERE Salary < 30000 ));

Answer 65

Assigns names to constraints for easy management. CREATE TABLE Students ( ID INT CONSTRAINT ID_PK PRIMARY KEY, Age INT CONSTRAINT Age_Check CHECK (Age >= 18) );

Answer 66

A subquery inside another SELECT, FROM, or WHERE clause. Allows advanced filtering and calculations SELECT Name FROM Students WHERE Age > (SELECT AVG(Age) FROM Students);

Answer 67

IN checks if a value exists within a set. NOT IN excludes specific values SELECT Name FROM Employees WHERE Department IN ('HR', 'IT');

Answer 68

SOME → Returns TRUE if at least one comparison is true. ALL → Returns TRUE if all comparisons are true. SELECT Name FROM Employees WHERE Salary > ALL ( SELECT Salary FROM Interns );

Answer 69

Use EXISTS or NOT EXISTS to check for data presence. SELECT 'No Employees Found' WHERE NOT EXISTS (SELECT * FROM Employees);

Answer 70

Variables that link an inner subquery to the outer query. SELECT E.Name FROM Employees E WHERE EXISTS ( SELECT * FROM Projects P WHERE P.Manager_ID = E.ID );

Answer 71

Use GROUP BY and HAVING COUNT() > 1 to find duplicates SELECT Name, COUNT(*) FROM Students GROUP BY Name HAVING COUNT(*) > 1;

Answer 72

Subqueries used as temporary tables for further selection. SELECT AVG(Age) FROM ( SELECT Age FROM Students WHERE Gender = 'Female' ) AS FemaleStudents;

Answer 73

A subquery that returns a single value. Used in SELECT, WHERE, and HAVING clauses. SELECT Name, (SELECT MAX(Salary) FROM Employees) AS MaxSalary FROM Employees;

Answer 74

- Combines data from multiple tables based on a common attribute. - Used to avoid duplication and ensure consistency.

Answer 75

Specifies how tables are related when joining. SELECT Students.Name, Courses.CourseName FROM Students JOIN Enrollments ON Students.ID = Enrollments.Student_ID;

Answer 76

- INNER JOIN → Matches records in both tables. - OUTER JOIN → Includes unmatched records. - LEFT JOIN → All records from the left table + matches from the right. - RIGHT JOIN → All records from the right table + matches from the left. - FULL JOIN → All records from both tables, filling in missing values. Joins are operations that combine related tables based on shared attributes.

Answer 77

- Joined relations combine data logically from separate tables. - Helps maintain data integrity and efficiency in queries.

Answer 78

UDTs allow customization of data types for specific use cases. Helps enforce structured data storage. CREATE TYPE PhoneNumber AS VARCHAR(15);

Answer 79

- DATE → Stores date values (YYYY-MM-DD). - TIME → Stores time values (HH:MM:SS). - TIMESTAMP → Stores both date and time. - INTERVAL → Represents time differences.

Answer 80

Domains restrict valid values for attributes. CREATE DOMAIN AgeDomain AS INT CHECK (VALUE >= 18);

Answer 81

- LOBs store large binary/text data, like images or documents. - Types: 1. BLOB → Binary Large Object. 2. CLOB → Character Large Object. - Used for media storage in databases.

Answer 82

A virtual table based on an SQL query. Does not store data, just a saved query definition CREATE VIEW EmployeeView AS SELECT Name, Salary FROM Employees;

Answer 83

- Views can be updated if they follow simple rules: 1. Based on a single table. 2. No aggregations or derived columns. UPDATE EmployeeView SET Salary = 50000 WHERE Name = 'Alice';

Answer 84

A saved query result that stores actual data. Improves performance for frequently accessed data. CREATE MATERIALIZED VIEW SalesSummary AS SELECT Category, SUM(Revenue) FROM Sales GROUP BY Category;

Answer 85

Speeds up search queries by organizing data efficiently. Types: Clustered, Non-clustered, Unique, Full-text. CREATE INDEX idx_employee_name ON Employees(Name);

Answer 86

- Controls database access and privileges. - Users are granted specific permissions (SELECT, INSERT, DELETE).

Answer 87

DB admin (DBA) or users with GRANT OPTION can grant privileges. GRANT SELECT ON Employees TO User1 WITH GRANT OPTION;

Answer 88

Grants permission to users. GRANT INSERT, UPDATE ON Orders TO User2;

Answer 89

Removes previously granted permissions. REVOKE SELECT ON Employees FROM User1;

Answer 90

Functions return a single value. Procedures perform actions without returning values. CREATE FUNCTION GetTotalSales() RETURNS INT AS BEGIN RETURN (SELECT SUM(SalesAmount) FROM Sales); END;

Answer 91

Automatic execution of SQL code when an event occurs. Types: BEFORE, AFTER, INSTEAD OF triggers. CREATE TRIGGER UpdateInventory AFTER INSERT ON Orders FOR EACH ROW BEGIN UPDATE Inventory SET Stock = Stock - NEW.Quantity WHERE ProductID = NEW.ProductID; END;

Answer 92

- Similar to standard SQL triggers, but MariaDB lacks INSTEAD OF triggers. - Uses BEFORE and AFTER triggers for automation.

Answer 93

- Common trigger events:- INSERT → Fires when a new row is added. - UPDATE → Fires when a row is changed. - DELETE → Fires when a row is removed.

Answer 94

- When performance is impacted (triggers slow bulk operations). - When business logic should be managed at the application level. - Complex cascading triggers can cause unintended effects.

Answer 95

A security vulnerability that allows attackers to manipulate SQL queries. Occurs when user input is improperly sanitized in SQL commands. Prevention: Use prepared statements and proper input validation. SELECT * FROM Users WHERE Username = '$input';

Answer 96

Separates SQL syntax from user input, preventing manipulation. $stmt = $conn->prepare("SELECT * FROM Users WHERE Username = ?"); $stmt->bind_param("s", $username); $stmt->execute();

Answer 97

- Metadata stores information about tables, columns, indexes, and relationships. - Helps in query optimization and schema management. Example using JDBC: DatabaseMetaData metaData = connection.getMetaData(); System.out.println(metaData.getDatabaseProductName());

Answer 98

Java Database Connectivity (JDBC) enables Java applications to interact with databases. Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password");

Answer 99

Uses commit() and rollback() for managing transactions. conn.setAutoCommit(false); Statement stmt = conn.createStatement(); stmt.executeUpdate("UPDATE Accounts SET balance = balance - 100 WHERE ID = 1"); conn.commit();

Answer 100

import mariadb conn = mariadb.connect(user="user", password="password", database="testdb")

Answer 101

A server-side scripting language used for web development. Embedded within HTML.

Answer 102

Uses echo or print statements. echo "

Welcome to My Website

";

Answer 103

Uses $ before variable names. $name = "Alice"; echo "Hello, " . $name;

Answer 104

Uses define() function. define("SITE_NAME", "MyWebsite"); echo SITE_NAME;

Answer 105

* Define required values: – define ( 'DB_USER', 'rr71' ); – define ( 'DB_PASSWORD', 'abc32d0' ); – define ( 'DB_HOST', 'localhost' ); – define ( 'DB_NAME', 'rr71_db' ); * Connecting to MariaDB server: – $dbc = mysqli_connect( DB_HOST, DB_USER, DB_PASSWORD, DB_NAME ); * Changing database: – mysqli_select_db( $dbc, 'rr71_other_db' );

Answer 106

Use try-catch for exception handling. try { $conn = new mysqli("localhost", "user", "password", "database"); } catch (Exception $e) { echo "Connection failed: " . $e->getMessage(); } * mysqli_error ( $dbc ) – Returns last error returned by MariaDB * mysqli_errno ( $dbc ) – Returns error number corresponding to an error returned by MariaDB * @ – If prefixed to a function name, suppresses any error messages or warnings that might be invoked * die – Terminates execution of script and sends parameter to client browser

Answer 107

Using mysqli_query(). $result = $conn->query("SELECT * FROM Users");

Answer 108

* For queries which return results $result = mysqli_query( $dbc, $query_str); while ($row = mysqli_fetch_array($result)) { // process $row } * Optional extra parameter to indicate indexing – MYSQL_NUM (for example, $row[1] ) * equivalent to using mysqli_fetch_row($result) instead – MYSQL_ASSOC (for example, $row[id] ) * equivalent to using mysqli_fetch_assoc($result) instead – MYSQL_BOTH (can use both forms)

Answer 109

Uses MySQL or PostgreSQL modules. const mysql = require("mysql"); const conn = mysql.createConnection({ host: "localhost", user: "root", password: "", database: "testdb" }); conn.connect();

Answer 110

Uses the .query() function. conn.query("SELECT * FROM Users", (err, results) => { if (err) throw err; console.log(results); });

Answer 111

Uses prepared statements to prevent injection. conn.query("SELECT * FROM Users WHERE Username = ?", [username], (err, results) => { if (err) throw err; console.log(results); });

Answer 112

* Primary storage – Fast, Expensive, Volatile – Holds data to be operated on * Secondary storage – Moderately fast, Cheaper, Non-volatile – Holds data not currently being operated on * Tertiary storage – Slow, Cheap, Non-volatile, Durable – Generally holds archival data

Answer 113

- Small units of storage used for reading/writing data. - Data is stored and transferred in fixed-size blocks. - Optimized for disk I/O operations.

Answer 114

Database is stored as a collection of files. – Each file is a sequence of records. * File may or may not correspond to a single database table – Each record is a sequence of fields. * Files are maintained by the underlying OS – stored permanently on disks. * Files are mapped onto disk blocks – Can be specified when a database instance is created * Typically, 4 to 8 kilobytes by default

Answer 115

- Every record has a fixed size, simplifying storage management. - Used for structured, predictable data formats. For example: * Consider a file of instructor records defined in pseudocode as follows: type instructor = record ID varchar (5); name varchar (20); dept_name varchar (20); salary numeric (8,2); end * Let each character use 1 byte and numeric (8,2) 8 bytes, then max no of bytes for each record is 53 bytes. – Block size unlikely to be a multiple of 53

Answer 116

– Record size is fixed – Each file has records of one particular type only – Different files are used for different relations – Each record is entirely contained in a single block – No record is larger than a block

Answer 117

- Deleted records leave gaps in storage. - Solutions:- Use a special marker for deleted records. - Maintain a free list for reusing space. Steps: 1. Compact records i.e. shift records left 2. Move the last record into the gap 3. Do not move the records but fill the gap on next insertion

Answer 118

- A list tracking unused or deleted record slots. - Helps reallocate space efficiently without fragmentation. * Store the address of the first deleted record in the file header * Use this first record to store the address of the second deleted record – Effectively “hide” this address in unused space – No overhead * Can think of these stored addresses as “pointers” * More space-efficient – Extra information stored in unused space

Answer 119

- Records do not have a fixed size, leading to flexible storage. - Two structures:- Single-record structure → Each record has length metadata. - Block structure → Records are grouped in blocks with pointers.

Answer 120

- Sequential organization → Sorted records for fast retrieval: 1. Records are stored in sorted order. 2. Efficient for batch processing, but slow updates. - Heap organization → Unordered storage, fast inserts: 1. Records are stored in any order. 2. Fast inserts, but slower searches. - Clustered organization → Groups related records together: 1. Stores multiple related tables together for efficient access. 2. Reduces disk I/O when retrieving linked data. - Hashing: a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed

Answer 121

- a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed

Answer 122

- Stores metadata about database objects. - Includes information on tables, columns, indices, and constraints.

Answer 123

* A database file is partitioned into fixed-length storage units called blocks. * Blocks are units of both storage allocation and data transfer. * Database systems seek to minimise the number of block transfers between the disk and memory. – We can reduce the number of disk accesses by keeping as many blocks as possible in main memory. * Buffer – portion of main memory available to store copies of disk blocks. * Buffer manager – subsystem responsible for allocating buffer space in main memory

Answer 124

* Programs call on the buffer manager when they need a block from disk. – If the block is already in the buffer, buffer manager returns the address of the block in main memory – If the block is not in the buffer, the buffer manager * Allocates space in the buffer for the block – Replacing (throwing out) some other block, if required, to make space for the new block. – Replaced block written back to disk only if it was modified since the most recent time that it was written to/fetched from the disk. * Reads the block from the disk to the buffer, and returns the address of the block in main memory to requester.

Answer 125

* Most operating systems replace with the least recently used (LRU) strategy * Idea behind LRU is to use past pattern of block references as a predictor of future references * However, queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user’s query to predict future references * LRU can be a bad strategy for certain access patterns involving repeated scans of data FIFO → Removes oldest pages. MRU → Prefers removing most recently used pages.

Answer 126

- Speeds up searches by mapping key values to records. - Types: Primary, Secondary, Dense, Sparse, Multi-level.

Answer 127

- Primary index → Built on primary keys, sorted order. - Secondary index → Built on non-key attributes for faster searches.

Answer 128

- Dense index → Stores an index entry for every record. - Sparse index → Stores entries for selected records, reducing space usage.

Answer 129

- Creates hierarchical index levels. - Higher levels store pointers to lower index levels, reducing search time.

Answer 130

- A balanced multi-level index structure. - Leaf nodes contain actual data for fast retrieval. * Advantage: – automatically reorganises itself with small, local changes * Minor Disadvantage: – insertions and deletions are costlier – space overhead * Advantages outweigh disadvantages: – B+ trees are used extensively * Structure: – balanced tree – multi-level index

Answer 131

If a key is deleted, adjacent entries must be updated to maintain search efficiency.

Answer 132

- Balanced structure ensures efficient searching. - Leaf nodes store actual data while non-leaf nodes store pointers. - Supports efficient range queries.

Answer 133

1. Start with the root node 1a. Examine the node for the smallest search-key value Kj such that k < Kj 1b. If such a value exists, follow pointer Pj to the child node 1c. Otherwise Km-1 ≤ k, where m is the number of pointers in the node. Follow Pm to the child node 2. Repeat the 3 sub-steps above until a leaf node is found 3. If for some i, key Ki = k * Follow pointer Pi to the desired record or bucket. 4. Otherwise no record with search-key value k exists

Answer 134

- If a non-leaf node exceeds the max keys, it splits into two. - The middle key is promoted to the parent node. - Ensures the tree remains balanced for efficient search Steps: 1. Take the n search key values listed in order * n − 1 in node already, plus new one 2. Put the first n/2 in existing node 3. Put the rest in a new leaf node p 4. Insert p after existing node by updating the pointers between leaf nodes 5. Insert a search-key/pointer pair (k,p ) in the parent of the node being split * where k is the least key in p 6. If the parent is full, split it and propagate the split further up

Answer 135

Performance of B+-trees can be poor for write-intensive workloads * Inserting entries one-at-a-time into a B+-tree requires ³ 1 IO per entry – Inefficient for inserting large number of entries at a time or for write-intensive workloads * Efficient alternative 1: – Sort entries first – Insert in sorted order * Insertion will go to existing page (or cause a split) * Much improved IO performance, but most leaf nodes half full * Efficient alternative 2 (AKA Bottom-up B+-tree construction) – Sort entries first – Create tree layer-by-layer, starting with leaf level – Implemented as part of bulk-load utility by most database systems

Answer 136

1) Sort the values 2) Create the linked list representing the leaf nodes 3) Create the next level of the tree – Build nodes left to right – Split overfull nodes using the normal method for splitting non-leaf nodes * Propagate up as necessary

Answer 137

- Optimized to handle high write throughput efficiently. - Techniques include: 1. LSM Trees (log-based indexing). 2. Buffer Trees (delayed updates for fewer disk writes).

Answer 138

- Optimized for write-heavy workloads. - Data is first written to memory, then periodically merged to disk. - Improves write speed but requires efficient merging policies.

Answer 139

* Deletion handled by adding special “delete” entries – Lookups find both original entry and “delete” entry * Return only entries that do not have matching “delete” entry – When trees are merged, omit entries with matching “delete” entry * Update handled using delete of original entry and insert of updated entry

Answer 140

Back: ✅ Benefits: - Faster writes due to memory buffering. - Efficient bulk insertions. - Handles high update throughput. ❌ Drawbacks: - Merge overhead increases read latency. - Deletes remain until compaction occurs.

Answer 141

* Stepped-merge index – Variant of LSM tree with multiple trees at each level – Reduces write cost compared to LSM tree – But queries are even more expensive

Answer 142

- Delays updates in buffers before applying them to disk. - Reduces disk write overhead by grouping operations. - Efficient for batch inserts and deletes.

Answer 143

✅ Benefits: - Minimizes disk writes with batch processing. - Improves indexing for write-heavy workloads. ❌ Drawbacks: - Requires efficient buffer management. - Delays updates, making real-time queries harder.

Answer 144

A bucket is a unit of storage containing one or more entries (a bucket is typically a disk block). – we obtain the bucket of an entry from its search-key value using a hash function * Hash function ℎ is a function from the set of all search-key values K to the set of all bucket addresses B. * Hash function is used to locate entries for access, insertion and deletion. * Entries with different search-key values may be mapped to the same bucket – Thus, the entire bucket must be searched sequentially to locate an entry. * In a hash index buckets store entries with pointers to records * In a hash file-organization buckets store records

Answer 145

Worst hash function maps all search-key values to the same bucket An ideal hash function is uniform An ideal hash function is pseudo-random Typical hash functions perform computation on the internal binary representation of the search-key

Answer 146

Hash function is static (mapping search key values to a fixed set of bucket addresses) * Effect of unsuitable set size (number of buckets): – too large: space is wasted, too many buckets – too small: too many keys map to a given bucket (overflow buckets are created) performance suffers Rule of thumb – choose bucket size to be 1 block – choose number of buckets to be nb = nr/fr – nr is (expected or actual) number of records – fr is number of records that fits in a bucket * Try to limit waste while reducing chance of overflow – choose number of buckets to be nb = (nr/fr)(1+d) – d is a ‘fudge’ factor e.g. 0.2 to reduce risk of overflow by allowing for 20% waste

Answer 147

* Bucket overflow can occur because of – Insufficient buckets – Skew in distribution of records because * multiple records have same search-key value * chosen hash function produces non-uniform distribution of key values * Overflow can be handled by using overflow buckets 12: – Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list

Answer 148

Extendable hashing: – Allows the database to grow and shrink by splitting and coalescing buckets, thus retaining space efficiency – Reorganisation performed one bucket at a time, so overhead is incremental

Answer 149

✅ Extendable Hashing is a dynamic hashing technique that grows and shrinks based on the number of entries, optimizing storage efficiency. Key Features: - Uses a directory of bucket pointers. - Binary representation of keys determines bucket placement. - Expands dynamically by doubling directory size when buckets overflow. - Prevents excessive collisions and minimizes reorganization costs. ✅ Advantages: - Efficient space utilization. - Fast lookup, insertion, and deletion. - Flexible structure adapts to data growth.

Answer 150

1. Compute ℎ(Kj)) = X 2. Use the first i bits of X as a decimal index into bucket address table, and follow the pointer to appropriate bucket, then do a sequential search for the record

Answer 151

* To insert record with search key value Km – lookup as before to find bucket, say j – if enough room in bucket, add record to bucket – if no room in the bucket it must be split and entries redistributed between the original and new bucket

Answer 152

Case 1: ij < i: * More than one cell in bucket address table points to bucket j * Allocate new bucket z *Rehash all records in bucket j and take the first ij bits of their hash

Answer 153

Only one cell in bucket address table points to bucket j Re-try the insertion of the record from the start!

Answer 154

* Lookup to find bucket * Delete bucket if now empty * Periodic reorganisation possible but not necessary benefits: Hash performance does not degrade with growth of file minimal space overhead rehashing cons: Extra level of indirection to find desired record Bucket address table may itself become very big (larger than memory)

Answer 155

- Multiple ways to execute a query, each with different costs. - Disk transfer times significantly affect performance. - Optimizing execution reduces resource consumption. - Even single-use queries can benefit from optimization.

Answer 156

- Different SQL queries may produce equivalent results. - Users may not write the most efficient query. - The system rewrites queries into more efficient equivalents. - This process is called Query Optimisation.

Answer 157

1. Parser & Translator:- Checks syntax & table references. - Translates into relational algebra. 2. Optimiser:- Converts query into a more efficient form. - Creates a query execution plan. 3. Evaluation:- Executes the query evaluation plan. - Returns final results.

Answer 158

- Generates multiple execution plans for evaluation. - Converts queries into equivalent relational algebra expressions. - Different algorithms affect execution speed. - Estimates costs based on database statistics. - Chooses the lowest-cost execution plan.

Answer 159

- Queries are translated into relational algebra expressions. - Multiple equivalent expressions exist for the same query. - The system finds the most efficient execution order.

Answer 160

- Number of tuples per table. - Size of tuples (storage space per row). - Number of distinct values per attribute. - Intermediate result size estimations. - Cost estimations of different execution algorithms.

Answer 161

- Estimates query execution cost for various plans. - Uses data dictionary statistics to make predictions. - Selects the plan with the least resource consumption.

Answer 162

- Parsing → Validates SQL syntax and ensures query correctness. - Translation → Converts SQL query into relational algebra for optimization. - Ensures queries are interpretable by the database engine.

Answer 163

- Checks for syntax errors in SQL queries. - Verifies referenced tables and attributes exist in the schema. - Generates a parse tree representing query structure.

Answer 164

- Relational algebra is structured for optimization. - Helps the system generate equivalent, efficient execution plans. - Converts SQL into basic operations (selection, projection, join, etc.)

Answer 165

Hierarchical representation of query structure. Helps optimizer understand dependencies among operations. SELECT ├── FROM (Students) ├── WHERE (Age > 18)

Answer 166

- Parse SQL query → Validate syntax & semantics. - Translate to relational algebra → Express query using logical operations. - Pass expression to query optimizer for efficiency evaluation.

Answer 167

- Atomicity → The transaction is all-or-nothing; either all operations succeed or none do. - Consistency → Transactions ensure the database remains valid and follows integrity rules. - Isolation → Each transaction executes as if it were alone, unaffected by other concurrent transactions. - Durability → Once committed, changes persist, even after system failures.

Answer 168

- Active → Initial state; transaction is executing. - Partially Committed → Final statement has executed but not yet fully committed. - Failed → Execution cannot proceed due to an error. - Aborted → Transaction rolled back, database restored. Restart if no logical error. Kill and report failure if needed. - Committed → Transaction successfully completed and changes are permanent.

Answer 169

- Multiple users need simultaneous access. - Database must maintain isolation illusion (each user feels sole access). - Improves system performance while ensuring consistency. - Requires concurrency control to prevent interference

Answer 170

- Defines the order of transaction execution. - Includes all instructions for a set of transactions. - Preserves instruction order within each transaction. - Helps enforce isolation without relying on specific database designs.

Answer 171

- Transactions are executed one after another (no interleaving). - Ensures consistent results without transient inconsistencies. - Example: Transfer £50 from Account A to Account B. Transfer 10% of Account A's funds to Account B.

Answer 172

- Different serial schedules may yield different results. - Example: Start balance: £1000 (A) & £2000 (B). Schedule 1 → Transfer first, then percentage: Result: £855 (A), £2145 (B). - Schedule 2 → Percentage first, then transfer: Result: £850 (A), £2150 (B). - Total balance remains the same in both cases.

Answer 173

- Transactions execute concurrently with interleaved instructions. - Some non-serial schedules are correct, meaning they maintain consistency. - Others may produce unexpected results if control mechanisms aren’t applied.

Answer 174

- Each transaction individually maintains consistency, but: Write operations may be overwritten by other transactions. Sum of values may not be preserved correctly. - Temporary inconsistencies can occur during execution. - To be correct, the schedule must be: Serial (executed one-by-one). Or equivalent to a serial schedule in net effect.

Answer 175

- Internal calculations are ignored—only read/write operations matter. - If two consecutive operations refer to different data items, they can swap without affecting results. - If they refer to the same data item, their execution order may impact correctness.

Answer 176

- Case 1: Both transactions read → Order does not matter. - Case 2: One reads, one writes → Order matters (read must occur before the write). - Case 3: One writes, one reads → Order matters (read depends on the previous write). - Case 4: Both transactions write → Order matters (the first value written is lost).

Answer 177

- Two schedules are conflict equivalent if one can be transformed into the other by swapping non-conflicting operations. - Conflicting operations involve: Reads/Writes on the same data item. Writes that overwrite values. - A schedule is conflict serialisable if: It is equivalent to some serial schedule, meaning it preserves isolation

Answer 178

- Construct a directed precedence graph: Vertices → Represent transactions. Edges → Add an edge T₁ → T₂ if: 1. T₁ writes before T₂ reads (read-after-write). 2. T₁ reads before T₂ writes (write-after-read). 3. T₁ writes before T₂ writes (write-after-write). - Significance of edges: If T₁ → T₂ exists, then T₁ must execute before T₂ in any equivalent serial schedule. - Conflict serialisability: If the precedence graph has no cycles, the schedule is conflict serialisable.

Answer 179

- Depends on cycles: No cycles → Conflict serialisable (can be transformed into a serial schedule). Cycles → Not conflict serialisable (some operations interfere). - Test for conflict serialisability: Construct a precedence graph. Check for cycles.

Answer 180

- Ensures that schedules are logically equivalent to serial execution. - However, it does not fully guarantee isolation at the application level.

Answer 181

- Locks control concurrent access to data items. - A transaction must acquire a lock before modifying an item. - Two types of locks: Exclusive (x-lock) → Allows reading and writing. Shared (s-lock) → Allows only reading. - Managed by a concurrency-control manager.

Answer 182

❌ Deadlock → Transactions wait indefinitely for each other’s locks. - Solution: Rollback one transaction and release locks. ❌ Starvation → A transaction is repeatedly rolled back due to deadlocks. - Solution: Use timestamps or priority scheduling.

Answer 183

- Ensures conflict-serialisable schedules by dividing execution into two phases: 1. Growing Phase: Transactions can acquire locks but cannot release them. 2. Shrinking Phase: Transactions can release locks but cannot acquire new ones. - Guarantees serialisability, but deadlocks can still occur.

Answer 184

- Use a Wait-For Graph (WFG): Vertices → Represent active transactions. Edges → Directed edges T₁ → T₂ indicate T₁ is waiting for T₂'s lock. - Deadlock exists if the WFG has cycles.

Answer 185

- Rollback one transaction (victim) to break deadlock. - Choose victim with minimum cost (less data loss). - Rollback strategies: 1. Total rollback → Abort transaction and restart. 2. Partial rollback → Rollback only enough to resolve deadlock. - Avoid starvation by tracking rollback counts.

Answer 186

- Transactions use timestamps to determine wait behavior. - Strategies include: 1. Wait-Die: Older transactions wait, younger ones are rolled back. 2. Wound-Wait: Older transactions force rollback of younger ones. 3. Timeout-Based: Transactions wait only for a limited time, then rollback.

Answer 187

- Database contents are valuable and must be protected. - Computer systems can fail, so consistency must be ensured. - Two key concerns: Data safety → Prevent loss or corruption. Data consistency → Ensure integrity even after failures.

Answer 188

- Restores database to a consistent state. - Ensures atomicity: No updates from uncommitted transactions. - Ensures durability: Updates from committed transactions persist even after failure.

Answer 189

- Transaction Failure: Logical errors (incorrect input, out of memory). System errors (deadlock, scheduler failures). - System Crash: Power failure, hardware issues, OS errors. - Disk Failure: Corrupt storage, disk malfunction.

Answer 190

- Partial updates may leave the database inconsistent. - Goal → Ensure all updates are applied or none at all. - Solution → Store transaction changes on stable storage before modifying the database.

Answer 191

- Logs track database modifications for recovery. - Transactions write logs before executing changes. - Log format: - T → Transaction ID. X → Data item. Old & new values → Before and after modification.

Answer 192

- Modifications are recorded in logs but only applied after commit. - No changes are made to the database until a transaction is fully committed. - If a crash occurs: Redo operations apply changes if and are in the log.

Answer 193

- Database is updated immediately, before transaction commits. - Undo required in case of failure. - Logs must contain both old and new values. - Recovery steps: Redo committed transactions. Undo uncommitted transactions.

Answer 194

- Prevents unbounded log growth, reducing recovery time. - Periodically flushes all log records to disk. - Writes a checkpoint record to the log. - Removes completed transactions from the log. - During recovery: Only considers transactions active at the last checkpoint.

Answer 195

- Buffers log records in memory instead of writing to storage immediately. - Logs are flushed when the buffer fills up or when log force is triggered. - Log force: Ensures all logs for a transaction (including commit log) are stored. - Reduces I/O costs by batching write operations.

Answer 196

- Logs must be written to stable storage in order of creation. - Transactions commit only after the commit log record is stored. - Before database blocks are written, all related log records must be flushed first. This is called the Write-Ahead Logging (WAL) rule.

Answer 197

- Swapping join order minimizes intermediate result size, improving efficiency. - Uses system catalog statistics, including: Number of tuples in each relation. Tuple sizes. Number of distinct values per attribute. - Optimized strategy:- Join smaller relations first to reduce intermediate result size.

Answer 198

- Selections are commutative → Order doesn't affect results. - Selections distribute over joins → Applies filters earlier to reduce intermediate results. - Only the last projection in a sequence is needed → Avoids redundant computations. - Projections distribute over joins → Apply projection before joining if possible.

Answer 199

- Natural and theta joins are commutative:- A ⋈ B ≡ B ⋈ A - Joins are associative:- (A ⋈ B) ⋈ C ≡ A ⋈ (B ⋈ C) - Theta joins are associative when constraints affect only certain relations.

Answer 200

- Number of tuples per relation (stored in system catalog). - Distribution of values (e.g., how many customers live in a specific city). - Distinct attribute values in a relation (helps predict selection and join results).

Answer 201

- Formula: num_tuples_in_result ≈ (num_tuples_in_relation) / (num_distinct_values_for_attribute) - Assumes uniform distribution of values.

Answer 202

- Total number of tuples: |R × S| = |R| * |S| - Bytes per tuple: Tuple size = Size(R) + Size(S) - Total result size: Total size = (|R| * |S|) * (Size(R) + Size(S))

Answer 203

- If common attribute is a key in one relation: Each tuple in R₂ joins with at most one tuple in R₁ → Result size is R₂. - If common attribute is not a key:- Assume uniform distribution: Result size ≈ (|R₁| * |R₂|) / (num_distinct_values_of_common_attribute)

Answer 204

- Disk access cost dominates overall performance. - Query cost is measured using: Number of seeks × average seek cost. Number of blocks read × average read cost. Number of blocks written × average write cost. - Index height affects lookup cost (e.g., B+ Tree height).

Answer 205

- No index: Scan entire relation → Total cost = Num_blocks + Seek_time. - Primary (clustering) index on key: Retrieve one record efficiently. Cost = Index height + 1 seek + 1 transfer. - Primary (clustering) index on non-key: Retrieve multiple records on consecutive blocks. Cost = Index height + seeks + transfers for blocks with matching records. - Secondary index on key or non-key: May require multiple seeks and reads, increasing cost.

Answer 206

- No index: Scan full dataset → Expensive. - Primary index:- A >= v → Use index to find first tuple ≥ v, then scan sequentially. A ≤ v → Scan sequentially from start. - Secondary index: A >= v → Scan index for values ≥ v and retrieve pointers. A ≤ v → Scan index for values ≤ v and retrieve pointers. Records may be spread across multiple blocks, increasing cost

databases Flashcards

(230 cards)

Welcome to My Website