DataBases Flashcards

Question

Define a relationship and what a degree of relationship means

Answer 1

Relationship (type): meaningful associations among two or more entities. * Degree of Relationship: The number of participating entity types in a relationship e.g Binary if 2, ternary if three and Quaternary if 4

Answer 2

Degree one: recursive ― A relationship between occurrences of the same entity type ― E.g., AcademicStaff (professor) supervises AcademicStaff (lecturer).

Answer 3

Candidate Key: The minimal number of attribute(s) whose value(s) uniquely identify each entity occurrence ― A candidate key can not contain a null Primary Key: The candidate key that is selected to uniquely identify each occurrence of an entity type * Principles of choosing the primary key: ― Attribute length ― Minimal number of attributes required ― The future certainty of uniqueness

Answer 4

Weak Entity Type: existence-dependent on some other entity ― Each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type strong is not that ig

Answer 5

Multiplicity: the number of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship e.g that 1..* stuff Representation of Multiplicity Constraint Meaning 0..1 Zero or one entity occurrence 1..1 (or just 1) Exactly one entity occurrence 0..* (or just *) Zero or many entity occurrences 1..* One or many entity occurrences 3..7 Minimum of 3 up to a maximum of 7entity occurrences 1, 5, 10-12 1 or 5 or 10 to 12 entity occurrences

Answer 6

Cardinality: the maximum values for the multiplicity ranges on either side of the relationship. ― One-to-one (1:1) ― One-to-many (1:*) ― Many-to-many (*:*) Participation: the minimum values for the multiplicity ranges on either side of the relationship ― Optional participation (if 0) ― Mandatory participation (if 1 or more)

Answer 7

The participation of entities appears on the opposite side of the relationship - google if u confused it's kinda wierd

Answer 8

A fan trap is a type of join path between three tables when a "one-to-many" join links a table which is in turn linked by another "one-to-many" join. The fanning out effect of "one-to-many" joins can cause incorrect results to be returned when a query includes objects based on both tables.

Answer 9

A chasm trap is a type of join path between three tables when two "many-to-one" joins converge on a single table, and there is no context in place that separates the converging join paths.

Answer 10

* [Step 1] Identify entity types * [Step 2] Identify relationship types * [Step 3] Identify and associate attributes with entity and relationship types * [Step 4] Determine primary keys for entity types * [Step 5] Identify the multiplicity constraints * [Step 6] Consider use of enhanced modeling (optional) * [Step 7] Check model for redundancy, traps * [Step 8] Validate conceptual model against user transactions * [Step 9] Review conceptual data model with user

Answer 11

Specialisation: The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. e.g making more entities from one(Manager, SalesPersonnel and Secretary from Staff)

Answer 12

Generalisation: The process of minimizing the differences between entities by identifying their common characteristics.

Answer 13

* A relation has a name that is distinct from all other relation names in the database * Each cell of the relation contains exactly one atomic (single) value * Each attribute has a distinct name * The values of an attribute are all from the same domain * Each tuple is distinct – there are no duplicate tuples * The order of attributes has no significance * The order of tuples has no significance

Answer 14

Relation schema: a named relation defined by the relation name (table name) and attributes. ― 𝑅(𝐴1, 𝐴2, … , 𝐴𝑛) Student (stdID, stdEmail, stdFstName, stdSecName, gender, programCode) Primary Key: stdID, Alternate Key: stdEmail Foreign Key: programCode references Program(programCode) * Relation instance/state: a set of tuples from a relation schema. ― r(R) ^This is like an actual database table

Answer 15

― Entity integrity ― Referential integrity ― Domain constrain

Answer 16

― Uniqueness: No two tuples can have identical values for candidate keys. ― Not-Null: No attribute of a candidate key can be a NULL. * Null: A value of an attribute is currently unknown or not applicable for this tuple.

Answer 17

Referential integrity: If a foreign key exists in a relation, ― either the foreign key value matches a primary key value in its home relation ― or the foreign key value is NULL

Answer 18

Domain Constraints: enables the RDBMS to carry out checks on data errors and to determine the range of operations that can be carried out on the domain ― E.g., if the attribute Age is specified to be in the range16-30, any number outside would give an error ― E.g., If the attribute Name is to hold a set of characters, then any multiplication or division operations on the attribute values are illegal

Answer 19

View: a subset of base relation(s) Base relation: A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database

Answer 20

Virtual/derived relation. It does not necessarily exist in the database, but can be produced upon request. * Dynamic. The changes made to the base relation(s) that affect the view are immediately reflected in the view. * Generated by applying appropriate relational operations (e.g., 𝜎, 𝜋, x, ⋈, ….)

Answer 21

* Security mechanism * Customize user’s need * Simplify complex operations on the base relations

Answer 22

For composite attributes, include only the constituent simple attributes. So for example staffName which is a composite attribute containing staffFName and staffLName would be ignored and the relation schema would look like this: AcademicStaff(staffID, staffFName,staffLname, academicPosition, gender, DOB)

Answer 23

Normalization: A technique for producing a set of relations with desired properties. ― remove redundancy ― remove potential for insertion, modification, deletion anomalies

Answer 24

if each value of A is associated with exactly one value of B, this is denoted A -> B In this case A is the determinant

Answer 25

C is transitively dependent on A via B ... A -> C

Answer 26

if by removing some attribute(s) from A, the dependency still holds.

Answer 27

B is not functionally dependent on any subset of A.

Answer 28

Repeating groups must be removed, a primary key must exist and duplicate rows must be removed. Additionaly there must be no order dependency

Answer 29

All partial dependencies must be removed

Answer 30

Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.

Answer 31

All transitive dependencies must be removed

Answer 32

A transitive dependency refers to some non-prime attribute other than the candidate key that depends on another non-prime attribute that is dependent entirely on the candidate key. ^this is hard to remember but it's not that hard to actually do Show_ID Telecast_ID Telecast_Type CD_Cost ($) F08 S09 Thriller 50 F03 S05 Romantic 30 F05 S09 Comedy 20 In this case Telecast_Type needs to be removed cos it depends on telecast ID which in turn depends on Show ID

Answer 33

Every determinant must be a candidate key or Every non-prime attribute must be functionally dependent on the entire primary key

Answer 34

― Projection (𝜋 ) ― Selection (𝜎) ― Rename (𝜌)

Answer 35

― Union (∪) ― Intersection (∩) ― Set difference (−) ― Cartesian Product (X)

Answer 36

― Join (⋈) ― Outer Join (⋉, ⋊) ― Division (÷)

Answer 37

𝝅<𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆 𝒍𝒊𝒔𝒕>(𝑹): Produce a new relation that has only some of 𝑅‘s columns e.g 𝝅ID, Name, Salary(Lecturer) Removes duplicate tuples

Answer 38

𝝈<𝒔𝒆𝒍𝒆𝒄𝒕 𝒄𝒐𝒏𝒅𝒊𝒕𝒊𝒐𝒏>(𝑹): Produce a new relation that contains only those tuples of R that satisfy the condition. e.g 𝝈𝒅𝒆𝒑𝒕𝑰𝑫=′𝟎𝟐′∧ 𝒂𝒈𝒆>𝟓𝟎(𝑳𝒆𝒄𝒕𝒖𝒓𝒆𝒓) or 𝝈barName='Sues'(Sells)

Answer 39

𝝆𝑺 𝑬 : Rename the expression E to S * 𝝆𝑺(𝒂𝟏,𝒂𝟐,…,𝒂𝒏) 𝑬 : Rename the relation E to S, and also re-names the attributes as 𝑎1, 𝑎2, … , 𝑎𝑛. e.g Q: List all the professors, rename the output relation as Professor: 𝜌𝑃𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟(𝜎𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛=′𝑝𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟′(𝐿𝑒𝑐𝑡𝑢𝑟𝑒𝑟))

Answer 40

Will take the union of both tables (R U S ), both tables must have the same relation schema for this to work

Answer 41

Will take the intersections of both tables (R ∩ S ), both tables must have the same relation schema for this to work

Answer 42

Will take the difference (R - S), records that exist in both R and S are remove from the R table and then that's outputted. Both tables must have the same relation schema for this to work 𝑅 − 𝑆 ≠ 𝑆 − 𝑅

Answer 43

This one is kinda weird * 𝑹 × 𝑺: Produces a relation that is the concatenation of every tuple of relation 𝑅 with every tuple of relation 𝑆. ―If both 𝑅 and 𝑆 have the same attribute 𝐴, then use: 𝑅. 𝐴 and 𝑆. 𝐴 Basically just stick S on the back of R and if they have the same attribute then call it R.attribute and S.attribute. Like for example Bar.barName and Sells.barName

Answer 44

― Theta join (Θ-join) ― Equijoin ― Natural join ― Outer join

Answer 45

* 𝑹 ⋈<𝒋𝒐𝒊𝒏 𝒄𝒐𝒏𝒅𝒊𝒕𝒊𝒐𝒏> 𝑺: Defines a relation that contains all combinations of tuples from 𝑅 and 𝑆 that satisfy the join condition * 𝑅 ⋈<𝑗𝑜𝑖𝑛 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛> 𝑆 = 𝜎<𝑗𝑜𝑖𝑛 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛>(𝑅 × 𝑆)

Answer 46

* Equijoin: Produces all the combinations of tuples from that satisfy a join condition with only equality comparisons. ―Equijoin is a special case of theta-join.

Answer 47

* 𝑹 ⋈ 𝑺: an equijoin of the two relations R and S over all common attributes.

Answer 48

* SQL is a standard language for interacting with a relational database

Answer 49

It retrieves and updates statements: -Select -Insert -Update -Delete

Answer 50

Define DB structure & control access: ― CREATE ― ALTER ― DROP ― GRANT ― REVOKE

Answer 51

* Comparison: =, <> (!=), <, <=, >=, >, OR, AND, NOT * Range: BETWEEN ... AND, NOT BETWEEN ... AND * Set membership: IN, NOT IN * Pattern match: LIKE, NOT LIKE * IS NULL, IS NOT NULL

Answer 52

* COUNT * MIN * MAX * SUM * AVG

Answer 53

SELECT * FROM Products WHERE Price BETWEEN 10 AND 15; -Note you can add a not in front of between if needed

Answer 54

SELECT * FROM Customers WHERE Country IN ('Germany', 'France', 'UK'); -Note you can add a not before IN for the opposite result

Answer 55

SELECT * FROM Customers WHERE CustomerName LIKE 'a%';

Answer 56

% -- represents zero, one, or multiple characters _ -- represents a single character

Answer 57

SELECT CustomerName FROM Customers WHERE Address IS NULL ; -Not it must be IS NULL or IS NOT NULL, the IS is necessary

Answer 58

* COUNT * MIN * MAX * SUM * AVG

Answer 59

SELECT COUNT(DISTINCT Country) FROM Customers;

Answer 60

SELECT MIN(Price), MAX(Price), AVG(Price)q FROM Products;

Answer 61

GROUP BY: forms group of rows with the same column value. HAVING: filters the groups subject to some condition This is an example of where they are used in a query SELECT [DISTINCT] column_name(s) FROM table_name(s) [WHERE condition] [GROUP BY column_name(s)] [HAVING condition] [ORDER BY column_name(s)];

Answer 62

SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;

Answer 63

― The grouping columns ― Aggregate functions. An example is this: SELECT COUNT(CustomerID), Country, City FROM Customers GROUP BY Country, City;

Answer 64

When aggregate functions are involved; -HAVING clause always includes at least one aggregate function. -WHERE clause can not include aggregate function.

Answer 65

An alias is a temporary name assigned to a table or a column for the duration of a query. Aliases are used to make the output of a query more readable or to shorten lengthy table or column names

Answer 66

Subquery (or nested query) : a complete SELECT statement embedded in another query. ― In WHERE, or HAVING clause SELECT * FROM Customers WHERE Country = (SELECT Country FROM Customers WHERE CustomerName= ‘Ernst Handel’);

Answer 67

Used in cases where the subquery will output more than one value If the inner SELECT statement outputs a single value, use = If multiple values, use IN

Answer 68

The EXISTS operator returns true if the subquery returns one or more rows. It doesn't matter what the actual values of the rows are; the existence of any row is sufficient for the EXISTS condition to be true.

Answer 69

Inner joins in SQL are used to combine rows from two or more tables based on a related column between them. The result includes only the rows where the specified condition is met.

Answer 70

Outer joins in SQL are used to retrieve rows from one or more tables, even if there is no match between the joined tables. Unlike inner joins, which return only the matching rows, outer joins return all rows from at least one of the tables, with the non-matching rows filled with NULL values for columns from the table that does not have a corresponding match.

Answer 71

― LEFT JOIN: returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match. ― RIGHT JOIN: … ― FULL JOIN: Retain both the left table and right table rows. The result is NULL for those no matches.

Answer 72

SELECT o.OrderID, c.CustomerName, o.OrderDate FROM Orders o Natural JOIN Customers c;

Answer 73

A self join in SQL is a specific case where a table is joined with itself. This is useful when you want to compare rows within the same table, treating it as if it were two separate tables. To perform a self join, you typically use aliases to differentiate between the columns from the same table

Answer 74

Insert adds new rows to the table Syntax is: INSERT INTO (Column1, Column2) VALUES (19, 'ThatValue'), (23, 'ThisValue'); or with a whole record the columns are not needed INSERT INTO Orders VALUES (10447, 90, 6, ‘2017-05-25’,3);

What is the syntax for an update statement

Example 1 : Give all the products’ price a 3% increase UPDATE Products SET Price=Price*1.03; Example 2 UPDATE Customers SET ContactName = ‘Alfred Schmidt’, City= ‘Frankfurt’ WHERE CustomerID = ‘001’;

What is the syntax for a delete statement

DELETE FROM

WHERE

What is the syntax for table creation

CREATE TABLE IF NOT EXISTS Tasks ( task_id INT AUTO_INCREMENT PRIMARY KEY, title VARCHAR(255) NOT NULL, start_date DATE, due_date DATE, status TINYINT NOT NULL, description TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); IF NOT EXISTS is optional

What are the main data types in MySql

Numeric types -INT, TINYINT, DECIMAL, FLOAT, DOUBLE string types; -CHAR, VARCHAR, TEXT Dat/time types -DATE -TIME -DATETIME

What are the 5 constraints?

– NOT NULL – UNIQUE – PRIMARY KEY – FOREIGN KEY – CHECK

What does the NOT NULL constraint do?

NOT NULL: ensures values stored in a column are not NULL.

What does the UNIQUE constraint do?

UNIQUE: ensures values in a column or a group of columns are different used like : CREATE TABLE table_name( ..., column_name data_type UNIQUE, ... );

How can constraints work for more than one column

CREATE TABLE table_name( ..., [CONSTRAINT constraint_name] UNIQUE(Column_list) );

What does the check constraint do?

CHECK: ensures that values stored in a column or group of columns satisfy a Boolean expression. used like : CREATE TABLE table_name( ..., column_name data_type CHECK(Boolean_expression), ... );

How is the primary key constraint used?

PRIMARY KEY ― Implicitly means NOT NULL + UNIQUE ― A table can have one and only one primary key. CREATE TABLE table_name( ..., column_name data_type PRIMARY KEY, ... );

How is the foreign key constraint used?

* FOREIGN KEY: refers to the PRIMARY KEY in another table -The domain of the foreign key must be the same as the domain used in the referenced table CREATE TABLE Products( productId INT AUTO_INCREMENT PRIMARY KEY, productName VARCHAR(100) NOT NULL, categoryId INT, CONSTRAINT fk_category FOREIGN KEY (categoryId) REFERENCES categories(categoryId) ) ENGINE=INNODB;

What do these three referential options that can be applied to the foreign key do? CASCADE, SET NULL and RESTRICT/NO ACTION

― CASCADE: If a row from the parent table is deleted or updated, the values of the matching rows in the child table automatically deleted or updated. ― SET NULL: When referenced data in the parent key is deleted or updated, all rows in the child table that depend on those values have their foreign keys set to NULL. ― RESTRICT/NO ACTION: Reject the delete/update operation from the parent table, if a row from the parent table has a matching row in the child table.

Using ALTER TABLE how could you add / drop columns?

ALTER TABLE Persons ADD COLUMN dateOfBirth date NOT NULL, ADD COLUMN joinDate date default CURRENT_DATE(); ALTER TABLE Persons DROP COLUMN dateOfBirth;

Using ALTER TABLE how could you modify existing columns?

ALTER ALTER TABLE Vehicles MODIFY year SMALLINT NOT NULL, MODIFY color VARCHAR(20) NULL AFTER make;

Using ALTER TABLE how can i add or drop a table constraint?

Add a table constraint: ALTER TABLE table_name ADD [CONSTRAINT constraint_name] UNIQUE/CHECK/PRIMARY KEY/FOREIGN KEY (...); drop a table constraint ALTER TABLE table_name DROP CONSTRAINT constraint_name; drop primary/ foreign key ALTER TABLE table_name DROP PRIMARY KEY; ALTER TABLE table_name DROP FOREIGN KEY constraint_name;

Using ALTER TABLE how can i add a constraint for a single table attribute?

ALTER TABLE Persons ADD CONSTRAINT UC_person UNIQUE(ID,lastName);

How do i drop a table in SQL?

DROP TABLE Shippers;

What is AUTO_INCREMENT and how can it be assigned as a constraint

Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table. ― Typically for primary key ― In MySQL, assign AUTO_INCREMENT to a column example: CREATE TABLE Persons ( personId int NOT NULL AUTO_INCREMENT PRIMARY KEY, lastName varchar(255) NOT NULL, firstName varchar(255), Age int );

How can we let the AUTO_INCREMEMENT start with another value instead of 1?

ALTER TABLE Persons AUTO_INCREMENT 100; ^add a number on the end ¯\_(ツ)_/¯

What is an index and what does it do?

A data structure such as B-Tree that improves the speed of data retrieval on a table at the cost of additional writes and storage to maintain it. Note: Updating a table with indexes takes more time than updating a table without. So, only create indexes on columns that will be frequently searched.

How can we index an attribute in SQL and how can we add an index to an already existing attribute?

INDEX ( attributeName ) like : CREATE TABLE Persons ( personid int NOT NULL AUTO_INCREMENT PRIMARY KEY, lastName varchar(255) NOT NULL, firstName varchar(255), age int, INDEX (lastName) ); add to existing attribute: CREATE INDEX idx_persons ON Persons(lastName);

Note : MYSQL automatically creates a special index named PRIMARY for the primary key

^

How can i create a unique index and what does it do?

CREATE UNIQUE INDEX emp_u1 ON EMP (eName, job); This will Index and ensure that each combination of eName and job are unique.

What is the basic syntax to create a view?

CREATE [OR REPLACE] VIEW ViewName [column1,…] AS SELECT …; example horizontal view CREATE VIEW Emp_D20 AS SELECT * FROM Emp WHERE deptNo = 20; example vertical view CREATE VIEW Emp_NoSalary AS SELECT empNo, eName, job, mgr, hireDate, comm, deptNo FROM Emp;

How can i create a view for multiple tables?

CREATE VIEW Num_Emp_Job(deptNo,job,cnt) AS SELECT deptNo,job,COUNT(*) FROM Emp GROUP BY deptNo, job ORDER BY deptNo;

What does view updatability mean and in what case will it not work?

All updates to a base table are immediately reflected in all views that encompass that base table. Similarly, we may expect that if a view is updated then the base table(s) will reflect that change. ― For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the base table. e.g SELECT statement defining the view must be simple, meaning it doesn't involve complex expressions, aggregates, GROUP BY, DISTINCT, JOINs, or subqueries

Define a commit and a rollback

* COMMIT ― Causes all outstanding changes that you have made to become permanent. * ROLLBACK ― Causes all uncommitted changes to be discarded

Define database implementation

* Physical realization of the database and application designs ― Use DDL to create database schemas ― Use DML to populate the data ― Develop the application programs * Database transactions are implemented using the DML, possibly embedded in a host programming language.

Define Physical Database Design (PDD)

The process of producing a description of the implementation of the database on secondary storage. ― Describe the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associate integrity constraints and security measures.

What are some prerequisites before PDD can occur?

― Logical database design * Minimally in 3rd normal form ― Transaction analyses * Most frequent transactions * Most complex or resource-insensitive transactions * Distributions of transactions, over time * Most critical transactions to the applications ― Which DBMS software to use

What are the steps of PDD

STEP1: Translate the logical data model for target DBMS ― 1.1 Design base relations ― 1.2 Design representation of derived data ― 1.3 Design general constraints STEP2: Design file organizations and indexes ― 2.1 Choose file organizations ― 2.2 Choose indexes ― 2.3 Estimate disk space requirement STEP3: Design user views STEP4: Design security mechanisms STEP5: Consider controlled redundancy STEP6: Monitor and tune the operational system

How do you design base relations in PDD?

* Defining Tables and Columns: including naming objects, choosing data types and lengths, and handling null, default values * Defining Primary Keys: may considering the use of surrogate keys * Handling referential integrity for foreign keys. ^This is basically this thingy , OrderDetails(OrderID, ProductID, Quantity)

What is a surrogate key?

A surrogate key is a unique identifier assigned to each record or row in a database table. Unlike natural keys, which are based on existing data attributes and have some inherent meaning, surrogate keys are system-generated and lack any meaningful information about the data they represent. Surrogate keys are typically used as primary keys in database tables.

How do you represent derived data in PDD?

Remember derived data is the data that's calculated based on the database and not physically stored 1. Use a generated column 2. Create a view 3. Use triggers* Triggers are stored programs executed automatically to respond to specific events such as inserting, updating or deleting data. 4. Calculate it each time

What is the storage structure for a database?

A database is stored as a collection of files. A database file is partitioned into fixed-length storage units called blocks. What is a block? Block/Page: A contiguous sequence of sectors from a single track

What are some properties about primary secondary and stable storage for databases and give an example of each

-Primary storage is volatile ( e.g main memory/cache memory -Primary storage is where the operations happen -Data blocks can be moved from primary to secondary storage -Secondary storage is where the entire database is stored e.g(magnetic disk, flash drives etc) -Stable storage is where the database is backed up (e.g RAID or a Redundant Array of Independent Disks) -Stable storage is always non-volatile with independent failure modes

Define File Organisation

The physical arrangement of data in a file

What are 3 ways to organize blocks/pages in a file

― Heap (unsorted) files: Records are placed on disk in no particular order ― Sequential (ordered) files: records are ordered by the value of a specific attribute. ― Hash files: Records are placed on disk according to a hash function

What are the two types of index and what do they do?

Clustered Index: With a clustered index the rows are stored physically on the disk in the same order as the index. ― Only one clustered index is allowed for each table ― If the indexed attribute chosen is a key of a relation, the index is called primary index. Non-Clustered(secondary) index: An index which maintains a logical ordering of data rows without altering the physical ordering of the rows. ― A table could have several secondary indexes ― Foreign keys are good candidates for non clustering indexes.

Define Deonormalization and when it should be used

Denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. ― If performance is unsatisfactory and a relation has a low update and a very high query rate.

When attempting controlled redundancy why would we create new tables/views? Why would we Split tables?

They can hold data derived in columns from other tables - stops it having to be derived freshly each time Decompose very large relations into smaller and more manageable pieces.

What is a transaction?

Transaction: an executing program that forms a logical unit of database operations A transaction includes one or more database operations, e.g., insertion, deletion, updating, retrieval.

Define ACID in database transactions and what each letter stands for

* Atomicity: All or nothing e.g shouldn’t take money from A without giving it to B * Consistency: transforms the database from one consistent state to another consistent state e.g money isn't lost or gained Isolation: partial effects of incomplete transactions should not be visible to other transactions e.g. other transactions shouldn’t see such a change until completion. * Durability: successfully committed transactions are permanently recorded in the database, not lost, even in the event of a system failure e.g. After completion, such a change in A and B is saved in the database.

What is the SQL to start a transaction

START TRANSACTION; COMMIT; e.g START TRANSACTION; UPDATE Accounts SET balance = balance - 100 WHERE account = 'A'; UPDATE Accounts SET balance = balance + 100 WHERE account = 'B’; COMMIT;

What is a database Schedule and name and define the two types

Schedule: an ordering of operations for concurrent transactions. There are Serial schedules and Non-serial schedules Serial schedule: a schedule in which the operations for concurrent transactions are not interleaved. * But, unacceptable in practice. Serial schedule: a schedule in which the operations for concurrent transactions are not interleaved. * But, unacceptable in practice.

What are two desired properties for a schedule

Serializability and Recoverability

Define a Conflicting operation in a database schedule

Conflicting operations: if two operations belong to different transactions, access the same database item, and at least one operation is a writing operation.

Define when a Transaction has a recoverable schedule

A schedule is considered recoverable if, in the presence of failures (such as crashes or errors), it is possible to restore the database to a consistent state. a schedule, where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then Ti should commit before Tj.

Define concurrency control

Concurrency Control is the process of managing simultaneous operations on the database without having them interfere with one another.

State the two types of concurrency control and what they ential

― Pessimistic: Assumes that conflict is likely and take steps to prevent it. * Locking, timestamping ― Optimistic: Assumes that conflict is unlikely and only checks for it when transaction commits

What does a lock do in database transactions?

Locking: When a transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results. ― A read operation must acquire a shared/read lock ― An update operation must acquire an exclusive/write lock

What can locks be applied to?

Locks can be applied to: ― The whole database ― A file ― A page/a disk block ― A record ― A field value of a record

In a shared/exclusive locking scheme what must be issued before any read(X)

A transaction must issue read_lock(X), or write_lock(X) before any read(X).

In a shared/exclusive locking scheme what must be issued before any write(X)

― A transaction must issue write_lock(X) before any write(X)

In a shared/exclusive locking scheme what must be issued after read(X) or write(X)

― A transaction must issue unlock(X) after read(X) or write(X)

Does a shared/exclusive locking scheme guarantee stability?

Using shared/exclusive locking scheme does not guarantee serializability. More strict protocols are required, e.g. two-phase locking (2PL).

Define two-phase locking and each of the two phases in it

Two-Phase Locking (2PL): All locking operations precede the first unlock operation in a transaction. ― Growing/Expanding phase: during which new locks on items can be acquired but none can be released; ― Shrinking phase: during which existing locks can be released but no new locks can be acquired. * Advantage: it assures serializability

What problem can two phase locking cause

It can cause a deadlock which is a circular situation where each of two (or more) transactions are waiting to acquire locks that are held by the other.

How can deadlocks be prevented by using timeouts?

A transaction will wait for a (database defined) period to acquire a lock. * If this time runs out then the whole transaction is rolled back and restarted.

How can deadlocks be prevented by the use of a wait-for-graph(WFG)

* A Wait-for Graph (WFG) is constructed ― node for each transaction ― directed edge from transaction T1 to transaction T2 if T1 is waiting to lock an item currently held by T2 * A deadlock exists if the graph contains a cycle.

What needs to happen when a deadlock is detected? One issue that can occur in deadlock prevention is starvation, define starvation?

* Once deadlock is detected, DBMS needs abort one or more of the transactions. -The transaction with the minimum cost should be rolled back * Starvation: the same transaction is always chosen as a victim, and the transaction can never complete.

What are the two deadlock prevention algorithms and what do they do?

* Wait-die algorithm ― only older transactions can wait for younger ones * Wound-wait algorithm ― only younger transactions can wait for older ones

Define a timestamp in transactions and explain timestamping

Timestamp: A unique identifier created by the DBMS that indicates the relative starting time of a transaction. ― Either a logical counter or the system clock. * Timestamping: a concurrency control protocol that orders transactions in such a way that older transactions get priority in the event of conflict

Explain optimistic techniques

Optimistic techniques assume that conflict is rare. ― Before a transaction commits, a check is performed to determine whether conflict has occurred. ― If yes, roll back and restart.

Define a backup and the three different kinds

A backup is a copy of the database taken periodically and stored in a safe place which allows the database to be restored with an acceptable loss of data * Full backup ― The whole database. * Differential backup ― All changes made since the last full backup * Incremental ― All changes made since the last incremental backup`

Define database recovery and the two main techniques used for it

Recovery: The process of restoring the database to a correct state in the event of a failure. Two main recovery techniques ― Log-based ― Shadow paging

Explain a log and how it can be used as recovery

Log: a sequence of records, which maintains the records of actions performed by transactions, recovery can be performed by reversing these operations

Explain check-pointing in a log file and the operations involved

Checkpoints are scheduled at predetermined intervals and involve the following operations: ― Write all log records in main memory to secondary storage ― Write the modified blocks in the buffers to secondary storage. ― Write a checkpoint record to the log file. This record contains transactions which are active at the time of the checkpo

When a crash occurs in a database how are logs and Checkpoints used?

When a failure occurs, ― The system check the log file backwards from the end to the last checkpoint. ― REDO all the transactions that committed since the last checkpoint. ― UNDO all the transactions that were active at the time of the crash.

What are the two types of database updating schemes and what do they entail?

Two types of DB updating schemes: ― Deferred update: Updates are not written to the database until after a transaction has reached its commit point. ― Immediate update: Updates are applied to the database as they occur without waiting to reach the commit point.

Define shadow paging

Maintain two tables during the lifetime of a transaction: ― A current page table ― A shadow page table During a transaction the current page is copied into a shadow page table and the shadow page table is never modified during transaction execution. If the transaction fails the current page table is discarded and the shadow table is used instead

What are advantages and disadvantages of shadow paging?

Advantage: ― The overhead of maintaining the log file is eliminated. ― The recovery is significantly faster since there is no need for UNDO or REDO operations. Disadvantage: ― Data fragmentation ― Need for periodic garbage collection to reclaim inaccessible blocks

What are some things that can cause security issues in databases

Caused by – Unauthorized users – Authorized users – Programmers - DBA

What can bad security cause a loss of (CIA)

― Confidentiality: the protection of data from unauthorized disclosure ― Integrity: the requirement that information is protected from improper modification. ― Availability

Explain Discretionary Access Control (DAC)

― Discretionary Access Control (DAC) * The owner of the object specifies which subjects can access the object. * Used by most commercial DBMSs * Supported by SQL: GRANT and REVOKE

Explain Mandatory Access control

Mandatory Access Control (MAC) * The system (not the users) specifies which subjects can access specific data objects. * Not supported by SQL

How would i use SQL to grant rfc@localhost the INSERT privilege on classicmodels database?

GRANT INSERT ON classicmodels.* TO rfc@localhost; Note: the opposite of GRANT is REVOKE

How does the Bell-LaPadula model work?

A subject S is only allowed to read an object O if it has a higher level of clearance than the object ― Class(O): the classification of an object O (relation, tuple, column, view, operation) ― Class(S): the clearance of a subject S (user, account, program)

What are the 4 security Levels in the Bell-LaPadula model?

Top Secret (TS) > Secret (S) > Confidential (C) > Unclassified (U)

How can views be used for security?

The view mechanism provides a powerful and flexible security mechanism by hiding parts of the database from certain users.

What are the 3 different targets of encryption in a database?

― Disk ― Database ― Network traffic

How can an audit be used in database security?

Rather than preventing certain actions, record who did what. ― Keep track of all operations from each user through each login session. ― The audit trail (or system log) itself must be secured

What is a Redundant Array of Independent Disks(RAID) and what is it used to do?

A data storage technique that combines multiple physical disk drive components into one logical unit *improves reliability * increases performance * improves availability

What is Data striping is a technique used in a RAID, what does it do?

Data striping: the data is segmented into equal-size partitions (the striping unit) which are transparently distribute across multiple disks. ― Improves overall I/O performance by allowing multiple I/Os to be serviced in parallel.

What is a Parity (or error correcting) scheme ?

Each byte may have a parity bit associated with it that records whether the number of 1s is even or odd. ― Even parity: number of 1’s including parity bit is even * E.g., 0110110 → 01101100, 0101100 → 01011001 ― Odd parity: number of 1’s including parity bit is odd * E.g., 0110110 → 01101101, 0101100 → 01011000 ― Improves reliability

What are the 4 different RAID schemes?

― RAID 0 : striping ― RAID 1: mirroring ― RAID 10 ― RAID 5

What is a pro and a con for RAID 0: striping

― Pros: improved I/O performance ― Cons: complete vulnerability to drive failures

What is a pro and a con for RAID 1: mirroring

― Pros: Fast read, availability improvement ― Cons: Write throughput is always slower because every drive must be updated. Most expensive storage solution.

What is a pro and a con for RAID 10: striping and mirroring

― Pros: improved I/O performance, rebuilding time is fast if one drive fails. ― Cons: Still expensive because of mirroring.

What is a pro and a con for RAID 5: striping and parity

― The parity data is also striped across all the disk. ― Pros: Resilience to a single drive failure, fast reading. ― Cons: Writing is slower (due to the parity needed be calculated), complex technology.

What is a distributed database?

Distributed database: a collection of multiple interconnected databases, which are ― spread physically across various locations ― communicate via a computer network.

What are two desired characteristics for distributed databases?

1. Distribution/location transparency ― Users perceive the database as a single, logical entity. 2. Local autonomy ― Local data is locally owned and managed. ― Each local database operated independently. ― Local systems are able to operate effectively even if remote databases are not available.

What are two types of distributed Database?

Homogeneous ― All sites use the same type of DBMS. Heterogeneous ― Sites run different DBMSs. ― Translation using gateways: convert the language and model of each different DBMS into the language and model of the relational system.

What is Data fragmentation in databases?

Fragmentation is to divide a table into a set of smaller tables to be stored in different sites. The subsets of the table are called fragments. ― Horizontal * Subsets of rows are stored at different sites. ― Vertical * Subsets of attributes are stored at different sites. ― Hybrid (Mixed)

What is the correctness rule in Data Fragmentation

Correctness rules: ― Completeness: Each data item can be found in R must appear in at least one fragment. ― Reconstruction * Horizontal: 𝑅 = 𝑅1 ∪ 𝑅2 … ∪ 𝑅𝑛 * Vertical: 𝑅 = 𝑅1 ⋈ 𝑅2 … ⋈ 𝑅𝑛 ― Disjointness * Horizontal: The fragments are disjoint. * Vertical: The fragments are disjoint except for the primary key

Define data allocation

Data allocation: The process of assigning each fragment to a particular site in the distributed system.

What are three types of Data replication and allocation?

* Fully replication ― A complete copy of the DB at each site * Non-redundant allocation ― Each fragment is stored at exactly one site * Partial replication ― Certain transactions that access particular parts of the DB are mostly submitted at a particular site, that fragment can be allocated at that site only. ― Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may limit replication.

What are some advantages of a distributed database?

* Reflects organizational structure * Easier expansion * Increased reliability and availability * Better response/Improved performance

What are some disadvantages of a distributed database?

* Complex to design and implementation * Harder to control security and integrity * Lack of standard and experience

Define a data warehouse and some properties

Data warehouse is a database, which – is kept separate from the organization's operational database, – no frequent updating – possesses consolidated historical data – helps the organization to analyse its business

What are some key features on a data warehouse

― Subject Oriented − e.g., a sales data warehouse, or a credit card transaction data warehouse, etc. ― Integrated − data combined from multiple sources. ― Time Variant − looking for changes over time ― Non-volatile − the previous data is not erased when new data is added to it. Frequent changes in operational database is not reflected in the data warehouse.

Define OLAP

Online Analytical Processing (OLAP): ― Uses multi-dimensional view of aggregate data to provide quick access to strategic information

Answer 75

Example 1 : Give all the products’ price a 3% increase UPDATE Products SET Price=Price*1.03; Example 2 UPDATE Customers SET ContactName = ‘Alfred Schmidt’, City= ‘Frankfurt’ WHERE CustomerID = ‘001’;

Answer 76

DELETE FROM

Answer 77

CREATE TABLE IF NOT EXISTS Tasks ( task_id INT AUTO_INCREMENT PRIMARY KEY, title VARCHAR(255) NOT NULL, start_date DATE, due_date DATE, status TINYINT NOT NULL, description TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); IF NOT EXISTS is optional

Answer 78

Numeric types -INT, TINYINT, DECIMAL, FLOAT, DOUBLE string types; -CHAR, VARCHAR, TEXT Dat/time types -DATE -TIME -DATETIME

Answer 79

– NOT NULL – UNIQUE – PRIMARY KEY – FOREIGN KEY – CHECK

Answer 80

NOT NULL: ensures values stored in a column are not NULL.

Answer 81

UNIQUE: ensures values in a column or a group of columns are different used like : CREATE TABLE table_name( ..., column_name data_type UNIQUE, ... );

Answer 82

CREATE TABLE table_name( ..., [CONSTRAINT constraint_name] UNIQUE(Column_list) );

Answer 83

CHECK: ensures that values stored in a column or group of columns satisfy a Boolean expression. used like : CREATE TABLE table_name( ..., column_name data_type CHECK(Boolean_expression), ... );

Answer 84

PRIMARY KEY ― Implicitly means NOT NULL + UNIQUE ― A table can have one and only one primary key. CREATE TABLE table_name( ..., column_name data_type PRIMARY KEY, ... );

Answer 85

* FOREIGN KEY: refers to the PRIMARY KEY in another table -The domain of the foreign key must be the same as the domain used in the referenced table CREATE TABLE Products( productId INT AUTO_INCREMENT PRIMARY KEY, productName VARCHAR(100) NOT NULL, categoryId INT, CONSTRAINT fk_category FOREIGN KEY (categoryId) REFERENCES categories(categoryId) ) ENGINE=INNODB;

Answer 86

― CASCADE: If a row from the parent table is deleted or updated, the values of the matching rows in the child table automatically deleted or updated. ― SET NULL: When referenced data in the parent key is deleted or updated, all rows in the child table that depend on those values have their foreign keys set to NULL. ― RESTRICT/NO ACTION: Reject the delete/update operation from the parent table, if a row from the parent table has a matching row in the child table.

Answer 87

ALTER TABLE Persons ADD COLUMN dateOfBirth date NOT NULL, ADD COLUMN joinDate date default CURRENT_DATE(); ALTER TABLE Persons DROP COLUMN dateOfBirth;

Answer 88

ALTER ALTER TABLE Vehicles MODIFY year SMALLINT NOT NULL, MODIFY color VARCHAR(20) NULL AFTER make;

Answer 89

Add a table constraint: ALTER TABLE table_name ADD [CONSTRAINT constraint_name] UNIQUE/CHECK/PRIMARY KEY/FOREIGN KEY (...); drop a table constraint ALTER TABLE table_name DROP CONSTRAINT constraint_name; drop primary/ foreign key ALTER TABLE table_name DROP PRIMARY KEY; ALTER TABLE table_name DROP FOREIGN KEY constraint_name;

Answer 90

ALTER TABLE Persons ADD CONSTRAINT UC_person UNIQUE(ID,lastName);

Answer 91

DROP TABLE Shippers;

Answer 92

Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table. ― Typically for primary key ― In MySQL, assign AUTO_INCREMENT to a column example: CREATE TABLE Persons ( personId int NOT NULL AUTO_INCREMENT PRIMARY KEY, lastName varchar(255) NOT NULL, firstName varchar(255), Age int );

Answer 93

ALTER TABLE Persons AUTO_INCREMENT 100; ^add a number on the end ¯\_(ツ)_/¯

Answer 94

A data structure such as B-Tree that improves the speed of data retrieval on a table at the cost of additional writes and storage to maintain it. Note: Updating a table with indexes takes more time than updating a table without. So, only create indexes on columns that will be frequently searched.

Answer 95

INDEX ( attributeName ) like : CREATE TABLE Persons ( personid int NOT NULL AUTO_INCREMENT PRIMARY KEY, lastName varchar(255) NOT NULL, firstName varchar(255), age int, INDEX (lastName) ); add to existing attribute: CREATE INDEX idx_persons ON Persons(lastName);

Answer 96

CREATE UNIQUE INDEX emp_u1 ON EMP (eName, job); This will Index and ensure that each combination of eName and job are unique.

Answer 97

CREATE [OR REPLACE] VIEW ViewName [column1,…] AS SELECT …; example horizontal view CREATE VIEW Emp_D20 AS SELECT * FROM Emp WHERE deptNo = 20; example vertical view CREATE VIEW Emp_NoSalary AS SELECT empNo, eName, job, mgr, hireDate, comm, deptNo FROM Emp;

Answer 98

CREATE VIEW Num_Emp_Job(deptNo,job,cnt) AS SELECT deptNo,job,COUNT(*) FROM Emp GROUP BY deptNo, job ORDER BY deptNo;

Answer 99

All updates to a base table are immediately reflected in all views that encompass that base table. Similarly, we may expect that if a view is updated then the base table(s) will reflect that change. ― For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the base table. e.g SELECT statement defining the view must be simple, meaning it doesn't involve complex expressions, aggregates, GROUP BY, DISTINCT, JOINs, or subqueries

Answer 100

* COMMIT ― Causes all outstanding changes that you have made to become permanent. * ROLLBACK ― Causes all uncommitted changes to be discarded

Answer 101

* Physical realization of the database and application designs ― Use DDL to create database schemas ― Use DML to populate the data ― Develop the application programs * Database transactions are implemented using the DML, possibly embedded in a host programming language.

Answer 102

The process of producing a description of the implementation of the database on secondary storage. ― Describe the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associate integrity constraints and security measures.

Answer 103

― Logical database design * Minimally in 3rd normal form ― Transaction analyses * Most frequent transactions * Most complex or resource-insensitive transactions * Distributions of transactions, over time * Most critical transactions to the applications ― Which DBMS software to use

Answer 104

STEP1: Translate the logical data model for target DBMS ― 1.1 Design base relations ― 1.2 Design representation of derived data ― 1.3 Design general constraints STEP2: Design file organizations and indexes ― 2.1 Choose file organizations ― 2.2 Choose indexes ― 2.3 Estimate disk space requirement STEP3: Design user views STEP4: Design security mechanisms STEP5: Consider controlled redundancy STEP6: Monitor and tune the operational system

Answer 105

* Defining Tables and Columns: including naming objects, choosing data types and lengths, and handling null, default values * Defining Primary Keys: may considering the use of surrogate keys * Handling referential integrity for foreign keys. ^This is basically this thingy , OrderDetails(OrderID, ProductID, Quantity)

Answer 106

A surrogate key is a unique identifier assigned to each record or row in a database table. Unlike natural keys, which are based on existing data attributes and have some inherent meaning, surrogate keys are system-generated and lack any meaningful information about the data they represent. Surrogate keys are typically used as primary keys in database tables.

Answer 107

Remember derived data is the data that's calculated based on the database and not physically stored 1. Use a generated column 2. Create a view 3. Use triggers* Triggers are stored programs executed automatically to respond to specific events such as inserting, updating or deleting data. 4. Calculate it each time

Answer 108

A database is stored as a collection of files. A database file is partitioned into fixed-length storage units called blocks. What is a block? Block/Page: A contiguous sequence of sectors from a single track

Answer 109

-Primary storage is volatile ( e.g main memory/cache memory -Primary storage is where the operations happen -Data blocks can be moved from primary to secondary storage -Secondary storage is where the entire database is stored e.g(magnetic disk, flash drives etc) -Stable storage is where the database is backed up (e.g RAID or a Redundant Array of Independent Disks) -Stable storage is always non-volatile with independent failure modes

Answer 110

The physical arrangement of data in a file

Answer 111

― Heap (unsorted) files: Records are placed on disk in no particular order ― Sequential (ordered) files: records are ordered by the value of a specific attribute. ― Hash files: Records are placed on disk according to a hash function

Answer 112

Clustered Index: With a clustered index the rows are stored physically on the disk in the same order as the index. ― Only one clustered index is allowed for each table ― If the indexed attribute chosen is a key of a relation, the index is called primary index. Non-Clustered(secondary) index: An index which maintains a logical ordering of data rows without altering the physical ordering of the rows. ― A table could have several secondary indexes ― Foreign keys are good candidates for non clustering indexes.

Answer 113

Denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. ― If performance is unsatisfactory and a relation has a low update and a very high query rate.

Answer 114

They can hold data derived in columns from other tables - stops it having to be derived freshly each time Decompose very large relations into smaller and more manageable pieces.

Answer 115

Transaction: an executing program that forms a logical unit of database operations A transaction includes one or more database operations, e.g., insertion, deletion, updating, retrieval.

Answer 116

* Atomicity: All or nothing e.g shouldn’t take money from A without giving it to B * Consistency: transforms the database from one consistent state to another consistent state e.g money isn't lost or gained Isolation: partial effects of incomplete transactions should not be visible to other transactions e.g. other transactions shouldn’t see such a change until completion. * Durability: successfully committed transactions are permanently recorded in the database, not lost, even in the event of a system failure e.g. After completion, such a change in A and B is saved in the database.

Answer 117

START TRANSACTION; COMMIT; e.g START TRANSACTION; UPDATE Accounts SET balance = balance - 100 WHERE account = 'A'; UPDATE Accounts SET balance = balance + 100 WHERE account = 'B’; COMMIT;

Answer 118

Schedule: an ordering of operations for concurrent transactions. There are Serial schedules and Non-serial schedules Serial schedule: a schedule in which the operations for concurrent transactions are not interleaved. * But, unacceptable in practice. Serial schedule: a schedule in which the operations for concurrent transactions are not interleaved. * But, unacceptable in practice.

Answer 119

Serializability and Recoverability

Answer 120

Conflicting operations: if two operations belong to different transactions, access the same database item, and at least one operation is a writing operation.

Answer 121

A schedule is considered recoverable if, in the presence of failures (such as crashes or errors), it is possible to restore the database to a consistent state. a schedule, where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then Ti should commit before Tj.

Answer 122

Concurrency Control is the process of managing simultaneous operations on the database without having them interfere with one another.

Answer 123

― Pessimistic: Assumes that conflict is likely and take steps to prevent it. * Locking, timestamping ― Optimistic: Assumes that conflict is unlikely and only checks for it when transaction commits

Answer 124

Locking: When a transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results. ― A read operation must acquire a shared/read lock ― An update operation must acquire an exclusive/write lock

Answer 125

Locks can be applied to: ― The whole database ― A file ― A page/a disk block ― A record ― A field value of a record

Answer 126

A transaction must issue read_lock(X), or write_lock(X) before any read(X).

Answer 127

― A transaction must issue write_lock(X) before any write(X)

Answer 128

― A transaction must issue unlock(X) after read(X) or write(X)

Answer 129

Using shared/exclusive locking scheme does not guarantee serializability. More strict protocols are required, e.g. two-phase locking (2PL).

Answer 130

Two-Phase Locking (2PL): All locking operations precede the first unlock operation in a transaction. ― Growing/Expanding phase: during which new locks on items can be acquired but none can be released; ― Shrinking phase: during which existing locks can be released but no new locks can be acquired. * Advantage: it assures serializability

Answer 131

It can cause a deadlock which is a circular situation where each of two (or more) transactions are waiting to acquire locks that are held by the other.

Answer 132

A transaction will wait for a (database defined) period to acquire a lock. * If this time runs out then the whole transaction is rolled back and restarted.

Answer 133

* A Wait-for Graph (WFG) is constructed ― node for each transaction ― directed edge from transaction T1 to transaction T2 if T1 is waiting to lock an item currently held by T2 * A deadlock exists if the graph contains a cycle.

Answer 134

* Once deadlock is detected, DBMS needs abort one or more of the transactions. -The transaction with the minimum cost should be rolled back * Starvation: the same transaction is always chosen as a victim, and the transaction can never complete.

Answer 135

* Wait-die algorithm ― only older transactions can wait for younger ones * Wound-wait algorithm ― only younger transactions can wait for older ones

Answer 136

Timestamp: A unique identifier created by the DBMS that indicates the relative starting time of a transaction. ― Either a logical counter or the system clock. * Timestamping: a concurrency control protocol that orders transactions in such a way that older transactions get priority in the event of conflict

Answer 137

Optimistic techniques assume that conflict is rare. ― Before a transaction commits, a check is performed to determine whether conflict has occurred. ― If yes, roll back and restart.

Answer 138

A backup is a copy of the database taken periodically and stored in a safe place which allows the database to be restored with an acceptable loss of data * Full backup ― The whole database. * Differential backup ― All changes made since the last full backup * Incremental ― All changes made since the last incremental backup`

Answer 139

Recovery: The process of restoring the database to a correct state in the event of a failure. Two main recovery techniques ― Log-based ― Shadow paging

Answer 140

Log: a sequence of records, which maintains the records of actions performed by transactions, recovery can be performed by reversing these operations

Answer 141

Checkpoints are scheduled at predetermined intervals and involve the following operations: ― Write all log records in main memory to secondary storage ― Write the modified blocks in the buffers to secondary storage. ― Write a checkpoint record to the log file. This record contains transactions which are active at the time of the checkpo

Answer 142

When a failure occurs, ― The system check the log file backwards from the end to the last checkpoint. ― REDO all the transactions that committed since the last checkpoint. ― UNDO all the transactions that were active at the time of the crash.

Answer 143

Two types of DB updating schemes: ― Deferred update: Updates are not written to the database until after a transaction has reached its commit point. ― Immediate update: Updates are applied to the database as they occur without waiting to reach the commit point.

Answer 144

Maintain two tables during the lifetime of a transaction: ― A current page table ― A shadow page table During a transaction the current page is copied into a shadow page table and the shadow page table is never modified during transaction execution. If the transaction fails the current page table is discarded and the shadow table is used instead

Answer 145

Advantage: ― The overhead of maintaining the log file is eliminated. ― The recovery is significantly faster since there is no need for UNDO or REDO operations. Disadvantage: ― Data fragmentation ― Need for periodic garbage collection to reclaim inaccessible blocks

Answer 146

Caused by – Unauthorized users – Authorized users – Programmers - DBA

Answer 147

― Confidentiality: the protection of data from unauthorized disclosure ― Integrity: the requirement that information is protected from improper modification. ― Availability

Answer 148

― Discretionary Access Control (DAC) * The owner of the object specifies which subjects can access the object. * Used by most commercial DBMSs * Supported by SQL: GRANT and REVOKE

Answer 149

Mandatory Access Control (MAC) * The system (not the users) specifies which subjects can access specific data objects. * Not supported by SQL

Answer 150

GRANT INSERT ON classicmodels.* TO rfc@localhost; Note: the opposite of GRANT is REVOKE

Answer 151

A subject S is only allowed to read an object O if it has a higher level of clearance than the object ― Class(O): the classification of an object O (relation, tuple, column, view, operation) ― Class(S): the clearance of a subject S (user, account, program)

Answer 152

Top Secret (TS) > Secret (S) > Confidential (C) > Unclassified (U)

Answer 153

The view mechanism provides a powerful and flexible security mechanism by hiding parts of the database from certain users.

Answer 154

― Disk ― Database ― Network traffic

Answer 155

Rather than preventing certain actions, record who did what. ― Keep track of all operations from each user through each login session. ― The audit trail (or system log) itself must be secured

Answer 156

A data storage technique that combines multiple physical disk drive components into one logical unit *improves reliability * increases performance * improves availability

Answer 157

Data striping: the data is segmented into equal-size partitions (the striping unit) which are transparently distribute across multiple disks. ― Improves overall I/O performance by allowing multiple I/Os to be serviced in parallel.

Answer 158

Each byte may have a parity bit associated with it that records whether the number of 1s is even or odd. ― Even parity: number of 1’s including parity bit is even * E.g., 0110110 → 01101100, 0101100 → 01011001 ― Odd parity: number of 1’s including parity bit is odd * E.g., 0110110 → 01101101, 0101100 → 01011000 ― Improves reliability

Answer 159

― RAID 0 : striping ― RAID 1: mirroring ― RAID 10 ― RAID 5

Answer 160

― Pros: improved I/O performance ― Cons: complete vulnerability to drive failures

Answer 161

― Pros: Fast read, availability improvement ― Cons: Write throughput is always slower because every drive must be updated. Most expensive storage solution.

Answer 162

― Pros: improved I/O performance, rebuilding time is fast if one drive fails. ― Cons: Still expensive because of mirroring.

Answer 163

― The parity data is also striped across all the disk. ― Pros: Resilience to a single drive failure, fast reading. ― Cons: Writing is slower (due to the parity needed be calculated), complex technology.

Answer 164

Distributed database: a collection of multiple interconnected databases, which are ― spread physically across various locations ― communicate via a computer network.

Answer 165

1. Distribution/location transparency ― Users perceive the database as a single, logical entity. 2. Local autonomy ― Local data is locally owned and managed. ― Each local database operated independently. ― Local systems are able to operate effectively even if remote databases are not available.

Answer 166

Homogeneous ― All sites use the same type of DBMS. Heterogeneous ― Sites run different DBMSs. ― Translation using gateways: convert the language and model of each different DBMS into the language and model of the relational system.

Answer 167

Fragmentation is to divide a table into a set of smaller tables to be stored in different sites. The subsets of the table are called fragments. ― Horizontal * Subsets of rows are stored at different sites. ― Vertical * Subsets of attributes are stored at different sites. ― Hybrid (Mixed)

Answer 168

Correctness rules: ― Completeness: Each data item can be found in R must appear in at least one fragment. ― Reconstruction * Horizontal: 𝑅 = 𝑅1 ∪ 𝑅2 … ∪ 𝑅𝑛 * Vertical: 𝑅 = 𝑅1 ⋈ 𝑅2 … ⋈ 𝑅𝑛 ― Disjointness * Horizontal: The fragments are disjoint. * Vertical: The fragments are disjoint except for the primary key

Answer 169

Data allocation: The process of assigning each fragment to a particular site in the distributed system.

Answer 170

* Fully replication ― A complete copy of the DB at each site * Non-redundant allocation ― Each fragment is stored at exactly one site * Partial replication ― Certain transactions that access particular parts of the DB are mostly submitted at a particular site, that fragment can be allocated at that site only. ― Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may limit replication.

Answer 171

* Reflects organizational structure * Easier expansion * Increased reliability and availability * Better response/Improved performance

Answer 172

* Complex to design and implementation * Harder to control security and integrity * Lack of standard and experience

Answer 173

Data warehouse is a database, which – is kept separate from the organization's operational database, – no frequent updating – possesses consolidated historical data – helps the organization to analyse its business

Answer 174

― Subject Oriented − e.g., a sales data warehouse, or a credit card transaction data warehouse, etc. ― Integrated − data combined from multiple sources. ― Time Variant − looking for changes over time ― Non-volatile − the previous data is not erased when new data is added to it. Frequent changes in operational database is not reflected in the data warehouse.

Answer 175

Online Analytical Processing (OLAP): ― Uses multi-dimensional view of aggregate data to provide quick access to strategic information

DataBases Flashcards

(201 cards)