Database Theory and Design Flashcards

1
Q

What is a database?

A

An organised collection of related data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a DBMS?

A

Software that manages &controls access
to the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a database application?

A

A program that interacts with the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the disadvantages of file-based systems?

A

Data Duplication. Data dependence. Incompatible file formats. Fixed queries/proliferation of application programs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the differences between Database Systems and File-based systems.

A

In the file-based approach, each application program defines and manages its own data.
While in the database approach, a shared database is stored to control the redundancy and ensure the data consistency and integrity.
In the file-based approach, each application program directly accesses the data.
While in the database approach, all applications access the database through DBMS which provides a few important access control facilities.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the three level architecture of a DB?

A

External - Describes what part of the database that is relevant to each user.
Conceptual - Describes what data is stored in the database and the relationships among the data.
Internal - Describes how the data is stored in the DB. Physical representation of the DB.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is data independence?

A

Data independence is the capacity that upper levels are unaffected by changes to lowerlevel.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the two kinds of data independence?

A

Physical and Logical. Physical/Logical is the ability to make changes to the physical storage structures/logical structure without affecting application programs that access the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Which model is the most popular logical data model?

A

The Entity-Relationship Model.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is Database Scheme and Database State?

A

The database schema is the design of the database. It represents the organisation of data and the relationships between different elements. The database state represents the actual data stored in the database in a particular point in time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the 4 components of an er model?

A

Entity, Relationship, Attribute, Multiplicity Constraint.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a weak entity?

A

A weak entity is an entity in a database that does not have a primary key attribute of its own and is dependent on another entity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is aggregation?

A

An abstraction through which relationships are treated as higher-level entities.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a relation?

A

A 2d table that has specific characteristics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is Entity Integrity?

A

Entity Integrity ensures that row in a tuple is uniquely identified by a unique primary key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is Referential Integrity?

A

Referential Integrity ensures the consistency of relationships through foreign keys. Each foreign key refers to the primary key of another table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are Domain Constraints?

A

Domain Constraints ensure that values stored in a database conform to predefined data types and constraints.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are the main objectives of normalisation?

A

Remove redundancy, and remove potential for insert, update and delete anomalies.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is a functional dependency?

A

When the value of one attribute is determined by the value of another attribute.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a transitive dependency?

A

this occurs when a functional dependency indirectly influences another attribute through a chain of dependencies. For example if A is needed for B, and B is needed for C, then C has a transitive dependency on A.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is a partial functional dependency?

A

when a non pk attribute is functionally dependent on a part, but not the whole, of a composite pk.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

How does select work in relational algebra?

A

It selects all tuples that satisfy the condition from a relation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How does projection work in relational algebra?

A

Produces a new relation with only some of the attributes from an old relation. Removes duplicate tuples.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How does rename work in relational algebra?

A

Renames a relation to another name.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

How does Union and Intersection work in relational algebra?

A

Union produces a relation that includes all tuples in R1 or R2, or both R1 and R2. Intersection produces a relation of all the tuples in both R1 and R2. R1 and R2 must be union compatible.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How does Difference and Cartesian Product work in relational algebra?

A

Difference produces a relation that includes all tuples in R1 that are not in R2. R1 and R2 must be union compatible. Cartesian Product produces a relation that has the attributes of R1 and R2 and includes all possible combination of tuples.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

How does Theta Join Equijoin and Natural Join work in relational algebra?

A

The Theta Join is a generic join operation that combines rows from two relations based on a specified condition that is not necessarily an equality condition. Equijoin is a version of theta join that uses an equality condition. Natural Join is a special case of Equijoin where the join condition is automatically determined by matching columns with the same name in the participating relations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What is DDL?

A

Data Definition Language. The SQL code for defining the database structure and controlling access to the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

What is DML?

A

Data Manipulation Language. Used for retrieving and updating data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What does this do?

SELECT empno, ename, job, sal
FROM emp
WHERE hiredate BETWEEN ’01-JAN-81’ AND
’31-DEC-81’
AND job IN (’MANAGER’,’PRESIDENT’)
AND comm IS NULL
AND (sal>=2600 OR sal<=10000)
ORDER BY job, sal DESC;

A

Selects fields empno, ename, job and sal from the emp relation. Selects rows where the hiredate value is between 1 Jan 1981 and 31 Dec 1981
and the job is MANAGER or PRESIDENT, and comm is NULL and the sal is between 2600 and 10000. It then orders by job, then salary in descending order.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What does this do?

SELECT COUNT(CustomerID), Country
FROM Customers
WHERE Country LIKE ‘U%’
GROUP BY Country
HAVING COUNT(CustomerID)>5;

A

Selects the field Country and the Count of the field CustomerID from the Customers relation. Selects rows where the Country starts with a U. Then groups by the unique values in the Country column, and filters the grouped results to include only countries which have a count of customerID greater than 5.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

What does this do?

SELECT *
FROM Products
WHERE SupplierID IN (SELECT SupplierID FROM Suppliers
WHERE Country=‘UK’)

A

Retrieves all fields from Products. Selects rows where the supplierID is in the result of the subquery which retrieves the supplierID field from the Suppliers relation and filters to rows where the country is the UK.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

What does this do?

SELECT C.CustomerName, O.OrderID
FROM Customers C
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
ORDER BY C.CustomerName;

A

Selects the columns CustomerName and OrderID from Customers and Orders, and performs a left join on both tables based on CustomerID. Orders by the customer name.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

How does insert work in SQL?

A

INSERT INTO table_name (column1, column2, column3, …)
VALUES (value1, value2, value3, …);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

How does update work in SQL?

A

UPDATE table_name
SET column1 = value1, column2 = value2, …
WHERE condition;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

How does an Inner Join work?

A

Returns only the rows where there is a match in both tables based on the specified condition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

How does a Left Join work?

A

Returns all rows from the left table and the matching rows from the right table. If no match is found, NULL values are returned for columns from the right table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

How does a Right Join work?

A

Returns all rows from the right table and the matching rows from the left table. If no match is found, NULL values are returned for columns from the left table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

How does a Full Join work?

A

Returns all rows when there is a match in either the left or the right table. If no match is found, NULL values are returned for columns from the table without a match.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

What does this do?

CREATE TABLE Products(
productId INT AUTO_INCREMENT PRIMARY KEY,
productName VARCHAR(100) NOT NULL,
categoryId INT,
CONSTRAINT fk_category FOREIGN KEY (categoryId)
REFERENCES categories(categoryId)
) ENGINE=INNODB;

A

Creates a table named Products. Defines the columns as: a primary key called productID which is an auto incremented integer, a productName which is a VARCHAR(100) and cant be null, and a categoryID which is an int. It then defines a constraint called fk_category which references the categoryID column with the categoryID column in the Categories table. It lastly defines the storage engine.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

How does ON DELETE SET NULL and ON DELETE CASCADE WORK?

A

Both refer to what happens to a foreign key when its references record is deleted. SET NULL sets all foreign key values to null, while CASCADE deletes them all.

42
Q

What does this do?

CREATE INDEX emp_n1
ON EMP (eName);

A

Creates an index named emp_n1 on the EMP table based around the eName column.

43
Q

What does this do?

CREATE VIEW Emp_D20_NoSalary AS
SELECT empNo, eName, job, mgr,
hireDate, comm
FROM Emp
WHERE deptNo = 20;

A

Creates a view named EMP_D20_NoSalary and selects fields from the EMP table for it. Filters by rows where deptNo is 20.

44
Q

What is an updatable view?

A

A view in which users can perform data modification operations such as INSERT, UPDATE and DELETE directly on underlying tables through this view.

45
Q

What do COMMIT and ROLLBACK do?

A

They are used for transaction management and allow for transactions to be easily done if needed. COMMIT finalizes changes and ROLLBACK rolls back any changes made in the transaction.

46
Q

What is primary, secondary and stable storage?

A

Primary is RAM, which is fast but volatile and used for temporary data storage. Secondary is HDD and SSDs which are slower but provide long term non volatile storage for files and data. Stable storage is highly reliable and fault-tolerant storage that survives various failures.

47
Q

What is denormalization and when should we use it?

A

It is the process of improving the read performance of a database at the expense of write speed, by adding redundant copies of data or grouping data. It is used when performance is unsatisfactory and a relation has low update and high query rate.

48
Q

What does a block/page mean in a database?

A

A “block” or “page” refers to a unit of data storage within a database file system.

49
Q

What are the four desired properties for database transactions (ACID)?

A

Atomicity, Consistency, Isolation and Durability.

50
Q

What is Atomicity?

A

Ensures that a transaction is treated as a single indivisible unit of work, so either all changes made by the transaction are committed, or none are.

51
Q

What is Consistency?

A

Consistency ensures that a transaction brings the database from one valid state to another. For example if money is being transferred from one account to another, one account must gain that money, and one must lose it.

52
Q

What is Isolation?

A

Isolation ensures that the execution of one transaction is isolated from the execution of other transactions. Even if they are executed concurrently the result should be the same as if they were executed sequentially.

53
Q

What is Durability?

A

Guarantees that once a transaction is committed, its effects persist in the database, even in the event of system failures.

54
Q

What are the two desired properties for a schedule?

A

Serializability and Recoverability. Serializability ensures that the execution of transactions preserves consistency, while recoverability ensures the ability to restore the database to a consistent state after a failure.

55
Q

What is a transaction?

A

A transaction is a unit of work in a database that consists of one or more operations, such as reading or updating data.

56
Q

Given a schedule of a few concurrent transactions,
how do we know if it is serializable?

A

If when a precedence graph is drawn, there are no cycles.

57
Q

What is Concurrency Control?

A

The process of managing simultaneous operations on the database without having them interfere with each other.

58
Q

What is a pessimistic view on concurrency control?

A

That conflict is likely and to take steps to prevent it such as locking and timestamping.

59
Q

What is locking?

A

Locking is used to deny access of parts of a database to other transactions. A read operation must acquire a shared/read lock and an update operation must acquire an exclusive/write lock. After a transaction is done with it it must issue an unlock.

60
Q

What is two phase locking?

A

Two-Phase Locking is a concurrency control mechanism in databases. It has two phases: the growing phase (acquiring locks) and the shrinking phase (releasing locks). Once a transaction releases a lock, it cannot acquire new locks. Ensures serializability and prevents certain types of conflicts.

61
Q

What is an issue with locking?

A

Deadlocking, which is a circular situation where each of two or more transactions are waiting to acquire locks that are held by the other.

62
Q

What are timeouts?

A

A solution to deadlocking. A transaction will wait for a database defined period of time to acquire a lock, if this time runs out the transaction is rolled back and restarted.

63
Q

What is deadlock prevention and recovery?

A

A solution to deadlocking. A wait for graph is constructed, with a line from every T to the T it is waiting for the unlock from. If there is a cycle, there is deadlock. If it detects deadlock then it needs to abort one or more of the transactions. It must decide which transaction to roll back (the transactions that incur minimal costs), how far to rollback (partial or total), and avoid starvation (when the same transaction is always rolled back so never completes.

64
Q

What is Deadlock Prevention?

A

Each transaction is timestamped. Wait-die algorithm is where only older transactions can wait for younger ones. Otherwise, the older transaction is rolled back and restarted with a new timestamp. Wound-wait algorithm is where only younger transactions can wait for older ones. Otherwise, the younger transaction is rolled back and restarted with a new timestamp.

65
Q

What is a Timestamp and Timestamping?

A

A timestamp is a unique identifier created by the DBMS that indicates the relative start time of a transaction. Timestamping is a concurrency control protocol that orders transactions in such a way that older transactions get priority in the event of conflict.

66
Q

What are the three types of backups?

A

A full backup which backs up the entire database. A differential backup which adds all changes made since last full backup. Incremental backup which adds all changes made since last incremental backup.

67
Q

What is a Deferred Update scheme?

A

Updates are not written to the database until after a transaction has commited.

67
Q

What is an Immediate Update scheme?

A

Updates are applied to the database as they occur.

67
Q

What is a log file?

A

A log file is a sequence of records which contains the actions performed by each transaction. These contains checkpoints which are the points of synchronization between the database and the log file. Checkpoints are schedules at predetermined intervals and back up the current form of the database (log records, modified blocks in the buffers), and write a checkpoint record to the log file which contains the active transactions at the time of the checkpoint.

68
Q

How does Log-Based Recovery Work?

A

When a failure occurs, the system checks the log file backwards until it finds a checkpoint. It redoes all the transactions that committed since the last checkpoint and undoes all the transactions that were active at the time of the crash.

If a DB is using Deferred Updates, then in the event of failure go back to the last checkpoint, redo transactions which have been committed after the last checkpoint in the order they are written in the log.

If a DB is using Immediate Updates, then in the event of failure go back to the last checkpoint, redo transactions which have been committed after the last checkpoint in the order they are written in the log, and undo transactions with start but no commit in the reverse order to which they were written in the log.

68
Q

What is shadow paging?

A

Shadow paging uses the non-volatile properties of virtual memory to create copies stored on the secondary storage. Every time a transaction commences, a shadow page is stored in the virtual memory, and only the actual page is changed. Once committed the actual page is copied into the shadow page. As the shadow page is stored in the virtual memory, which is non-volatile, if a system crash occurs, then all the shadow pages are copied to the current pages.

69
Q

What are the advantages and disadvantages of Shadow Paging?

A

Advantages:
- The overhead of maintaining the Log File is eliminated.
- The recovery is significantly faster since there is no need for UNDO or REDO operations.

Disadvantages:
- Data Fragmentation
- Need for periodic garbage collection to reclaim inaccessible blocks.

70
Q

What is the DBA?

A

Database Administrator. The central authority for managing a database system securely. They manage account creation, grant and revoke privileges, and do security level assignment.

70
Q

What are CIA?

A

Confidentiality (The protection of data from unauthorized disclosure), Integrity (The requirement that information is protected from improper modification) and Availability (The requirement that data is available to those who should have access).

71
Q

What is Authorization?

A

The granting of a right or privilege that enables a subject to have legitimate access to a system or a systems object.

72
Q

What is Authentication?

A

Proving someone is who they claim to be. Type 1: What you know (password); Type 2: what you have (physical card); Type 3: What you are (e.g Biometrics).

72
Q

What is DAC?

A

Discretionary Access Control. With DAC the owner of the object specifies which subjects can access the object. This is used by most commercial DMBSs and is supported by SQL.

73
Q

What is MAC?

A

Mandatory Access Control. The system specifies which subjects can access specific data objects. This is not supported by SQL.

74
Q

How can views be used for security?

A

The view mechanism provides a powerful and flexible security mechanism by hiding parts of the database form certain users.

75
Q

How can encryption be used for security?

A

Encoding of the data so that it is unreadable if without the decryption key. This includes an encryption key, encryption algorithm, decryption key and decryption algorithm. - With encryption you can target the disk, database, or network traffic.

76
Q

How can an Audit be used for security?

A

Rather than preventing certain actions, record who did what. Keep track of all operations from each user through each login centre. The audit trail itself must be secure.

77
Q

What is RAID?

A

Redundant Array of Independent Disks. A data storage technique that combines multiple physical disk drive components into one logical unit to improve reliability, increase performance, and improve availability.

78
Q

What is Data Striping with RAID?

A

Data Striping is when data is segmented into equal-size partitions which are transparently distributed across multiple disks. It improves overall I/O performance by allowing multiple I/O’s to be serviced in parallel

79
Q

What is a Parity Scheme

A

Each byte may have a parity bit associated with it that records whether the number of 1s is even or odd. This improves reliability.

79
Q

How does RAID 0 work?

A

Data Striping. Had good I/O performance however if one drive fails, all do.

80
Q

How does RAID 1 work?

A

Mirroring. Fast read times and availability improvement. Cons are write throughput is always slower and it is expensive and does not increase storage size.

81
Q

How does RAID 10 work?

A

Striping and Mirroring. Improved I/O performance, and rebuilding time is fast if one drive fails, however still expensive.

82
Q

How does RAID 5 work?

A

Striping and Parity. The parity data is also striped across the disk. It is resilient to a single drive failure and has a fast read time. However, it has a slow write time due to calculation of parity.

83
Q

What is a Distributed Database?

A

A distributed database is a collection of multiple interconnected databases which are spread physically across several locations and communicate via a computer network.

84
Q

What are the desired characteristics of a distributed database?

A
  • Users perceive the database as a single, logical entity.
  • Local data is locally owned and managed.
  • Each local database operated independently.
  • Local systems can operate effectively even if remote databases are not available.
85
Q

What are Homogenous and Heterogenous Distributed Databases?

A

Homogenous is when all sites run the same type of DBMS, Heterogeneous is when sites run different DMBSs.

86
Q

What is Fragmentation?

A

Fragmentation is to divide a table into a set of smaller tables to be stored in different sites. The subsets of the table are called fragments. It can be done in two ways horizontal (splitting by rows) or vertical (splitting by columns). When doing horizontal fragmentation subsets of rows are stored at different sites. For vertical fragmentation subsets of attributes are stored at different sites.

87
Q

What are the Correctness rules?

A
  • Completeness: Each data item can be found in R must appear in at least one fragment.
  • Reconstruction: This ensures the ability to re-construct the original table from the fragments created. If table R is portioned into fragments R1, R2, R3 etc, then R = R1 U R2 U R3 U … U Rn.
  • Disjointness: No record will become a part of two or more different fragments apart from the primary key.
88
Q

What are the three types of data allocation?

A
  • Full Replication: A complete copy of the DB at each site.
  • Non-redundant allocation: Each fragment is stored at exactly one site.
  • Partial Replication: Certain Transactions that access parts of the DB are mostly submitted at a particular site, that fragment can be allocated at that site only. Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may limit replication.
89
Q

What are the advantages of a distributed database?

A
  • Reflects organizational structure.
  • Easier expansion.
  • Increased reliability and availability.
  • Better response/Improved performance.
90
Q

What are the disadvantages of a distributed database?

A
  • Complex to design and implementation.
  • Harder to control security and integrity.
  • Lack of standard and experience.
90
Q

What is a data warehouse?

A

A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process. Data warehouses are mainly used in things such as financial services, banking services, consumer goods, retail sectors and controlled manufacturing.

90
Q

What is ETL (Extraction, Transformation, Loading)?

A

Extraction: Get the data; Transformation: Make it useful; Loading: Save it to the warehouse.

91
Q

What is OLAP(Online Analytical Processing)?

A

Uses multi-dimensional view of aggregate data to provide quick access to strategic information.

92
Q

What is Data Mining?

A

Process of discovering meaningful new correlation, patterns, and trends by using statistical, mathematical, and AI techniques.