DataBases Flashcards

(201 cards)

1
Q

Definition of a database

A

an organised collection of
related data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Definition of DBMS

A

Database Management System (DBMS):
software that manages & controls access to
the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a database application?

A

Database Application: program that interacts
with the database at some point in its
execution

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a database system?

A

Database System: a collection of programs that interact with the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Define a file based & database system and the difference between them

A

File-based system: A collection of application programs that perform services for the end-users. Each program defines and manages its own data.
* Database system: A shared collection of logically related data and its description, designed to meet the information needs of an organisation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What are advantages and disadvantages of a file-based approach?

A

Advantages:
* User control:
― End user has total control of the stored data
― Can modify application or data at will
* Applications and data file can be optimised for particular
tasks

Disadvantages:
* Data Duplication
* Data Dependence
* Incompatible file formats
* Proliferation of application programs (APPS)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Define data duplication

A

A record is duplicated in > 1 files
― A change in details requires updates to all relevant files
* Integrity of records may be compromised if the files are not all updated
― i.e. Which file holds the correct information?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Define data dependence and the two types

A

Definition:

Data dependence: Refers to the relationship between database operations based on their reliance on each other’s results.
Types:

RAW Dependence (Read-after-Write):

Operation depends on the result of a preceding write operation.
Example: Retrieving data after it has been updated.
WAR Dependence (Write-after-Read):

Write operation depends on the result of a preceding read operation.
Example: Updating data based on a prior retrieval.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the proliferation of apps and name a problem that occurs with it

A

In time, users might want to get more information from their data files than what they originally anticipated when they initially wrote the programs
* To do so they would need to add more application programs to handle the new queries
* This leads to a proliferation of files and application programs which each user has to handle

This can be addressed via the use of DBMS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Name 5 advantages of using the DBMS

A
  • Control of data redundancy
  • Data Consistency
  • Sharing of Data
  • Improved data integrity
  • Improved Security
  • Enforcement of Standards
  • Economy of Scale
  • Balance of conflicting requirements
  • Improved data accessibility and responsiveness
  • Increased Productivity
  • Improved maintenance through data independence
  • Increased concurrency
  • Backup and Recovery services
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

explain how the DBMS helps with the economy of scale

A
  • Combining the entire organisation’s data into one database and
    creating a set of applications that work on this one source
    results in cost savings
    ― For instance, only one information system department and one set of computer systems in the organisation will be responsible for the
    database as opposed to separate information systems in the different departments
    ― This is much cheaper than having many separate small departmental information systems units
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

explain how the DBMS helps with improved maintenance through data
independence

A
  • A DBMS separates data management from the application
    programs
    ― Hence if changes are made to the underlying data structure, the
    DBMS can handle this without changing application programs
  • The DBMS will provide the necessary mapping between
    application program and data stored on the database (using
    system catalogue)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are some disadvantages of DBMS (6 total)

A
  • Complexity
    ― DBMSs are complex pieces of software and hence require those
    operating and using them to fully understand them if they are to be used to the best advantage
  • Size
    ― The complexity and breadth of functionality of the DBMS makes it a large piece of software that requires a large amount of memory to run efficiently
  • Cost of DBMS
    ― DBMS cost varies depending on whether it is for a single user or a big organisation. Maintenance and operations costs also need to be factored in. For large organisations with lots of data, hardware costs (e.g. hard disks, high performance processors, memory) are
    considerable
  • Cost of conversion
    ― When converting from a file-based system to a DBMS, there are data conversion costs, hardware and software procurement costs, as well as costs of training and recruiting personnel
  • Performance
    ― The DBMS is written for general use. Hence some applications may not run as fast as customised file-based applications
  • Greater impact of failure
    ― Centralisation increases system vulnerability. Failure of the DBMS may result in everyone being affected within an organisation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the three levels present in three level database architecture(ANSI-SPARC)

A

External level
Conceptual level
Internal level

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Describe the External level of databases

A

It deals with how data is presented to and accessed by end users or applications.
Users at this level define their individual views of the database, specifying what data they need and how they want it to be displayed.
Multiple external views can exist, tailored to the specific needs and requirements of different user groups or applications.
Changes to the external level, such as modifications to views or access patterns, do not affect the underlying database structure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Describe the conceptual level

A

The conceptual level represents the abstract, logical structure of the entire database.
It defines the relationships among data elements, the constraints, and the integrity rules that must be maintained.
The schema at this level provides a comprehensive, unified view of the entire database, independent of any specific user’s perspective.
Modifications at the conceptual level, such as changes to the data model or structure, impact all external views that rely on it.
The conceptual level acts as a bridge between the external and internal levels, providing a logical representation that abstracts the physical storage details.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Describe the Internal level

A

The internal level is the lowest layer, dealing with the physical storage and retrieval of data.
It is concerned with how data is stored, indexed, and processed at the machine level.
The internal schema defines the storage structures, access paths, and techniques used for efficient data retrieval.
Changes at this level, such as modifications to indexing or storage methods, do not affect the external or conceptual levels.
The internal level provides a level of abstraction over the physical storage details, allowing changes in storage technology without impacting the higher levels.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Define data abstraction

A

Hide storage details and present the users with a conceptual view of the database
Data abstraction in databases refers to the process of simplifying complex details and presenting a high-level view of data to users and applications. It involves creating abstract representations of data that hide the implementation details and complexities of the underlying database structure

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Define data independence

A

Data independence in databases refers to the ability to modify the database schema or organization without affecting the applications that use the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is the difference between physical and logical data independence

A

Physical data independence hides how data is stored, allowing changes to storage structure without affecting users or applications while Logical data independence shields users from changes in the database structure, permitting modifications without disrupting user interactions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

what is a database schema

A

Database schema: the description of the structure of the
database
Database schema: the description of the structure of the
database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is a conceptual data model?

A

Identifies the high level data structure
* Independent of DBMS, application programs, physical considerations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is a Logical data model

A
  • Describes the data in terms of data structures such as
    ― Records (nodes) and sets (edges)
  • e.g., Hierarchical model, network model
    ― Relational tables and columns
  • E.g., Relational model
    ― Object-oriented classes
  • E.g., Object-oriented model, object-relational model
  • Independent of a particular DBMS product and storage
    technology.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is a physical data model

A
  • Describe how data is stored in the computer, representing
    record structures, record orderings, and access paths.
  • Highly dependent on the target DBMS
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Define a relationship and what a degree of relationship means
Relationship (type): meaningful associations among two or more entities. * Degree of Relationship: The number of participating entity types in a relationship e.g Binary if 2, ternary if three and Quaternary if 4
26
What is a recursive relationship?
Degree one: recursive ― A relationship between occurrences of the same entity type ― E.g., AcademicStaff (professor) supervises AcademicStaff (lecturer).
27
Define a candidate Key and a primary key
Candidate Key: The minimal number of attribute(s) whose value(s) uniquely identify each entity occurrence ― A candidate key can not contain a null Primary Key: The candidate key that is selected to uniquely identify each occurrence of an entity type * Principles of choosing the primary key: ― Attribute length ― Minimal number of attributes required ― The future certainty of uniqueness
28
What is a strong/weak entity type?
Weak Entity Type: existence-dependent on some other entity ― Each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type strong is not that ig
29
Define multiplicity
Multiplicity: the number of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship e.g that 1..* stuff Representation of Multiplicity Constraint Meaning 0..1 Zero or one entity occurrence 1..1 (or just 1) Exactly one entity occurrence 0..* (or just *) Zero or many entity occurrences 1..* One or many entity occurrences 3..7 Minimum of 3 up to a maximum of 7entity occurrences 1, 5, 10-12 1 or 5 or 10 to 12 entity occurrences
30
What is cardinality and Participation in multiplicity ?
Cardinality: the maximum values for the multiplicity ranges on either side of the relationship. ― One-to-one (1:1) ― One-to-many (1:*) ― Many-to-many (*:*) Participation: the minimum values for the multiplicity ranges on either side of the relationship ― Optional participation (if 0) ― Mandatory participation (if 1 or more)
31
Which side of the relationship does participation appear on?
The participation of entities appears on the opposite side of the relationship - google if u confused it's kinda wierd
32
What is a fan trap and when may it occur?
A fan trap is a type of join path between three tables when a "one-to-many" join links a table which is in turn linked by another "one-to-many" join. The fanning out effect of "one-to-many" joins can cause incorrect results to be returned when a query includes objects based on both tables.
33
What is a chasm trap and when may it occur?
A chasm trap is a type of join path between three tables when two "many-to-one" joins converge on a single table, and there is no context in place that separates the converging join paths.
34
What are 9 steps of conceptual database design
* [Step 1] Identify entity types * [Step 2] Identify relationship types * [Step 3] Identify and associate attributes with entity and relationship types * [Step 4] Determine primary keys for entity types * [Step 5] Identify the multiplicity constraints * [Step 6] Consider use of enhanced modeling (optional) * [Step 7] Check model for redundancy, traps * [Step 8] Validate conceptual model against user transactions * [Step 9] Review conceptual data model with user
35
Explain specialization in terms of Entities
Specialisation: The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. e.g making more entities from one(Manager, SalesPersonnel and Secretary from Staff)
36
Explain generalization in terms of Entities
Generalisation: The process of minimizing the differences between entities by identifying their common characteristics.
37
What are some properties of relations
* A relation has a name that is distinct from all other relation names in the database * Each cell of the relation contains exactly one atomic (single) value * Each attribute has a distinct name * The values of an attribute are all from the same domain * Each tuple is distinct – there are no duplicate tuples * The order of attributes has no significance * The order of tuples has no significance
38
What is a relation schema and a relation instance
Relation schema: a named relation defined by the relation name (table name) and attributes. ― 𝑅(𝐴1, 𝐴2, … , 𝐴𝑛) Student (stdID, stdEmail, stdFstName, stdSecName, gender, programCode) Primary Key: stdID, Alternate Key: stdEmail Foreign Key: programCode references Program(programCode) * Relation instance/state: a set of tuples from a relation schema. ― r(R) ^This is like an actual database table
39
Name the three Integrity constraints put on attribute values
― Entity integrity ― Referential integrity ― Domain constrain
40
Define Entity Integrity
― Uniqueness: No two tuples can have identical values for candidate keys. ― Not-Null: No attribute of a candidate key can be a NULL. * Null: A value of an attribute is currently unknown or not applicable for this tuple.
41
Define referential integrity
Referential integrity: If a foreign key exists in a relation, ― either the foreign key value matches a primary key value in its home relation ― or the foreign key value is NULL
42
Define Domain constraints
Domain Constraints: enables the RDBMS to carry out checks on data errors and to determine the range of operations that can be carried out on the domain ― E.g., if the attribute Age is specified to be in the range16-30, any number outside would give an error ― E.g., If the attribute Name is to hold a set of characters, then any multiplication or division operations on the attribute values are illegal
43
Define views
View: a subset of base relation(s) Base relation: A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database
44
Name some features of views
Virtual/derived relation. It does not necessarily exist in the database, but can be produced upon request. * Dynamic. The changes made to the base relation(s) that affect the view are immediately reflected in the view. * Generated by applying appropriate relational operations (e.g., 𝜎, 𝜋, x, ⋈, ….)
45
What is the purpose of views in databases
* Security mechanism * Customize user’s need * Simplify complex operations on the base relations
46
How are composite attributes represented in relation schemas
For composite attributes, include only the constituent simple attributes. So for example staffName which is a composite attribute containing staffFName and staffLName would be ignored and the relation schema would look like this: AcademicStaff(staffID, staffFName,staffLname, academicPosition, gender, DOB)
47
What is Normalisation and what is the purpose of it?
Normalization: A technique for producing a set of relations with desired properties. ― remove redundancy ― remove potential for insertion, modification, deletion anomalies
48
If A and B are attributes of relation R, when is B functionally dependent on A? Out of A and B which one is the determinant
if each value of A is associated with exactly one value of B, this is denoted A -> B In this case A is the determinant
49
If A -> B and B -> C where is the transitive dependency
C is transitively dependent on A via B ... A -> C
50
Given A -> B what would make this a partial functional dependency?
if by removing some attribute(s) from A, the dependency still holds.
51
Given A -> B what would make this a full functional dependency
B is not functionally dependent on any subset of A.
52
What should be done to a database to put it into first normal form?
Repeating groups must be removed, a primary key must exist and duplicate rows must be removed. Additionaly there must be no order dependency
53
What should be done to a database to put it into second normal form?
All partial dependencies must be removed
54
Define a partial dependency
Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.
55
What should be done to a database to put it into third normal form
All transitive dependencies must be removed
56
Define a transitive dependency
A transitive dependency refers to some non-prime attribute other than the candidate key that depends on another non-prime attribute that is dependent entirely on the candidate key. ^this is hard to remember but it's not that hard to actually do Show_ID Telecast_ID Telecast_Type CD_Cost ($) F08 S09 Thriller 50 F03 S05 Romantic 30 F05 S09 Comedy 20 In this case Telecast_Type needs to be removed cos it depends on telecast ID which in turn depends on Show ID
57
What needs to occur for a database to be in Boyce-Codd Normal Form(BCNF)
Every determinant must be a candidate key or Every non-prime attribute must be functionally dependent on the entire primary key
58
What are the Unary operations that are represented by these symbols 𝜋, 𝜎, 𝜌
― Projection (𝜋 ) ― Selection (𝜎) ― Rename (𝜌)
59
What are the set operations that are represented by these symbols ∪, ∩, −, X
― Union (∪) ― Intersection (∩) ― Set difference (−) ― Cartesian Product (X)
60
What are the Binary operations represented by these symbols ⋈, (⋉, ⋊), ÷
― Join (⋈) ― Outer Join (⋉, ⋊) ― Division (÷)
61
How is the projection operator used?𝜋
𝝅<𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆 𝒍𝒊𝒔𝒕>(𝑹): Produce a new relation that has only some of 𝑅‘s columns e.g 𝝅ID, Name, Salary(Lecturer) Removes duplicate tuples
62
how is the selection operator used? 𝜎
𝝈<𝒔𝒆𝒍𝒆𝒄𝒕 𝒄𝒐𝒏𝒅𝒊𝒕𝒊𝒐𝒏>(𝑹): Produce a new relation that contains only those tuples of R that satisfy the condition. e.g 𝝈𝒅𝒆𝒑𝒕𝑰𝑫=′𝟎𝟐′∧ 𝒂𝒈𝒆>𝟓𝟎(𝑳𝒆𝒄𝒕𝒖𝒓𝒆𝒓) or 𝝈barName='Sues'(Sells)
63
How is the rename operator used? 𝜌
𝝆𝑺 𝑬 : Rename the expression E to S * 𝝆𝑺(𝒂𝟏,𝒂𝟐,…,𝒂𝒏) 𝑬 : Rename the relation E to S, and also re-names the attributes as 𝑎1, 𝑎2, … , 𝑎𝑛. e.g Q: List all the professors, rename the output relation as Professor: 𝜌𝑃𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟(𝜎𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛=′𝑝𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟′(𝐿𝑒𝑐𝑡𝑢𝑟𝑒𝑟))
64
How is the union operator used? ∪
Will take the union of both tables (R U S ), both tables must have the same relation schema for this to work
65
How is the intersection operator used? ∩
Will take the intersections of both tables (R ∩ S ), both tables must have the same relation schema for this to work
66
How is the set difference operator used? -
Will take the difference (R - S), records that exist in both R and S are remove from the R table and then that's outputted. Both tables must have the same relation schema for this to work 𝑅 − 𝑆 ≠ 𝑆 − 𝑅
67
How is the Cartesian product operator used?
This one is kinda weird * 𝑹 × 𝑺: Produces a relation that is the concatenation of every tuple of relation 𝑅 with every tuple of relation 𝑆. ―If both 𝑅 and 𝑆 have the same attribute 𝐴, then use: 𝑅. 𝐴 and 𝑆. 𝐴 Basically just stick S on the back of R and if they have the same attribute then call it R.attribute and S.attribute. Like for example Bar.barName and Sells.barName
68
What are all the join operations
― Theta join (Θ-join) ― Equijoin ― Natural join ― Outer join
69
Explain how theta (Θ) join works
* 𝑹 ⋈<𝒋𝒐𝒊𝒏 𝒄𝒐𝒏𝒅𝒊𝒕𝒊𝒐𝒏> 𝑺: Defines a relation that contains all combinations of tuples from 𝑅 and 𝑆 that satisfy the join condition * 𝑅 ⋈<𝑗𝑜𝑖𝑛 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛> 𝑆 = 𝜎<𝑗𝑜𝑖𝑛 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛>(𝑅 × 𝑆)
70
Explain how Equijoin works
* Equijoin: Produces all the combinations of tuples from that satisfy a join condition with only equality comparisons. ―Equijoin is a special case of theta-join.
71
Explain how natural join works
* 𝑹 ⋈ 𝑺: an equijoin of the two relations R and S over all common attributes.
72
What is SQL used for
* SQL is a standard language for interacting with a relational database
73
What does Data Manipulation Language do and what are the statements used in it?
It retrieves and updates statements: -Select -Insert -Update -Delete
74
What does Data Definition Language do and what are the statements used in it?
Define DB structure & control access: ― CREATE ― ALTER ― DROP ― GRANT ― REVOKE
75
What are the 5 basic select conditions?
* Comparison: =, <> (!=), <, <=, >=, >, OR, AND, NOT * Range: BETWEEN ... AND, NOT BETWEEN ... AND * Set membership: IN, NOT IN * Pattern match: LIKE, NOT LIKE * IS NULL, IS NOT NULL
76
What are the 5 Aggregate functions?>
* COUNT * MIN * MAX * SUM * AVG
77
What does a Select - Between statement look like?
SELECT * FROM Products WHERE Price BETWEEN 10 AND 15; -Note you can add a not in front of between if needed
78
What does a Select - In statement look like
SELECT * FROM Customers WHERE Country IN ('Germany', 'France', 'UK'); -Note you can add a not before IN for the opposite result
79
What does a Select - Like statement look like?
SELECT * FROM Customers WHERE CustomerName LIKE 'a%';
80
What are the two types of wildcards in a select Like statement and what do they do?
% -- represents zero, one, or multiple characters _ -- represents a single character
81
How can Is Null/ Is not Null be used in select statements
SELECT CustomerName FROM Customers WHERE Address IS NULL ; -Not it must be IS NULL or IS NOT NULL, the IS is necessary
82
What are the 5 aggregate functions?
* COUNT * MIN * MAX * SUM * AVG
83
Write a query to count how many countries there are in the customers table
SELECT COUNT(DISTINCT Country) FROM Customers;
84
Write a query to calculate the minimal price, maximal price and average price of all products.
SELECT MIN(Price), MAX(Price), AVG(Price)q FROM Products;
85
What are GROUP BY and HAVING used for in a SELECT statement
GROUP BY: forms group of rows with the same column value. HAVING: filters the groups subject to some condition This is an example of where they are used in a query SELECT [DISTINCT] column_name(s) FROM table_name(s) [WHERE condition] [GROUP BY column_name(s)] [HAVING condition] [ORDER BY column_name(s)];
86
Using GROUP BY how can i create an SQL statement to count how many customers are from each country
SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;
87
In Grouping queries what can the SELECT list only contain?
― The grouping columns ― Aggregate functions. An example is this: SELECT COUNT(CustomerID), Country, City FROM Customers GROUP BY Country, City;
88
When should the HAVING clause replace the WHERE clause?
When aggregate functions are involved; -HAVING clause always includes at least one aggregate function. -WHERE clause can not include aggregate function.
89
How do Aliases work?
An alias is a temporary name assigned to a table or a column for the duration of a query. Aliases are used to make the output of a query more readable or to shorten lengthy table or column names
90
Define a subquery, which clause it must be inside and how it works
Subquery (or nested query) : a complete SELECT statement embedded in another query. ― In WHERE, or HAVING clause SELECT * FROM Customers WHERE Country = (SELECT Country FROM Customers WHERE CustomerName= ‘Ernst Handel’);
91
How does IN / NOT IN work for subqueries?
Used in cases where the subquery will output more than one value If the inner SELECT statement outputs a single value, use = If multiple values, use IN
92
How does EXISTS / NOT EXISTS work for subqueries?
The EXISTS operator returns true if the subquery returns one or more rows. It doesn't matter what the actual values of the rows are; the existence of any row is sufficient for the EXISTS condition to be true.
93
How do inner joins work?
Inner joins in SQL are used to combine rows from two or more tables based on a related column between them. The result includes only the rows where the specified condition is met.
94
How do outer joins work?
Outer joins in SQL are used to retrieve rows from one or more tables, even if there is no match between the joined tables. Unlike inner joins, which return only the matching rows, outer joins return all rows from at least one of the tables, with the non-matching rows filled with NULL values for columns from the table that does not have a corresponding match.
95
What are the types of outer join and the difference between them?
― LEFT JOIN: returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match. ― RIGHT JOIN: … ― FULL JOIN: Retain both the left table and right table rows. The result is NULL for those no matches.
96
How could i use inner join to satisfy this query? List all the customer’s orderID and order dates. Assume there is a customers and Orders table
SELECT o.OrderID, c.CustomerName, o.OrderDate FROM Orders o Natural JOIN Customers c;
97
What does self join do?
A self join in SQL is a specific case where a table is joined with itself. This is useful when you want to compare rows within the same table, treating it as if it were two separate tables. To perform a self join, you typically use aliases to differentiate between the columns from the same table
98
How does an INSERT INTO statement work
Insert adds new rows to the table Syntax is: INSERT INTO (Column1, Column2) VALUES (19, 'ThatValue'), (23, 'ThisValue'); or with a whole record the columns are not needed INSERT INTO Orders VALUES (10447, 90, 6, ‘2017-05-25’,3);
99
What is the syntax for an update statement
Example 1 : Give all the products’ price a 3% increase UPDATE Products SET Price=Price*1.03; Example 2 UPDATE Customers SET ContactName = ‘Alfred Schmidt’, City= ‘Frankfurt’ WHERE CustomerID = ‘001’;
100
What is the syntax for a delete statement
DELETE FROM
WHERE
101
What is the syntax for table creation
CREATE TABLE IF NOT EXISTS Tasks ( task_id INT AUTO_INCREMENT PRIMARY KEY, title VARCHAR(255) NOT NULL, start_date DATE, due_date DATE, status TINYINT NOT NULL, description TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); IF NOT EXISTS is optional
102
What are the main data types in MySql
Numeric types -INT, TINYINT, DECIMAL, FLOAT, DOUBLE string types; -CHAR, VARCHAR, TEXT Dat/time types -DATE -TIME -DATETIME
103
What are the 5 constraints?
– NOT NULL – UNIQUE – PRIMARY KEY – FOREIGN KEY – CHECK
104
What does the NOT NULL constraint do?
NOT NULL: ensures values stored in a column are not NULL.
105
What does the UNIQUE constraint do?
UNIQUE: ensures values in a column or a group of columns are different used like : CREATE TABLE table_name( ..., column_name data_type UNIQUE, ... );
106
How can constraints work for more than one column
CREATE TABLE table_name( ..., [CONSTRAINT constraint_name] UNIQUE(Column_list) );
107
What does the check constraint do?
CHECK: ensures that values stored in a column or group of columns satisfy a Boolean expression. used like : CREATE TABLE table_name( ..., column_name data_type CHECK(Boolean_expression), ... );
108
How is the primary key constraint used?
PRIMARY KEY ― Implicitly means NOT NULL + UNIQUE ― A table can have one and only one primary key. CREATE TABLE table_name( ..., column_name data_type PRIMARY KEY, ... );
109
How is the foreign key constraint used?
* FOREIGN KEY: refers to the PRIMARY KEY in another table -The domain of the foreign key must be the same as the domain used in the referenced table CREATE TABLE Products( productId INT AUTO_INCREMENT PRIMARY KEY, productName VARCHAR(100) NOT NULL, categoryId INT, CONSTRAINT fk_category FOREIGN KEY (categoryId) REFERENCES categories(categoryId) ) ENGINE=INNODB;
110
What do these three referential options that can be applied to the foreign key do? CASCADE, SET NULL and RESTRICT/NO ACTION
― CASCADE: If a row from the parent table is deleted or updated, the values of the matching rows in the child table automatically deleted or updated. ― SET NULL: When referenced data in the parent key is deleted or updated, all rows in the child table that depend on those values have their foreign keys set to NULL. ― RESTRICT/NO ACTION: Reject the delete/update operation from the parent table, if a row from the parent table has a matching row in the child table.
111
Using ALTER TABLE how could you add / drop columns?
ALTER TABLE Persons ADD COLUMN dateOfBirth date NOT NULL, ADD COLUMN joinDate date default CURRENT_DATE(); ALTER TABLE Persons DROP COLUMN dateOfBirth;
112
Using ALTER TABLE how could you modify existing columns?
ALTER ALTER TABLE Vehicles MODIFY year SMALLINT NOT NULL, MODIFY color VARCHAR(20) NULL AFTER make;
113
Using ALTER TABLE how can i add or drop a table constraint?
Add a table constraint: ALTER TABLE table_name ADD [CONSTRAINT constraint_name] UNIQUE/CHECK/PRIMARY KEY/FOREIGN KEY (...); drop a table constraint ALTER TABLE table_name DROP CONSTRAINT constraint_name; drop primary/ foreign key ALTER TABLE table_name DROP PRIMARY KEY; ALTER TABLE table_name DROP FOREIGN KEY constraint_name;
114
Using ALTER TABLE how can i add a constraint for a single table attribute?
ALTER TABLE Persons ADD CONSTRAINT UC_person UNIQUE(ID,lastName);
115
How do i drop a table in SQL?
DROP TABLE Shippers;
116
What is AUTO_INCREMENT and how can it be assigned as a constraint
Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table. ― Typically for primary key ― In MySQL, assign AUTO_INCREMENT to a column example: CREATE TABLE Persons ( personId int NOT NULL AUTO_INCREMENT PRIMARY KEY, lastName varchar(255) NOT NULL, firstName varchar(255), Age int );
117
How can we let the AUTO_INCREMEMENT start with another value instead of 1?
ALTER TABLE Persons AUTO_INCREMENT 100; ^add a number on the end ¯\_(ツ)_/¯
118
What is an index and what does it do?
A data structure such as B-Tree that improves the speed of data retrieval on a table at the cost of additional writes and storage to maintain it. Note: Updating a table with indexes takes more time than updating a table without. So, only create indexes on columns that will be frequently searched.
119
How can we index an attribute in SQL and how can we add an index to an already existing attribute?
INDEX ( attributeName ) like : CREATE TABLE Persons ( personid int NOT NULL AUTO_INCREMENT PRIMARY KEY, lastName varchar(255) NOT NULL, firstName varchar(255), age int, INDEX (lastName) ); add to existing attribute: CREATE INDEX idx_persons ON Persons(lastName);
120
Note : MYSQL automatically creates a special index named PRIMARY for the primary key
^
121
How can i create a unique index and what does it do?
CREATE UNIQUE INDEX emp_u1 ON EMP (eName, job); This will Index and ensure that each combination of eName and job are unique.
122
What is the basic syntax to create a view?
CREATE [OR REPLACE] VIEW ViewName [column1,…] AS SELECT …; example horizontal view CREATE VIEW Emp_D20 AS SELECT * FROM Emp WHERE deptNo = 20; example vertical view CREATE VIEW Emp_NoSalary AS SELECT empNo, eName, job, mgr, hireDate, comm, deptNo FROM Emp;
123
How can i create a view for multiple tables?
CREATE VIEW Num_Emp_Job(deptNo,job,cnt) AS SELECT deptNo,job,COUNT(*) FROM Emp GROUP BY deptNo, job ORDER BY deptNo;
124
What does view updatability mean and in what case will it not work?
All updates to a base table are immediately reflected in all views that encompass that base table. Similarly, we may expect that if a view is updated then the base table(s) will reflect that change. ― For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the base table. e.g SELECT statement defining the view must be simple, meaning it doesn't involve complex expressions, aggregates, GROUP BY, DISTINCT, JOINs, or subqueries
125
Define a commit and a rollback
* COMMIT ― Causes all outstanding changes that you have made to become permanent. * ROLLBACK ― Causes all uncommitted changes to be discarded
126
Define database implementation
* Physical realization of the database and application designs ― Use DDL to create database schemas ― Use DML to populate the data ― Develop the application programs * Database transactions are implemented using the DML, possibly embedded in a host programming language.
127
Define Physical Database Design (PDD)
The process of producing a description of the implementation of the database on secondary storage. ― Describe the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associate integrity constraints and security measures.
128
What are some prerequisites before PDD can occur?
― Logical database design * Minimally in 3rd normal form ― Transaction analyses * Most frequent transactions * Most complex or resource-insensitive transactions * Distributions of transactions, over time * Most critical transactions to the applications ― Which DBMS software to use
129
What are the steps of PDD
STEP1: Translate the logical data model for target DBMS ― 1.1 Design base relations ― 1.2 Design representation of derived data ― 1.3 Design general constraints STEP2: Design file organizations and indexes ― 2.1 Choose file organizations ― 2.2 Choose indexes ― 2.3 Estimate disk space requirement STEP3: Design user views STEP4: Design security mechanisms STEP5: Consider controlled redundancy STEP6: Monitor and tune the operational system
130
How do you design base relations in PDD?
* Defining Tables and Columns: including naming objects, choosing data types and lengths, and handling null, default values * Defining Primary Keys: may considering the use of surrogate keys * Handling referential integrity for foreign keys. ^This is basically this thingy , OrderDetails(OrderID, ProductID, Quantity)
131
What is a surrogate key?
A surrogate key is a unique identifier assigned to each record or row in a database table. Unlike natural keys, which are based on existing data attributes and have some inherent meaning, surrogate keys are system-generated and lack any meaningful information about the data they represent. Surrogate keys are typically used as primary keys in database tables.
132
How do you represent derived data in PDD?
Remember derived data is the data that's calculated based on the database and not physically stored 1. Use a generated column 2. Create a view 3. Use triggers* Triggers are stored programs executed automatically to respond to specific events such as inserting, updating or deleting data. 4. Calculate it each time
133
What is the storage structure for a database?
A database is stored as a collection of files. A database file is partitioned into fixed-length storage units called blocks. What is a block? Block/Page: A contiguous sequence of sectors from a single track
134
What are some properties about primary secondary and stable storage for databases and give an example of each
-Primary storage is volatile ( e.g main memory/cache memory -Primary storage is where the operations happen -Data blocks can be moved from primary to secondary storage -Secondary storage is where the entire database is stored e.g(magnetic disk, flash drives etc) -Stable storage is where the database is backed up (e.g RAID or a Redundant Array of Independent Disks) -Stable storage is always non-volatile with independent failure modes
135
Define File Organisation
The physical arrangement of data in a file
136
What are 3 ways to organize blocks/pages in a file
― Heap (unsorted) files: Records are placed on disk in no particular order ― Sequential (ordered) files: records are ordered by the value of a specific attribute. ― Hash files: Records are placed on disk according to a hash function
137
What are the two types of index and what do they do?
Clustered Index: With a clustered index the rows are stored physically on the disk in the same order as the index. ― Only one clustered index is allowed for each table ― If the indexed attribute chosen is a key of a relation, the index is called primary index. Non-Clustered(secondary) index: An index which maintains a logical ordering of data rows without altering the physical ordering of the rows. ― A table could have several secondary indexes ― Foreign keys are good candidates for non clustering indexes.
138
Define Deonormalization and when it should be used
Denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. ― If performance is unsatisfactory and a relation has a low update and a very high query rate.
139
When attempting controlled redundancy why would we create new tables/views? Why would we Split tables?
They can hold data derived in columns from other tables - stops it having to be derived freshly each time Decompose very large relations into smaller and more manageable pieces.
140
What is a transaction?
Transaction: an executing program that forms a logical unit of database operations A transaction includes one or more database operations, e.g., insertion, deletion, updating, retrieval.
141
Define ACID in database transactions and what each letter stands for
* Atomicity: All or nothing e.g shouldn’t take money from A without giving it to B * Consistency: transforms the database from one consistent state to another consistent state e.g money isn't lost or gained Isolation: partial effects of incomplete transactions should not be visible to other transactions e.g. other transactions shouldn’t see such a change until completion. * Durability: successfully committed transactions are permanently recorded in the database, not lost, even in the event of a system failure e.g. After completion, such a change in A and B is saved in the database.
142
What is the SQL to start a transaction
START TRANSACTION; COMMIT; e.g START TRANSACTION; UPDATE Accounts SET balance = balance - 100 WHERE account = 'A'; UPDATE Accounts SET balance = balance + 100 WHERE account = 'B’; COMMIT;
143
What is a database Schedule and name and define the two types
Schedule: an ordering of operations for concurrent transactions. There are Serial schedules and Non-serial schedules Serial schedule: a schedule in which the operations for concurrent transactions are not interleaved. * But, unacceptable in practice. Serial schedule: a schedule in which the operations for concurrent transactions are not interleaved. * But, unacceptable in practice.
144
What are two desired properties for a schedule
Serializability and Recoverability
145
Define a Conflicting operation in a database schedule
Conflicting operations: if two operations belong to different transactions, access the same database item, and at least one operation is a writing operation.
146
Define when a Transaction has a recoverable schedule
A schedule is considered recoverable if, in the presence of failures (such as crashes or errors), it is possible to restore the database to a consistent state. a schedule, where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then Ti should commit before Tj.
147
Define concurrency control
Concurrency Control is the process of managing simultaneous operations on the database without having them interfere with one another.
148
State the two types of concurrency control and what they ential
― Pessimistic: Assumes that conflict is likely and take steps to prevent it. * Locking, timestamping ― Optimistic: Assumes that conflict is unlikely and only checks for it when transaction commits
149
What does a lock do in database transactions?
Locking: When a transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results. ― A read operation must acquire a shared/read lock ― An update operation must acquire an exclusive/write lock
150
What can locks be applied to?
Locks can be applied to: ― The whole database ― A file ― A page/a disk block ― A record ― A field value of a record
151
In a shared/exclusive locking scheme what must be issued before any read(X)
A transaction must issue read_lock(X), or write_lock(X) before any read(X).
152
In a shared/exclusive locking scheme what must be issued before any write(X)
― A transaction must issue write_lock(X) before any write(X)
153
In a shared/exclusive locking scheme what must be issued after read(X) or write(X)
― A transaction must issue unlock(X) after read(X) or write(X)
154
Does a shared/exclusive locking scheme guarantee stability?
Using shared/exclusive locking scheme does not guarantee serializability. More strict protocols are required, e.g. two-phase locking (2PL).
155
Define two-phase locking and each of the two phases in it
Two-Phase Locking (2PL): All locking operations precede the first unlock operation in a transaction. ― Growing/Expanding phase: during which new locks on items can be acquired but none can be released; ― Shrinking phase: during which existing locks can be released but no new locks can be acquired. * Advantage: it assures serializability
156
What problem can two phase locking cause
It can cause a deadlock which is a circular situation where each of two (or more) transactions are waiting to acquire locks that are held by the other.
157
How can deadlocks be prevented by using timeouts?
A transaction will wait for a (database defined) period to acquire a lock. * If this time runs out then the whole transaction is rolled back and restarted.
158
How can deadlocks be prevented by the use of a wait-for-graph(WFG)
* A Wait-for Graph (WFG) is constructed ― node for each transaction ― directed edge from transaction T1 to transaction T2 if T1 is waiting to lock an item currently held by T2 * A deadlock exists if the graph contains a cycle.
159
What needs to happen when a deadlock is detected? One issue that can occur in deadlock prevention is starvation, define starvation?
* Once deadlock is detected, DBMS needs abort one or more of the transactions. -The transaction with the minimum cost should be rolled back * Starvation: the same transaction is always chosen as a victim, and the transaction can never complete.
160
What are the two deadlock prevention algorithms and what do they do?
* Wait-die algorithm ― only older transactions can wait for younger ones * Wound-wait algorithm ― only younger transactions can wait for older ones
161
Define a timestamp in transactions and explain timestamping
Timestamp: A unique identifier created by the DBMS that indicates the relative starting time of a transaction. ― Either a logical counter or the system clock. * Timestamping: a concurrency control protocol that orders transactions in such a way that older transactions get priority in the event of conflict
162
Explain optimistic techniques
Optimistic techniques assume that conflict is rare. ― Before a transaction commits, a check is performed to determine whether conflict has occurred. ― If yes, roll back and restart.
163
Define a backup and the three different kinds
A backup is a copy of the database taken periodically and stored in a safe place which allows the database to be restored with an acceptable loss of data * Full backup ― The whole database. * Differential backup ― All changes made since the last full backup * Incremental ― All changes made since the last incremental backup`
164
Define database recovery and the two main techniques used for it
Recovery: The process of restoring the database to a correct state in the event of a failure. Two main recovery techniques ― Log-based ― Shadow paging
165
Explain a log and how it can be used as recovery
Log: a sequence of records, which maintains the records of actions performed by transactions, recovery can be performed by reversing these operations
166
Explain check-pointing in a log file and the operations involved
Checkpoints are scheduled at predetermined intervals and involve the following operations: ― Write all log records in main memory to secondary storage ― Write the modified blocks in the buffers to secondary storage. ― Write a checkpoint record to the log file. This record contains transactions which are active at the time of the checkpo
167
When a crash occurs in a database how are logs and Checkpoints used?
When a failure occurs, ― The system check the log file backwards from the end to the last checkpoint. ― REDO all the transactions that committed since the last checkpoint. ― UNDO all the transactions that were active at the time of the crash.
168
What are the two types of database updating schemes and what do they entail?
Two types of DB updating schemes: ― Deferred update: Updates are not written to the database until after a transaction has reached its commit point. ― Immediate update: Updates are applied to the database as they occur without waiting to reach the commit point.
169
Define shadow paging
Maintain two tables during the lifetime of a transaction: ― A current page table ― A shadow page table During a transaction the current page is copied into a shadow page table and the shadow page table is never modified during transaction execution. If the transaction fails the current page table is discarded and the shadow table is used instead
170
What are advantages and disadvantages of shadow paging?
Advantage: ― The overhead of maintaining the log file is eliminated. ― The recovery is significantly faster since there is no need for UNDO or REDO operations. Disadvantage: ― Data fragmentation ― Need for periodic garbage collection to reclaim inaccessible blocks
171
What are some things that can cause security issues in databases
Caused by – Unauthorized users – Authorized users – Programmers - DBA
172
What can bad security cause a loss of (CIA)
― Confidentiality: the protection of data from unauthorized disclosure ― Integrity: the requirement that information is protected from improper modification. ― Availability
173
Explain Discretionary Access Control (DAC)
― Discretionary Access Control (DAC) * The owner of the object specifies which subjects can access the object. * Used by most commercial DBMSs * Supported by SQL: GRANT and REVOKE
174
Explain Mandatory Access control
Mandatory Access Control (MAC) * The system (not the users) specifies which subjects can access specific data objects. * Not supported by SQL
175
How would i use SQL to grant rfc@localhost the INSERT privilege on classicmodels database?
GRANT INSERT ON classicmodels.* TO rfc@localhost; Note: the opposite of GRANT is REVOKE
176
How does the Bell-LaPadula model work?
A subject S is only allowed to read an object O if it has a higher level of clearance than the object ― Class(O): the classification of an object O (relation, tuple, column, view, operation) ― Class(S): the clearance of a subject S (user, account, program)
177
What are the 4 security Levels in the Bell-LaPadula model?
Top Secret (TS) > Secret (S) > Confidential (C) > Unclassified (U)
178
How can views be used for security?
The view mechanism provides a powerful and flexible security mechanism by hiding parts of the database from certain users.
179
What are the 3 different targets of encryption in a database?
― Disk ― Database ― Network traffic
180
How can an audit be used in database security?
Rather than preventing certain actions, record who did what. ― Keep track of all operations from each user through each login session. ― The audit trail (or system log) itself must be secured
181
What is a Redundant Array of Independent Disks(RAID) and what is it used to do?
A data storage technique that combines multiple physical disk drive components into one logical unit *improves reliability * increases performance * improves availability
182
What is Data striping is a technique used in a RAID, what does it do?
Data striping: the data is segmented into equal-size partitions (the striping unit) which are transparently distribute across multiple disks. ― Improves overall I/O performance by allowing multiple I/Os to be serviced in parallel.
183
What is a Parity (or error correcting) scheme ?
Each byte may have a parity bit associated with it that records whether the number of 1s is even or odd. ― Even parity: number of 1’s including parity bit is even * E.g., 0110110 → 01101100, 0101100 → 01011001 ― Odd parity: number of 1’s including parity bit is odd * E.g., 0110110 → 01101101, 0101100 → 01011000 ― Improves reliability
184
What are the 4 different RAID schemes?
― RAID 0 : striping ― RAID 1: mirroring ― RAID 10 ― RAID 5
185
What is a pro and a con for RAID 0: striping
― Pros: improved I/O performance ― Cons: complete vulnerability to drive failures
186
What is a pro and a con for RAID 1: mirroring
― Pros: Fast read, availability improvement ― Cons: Write throughput is always slower because every drive must be updated. Most expensive storage solution.
187
What is a pro and a con for RAID 10: striping and mirroring
― Pros: improved I/O performance, rebuilding time is fast if one drive fails. ― Cons: Still expensive because of mirroring.
188
What is a pro and a con for RAID 5: striping and parity
― The parity data is also striped across all the disk. ― Pros: Resilience to a single drive failure, fast reading. ― Cons: Writing is slower (due to the parity needed be calculated), complex technology.
189
What is a distributed database?
Distributed database: a collection of multiple interconnected databases, which are ― spread physically across various locations ― communicate via a computer network.
190
What are two desired characteristics for distributed databases?
1. Distribution/location transparency ― Users perceive the database as a single, logical entity. 2. Local autonomy ― Local data is locally owned and managed. ― Each local database operated independently. ― Local systems are able to operate effectively even if remote databases are not available.
191
What are two types of distributed Database?
Homogeneous ― All sites use the same type of DBMS. Heterogeneous ― Sites run different DBMSs. ― Translation using gateways: convert the language and model of each different DBMS into the language and model of the relational system.
192
What is Data fragmentation in databases?
Fragmentation is to divide a table into a set of smaller tables to be stored in different sites. The subsets of the table are called fragments. ― Horizontal * Subsets of rows are stored at different sites. ― Vertical * Subsets of attributes are stored at different sites. ― Hybrid (Mixed)
193
What is the correctness rule in Data Fragmentation
Correctness rules: ― Completeness: Each data item can be found in R must appear in at least one fragment. ― Reconstruction * Horizontal: 𝑅 = 𝑅1 ∪ 𝑅2 … ∪ 𝑅𝑛 * Vertical: 𝑅 = 𝑅1 ⋈ 𝑅2 … ⋈ 𝑅𝑛 ― Disjointness * Horizontal: The fragments are disjoint. * Vertical: The fragments are disjoint except for the primary key
194
Define data allocation
Data allocation: The process of assigning each fragment to a particular site in the distributed system.
195
What are three types of Data replication and allocation?
* Fully replication ― A complete copy of the DB at each site * Non-redundant allocation ― Each fragment is stored at exactly one site * Partial replication ― Certain transactions that access particular parts of the DB are mostly submitted at a particular site, that fragment can be allocated at that site only. ― Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may limit replication.
196
What are some advantages of a distributed database?
* Reflects organizational structure * Easier expansion * Increased reliability and availability * Better response/Improved performance
197
What are some disadvantages of a distributed database?
* Complex to design and implementation * Harder to control security and integrity * Lack of standard and experience
198
Define a data warehouse and some properties
Data warehouse is a database, which – is kept separate from the organization's operational database, – no frequent updating – possesses consolidated historical data – helps the organization to analyse its business
199
What are some key features on a data warehouse
― Subject Oriented − e.g., a sales data warehouse, or a credit card transaction data warehouse, etc. ― Integrated − data combined from multiple sources. ― Time Variant − looking for changes over time ― Non-volatile − the previous data is not erased when new data is added to it. Frequent changes in operational database is not reflected in the data warehouse.
200
Define OLAP
Online Analytical Processing (OLAP): ― Uses multi-dimensional view of aggregate data to provide quick access to strategic information
201