Insert adds new rows to the table
Syntax is:
INSERT INTO
(Column1, Column2)
VALUES
(19, 'ThatValue'),
(23, 'ThisValue');
or with a whole record the columns are not needed
INSERT INTO Orders
VALUES
(10447, 90, 6, ‘2017-05-25’,3);
What is the syntax for an update statement
Example 1 : Give all the products’ price a 3% increase
UPDATE Products
SET Price=Price*1.03;
Example 2
UPDATE Customers
SET ContactName = ‘Alfred Schmidt’,
City= ‘Frankfurt’
WHERE CustomerID = ‘001’;
What is the syntax for a delete statement
DELETE FROM
WHERE
What is the syntax for table creation
CREATE TABLE IF NOT EXISTS Tasks (
task_id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) NOT NULL,
start_date DATE,
due_date DATE,
status TINYINT NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
IF NOT EXISTS is optional
What are the main data types in MySql
Numeric types
-INT, TINYINT, DECIMAL, FLOAT, DOUBLE
string types;
-CHAR, VARCHAR, TEXT
Dat/time types
-DATE
-TIME
-DATETIME
What are the 5 constraints?
– NOT NULL
– UNIQUE
– PRIMARY KEY
– FOREIGN KEY
– CHECK
What does the NOT NULL constraint do?
NOT NULL: ensures values stored in a column are
not NULL.
What does the UNIQUE constraint do?
UNIQUE: ensures values in a column or a group of
columns are different
used like :
CREATE TABLE table_name(
...,
column_name data_type UNIQUE,
...
);
How can constraints work for more than one column
CREATE TABLE table_name(
...,
[CONSTRAINT constraint_name] UNIQUE(Column_list)
);
What does the check constraint do?
CHECK: ensures that values stored in a column or group
of columns satisfy a Boolean expression.
used like :
CREATE TABLE table_name(
...,
column_name data_type CHECK(Boolean_expression),
...
);
How is the primary key constraint used?
PRIMARY KEY
― Implicitly means NOT NULL + UNIQUE
― A table can have one and only one primary key.
CREATE TABLE table_name(
...,
column_name data_type PRIMARY KEY,
...
);
How is the foreign key constraint used?
* FOREIGN KEY: refers to the PRIMARY KEY in another table
-The domain of the foreign key must be the same as the domain used in the referenced table
CREATE TABLE Products(
productId INT AUTO_INCREMENT PRIMARY KEY,
productName VARCHAR(100) NOT NULL,
categoryId INT,
CONSTRAINT fk_category FOREIGN KEY (categoryId)
REFERENCES categories(categoryId)
) ENGINE=INNODB;
What do these three referential options that can be applied to the foreign key do? CASCADE, SET NULL and RESTRICT/NO ACTION
― CASCADE: If a row from the parent table is deleted or updated, the
values of the matching rows in the child table automatically deleted or
updated.
― SET NULL: When referenced data in the parent key is deleted or updated,
all rows in the child table that depend on those values have their foreign
keys set to NULL.
― RESTRICT/NO ACTION: Reject the delete/update operation from the
parent table, if a row from the parent table has a matching row in the
child table.
Using ALTER TABLE how could you add / drop columns?
ALTER TABLE Persons
ADD COLUMN dateOfBirth date NOT NULL,
ADD COLUMN joinDate date default CURRENT_DATE();
ALTER TABLE Persons
DROP COLUMN dateOfBirth;
Using ALTER TABLE how could you modify existing columns?
ALTER ALTER TABLE Vehicles
MODIFY year SMALLINT NOT NULL,
MODIFY color VARCHAR(20) NULL AFTER make;
Using ALTER TABLE how can i add or drop a table constraint?
Add a table constraint:
ALTER TABLE table_name
ADD [CONSTRAINT constraint_name] UNIQUE/CHECK/PRIMARY KEY/FOREIGN KEY (...);
drop a table constraint
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
drop primary/ foreign key
ALTER TABLE table_name
DROP PRIMARY KEY;
ALTER TABLE table_name
DROP FOREIGN KEY constraint_name;
Using ALTER TABLE how can i add a constraint for a single table attribute?
ALTER TABLE Persons
ADD CONSTRAINT UC_person UNIQUE(ID,lastName);
How do i drop a table in SQL?
What is AUTO_INCREMENT and how can it be assigned as a constraint
Auto-increment allows a unique number to be generated
automatically when a new record is inserted into a table.
― Typically for primary key
― In MySQL, assign AUTO_INCREMENT to a column
example:
CREATE TABLE Persons (
personId int NOT NULL AUTO_INCREMENT PRIMARY KEY,
lastName varchar(255) NOT NULL,
firstName varchar(255),
Age int
);
How can we let the AUTO_INCREMEMENT start with another value instead of 1?
ALTER TABLE Persons AUTO_INCREMENT 100;
^add a number on the end ¯\_(ツ)_/¯
What is an index and what does it do?
A data structure such as B-Tree that improves the speed of
data retrieval on a table at the cost of additional writes and
storage to maintain it.
Note: Updating a table with indexes takes more time than updating a table without. So, only create indexes on columns that will be frequently searched.
How can we index an attribute in SQL and how can we add an index to an already existing attribute?
INDEX ( attributeName )
like :
CREATE TABLE Persons (
personid int NOT NULL AUTO_INCREMENT PRIMARY KEY,
lastName varchar(255) NOT NULL,
firstName varchar(255),
age int,
INDEX (lastName)
);
add to existing attribute:
CREATE INDEX idx_persons ON Persons(lastName);
Note : MYSQL automatically creates a special index named PRIMARY for the primary key
How can i create a unique index and what does it do?
CREATE UNIQUE INDEX emp_u1
ON EMP (eName, job);
This will Index and ensure that each combination of eName and job are unique.
What is the basic syntax to create a view?
CREATE [OR REPLACE] VIEW ViewName [column1,…] AS
SELECT …;
example horizontal view
CREATE VIEW Emp_D20 AS
SELECT * FROM Emp
WHERE deptNo = 20;
example vertical view
CREATE VIEW Emp_NoSalary AS
SELECT empNo, eName, job, mgr, hireDate, comm, deptNo
FROM Emp;
How can i create a view for multiple tables?
CREATE VIEW Num_Emp_Job(deptNo,job,cnt) AS
SELECT deptNo,job,COUNT(*)
FROM Emp
GROUP BY deptNo, job
ORDER BY deptNo;
What does view updatability mean and in what case will it not work?
All updates to a base table are immediately reflected in all views
that encompass that base table. Similarly, we may expect that if a
view is updated then the base table(s) will reflect that change.
― For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the base table. e.g SELECT statement defining the view must be simple, meaning it doesn't involve complex expressions, aggregates, GROUP BY, DISTINCT, JOINs, or subqueries
Define a commit and a rollback
* COMMIT
― Causes all outstanding changes that you have made to become
permanent.
* ROLLBACK
― Causes all uncommitted changes to be discarded
Define database implementation
* Physical realization of the database and application designs
― Use DDL to create database schemas
― Use DML to populate the data
― Develop the application programs
* Database transactions are implemented using the DML, possibly
embedded in a host programming language.
Define Physical Database Design (PDD)
The process of producing a description of the implementation
of the database on secondary storage.
― Describe the base relations, file organizations, and indexes used to
achieve efficient access to the data, and any associate integrity
constraints and security measures.
What are some prerequisites before PDD can occur?
― Logical database design
* Minimally in 3rd normal form
― Transaction analyses
* Most frequent transactions
* Most complex or resource-insensitive transactions
* Distributions of transactions, over time
* Most critical transactions to the applications
― Which DBMS software to use
What are the steps of PDD
STEP1: Translate the logical data model for target DBMS
― 1.1 Design base relations
― 1.2 Design representation of derived data
― 1.3 Design general constraints
STEP2: Design file organizations and indexes
― 2.1 Choose file organizations
― 2.2 Choose indexes
― 2.3 Estimate disk space requirement
STEP3: Design user views
STEP4: Design security mechanisms
STEP5: Consider controlled redundancy
STEP6: Monitor and tune the operational system
How do you design base relations in PDD?
* Defining Tables and Columns: including naming objects,
choosing data types and lengths, and handling null, default values
* Defining Primary Keys: may considering the use of surrogate keys
* Handling referential integrity for foreign keys.
^This is basically this thingy , OrderDetails(OrderID, ProductID, Quantity)
A surrogate key is a unique identifier assigned to each record or row in a database table. Unlike natural keys, which are based on existing data attributes and have some inherent meaning, surrogate keys are system-generated and lack any meaningful information about the data they represent. Surrogate keys are typically used as primary keys in database tables.
How do you represent derived data in PDD?
Remember derived data is the data that's calculated based on the database and not physically stored
1. Use a generated column
2. Create a view
3. Use triggers*
Triggers are stored programs executed automatically to
respond to specific events such as inserting, updating or
deleting data.
4. Calculate it each time
What is the storage structure for a database?
A database is stored as a collection of files. A database file is
partitioned into fixed-length storage units called blocks.
What is a block?
Block/Page: A contiguous sequence of sectors from a single
track
What are some properties about primary secondary and stable storage for databases and give an example of each
-Primary storage is volatile ( e.g main memory/cache memory
-Primary storage is where the operations happen
-Data blocks can be moved from primary to secondary storage
-Secondary storage is where the entire database is stored e.g(magnetic disk, flash drives etc)
-Stable storage is where the database is backed up (e.g RAID or a Redundant Array of Independent Disks)
-Stable storage is always non-volatile with independent failure modes
The physical arrangement of data in a file
What are 3 ways to organize blocks/pages in a file
― Heap (unsorted) files: Records are placed on disk in no particular
order
― Sequential (ordered) files: records are ordered by the value of a
specific attribute.
― Hash files: Records are placed on disk according to a hash function
What are the two types of index and what do they do?
Clustered Index:
With a clustered index the rows are stored physically on the
disk in the same order as the index.
― Only one clustered index is allowed for each table
― If the indexed attribute chosen is a key of a relation, the index is
called primary index.
Non-Clustered(secondary) index:
An index which maintains a
logical ordering of data rows without altering the physical
ordering of the rows.
― A table could have several secondary indexes
― Foreign keys are good candidates for non clustering indexes.
Define Deonormalization and when it should be used
Denormalization is the process of trying to improve the read
performance of a database, at the expense of losing some write
performance, by adding redundant copies of data or by
grouping data.
― If performance is unsatisfactory and a relation has a low update and a very high query rate.
When attempting controlled redundancy why would we create new tables/views?
Why would we Split tables?
They can hold data derived in columns from other tables - stops it having to be derived freshly each time
Decompose very large relations into smaller and more
manageable pieces.
Transaction: an executing program that forms a logical unit
of database operations
A transaction includes one or more database operations, e.g.,
insertion, deletion, updating, retrieval.
Define ACID in database transactions and what each letter stands for
* Atomicity: All or nothing
e.g shouldn’t take money from A without giving it to B
* Consistency: transforms the database from one
consistent state to another consistent state
e.g money isn't lost or gained
Isolation: partial effects of incomplete transactions
should not be visible to other transactions
e.g. other transactions shouldn’t see such a change until completion.
* Durability: successfully committed transactions are
permanently recorded in the database, not lost, even in
the event of a system failure
e.g. After completion, such a change in A and B is saved in the database.
What is the SQL to start a transaction
START TRANSACTION;
COMMIT;
e.g
START TRANSACTION;
UPDATE Accounts
SET balance = balance - 100
WHERE account = 'A';
UPDATE Accounts
SET balance = balance + 100
WHERE account = 'B’;
COMMIT;
What is a database Schedule and name and define the two types
Schedule: an ordering of operations for concurrent transactions. There are Serial schedules and Non-serial schedules
Serial schedule: a schedule in which the operations for concurrent
transactions are not interleaved.
* But, unacceptable in practice.
Serial schedule: a schedule in which the operations for concurrent
transactions are not interleaved.
* But, unacceptable in practice.
What are two desired properties for a schedule
Serializability and Recoverability
Define a Conflicting operation in a database schedule
Conflicting operations: if two operations belong to different
transactions, access the same database item, and at least one
operation is a writing operation.
Define when a Transaction has a recoverable schedule
A schedule is considered recoverable if, in the presence of failures (such as crashes or errors), it is possible to restore the database to a consistent state.
a schedule, where, for each pair of transactions Ti and Tj, if Tj reads a
data item previously written by Ti, then Ti should commit before Tj.
Define concurrency control
Concurrency Control is the process of managing simultaneous
operations on the database without having them interfere with
one another.
State the two types of concurrency control and what they ential
― Pessimistic: Assumes that conflict is likely and take steps to prevent it.
* Locking, timestamping
― Optimistic: Assumes that conflict is unlikely and only checks for it
when transaction commits
What does a lock do in database transactions?
Locking: When a transaction is accessing the database, a lock may
deny access to other transactions to prevent incorrect results.
― A read operation must acquire a shared/read lock
― An update operation must acquire an exclusive/write lock
What can locks be applied to?
Locks can be applied to:
― The whole database
― A file
― A page/a disk block
― A record
― A field value of a record
In a shared/exclusive locking scheme what must be issued before any read(X)
A transaction must issue read_lock(X), or write_lock(X) before
any read(X).
In a shared/exclusive locking scheme what must be issued before any write(X)
― A transaction must issue write_lock(X) before any write(X)
In a shared/exclusive locking scheme what must be issued after read(X) or write(X)
― A transaction must issue unlock(X) after read(X) or write(X)
Does a shared/exclusive locking scheme guarantee stability?
Using shared/exclusive locking scheme does not
guarantee serializability. More strict protocols are
required, e.g. two-phase locking (2PL).
Define two-phase locking and each of the two phases in it
Two-Phase Locking (2PL): All locking operations precede the first
unlock operation in a transaction.
― Growing/Expanding phase: during which new locks on items can be acquired but none can be released;
― Shrinking phase: during which existing locks can be released but no new locks can be acquired.
* Advantage: it assures serializability
What problem can two phase locking cause
It can cause a deadlock which is a circular situation where each of two (or more) transactions are waiting to acquire locks that are held by the other.
How can deadlocks be prevented by using timeouts?
A transaction will wait for a (database defined) period
to acquire a lock.
* If this time runs out then the whole transaction is
rolled back and restarted.
How can deadlocks be prevented by the use of a wait-for-graph(WFG)
* A Wait-for Graph (WFG) is constructed
― node for each transaction
― directed edge from transaction T1 to transaction T2 if T1 is waiting to lock an item currently held by T2
* A deadlock exists if the graph contains a cycle.
What needs to happen when a deadlock is detected? One issue that can occur in deadlock prevention is starvation, define starvation?
* Once deadlock is detected, DBMS needs abort one or more
of the transactions.
-The transaction with the minimum cost should be rolled back
* Starvation: the same transaction is always chosen as a victim,
and the transaction can never complete.
What are the two deadlock prevention algorithms and what do they do?
* Wait-die algorithm
― only older transactions can wait for younger ones
* Wound-wait algorithm
― only younger transactions can wait for older ones
Define a timestamp in transactions and explain timestamping
Timestamp: A unique identifier created by the DBMS that
indicates the relative starting time of a transaction.
― Either a logical counter or the system clock.
* Timestamping: a concurrency control protocol that orders
transactions in such a way that older transactions get priority in
the event of conflict
Explain optimistic techniques
Optimistic techniques assume that conflict is rare.
― Before a transaction commits, a check is performed to determine
whether conflict has occurred.
― If yes, roll back and restart.
Define a backup and the three different kinds
A backup is a copy of the database taken periodically and stored in a safe place which allows the database to be restored with an acceptable loss of data
* Full backup
― The whole database.
* Differential backup
― All changes made since the last full backup
* Incremental
― All changes made since the last incremental backup`
Define database recovery and the two main techniques used for it
Recovery: The process of restoring the database to a correct
state in the event of a failure.
Two main recovery techniques
― Log-based
― Shadow paging
Explain a log and how it can be used as recovery
Log: a sequence of records, which maintains the records of
actions performed by transactions, recovery can be performed by reversing these operations
Explain check-pointing in a log file and the operations involved
Checkpoints are scheduled at predetermined intervals and
involve the following operations:
― Write all log records in main memory to secondary storage
― Write the modified blocks in the buffers to secondary storage.
― Write a checkpoint record to the log file. This record contains
transactions which are active at the time of the checkpo
When a crash occurs in a database how are logs and Checkpoints used?
When a failure occurs,
― The system check the log file backwards from the end to the last
checkpoint.
― REDO all the transactions that committed since the last checkpoint.
― UNDO all the transactions that were active at the time of the crash.
What are the two types of database updating schemes and what do they entail?
Two types of DB updating schemes:
― Deferred update: Updates are not written to the database until after a transaction has reached its commit point.
― Immediate update: Updates are applied to the database as they occur without waiting to reach the commit point.
Maintain two tables during the lifetime of a transaction:
― A current page table
― A shadow page table
During a transaction the current page is copied into a shadow page table and the shadow page table is never modified during transaction execution. If the transaction fails the current page table is discarded and the shadow table is used instead
What are advantages and disadvantages of shadow paging?
Advantage:
― The overhead of maintaining the log file is eliminated.
― The recovery is significantly faster since there is no need for
UNDO or REDO operations.
Disadvantage:
― Data fragmentation
― Need for periodic garbage collection to reclaim inaccessible
blocks
What are some things that can cause security issues in databases
Caused by
– Unauthorized users
– Authorized users
– Programmers
- DBA
What can bad security cause a loss of (CIA)
― Confidentiality: the protection of data from unauthorized disclosure
― Integrity: the requirement that information is protected from improper
modification.
― Availability
Explain Discretionary Access Control (DAC)
― Discretionary Access Control (DAC)
* The owner of the object specifies which subjects can access the object.
* Used by most commercial DBMSs
* Supported by SQL: GRANT and REVOKE
Explain Mandatory Access control
Mandatory Access Control (MAC)
* The system (not the users) specifies which subjects can access specific data objects.
* Not supported by SQL
How would i use SQL to grant rfc@localhost the INSERT privilege on classicmodels database?
GRANT INSERT
ON classicmodels.*
TO rfc@localhost;
Note: the opposite of GRANT is REVOKE
How does the Bell-LaPadula model work?
A subject S is only allowed to read an object O if it has a higher level of clearance than the object
― Class(O): the classification of an object O (relation, tuple, column, view, operation)
― Class(S): the clearance of a subject S (user, account, program)
What are the 4 security Levels in the Bell-LaPadula model?
Top Secret (TS) > Secret (S) > Confidential (C) > Unclassified (U)
How can views be used for security?
The view mechanism provides a powerful and flexible security
mechanism by hiding parts of the database from certain users.
What are the 3 different targets of encryption in a database?
― Disk
― Database
― Network traffic
How can an audit be used in database security?
Rather than preventing certain actions, record who did what.
― Keep track of all operations from each user through each login
session.
― The audit trail (or system log) itself must be secured
What is a Redundant Array of Independent Disks(RAID) and what is it used to do?
A data storage technique that combines multiple physical disk drive
components into one logical unit
*improves reliability
* increases performance
* improves availability
What is Data striping is a technique used in a RAID, what does it do?
Data striping: the data is segmented into equal-size partitions (the
striping unit) which are transparently distribute across multiple
disks.
― Improves overall I/O performance by allowing multiple I/Os to be serviced in parallel.
What is a Parity (or error correcting) scheme ?
Each byte may have a parity bit associated with it that records whether the number of 1s is even or odd.
― Even parity: number of 1’s including parity bit is even
* E.g., 0110110 → 01101100, 0101100 → 01011001
― Odd parity: number of 1’s including parity bit is odd
* E.g., 0110110 → 01101101, 0101100 → 01011000
― Improves reliability
What are the 4 different RAID schemes?
― RAID 0 : striping
― RAID 1: mirroring
― RAID 10
― RAID 5
What is a pro and a con for RAID 0: striping
― Pros: improved I/O performance
― Cons: complete vulnerability to drive failures
What is a pro and a con for RAID 1: mirroring
― Pros: Fast read, availability improvement
― Cons: Write throughput is always slower because every drive
must be updated. Most expensive storage solution.
What is a pro and a con for RAID 10: striping and mirroring
― Pros: improved I/O performance, rebuilding time is fast if one drive fails.
― Cons: Still expensive because of mirroring.
What is a pro and a con for RAID 5: striping and parity
― The parity data is also striped across all the disk.
― Pros: Resilience to a single drive failure, fast reading.
― Cons: Writing is slower (due to the parity needed be calculated), complex technology.
What is a distributed database?
Distributed database: a collection of multiple interconnected
databases, which are
― spread physically across various locations
― communicate via a computer network.
What are two desired characteristics for distributed databases?
1. Distribution/location transparency
― Users perceive the database as a single, logical entity.
2. Local autonomy
― Local data is locally owned and managed.
― Each local database operated independently.
― Local systems are able to operate effectively even if remote databases
are not available.
What are two types of distributed Database?
Homogeneous
― All sites use the same type of DBMS.
Heterogeneous
― Sites run different DBMSs.
― Translation using gateways: convert the language and model of each different DBMS into the language and model of the relational system.
What is Data fragmentation in databases?
Fragmentation is to divide a table into a set of smaller tables to be
stored in different sites. The subsets of the table are
called fragments.
― Horizontal
* Subsets of rows are stored at different sites.
― Vertical
* Subsets of attributes are stored at different sites.
― Hybrid (Mixed)
What is the correctness rule in Data Fragmentation
Correctness rules:
― Completeness: Each data item can be found in R must appear in at least one fragment.
― Reconstruction
* Horizontal: 𝑅 = 𝑅1 ∪ 𝑅2 … ∪ 𝑅𝑛
* Vertical: 𝑅 = 𝑅1 ⋈ 𝑅2 … ⋈ 𝑅𝑛
― Disjointness
* Horizontal: The fragments are disjoint.
* Vertical: The fragments are disjoint except for the primary key
Data allocation: The process of assigning each fragment to a
particular site in the distributed system.
What are three types of Data replication and allocation?
* Fully replication
― A complete copy of the DB at each site
* Non-redundant allocation
― Each fragment is stored at exactly one site
* Partial replication
― Certain transactions that access particular parts of the DB are mostly submitted at a particular site, that fragment can be allocated at that site only.
― Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may limit replication.
What are some advantages of a distributed database?
* Reflects organizational structure
* Easier expansion
* Increased reliability and availability
* Better response/Improved performance
What are some disadvantages of a distributed database?
* Complex to design and implementation
* Harder to control security and integrity
* Lack of standard and experience
Define a data warehouse and some properties
Data warehouse is a database, which
– is kept separate from the organization's operational database,
– no frequent updating
– possesses consolidated historical data
– helps the organization to analyse its business
What are some key features on a data warehouse
― Subject Oriented − e.g., a sales data warehouse, or a credit card
transaction data warehouse, etc.
― Integrated − data combined from multiple sources.
― Time Variant − looking for changes over time
― Non-volatile − the previous data is not erased when new data is added to it. Frequent changes in operational database is not reflected in the data warehouse.
Online Analytical Processing (OLAP):
― Uses multi-dimensional view of aggregate data to provide quick access to strategic information