databases Flashcards

(129 cards)

1
Q

top down database design

A

create ER diagrams (graphical description): start with a set of system requirements and identify entities and attributes then construct relational data model i.e. tables for entities

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

bottom up database design

A

start with initial tables for entities and their attributes –> redesign in a “better” way: trickier for larger databases so need a formalization of process

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

purpose of normalization

A
  • relations represent real world entities
  • single valued columns
  • avoid redundancy
  • easier to update and maintain correctly without anomalies
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

no redundancy

A

each data item is stored only once: -minimises space required, - simplifies maintenance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

consequence of redundancy

A
  • takes up unnecessary space
  • when modifying data, have to modify in different places
  • risk of inconsistencies
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

exception for redundancy

A

foreign keys: act as pointers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

what causes redundancy

A
  • set valued attributes –> multiple rows in corresponding table. e.g. more than one attribute for a particular entry in the table
  • dependency e.g. postcode and town are dependent on each other
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

define redundancy

A

repeating data in multiple different locations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

modification anomaly

A

failure to maintain all existing instances of a specific value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

deletion anomaly

A

losing other values as a side effect of deleting data. e.g. deleting staff member and deleting info of their workplace simultaneously if unnormalised

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

insertion anomaly

A

when adding more data items, much more irrelevant data needs to be added. adding rows forces us to add info about other entities –> can lead to inconsistency
e.g. adding details of new surgery, with no staff, add null to staff fields

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

overcoming redundancy

A

schema refinement (decomposition): use 2+ relations to store the original data. can be done manually for smaller tables but formalisation needed for larger DB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

functional dependencies specify…

A
  • specify which are candidate, primary and foreign keys

- specify which attributes to combine in the new tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

simple key

A

key consists of only one attribute

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

composite key

A

key consists of several attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

candidate key

A

minimal set of attributes whose values uniquely identify tuples.
functionally determines all attributes in a relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

primary key

A

the candidate key selected to identify rows uniquely within a table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

what is functional dependency

A

describes relationship between two attributes in the same relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

if B is functionally dependent on A

A

Represent as A -> B
If we know attribute values for A, then we know the unique attribute values of B.
Each value of A is associated with exactly one value of B

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

determinant

A

set of attributes on the left hand side

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

dependent

A

set of attributes on the right hand side

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

full functional dependency

A

B is not functionally dependent on any proper subset of A, i.e. B is functionally dependent on ALL the primary key if composite key, not just individual parts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

partial functional dependency

A

B remains functionally dependent on at least one proper subset of A

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

transitive functional dependency

A

if A->B and B->C functional dependencies exist, then the functional dependency A -> C also exists = transitive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
how to determine functional dependencies
either -obvious/common sense | or -require discussion/specification from customers
26
closure, F+ (+ is raised), of a set of functional dependencies, F
set of all functional dependencies implied by dependencies in F. inference rules required to compute this
27
Armstrong's axioms (complete and sound)
1. reflexivity (if B is a subset of A, then A -> B) 2. augmentation (A -> B then A, C -> B, C) 3. transitivity (if A ->B and B-> C then A->C)
28
what does it mean by inference rules being complete
given a set X functional dependencies, all dependencies implied by X can be derived from X using these rules
29
what does it mean by inference rules being sound
no additional functional dependencies (not implied by X) can be derived from these rules
30
further rules derived from armstrong's axioms
- decomposition: if A -> B, C then A -> B and A -> C - union: if A -> B and A -> C then A -> B, C - composition: if A -> B and C -> D then A, C -> B, D
31
proof of union rule using armstrong's axioms
A-> B: augmentation with C: A, C -> B, C A -> C: augmentation with A: A, A -> A, C ( = A -> A,C) transitivity from above 2 dependencies: A -> B, C
32
pseudocode for computing the closure F+
F+ = F (initialisation) repeat until F+ doesn't change any more: apply rules of reflexivity and augmentation to each functional dependency, f in set F+, and add these new func. dep. to F+. for each f1, f2: if imply a func dep f3 using transitivity then add to F+
33
F = set of transitive dependencies A = set of attributes in a relation what is A+
set of all attributes that can be implied by the attributes of A using functional dependencies from F
34
how to compute A+
similarly to F+, start with functional dependencies of F, with attributes A on the LHS and repeatedly apply Armstrongs axioms
35
candidate key in terms of relational dependencies
minimal set of attributes such that A+ (under F+) includes/ which functionally determine all attributes in a relation
36
1NF
- no repeating groups i.e. an attribute that occurs with multiple values for a single occurrence of the primary key - no identical rows
37
why not repeat columns horizontally to get into 1NF
- waste of space, not all column spaces used - harder to query and reference specific columns - need a fixed upper limit on number of repetitions
38
why not one long string with all attributes for 1NF
-harder to query
39
1NF: one table solution
-repeat data for each attribute so each belongs to one entry, but redundant data (i.e. fill in blanks/flatten the table)
40
1NF: two table solution
-place repeating data in a separate relation, with original primary key as a foreign key. iterate until no repeated groups remain
41
2NF
- 1NF | - no partial functional dependencies: every non-key attribute is dependent on the whole of the composite primary key
42
How to place in 2NF
-remove partially dependent attributes and place in separate relation with the copy of their determinant
43
3NF
- 2NF - no transitive functional dependencies: no non-key attribute is transitively dependent on the primary key: all attributes are dependent on the key, the whole key and nothing but the key
44
How to place in 3NF
-remove transitively dependent attributes and place in new relation. take the attributes of their determinant as the primary key in the new table
45
how to find primary key
- intuitive observations from data | - computing functional dependency closure
46
how to depict functional dependencies
-dependency diagram
47
Limitations of file based approach
- Data duplication in different programs: different values/formats/waste of space - Incompatibility: programs written in different languages- difficult to access others' files - Data dependencies: structure of the file is defined in the code of the program - Separation and isolation of data: programs maintain their own data and users may be unaware of useful data in other programs - Limited queries: each program has a defined purpose, for different queries, more application programs have to be designed - Hard to recover when crashes occur - Inefficient searching large data sets - Many users: controlling simultaneous access = locking+ can't hide sensitive data = inefficient - Not synchronised: inconsistencies+lost updates
48
reason for limitations of file based approach
- definition of data is embedded in the application program (instead of stored separately and independently) - no central control over the access/manipulation of data
49
database
a large collection of logically related data, designed to meet the needs of an organisation: - single repository of data - minimum duplication - large databases have a data dictionary (metadata repository i.e. data describing the data)
50
DBMS
a software system that enables the user to define/create/maintain/control access to the DB
51
basic features of DBMS
- DDL (data definition language): defines the database- specify the structure, data types + constraints - DML (data manipulation language): allows users to insert/delete/update/retrieve data from the database. Query language is the part of this that retrieves data. unlimited queries (adv.) Efficiency: good DBMS can answer SQL queries quickly
52
DBMS allows controlled access to the DB
- security system - concurrency control - recovery control
53
DB application program
computer program that interacts with the user + DBMS. Sends SQL statements to the DBMS + can be conventional (local) or online applications
54
Components of DBMS environment
- hardware: PC or computer network - software: DBMS, application program, OS (network software if necessary) - data: used by the organisation. schema = abstract description of the data - procedures: documented instructions on how to use/run the system e.g. how to log on/use an application/make backups - person: anyone involved in the system
55
people in a DBMS environment
- DB designers - DB admins: maintenance of DBMS+OS, security+integrity control, satisfactory performance of user applications - application developers - end user
56
DB designers
- Logical designer: what to store- entities, attributes, relationships+constraints - Physical designer: maps logical designs to tables+integrity constraints, selects access methods+storage structures
57
advantages of databases
- shared data - concurrency control - better data backup/recovery procedures - consistency - less redundancy - economy of scale - easier data access - data security+integrity - faster development of new applications
58
disadvantages of databases
- more hardware - complexity/size of system - high cost of implementation of DBMS - slow processing of some applications - high impact of failure
59
need to trust a DBMS so have mechanisms to ensure the database is
-reliable -always in a consistent state especially when hardware/software failures and multiple users access the database simultaneously
60
database recovery
process of restoring database to a correct state after a failure
61
concurrency control protocols
prevent database accesses interfering with each other
62
transaction
an action carried out by a single person/program which reads/updates the database
63
during a transaction vs after a transaction
database may not be in a consistent state vs database in a consistent state + valid integrity/referential constraints
64
transaction outcomes
committed- completes successfully rollback- not completed successfully (performed entirely or not at all)
65
concurrency control
process of managing simultaneous operations on the DB without having them interfere with each other operations may be correct individually but cause inconsistency when executed simultaneously
66
how is DBMS different from multi user OS
-an OS allows two people to edit a document at the same time but if both try to write simultaneously -> ones changes get lost
67
issue with DDL
too low level to describe data organisation in a simple way for users: so have data model
68
data model
a collection of intuitive concepts describing entities, relationships + constraints
69
three characterisations of data
- structured - semi structured - unstructured
70
-structured data
data is represented in a strict format (relational data model) DBMS checks the data follows: the structures and integrity/referential constraints specified in the schema
71
-semi structured data
self describing data, schema info is mixed in with the data values ad hoc collected data, when not known how it will be stored/managed in advance may have some structure but not all data has the same structure, each data object has different attributes not known in advance
72
-unstructured data
very little indication of the type/structure of data e. g. text document with some info in it e. g. web doc with some HTML in it
73
relational data model
``` relations = tables attributes = columns tuples = rows ```
74
issue with relational data model
too low level for big companies so express in a non-technical way: E-R model
75
E-R model
top down approach to DB design, graphical description represents entities, attributes and relationships and constraints on these uses UML notation + crows foot notation
76
3-Level ANSI-SPARC architecture
external level: user's view of data conceptual level: logical structure of data as seen by DB administrator internal level: physical representation of data: algorithms, data structures
77
DB schema
abstract description of the collection of data in a DB: | defines schema, domain for each attribute and specifies integrity constraints
78
DB instance
data at a particular moment
79
objectives of DB schema
all users have access at every point to the same DB instance w/ customised views of parts of data
80
data independence
upper level of DB schema unaffected by changes to lower levels e.g. physical data independence: if internal schema changed- conceptual level unchanged, and users only notice a change in performance
81
three main phases of database design
conceptual design logical design physical design
82
conceptual design
acts as fundamental understanding of DB high level graphical representation via ER diagrams based off user requirements specification (completely independent of physical considerations)
83
logical design
create relational data model using conceptual design then normalise to eliminate redundancy/anomalies
84
physical design
describe database implementation of logical design: - specific storage structures/access methods/security protection - aim is optimum performance
85
classification of design phases into 3-level ANSI-SPARC
external schema + conceptual schema =logical/conceptual database design internal schema+physical storage =physical DB design
86
relational database
collection of normalised relations
87
a relation is 'normalised' if
it is appropriately structured i.e. one value per cell and no repeating rows
88
domain
set of allowable values for an attribute
89
degree of a relation
number of attributes
90
cardinality of a relation
number of tuples
91
tuple
row of a relation with concrete values for each of the attributes
92
attribute
named column of a relation: each with a unique name- properties/common characteristics shared by all entity instances of an entity type
93
relation
table with rows and columns
94
how does this differ from mathematical relations
ordering of attributes doesn't matter
95
foreign key
attribute in one table A whose values must match the primary key of another table B (or be null) then A references B
96
entity integrity
every attribute of a primary key can't be null: | ensures every entity has a unique identifier and foreign key values can reference primary key values
97
referential integrity
ensures a foreign key matches the primary key from another relation or are null: ensures references between tables are valid and prevents deleting a row in a table if there is a foreign key in another relation referencing it
98
types of relations
base relations: physically stored in the database view (virtual relation): not physically stored in the database but derived from content of base relations. used to show customised info to every user and calculate dynamic quantities. computed upon request from user and changes when base relations change.
99
alternatives to relational data model
- network data model: tuples modelled as nodes and relationships (foreign keys) as edges - hierarchal data model: type of NDM but graph is a tree graph and structure models 'parent-child' relationship (limitations: deleting parent or adding record without a parent)
100
objective of ER model
ER model (graphical representation of DB): understand nature and relationships among data help derive tables in RDM
101
entity
data objects: real thing (or abstract notion) that we recognise as a separate concern within the database
102
entity type
group of all objects with the same properties, identified as having an independent existence
103
entity occurence
a uniquely identifiable instance of an entity type
104
relationship
named association between two entity types which has some meaning in the context of the database
105
cardinality of a relationship
the number of entity occurrences that are related to a single occurrence of an associated entity type through this relationship
106
entity participates 'optionally' (indicated by O on other side of relationship line)
it has partial participation otherwise has total participation (indicated by line on other side of relationship line)
107
simple attribute
consists of a single component- can't be broken down further
108
composite attribute
multiple components: can be subdivided into smaller components
109
single valued attribute multi valued attribute derived attribute
- only holds one value - it can hold many values - it can be derived from the value of a related attribute
110
literals {a}, [a], (...) |
constants used in SQL statements {a} = required element [a] = optional element ... = optional repetition | = choice
111
SQL is 'free format'
free format = parts of statements don't need to be written on specific parts of the screen
112
types of query languages
SQL: formal definition of a new relation (which data is to be retrieved) from existing DB relations relational algebra: how to build a new relation from existing DB relations
113
'WHERE' 'SELECT DISTINCT' testing for null
filters rows according to some condition eliminates duplicates WHERE comment IS NULL
114
% vs _
``` % = wildcard, represents an arbitrary sequence of 0 or more characters _ = represents an arbitrary single character ```
115
aggregate function
operates on a single column and returns a single numerical value
116
GROUP BY
partitions data into groups, producing single summary row
117
types of subquery
- single-value/scalar subquery: single column and simple row - multiple-value subquery: single column and multiple rows - table subquery: multiple rows and columns
118
nested queries
- two scalar queries 'WHERE branchNo = (select...) | - scalar + mulivalued subquery 'WHERE branchNo IN (select...)
119
multivalued subquery
-ANY or SOME before subquery so WHERE is true if satisfied by at least one value returned by subquery and ALL if satisfied by all values returned by subquery 'WHERE salary > SOME (select ...)
120
alias for a column
in the FROM clause leave a space and write alias for column
121
inner join vs natural inner join
natural inner join demands equality for columns with the same name in the two tables e.g. a.salary = b.salary but inner join without 'natural' can demand different relations between column values
122
inner join vs outer join
inner join: omits all rows that don't satisfy join conditions outer join: retains (some of) the rows that don't satisfy the join conditions (left outer join: retains rows of the left table that are unmatched with rows from right table. full outer join: retains unmatched rows from both tables (fills necessary fields with NULL)
123
database updates
``` INSERT INTO TableName [(columnList)] VALUES (data ValueList) ---- UPDATE TableName SET columnName1 = dataValue1 [,columnName2= data Value2..] [WHERE searchCondition] ----- DELETE FROM TableName [WHERE searchCondition] ```
124
DDL basic commands
CREATE table: assign name to table and defines names and domains to each column in table ALTER table: amends relation schema/table structure bc of design error or design change DROP table: deletes a table Specify integrity + referential constraints: -PRIMARY KEY -FOREIGN KEY Create view- creates virtual table derived from base relation according to query to provide user with a virtual relation
125
BIT(n) INTEGER SMALLINT NUMERIC(p,d)
BIT(n)- bit string of length n INTEGER- large +ve or -ve integer SMALLINT- small +ve or -ve integer NUMERIC(p,d)- +ve/-ve decimal number with at most p precision (num digits) and d scale (num digits after .)
126
define custom domain types + with constraints
CREATE DOMAIN Name AS VARCHAR(10); CREATE DOMAIN SEX AS CHAR(1) CHECK (VALUE IN ('M', 'F')) or replace bracket ('M', 'F') with a SELECT statement
127
view
relation that depends on other relations and isn't physically stored as a table
128
query language is relationally complete if
it can be used to produce any relation that can be created from relation calculus/algebra expressions
129
teleprocessing architecture
one computer with single CPU: many end terminals cabled to central computer, all processing done in central computer: recieves requests, processes and responds with information downsizing to more cost effective PC network with better/same performance: dec performance bc burden on central computer