WEEK 8 Flashcards

1
Q

Scenario S1

A

Access Routine S1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Scenario S2

A

Access Routine S2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Scenario S3

A

Access Routine S3

  • Can use hash search
    • Since we are doing equality search
    • On a key field
      a) Hash the search value to find pointer
      to the bucket
      b) May involve search of overflow
      buckets
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Scenario S4

A
  • In our sample data, all values for age are unique, but in a realistic data set we would not expect this to be so
  • In this case we may be retrieving multiple records
  • S4 Primary index, may be multiple records if condition involves <, <=, =>, > on the search field then we find record satisfying the equality condition (age = 36) and then retrieve all subsequent records in ordered file
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Scenario S5

A

Access Routine S5
* Clustering index search
* One index entry for each distinct value of clustering field
* Binary search of index file to find first block of data containing matching search record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Scenario S6

A
  • Secondary index (B+ tree) equality and others;
    • multiple records may match
  • Operates in much the same way as S4
  • Use index to find required record(s) in main file
    • This time traverse tree rather than search
      ordered file
  • Retrieval behaviour depends on operator
  • < : Find first occurrence of search value, retrieve all preceding
    records
  • > : Find last occurrence of value, retrieve all subsequent records
  • <= : Find last occurrence of value, retrieve it and all preceding
    records
  • > = : Find first occurrence of value, retrieve it and all subsequent
    records
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

S6 example - consider leaf nodes

A
  • Remember a B+ tree usually has pointer from each leaf block to next leaf block
    • Once first matching record located, can traverse as a linked list to obtain other records for >= comparison
    • If nodes have back pointer, can traverse as doubly linked-list for <= comparison
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the Conjunctive Selection Conditions

A
  • Boolean expressions involving AND operator
  • General form
    • (simple condition) AND (simple condition) AND…
  • S7: If we can use one of methods S2-S6 for an attribute in one of the simple conditions; do so; examine the records thus retrieved to see if they meet other conditions

-S8: Composite index: if two or more attributes involved in equality conditions also appear in a composite index, use the index directly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Scenario S9

A

Access Routine S9

  • Intersection of record pointers: If secondary indices available on all fields involved in equality conditions and indices include record pointers (i.e. inverted files): retrieve set of record pointers from each index involved and take intersection as final result set
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Optimisation

A
  • Query optimisation for select needed for conjunctive conditions when different simple conditions offer an access path
  • Access path that results in the fewest record retrieval operations should be chosen
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Disjunctive conditions (OR)

A
  • Little can be done to optimise
  • If any one of the simple conditions involve an attribute with no access path, we must use serial access (brute force)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Join Operations

A
  1. One of the most time consuming operations in query processing
  2. M*N operation
    a. Cartesian product followed by a select
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Methods for implementing Joins

A

J1: Nested-loop join
- Default (brute force) algorithm
- Used when no access paths available on
either file in the join
- For each record t in R (outer loop)
- Retrieve every record s from S (inner loop)
- Test whether the two records satisfy the join condition
- t[A] = s[B]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Explain J2: Single loop join

A

Using an access structure to retrieve the matching records
a) If index (or hash key) exists for one of the two join attributes e.g. B or file S
- Retrieve each record of t in R (loop over file R)
- Use access structure for B to retrieve all matching records from S that satisfy s[B] = t[A]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Explain J3: Sort-merge join

A
  1. Possible if records of R and S are physically ordered by the value of the join attribute A, B
  2. Can implement join in most efficient way possible
  3. Both files are scanned concurrently in order of the join attributes
    a. Match up records that have same values
    for A and B
  4. Pairs of file blocks copied into memory buffers in order and records of each file only scanned once
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Explain J4: Hash-join

A
  1. Records (or record pointers) of R and S are hashed to same hash file using same hash function on attributes R[A] AND S[B] as hash keys
  2. Single pass through each file fills the hash buckets
  3. Each bucket then examined from records R and S that match to produce the join result
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

PROJECT Operations

A

If <attribute> include a key of R, then result has same number of tuples as R but only the attributes given in <attribute></attribute></attribute>

  • If not, we must detect duplicate tuples and eliminate them
  • Two possible elimination techniques are:
    • Sort the result and then eliminated duplicates (which will now be duplicate tuples)
    • Hash each projected tuple,check against those already in the bucket and discard if a double
  • Remember in SQL, we use the DISTINCT keyword to force the elimination of project duplicates
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Composite index

A
  • Composite (or concatenated) index is one based on more than one attribute. It can be clustering or otherwise
  • E.g. Person (NI_No, Surname, Forename, DOB…)
  • We can specify a composite index on surname, forename
  • Advantages over single attribute indexes:
    a. Consider the query “how many people are there with last name ‘Smith’ and first name ‘John’?
    b. A query on all attributes of a compound query will likely return far fewer records than a query on only some of those attributes.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Query processing

A
  • What happens when we execute a query on a database?
  • Like all languages, high level statements must be interpreted (or compiled) prior to execution
  • Runtime database processor performs sequence of operations before returning query result
20
Q

Query processing
Query process operations

21
Q

Scanning, parsing and validation

A
  • Scanner
  • Identifies language tokens such as SQL keywords, attribute names, relations etc in text of query
  • Parser
  • Checks query syntax to determine if it is formulated according to the grammar rules of
    the query language
  • Validation
  • Check all attribute and relation names are valid and semantically meaningful names in
    the database schema
  • Internal representation of query
  • Usually a query tree, may be a query graph
22
Q

Query processing
Query Optimiser, Code Generator, Runtime database processor

A
  • Query optimiser
    • Responsible for devising an execution plan for retrieving the result of the query from the DB
    • Query can have several possible execution plans, optimiser chooses a suitable one
  • Code generator
    • Generates the code to execute the plan produced by the optimiser
  • Runtime database processor
    • Runs query code to produce the query result
    • Code may be executed in compiled or interpreted mode
    • Error message generated if runtime error occurs
23
Q

Query optimisation
Process of selecting a suitable execution plan for the query, Three Main Techniques

A
  • Process of selecting a suitable execution plan for the query
    • May or may not be ‘optimal’ plan
    • Seeks to provide a reasonably efficient strategy
    • Finding optimal strategy is usually too costly except for simplest of queries
  • Three main techniques
    • Cost based
    • Semantics based
    • Heuristic based
    • May be used in conjunction with each other
24
Q

Cost based optimisation

A
  • Search solution space to find the solution that minimises cost
    • Compare costs of executing a query using different execution strategies and choose the lowest cost estimate
    • Cost functions are estimates, so a solution that is not the optimal may be chosen
25
Cost based optimisation * What is cost and how do we calculate it?
Cost: The amount of time to complete an operation
26
Cost based optimisation Cost components
* Access cost to secondary storage * Storage cost * Computation cost * Memory usage cost * Communication cost
27
Cost components Access cost to secondary storage
* Cost of searching for, reading and writing data blocks residing on secondary storage * Depends on type of access structures on the datafile * Also on whether block allocation is contiguous (more relevant for HDD than SSD)
28
Cost components Storage cost
Cost of storing intermediate files generated by an execution strategy
29
Cost components Computation cost
* Cost of performing in memory operations on data buffers during execution * Include operations such as searching for and sorting records, merging records for JOIN and performing computations on field values
30
Cost components Memory cost
Cost pertaining to number of memory buffers needed during execution
31
Cost components Communication cost
* Cost of shipping query and it’s results between originating client and database (and any inter-database communication)
32
Cost minimisation aims for differing scenarios
* Large databases * Minimise access cost to secondary storage * Minimise no of block transfers between disk and main memory : reduce volume of data moved * Smaller databases * Minimise computation cost * Most of data involved in query can be stored completely in main memory * Distributed databases * Minimise communication cost
33
Semantic query optimisation
* Use constraints specified on the schema to inform query execution * E.g. unique attributes, nullable fields * Consider: * SELECT * FROM Customers WHERE CName IS NULL * If Cname is declared as NOT NULL then this query will never return any results * Semantic optimisation says don’t run the query at all! * As well as schema constraints, custom constraints can be implemented
34
Heuristic based optimisation Query process operations
Intermediate query form is usually a ‘query tree’ whose leaves are the relations involved in the query and whose nodes are the RA operators
35
Query tree example - sample data
36
Canonical query tree
37
Query tree notation
* Leaf nodes: input relations of query * Internal nodes: RA operations * Execution of query tree * Execute an internal node operation whenever it’s operands are available * Replace internal node by resulting relation * Execution terminates when route node executed and query result is obtained
38
Relational operators as a query mechanism
39
Multi relational queries * Data we require may not be present in any single existing relation. * The relational operators allow us to transform and combine existing relations in order to produce a single relation which holds the answer to our question * "What are the names of the crew members on board the C-37D?”
* "What are the names of the crew members on board the C37D?” * This involves the Pnames and Snames fields; we do not currently have a single relation holding these fields. In order to answer this query, we must combine all three relations in the database portion. * Personnel -> PID, Pname, Rank, Bplace * Ship -> SID, Sname, Class, Crew
40
Optimisation of high level query language expressions
* Low level navigational languages (hierarchic and network)  hand optimised by the programmer. * High level relational languages  Declarative (what rather than how) requires optimisation
41
Heuristic optimisation of query trees
* Main (most obvious?) heuristic is to apply select and project before joins * Select and project usually reduce the size of a relation, they never increase it * As join is M * N operation where M,N are size of relations involved, decreasing M and/or N is an optimisation
42
Heuristic Optimisation of query trees Example
43
Intermediate step
* Second two splits are more complex * They refer to two relations each * Can’t apply these directly * Have to first combine the two relations they refer to
44
End of step 1
45
Step2
Final steps * Finally we can replace any cartesian product operations with JOIN operations
46
General heuristic rules
* Select and Project operations moved down the tree * Most restrictive joins and selects also moved down * Project optimisations are also applied (not shown)
47
Heuristic query optimisation Summary
* Previous example shows that a query tree can be transformed step by step into another query tree that is more efficient to execute * Must make sure that transformation steps always lead to an equivalent query tree * Query optimiser needs equivalence preserving transformation rules * See Elmasri and Navathe for list of a number of transformation rules and a simple optimisation algorithm employing those rule