WEEK 8 Flashcards

1
Q

Scenario S1

A

Access Routine S1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Scenario S2

A

Access Routine S2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Scenario S3

A

Access Routine S3

  • Can use hash search
    • Since we are doing equality search
    • On a key field
      a) Hash the search value to find pointer
      to the bucket
      b) May involve search of overflow
      buckets
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Scenario S4

A
  • In our sample data, all values for age are unique, but in a realistic data set we would not expect this to be so
  • In this case we may be retrieving multiple records
  • S4 Primary index, may be multiple records if condition involves <, <=, =>, > on the search field then we find record satisfying the equality condition (age = 36) and then retrieve all subsequent records in ordered file
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Scenario S5

A

Access Routine S5
* Clustering index search
* One index entry for each distinct value of clustering field
* Binary search of index file to find first block of data containing matching search record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Scenario S6

A
  • Secondary index (B+ tree) equality and others;
    • multiple records may match
  • Operates in much the same way as S4
  • Use index to find required record(s) in main file
    • This time traverse tree rather than search
      ordered file
  • Retrieval behaviour depends on operator
  • < : Find first occurrence of search value, retrieve all preceding
    records
  • > : Find last occurrence of value, retrieve all subsequent records
  • <= : Find last occurrence of value, retrieve it and all preceding
    records
  • > = : Find first occurrence of value, retrieve it and all subsequent
    records
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

S6 example - consider leaf nodes

A
  • Remember a B+ tree usually has pointer from each leaf block to next leaf block
    • Once first matching record located, can traverse as a linked list to obtain other records for >= comparison
    • If nodes have back pointer, can traverse as doubly linked-list for <= comparison
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the Conjunctive Selection Conditions

A
  • Boolean expressions involving AND operator
  • General form
    • (simple condition) AND (simple condition) AND…
  • S7: If we can use one of methods S2-S6 for an attribute in one of the simple conditions; do so; examine the records thus retrieved to see if they meet other conditions

-S8: Composite index: if two or more attributes involved in equality conditions also appear in a composite index, use the index directly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Scenario S9

A

Access Routine S9

  • Intersection of record pointers: If secondary indices available on all fields involved in equality conditions and indices include record pointers (i.e. inverted files): retrieve set of record pointers from each index involved and take intersection as final result set
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Optimisation

A
  • Query optimisation for select needed for conjunctive conditions when different simple conditions offer an access path
  • Access path that results in the fewest record retrieval operations should be chosen
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Disjunctive conditions (OR)

A
  • Little can be done to optimise
  • If any one of the simple conditions involve an attribute with no access path, we must use serial access (brute force)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Join Operations

A
  1. One of the most time consuming operations in query processing
  2. M*N operation
    a. Cartesian product followed by a select
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Methods for implementing Joins

A

J1: Nested-loop join
- Default (brute force) algorithm
- Used when no access paths available on
either file in the join
- For each record t in R (outer loop)
- Retrieve every record s from S (inner loop)
- Test whether the two records satisfy the join condition
- t[A] = s[B]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Explain J2: Single loop join

A

Using an access structure to retrieve the matching records
a) If index (or hash key) exists for one of the two join attributes e.g. B or file S
- Retrieve each record of t in R (loop over file R)
- Use access structure for B to retrieve all matching records from S that satisfy s[B] = t[A]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Explain J3: Sort-merge join

A
  1. Possible if records of R and S are physically ordered by the value of the join attribute A, B
  2. Can implement join in most efficient way possible
  3. Both files are scanned concurrently in order of the join attributes
    a. Match up records that have same values
    for A and B
  4. Pairs of file blocks copied into memory buffers in order and records of each file only scanned once
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Explain J4: Hash-join

A
  1. Records (or record pointers) of R and S are hashed to same hash file using same hash function on attributes R[A] AND S[B] as hash keys
  2. Single pass through each file fills the hash buckets
  3. Each bucket then examined from records R and S that match to produce the join result
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

PROJECT Operations

A

If <attribute> include a key of R, then result has same number of tuples as R but only the attributes given in <attribute></attribute></attribute>

  • If not, we must detect duplicate tuples and eliminate them
  • Two possible elimination techniques are:
    • Sort the result and then eliminated duplicates (which will now be duplicate tuples)
    • Hash each projected tuple,check against those already in the bucket and discard if a double
  • Remember in SQL, we use the DISTINCT keyword to force the elimination of project duplicates
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Composite index

A
  • Composite (or concatenated) index is one based on more than one attribute. It can be clustering or otherwise
  • E.g. Person (NI_No, Surname, Forename, DOB…)
  • We can specify a composite index on surname, forename
  • Advantages over single attribute indexes:
    a. Consider the query “how many people are there with last name ‘Smith’ and first name ‘John’?
    b. A query on all attributes of a compound query will likely return far fewer records than a query on only some of those attributes.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Query processing

A
  • What happens when we execute a query on a database?
  • Like all languages, high level statements must be interpreted (or compiled) prior to execution
  • Runtime database processor performs sequence of operations before returning query result
20
Q

Query processing
Query process operations

A
21
Q

Scanning, parsing and validation

A
  • Scanner
  • Identifies language tokens such as SQL keywords, attribute names, relations etc in text of query
  • Parser
  • Checks query syntax to determine if it is formulated according to the grammar rules of
    the query language
  • Validation
  • Check all attribute and relation names are valid and semantically meaningful names in
    the database schema
  • Internal representation of query
  • Usually a query tree, may be a query graph
22
Q

Query processing
Query Optimiser, Code Generator, Runtime database processor

A
  • Query optimiser
    • Responsible for devising an execution plan for retrieving the result of the query from the DB
    • Query can have several possible execution plans, optimiser chooses a suitable one
  • Code generator
    • Generates the code to execute the plan produced by the optimiser
  • Runtime database processor
    • Runs query code to produce the query result
    • Code may be executed in compiled or interpreted mode
    • Error message generated if runtime error occurs
23
Q

Query optimisation
Process of selecting a suitable execution plan for the query, Three Main Techniques

A
  • Process of selecting a suitable execution plan for the query
    • May or may not be ‘optimal’ plan
    • Seeks to provide a reasonably efficient strategy
    • Finding optimal strategy is usually too costly except for simplest of queries
  • Three main techniques
    • Cost based
    • Semantics based
    • Heuristic based
    • May be used in conjunction with each other
24
Q

Cost based optimisation

A
  • Search solution space to find the solution that minimises cost
    • Compare costs of executing a query using different execution strategies and choose the lowest cost estimate
    • Cost functions are estimates, so a solution that is not the optimal may be chosen
25
Q

Cost based optimisation
* What is cost and how do we calculate it?

A

Cost: The amount of time to complete an operation

26
Q

Cost based optimisation
Cost components

A
  • Access cost to secondary storage
  • Storage cost
  • Computation cost
  • Memory usage cost
  • Communication cost
27
Q

Cost components
Access cost to secondary storage

A
  • Cost of searching for, reading and writing data blocks residing on
    secondary storage
  • Depends on type of access structures on the datafile
  • Also on whether block allocation is contiguous (more relevant for HDD than SSD)
28
Q

Cost components
Storage cost

A

Cost of storing intermediate files generated by an execution strategy

29
Q

Cost components
Computation cost

A
  • Cost of performing in memory operations on data buffers during execution
  • Include operations such as searching for and sorting records, merging records for JOIN and performing computations on field values
30
Q

Cost components
Memory cost

A

Cost pertaining to number of memory buffers needed during execution

31
Q

Cost components
Communication cost

A
  • Cost of shipping query and it’s results between originating client and database (and any inter-database communication)
32
Q

Cost minimisation aims for differing
scenarios

A
  • Large databases
    • Minimise access cost to secondary storage
    • Minimise no of block transfers between disk and main memory : reduce volume of data moved
  • Smaller databases
    • Minimise computation cost
    • Most of data involved in query can be stored completely in main memory
  • Distributed databases
    • Minimise communication cost
33
Q

Semantic query optimisation

A
  • Use constraints specified on the schema to inform query execution
    • E.g. unique attributes, nullable fields
  • Consider:
    • SELECT * FROM Customers WHERE CName IS NULL
    • If Cname is declared as NOT NULL then this query will never return any results
    • Semantic optimisation says don’t run the query at all!
  • As well as schema constraints, custom constraints can be implemented
34
Q

Heuristic based optimisation
Query process operations

A

Intermediate query form is usually a ‘query tree’ whose leaves are the relations involved in the query and whose nodes are the RA operators

35
Q

Query tree example - sample data

A
36
Q

Canonical query tree

A
37
Q

Query tree notation

A
  • Leaf nodes: input relations of query
  • Internal nodes: RA operations
  • Execution of query tree
    • Execute an internal node operation whenever it’s operands are available
    • Replace internal node by resulting relation
    • Execution terminates when route node executed and query result is obtained
38
Q

Relational operators as a query mechanism

A
39
Q

Multi relational queries
* Data we require may not be present in any single existing relation.
* The relational operators allow us to transform and combine existing relations in order to produce a single relation which holds the answer to our question
* “What are the names of the crew members on board the C-37D?”

A
  • “What are the names of the crew members on board the C37D?”
  • This involves the Pnames and Snames fields; we do not currently have a single relation holding these fields. In order to answer this query, we must combine all three relations in the database portion.
  • Personnel -> PID, Pname, Rank, Bplace
  • Ship -> SID, Sname, Class, Crew
40
Q

Optimisation of high level query language
expressions

A
  • Low level navigational languages (hierarchic and network)  hand optimised by the programmer.
  • High level relational languages  Declarative (what rather than how) requires optimisation
41
Q

Heuristic optimisation of query trees

A
  • Main (most obvious?) heuristic is to apply select and project before joins
  • Select and project usually reduce the size of a relation, they never increase it
  • As join is M * N operation where M,N are size of relations involved, decreasing M and/or N is an optimisation
42
Q

Heuristic Optimisation of query trees
Example

A
43
Q

Intermediate step

A
  • Second two splits are more complex
    • They refer to two relations each
  • Can’t apply these directly
    • Have to first combine the two relations they refer to
44
Q

End of step 1

A
45
Q

Step2

A

Final steps
* Finally we can replace any cartesian product operations with JOIN operations

46
Q

General heuristic rules

A
  • Select and Project operations moved down the tree
  • Most restrictive joins and selects also moved down
  • Project optimisations are also applied (not shown)
47
Q

Heuristic query optimisation
Summary

A
  • Previous example shows that a query tree can be transformed step by step into another query tree that is more efficient to execute
  • Must make sure that transformation steps always lead to an equivalent query tree
    • Query optimiser needs equivalence preserving transformation rules
  • See Elmasri and Navathe for list of a number of transformation rules and a simple optimisation algorithm employing those rule