WEEK 8 Flashcards
Scenario S1
Access Routine S1
Scenario S2
Access Routine S2
Scenario S3
Access Routine S3
- Can use hash search
- Since we are doing equality search
- On a key field
a) Hash the search value to find pointer
to the bucket
b) May involve search of overflow
buckets
Scenario S4
- In our sample data, all values for age are unique, but in a realistic data set we would not expect this to be so
- In this case we may be retrieving multiple records
- S4 Primary index, may be multiple records if condition involves <, <=, =>, > on the search field then we find record satisfying the equality condition (age = 36) and then retrieve all subsequent records in ordered file
Scenario S5
Access Routine S5
* Clustering index search
* One index entry for each distinct value of clustering field
* Binary search of index file to find first block of data containing matching search record
Scenario S6
- Secondary index (B+ tree) equality and others;
- multiple records may match
- Operates in much the same way as S4
- Use index to find required record(s) in main file
- This time traverse tree rather than search
ordered file
- This time traverse tree rather than search
- Retrieval behaviour depends on operator
- < : Find first occurrence of search value, retrieve all preceding
records - > : Find last occurrence of value, retrieve all subsequent records
- <= : Find last occurrence of value, retrieve it and all preceding
records - > = : Find first occurrence of value, retrieve it and all subsequent
records
S6 example - consider leaf nodes
- Remember a B+ tree usually has pointer from each leaf block to next leaf block
- Once first matching record located, can traverse as a linked list to obtain other records for >= comparison
- If nodes have back pointer, can traverse as doubly linked-list for <= comparison
What is the Conjunctive Selection Conditions
- Boolean expressions involving AND operator
- General form
- (simple condition) AND (simple condition) AND…
- S7: If we can use one of methods S2-S6 for an attribute in one of the simple conditions; do so; examine the records thus retrieved to see if they meet other conditions
-S8: Composite index: if two or more attributes involved in equality conditions also appear in a composite index, use the index directly
Scenario S9
Access Routine S9
- Intersection of record pointers: If secondary indices available on all fields involved in equality conditions and indices include record pointers (i.e. inverted files): retrieve set of record pointers from each index involved and take intersection as final result set
Optimisation
- Query optimisation for select needed for conjunctive conditions when different simple conditions offer an access path
- Access path that results in the fewest record retrieval operations should be chosen
Disjunctive conditions (OR)
- Little can be done to optimise
- If any one of the simple conditions involve an attribute with no access path, we must use serial access (brute force)
Join Operations
- One of the most time consuming operations in query processing
- M*N operation
a. Cartesian product followed by a select
Methods for implementing Joins
J1: Nested-loop join
- Default (brute force) algorithm
- Used when no access paths available on
either file in the join
- For each record t in R (outer loop)
- Retrieve every record s from S (inner loop)
- Test whether the two records satisfy the join condition
- t[A] = s[B]
Explain J2: Single loop join
Using an access structure to retrieve the matching records
a) If index (or hash key) exists for one of the two join attributes e.g. B or file S
- Retrieve each record of t in R (loop over file R)
- Use access structure for B to retrieve all matching records from S that satisfy s[B] = t[A]
Explain J3: Sort-merge join
- Possible if records of R and S are physically ordered by the value of the join attribute A, B
- Can implement join in most efficient way possible
- Both files are scanned concurrently in order of the join attributes
a. Match up records that have same values
for A and B - Pairs of file blocks copied into memory buffers in order and records of each file only scanned once
Explain J4: Hash-join
- Records (or record pointers) of R and S are hashed to same hash file using same hash function on attributes R[A] AND S[B] as hash keys
- Single pass through each file fills the hash buckets
- Each bucket then examined from records R and S that match to produce the join result
PROJECT Operations
If <attribute> include a key of R, then result has same number of tuples as R but only the attributes given in <attribute></attribute></attribute>
- If not, we must detect duplicate tuples and eliminate them
- Two possible elimination techniques are:
- Sort the result and then eliminated duplicates (which will now be duplicate tuples)
- Hash each projected tuple,check against those already in the bucket and discard if a double
- Remember in SQL, we use the DISTINCT keyword to force the elimination of project duplicates
Composite index
- Composite (or concatenated) index is one based on more than one attribute. It can be clustering or otherwise
- E.g. Person (NI_No, Surname, Forename, DOB…)
- We can specify a composite index on surname, forename
- Advantages over single attribute indexes:
a. Consider the query “how many people are there with last name ‘Smith’ and first name ‘John’?
b. A query on all attributes of a compound query will likely return far fewer records than a query on only some of those attributes.
Query processing
- What happens when we execute a query on a database?
- Like all languages, high level statements must be interpreted (or compiled) prior to execution
- Runtime database processor performs sequence of operations before returning query result
Query processing
Query process operations
Scanning, parsing and validation
- Scanner
- Identifies language tokens such as SQL keywords, attribute names, relations etc in text of query
- Parser
- Checks query syntax to determine if it is formulated according to the grammar rules of
the query language - Validation
- Check all attribute and relation names are valid and semantically meaningful names in
the database schema - Internal representation of query
- Usually a query tree, may be a query graph
Query processing
Query Optimiser, Code Generator, Runtime database processor
- Query optimiser
- Responsible for devising an execution plan for retrieving the result of the query from the DB
- Query can have several possible execution plans, optimiser chooses a suitable one
- Code generator
- Generates the code to execute the plan produced by the optimiser
- Runtime database processor
- Runs query code to produce the query result
- Code may be executed in compiled or interpreted mode
- Error message generated if runtime error occurs
Query optimisation
Process of selecting a suitable execution plan for the query, Three Main Techniques
- Process of selecting a suitable execution plan for the query
- May or may not be ‘optimal’ plan
- Seeks to provide a reasonably efficient strategy
- Finding optimal strategy is usually too costly except for simplest of queries
- Three main techniques
- Cost based
- Semantics based
- Heuristic based
- May be used in conjunction with each other
Cost based optimisation
- Search solution space to find the solution that minimises cost
- Compare costs of executing a query using different execution strategies and choose the lowest cost estimate
- Cost functions are estimates, so a solution that is not the optimal may be chosen