Query Processing Flashcards

1
Q

Query Processing

A

This is where we look at the query compiler, and execution engine. This is responsible for transforming SQL queries into sequences of database operations and then execute the queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Queries

A

In queries, we typically tell the DBMS what we want, not how we want it.
We’re typically going to look at select statements.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Relational Algebra

A

Set of operations that can be applied to relations to compute new relations (things like select, projection, natural join, union etc).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Relational Algebra Expression

A

Often called a logical query plan, and are often represented as trees.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Logical Query Plan example

A

SELECT department, name
FROM Stores, Employees
WHERE department=worksAt AND city=’Liverpool’

can be converted to

π_(department, name)(σ_(department=worksAt AND city=’Liverpool’)(Stores x Employees)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

σ

A

Selection, used for conditions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

π

A

Projection, restricts attributes in attribute list.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

ρ

A

Renaming, very simply used for representing one thing as another (like using AS in SQL).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

A

Cartesian product, pairs each tuple together (like using SELECT DISINCT * FROM _)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Query Plan steps

A
  • Compute an intermediate result for each node
  • For a leaf labeled with relation R, the intermediate result is R
  • For an inner node labeled with operator op, get the intermediate result by applying op to the children’s intermediate results.
  • Result of query -> intermediate result of the root.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Equijoins

A

Represented as R ⋈_a=b S.
Joins things together based upon attributes given (in this case, a and b).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Merging (merge sort)

A

Way of understanding equijoin:
1) Check values of a and b. If a is smaller than b, move down in a, and if b is smaller than a, move down in b.
2) If a and b are equal, we merge the cell of a and b, and move down in b.
3) Repeat
If there are duplicates in a, we remember that they are duplicates and repeat the b table again

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Fast Join Algorithms

A

There is a fast join algorithm which is the sort join algorithm, and can be a fast way of computing R ⋈_a=b S.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Other Join Algorithms

A
  • Index join
  • Hash join
  • Multiway join
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Relations on disk

A

Relations are stored on disk, and RAM is not often big enough to store everything, but if we stored everything on disk then it would take longer than if on RAM.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Disk parameters

A
  • B = size of the disk block (512 to 4096 bytes)
  • M = number of disk blocks that fit into available RAM.
17
Q

Reading a relation R

A

No. of elementary operations =
O(|R|)
No. of disk access =
O(|R|/ B)

18
Q

Sorting R on attribute A

A

No. of elementary operations =
O(|R| log_2 |R|)
No. of disk access =
O((|R|/ B) log_2 (|R|/ B))

19
Q

O(|R|)

A

Since we are reading them one by one linearly.

20
Q

O(|R| log_2 |R|)

A

O(n log n), based upon sort join algorithm.

21
Q

O(|R|/B)

A

We can just read the blocks instead. Since it under the presumption that each record has a constant size.

22
Q

O((|R|/ B) log_2 (|R|/ B))

A

We are sorting all disk blocks, not individual records.

23
Q

Index

A

Given the values for one or more attributes of a relation R, provides quick access to the tuples with these values.

24
Q

Types of index

A

Primary -> defines how data is sorted on disk.
Secondary -> merely points to location records on disk.

25
Q

Forms of index

A

B+ Trees
Hash Tables

26
Q

B+ Trees

A

Good if select condition specifies a range, and it most widely used.

27
Q

Hash Tables

A

Good if selection involves equality only.

28
Q

Virtual Columns

A

If we have indices over multiple columns (instead of doing one column, like year, we can do two columns, like programme), we create a virtual column which concatenates both of them, with the first one being priority over the second one (like using GROUP BY).

29
Q

Heuristics

A
  • Push selections as far down the tree as possible
  • Push projections as far down as possible, or insert projections where appropriate.
  • Where possible, introduce equijoins for cross product followed by selections.
30
Q

Physical Query Plan

A

Adds information required to execute the optimised query plan.
Things like:
- Which algorithm to use for execution of operators (naive selection or index selection, nested block join or sort join or hash join)
- How to pass information from one operator to another (write to disk, keep in memory, pipelining operators)
- Good ordering for computer joins, unions etc
- Additional operations such as sorting
are all considered.

31
Q

Using physical query plans for cost

A

We generate many different physical query plans and estimate the cost of execution for each plan (time, disk access, memory etc). From then, we select the lowest cost.

32
Q

Estimation cost of execution

A

Relies on number of disk access operations.

33
Q

Number of disk operations influences

A

Number of disk accesses are influenced by many factors:
- Selection of algorithms for the individual operators.
- Method for passing information.
- Size of intermediate results.

34
Q

Disk access operation parameters

A
  • Size of relations
  • Number of distinct items per attribute per relation.
35
Q

Projection example

A

For σ_a=b(R)
|R| / number of distinct values in column a of relation R.

36
Q

Joins example

A

For R ⋈ S, we have:
(|R| x |S|) / max number of distinct values for A in R or S.

37
Q

How to generate physical query plans?

A

A sensible approach is to go either top-down or bottom-up:
- Selection for a suitable algorithm for each operator based upon size of intermediate result
- Select of a good join order based upon size of intermediate result

38
Q

Passing Information

A

Either materialisation or pipelining.