Query Processing Flashcards by Angel De Gala

is a request posed to a database (or data system) for data retrieval, updating, deletion, insertion, etc.

Query

is the stage in a Database Management System (DBMS) where the system interprets and executes
a user’s query (usually written in SQL) to retrieve or modify data efficiently from the database

Query Processing

It ensures all table names, columns, and keywords are valid.

Parsing and Translation

The query is then translated into an internal form (like relational algebra).

Parsing and Translation

finds the most efficient way to execute the query.

Query Optimizer

It considers multiple query plans (e.g., using indexes, join methods, or sorting strategies).

Optimization

The goal is to minimize cost, such as CPU time, disk I/O, and memory use

Optimization

runs the optimized query plan

Execution

It retrieves or modifies the data from the database storage

Execution

Checks SQL correctness.

Parsing

Chooses whether to use an index on age or scan the whole table

Optimization

Retrieves matching records and returns the name field.

Execution
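
The three example cards above (check SQL correctness, choose between an index on age and a full scan, return the name field) can be traced on a small database. This is a minimal sketch using Python's built-in sqlite3 module. The `students(name, age)` table, its rows, and the query text are assumptions made for illustration; the deck does not state the query. SQLite's `EXPLAIN QUERY PLAN` is used to see which access path the optimizer picked.

```python
import sqlite3

# Hypothetical table and rows, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ana", 19), ("Ben", 22), ("Cara", 25)],
)

QUERY = "SELECT name FROM students WHERE age > 21"  # assumed example query


def plan(connection, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# Parsing: an unknown column is rejected before any plan is produced.
try:
    conn.execute("SELECT nickname FROM students")
    parse_error = None
except sqlite3.OperationalError as exc:
    parse_error = str(exc)

# Optimization: with no index, a full table scan is the only access path.
plan_without_index = plan(conn, QUERY)

# Optimization again: once an index on age exists, it becomes a candidate.
conn.execute("CREATE INDEX idx_students_age ON students(age)")
plan_with_index = plan(conn, QUERY)

# Execution: run the query and collect the name field of matching rows.
names = sorted(row[0] for row in conn.execute(QUERY))
```

The exact wording of the plan text varies between SQLite versions, so it is best read rather than compared against a fixed string.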

refer to the factors a DBMS (Database Management System) uses to estimate how expensive a query plan is

Measures of Query Cost

to decide the most efficient way to run a query

Measures of Query Cost

Usually the most important factor in cost estimation for disk-based databases

Disk I/O Cost

Measures how many disk accesses (reads/writes) are needed

Disk I/O Cost

Measures the time spent by the processor

CPU Cost

Usually smaller than disk I/O, but becomes important for in-memory databases.

CPU Cost

Measures how much RAM is used during query execution.

Memory Usage (Buffer Cost)

Affects how much data can be processed in memory without writing to disk

Memory Usage (Buffer Cost)

High memory use may cause paging or spilling to disk, increasing total cost

Memory Usage (Buffer Cost)

Arises because data may be stored on multiple servers and must be transferred between them.

Communication Cost (in Distributed Databases)

Critical in distributed query processing and cloud databases.

Communication Cost (in Distributed Databases)

The total time taken from query submission to result delivery.

Query Response Time

It’s the end-user perspective measure.

Query Response Time

Accessing data on disk

Disk I/O Cost

Processing tuples and computations

CPU Cost

Space used for intermediate results

Memory Cost

Data transfer between servers

Communication Cost

Total elapsed time for query result

Query Response Time
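
The cost measures above can be combined into a single estimate when comparing candidate plans. The sketch below is a toy model only: the `PlanEstimate` fields, the weights, and the two sample plans are all hypothetical and are not taken from the deck or from any particular DBMS.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    """Estimated resource use for one candidate plan (hypothetical numbers)."""
    name: str
    disk_blocks: int    # disk I/O cost: blocks read or written
    cpu_tuples: int     # CPU cost: tuples processed
    memory_pages: int   # memory cost: pages for intermediate results
    network_bytes: int  # communication cost: bytes moved between servers


# Assumed weights. Disk access gets the largest weight, in line with the card
# that treats disk I/O as the most important factor. Real systems tune these.
WEIGHTS = {"disk": 1.0, "cpu": 0.01, "memory": 0.05, "network": 0.0001}


def estimated_cost(plan: PlanEstimate) -> float:
    """Weighted sum of the individual cost measures."""
    return (
        WEIGHTS["disk"] * plan.disk_blocks
        + WEIGHTS["cpu"] * plan.cpu_tuples
        + WEIGHTS["memory"] * plan.memory_pages
        + WEIGHTS["network"] * plan.network_bytes
    )


def cheapest(plans):
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


full_scan = PlanEstimate("full scan", 1000, 100_000, 10, 0)
index_lookup = PlanEstimate("index lookup", 50, 500, 5, 0)
best = cheapest([full_scan, index_lookup])
```

Query response time is not a term in this sum; it is what the user observes after the chosen plan actually runs.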

is a relational algebra operation used to choose rows (tuples) from a table (relation) that satisfy a specific condition (predicate).

Selection Operation

It’s one of the most basic and essential operations in query processing.

Selection Operation

Combines multiple conditions using AND (∧), OR (∨), or NOT (¬)

Compound Selection

Uses comparison operators like =, ≠, <, ≤, >, ≥.

Theta (θ) Selection
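
The selection cards above can be illustrated with a relation held as a list of dicts. This is a minimal sketch; the `employees` data and the helper names `select`, `theta`, `and_`, `or_`, and `not_` are hypothetical and chosen only for this example.

```python
import operator

# Hypothetical relation: each dict is one tuple (row).
employees = [
    {"name": "Ana", "dept": "IT", "salary": 50000},
    {"name": "Ben", "dept": "HR", "salary": 42000},
    {"name": "Cara", "dept": "IT", "salary": 61000},
]

# The comparison operators allowed in a theta selection.
THETA = {
    "=": operator.eq, "≠": operator.ne,
    "<": operator.lt, "≤": operator.le,
    ">": operator.gt, "≥": operator.ge,
}


def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def theta(attribute, op, value):
    """Build a predicate of the form: attribute op value."""
    compare = THETA[op]
    return lambda row: compare(row[attribute], value)


def and_(p, q):
    return lambda row: p(row) and q(row)


def or_(p, q):
    return lambda row: p(row) or q(row)


def not_(p):
    return lambda row: not p(row)


# Compound selection: dept = 'IT' AND salary > 55000
well_paid_it = select(
    employees, and_(theta("dept", "=", "IT"), theta("salary", ">", 55000))
)
hr_or_top = select(
    employees, or_(theta("dept", "=", "HR"), theta("salary", "≥", 61000))
)
not_it = select(employees, not_(theta("dept", "=", "IT")))
```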

is the step where the DBMS arranges tuples (rows) of a relation (table) in a specific order, usually ascending or descending, based on one or more attributes (columns).

Sorting

It is a key operation used in many SQL queries and internal DBMS processes.

Sorting

The DBMS divides the data into chunks (runs) that fit into main memory.

Sort Phase

Each run is sorted in memory using an internal sorting algorithm (like quicksort).

Sort Phase

The sorted runs are then written back to disk.

Sort Phase

The sorted runs are merged together into larger sorted files.

Merge Phase

The process continues until one fully sorted file remains.

Merge Phase
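
The sort phase and merge phase described above can be sketched as follows. Python lists stand in for runs written to disk, `sorted()` stands in for the internal sorting algorithm, and the function name `external_sort` and the `memory_capacity` parameter are assumptions made for this example.

```python
import heapq


def external_sort(records, memory_capacity):
    """Sketch of external sort-merge; lists stand in for files on disk."""
    # Sort phase: cut the input into runs that fit in memory and sort each.
    runs = []
    for start in range(0, len(records), memory_capacity):
        chunk = records[start:start + memory_capacity]
        runs.append(sorted(chunk))  # a real system would write this run to disk

    # Merge phase: merge pairs of runs until one fully sorted run remains.
    while len(runs) > 1:
        merged_runs = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]  # the last "pair" may hold a single run
            merged_runs.append(list(heapq.merge(*pair)))
        runs = merged_runs

    return runs[0] if runs else []


sorted_records = external_sort([5, 2, 9, 1, 7, 3, 8], memory_capacity=3)
```

Merging two runs at a time keeps the sketch short; a DBMS would normally merge as many runs per pass as its buffer space allows.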

is one of the most important, and often most expensive, operations in query processing.

Join Operation

It combines tuples (rows) from two or more relations (tables) based on a related attribute between them

Join Operation

A theta join where the condition uses equality (=) only.

Equi-Join

Automatically joins tables based on common attribute names.

Natural Join

Keeps all tuples from left table.

Left Outer Join

Keeps all tuples from right table.

Right Outer Join

Keeps all tuples from both tables, filling missing values with NULLs

Full Outer Join
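
The difference between the join types above shows up in which unmatched tuples survive. Below is a minimal sketch with hypothetical `employees` and `departments` relations and a hypothetical `join` helper. The join attribute is passed explicitly, whereas a natural join would find the common attribute names itself, and Python's `None` plays the role of NULL.

```python
def join(left, right, key, keep_left=False, keep_right=False):
    """Equi-join on `key`, optionally keeping unmatched tuples (outer joins)."""
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    result, matched_right = [], set()

    for lrow in left:
        found = False
        for i, rrow in enumerate(right):
            if lrow[key] == rrow[key]:
                result.append({**lrow, **rrow})
                matched_right.add(i)
                found = True
        if keep_left and not found:
            # Pad the missing right-side attributes with None (NULL).
            result.append({**dict.fromkeys(right_cols), **lrow})

    if keep_right:
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Pad the missing left-side attributes with None (NULL).
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


# Hypothetical data: Ben's department is missing, HR has no employee.
employees = [{"emp": "Ana", "dept_id": 1}, {"emp": "Ben", "dept_id": 3}]
departments = [{"dept_id": 1, "dept": "IT"}, {"dept_id": 2, "dept": "HR"}]

equi = join(employees, departments, "dept_id")
left_outer = join(employees, departments, "dept_id", keep_left=True)
right_outer = join(employees, departments, "dept_id", keep_right=True)
full_outer = join(employees, departments, "dept_id", keep_left=True, keep_right=True)
```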

For every tuple in relation R, compare it with every tuple in S.

Nested-Loop Join

Used when: Relations are small or one fits in memory.

Nested-Loop Join

Processes one block of R at a time and compares it with all blocks of S.

Block Nested-Loop Join

Used when: One relation is much smaller than the other.

Block Nested-Loop Join

Used when: Join attribute in one relation is indexed.

Indexed Nested-Loop Join
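
The three nested-loop variants above differ only in how the inner relation is reached. This is a minimal sketch on hypothetical relations `R` and `S`, stored as tuples with the join attribute in position 0. The function names and the `block_size` parameter are assumptions, and the dict built inside `indexed_nested_loop_join` stands in for an index that would already exist on S.

```python
from collections import defaultdict

# Hypothetical relations; position 0 is the join attribute.
R = [(1, "Ana"), (2, "Ben"), (3, "Cara")]
S = [(1, "DB"), (3, "OS"), (3, "AI"), (4, "ML")]


def nested_loop_join(r_rel, s_rel, key=0):
    """For every tuple in R, compare it with every tuple in S."""
    return [(r, s) for r in r_rel for s in s_rel if r[key] == s[key]]


def block_nested_loop_join(r_rel, s_rel, key=0, block_size=2):
    """Hold one block of R in memory and scan S once per block."""
    result = []
    for start in range(0, len(r_rel), block_size):
        block = r_rel[start:start + block_size]
        for s in s_rel:  # one pass over S for the whole block
            for r in block:
                if r[key] == s[key]:
                    result.append((r, s))
    return result


def indexed_nested_loop_join(r_rel, s_rel, key=0):
    """Probe an index on S for each tuple of R instead of scanning S."""
    index = defaultdict(list)  # stand-in for a pre-existing index on S
    for s in s_rel:
        index[s[key]].append(s)
    return [(r, s) for r in r_rel for s in index.get(r[key], [])]


simple = nested_loop_join(R, S)
blocked = block_nested_loop_join(R, S)
indexed = indexed_nested_loop_join(R, S)
```

All three return the same tuples; only the number of comparisons and passes over S differs.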

Both relations are sorted on the join attribute.

Sort-Merge Join

Used when: Both relations are already sorted or can be sorted easily.

Sort-Merge Join
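
A sort-merge join, as described above, sorts both relations on the join attribute and then advances two cursors in step. The sketch below assumes tuples with the join attribute in position 0; the relations `R` and `S` and the function name are hypothetical.

```python
# Hypothetical relations; position 0 is the join attribute.
R = [(3, "Cara"), (1, "Ana"), (2, "Ben")]
S = [(1, "DB"), (3, "OS"), (3, "AI"), (4, "ML")]


def sort_merge_join(r_rel, s_rel, key=0):
    """Sort both inputs on the join attribute, then merge them."""
    r_sorted = sorted(r_rel, key=lambda t: t[key])
    s_sorted = sorted(s_rel, key=lambda t: t[key])
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i][key] < s_sorted[j][key]:
            i += 1  # R is behind, advance R
        elif r_sorted[i][key] > s_sorted[j][key]:
            j += 1  # S is behind, advance S
        else:
            value = r_sorted[i][key]
            # Find the end of the group of S tuples sharing this value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key] == value:
                j_end += 1
            # Pair every R tuple with this value against that S group.
            while i < len(r_sorted) and r_sorted[i][key] == value:
                for s in s_sorted[j:j_end]:
                    result.append((r_sorted[i], s))
                i += 1
            j = j_end
    return result


merged = sort_merge_join(R, S)
```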

divides both R and S into buckets based on the join attribute

Hash Join

Used when: Join attribute is not sorted or indexed.

Hash Join
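
The bucket idea behind the hash join can be sketched as a partition step followed by a build-and-probe step inside each bucket. The relations, the function name, and the `num_buckets` parameter below are hypothetical; tuples carry the join attribute in position 0.

```python
from collections import defaultdict

# Hypothetical relations; position 0 is the join attribute.
R = [(1, "Ana"), (2, "Ben"), (3, "Cara")]
S = [(1, "DB"), (3, "OS"), (3, "AI"), (4, "ML")]


def hash_join(r_rel, s_rel, key=0, num_buckets=4):
    """Partition both relations by hash, then join matching buckets."""
    # Partition phase: tuples with equal join values land in the same bucket.
    r_buckets, s_buckets = defaultdict(list), defaultdict(list)
    for r in r_rel:
        r_buckets[hash(r[key]) % num_buckets].append(r)
    for s in s_rel:
        s_buckets[hash(s[key]) % num_buckets].append(s)

    result = []
    for bucket_id, r_part in r_buckets.items():
        # Build phase: in-memory hash table on this bucket of R.
        table = defaultdict(list)
        for r in r_part:
            table[r[key]].append(r)
        # Probe phase: look up each S tuple from the same bucket.
        for s in s_buckets.get(bucket_id, []):
            for r in table.get(s[key], []):
                result.append((r, s))
    return result


hashed = hash_join(R, S)
```

No ordering or index on the join attribute is needed, which matches the "used when" card above; the output order depends on bucket order.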

refers to how the database system executes a relational algebra expression (or an SQL query) to produce the final result efficiently

Evaluation of Expression

It’s the stage where the DBMS translates the logical query (what to do) into a physical plan (how to do it)

Evaluation of Expression

is a combination of relational algebra operations

Query Expression

Each operation produces an intermediate table stored temporarily

Materialized Evaluation

Output of one operation is passed directly as input to the next

Pipelined Evaluation
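
The contrast between materialized and pipelined evaluation maps well onto Python lists versus generators. In this sketch the `students` data, the operator functions, and the `scanned` counter are all hypothetical; the expression evaluated is a selection on age followed by a projection on name.

```python
students = [
    {"name": "Ana", "age": 19},
    {"name": "Ben", "age": 22},
    {"name": "Cara", "age": 25},
]


# Materialized evaluation: each operator builds its full result first.
def select_materialized(rows, predicate):
    return [row for row in rows if predicate(row)]  # intermediate table


def project_materialized(rows, column):
    return [row[column] for row in rows]


# Pipelined evaluation: each operator hands rows on one at a time.
def select_pipelined(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def project_pipelined(rows, column):
    for row in rows:
        yield row[column]


scanned = []  # records which rows the scan has actually read


def scan(rows):
    for row in rows:
        scanned.append(row["name"])
        yield row


def is_older(row):
    return row["age"] > 21


materialized = project_materialized(select_materialized(students, is_older), "name")

pipeline = project_pipelined(select_pipelined(scan(students), is_older), "name")
first_result = next(pipeline)        # pull a single result through the pipeline
scanned_after_first = list(scanned)  # only the rows needed so far were read
remaining = list(pipeline)
```

Pulling one result reads only as many input rows as that result needs, and no intermediate table is stored.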

refers to the process of executing database queries directly in main memory (RAM) instead of relying primarily on slower disk-based storage

In-Memory Query Processing

This approach is central to modern high-performance DBMSs, especially in-memory databases such as SAP HANA, Redis, and MemSQL (now SingleStore).

In-Memory Query Processing

store data on disk and move it into memory buffers only when needed.

Traditional DBMSs

Data is stored in main memory (RAM) instead of disk

Data Storage

Usually column-oriented (columnar) rather than row-oriented for faster analytics.

Data Storage

Data is periodically backed up to disk for durability

Data Storage
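
The column-oriented layout mentioned above can be compared with a row-oriented one in a few lines. This is a sketch only: the `sales` data and the `to_columns` helper are hypothetical, and plain Python lists stand in for the contiguous column arrays a real in-memory engine would use.

```python
# Row-oriented: each record keeps all of its attributes together.
sales = [
    {"id": 1, "region": "North", "amount": 10},
    {"id": 2, "region": "South", "amount": 25},
    {"id": 3, "region": "North", "amount": 40},
]


def to_columns(rows):
    """Convert a row-oriented table into one list per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented: all values of one attribute sit next to each other.
sales_columns = to_columns(sales)

# An analytic query such as SUM(amount) touches a single column...
total_from_columns = sum(sales_columns["amount"])

# ...whereas the row layout visits every whole record to reach that field.
total_from_rows = sum(row["amount"] for row in sales)
```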

Queries are processed entirely in memory, using CPU caches and vectorized operations

Query Execution

The query engine uses algorithms optimized for memory (e.g., in-memory joins, scans).

Query Execution

Uses write-ahead logging (WAL) or checkpointing to ensure recovery in case of failure.

Transaction Management

ACID properties are still maintained

Transaction Management
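
Write-ahead logging can be illustrated with a toy key-value store: each change is appended to a log file on disk before the in-memory copy is updated, and recovery replays the log. The `TinyStore` class, its JSON log format, and the file name are all hypothetical. The sketch leaves out checkpointing, transactions, and concurrency, so it shows the logging idea only, not how any real DBMS implements it.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store that logs every write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # the in-memory copy of the data
        self._recover()

    def _recover(self):
        """Rebuild the in-memory state by replaying the log, if one exists."""
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply the change in memory.
        self.data[key] = value


log_file = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(log_file)
store.put("a", 1)
store.put("a", 2)
store.put("b", 3)

# Simulate a restart: a new instance sees only what reached the log.
recovered = TinyStore(log_file).data
```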

SQL query is parsed and validated

Query Parsing

Plan is generated and optimized

Query Optimization

Operators run directly on memory data structures

Query Execution

Results returned to user

Result Output

Uses cost models for memory, not disk

Query Optimization

Uses memory-resident indexes and vectorized execution

Query Execution

Minimal delay since data is already in memory

Result Output

dominant cost (since disk I/O is minimal).

CPU Time

how fast data can be transferred between memory and CPU.

Memory Bandwidth

how efficiently CPU cache is used.

Cache Utilization

multiple queries sharing memory resources

Concurrency Overhead

Query Processing Flashcards

(83 cards)