Week 9 - Physical design Flashcards
What is the goal of the Conceptual Design phase in database design?
To study the problem domain and identify the entities and relationships involved, creating an abstract model of the data without worrying about how it will be stored.
What happens during the Logical Design phase?
Entities and relationships are mapped to tables, and normalization is applied to remove redundancy and ensure data integrity.
What is the focus of the Physical Design phase?
To optimize performance by adjusting table design for storage on disk, choosing indexes, and fine-tuning how data will be physically stored and accessed.
What is the purpose of Physical Design in database systems?
The goal is to improve the efficiency of the system while maintaining correctness.
Physical design (or database tuning) focuses on:
How data is stored on disk
Choosing indexes, partitioning, and access paths
Making performance-enhancing decisions based on expected usage
Studying physical design does not mean building a DBMS from scratch; it means understanding which problems the DBMS addresses and why tuning techniques work.
Where are database records stored during implementation?
On permanent storage media like hard disks, in special files managed by the DBMS.
How does a DBMS access data from storage?
It loads data in fixed-size pages (e.g., 2KB to 32KB) as needed, rather than reading whole files from start to finish.
Why is disk access a concern in physical design?
Because disk access is much slower than main-memory access or CPU processing, minimizing disk I/O is crucial for performance.
What is a buffer pool in a DBMS?
It’s a part of main memory used to store pages temporarily, allowing faster access to recently used data.
What happens to a page after it’s fetched and processed?
It’s kept in the buffer pool instead of being discarded immediately, improving performance if it’s needed again soon.
Can the buffer pool store the entire database?
No — the buffer pool is limited, so only a portion of the database can be held in memory at a time.
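The buffer pool behaviour described in the last three cards can be sketched as a small cache of pages. This is a toy model, not how any particular DBMS implements it: the least-recently-used (LRU) eviction policy, the `BufferPool` class, and the `fake_disk_read` function are all assumptions made for illustration.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: holds at most `capacity` pages, evicts with LRU (assumed policy)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical disk reader
        self.pages = OrderedDict()  # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            # Hit: the page is already in memory, so no disk I/O is needed.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: fetch from disk, evicting the least recently used page if full.
        self.disk_reads += 1
        page = self.read_page_from_disk(page_id)
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        self.pages[page_id] = page
        return page


def fake_disk_read(page_id):
    # Stand-in for a slow disk access.
    return f"contents of page {page_id}"


pool = BufferPool(capacity=2, read_page_from_disk=fake_disk_read)
for pid in [1, 2, 1, 3, 2]:
    pool.get_page(pid)
print(pool.disk_reads)  # 4: the second request for page 1 was served from memory
```

With room for only two pages, page 2 is evicted when page 3 arrives, so the final request for page 2 goes back to disk. A larger pool would have avoided that read.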
Why should the number of pages used to store a table be minimized?
Because fewer pages mean less disk I/O, which improves performance by reducing the time spent reading from storage.
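A rough page count shows why row size matters. The sketch below assumes fixed-size rows, no per-page header overhead, and rows that never span two pages; real systems differ on all three points, and the function name `pages_needed` is made up for this example.

```python
import math


def pages_needed(num_rows, row_size_bytes, page_size_bytes):
    """Estimate pages for a table, assuming fixed-size rows and no page overhead."""
    rows_per_page = page_size_bytes // row_size_bytes
    if rows_per_page == 0:
        raise ValueError("a row does not fit in one page under this simple model")
    return math.ceil(num_rows / rows_per_page)


# One million rows on 8KB pages (8192 bytes).
wide = pages_needed(1_000_000, row_size_bytes=400, page_size_bytes=8192)
narrow = pages_needed(1_000_000, row_size_bytes=100, page_size_bytes=8192)
print(wide)    # 50000 pages (20 rows per page)
print(narrow)  # 12346 pages (81 rows per page)
```

Under these assumptions, a full scan of the narrower table reads roughly a quarter of the pages.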
Why is scanning every record of a large table expensive?
Because it requires loading many pages from disk, which is slow and resource-intensive.
What is the selection operation (σC), and how is it typically evaluated?
The selection operation picks the rows that satisfy a condition C. Without further information, it requires scanning every row to check whether the condition holds. If the condition is simple (for example, an equality test on an indexed or sorted attribute), the DBMS may know exactly which pages contain the relevant rows.
How can the selection operation σC be optimized?
If there’s an index on the attribute(s) in the condition, the DBMS can locate only the relevant pages, avoiding the need to scan every row.
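The difference between a full scan and an index lookup can be sketched by counting pages read. The table layout (a list of pages, each a list of `(id, city)` rows), the sample data, and the dictionary-based index are all illustrative assumptions; real indexes are usually B-trees or hash structures stored on disk.

```python
from collections import defaultdict

# A tiny table stored as 4 pages of 2 rows each (hypothetical data).
pages = [
    [(1, "Oslo"), (2, "Lima")],
    [(3, "Cairo"), (4, "Oslo")],
    [(5, "Lima"), (6, "Quito")],
    [(7, "Cairo"), (8, "Quito")],
]


def select_by_scan(pages, city):
    """Evaluate the selection by reading every page."""
    result, pages_read = [], 0
    for page in pages:
        pages_read += 1
        result.extend(row for row in page if row[1] == city)
    return result, pages_read


def build_index(pages):
    """Map each city to the set of page numbers that contain it."""
    index = defaultdict(set)
    for page_no, page in enumerate(pages):
        for _, city in page:
            index[city].add(page_no)
    return index


def select_by_index(pages, index, city):
    """Evaluate the selection by reading only the pages the index points to."""
    page_nos = sorted(index.get(city, ()))
    result = []
    for page_no in page_nos:
        result.extend(row for row in pages[page_no] if row[1] == city)
    return result, len(page_nos)


index = build_index(pages)
scan_rows, scan_reads = select_by_scan(pages, "Quito")
idx_rows, idx_reads = select_by_index(pages, index, "Quito")
print(scan_reads, idx_reads)  # 4 2
```

Both strategies return the same rows; the index version reads two pages instead of four. The sketch ignores the cost of reading the index itself.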
What is the natural join (T₁ ⋈ T₂) operation, and how is it typically executed?
The natural join combines rows from T₁ and T₂ that agree on their common attributes. A naive evaluation compares every row of T₁ with every row of T₂, which is inefficient for large tables.
How can the natural join (T₁ ⋈ T₂) be optimized to avoid unnecessary comparisons?
The goal is to reduce comparisons by focusing only on rows that are likely to match, using techniques such as hashing or sorting to speed up the join process.
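As a sketch of the hashing idea, the code below contrasts a naive nested-loop join with a hash join. It assumes the two tables share a single common attribute (`sid` in the made-up `students` and `enrolled` tables) and that both fit in memory, which a real DBMS cannot assume.

```python
from collections import defaultdict


def nested_loop_join(t1, t2, key):
    """Naive join: compare every row of t1 with every row of t2."""
    out, comparisons = [], 0
    for r1 in t1:
        for r2 in t2:
            comparisons += 1
            if r1[key] == r2[key]:
                out.append({**r1, **r2})
    return out, comparisons


def hash_join(t1, t2, key):
    """Hash join: bucket t1 by the join key, then probe once per row of t2."""
    buckets = defaultdict(list)
    for r1 in t1:
        buckets[r1[key]].append(r1)
    out = []
    for r2 in t2:
        # Only rows in the matching bucket are examined; non-matches are skipped.
        for r1 in buckets.get(r2[key], []):
            out.append({**r1, **r2})
    return out


students = [
    {"sid": 1, "name": "Ana"},
    {"sid": 2, "name": "Ben"},
    {"sid": 3, "name": "Cy"},
]
enrolled = [
    {"sid": 1, "course": "DB"},
    {"sid": 3, "course": "OS"},
    {"sid": 3, "course": "DB"},
]

nl_rows, nl_comparisons = nested_loop_join(students, enrolled, "sid")
hj_rows = hash_join(students, enrolled, "sid")
print(nl_comparisons)  # 9 comparisons for 3 x 3 rows
print(len(hj_rows))    # 3 matching rows
```

The nested loop does work proportional to |T₁| × |T₂|, while the hash join makes one pass over each table plus the work of emitting the matches.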
Why must the intermediate result of a query be “reasonably small” for optimization to be effective?
Because many optimizations (like pushing selections or reordering joins) work by shrinking the tables that later operators must process; a large intermediate result negates their benefit.
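The effect of pushing a selection below a join can be seen by comparing intermediate result sizes. The `customers` and `orders` tables and the in-memory `join` helper are hypothetical; this is only a sketch of the idea, not a query optimizer.

```python
def join(t1, t2, key):
    """Simple in-memory equi-join on one shared attribute."""
    index = {}
    for r in t1:
        index.setdefault(r[key], []).append(r)
    return [{**a, **b} for b in t2 for a in index.get(b[key], [])]


# 100 customers, 10 of them in country "NO"; 1000 orders, 10 per customer.
customers = [{"cid": i, "country": "NO" if i % 10 == 0 else "PE"} for i in range(100)]
orders = [{"cid": i % 100, "oid": i} for i in range(1000)]

# Plan A: join first, then select. The intermediate result has 1000 rows.
joined = join(customers, orders, "cid")
plan_a = [r for r in joined if r["country"] == "NO"]

# Plan B: select first, then join. The intermediate result has 10 rows.
norwegians = [c for c in customers if c["country"] == "NO"]
plan_b = join(norwegians, orders, "cid")

print(len(joined), len(norwegians))  # 1000 10
print(len(plan_a), len(plan_b))      # 100 100
```

Both plans produce the same 100 rows, but Plan B carries a far smaller intermediate table into the join.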
Why are joins considered an expensive operation in query evaluation?
Because a naive join compares every row of one table with every row of the other, leading to potentially huge numbers of comparisons and disk I/O.
What guideline helps keep join costs under control with respect to result size?
Ensure that the size of the join result does not exceed the size of the input tables by a large factor. Smaller results are faster to process.
How can a DBMS be “guided” to find matching rows quickly and avoid non-matches?
By providing indexes, join hints, or up-to-date statistics so the optimizer knows where to look and which join algorithms to choose.
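One way to watch an index and fresh statistics change the optimizer's choice is to inspect a query plan before and after. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient example; the flashcards do not name a specific DBMS, and the `orders` table and `idx_orders_cid` index are made up. Join hints are vendor-specific and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (oid, cid, amount) VALUES (?, ?, ?)",
    [(i, i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT oid, amount FROM orders WHERE cid = 5"


def plan(conn, sql):
    """Return the human-readable plan text (last column of EXPLAIN QUERY PLAN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn, query)  # no index on cid yet: expect a full table scan

conn.execute("CREATE INDEX idx_orders_cid ON orders (cid)")
conn.execute("ANALYZE")  # refresh the statistics the optimizer uses
plan_after = plan(conn, query)  # expect a search using idx_orders_cid

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the example prints it rather than assuming a fixed output.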