What are iterators?
SQL Server executes a query by using a set of physical operators. Because these operators iterate through rowsets, they are also called iterators.
If a table is organized as a heap, what is the only access method available?
If a table is organized as a heap, then the only access method available to SQL Server is a table scan.
What is a table scan?
In a table scan, SQL Server literally scans every data page that was allocated by the heap. Any query against the heap causes a table scan operation.
Even if you select only a few columns from this table, and even if you use a very selective WHERE clause that limits the result set to a single row, SQL Server uses a Table Scan iterator to retrieve the data.
To determine which pages have been allocated for the heap, SQL Server must refer back to the IAM. A table scan is known as an “allocation order scan” since it uses the IAM to scan the data pages in the order in which they were allocated by SQL Server.
Note that SQL Server can use the allocation order scan when a table is clustered as well.
An allocation order scan is faster if a table is less physically fragmented. Allocation order scans are not affected by logical fragmentation.
When using a clustered table, what is the difference between an “allocation order scan” and an “index order scan”?
When SQL Server performs an unordered clustered index scan (i.e. the query does not request any specific order), it refers back to the IAM to scan the data pages of the clustered index using an allocation order scan (just like a table scan).
However, SQL Server has another option for clustered indexes, the ordered clustered index scan (aka the index scan). In an index scan, SQL Server can follow the doubly linked list at the leaf node level instead of referring back to the IAM. The scan is performed in clustered index order.
In each of these cases, the Clustered Index Scan iterator is used.
What iterator is used to scan a nonclustered index?
SQL Server uses the Index Scan iterator to scan a nonclustered index.
As with the Clustered Index Scan iterator, SQL Server can perform an allocation or an index order scan when scanning a nonclustered index.
Can SQL Server Query Optimizer cover a query with multiple nonclustered indexes?
Recall that a nonclustered index can cover a query (included columns, etc). Covering a query means that SQL Server can find all data needed for the query in a nonclustered index and does not need to do any lookups in the base table.
In some cases, the Query Optimizer can even decide to cover a query with multiple nonclustered indexes. SQL Server can join nonclustered indexes. All nonclustered indexes of a table always have some data in common that can be used for a join. If a table is organized as a heap, then this data is the RID; if the table is clustered, the this is the clustering key.
Can allocation order scans be unsafe?
Yes. With an allocation order scan, SQL Server can skip some rows and read other rows multiple times.
This can happen if you use the Read Uncommitted isolation level in a read-write environment.
While one query is performing an allocation order scan, another command could update the data and cause movement of one or more rows. The scanning query might have already read these rows and could read the rows again after movement. Or the scanning query might already have passed the page to which a row was moved from a page that was not scanned yet and the scanning query never reads this row.
Why might a row be moved?
A row can move because of multiple causes. For example, a command might update a variable length column and replace a short value with a long one. A page might be full, and thus the updated row has to move to another page.
A set of rows can move if there are inserts in a clustered table and a page split occurs. Approx. half of the rows are moved to the new page.
What is a “seek and partial scan”?
When you scan an index, SQL Server is not limited to a full scan. If you limit a rowset returned by a query and the scan is ordered, then SQL Server can seek for the first value of the rowset needed and then perform a partial scan of subsequent values in the logical order of an index.
SQL Server can use a seek and partial scan for both clustered indexes and covering nonclustered indexes.
After the first record is found, SQL Server does not use seek for subsequent orders; instead it performs a partial scan.
What is the most common access method SQL Server uses in OLTP environments?
A nonclustered index seek with an ordered partial scan and then a lookup into the base table for each row found in the nonclustered index. Such plans are common for selective queries. The base tables can be organized as a heap or as a balanced tree.
When the table is organized as a heap, SQL Server uses the RID Lookup operator to retrieve the rows from the base table.
If the table is clustered, then SQL Server uses the Key Lookup operator instead of the RID lookup operator.
What is a partial scan?
Where SQL Server seeks to find the first qualifying row and then scans to the end of the qualifying range.
What are the basic JOIN algorithms?
SQL Server supports 3 basic join algorithms: nested loops, merge joins, and hash joins. A hash join can be further optimized by using a bitmap filtering (i.e. a bitmap filtered hash join).
How does the nested loops algorithm work?
The nested loops algorithm is very simple, and in many cases, efficient algorithm. SQL Server uses one table for the outer loop, typically the table with fewer rows. For each row in this outer input, SQL Server seeks for matching rows in the second table which is the inner table.
SQL Server uses a join condition to find the matching rows. The join can be a non-equijoin meaning that the Equals operator does not need to be part of the join predicate.
If the inner table has no supporting index for a seek, the SQL Server scans the inner input for each row of the outer input. This is not an efficient scenario. A nested loops join is efficient when SQL Server can perform an index seek in the inner input.
How does the merge join algorithm work?
Merge join is a very efficient algorithm. However, it has its own limitations. It needs at least one equijoin predicate and sorted inputs from both tables. This means that the merge join should be supported by indexes on both tables involved in the join.
If one input is much smaller than another, then the nested loops join could be more efficient than a merge join. In other words, it works better on larger tables.
In a one-to-one or one-to-many scenario, the merge join scans both inputs only once. It starts by finding the first rows on both sides. If the end of the input is not reached, the merge join checks the join predicate to determine whether the rows match. If the rows match, they are added to the output. Then the algorithm checks the next rows from the other side and adds them to the output if they match the predicate.
If the rows from the inputs do not match, then then algorithm reads the next row from the side with the lower value. It reads from this side and compares the row to the row from the other side until the value is bigger than the value from the other side. Then it continues reading from the other side and so on.
In a many-to-many scenario, the merge join algorithm uses a work table to put the rows from one input side aside for reusage when duplicate matching rows from the other input exist.
How does the hash join algorithm work?
If none of the inputs is supported by an index and an equijoin predicate is used, then the hash join algorithm might be the most efficient one. It uses a searching structure named a hash table which SQL Server builds internally. It uses a hash function to split the rows from the smaller input into buckets. This is called the “build” phase.
SQL Server uses the smaller input for building the hash table because it wants to keep the hash table in memory. If it needs to get spilled to disk, then the algorithm might become much slower. The hash function creates buckets of approx equal size.
After the hash table is built, SQL Server applies the hash function on each of the rows from the other input. It checks to see into which bucket the row fits. Then it scans through all rows from the bucket. This phase is called the “probe” phase.
In a single-thread mode hash join is usually slower than merge and nested loops join that are supported by indexes. However, SQL Server can split rows from the probe input in advance, and perform partial joins in multiple threads. The hash join is actually very scalable. This kind of optimization of a hash join is called a bitmap filtered hash join. It is typically used in a data warehousing scenarios where you have large inputs fro a query and a few concurrent users, so SQL Server can execute a query in parallel.
Although a regular hash join can be executed in parallel as well, the bitmap filtered hash join is more efficient because SQL Server can use bitmaps for early elimination of rows not used in the join from the bigger table.
Which join algorithm is the only one to support non-equijoins?
Summarize the characteristics of the three join types?
Index on Joining Columns: Inner table: Not indexed Outer table: Optional Optimal condition: Small outer table, large inner table. Usual Size of Joining tables: Any Presorted: No Join Clause: Equijoin
Index on Joining Columns: Both tables: must Optimal condition: Clustered or covering index on both. Usual Size of Joining tables: Large Presorted: Yes Join Clause: Equijoin
Nested loops -
Index on Joining Columns: Inner table: must Outer table preferable. Usual Size of Joining tables: Small Presorted: Optional Join Clause: All
When does SQL Server use the Sort operator?
SQL Server uses the Sort operator whenever it has to sort an input. There may be a number of reasons to sort the input. For example, SQL Server might decide to sort an input so it can use the merge join algorithm. A very typical example of Sort operator usage is for queries that request an ordered rowset when the order is not supported by an index.
The sort operation could be very expensive; for good performance, you should make sure that the Sort operator used for small inputs only.
What two different algorithms does SQL Server use for calculating aggregations?
If an input is ordered by the columns used in the GROUP BY clause, then SQL Server uses the “stream aggregation” algorithm, which is implemented in the Stream Aggregate operator. Stream Aggregation is very efficient. SQL Server might even decide to sort the input before performing the aggregation in order to make it possible to use the Stream Aggregate operator.
If the input for the aggregation is not ordered and the input is so big that sorting it would be inefficient, then SQL Server uses the “hash aggregation” algorithm. The operator used for this kind of aggregation is the Hash Match Aggregate operator.
The hash aggregation algorithm builds the hash table from the input like it builds it for a hash join. However, the buckets are used to store the groups.
Similarly to a hash join, hash aggregation is scalable as well. Like the stream aggregation algorithm, the hash aggregation algorithm can compute multiple groups simultaneously in multiple threads.
What aggregation algorithms does SQL Server use?
Stream aggregation and hash aggregation.
Which operator is used when SQL Server performs a non clustered index seek to find a row, but then also needs data from the underlying table, which is organized as a clustered index?
Key lookup. SQL Server uses the Key Lookup operator to get data from the underlying clustered table after a non clustered index seek.
What is the scan called when SQL Server scans a clustered index in logical order of the index?
Index order scan. When SQL Server scans the data in the logical order of an index, this scan is called the index order scan.
What is an allocation scan?
When SQL Server scans the data in the physical order, this scan is called the allocation order scan.
Why does SQL Server try to parameterize your queries?
The SQL Server Query Optimizer has a lot of work to do to determine a good execution plan. After the plan is built, SQL Server caches it in order to make it available for possible reuse. SQL Server tries to parameterize your queries and thus makes it more likely that a plan will be reused.
Does SQL Server parameterize ad hoc queries automatically?
Yes. SQL Server parameterizes ad hoc queries automatically.
Why might SQL Server choose not to reuse a cached plan?
(1) Changes in parameter data type for a parameter in the parameterized predicate (WHERE orderid=10248 vs oderid=10249.0)
(2) SQL Server cannot be sure about the “selectivity” of the query.
(3) Some SET options changed that could influence the query result, e.g. SET CONCAT_NULL_YIELDS_NULL OFF;
All of these reasons can produce a new plan when you would expect reuse of an existing plan. Or they could cause a query to not be parameterized.
How can you get information about cached plans and the number of times plans were reused?
You can get information about cached plans and the number of times the plans were reused by querying the sys.dm_exec_query_stats dynamic management function. You can get the exact text of the query from the sys.dm_exec_sql_text dynamic management function.
What does DBCC FREEPROCCACHE do?
DBCC FREEPROCCACHE clears the procedure cache.
What are two ways to enforce plan reuse?
Using sys.sp_executesql and dynamic SQL as well as stored procedures. The latter is a better practice.
How can you force a new/different execution plan with each call to a stored procedure?
A procedure might return a different amount of rows based on a parameter value. For some values, a query inside the procedure might be very selective, and for other values, not selective at all. Therefore, you might want to have a different execution plan for each call. You can force SQL Server to recompile a stored procedure if you create it with the WITH RECOMPILE option.
In addition, you can force recompilation at the query level. Instead of recompiling the complete procedure, you can recompile only the critical statements.
CREATE PROCEDURE Sales.GetCustomerOrders (@custid INT) WITH RECOMPILE AS SELECT * FROM Sales.Orders
What is batch mode processing?
A columnstore index has two modes of operation: batch mode and row mode. Of the two, batch mode is much faster. It takes advantage of a special method of accessing the data that allows for pulling large sets of data, in batches, so that the CPU needs to deal with metadata for a row only once per batch.
With columnstore indexes, SQL Server can sometimes perform batch operations directly on compressed data, thus skipping the uncompressing action as well.
The batch mode processing also has extra benefits when working with multiprocessor machines.
What operators support batch mode processing?
(4) Local hash (partial) aggregation
(5) Hash inner join
(6) Batch hash table build
How would you determine whether SQL Server used the batch processing mode for a specific iterator?
You can check the iterator’s Actual Execution Mode property.
What is the main advantage of batch mode processing?
It lowers the CPU burden. Lower CPU usage is the main advantage to batch mode processing.
What options do you have for influencing/changing the execution plan the Query Optimizer selects?
Optimizer hints and plan guides.
What are optimizer hints?
Optimizer hints are actually directives for the query execution. There are three kinds of hints:
(1) table hints
(2) query hints
(3) join hints.
You specify query hints as part of the OPTION clause of the SELECT, INSERT, UPDATE, DELETE, and MERGE statements. You cannot use query hints in subqueries, only in the outermost query. If multiple queries are involved in a UNION operation, you can specify the OPTION clause only after the last query. You can specify query hints in an INSERT statement except when a SELECT clause is used inside the statement.
--Forcing stream aggregate SELECT qty, COUNT(*) as num FROM Sales.OrderDetails GROUP BY qty OPTION (ORDER GROUP);
** Use hints as the last resort and after you use them, validate whether they are still useful from time to time.
What are table hints?
You can give SQL Server a hint for a single table in a query. Table hints influence locking and access methods for a single table or view only. You can use the table hints in the FROM clause and introduce them by using the WITH keyword.
Maybe the most popular optimizer hint is the table hint that forces a specific index usage.
SELECT orderid, productid, qty
FROM Sales.OrderDetails WITH (INDEX(idx_nc_productid))
WHERE productid BETWEEN 10 AND 20
ORDER BY productid;
What are join hints?
SQL Server 2012 also supports the following join hints in the FROM clause:
–forced merge join
SELECT O.custid, O.orderdate, OD.orderid, OD.productid, OD.qty
FROM Sales.Orders AS O
INNER MERGE JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid;
What is a plan guide?
Plan guides let you optimize the performance of queries when you cannot or do not want to directly change the text of the actual query with hints.
Plan guides can be useful when a small subset of queries in a database application provided by a third-party vendor are not performing as expected, i.e. in places where you cannot directly modify the application code.
In the plan guide, you specify the Transact-SQL statement that you want optimized and either an OPTION clause that contains the query hints you want to use or a specific query plan you want to use to optimize the query.
When the query executes, SQL Server matches the Transact-SQL statement to the plan guide and attaches the OPTION clause to the query at run time or uses the specified query plan.
Basically, you can force optimization of the query in a procedure for a specific parameter. Even if a different parameter is passed, the query will be optimized for that one parameter.
What are the different types of plan guides?
(1) OBJECT plan guides - Used by the Query Optimizer to match queries inside stored procedures, scalar user-defined functions, multi-statement table-valued user-defined functions, and DML triggers.
(2) SQL plan guides - Used by the Query Optimizer to match stand-alone queries or queries in ad hoc batches.
(3) TEMPLATE plan guides - Used by the Query Optimizer to match stand-alone queries that can be parameterized to be a specified form.
How do you create/modify plan guides?
You create plan guides by using the sys.sp_create_plan_guide system stored procedure.
@name = ‘Cust71’
‘SELECT orderid, custid, empid, orderdate
@type = ‘OBJECT’
@module_or_batch = ‘Sales.GetCustomerOrders’
@params = NULL
@hints = ‘OPTION (OPTIMIZE FOR (@custid = 71))’
You can disable, enable, or drop a plan guide by using sys.sp_control_plan_guide system procedure.
You can create a plan guide from a cached query plan by using the sys.sp_create_plan_guide_from_handle system procedure.
You can validate a plan by using the sys.fn_validate_plan_guide system function. Note that a plan guide might become invalid if a database schema changes.
You can use the sys.sp_get_query_template system stored procedure to get the parameterized form of a query.
How can you get a list of all plan guides in a database?
You can always get a list of all plan guides in a database by querying the sys.plan_guides catalog view.
You can also list all of the hints used in each plan guide like the following query shows:
SELECT plan_guide_id, name, scope_type_desc, is_disabled, query_text, hints
Why might you prefer using plan guides instead of optimizer hints?
With plan guides, you do not need to change the query text.
What does the OPTION (ORDER GROUP) query hint force?