What is the relational engine?
The relational engine is an internal component that works on a logical level. It parses the query which is then processed by the Query Optimizer. The Query Optimizer generates the execution plan.
What is the storage engine?
The storage engine carries out the physical operations and performs the actual execution. This is where processes such as locking, index maintenance, and transactions occur.
What are the 4 query execution phases?
(1) Parsing, (2) Binding, (3) Optimization, (4) Execution.
What occurs during the parsing phase?
During the parsing phase, the relational engine checks whether your query is syntactically correct. The result of this phase, if the query passed the syntax check, is a tree of logical operators known as a parse tree. The parse tree represents the logical steps necessary to execute the query that has been requested.
What occurs during the binding phase?
In the binding phase, the relational engine resolves the object names (tables, columns, and types) in the query and binds them to logical operators. The result of this phase is the algebrized tree, which is a tree of logical operators bound to actual objects. The algebrized tree is passed on to the query optimizer.
What occurs during the optimization phase?
In the optimization phase, using the algebrized tree and the statistics about the data, the query optimizer (in the relational engine) works out what it thinks will be the optimal way to execute the query - that is, it generates an execution plan. The optimizer figures out how best to implement the request represented by the T-SQL query submitted. It decides if the data can be accessed through indexes, what types of joins to use and much more. The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan - in terms of the required CPU processing and I/O. This is known as a cost-based plan. The SQL Server Query Optimizer is known as a cost-based optimizer.
The optimizer will generate and evaluate many plans (unless there is already a cached plan) and generally speaking, it will choose the lowest-cost plan.
The result of this phase is the actual execution plan which is a single tree with physical operators.
What occurs during the execution phase?
The actual execution of the selected plan is performed by the storage engine which carries out the physical operations.
The result of the execution phase is the desired result set. In addition, the result of the execution phase might also be a cached plan.
How does the SQL Server optimizer calculate the cost?
SQL Server calculates the cost of an operation by determining the algorithm used by a physical operator and by estimating the number of rows that have to be processed. The estimation of the number of rows is called “cardinality estimation”. The estimation of the number of rows processed by each physical operator comes from “optimizer statistics”.
The cost expresses usage of physical resources such as the amount of disk I/O, CPU time, and memory needed for execution. After the optimizer gets the cost for all operators in a plan, it can calculate the cost of the whole plan.
A higher cost means a more complex plan, and a more complex plan, means a slower query.
What is the plan cache?
Caching the selected execution plan in the “plan cache” can speed up the next execution of the same query or an equivalent query from the execution perspective. SQL Server can cache an execution plan in order to have it ready for the next execution, thus avoiding doing the optimization.
SQL Server actually tries to parameterize your queries in order to have one plan for multiple equivalent queries. Equivalent queries are queries that can be executed in the same way.
SELECT col1 FROM TableA WHERE col2=3;
SELECT col1 FROM TableA WHERE col2 = 5;
SQL Server caches the execution plan separately from the actual value (the execution context). This way, SQL Server can reuse the same execution plan multiple times.
Why might using a cached plan not always be the best solution?
The number of rows in a table might grow substantially. All plans that include scans of that table, which might be fast enough for a small table, could suddenly become suboptimal.
Plans in cache can also become obsolete because metadata changes in a database. For example, an index could be added to a table or a constraint could be altered.
What is parameter sniffing?
The Query Optimizer sometimes has to guess the cardinality estimation because it cannot detect for sure what it is from your parameters. This problem is known as a parameter sniffing problem. Parameter sniffing is a process where SQL Server tries to guess (or sniff) the current parameter value during compilation and passes it to the Query Optimizer.
What are optimizer statistics?
SQL Server maintains statistics
about the total number of rows and distribution of the number of rows over key values of
an index for each index. It’s “data about the data” which provides the information necessary for the optimizer to make it’s cost calculations.
What are the reasons a suboptimal plan might be selected?
– The selected plan is not the best because the search space of the execution plans was too big.
– Statistical information is not present or update which leads to wrong cardinality estimation.
– A cached plan is suboptimal for the current parameter value.
– Parameter sniffing leads to inaccurate cardinality estimation.
– The Query Optimizer underestimates or overestimates the cost of an algorithm implemented in a physical operator.
– Hardware changes could better accommodate a different plan. For example, someone could add CPUs to a box and a plan that uses more CPU time could be more appropriate.
What are Extended Events?
SQL Server Extended Events is a very lightweight performance monitoring system where you can correlate data from SQL Server with data from the operating system and application.
An Extended Events package is a container for all extended events objects. These objects include:
– Events - These are your points of interest for monitoring. You can use events for monitoring or to trigger actions.
– Targets - These are event consumers. You can use targets that write to a file, store event data in a memory buffer, or aggregate event data.
– Actions - These are responses to an event. They are bound to an event. Actions can capture a stack dump and inspect data, store information in a local variable, aggregate event data, or even append data to event data, e.g. the execution plan detection action detects execution plans.
– Predicates - These are sets of logical rules to filter captured events. In order to minimize the impact of a monitoring session, it is important that you capture only events you need.
– Types - These help interpret the data collected. The data is actually a collection of bytes, and types give this data context. A type is provided for events, actions, targets, predicates, and types themselves.
– Maps - These are SQL Server internal tables that map internal numeric values to meaningful strings.
What is SQL Trace?
SQL Trace is an internal SQL Server mechanism for capturing events. SQL Trace is deprecated in future versions. This means that it will still be available in the life cycle of SQL Server 2012 and the next version of SQL Server; however, after the next version, SQL Trace might be discontinued.
You can create traces through a set of SQL Server system stored procedures. You can create traces manually or through the SQL Server Profiler UI. You trace SQL Server events. A source for a trace event can be a TSQL batch or some other SQL Server event, such as a deadlock. After an event occurs, the trace gathers the event information. Event information is then passed to a queue. Before passing to the queue, events are filtered according to your filters. From the queue, the trace information can go to a file or a SQL Server table, or it can be used by applications, such as SQL Server Profiler.
What is SQL Server Profiler?
SQL Server Profiler is a rich application that serves as a UI for SQL Trace. With SQL Server Profiler, you can create and manage traces, and you can analyze results of your traces. You can replay events from a saved trace step by step. To start a server-side trace, you can script a trace you created through the SQL Server Profiler UI, and then execute the script directly on your SQL Server instance.
What are some drawbacks to using SQL Server Profiler?
– You increase the monitoring impact on your SQL Server instance compared to when using SQL Trace only, due to the overhead the SQL Server Profiler UI is producing.
– When you use the SQL Server Profiler UI on a computer with the SQL Server instance you are monitoring, SQL Server Profiler competes for the same resources.
– When you use SQL Server Profiler remotely, all events must travel over a network which slows down other network operations.
– SQL Server Profiler shows events in a grid which can consume a lot of memory when you capture many events.
– You or someone else might inadvertently close the Profiler and stop the trace when you need to capture the events for a longer time.
What are the objects used by SQL Trace and SQL Server Profiler?
– Event - An event is an action within SQL Server. For example, an action can be a logon
failure, T-SQL batch, start of a stored procedure, and more.
-- Event Class - This is the type of an event. The event class defines the data that an event can report.
– Event Category - Event categories define groupings of events.
– Data Column - A data column is an attribute of an event. If you save a trace to a table,
an event is represented by a row in the table, and attributes of events are columns in
– Template - A template is a saved trace definition. SQL Server Profiler comes with a
couple of predefined templates that can speed up the creation of a trace.
– Filter - Filters limit the events traced. You can put a filter on any event column. In
order to minimize the impact of monitoring on your SQL Server instance, you should
filter out any event you do not need in your current trace.
– Trace - A trace is a collection of events, columns, filters, and data returned.
Why should you use Extended Events over SQL Trace and SQL Server Profiler?
Because Extended Events is more lightweight and SQL Trace and SQL Server Profiler are deprecated in future versions of SQL Server.
What are the actions of the optimization phase of query execution?
Generation of candidate plans and selection of the best candidate plan.
In which phase of query execution does SQL Server check whether the objects referred to by the query exist?
SQL Server resolves object names and binds them to logical operators in the binding phase.
What are pages?
SQL Server stores data on pages. A page is a physical unit on disk inside a SQL Server database. The size of a pages is fixed to 8 KB. A page belongs to a single object only such as a single table, index, or indexed view.
What are extents?
Pages are further grouped into logical groups of 8 pages called extents. An extent can be “mixed”, if pages on this extent belong to multiple objects or “uniform” when all pages from this extent belong to a single object only.
What is one of the goals of optimization?
To lower disk I/O and lower the number of pages SQL Server has to read.
What is STATISTICS IO?
STATISTICS IO gives you information about the number of pages per table accessed by queries.
You can turn on STATISTICS IO by using the SET STATISTICS IO T-SQL command.
STATISTICS IO is a session level option.
SET STATISTICS IO ON;
SELECT * FROM Sales.Customers;
What is a “session level option”?
A session level option means that the option stays unchanged for the complete session, until you disconnect from SQL Server, unless you turn it off.
What type of information does STATISTICS IO return?
(1) Scan count - The number of index or table scans performed.
(2) Logical reads - The number of pages read from the data cache. When you read a whole table, this number gives you an estimate about the table size.
This gives you a rough estimate of the effectiveness of a query. However, you should not use this information without thinking and additional knowledge.
SQL Server counts every touch of a table, even if pages are already cached. However, when pages are in the cache, touching them is not expensive.
(3) Physical reads - The number of pages read from disk. This number is lower than the actual number of pages because many pages are cached.
(4) Read-ahead reads - The number of pages SQL Server reads ahead.
(5) Lob logical reads - The number of large object (LOB) pages read from the data cache.
(6) Lob physical reads - The number of large object-type pages read from disk.
(7) Lob read-ahead reads - The number of large object-type pages SQL Server reads ahead.
What are LOBs?
LOBs (or large object) are columns of type VARCHAR(MAX), NVARCHAR(MAX)m VARBINARY(MAX), TEXT, NTEXT, IMAGE, XML, etc.
What does DBCC DROPCLEANBUFFERS do?
DBCC DROPCLEANBUFFERS clears data from the cache. SQL Server caches data besides query and procedure plans. In order to show the IO statistics, it is good to have no data in the cache.
SQL Server caches data in order to speed up queries; because a piece of data is cached, the next time it is needed, SQL Server can retrieve it from memory and not from disk, and thus execute a query that needs data much faster.
What is SET STATISTICS TIME?
SET STATISTICS TIME ON;
< query >
Statistics time returns CPU time and total (elapsed) time needed for an operation. In addition, you can see the actual execution time and the time needed for pre-execution phases, including parsing, binding, optimization, and compilation.
How can you get the most exhaustive information about how a query is executed?
You can get the most exhaustive information about how a query is executing by analyzing its execution plan. SQL Server exposes estimated and actual plans. If you display an estimated plan only, the query is not executed.
Do actual and estimated plans usually differ?
Actual and estimated plans usually don’t differ; however, in some cases, an estimated plan cannot give you exact information like an actual plan would. For example, if you create and query a temporary table in the same batch, SQL Server cannot optimize data access to the temporary table because it is not created at the time of optimization yet. Similarly, SQL Server postpones optimization of dynamic SQL because it not clear what to execute and how to optimize dynamic SQL until execution.
However, for large queries that execute for a long time, estimated plans can be very useful. You probably don’t want to execute a query that reads a billion rows on a production system just to get the actual execution plan.
In what formats are the actual and estimated plans available?
Text, XML, or graphically. SQL Server returns the plan in XML format natively. SSMS presents this XML in graphical format. Text representations of plans are deprecated and will be removed in future versions.
You can turn on and off text plans with the following T-SQL SET commands:
- SET SHOWPLAN_TEXT and SET SHOWPLAN_ALL for estimated plans.
- SET STATISTICS PROFILE for actual plans
You can turn on and off XML plans with the following commands:
- SET SHOWPLAN_XML for estimated plans
- SET STATISTICS XML for actual plans.
How do you read a graphical execution plan?
You read the graphical execution plan from right to left, from top to bottom.
The arrows show you the data flow from one physical operator to another. The thickness of an arrow corresponds to the relative number of rows passed from an operator to an operator.
By pausing on an operator or over an arrow, you get much more detailed information.
From the details, you can see both the logical and physical operators that are used.
You can also see the estimated number of rows and the actual number of rows. This way you can quickly identify errors in cardinality estimations.
You can see the estimated operator cost and the estimated cost of the complete subtree up to this point.
You can get the total estimated cost for the query if you pause on the last operator in the plan and read the estimated subtree cost.
You can also save the plan for later analysis with the Save Execution Plan As option.
What are dynamic management objects (DMO’s)?
SQL Server constantly monitors itself and gathers information useful for monitoring the health of an instance, finds problems such as missing indexes, and optimizes queries. SQL Server exposes this information through dynamic management objects (DMO’s). These objects include dynamic management views and dynamic management functions.
All DMO’s are in the sys system schema; DMO names start with the string dm_.
Some of the information from DMO’s shows the current state of an instance, whereas other information is cumulative from the start of an instance. In other words, DMO’s are not materialized in any database; DMO’s are virtual objects that give you access to the data SQL Server collects in memory.
What are the drawbacks to using DMO’s?
The most important issue you should take care of is when the last restart of the instance you are inspecting occurred. Cumulative information is useless if the instance was restarted recently.
What are the most useful DMO groups for analyzing query performance?
(1) SQL Server Operating System-related (SQLOS) DMOs - The SQLOS manages operating system resources that are specific to SQL Server.
(2) Execution-related DMOs - These DMOs provide you with insight into queries that have been executed, including their query text, execution plan, number of executions, and more.
(3) Index-related DMOs - These DMOs provide useful information about index usage and missing indexes.
What does the sys.dm_os_sys_info DMO provide?
Basic information about your instance of SQL Server including information about the number of logical CPUs, physical CPUs, physical memory, and the time at which SQL Server was started.
The time at which SQL Server was started tells you whether it makes sense to analyze cumulative information or not.
This is a SQLOS-related DMO.
What does the sys.dm_os_waiting_tasks DMO provide?
It gives you information about sessions that are currently waiting on something. For example, the sessions could be blocked by another session because of locking.
You can join this DMO to the execution-related sys.dm_exec_sessions DMO to get information about the user, host, and application that are waiting. You can also use the is_user_process flag from the sys.dm_exec_sessions DMO to filter out system sessions.
This is a SQLOS-related DMO.
What does the sys.dm_exec_requests execution-related DMO provide?
It returns information about currently executing requests. It includes a column called sql_handle which is a hash map of the T-SQL batch text that is executed. You can use this handle to retrieve the complete text of the batch with the help of the execution-related sys.dm_exec_sql_text dynamic management function that accepts this handle as a parameter.
This is an execution related DMO.
What does the sys.dm_exec_query_stats DMO provide?
It returns a lot of information about executed queries such as disk IO per query, CPU consumption per query, elapsed time per query, etc.
With the help of sys.dm_exec_sql_text DMO, you can retrieve the text of the query as well.
You can also use the statement_start_offset and statement_end_offset columns from the sys.dm_exec_query_stats DMO.
This is an execution related DMO.
What DMOs can help you find missing or unused indexes?
You can find missing indexes with the help of the sys.dm_db_missing_index_details, sys.dm_db_missing_index_columns, sys.dm_db_missing_index_groups,
Note that having too many indexes is bad as well; although queries do no use them, SQL Server has to maintain them.
With the help of sys.indexes catalog view and sys.dm_db_index_usage_stats dynamic management view, you can find indexes that are not being used.
Which DMO gives you detailed text of queries executed?
You can retrieve the text of batches and queries executed from the sys.dm_exec_sql_text DMO.