shyaway

Slow in the Application, Fast in SSMS? Part 1 본문

SQLServer

Slow in the Application, Fast in SSMS? Part 1

shyaway 2018. 7. 19. 12:44


Slow in the Application, Fast in SSMS? Part 1


Presumtions

For the examples in this article, I use the Northwind sample database. This database shipped with SQL 2000. For later versions of SQL Server you can download it from Microsoft's web site.The essence of this article applies to all versions of SQL Server, but the focus is on SQL 2005 and later versions. The article includes several queries to inspect the plan cache; these queries run only on SQL 2005 and later. SQL 2000 and earlier versions had far less instrumentation in this regard. Beware that to run these queries you need to have the server-level permission VIEW SERVER STATE.


This is not a beginner-level article, but I assume that the reader has a working experience of SQL programming. You don't need to have any prior experience of performance tuning, but it certainly helps if you have looked a little at query plans and if you have some basic knowledge of indexes. I will not explain the basics in depth, as my focus is a little beyond that point. This article will not teach you everything about performance tuning, but at least it will be a start.


Caveat: In some places, I give links to the online version of Books Online. Beware that the URL may lead to Books Online for a different version of SQL Server than you are using. On the topic pages, there is a link Other versions, so that you easily can go to the page that matches the version of SQL Server you are using. (Or at least that was how Books Online on MSDN was organised when I wrote this article.)



How SQL Server Compiles a Stored Procedure

In this chapter we will look at how SQL Server compiles a stored procedure and uses the plan cache. If your application does not use stored procedures, but submits SQL statements directly, most of what I say this chapter is still applicable. But there are further complications with dynamic SQL, and since the facts about stored procedures are confusing enough I have deferred the discussion on dynamic SQL to a separate chapter.


What is a Stored Procedure?

Stored procedures.That may seem like a silly question, but the question I am getting at is What objects have query plans on their own? SQL Server builds query plans for these types of objects:

    • Scalar user-defined functions.
    • Multi-step table-valued functions.
    • Triggers.

With a more general and stringent terminology I should talk about modules, but since stored procedures is by far the most widely used type of module, I prefer to talk about stored procedures to keep it simple.

For other types of objects than the four listed above, SQL Server does not build query plans. Specifically, SQL Server does not create query plans for views and inline-table functions. Queries like:

SELECT abc, def FROM myview
SELECT a, b, c FROM mytablefunc(9)

are no different from ad-hoc queries that access the tables directly. When compiling the query, SQL Server expands the view/function into the query, and the optimizer works with the expanded query text.

There is one more thing we need to understand about what constitutes a stored procedure. Say that you have two procedures, where the outer calls the inner one:

CREATE PROCECURE Outer_sp AS
...
EXEC Inner_sp
...

I would guess most people think of Inner_sp as being independent from Outer_sp, and indeed it is. The execution plan for Outer_sp does not include the query plan for Inner_sp, only the invocation of it. However, there is a very similar situation where I've noticed that posters on SQL forums often have a different mental image, to wit dynamic SQL:

CREATE PROCEDURE Some_sp AS
DECLARE @sql    nvarchar(MAX),
        @params nvarchar(MAX)
SELECT @sql = 'SELECT ...'
...
EXEC sp_executesql @sql, @params, @par1, ...

It is important to understand that this is no different from nested stored procedures. The generated SQL string is not part of Some_sp, nor does it appear anywhere in the query plan for Some_sp, but it has a query plan and a cache entry of its own. This applies, no matter if the dynamic SQL is executed through EXEC() or sp_executesql.



How SQL Server Generates the Query Plan

When you enter a stored procedure with CREATE PROCEDURE (or CREATE FUNCTION for a function or CREATE TRIGGER for a trigger), SQL Server verifies that the code is syntactically correct, and also checks that you do not refer to non-existing columns. (But if you refer to non-existing tables, it lets get you away with it, due to a misfeature known as deferred named resolution.) However, at this point SQL Server does not build any query plan, but merely stores the query text in the database.Overview


It is not until a user executes the procedure, that SQL Server creates the plan. For each query, SQL Server looks at the distribution statistics it has collected about the data in the tables in the query. From this, it makes an estimate of what may be best way to execute the query. This phase is known as optimisation. While the procedure is compiled in one go, each query is optimised on its own, and there is no attempt to analyse the flow of execution. This has a very important ramification: the optimizer has no idea about the run-time values of variables. However, it does know what values the user specified for the parameters to the procedure.


Parameters and Variables

Consider the Orders table in the Northwind database, and these three procedures:

CREATE PROCEDURE List_orders_1 AS
   SELECT * FROM Orders WHERE OrderDate > '20000101'
go
CREATE PROCEDURE List_orders_2 @fromdate datetime AS
   SELECT * FROM Orders WHERE OrderDate > @fromdate
go
CREATE PROCEDURE List_orders_3 @fromdate datetime AS
   DECLARE @fromdate_copy datetime
   SELECT @fromdate_copy = @fromdate
   SELECT * FROM Orders WHERE OrderDate > @fromdate_copy
go

Note: Using SELECT * in production code is bad practice. I use it in this article to keep the examples concise.


Then we execute the procedures in this way:

EXEC List_orders_1
EXEC List_orders_2 '20000101'
EXEC List_orders_3 '20000101'

Before you run the procedures, enable Include Actual Execution Plan under the Query menu. (There is also a toolbar button and Ctrl-M is the normal keyboard shortcut.) If you look at the query plans for the procedures, you will see the first two procedures have identical plans:


Index seek + Key lookup


That is, SQL Server seeks the index on OrderDate, and uses a key lookup to get the other data. The plan for the third execution is different:


Clustered Index Scan


In this case, SQL Server scans the table. (Keep in mind that in a clustered index the leaf pages contain the data, so a clustered index scan and a table scan is essentially the same the thing.) Why this difference? To understand why the optimizer makes certain decisions, it is always a good idea to look at what estimates it is working with. If you hover with the mouse over the two Seek operators and the Scan operator, you will see the pop-ups similar to those below.


Popup list_orders_1   Popup list_orders_2
List_orders_1   List_orders_2
Popup list_orders_3
List_orders_3

Note: the exact appearance depends on which version of SSMS you are using. The samples above were captured with SSMS 2008 running against SQL 2008. If you use a later version of SSMS, you may see the items in a somewhat different order, and there are also more items listed. None of those extra items are relevant for the example, which is why I have preferred to keep my original screen shots.


The interesting element is Estimated Number of Rows. For the first two procedures, SQL Server estimates that one row will be returned, but for List_orders_3, the estimate is 249 rows. This difference in estimates explains the different choice of plans. Index Seek + Key Lookup is a good strategy to return a smaller amount of rows from a table. But if more rows match the seek criteria, the cost increases, and there is a increased likelihood that SQL Server will need to access the same data page more than once. In the extreme case where all rows are returned, a table scan is much more efficient than seek and lookup. With a scan, SQL Server has to read every data page exactly once, whereas with seek + key lookup, every page will be visited once for each row on the page. The Orders table in Northwind has 830 rows, and when SQL Server estimates that as many as 249 rows will be returned, it (rightly) concludes that the scan is the best choice.


Where Do These Estimates Come From?

Now we know why the optimizer arrives at different execution plans: because the estimates are different. But that only leads to the next question: why are the estimates different? That is the key topic of this article.

In the first procedure, the date is a constant, which means that the SQL Server only needs to consider exactly this case. It interrogates the statistics for the Orders table, which indicates that there are no rows with an OrderDate in the third millennium. (All orders in the Northwind database are from 1996 to 1998.) Since statistics are statistics, SQL Server cannot be sure that the query will return no rows at all, why it makes an estimate of one single row.


In the case of List_orders_2, the query is against a variable, or more precisely a parameter. When performing the optimisation, SQL Server knows that the procedure was invoked with the value 2000-01-01. Since it does not any perform flow analysis, it can't say for sure whether the parameter will have this value when the query is executed. Nevertheless, it uses the input value to come up with an estimate, which is the same as for List_orders_1: one single row. This strategy of looking at the values of the input parameters when optimising a stored procedure is known as parameter sniffing.


In the last procedure, it's all different. The input value is copied to a local variable, but when SQL Server builds the plan, it has no understanding of this and says to itself I don't know what the value of this variable will be. Because of this, it applies a standard assumption, which for an inequality operation such as > is a 30 % hit-rate. 30 % of 830 is indeed 249.


Here is a variation of the theme:

CREATE PROCEDURE List_orders_4 @fromdate datetime = NULL AS
   IF @fromdate IS NULL
      SELECT @fromdate = '19900101'
   SELECT * FROM Orders WHERE OrderDate > @fromdate

In this procedure, the parameter is optional, and if the user does not fill in the parameter, all orders are listed. Say that the user invokes the procedure as:

EXEC List_orders_4

The execution plan is identical to the plan for List_orders_1 and List_orders_2. That is, Index Seek + Key Lookup, despite that all orders are returned. If you look at the pop-up for the Index Seek operator, you will see that it is identical to the pop-up for List_orders_2 but in one regard, the actual number of rows. When compiling the procedure, SQL Server does not know that the value of @fromdate changes, but compiles the procedure under the assumption that @fromdate has the value NULL. Since all comparisons 

with NULL yield UNKNOWN, the query cannot return any rows at all, if @fromdate still has this value at run-time. 


If SQL Server would take the input value as the final truth, it could construct a plan with only a Constant Scan that does not access the table at all (run the query SELECT * FROM Orders WHERE OrderDate > NULL to see an example of this). But SQL Server must generate a plan which returns the correct result no matter what value @fromdate has at run-time. On the other hand, there is no obligation to build a plan which is the best for all values. Thus, since the assumption is that no rows will be returned, SQL Server settles for the Index Seek. (The estimate is still that one row will be returned. This is because SQL Server never uses an estimate of 0 rows.)


This is an example of when parameter sniffing backfires, and in this particular case it may be better to write the procedure in this way:

CREATE PROCEDURE List_orders_5 @fromdate datetime = NULL AS
   DECLARE @fromdate_copy datetime
   SELECT @fromdate_copy  = coalesce(@fromdate, '19900101')
   SELECT * FROM Orders WHERE OrderDate > @fromdate_copy

With List_orders_5 you always get a Clustered Index Scan.


Key Points

In this section, we have learned three very important things:

    • A constant is a constant, and when a query includes a constant, SQL Server can use the value of the constant with full trust, and even take such shortcuts to not access a table at all, if it can infer from constraints that no rows will be returned.
    • For a parameter, SQL Server does not know the run-time value, but it "sniffs" the input value when compiling the query.
    • For a local variable, SQL Server has no idea at all of the run-time value, and applies standard assumptions. (Which the assumptions are depends on the operator and what can be deduced from the presence of unique indexes.)

And there is a corollary of this: if you take out a query from a stored procedure and replace variables and parameters with constants, you now have quite a different query. More about this later.


This article belongs to and owned by http://www.sommarskog.se/query-plan-mysteries.html






























'SQLServer' 카테고리의 다른 글

Slow in the Application, Fast in SSMS? Part 2  (0) 2018.07.25
SQL SERVER ERROR 18456  (0) 2017.11.24
Comments