Query Optimization 02 – Questions and Answers per Month [StackOverflow]

Hi Friends, this is my 2nd blog post of series Query Optimization. In 1st blog post What is my accepted answer percentage rate, we took optimization decision based on Logical Reads as our key performance parameter.

In this post as well, we are going to optimize another query from data.stackexchange.com. Below is the query to optimize-

Since the query seems quite simple so let’s proceed to analyze IO and Time stats results using statisticsparser.com obtained after execution of the query.
Stats Analysis - Query 01Actual Execution Plan for query (in SSMS) is-
Atual Execution Plan - Query 01In SQL Sentry Plan Explorer-
Actual Execution Plan in Plan Explorer - Query 01
Analyzing the actual execution plan, we can see that there is a Stream Aggregate operator. A Stream Aggregate operator groups rows by one or more columns and then calculates one or more aggregate expressions returned by the query. The Stream Aggregation is very fast because it requires an input that has already been ordered by the columns specified in the GROUP statement. If the aggregated data is not ordered, the Query Optimizer can firstly use a Sort operator to pre-sort the data, or it can use pre-sorted data from an index seek or a scan.

Similarly, in our case as well, there is a Sort Operator appearing just before Stream Aggregate Operator. But, why the Sort Operator has cost of 22.7% and a yellow coloured exclamation mark <!> signaling some warning. Checking the Warning under Sort Operator property, we can conclude that due to the huge data size, SQL Server decided to use [TempDB] to perform Sorting of <29,499,660> rows (approx. 647 MB) required for Stream Aggregate Operator. This is called TempDB Spill.sort-spill-01

So, let’s modify the query to filter out huge count of rows resolving this problem of tempdb spill. Consider below query-

Stats for query-
Stats for Query 02

Execution plan in SSMS-
Actual Execution Plan for Query 02
Execution plan in SQL Sentry Plan Explorer-
Actual Execution Plan for Query 02
Looking at the actual plan of new query, we can see the relative cost of Sort Operator has reduced to 0.0% from 22.7%. This occurred due to small input size (6 KB) to sort compared to 647 MB earlier. This time SQL Server has introduced another operator called Hash Match performing partial aggregation on data obtained from Clustered Index Scan and resulting a reduced size of data set with combination of group columns.
Due to nice our Estimated Subtree Cost of query has gone down to 5836.89 from 7772.4. Also, SQL Server has started suggesting missing non-clustered index on 2nd query due to huge residual IO caused by Clustered Index Scan.
As last step, let’s create missing index suggested by execution plan-

Now, after non-clustered index creation, query has executed within 7 seconds. Below are the stats and execution plan.
Stats for query 02 after adding Non-Clustered Index
Actual Execution Plan for query 02 after adding Non-Clustered Index
So after our final optimization, our query is performing 80k logical reads instead of 7734k reads earlier, and is executing within 7 seconds.

Conclusion: SQL Server is smart enough to suggest changes in terms of warnings. We just need to interpret and analyze those warning and take appropriate step in order to optimize our query. Happy Coding 🙂

Please feel free to correct me by comments. Mention your approach if it differs from mine. If you like the article, do Like & Share.

Leave a Reply