SQL Server Query Optimization/Tuning with StackOverflow

Spread the love

I am very often asked on how to start query tuning which gave me an idea to start a Query Tuning blog series. I shall be taking random and slow performing queries from data.stackexchange.com.

This is my first blog post from series Query Optimization. Today we will try to optimize What is my accepted answer percentage rate.

Details about my SQL Server environment is as below:-

Version:	SQL Server 2014 Developer Edition 12.0.5000 (x64)
Processor:	Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
OS:	Windows 7 Enterprise Service Pack 1
OS Memory:	8.00 GB
Hard Disk:	250 GB SSD

I have configured 4096 MB for SQL Server Max Memory with 4 TempDB data files each with 200 MB intial file size and 100 MB file growth.

So below is the query that we need to optimize.

-- On avg how often are answers I give, accepted

SET NOCOUNT ON;

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @UserId int = 9 --	Answer = 42.8571428571429

SELECT 
    (CAST(Count(a.Id) AS float) / (SELECT Count(*) FROM Posts WHERE OwnerUserId = @UserId AND PostTypeId = 2) * 100) AS AcceptedPercentage
FROM
    Posts q
  INNER JOIN
    Posts a ON q.AcceptedAnswerId = a.Id
WHERE
    a.OwnerUserId = @UserId
  AND
    a.PostTypeId = 2
GO

-- On avg how often are answers I give, accepted

SET NOCOUNT ON;

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

DECLARE @UserId int = 9 -- Answer = 42.8571428571429

SELECT

(CAST(Count(a.Id) AS float) / (SELECT Count(*) FROM Posts WHERE OwnerUserId = @UserId AND PostTypeId = 2) * 100) AS AcceptedPercentage

FROM

Posts q

INNER JOIN

Posts a ON q.AcceptedAnswerId = a.Id

WHERE

a.OwnerUserId = @UserId

AND

a.PostTypeId = 2

I personally prefer to analyze Logical Reads as key performance parameter for Query Tuning. Below is the screenshot of statistics for above query from statisticsparser.com:-
Also, Below is the execution plan:-
Digging further on top right most Clustered Index Scan Operator, and checking the value for Number of Rows Read & Actual Number of Rows, we can see that SQL Server is suggesting of Residual IO operation. In order to rectify the problem, we may need to remove this Residual IO operation with Index Seek operator by considering residual predicate.

So lets first modify it to make it easier to read and analyze.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @UserId int = 9

;WITH T_Posts AS
(
	-- Get answers of @UserId
	SELECT p.Id FROM dbo.Posts p WHERE p.OwnerUserId = @UserId AND p.PostTypeId = 2 
)
,T_Posts_a AS 
( 
	--Filter answer of @UserId accepted by others
	SELECT COUNT(*) AS Accepted_Counts, (SELECT Count(*) FROM T_Posts) AS Total_Answer_Counts 
	FROM T_Posts p 
	WHERE p.Id IN (SELECT q.AcceptedAnswerId FROM Posts q)
)
SELECT (CAST(Accepted_Counts AS float) / Total_Answer_Counts * 100) AS AcceptedPercentage 
FROM T_Posts_a
GO

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

DECLARE @UserId int = 9

;WITH T_Posts AS

(

-- Get answers of @UserId

SELECT p.Id FROM dbo.Posts p WHERE p.OwnerUserId = @UserId AND p.PostTypeId = 2

)

,T_Posts_a AS

(

--Filter answer of @UserId accepted by others

SELECT COUNT(*) AS Accepted_Counts, (SELECT Count(*) FROM T_Posts) AS Total_Answer_Counts

FROM T_Posts p

WHERE p.Id IN (SELECT q.AcceptedAnswerId FROM Posts q)

)

SELECT (CAST(Accepted_Counts AS float) / Total_Answer_Counts * 100) AS AcceptedPercentage

FROM T_Posts_a

Our table [dbo].[Posts] has only Primary Clustered Index on [Id] field. 4-5 indexes comprising 2-3 key columns should not be issue in most cases. So let’s create a non-clustered index on Predicate Fields of CTE [T_Posts].

--	Create Non-Clustered Index to remove Residual IO by Index Seek
USE [StackOverflow]
GO
CREATE NONCLUSTERED INDEX NCI_OwnerUserId_PostTypeId_Includes
ON [dbo].[Posts] ([OwnerUserId],[PostTypeId]) INCLUDE (Id)
GO

-- Create Non-Clustered Index to remove Residual IO by Index Seek

USE [StackOverflow]

CREATE NONCLUSTERED INDEX NCI_OwnerUserId_PostTypeId_Includes

ON [dbo].[Posts] ([OwnerUserId],[PostTypeId]) INCLUDE (Id)

Below are the Execution Plan and stats screenshot after creating above Non-Clustered Index.

We can see that our key performance parameter Logical Reads has reduced to 7,734,109 from 23,201,186 thus executing within 2 minutes.
Now, to further optimize our query, we need to replace bottom half of Nested Loop in Execution Plan with another non-clustered Index to avoid Clustered Index scan for IN predicate logic of CTE [T_Posts_a].

USE [StackOverflow]
GO 
CREATE NONCLUSTERED INDEX NCI_AcceptedAnswerId
ON [dbo].[Posts] ([AcceptedAnswerId])
GO

USE [StackOverflow]

CREATE NONCLUSTERED INDEX NCI_AcceptedAnswerId

ON [dbo].[Posts] ([AcceptedAnswerId])

After executing, the query again, we have below Execution Plan and Stats:-

Finally our query has executed 68 milliseconds performing only 153 Logical Reads to achieve the desired result.

Conclusion: A little modification of query along with analyzing the Execution Plan can help us to decide proper indexing and querying changes. Happy Coding

Friends, please feel free to correct me by comments. Share your approach if it differs from mine. If you like the article, do Like & Share.

Query Optimization 01 – What is my accepted answer percentage rate [StackOverflow]

Like this:

Related

1 Comment

Leave a ReplyCancel reply

Share this:

Like this:

Related

1 Comment

Leave a ReplyCancel reply