Query Optimization 03 – Users statistics (individual user) [StackOverflow]

Spread the love

Hi Friends, this is my 3rd blog post of series Query Optimization. In last blog post Questions and Answers per Month, we took optimization decision by analyzing warnings suggested by Execution Plan operators.

In this post again, we are going to optimize another query from data.stackexchange.com. Below is the query to optimize-

--	Users statistics (individual user)
USE [StackOverflow]
GO

;with
Edits as (
select
   Users.Id as Id,
   count(Posts.Id) as Edits
from
   Posts inner join
   Users on Users.Id = Posts.LastEditorUserId and Posts.OwnerUserId != Users.Id
group by Users.Id
),
Up as (
select
   Users.Id as Id,
   count(Votes.Id) as Votes
from
   Users inner join
   Posts on Users.Id = Posts.OwnerUserId
         inner join
   Votes on Posts.Id = Votes.PostId
where Votes.VoteTypeId = 2
group by Users.Id
),
Down as (
select
   Users.Id as Id,
   count(Votes.Id) as Votes
from
   Users inner join
   Posts on Users.Id = Posts.OwnerUserId
         inner join
   Votes on Posts.Id = Votes.PostId
where Votes.VoteTypeId = 3
group by Users.Id
),
Accept as (
select
   Users.Id as Id,
   count(Votes.Id) as Votes
from
   Users inner join
   Posts on Users.Id = Posts.OwnerUserId
         inner join
   Votes on Posts.Id = Votes.PostId
where Votes.VoteTypeId = 1
group by Users.Id
),
Answers as (
select
  Users.Id as Id,
  count(Posts.Id) as [Amount]
from
  Users, Posts
where Users.Id = Posts.OwnerUserId and Posts.PostTypeId = 2
group by Users.Id
)

select
   Users.Id as [User Link],
   Reputation as [Reputation],
   Up.Votes + Down.Votes as [Votes received],
   round(Down.Votes * 100.0 / (Up.Votes+Down.Votes), 1) as [% Downvoted],
   Edits as [Cleanup edits],
   Answers.Amount as [Answers],
   round(Accept.Votes * 100.0 / Answers.Amount, 1) as [% Accepted]
from Users, Up, Down, Edits, Accept, Answers
where Users.Id = Up.Id and Users.Id = Down.Id and Users.Id = Edits.Id
  and Users.Id = Accept.Id and Users.Id = Answers.Id
  and Users.Id = 26837
order by Users.Reputation desc;

Query execution took 12 minutes to execute with DOP = 3 (as limited by Instance DOP settings) and max server memory = 4 GB. Below are the stats from Statistics Parser:-
stats-01
Below is the Actual Execution Plan for query:-
Execution Plan for Users statistics (individual user)
We can see that are 3 tables involved in query, but we only see Clustered Indexes appearing in Execution Plan. This query is a perfect scenario suggesting for Foreign key relationships along with non-clustered indexes on tables. Oh yes, SQL Server does not create index on foreign keys as this is most common myth. A foreign key is a constraint, a relationship between two tables – that has nothing to do with an index. But many times it makes a lot of sense to index all the columns that are part of any foreign key relationship, because through a FK-relationship, one will often need to lookup a relating table and extract certain rows based on a single value or a range of values. So it makes good sense to index any columns involved in a FK, but a FK is not an index in itself. Check this Stackoverflow question. Also, check out Kimberly Tripp’s excellent article “When did SQL Server stop putting indexes on Foreign Key columns?”.

In my case, I downloaded [StackOverflow] database from Brent Ozar’s Torrent file. The downloaded copy has only Clustered Indexes on tables. So at first, let’s create Foreign keys on involved tables.

--	Create Foreign Key constraints on Posts table
	--	LastEditorUserId
ALTER TABLE dbo.Posts
	ADD CONSTRAINT fk_Posts_LastEditorUserId FOREIGN KEY ( LastEditorUserId ) references dbo.Users(Id)
GO
	--	OwnerUserId
ALTER TABLE dbo.Posts
	ADD CONSTRAINT fk_Posts_OwnerUserId FOREIGN KEY ( OwnerUserId ) references dbo.Users(Id)
GO

--	Create Foreign Key constraints on Votes table
	--	PostId
ALTER TABLE dbo.Votes
	ADD CONSTRAINT fk_Votes_PostId FOREIGN KEY ( PostId ) references dbo.Posts(Id)
GO

Below code can be used to check the foreign key references of dbo.Users and dbo.Posts tables.
foreign-keys using sp_fkeys

After adding foreign keys, let’s add appropriate indexes supporting these FK columns.

--	Index column [LastEditorUserId]
CREATE NONCLUSTERED INDEX NCI_LastEditorUserId_Includes
	ON dbo.Posts (LastEditorUserId) INCLUDE (PostTypeId)

--	Index column [LastEditorUserId]
CREATE NONCLUSTERED INDEX NCI_OwnerUserId_Includes
	ON dbo.Posts (OwnerUserId) INCLUDE (PostTypeId)

--	Index column [PostId]
CREATE NONCLUSTERED INDEX NCI_PostId_Includes
	ON dbo.Votes (PostId) INCLUDE (VoteTypeId)

Find Indexes of tables using sp_helpindex

That’s it. After creating foreign keys, and putting appropriate indexes on them, our query executed within 100 milli seconds. Below are the stats and new Actual Execution Plan for the same query.

Statistics for Query After Adding Foreign Keys & Indexes

New Execution Plan for Query

Conclusion: A good database design in terms of foreign keys for implementing relationship, putting indexes on those keys may add significant performance improvement in the system. Happy Coding 🙂

Please feel free to correct me by comments. Mention your approach if it differs from mine. If you like the article, do Like & Share.

Contributor: personalloandial.com

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.