Query Optimization 03 – Users statistics (individual user) [StackOverflow]
|Hi Friends, this is my 3rd blog post of series Query Optimization. In last blog post Questions and Answers per Month, we took optimization decision by analyzing warnings suggested by Execution Plan operators.
In this post again, we are going to optimize another query from data.stackexchange.com. Below is the query to optimize-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 |
-- Users statistics (individual user) USE [StackOverflow] GO ;with Edits as ( select Users.Id as Id, count(Posts.Id) as Edits from Posts inner join Users on Users.Id = Posts.LastEditorUserId and Posts.OwnerUserId != Users.Id group by Users.Id ), Up as ( select Users.Id as Id, count(Votes.Id) as Votes from Users inner join Posts on Users.Id = Posts.OwnerUserId inner join Votes on Posts.Id = Votes.PostId where Votes.VoteTypeId = 2 group by Users.Id ), Down as ( select Users.Id as Id, count(Votes.Id) as Votes from Users inner join Posts on Users.Id = Posts.OwnerUserId inner join Votes on Posts.Id = Votes.PostId where Votes.VoteTypeId = 3 group by Users.Id ), Accept as ( select Users.Id as Id, count(Votes.Id) as Votes from Users inner join Posts on Users.Id = Posts.OwnerUserId inner join Votes on Posts.Id = Votes.PostId where Votes.VoteTypeId = 1 group by Users.Id ), Answers as ( select Users.Id as Id, count(Posts.Id) as [Amount] from Users, Posts where Users.Id = Posts.OwnerUserId and Posts.PostTypeId = 2 group by Users.Id ) select Users.Id as [User Link], Reputation as [Reputation], Up.Votes + Down.Votes as [Votes received], round(Down.Votes * 100.0 / (Up.Votes+Down.Votes), 1) as [% Downvoted], Edits as [Cleanup edits], Answers.Amount as [Answers], round(Accept.Votes * 100.0 / Answers.Amount, 1) as [% Accepted] from Users, Up, Down, Edits, Accept, Answers where Users.Id = Up.Id and Users.Id = Down.Id and Users.Id = Edits.Id and Users.Id = Accept.Id and Users.Id = Answers.Id and Users.Id = 26837 order by Users.Reputation desc; |
Query execution took 12 minutes to execute with DOP = 3 (as limited by Instance DOP settings) and max server memory = 4 GB. Below are the stats from Statistics Parser:-
Below is the Actual Execution Plan for query:-
We can see that are 3 tables involved in query, but we only see Clustered Indexes appearing in Execution Plan. This query is a perfect scenario suggesting for Foreign key relationships along with non-clustered indexes on tables. Oh yes, SQL Server does not create index on foreign keys as this is most common myth. A foreign key is a constraint, a relationship between two tables – that has nothing to do with an index. But many times it makes a lot of sense to index all the columns that are part of any foreign key relationship, because through a FK-relationship, one will often need to lookup a relating table and extract certain rows based on a single value or a range of values. So it makes good sense to index any columns involved in a FK, but a FK is not an index in itself. Check this Stackoverflow question. Also, check out Kimberly Tripp’s excellent article “When did SQL Server stop putting indexes on Foreign Key columns?”.
In my case, I downloaded [StackOverflow] database from Brent Ozar’s Torrent file. The downloaded copy has only Clustered Indexes on tables. So at first, let’s create Foreign keys on involved tables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
-- Create Foreign Key constraints on Posts table -- LastEditorUserId ALTER TABLE dbo.Posts ADD CONSTRAINT fk_Posts_LastEditorUserId FOREIGN KEY ( LastEditorUserId ) references dbo.Users(Id) GO -- OwnerUserId ALTER TABLE dbo.Posts ADD CONSTRAINT fk_Posts_OwnerUserId FOREIGN KEY ( OwnerUserId ) references dbo.Users(Id) GO -- Create Foreign Key constraints on Votes table -- PostId ALTER TABLE dbo.Votes ADD CONSTRAINT fk_Votes_PostId FOREIGN KEY ( PostId ) references dbo.Posts(Id) GO |
Below code can be used to check the foreign key references of dbo.Users and dbo.Posts tables.
After adding foreign keys, let’s add appropriate indexes supporting these FK columns.
1 2 3 4 5 6 7 8 9 10 11 |
-- Index column [LastEditorUserId] CREATE NONCLUSTERED INDEX NCI_LastEditorUserId_Includes ON dbo.Posts (LastEditorUserId) INCLUDE (PostTypeId) -- Index column [LastEditorUserId] CREATE NONCLUSTERED INDEX NCI_OwnerUserId_Includes ON dbo.Posts (OwnerUserId) INCLUDE (PostTypeId) -- Index column [PostId] CREATE NONCLUSTERED INDEX NCI_PostId_Includes ON dbo.Votes (PostId) INCLUDE (VoteTypeId) |
That’s it. After creating foreign keys, and putting appropriate indexes on them, our query executed within 100 milli seconds. Below are the stats and new Actual Execution Plan for the same query.
Conclusion: A good database design in terms of foreign keys for implementing relationship, putting indexes on those keys may add significant performance improvement in the system. Happy Coding 🙂
Please feel free to correct me by comments. Mention your approach if it differs from mine. If you like the article, do Like & Share.
Contributor: personalloandial.com