Performance Tuning With Predicate Logic
I have often seen developers write predicate logic without thinking about performance. That works most of the time. But what about queries that return huge result sets? Is there a general rule for writing predicates?
Let's start with a simple definition of predicate logic. A predicate is a property or an expression that either holds or does not hold; in other words, it evaluates to either true or false. In general terms, we should design our database so that every condition in the WHERE clause of a query evaluates to either TRUE or FALSE. So the question is: how can we do that? We can do that through solid database design with no nullable columns in it. But is it always possible to have this NOT NULL design for every column of every table in a database? Realistically, no.
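As a quick illustration of what such a design looks like (this table and its names are hypothetical, not part of the [StackOverflow] database), declaring columns NOT NULL and supplying a default for values that may be unknown keeps every predicate two-valued:

-- Hypothetical example: every column is declared NOT NULL,
-- so any predicate against these columns is either TRUE or FALSE, never UNKNOWN.
CREATE TABLE dbo.Customers
(
    Id          INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    DisplayName NVARCHAR(100)     NOT NULL,
    Location    NVARCHAR(100)     NOT NULL
        CONSTRAINT DF_Customers_Location DEFAULT (N'Unknown')
);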
Skipping the theory for now, let's start with the dbo.Users table of the [StackOverflow] database. This table has 4,491,134 records with a NULL value in the Location column. Below are the queries and the counts they return.
SELECT COUNT(*) AS Total_Counts
FROM dbo.Users
-- 5277831

SELECT COUNT(*) AS Total_Counts
FROM dbo.Users u
WHERE u.Location IS NULL
-- 4491134
Now let's execute the 2 queries below.
-- Query 01 - Direct Comparison of Location
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT u.Id, u.DisplayName, u.Location
FROM dbo.Users AS u
WHERE u.Location = 'India'
-- Query 02 - Check for NOT NULL first
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT u.Id, u.DisplayName, u.Location
FROM dbo.Users AS u
WHERE u.Location IS NOT NULL
  AND u.Location = 'India'
Now, let's check the stats for both queries.
Stats for Query 01 - Direct Comparison of Location
Stats for Query 02 - Check for NOT NULL first
We can see that although there is no difference in logical and physical reads between the 2 queries, the query with the NOT NULL predicate is still about 2 seconds faster than the other one, even for a small result set of 24,503 rows. So we can conclude that if a column is nullable and we add the predicate <column_name> IS NOT NULL, we can expect a small performance benefit from it.
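If you are unsure whether a column allows NULLs before adding such a predicate, the catalog views can tell you. A minimal sketch, assuming the [StackOverflow] database is the current database:

-- List the columns of dbo.Users and whether each one allows NULLs
SELECT COLUMN_NAME, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'Users'
ORDER BY ORDINAL_POSITION;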
Now, let’s analyze another scenario. Below are the 2 queries.
DECLARE @myNumber TINYINT = 0

-- Query 03: Zero value check at first
SELECT *
FROM (VALUES (12),(45),(NULL)) AS MyNumbers(mynumber)
WHERE @myNumber = 0
   OR MyNumbers.mynumber/@myNumber > 1;
DECLARE @myNumber TINYINT = 0

-- Query 04: Zero value check at last
SELECT *
FROM (VALUES (12),(45),(NULL)) AS MyNumbers(mynumber)
WHERE MyNumbers.mynumber/@myNumber > 1
   OR @myNumber = 0;
Query 03 executed successfully, returning 3 rows, whereas Query 04 failed with a divide-by-zero error.
Output of Query 03: Zero value check at first
Output of Query 04: Zero value check at last
We see that even though Query 03 and Query 04 have the same predicates in the WHERE clause, SQL Server executes the 2 queries differently because the predicates appear in a different order.
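One way to avoid relying on predicate order at all is to make the division itself safe. The rewrite below is a sketch of one possible approach (it is not from the original queries): NULLIF(@myNumber, 0) turns a zero divisor into NULL, so the division returns NULL instead of raising an error, and the query returns the same rows no matter which predicate SQL Server evaluates first.

DECLARE @myNumber TINYINT = 0

-- Query 05 (illustrative): safe regardless of predicate order,
-- because dividing by NULLIF(@myNumber, 0) yields NULL rather than an error.
SELECT *
FROM (VALUES (12),(45),(NULL)) AS MyNumbers(mynumber)
WHERE MyNumbers.mynumber / NULLIF(@myNumber, 0) > 1
   OR @myNumber = 0;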
Conclusion: SQL Server is smart enough to analyze your query and rearrange the predicate order in the WHERE clause in most cases. That is why our queries run efficiently most of the time. But it will not reorder predicates when doing so would produce a different result. So it is better for us as developers not to assume anything and to decide the predicate order ourselves. Happy Coding.
Please feel free to correct me in the comments. If you like the article, do Like & Share.