Troubleshooting High CPU on SQL Server – Part 2

Spread the love

In the last blog post Live Troubleshooting High CPU On SQL Server – Part 1, we worked on a scenario where we saw a high CPU on SQL Server due to some external OS level task. In this blog, we are going to explore a scenario where a high CPU issue is present because of the workload running on SQL Server.

Just like in the last blog post scenario, when I have to troubleshoot a “slow” SQL Server, if my SQL Server is baselined with the SQLMonitor tool, then I first visit my Monitoring – Live – All Servers dashboard which displays all the metrics of specific SQL Servers that need DBA help.

All Server Dashboard from SQLMonitor  Tool

Here under the 3rd panel All Servers – Health Metrics – Require ATTENTION, I notice that there is high utilization for both OS CPU & SQL CPU. This makes it clear that there are some queries/workload running on server that are causing high CPU.

From here, I open Monitoring – Live – Distributed dashboard which helps me check a lot more health metrics for a SQL Server Instance. In this dashboard also, I try to ensure that server is entirely facing CPU issue, and not accompanied by any other indicator like High Memory Grants, or excessive workload than usual. In this scenario, I don’t notice anything apart from high CPU at OS & SQL level.

All Health Metrics of SQL Server using SQLMonitor

So it is clear that there are some queries/workloads running that are causing above high CPU. From here, since the issue is LIVE and happening at moment, I can execute my Live Health Check query that displays the workload information in a kind of observable manner.

Live SQL Server throubleshooting using SQL Server Health Check Query

On the above 4 core server, at this moment, we have about 9 active_request_counts. We also get to see what Logins/programs are contributing how many active requests at present. To me, from the above result, it seems that the resources are evenly distributed among various logins/programs. So rather, I would proceed to check further at query pattern.

From the below result of the Live Health Check query, it seems there is a function named dbo.IsTagBanned which is involved in most of the currently running requests/queries.

Similar information is revealed in the resultset “All-Sessions-SortedBy-cpu” of the Live Health Check query.

Live SQL Server Health Check Query

So in order to resolve the issue, appropriate action would depend on the consent of application teams. We can go for following solutions –

  1. Kill all the sessions having specific function/procedure/piece of code which seems to be the main cause of high CPU at the moment.
  2. Perform RCA for the time duration of high CPU, and analyze the workload of today compared to yesterday. This comparison is easily possible using XEvent infra of SQLMonitor tool which captures sql_batch_completed & rpc_completed xevents into tables dbo.resource_consumption
  3. In order to fully control the CPU issue caused by sessions of specific login/program/host, use advanced methods like LogOn Triggers or Resource Governor. We will talk about this in future blogs.
  4. In case, the server is not baselined with SQLMonitor, then utilize sp_BlitzCache to get top CPU consuming queries.

Additionally, open source tools like sp_Blitz would tell you if your server is dealing with general problems like Plan Cache in-stability, high adhoc workload, Stored Procedures WITH RECOMPILE hints etc. Similarly, interesting findings may show up in sp_BlitzFirst tool also if we run the same under the CPU issue time. For example, in this workload case, I see the following red flags on my server –

  • Forwarded Fetches/Sec High – which indicates there are some active heap tables with high forwarded records in them. Solutions could be to create a Clustered Index or temporarily rebuild the heap table.
  • Compilations/Sec High – Usually if the number of compilations is more than 20% of Batch Requests per Sec, then I prefer to check for any stored procedure which is generating Adhoc queries using dynamic SQL within it. This is something where tools like QueryStore or sp_HumanEvents may be helpful, but should be used with precautions.
  • Re-Compilations/Sec High – Similar to Compilations/Sec, I prefer to keep this number below 20% of Batch Requests per Sec. To troubleshoot this warning, I prefer to look for stored procedures, or statements having RECOMPILE hints in them. Also, change in statistics of the physical/temp table may also trigger Re-Compilations.
Analyze server for issues using sp_BlitzFirst

Now, once the high CPU issue is resolved using above mentioned steps, for RCA, I prefer to use XEvent infra of SQLMonitor, and then sp_BlitzCache.

On my server, CPU was almost 100% b/w 3:40 PM IST till 4:53 PM IST.

CPU Trend for SQL Server using SQLMonitor

So below are some different queries I use –

RCA Query 01 – Get top Login/Programs by CPU from XEvent infra of SQLMonitor

Top CPU consumer logins or applications using extended events of SQLMonitor on SQL Server

RCA Query 02 – Find CPU Consuming Queries using XEVent Infra of SQLMonitor

Find cpu consuming queries on SQL Server using SQLMonitor
Find cpu consuming queries on SQL Server using SQLMonitor

From above result, we can see, 3 stored procedure has very high execution counts with avg CPU time over 1 second. So these are the queries that are causing 100% CPU on my server.

RCA Query 03 – In case we don’t have SQL Server baselined with SQLMonitor tool, then the best option we have is sp_BlitzCache.

Get CPU Consumers using sp_BlitzCache

Below is the SQL Server wait stats for the duration of the issue that I generally share with the application team as part of my RCA. This we can easily get from the Wait Stats dashboard of SQLMonitor.

Analyzing Wait Stats on SQL Server duing CPU Issue Time using SQLMonitor

Most ImportantWhat causes High CPU consumption within Queries?

While troubleshooting the queries consuming CPU time, we want to look for the below red flags –

  • String processing (LIKE ‘%SomeString%’)
  • Row-based processing (cursors, functions)
  • Sorting/grouping/distinct in queries
  • Heaps with forwarded fetches
  • Implicit conversions

Finally, Thanks to Arvind Shyamsundar for sharing his TSQLTextNormalizer project code. We are using this TsqlTextNormalizer in SQLMonitor based RCA Query 02 shared above.

I hope this would be helpful to you DBAs out there.

Kindly like, subscribe & share if this alert helped you in any way.

2 Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.