Storage performance is something that puzzles a lot of SQL Server professionals. So in this blog, I will cover the basic steps I perform to ensure I get the best performance from the underlying storage.
How to measure the throughput of your Disk at the time of Server Setup?
With modern SSDs, we want to keep Read/Write Latency under 3-5 milliseconds. For storage older than 5 years, I can live with 20 milliseconds of latency. Every component in Storage Subsystem including SQL Server, OS, VM Host, HBA, Network, or Storage has a tipping point after which the response time increases exponentially. To say in simple terms, if we increase Reads/Writes on storage, the latency might also increase. This correlation of throughput and latency (stall) is usually linear until it reaches the tipping point.
Usually, I look at all these metrics using my homegrown open-source SQLMonitor tool. On SQLMonitor, I have 3 dashboards that help me figure out if my server is having IO bottleneck.
On the above dashboard, I can see avg Logical Disk – Latency is well below 5 ms. If I happen to see huge latency, then I like to know if the no of operations on the disk drive has increased. This I see in the top panel Logical Disk (Disk Queue Length) under the same row. Ideally, we want to keep Avg Disk Queue Length under 1. If that’s not the case, then storage is receiving more Reads/Writes than it can handle.
Dashboard Wait Stats can also be used to derive similar conclusions by looking at Storage Related waits like PAGEIOLATCH_* , WRITELOG, IO_COMPLETION, ASYNC_IO_COMPLETION. Most important is to look for PAGEIOLATCH_* wait type’s Avg Time Per Wait. It can be considered Read/Write latency like we had in the above dashboard.
Dashboard t__Database File IO Stats exposes captured information of sys.dm_io_virtual_file_stats DMV in both tabular & Time Scale formats. This dashboard tells me all metrics like Latency, IOPS, Throughput, and total Stalls for each database file.
This dashboard is very helpful in figuring out what databases or files are doing maximum Reads/Writes, and in the case of disk latency observed, which ones make sense to be moved to their dedicated volumes. This dashboard is very useful in scenarios of SQL Server migration projects from OnPrem to Cloud where we need to figure out IOPS & IO Share among databases.
There are cases when I have to work on a server that is not baselined with SQLMonitor. In those cases, I rely on First-Responder-Kitsp_BlitzFirst. The result has pretty much required data that helps me conclude storage issues in real-time.
How to Resolve High Disk IO Latency Issue
The following are the possible solutions that can be used in the same order to resolve the disk latency issue.
Dedicated Disk Drives. Place high Reads/Writes database files on dedicated disk drives. This is really helpful if simple tuning techniques like tweaking the read/write bias on the RAID controller’s cache, right-sizing the NTFS allocation units, and disk striping are already in place.
Query Tuning. This step can do magic but could take days, months, or years to work upon.
Add more RAM. Cache more data by adding RAM to reduce the physical IO.
Replace SAN storage 🙂
I hope this would be helpful to SQL DBA DBA suspecting storage bottleneck on their servers.
I am Microsoft Certified Professional having 10+ years of experience in SQL Server Querying, Database Design, and Administration. I am fond of Query Tuning and like to automate things using TSQL & PowerShell. I also have experience of implementing end-to-end Data Warehouse solution, and Data Migration using ETL tools SQL Server Integration Services (SSIS), Pentaho Business Analytics, and have designed Database Inventory through PowerShell, Python, and Django etc.