SQL Server and the WFA: Part 2 - Latency Matters

by VIOLIN SYSTEMS on August 28, 2014

Greetings!

The Windows Flash Array (WFA) is all about high throughput, low latency, scalability, and balanced performance. We previously discussed its high throughput, and in this blog, we are going to focus on how latency really matters, especially if you want your databases to hum along providing copious value to your enterprise.

Having a high level of IOPS may seem impressive (and it is), but that’s only part of the high performance equation. In SQL Server environments, two other factors matter most for achieving the highest performance: latency and throughput.

Latency is a measure of the time it takes for data to come back from a request. The more you lower it, the less time your CPUs spend waiting, which raises CPU utilization and shortens application processing time (duration of reports, etc.).

The other factor is throughput. Each query has a finite amount of data to process. Faster storage doesn’t change the amount of data in a report, only the time it takes to deliver it. So a disk array that caps out at 200MBps will deliver a 4000MB report in 20 seconds. In contrast, the WFA, which achieves 4000MBps, can complete the task in 1 second.
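
The arithmetic here is simply report size divided by sustained throughput. A minimal Python sketch, using the figures quoted above (the function name `transfer_seconds` is ours, purely for illustration):

```python
def transfer_seconds(report_mb: float, throughput_mb_per_s: float) -> float:
    """Time to deliver a fixed amount of data at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return report_mb / throughput_mb_per_s


# Figures from the paragraph above: a 4000MB report on each array.
disk_array = transfer_seconds(4000, 200)   # array capped at 200MBps
wfa = transfer_seconds(4000, 4000)         # WFA at 4000MBps
print(f"disk array: {disk_array:.0f} s, WFA: {wfa:.0f} s")
```

The report is the same size either way; only the denominator changes.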

Poor storage performance leads to poor CPU utilization. CPUs spend more time waiting than executing, which means servers are expensive and mostly idle investments. This reality has not escaped the attention of CIOs, CFOs, and other senior management. The performance challenge grows more pronounced the greater the number of server cores you have in a system. Of course, today multi-core is par for the course, which means you are facing the aforementioned challenge.
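
One rough way to picture this: if a core does a small slice of work, issues an I/O, and then stalls until the data returns, the share of time it spends executing depends directly on storage latency. The sketch below is a deliberately simplified single-threaded model. The function name and the 0.5 ms of CPU work per I/O are illustrative assumptions, not measurements.

```python
def cpu_busy_fraction(cpu_ms_per_io: float, io_latency_ms: float) -> float:
    """Fraction of wall-clock time spent executing rather than waiting.

    Simplified model (an assumption): work and I/O strictly alternate,
    with no overlap, caching, or other threads to fill the gap.
    """
    if cpu_ms_per_io <= 0 or io_latency_ms < 0:
        raise ValueError("cpu time must be positive and latency non-negative")
    return cpu_ms_per_io / (cpu_ms_per_io + io_latency_ms)


# Hypothetical latencies, from disk-like down to sub-millisecond.
for latency_ms in (20.0, 5.0, 0.5):
    busy = cpu_busy_fraction(0.5, latency_ms)
    print(f"latency {latency_ms:>4} ms -> CPU busy {busy:.0%}")
```

Real servers overlap many requests, so actual utilization will differ; the point is only the direction of the relationship.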

When an application (or in our case, a database server) is active, it will reside in one of three queues:

  1. The Running Queue is where the work gets done. It is a currently executing process, and the point at which your servers are earning their keep, and perhaps a bit more. You are on the express train, sitting in First Class.
  2. The Waiting Queue is what its name implies; the process is waiting for some resource (I/O, network, locks, latches, etc.) to become available. You ran out of things you can do, so you’re stuck waiting at the train station for the rest of your team to get it together.
  3. The Runnable Queue holds processes that have their resources ready, and are waiting their turn for some CPU time. You’re ready to go, but the train already left the station, so you’ll hop on the next one as soon as it arrives and the conductor will show you your seat.
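
The three queues above can be read as a small state machine. Here is a minimal Python sketch of that cycle; the state and event names are our own labels for illustration, not SQL Server identifiers.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"     # executing on a CPU
    WAITING = "waiting"     # blocked on I/O, network, locks, latches, etc.
    RUNNABLE = "runnable"   # resources ready, waiting for CPU time


# Moves between queues as described in the list above.
# Event names are hypothetical.
TRANSITIONS = {
    (State.RUNNING, "needs_resource"): State.WAITING,
    (State.WAITING, "resource_ready"): State.RUNNABLE,
    (State.RUNNABLE, "cpu_granted"): State.RUNNING,
}


def next_state(state: State, event: str) -> State:
    """Return the queue a process moves to, or raise if the move is invalid."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


state = State.RUNNING
for event in ("needs_resource", "resource_ready", "cpu_granted"):
    state = next_state(state, event)
    print(f"{event} -> {state.name}")
```

Lower storage latency shortens the time spent in the waiting state, which is the part of the loop that storage can influence.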

It’s not bad when 1 user is vying for the system resources.

It gets a whole lot worse with multiple users vying for the same resources.


Typically, disk arrays have good throughput until the number of users increases, at which point logically sequential requests become physically random. Keep in mind that being logically sequential at the database layer is entirely different from being physically sequential on the storage medium. Defragging, index rebuilds, clustered indices, read-ahead processes, large batch requests, etc. may all lead the DBA to believe that storage is being accessed sequentially. However, a SAN layout with striped LUNs, multiple components, and support for multiple workloads means that the data is almost certainly not sequential on disk, or that access to it is randomized by concurrent requests from other users.
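
To see why concurrency alone breaks sequential access, consider a toy model in Python. Each user reads its own run of consecutive block numbers, and the array receives the requests interleaved. The strict round-robin arrival order and the block numbers are simplifying assumptions for illustration; real arrival patterns are messier.

```python
from itertools import chain, zip_longest


def user_stream(start_block: int, length: int) -> list:
    """One user's logically sequential read: consecutive block numbers."""
    return list(range(start_block, start_block + length))


def interleave(streams: list) -> list:
    """Round-robin merge, a simple stand-in for concurrent arrival order."""
    merged = chain.from_iterable(zip_longest(*streams))
    return [block for block in merged if block is not None]


def sequential_fraction(blocks: list) -> float:
    """Share of requests that directly follow the previously requested block."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1)
    return hits / (len(blocks) - 1)


for users in (1, 2, 8):
    # Each user reads 100 blocks from its own, widely separated region.
    streams = [user_stream(u * 10_000, 100) for u in range(users)]
    arrivals = interleave(streams)
    print(f"{users} user(s): {sequential_fraction(arrivals):.0%} sequential")
```

With one user, every request follows its predecessor. With two or more users in strict rotation, none do, even though each user's own stream is still perfectly sequential.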

Our patented Flash Fabric Architecture™ overcomes this resource contention for storage. Put on your geek hats; here’s how it works:

  1. One 64K inbound write is split into 16 separate 4K packets
  2. Each 4K packet has parity calculated creating a total of 5K of data to store
  3. The final 5K packet is split into five 1K packets, and one is sent to each of the five VIMMs in a RAID group
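
The steps above can be sketched in a few lines of Python. This is an illustration of the data flow only, not Violin's implementation: the byte-wise XOR parity and the layout of four 1K data chunks plus one 1K parity chunk are assumptions on our part, and all function names are hypothetical.

```python
from functools import reduce

KIB = 1024


def split(data: bytes, size: int) -> list:
    """Cut data into consecutive pieces of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def xor_parity(chunks: list) -> bytes:
    """Byte-wise XOR across equal-sized chunks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def stripe_write(write: bytes) -> list:
    """Return one stripe of five 1K chunks (4 data + 1 parity) per 4K packet."""
    stripes = []
    for packet in split(write, 4 * KIB):               # step 1: 64K -> 16 x 4K
        data_chunks = split(packet, KIB)               # four 1K data chunks
        parity = xor_parity(data_chunks)               # step 2: 4K + 1K = 5K
        stripes.append(data_chunks + [parity])         # step 3: five 1K chunks
    return stripes


write = bytes(range(256)) * 256                        # 64K of sample data
stripes = stripe_write(write)
print(len(stripes), "packets,", len(stripes[0]), "chunks of",
      len(stripes[0][0]), "bytes each")
```

Under the XOR assumption, any single lost chunk in a stripe can be rebuilt by XOR-ing the other four, which is the usual reason for carrying parity alongside the data.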

As you can see, the packet is broken into small batches and processed in parallel; this is what helps drive down latency. In our large capacity arrays there are 64 high-speed, custom-built flash controllers each processing 1K, whereas commodity SSD controllers process the full 64K (RAID 1 or RAID 10 will send the full packet to both SSDs that host a log or tempdb device). The result is that data is spread across the array, so there is no clumping of data, no data locality and, ultimately, no hot spots. This means the WFA can scale to support massive user concurrency for both random and sequential workloads. Connecting through SMB Direct can also free up to 30% of CPU usage; these cycles could be put towards BI initiatives or other value-added activities.

For example, you can hammer away on OLTP workloads while also executing highly parallel BI with many simultaneous related queries, without dragging down your overall performance. Say your organization has a 24-64 core DW/BI solution, which gives you the capacity to run at a parallelization factor of 24-64. To justify a dedicated device for parallelization, you would need to process data at a factor over 64.

Running a mix of multiple database workloads against a single array is now not only possible, but you can do so with incredible performance, consistent latency, and overall higher server and storage efficiency. Of course, if consolidation or virtualization of SQL Server instances is your preferred approach, the WFA can enable you to increase virtualization density as well.


With the low latency of the WFA, you can achieve these levels of performance enhancement to your SQL Server without any storage tuning, upgrading or modification:

  • 3-10x faster reports
  • 3-6x faster transactions
  • 2-5x higher consolidation of SQL Server VM farms without performance penalty
  • Streamlined re-index process, with latency dropping from 20+ milliseconds to sub-millisecond
  • Index and stats defrag locks can be released in under a millisecond
  • Grow your database size without taking a performance hit
  • 24x7 maintenance (do backups, index maintenance, etc., during the day) so if there’s an issue, it’s when you’re at work, not when you’re asleep

As you can see from this and the previous blog, we can do a lot in 3U to not only boost your SQL Server performance, but transform the economics of SQL Server as well. Finally, you can have storage that is so fast that it doesn’t require you to change your behavior; you can run tasks during the day while you’re at work, so why not buy the storage solution that allows you to do so?

Nonetheless, performance alone will not meet the needs of a 21st century data center. Transforming the efficiency and economics of the data center, which ultimately yields reduced OPEX and CAPEX, is just as essential for overall corporate success. We’ll continue on this theme the next time.

Cheers!

Learn how Violin and Microsoft collaboratively optimized this solution to leverage WFA’s unique performance profile so you can run your SQL Server environment in a Flash.

For more information on the WFA and SQL Server, download this solution brief.
