Independent studies by Google and by Microsoft’s Bing team set out to understand the impact of latency on user behavior, and both produced surprising results. The companies presented their results jointly at the O’Reilly Velocity conference in San Jose, CA, in May 2009.
Both companies wanted to make sure their investment in systems that deliver search results and advertising was grounded in measured user behavior and, if possible, in business impact. Because the two companies reached very similar conclusions through independent experiments, their joint conclusion carries a lot of credibility. The result of their research: latency matters.
One of the urban legends of the computing world is that people can’t perceive small differences in time, so below 200ms there is no benefit to faster response times (lower latency). These two experiments contradicted that assumption and showed a measurable difference until latency dropped below 50ms. The finding is not based on brain studies or on derivative measures of human perception, but on measured human behavior. When servers were artificially metered to deliver predetermined levels of performance, people behaved differently as things got faster, with statistically significant differences between 1000ms and 50ms latency. With a faster system, people interacted more with the application (search), responded faster, were more satisfied, and SPENT MORE MONEY.
The payoff for a system with 50ms or less latency can be significant. The Bing study showed that moving from 1000ms system latency to 50ms or less produced 2.8% more revenue per user, along with better customer satisfaction (more clicks) and better engagement (faster clicks). Lower system latency appeared to go hand in hand with higher customer engagement, perhaps because fast responses from the system don’t give the user’s mind time to wander. Users stayed focused on the application, and that can generate real business results.
To discuss latency precisely, let’s define it as the lag between the moment a user presses a key to get something done and the moment the results appear on the screen. Now that we know there is a benefit to reducing system latency below 50ms, we need to understand where latency comes from. Latency has three main components: compute, infrastructure, and application latency.
Compute latency includes the CPU, main memory, secondary memory (storage), the operating system, and any virtualization software. Traditionally, the slowest component in the compute system has been storage, because electromechanical disks have high latency. Indeed, disk latency has been the largest contributor to latency in the entire system. Caching was introduced to mitigate disk latency, but when the next block of data requested is a cache miss and must come off the disk, caching cannot speed things up. People like consistency, and caching isn’t a great solution; it has simply been the best one available until now.
Infrastructure latency comes from the Ethernet network, the storage area network, cluster interconnects, network operating systems, network virtualization, and any other connectivity. Traditionally, the biggest contributor to infrastructure latency has been the Ethernet network, and the mass migration to 10GbE, along with early testing of 40GbE, shows the large strides being made in this area.
Application latency includes the database, OLTP, CRM, and ERP applications, and similar non-system software. A common source of latency here has been the database, because of its heavy compute and I/O loads. This is where the overlap between latency domains becomes obvious: database performance is commonly gated by slow disk I/O, which in turn limits every application that needs that data.
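The caching point made under compute latency can be put in numbers with the usual effective-access-time estimate, average = h * t_hit + (1 - h) * t_miss, where h is the cache hit ratio. The Python sketch below is illustrative only: the function name effective_latency_ms and the 0.2ms hit and 8ms disk-read figures are assumptions, not numbers from the Google or Bing studies. It shows that a higher hit ratio improves the average, while every miss still pays the full disk latency.

```python
def effective_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency of a cache in front of a slower device.

    hit_ratio is the fraction of requests served from cache (0.0 to 1.0);
    hit_ms and miss_ms are the latencies of a cache hit and a disk read.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Assumed figures for illustration: 0.2 ms per cache hit, 8 ms per disk read.
HIT_MS, MISS_MS = 0.2, 8.0
for h in (0.80, 0.95, 0.99):
    avg = effective_latency_ms(h, HIT_MS, MISS_MS)
    # The average falls as h rises, but a miss still costs the full MISS_MS.
    print(f"hit ratio {h:.2f}: average {avg:.2f} ms, each miss {MISS_MS:.1f} ms")
```

Even at a 99% hit ratio, one request in a hundred still waits the full 8ms in this hypothetical, which is the inconsistency the text attributes to caching.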
In the quest for a consistent 50ms response time, each system layer should be reviewed, starting with the largest offenders (which offer the largest potential payoff) and refining until the goal is reached. The worst offender is usually the electromechanical disk. The main reason all storage isn’t silicon is cost. The problem is that the true cost of disk is seldom calculated to include the caching, short-stroking, and overprovisioning needed to squeeze more performance out of it, and even with these fixes the improvement in latency is limited. In most cases, all-flash arrays now cost about the same as traditional enterprise disk solutions. Perhaps for the first time, a 50ms system response time is attainable at a reasonable cost.
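The layer-by-layer review amounts to a simple latency budget: add up what each layer contributes, compare the total with the 50ms target, and work down the list from the largest contributor. The Python sketch below is a minimal illustration under stated assumptions: the function latency_review is hypothetical, and every per-layer number is invented for the example rather than measured.

```python
from typing import Dict, List, Tuple


def latency_review(
    layers: Dict[str, float], target_ms: float = 50.0
) -> Tuple[float, bool, List[Tuple[str, float]]]:
    """Sum per-layer latency and rank layers from largest to smallest.

    Returns (total_ms, meets_target, ranked); ranked lists (layer, ms)
    pairs in the order in which to review them.
    """
    total_ms = sum(layers.values())
    ranked = sorted(layers.items(), key=lambda item: item[1], reverse=True)
    return total_ms, total_ms <= target_ms, ranked


# Hypothetical breakdown of one request, in ms. Layer names follow the text;
# the numbers are assumptions chosen for illustration.
disk_backed = {
    "compute (storage)": 40.0,
    "infrastructure (network)": 12.0,
    "application (database)": 18.0,
}
total, ok, ranked = latency_review(disk_backed)
print(f"total {total:.1f} ms, meets 50 ms target: {ok}")
for layer, ms in ranked:
    print(f"  review: {layer} ({ms:.1f} ms)")

# Re-run the budget after addressing the largest offender (assumed new figure).
flash_backed = {**disk_backed, "compute (storage)": 5.0}
total2, ok2, _ = latency_review(flash_backed)
print(f"after the storage fix: total {total2:.1f} ms, meets target: {ok2}")
```

In this made-up breakdown the request totals 70ms, storage is reviewed first, and cutting it to an assumed 5ms brings the total to 35ms, inside the target.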
The studies also found that the cost of delay (latency) grows over time. That’s right: poor performance trains end users, who become less and less engaged and respond more and more slowly. What’s more, the users’ slower behavior persists even after performance is restored. To get the best engagement, the system should not just be fast; it should be consistently fast. I believe the fastest and most reliable way to a consistent 50ms system design is all-flash storage. Caching creates inconsistent response times, and all-flash arrays based on SSDs can also produce inconsistent latencies. A memory fabric that delivers consistent latency without the write cliff found in SSD-based designs is the fastest route to a 50ms system design. You’ll find these attributes in the Violin Flash Memory Array.
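The difference between fast and consistently fast shows up when you look at latency percentiles rather than averages. The Python sketch below compares two hypothetical sets of 100 response times: one steady at 20ms, and one that is usually 5ms but occasionally takes 300ms (for example, on a cache miss). The sample values are invented for illustration, and the percentile helper uses the nearest-rank method, which is one of several common definitions.

```python
import math
from statistics import mean
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples; pct must be in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latency samples in ms (100 requests each).
steady = [20.0] * 100
spiky = [5.0] * 95 + [300.0] * 5

for name, samples in (("steady", steady), ("spiky", spiky)):
    print(
        f"{name}: mean {mean(samples):.2f} ms, "
        f"p50 {percentile(samples, 50):.1f} ms, "
        f"p99 {percentile(samples, 99):.1f} ms"
    )
```

The spiky set actually has the slightly lower mean (19.75ms versus 20ms), yet one request in twenty takes 300ms, and those slow moments are what users notice.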