
Garbage Collection & XtremIO – Fiction & Fiction: Part III

by VIOLIN SYSTEMS on December 10, 2013

Part II of this series on the XtremIO product launch triggered some questions, which I will address in this post.

  • Is the performance of XtremIO “consistent” AND “predictable”?
  • Is it even “consistent” OR “predictable”? 
  • Is it “the only all-flash array that requires no system-level garbage collection yet maintains consistent and predictable performance”?
  • Does it provide “the industry’s most consistent performance”?
  • Does it never happen that “IOPS suddenly drop and latency suddenly increases”?
  • Is their mixed R/W performance “stable”?

Let’s check the XtremIO Spec Sheet for the 2-brick system they were using in their demos during their product launch.

[Image: XtremIO Spec Sheet for the 2-brick system]

Let’s see if they live up to their claims, shall we?

If you have an app reading along at 500K IOPS and a second app comes along and starts writing at 150K IOPS, the read IOPS of the first app will drop to 150K, a reduction of 70%.

Alright, you say: that’s not fair; both apps together would total 650K IOPS, which is more than a 12U, 2-brick system supports.

Well then, let’s try one app reading along at 350K IOPS while a second app starts writing at 150K IOPS. In this case, the first app will see a roughly 60% drop in IOPS, from 350K down to 150K. In addition, the total IOPS of the system goes down, from 350K reads alone to 300K combined, because of the addition of the second workload.
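
As a quick sanity check on those numbers, here is a back-of-envelope sketch in Python. The 500K, 350K and 150K figures come from the scenarios above; the idea that each app settles at roughly 150K is my reading of the post’s numbers, not something taken directly from the spec sheet.

```python
# Back-of-envelope check of the IOPS drops described above.

def pct_drop(before_iops: float, after_iops: float) -> float:
    """Percentage reduction when a workload falls from before_iops to after_iops."""
    return 100.0 * (before_iops - after_iops) / before_iops

# Scenario 1: a reader at 500K IOPS falls to 150K once a 150K writer starts.
print(f"500K -> 150K reads: {pct_drop(500_000, 150_000):.0f}% drop")   # 70% drop

# Scenario 2: a reader at 350K IOPS falls to 150K once a 150K writer starts.
print(f"350K -> 150K reads: {pct_drop(350_000, 150_000):.0f}% drop")   # ~57%, i.e. roughly 60%

# Total system IOPS also shrinks once the second workload arrives:
# 350K reads alone vs. 150K reads + 150K writes.
print(f"Total IOPS: 350,000 before vs. {150_000 + 150_000:,} after")
```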

This doesn’t sound consistent to me.

How about predictable?

Well, let’s say you spend an hour each morning loading a large data file for processing, and then delete the file once it has been processed. As it happens, the reason it loads so quickly is that half of the blocks in the file already exist somewhere else in the system, so half of the blocks are deduplicated and never actually written. Then one day someone deletes the file that held all of those shared blocks, and your usual one-hour load takes an hour and a half. If almost all of the file you are loading already existed in the system, your morning load might normally take only 10 minutes. That is, until the other file is deleted and you discover you had been loading at 5X the normal rate thanks to dedupe. Now not only does your load take over an hour, but everyone else using the system sees their read performance drop by 60% for a solid hour.
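
To make the arithmetic behind that scenario explicit, here is a deliberately simplified sketch. It assumes load time scales with the number of blocks actually written, so a 50% dedupe hit rate roughly doubles the apparent load rate and an 80% hit rate gives roughly the 5X figure above; the 60-minute baseline is just an illustrative number.

```python
# Simplified model: load time is proportional to the blocks actually written,
# so dedupe that absorbs a fraction of the blocks speeds the load by 1 / (1 - fraction).

def load_time_minutes(baseline_minutes: float, dedupe_fraction: float) -> float:
    """Apparent load time when dedupe_fraction of the file's blocks are never written."""
    return baseline_minutes * (1.0 - dedupe_fraction)

baseline = 60.0  # minutes to load the file if every block had to be written

for frac in (0.0, 0.5, 0.8):
    speedup = 1.0 / (1.0 - frac)
    print(f"{frac:.0%} deduped -> ~{load_time_minutes(baseline, frac):.0f} min load "
          f"({speedup:.0f}X the apparent rate)")

# The day someone deletes the file holding the shared blocks, the dedupe fraction
# collapses toward 0% and the load falls back toward the full 60-minute baseline.
```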

How can that be?

Well, if you watch the VDI demo at 46 minutes into the launch keynote, they are cloning VMs, i.e. writing at a very high dedupe rate of over 4GB/s, which would be 1M write IOPS; yet a 2-brick system can only do 200K non-deduped writes, a factor of 5X variation. That is the kind of variability in performance you can have based not on the workload, but on the contents of your data and the data of any other user of the system.
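
The 5X figure falls out of a simple conversion, sketched below. The 4GB/s and 200K numbers come from the paragraph above; the 4KB I/O size is my assumption, since that is what turns 4GB/s into roughly 1M write IOPS.

```python
# Converting the demo's cloning throughput into apparent write IOPS.

throughput_bytes_per_s = 4 * 1024**3   # "over 4GB/s" during the VM cloning demo
io_size_bytes = 4 * 1024               # assumed 4KB writes

apparent_write_iops = throughput_bytes_per_s / io_size_bytes
non_deduped_write_iops = 200_000       # what a 2-brick system can sustain per the text

print(f"Apparent write IOPS: {apparent_write_iops:,.0f}")            # ~1,048,576
print(f"Variation vs. non-deduped writes: "
      f"{apparent_write_iops / non_deduped_write_iops:.1f}X")        # ~5.2X
```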

How about not having any latency increases? Certainly they like to make claims about how badly others perform (from their competitive “kill sheets”):

  • “What is the performance impact of garbage collection on host I/Os in real enterprise workloads, which are multi-threaded & mixed R/W, on a Violin that’s under constant I/O load & being overwritten? XtremIO doesn’t garbage collect SSDs & offers consistent sub-1ms latency to I/Os.”
  • “Ask SolidFire why their latency is only <2 ms? XtremIO can deliver <1 ms latency for all workloads even as array capacity utilization levels increase”
  • “What is the performance impact of garbage collection on host I/Os under real life enterprise workloads, which are mixed R/W, on a Pure array that’s under constant I/O load and being overwritten repeatedly (preconditioned)? XtremIO does not garbage collect, nor lock, SSDs in the middle of I/Os & offers consistent sub-1ms latency to any workload.”

Does their performance live up to their competitive trash talking?

If you listen at [47:30] into the keynote, you might think so: after starting up 51 VMs they declare “we are seeing response times under 1ms.” But if you look closely at the display, you will see response times under 1ms alongside many over 1ms, including one close to 2ms. You can also see the latency of the streams being measured vary considerably during the short period they are shown.

[Image: latency readout from the keynote VDI demo]

And remember that this was a non-live demo within a recorded presentation, with an array that was only half full and only 51 streams, of which 50 were apparently identical VDI users. Hardly a demonstration of a mixed workload from a multi-stream user base with differing workloads on an array where all of the space was in use! And even with all that going for them, they could not show less than 1ms latency. They barely managed to stay under 2ms!

So did XtremIO live up to any of its claims? Let’s go check the dictionary again.

  • sta·ble : adjective 1. not likely to change or fail;
  • con·sist·ent : adjective 1.(of a person, behavior, or process) unchanging in achievement or effect over a period of time
  • pre·dict·a·ble : adjective 1. able to be predicted
  • pre·dict : verb past tense: predicted : 1. say or estimate that (a specified thing) will happen in the future or will be a consequence of something.

As a well-known reality TV show might say, I think we can call those claims BUSTED!

I should add that XtremIO, in the process of criticizing Violin’s garbage collection, also objects to the fact that we offer SLC as well as MLC, different capacities in the same chassis, and different formatting ratios, all of which have different prices and different performance. Of course, the reason we do so is that our customers have differing requirements, so we provide them with differing solutions. XtremIO, on the other hand, can only offer one type, one size and one format ratio.

They are also very proud of their ability to deploy (not grow, but deploy) systems ranging from 7.5TB and 150K mixed R/W IOPS in 6U up to 30TB and 600K mixed IOPS in 22U, yet feel that Violin’s ability to deploy a range of products spanning 6TB to 40TB and 200K to 1M mixed IOPS, all in the same 3U chassis, is somehow bad?

What good is a scale-out system that can only reach 2/3 of the performance or 3/4 of the capacity, in 7X the space, of a product that has already been shipping to actual customers for over a year?
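
For reference, here is the same comparison as a quick calculation, using the top-end figures quoted above (30TB / 600K IOPS in 22U for XtremIO versus 40TB / 1M IOPS in 3U for Violin); the ratios land close to the 2/3, 3/4 and 7X figures in the text.

```python
# Ratio check on the density comparison above (top-end config vs. top-end config).

xtremio = {"capacity_tb": 30, "iops": 600_000, "rack_units": 22}
violin = {"capacity_tb": 40, "iops": 1_000_000, "rack_units": 3}

print(f"Performance ratio: {xtremio['iops'] / violin['iops']:.2f}")                 # 0.60, ~2/3
print(f"Capacity ratio:    {xtremio['capacity_tb'] / violin['capacity_tb']:.2f}")   # 0.75, i.e. 3/4
print(f"Space ratio:       {xtremio['rack_units'] / violin['rack_units']:.1f}X")    # ~7.3X
```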

Yes, the performance of a Violin system under garbage collection varies, because it was designed to. Our systems are designed for real-world use, which is bursty: database loads, boot storms, and so on. Rather than cap our customers’ performance so it can never exceed the average, we let the user burst and then catch up during the lulls.

If your use case doesn’t have lulls, if your use case really does call for 1M mixed R/W IOPS, then yes, you will need our SLC product, but then that is why we have one.

It’s funny, isn’t it, that EMC claims one of the great benefits of using off-the-shelf SSDs is how easy it is for them to qualify parts and change components, and yet they have only a single configuration to offer. Meanwhile, it is Violin, which truly does have a product designed from the ground up for flash, that can offer you a range of products to fit your needs and budget, as well as your existing power feed and floor space.

In future posts, I will explore many of the other wild claims XtremIO has made and show them to be as absurd as their garbage collection claims.