
Flash Memory: To Array or to Card

by VIOLIN SYSTEMS on June 10, 2013

Flash memory is used to achieve epic performance in the data center, but it doesn’t always come down to choosing this silicon approach over hard disk.  Even when Flash is your decision, there’s an application-focused approach to choosing between putting it directly into the server or going network-attached. Let's cover the specific use cases to determine whether PCIe Flash cards or a Flash array is best suited for your needs.

The types of applications that are best targeted at cards include:

  1. Applications where fault tolerance is built into the software layer, as is common in many BigData/NoSQL/NewSQL software platforms. Generally, these are designed for shared-nothing storage (DAS) and store multiple copies over the network.
  2. Read-caching use cases where it’s too expensive in CapEx and power to only use DRAM but the application can take advantage of a large read cache.
  3. Applications that are not business-critical, don’t need capacities beyond what can fit into a PCIe card, and can accept some loss of availability if the server goes down or needs to be rebooted.

Flash arrays, on the other hand, are best used for applications that don't fit the profiles above but require higher levels of fault tolerance, as well as the ability to share resources among multiple servers across a network (FC/iSCSI/IB).

Applications with Fault Tolerance Built-in

Many newer applications have built fault tolerance into the software layer; this is especially common in BigData/NoSQL/NewSQL software platforms.  Generally, these are designed for shared-nothing storage (DAS) and synchronously replicate writes over an IP network, though nothing precludes using SAN storage in these deployments.  While these extra data copies multiply the effective $/GB of the storage, the applications turn that necessity into a performance virtue by allowing the read and compute activity to be spread out among the servers.  For these applications, the RAID overhead inherent in creating a highly reliable array does not result in increased availability, so cards would generally be the preferred option.
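
As a rough illustration of how replication multiplies effective $/GB, consider the sketch below; the raw price and replica count are hypothetical placeholders, not real figures.

```python
# Rough sketch: effective $/GB when the application keeps N full copies.
# The $/GB figure and replica count below are hypothetical placeholders.

def effective_cost_per_gb(raw_cost_per_gb: float, copies: int) -> float:
    """Every logical GB consumes `copies` raw GB, so cost scales linearly."""
    return raw_cost_per_gb * copies

pcie_card_cost = 10.0   # assumed raw $/GB for a PCIe Flash card
replicas = 3            # e.g. a NoSQL platform keeping 3 copies

print(effective_cost_per_gb(pcie_card_cost, replicas))  # -> 30.0 effective $/GB
```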

Another advantage of cards over arrays in these deployments is that performance can scale linearly with the number of servers.  Assuming the IP switches can keep up, the absence of shared storage resources allows performance to grow to the limits of the software platform.  When using arrays, the shared resource can become a bottleneck as the number of servers gets large, but if the ratio of servers to arrays is kept balanced then this can work as well.

When evaluating storage options for these platforms, consider that most of them do relatively large writes (128KB and above is common) and have both small and large reads.  This means that write IOPS are low but write data volume can be large, whereas read IOPS and volume will both be high.  A 50/50 read/write split from a bandwidth point of view is a reasonable model; most PCIe cards are not designed with that high a write load in mind.
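
To make that sizing point concrete, here is a minimal back-of-the-envelope sketch; the aggregate bandwidth target is an assumed figure, while the 128KB write size and the 50/50 bandwidth split come from the discussion above.

```python
# Back-of-the-envelope sizing for a shared-nothing platform doing large
# (128KB) writes with a 50/50 read/write split by bandwidth.
# The 2 GB/s aggregate target is an assumed figure for illustration.

total_bandwidth_mb_s = 2000                          # assumed aggregate target (MB/s)
write_bandwidth_mb_s = total_bandwidth_mb_s * 0.5    # 50/50 split by bandwidth
write_size_kb = 128                                  # common large write size

write_iops = write_bandwidth_mb_s * 1024 / write_size_kb
print(f"Write bandwidth: {write_bandwidth_mb_s:.0f} MB/s")
print(f"Write IOPS at 128KB: {write_iops:.0f}")
# Write IOPS come out low (~8,000), but the sustained write volume is a full
# 1 GB/s -- more than many PCIe cards are designed to absorb continuously.
```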

Big Data Workloads

We often get asked whether to use Flash for Hadoop deployments.  To date, there seems to be little performance gain for Map-Reduce workloads when using Flash, and at much higher cost.  The story changes for HBase, Hive, Cassandra, and MongoDB: these platforms do indeed gain from the low latency offered by Flash.

Even though we wouldn’t recommend Flash for Map-Reduce workloads, we’ve seen a lot of value in using Flash for the staging areas where data is migrated into an analytics platform like Hadoop or Splunk.  In these workflows there are many parallel writers that often can’t write directly into the analytics platform, plus a number of readers pulling that data in.  The resulting 50/50 read/write workload with high parallelism is tailor-made for Flash.  Now one could again use either a card or an array, but the relatively high write load is often better supported by arrays, which can spread it across many Flash modules.

It should be noted that an area of active development among many of these platforms is support for hybrid storage models with a mixture of DAS and SAN.  In a hybrid model, the software platform is aware that data on a SAN can be accessed directly from multiple, if not all, servers, thereby reducing the need for synchronous replication without sacrificing the ability to scale up read load.  While the software provides inherent resiliency in the event of server, network, and/or storage failures, it is still preferable that components not fail if the price is manageable.

Another emerging use case for shared storage with these software platforms is virtualized deployments.  What has become apparent is that many of these platforms have a single-writer-thread bottleneck that prevents them from fully utilizing the large numbers of cores available in modern servers.  Virtualization enables better utilization of those cores, a gain that often exceeds the overhead of the hypervisor itself.  Once a hypervisor is in the mix, it makes more sense to use a shared storage model, given that virtual servers can move across physical servers.

Caching Workflows

There are many IO-intensive applications that really only need reads to be as fast as possible, while writes are of relatively little concern.  In these workflows, one can use either a PCIe Flash card or an array, but cards tend to be the most cost-effective and highest-performing option when the software supports it.  For cases where the data already resides on a legacy Fibre Channel disk-based array, the GridIron TurboCharger can be used to provide the necessary read acceleration.

There are other caching workflows like VDI and swap files that may well be over 50% writes, but that data is transient and doesn’t have high availability requirements.  One would imagine that PCIe cards would be the preferred choice here, but be careful.  Arrays have historically been better able to handle mixed read/write loads than PCIe cards due to their ability to spread the write and erase load across many more Flash modules.  Arrays are also better able to maximize the longevity of the Flash than many independent cards.

Note that it is worth looking into advanced caching software from companies like Atlantis Computing to see how to leverage Flash in caching for virtualized environments.  These software products work with both cards and arrays and can greatly improve overall scalability and performance.

Finally, the caching concept can be extended to workflows where there is another copy of the source data somewhere else in the environment.  It is common for analytics applications to work on a copy of the source data on other servers to avoid affecting the performance of a high-priority workflow.  In this case, even though there may actually be data written by the analytics application, it can generally be recomputed should it be lost, so there is a much lower requirement for high availability.  This would again appear to be a use case for cards, but there is a more efficient option.  Instead of incurring the overhead in time and networking bandwidth to make a copy of the source data, it’s far better to give the analytics application a read-only or read-write clone of the source data served by the same SAN-attached array.  Flash arrays generally have enough performance to support the added IOPS required by the primary and secondary workflows, and clones take far less overhead than full copies over the network.  Additionally, cloning allows the analytics application to operate on far more current data than might be possible when copying over the network, resulting in more accurate and timely results.
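
To get a rough feel for why a clone beats a bulk copy, consider the sketch below; both the dataset size and the link speed are assumed values chosen purely for illustration.

```python
# Rough comparison of refreshing an analytics copy by bulk network copy
# versus an array-side clone.  Dataset size and link speed are assumptions.

dataset_tb = 10                          # assumed size of the source data set
link_gbit_s = 10                         # assumed 10GbE link between servers
effective_gb_s = link_gbit_s / 8 * 0.8   # assume ~80% of line rate in practice

copy_seconds = dataset_tb * 1024 / effective_gb_s
print(f"Full copy over the network: ~{copy_seconds / 3600:.1f} hours")
# An array-side clone is metadata-only, so it is near-instant and the
# analytics job runs against data that is current as of the clone.
```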

Clustered Applications

The most common use cases for Flash arrays are OLTP/OLAP and virtualized servers.  Mission-critical relational databases have been written to a clustered, shared-storage model, which typically means SAN arrays; when looking at Oracle RAC, SAP, clustered Microsoft SQL Server, large data warehouses, and similar applications, an array will be the preferred solution.  Virtualized servers generally require shared storage to enable virtual servers to move across physical servers.  In these Enterprise deployments, SAN arrays rule because the applications require them.  The requirements of high performance, relatively high write loads, and high fault tolerance mandate a Flash array solution.

Where individual database instances are so small in capacity that an array seems like overkill, be sure to look at the larger environment.  Often there are many small databases that in aggregate can cost-justify using an array for database consolidation; while no single database instance may seem mission-critical, in aggregate these dozens or hundreds of databases have a major impact on the business.

Other Considerations

PCIe card vendors talk a lot about moving storage closer to the server to improve latency.  Given that low latency is what makes applications run faster (and high IOPS allows one to scale), this makes a lot of sense, but only up to a point.  Moving data over a Fibre Channel network adds only about 100usec of latency, and this addition may not have a meaningful impact on application performance.  Consider that most applications leveraging MLC Flash are targeting latencies, under real load, of just below 1ms; moving the accesses over a SAN may very well have no discernible impact on application performance.  Always be cognizant that you may just be moving the bottleneck to the software.
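
The arithmetic behind that claim is straightforward.  In the sketch below, the ~100usec SAN hop and the sub-1ms MLC target come from the discussion above, while the exact device-side latency is an assumed example.

```python
# Relative impact of adding a Fibre Channel hop (~100usec) to an MLC Flash
# access that already takes close to 1ms under real load.  The 900usec
# device-side figure is an assumed example, not a measurement.

card_latency_us = 900        # assumed MLC latency under load, direct-attached
san_hop_us = 100             # approximate added latency for the FC network

san_latency_us = card_latency_us + san_hop_us
increase_pct = san_hop_us / card_latency_us * 100
print(f"DAS: {card_latency_us}us  SAN: {san_latency_us}us  (+{increase_pct:.0f}%)")
# Roughly 11% more latency per IO, which the application may not even notice
# if the real bottleneck sits higher up in the software stack.
```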

Span of failure is an important deployment consideration.  If your data is replicated across two PCIe cards in different servers, then when one fails you’ve lost 50% of that application’s performance, and increasing the number of replicas grows $/GB and networking costs linearly; arrays, by contrast, are designed to have a low performance impact when individual components fail.  However, say one has many thousands of virtual machines hosted by a single array, which is entirely possible given how fast they’ve become.  If that array were to fail completely, the span of failure would be immense.  This span of failure is something we’ve been dealing with for many years in the SAN space; it just requires one to be confident in the fault tolerance of the system being deployed.
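
A quick sketch of the replica trade-off described above; the replica counts are illustrative.

```python
# Performance and cost impact of card-level replication.  With data mirrored
# across two PCIe cards in different servers, losing one card (or its server)
# removes half the replicas serving that data.

for replicas in range(2, 5):
    capacity_multiple = replicas                       # raw GB per logical GB
    perf_left_pct = (replicas - 1) / replicas * 100    # after one failure
    print(f"{replicas} replicas -> {capacity_multiple}x raw capacity, "
          f"{perf_left_pct:.0f}% of performance left after one failure")
# 2 replicas: half the performance is gone when one fails; adding replicas
# softens the hit but grows $/GB and replication traffic linearly.
```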

Availability of Management Tools

Arrays generally come with far superior management tools compared to cards.  As the number of cards in an environment grows past the single digits, it becomes important to monitor them for health, utilization, and Flash-specific issues like longevity, just as you would any high-priced asset.  Having a centralized point of management dramatically reduces the TCO of a Flash solution.  Also consider what happens if there is a new PCIe driver to be installed on all the cards in the environment; this upgrade would likely involve downtime on all of the servers.  Note that there are emerging standards like NVMe and SCSI Express that could mitigate or even eliminate this, but it’s still early days.

Deployment Flexibility

One issue to consider when using cards is deploying equipment that cannot easily be repurposed to other applications.  Imagine for the moment that one deployed a few dozen servers with two 1TB PCIe Flash cards each for a NoSQL application.  When that application runs its course, will those servers be easily repurposed if they are not a standard SKU in your environment?  One of the downsides of using purpose-oriented servers with DAS is that they are optimized for a specific application, and maybe even a specific version of that application.  When a significant application change occurs, or the decision is made to retire it, one often cannot use those servers for anything else because they lack some important component or have now-useless components.  In an example from the Hadoop world, two years ago it was common to deploy Hadoop nodes with 4 internal SATA drives and 1GE networking.  Today the servers tend to have a disk per core, with 16 drives being fairly common, and 10GE networking.  How would those Hadoop nodes from two years ago be repurposed for an application that is not written for a DAS model?  They’re certainly too young to be just thrown away, but they lack the networking bandwidth to support even a modest SAN, and their internal drives provide no value to the new application.  The networking can be upgraded if you have the PCIe slots, and those SATA drives were cheap enough that perhaps you just ignore them.  Now imagine the same scenario if the storage had been PCIe Flash for Hadoop HBase at 20x the price, as it was two years ago.  Would you still want to just throw it away?  Would you pull all those cards out of all of those servers and find a new home for them?  If that Flash had instead been deployed in a SAN all along, then repurposing it would have been trivial, and the servers as well.  Basically, don’t underestimate the value of a SAN when it comes to repurposing storage.

Flash Flavor: SLC or MLC?

Due to the need to drive down costs, SLC is rapidly being replaced by MLC in new deployments.  The choice of SLC vs. MLC is really about how low latency should be and how high IOPS need to be before you hit diminishing returns; SLC is better in both dimensions at roughly 3x the cost.  We’ve seen that memory-oriented NewSQL databases can be much faster with SLC, but the cost difference may not be worth it for every use case.

The Importance of IOPS

We’ve talked a lot about latency, as that is the most important factor in improving application performance, but IOPS can also be important.  Parallel applications that scale with the number of cores can drive much higher IOPS, whereas single-threaded apps can only be improved by lower latency, and IOPS don’t matter.  This is where arrays can be a better choice: as more servers share the Flash resource, you inherently get more parallel IO and take greater advantage of the high IOPS capability of the media.  You may have heard of the IO Blender effect in virtualization; the same thing happens without virtualization when you share a storage device across many servers.
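
The relationship between latency, parallelism, and IOPS is essentially Little’s Law (achievable IOPS ≈ outstanding IOs / average latency); here is a minimal sketch, with the latency and queue depths chosen purely for illustration.

```python
# Little's Law applied to storage: IOPS = outstanding IOs / average latency.
# A single-threaded app with one IO in flight is limited purely by latency;
# many servers sharing an array raise the outstanding IO count and unlock
# the media's IOPS headroom.  The figures below are illustrative assumptions.

latency_s = 0.0005   # assumed 500usec average latency per IO

for outstanding_ios in (1, 8, 64, 256):
    iops = outstanding_ios / latency_s
    print(f"{outstanding_ios:>4} outstanding IOs -> {iops:,.0f} IOPS")
```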

Addressing Power Failures

Arrays are better able to deal with power failures than cards and SSDs.  Recent research from Ohio State University has shown that many DAS products are not as resilient as they should be, whereas arrays have far more fault tolerance when it comes to power interruptions.  Even if your application can survive a loss of data, few applications can deal with the kind of silent data corruption that we’ve been trying to address in the spinning-disk world for years.

To Array or to Card

In short, the most important difference between Flash cards and arrays is really the most obvious: cards are not highly available.  No matter how high-quality that card might be, if the server is down you cannot access the data on the card, whereas arrays are generally highly fault tolerant.  This difference in availability tends to drive which applications and workflows are more appropriate for arrays and which can leverage the lower-cost card alternative.