Insights, Best Practices and Forward Thinking from the Customer Facing Team, Solution Architects and Leaders on Extreme Performance Applications, Infrastructure, Storage and the Real-World Impact Possible

Top 5 Questions to Ask Your Flash Vendor

by VIOLIN SYSTEMS on October 7, 2013

NAND flash brings with it a new set of terminology, a new set of pros and, most certainly, a new set of cons.  New flash-based storage vendors are popping up by the dozens.  So what is an IT person to do with all this new technology?  Simple: ask questions, and when in doubt, test the solution first.

Storage purchases are expected to live in production for at least 3 to 5 years.  Knowing which companies are likely to still be around, what makes each technology different, and whether flash storage becomes the killer new toy or the aggravating purchase you soon regret depends on your understanding of this new technology and how each vendor uses it.  Read on for the Top 5 questions to ask a flash vendor.

#1:  Technology Ownership and Support

What parts of your storage solution have you designed and manufactured and which portions have been purchased from other companies?  For each part, who supports it?  Are replacement parts stored in a local depot?  Explain the support process.

Why you should care

With the patent landscape as it is, and given the amount of time it takes to develop these advanced algorithms, it is easy to see why many startup flash storage vendors have chosen to build on off-the-shelf SSDs that the vendor aggregates via software.  Be aware of which vendors are flash storage developers and which are 3rd party aggregators.  SSDs are designed to be hard drive replacements.  They are bootable SCSI devices, and that SCSI controller adds unnecessary latency.  It also means that parallelization, error correction, wear leveling and garbage collection are not under the control of the full array; they are performed per individual part rather than by one chassis-aware system.

Enterprise class storage systems should have enterprise class support.  Understand what you are getting yourself into for support.  You may not want a solution where spare parts are not kept in a local depot, where the storage vendor has to route support calls to a different vendor, or where local staff are not available for upgrades or part replacement.

#2:  High Availability

Are there any single points of failure in the device?  How many flash-aware controllers are there?  Does HA require buying two arrays?  Does the HA affect the I/O latency or throughput?  Does the HA feature/software cost extra?  What happens to the I/O performance after a failure (ask for each component in the array)?  How do failed parts get serviced (hot swap or downtime)?

Why you should care

In the rush to get products to market, and in an attempt to keep costs under control, it is common for vendors to ship solutions that are severely affected by component failure, up to the point of the array itself going offline.  An enterprise solution should always be on, and the best-in-class solutions lose very little performance after a component failure.  The best-in-class systems also allow full hot-swap of every component.  Make sure you understand whether you have to buy two arrays to get full redundancy, whether turning on their spanning RAID affects performance, and whether changing out components requires downtime.  Any of those could leave your system underperforming or leave your data at risk while you wait for the next scheduled maintenance window.
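One quick sanity check on the "performance after failure" question: if load spreads evenly over the surviving units, the remaining fraction of peak performance is easy to estimate.  A minimal sketch, assuming even load distribution (the unit counts are invented; real degradation also depends on rebuild traffic):

```python
def perf_after_failure(total_units, failed_units=1):
    """Fraction of peak performance left if load spreads evenly over survivors."""
    return (total_units - failed_units) / total_units

# An active/active pair of controllers loses half its throughput on one failure;
# a chassis with 16 flash-aware controllers barely notices.
print(perf_after_failure(2))   # 0.5
print(perf_after_failure(16))  # 0.9375
```

Ask the vendor for figures like these per component (controller, flash module, power supply) rather than a single headline number.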

#3:  Normalization

What are your sustained I/O metrics?  What are the metrics under real-world workloads (70/30 read/write mixes, etc.)?  Are your quoted performance metrics post-calculation, such as post-de-duplication or post-compression?  Are your sustained metrics measured after the normal flash burn-in period?  What I/O size is used in the metrics (512B, 4k, 8k, etc.)?

Why you should care

Determining the source, calculations and conditions of vendor-supplied metrics is vital to understanding what you are actually buying.  Some vendors quote IOPS (I/Os per second) at 512 bytes, some at 4k, and so on.  Some vendors quote only read IOPS rather than write or mixed workloads.  Other vendors quote post-calculation metrics, meaning the array requires compression or de-duplication in order to achieve the quoted number; the array cannot meet those numbers alone.  Also, most flash storage products settle into a performance zone lower than where they start.  Make sure you find out the post-"burn-in" numbers, as that is what you will see in production over the coming months and years.
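To see why the quoted I/O size matters so much, multiply it back out: an IOPS figure says nothing until you know the block size behind it.  A back-of-the-envelope sketch (all figures invented for illustration, not any vendor's actual specs):

```python
def throughput_mb_s(iops, io_size_bytes):
    """Convert an IOPS figure at a given I/O size into MB/s of data actually moved."""
    return iops * io_size_bytes / 1_000_000

# "One million IOPS" at 512B moves far less data than 200k IOPS at 8k:
print(throughput_mb_s(1_000_000, 512))  # 512.0 MB/s
print(throughput_mb_s(200_000, 8192))   # 1638.4 MB/s
```

The same check works in reverse: if a quoted throughput only holds post-compression, divide by the assumed data-reduction ratio to find what the raw hardware delivers.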

#4:  Parallelization

How many flash-aware controllers are in the solution?  How many components are wear leveling, error correction, garbage collection and packet striping performed across?  Are the controllers SCSI-based or custom flash-aware?  How is parallelization affected by component failure?

Why you should care

All but a few flash storage vendors are getting to market quickly by reselling 3rd party SSDs.  Most have no NAND flash storage engineers or controller logic developers on staff.  This means that SCSI controllers limit latency, and processes like wear leveling and error correction are out of the hands of the part-aggregating vendor.  Flash has the ability to perform at incredibly low latencies and incredibly high IOPS.  Quick-to-market SSD-based arrays can yield faster-than-disk performance, but they are mostly considered a transition technology whose time is starting to pass as full chassis-aware flash arrays come onto the market.

#5:  User-Facing Architecture

Does your storage solution require me, the user, to create RAID groups and unit-based LUN groups and to follow an Aggregation and Segregation architecture model?

Why you should care

Storage architecture has long been based on the Aggregation and Segregation model.  Individual storage parts (disks) are aggregated together to service the requested I/O profile.  These groups are then commonly segregated to keep one workload from affecting another.  This requires someone to collect all of the workload groups, define their I/O profiles, choose the number of units to place in each LUN group, choose the RAID level for each LUN, and then monitor and maintain the system.  Common byproducts of this model are hot spots caused by data locality and the need to specify workload I/O profiles in advance.  It is also common for application developers and database admins not to know what their future I/O profiles will be, which causes additional friction in IT departments.

Distributed block architecture is the way of the future.  Since flash is an all-silicon technology with no moving parts, every storage location can be equally accessible, at the same speed, all of the time.  This means administrators can place any data, in any format, anywhere on an AFA (all-flash array) and it will always run at the same speed with no tuning or advanced planning.  The future is zero-risk performance with almost no setup or tuning.  Speed comes with the array: each I/O is striped over all of the components, so every I/O goes at the maximum speed of the chassis.  Space is used as space is needed, and when more space is required, another array is purchased.  It sounds crazy, but this means that solutions engineers will buy space when they need space instead of buying space to get speed.  Most transitional SSD-based solutions still require the Aggregation and Segregation model, or internally create a basic RAID 5-like stripe over all of the SSDs, causing issues with wear leveling, error correction and write-cliff optimizations.
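The speed claim in the distributed-block model comes down to how many components each I/O touches.  A toy sketch, assuming each flash module contributes a fixed bandwidth (the module count and per-module figure are invented for illustration):

```python
PER_MODULE_MB_S = 100  # invented per-flash-module bandwidth figure

def io_bandwidth(modules_touched):
    """Each I/O runs at the combined speed of the modules it is striped across."""
    return modules_touched * PER_MODULE_MB_S

# Aggregation and Segregation: a LUN group carved from 4 of 64 modules.
print(io_bandwidth(4))   # 400 MB/s, and hot spots concentrate on those 4 modules

# Distributed block: every I/O striped over all 64 modules in the chassis.
print(io_bandwidth(64))  # 6400 MB/s for any LUN, with no placement decisions
```

The same contrast explains the tuning claim: when every LUN already spans every module, there is no I/O-profile planning left for the administrator to get wrong.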

Summary

NAND flash storage is a relatively new and quickly growing storage medium that brings wonderful performance to enterprise solutions.  Like anything else new, the technology comes with a new set of pros and cons.  Understanding how the technology works and what makes each storage vendor's solution different is the difference between a 5-year empowering purchase and a 5-year headache.