
Flash Flavors: MLC, eMLC, or vMLC? (Part II)

by VIOLIN SYSTEMS on October 18, 2013

A while back, we started Part I of this two-part series. Let's continue the conversation and discuss how Violin Systems can achieve the same effects with MLC.


Changing the Architecture

Recall that I mentioned it is possible to use MLC parts “like” SLC. By using only half the addresses in a block (writing one bit to each cell), this can be done, and it is commonly done by people who don’t have a relationship with the fab. This makes the flash last longer, but it still doesn’t have the full performance of SLC, and it gives up half the storage capacity.

If not properly coordinated with the garbage collection algorithm and the system controller, this could cause enough increased write amplification to offset any potential endurance benefit. Or as they say, we are trained professionals; don’t try this yourself.

If you do have a close relationship with a fab, as Violin does, you can make your own flash controller. You could then operate the MLC as SLC, getting the wear and performance benefit while being fully coordinated with the rest of the flash management, and thus actually gaining the improved endurance.

As mentioned previously, the difference might lie in commands that the internal or external eMLC controller uses and the MLC controllers don’t. That is, an eMLC SSD might not be using eMLC flash parts; instead, it may be using special commands that allow the controller to get more out of an MLC part. One such command changes the manner in which the voltage level of the flash is read, which in turn allows for finer control and coordination with the error correction system. As a side note, a lot of vendors describe this as “signal processing”, but I’ll get into the validity of calling memory ECC “signal processing” another time. For now, let’s see the sort of improvement operating the flash with such commands can yield.

The graph below shows an MLC part with a datasheet life of 3K P/E cycles (an actual part rather than a made-up one, since this graph comes from actual testing), which was rapidly run up to 10K P/E cycles, more than 3X its datasheet life. Doing this quickly rather than over the course of years also makes the wear even more severe. On the left of the graph, the number of bit errors for all the pages in a block is shown as read with standard read commands. On the right, the number of bit errors for all the pages in the same block is shown as read with special read commands that allow the internal operation of the flash die to be adjusted by Violin’s flash controller.

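[Graph: bit errors per page for one block of a 3K-rated MLC part after 10K P/E cycles. Left: standard read commands. Right: special read commands.]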

As you can see, even at 3X the datasheet rated life, more than ½ the pages in the block would still have been usable with standard read commands. But by using the special read commands, not only is the whole block usable at 3X the rated life, but the bit errors do not even use ¼ of the recommended level of ECC correction, which means many more cycles are left in this part.

Example 2: “Flash Farm: Most dies good, some dies better”

Parameter                       MLC                eMLC
P/E cycles                      3K                 10K
Retention at rated P/E cycles   1 year             1 year
Required ECC                    30 bits (per 1K)   30 bits (per 1K)
Program Time (ave)              1.5ms              1.5ms
Erase Time (ave)                5ms                5ms
Min Blocks per Die              4096               4096

You may have noticed that this pair of data sheets is the same as Example 1, in Part I.

There are at least two ways this can occur. The first case is that the parts are *identical* except for the eMLC parts being "specially selected", or "binned", the same way that CPU vendors sell CPUs of different clock rates. In this case, the price difference charged by the manufacturer may actually resemble their cost difference.

The second case is when you see that the eMLC part of the same size isn’t available until sometime after the MLC part first ships. In this case, what you don’t see is the fab getting more experience with the new process or mask set and tweaking the recipe. Do they use the same tweaks on all the parts from then on, or just on the ones that are slated to be eMLC? My guess is that they probably use them on all the parts. I will explain below.

Binning the Parts

Die test is a very expensive part of producing flash, unlike a majority of the rest of the semiconductor process. Die test is, as its name implies, an individual process. The equipment to do the testing is expensive and the process is slow. How slow is it? It’s up there with watching paint dry. Literally. If a die has 4K blocks, and a block has 256 pages, and it takes 1.5ms to program a page, then it takes almost half an hour to program and erase the whole die *once* to check for bad blocks or other signs of the part being bad or of low quality.
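The "almost half an hour" figure can be checked with a few lines of arithmetic. This is only a rough sketch: the block count, page count, and program time come from the paragraph above, the 5ms erase time is borrowed from the example datasheets, and the function name is made up for illustration. It assumes pages are programmed strictly one after another and ignores data transfer and read-back time.

```python
# Rough time to program and erase every block of one die once.
BLOCKS_PER_DIE = 4096      # "4K blocks"
PAGES_PER_BLOCK = 256
PROGRAM_TIME_S = 1.5e-3    # per page, from the text
ERASE_TIME_S = 5e-3        # per block, assumed from the example datasheets


def full_die_pass_seconds(blocks=BLOCKS_PER_DIE, pages=PAGES_PER_BLOCK,
                          t_prog=PROGRAM_TIME_S, t_erase=ERASE_TIME_S):
    """Seconds to program every page and erase every block once, serially."""
    program = blocks * pages * t_prog
    erase = blocks * t_erase
    return program + erase


seconds = full_die_pass_seconds()
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

Programming dominates: the erases add only about 20 seconds to a pass of roughly 26 and a half minutes, and that is for a single pass over a single die.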

Performing enough testing to separate out the really good parts from the "just OK" parts might take so long that it would be cost-prohibitive, and it might be that there aren’t enough really good parts, or they are not found on a reliable enough basis to meet the demand for them. As a result, it may be easier and cheaper for the fab to make the parts just a *little* bit larger and perhaps give up getting as many die on the wafer but have basically every die that isn’t really bad be "good enough" to be an eMLC part, since finding the very bad parts is much easier than finding the very good parts.

If most of the parts are good enough to be eMLC, then they only have to subject those parts that will be sold as eMLC to the added time and expense of extended testing.

Even if many flash parts aren’t good enough to be eMLC, you can still be sure that they are much better than they need to be to be acceptable as MLC parts. Why is this? The fabs want a very high yield. They want to get as close to 100% of the dies on the wafer working as possible. Flash, particularly MLC flash, is analog rather than digital, resulting in a lot of variability. The way you get all the parts to work is by building them a lot better than the specs they have to meet.

The Violin Systems Advantage

Violin can exploit this to yield multiple advantages over other vendors who build their products from off-the-shelf SSDs, whether they use MLC or eMLC. Violin’s bins differ from the ones the fab uses, or the ones an eMLC SSD vendor uses, in ways we can exploit to get a better product at a better cost.

Our systems are designed for the enterprise market, so it is worth doing extended component testing.  We can do testing, using the controller on each VIMM that we designed and built, more cost effectively than the fab can.

Our testing can be done to the precise level of quality that we desire. This might be lower than a standard eMLC part, it might be higher, or it might be both. Both? How can it be both, you might ask? Well, that’s a very good question.

Because we manufacture our own flash modules (VIMMs) that use our own controller, and we know they will be put into one of our arrays, we don’t necessarily need the parts to last the 10K P/E cycles of an eMLC spec part. To overwrite a multi-TB array 10 times a day, every day, would take a workload of 100% writes, 100% of the time. I have never seen a workload even close to this. If you happen to have such a workload, I have a lovely SLC product for you.
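To put rough numbers on that, here is a small sketch of how long a rated P/E budget lasts at a given overwrite rate. The function and its parameters are illustrative only; it assumes perfect wear leveling and, by default, a write amplification of 1, neither of which is claimed for any real product.

```python
def days_to_wear_out(pe_cycles, full_overwrites_per_day, write_amplification=1.0):
    """Days until every block reaches its rated P/E count.

    Assumes perfect wear leveling, so one full overwrite of the array
    costs each block `write_amplification` P/E cycles.
    """
    cycles_per_day = full_overwrites_per_day * write_amplification
    return pe_cycles / cycles_per_day


for label, cycles in (("MLC, 3K", 3_000), ("eMLC, 10K", 10_000)):
    for overwrites in (10, 1):
        days = days_to_wear_out(cycles, overwrites)
        print(f"{label}: {overwrites:>2} overwrites/day -> "
              f"{days:,.0f} days ({days / 365:.1f} years)")
```

Under these assumptions, the 10-overwrites-a-day workload is the only one that exhausts a 10K rating within about three years; at one full overwrite a day, even the 3K rating lasts more than eight.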

We put our parts in a system with over 8,000 dies, as opposed to an SSD that stands by itself and might not even have 100 dies. Clearly the effect of a bad die, or one with a certain number of bad blocks, is much lower on our system, by orders of magnitude, than it is on an SSD or PCIe card.

That is how we can accept a lower quality rate of individual parts while providing an enterprise level of system quality.

On the other hand, suppose we wanted to add lots of additional metadata and perform other forms of data integrity checks beyond ECC. That would mean using less of the spare area for ECC and, as a result, having fewer correctable bits than specified by the flash vendor. Then we could test for a bit error property that was much better than the flash fab had tested for, and reject parts that wouldn’t pass this stronger test.

And in that way we could have parts whose testing is both weaker and stronger when compared to arbitrary levels of the vendor datasheet, but is in fact fully satisfying the specific requirements of our product and our customer.

One other benefit Violin has over vendors who use SSDs or PCIe cards made by someone else is that if a part doesn’t meet our requirements, it can be taken off and replaced at the manufacturing site very economically. A vendor who wanted parts tested beyond the eMLC spec in certain areas would have to discard the entire SSD if it didn’t meet their requirements, because the SSD would still be fully compliant with the warranty under which it was sold.

But most MLC / eMLC datasheet comparisons look more like what we will see in the following examples.

Example 3: Stronger ECC Makes it Last Longer

Parameter                       MLC                eMLC
P/E cycles                      3K                 10K
Retention at rated P/E cycles   1 year             1 year
Required ECC                    30 bits (per 1K)   60 bits (per 1K)
Program Time (ave)              1.5ms              1.5ms
Erase Time (ave)                5ms                5ms
Min Blocks per Die              4096               4096

In this case, the difference is that the eMLC part requires much stronger ECC in order to meet its datasheet numbers. The only difference between the flash die is that the eMLC die is slightly larger in order to accommodate the extra ECC needed. This will make it slightly more expensive to manufacture, but not nearly enough to justify the cost difference.

Beware the "up to X bits of ECC" claim, especially if it's "multi-level ECC" (e.g. OCZ Everest 2 or LDPC). As an example, in a Violin system we have RAID over 1K strips of data, so if an entire 1K of data were bad it could be reconstructed from the parity. Yet we do not claim to have “up to 8 thousand bits of error correction”, while there are other vendors who do count parity bits as if they were ECC. The “up to” problem is even more acute in the case of LDPC, in which there is great interest because of its ability to sometimes correct a very large number of errored bits. The problem is that LDPC can fall victim to what is called a “stopping set”: a particular combination of bits that the LDPC cannot correct, where the number of bits in that combination is much, much smaller than the largest number of bits that the LDPC can sometimes correct.

So just as old supercomputer benchmarks fell out of favor because they only showed the performance that was “guaranteed not to be exceeded” under normal conditions, “up to” error correction claims just mean “it will never do better than this, but there is no guarantee it will not do much worse.”
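The parity point can be illustrated with a minimal sketch. It assumes a plain XOR parity over a hypothetical layout of four data strips plus one parity strip; the actual RAID layout inside a Violin array is not described here, and the helper name is made up for illustration.

```python
import os

STRIP_SIZE = 1024  # bytes; the "1K strips" from the text


def xor_strips(strips):
    """Byte-wise XOR of equally sized strips."""
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)


# Hypothetical 4+1 layout: four data strips plus one XOR parity strip.
data = [os.urandom(STRIP_SIZE) for _ in range(4)]
parity = xor_strips(data)

# Lose strip 2 entirely, then rebuild it from the surviving strips.
lost = 2
survivors = [s for i, s in enumerate(data) if i != lost] + [parity]
rebuilt = xor_strips(survivors)

print(rebuilt == data[lost])  # True
print(STRIP_SIZE * 8)         # 8192 bits recovered in one strip
```

The rebuild works only because the position of the lost strip is already known. XOR parity on its own cannot find which bits are wrong, which is one reason counting those 8192 bits as “error correction” would be misleading.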

I think the next two examples are the most interesting because they demonstrate how little bearing the datasheet numbers have on the actual limits of the flash usage, as well as showing why Violin Flash Memory Arrays have no need to use eMLC flash.

Example 4: Going Slower Makes it Go Further

Parameter                       MLC                eMLC
P/E cycles                      3K                 10K
Retention at rated P/E cycles   1 year             1 year
Required ECC                    30 bits (per 1K)   30 bits (per 1K)
Program Time (ave)              1.5ms              2.0ms
Erase Time (ave)                3ms                6ms
Min Blocks per Die              4096               4096

In this example, the Program and Erase timing makes the difference between the two types of MLC. By programming and erasing the part more gently, the flash is made to last longer.

Note: In this case the flash die are *identical* except for the parameters used by the internal controller. For this type of eMLC, the exact same part with the exact same cost to manufacture, but with lower performance, is being sold to you for more money.

It is possible to externally emulate this behavior by filling lots of blocks in parallel rather than fully filling one flash block and then moving on to the next one.

As it happens, an external flash controller manages a large number of dies. This controller may have extra performance to spare and can operate the flash in a manner which gives much the same effect. Or if the flash controller is part of a larger array that has performance to spare then the array can operate the individual controllers below their full speed on average, allowing much the same effect.
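One way to picture the "fill lots of blocks in parallel" idea is to compare the order in which pages get programmed. The sketch below is a toy model with made-up function names and tiny sizes, not a description of any real controller. It only shows that round-robin filling spaces out the programs that land on any one block while the total number of page writes stays the same.

```python
def program_order(num_blocks, pages_per_block, open_blocks):
    """Yield (block, page) in the order pages get programmed.

    open_blocks=1 fills one block completely before starting the next;
    larger values round-robin across that many blocks at a time.
    """
    for base in range(0, num_blocks, open_blocks):
        group = range(base, min(base + open_blocks, num_blocks))
        for page in range(pages_per_block):
            for block in group:
                yield block, page


def gap_between_programs(order, block):
    """Program slots between the first two pages written to one block."""
    slots = [i for i, (b, _) in enumerate(order) if b == block]
    return slots[1] - slots[0]


sequential = list(program_order(num_blocks=8, pages_per_block=4, open_blocks=1))
interleaved = list(program_order(num_blocks=8, pages_per_block=4, open_blocks=8))

print(gap_between_programs(sequential, 0))   # 1
print(gap_between_programs(interleaved, 0))  # 8
print(len(sequential) == len(interleaved))   # True: same total work
```

With eight blocks open at once, each block sees a program only every eighth slot, so the individual block is driven more slowly even though the controller as a whole is just as busy.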

This last example is my favorite. I think it makes it most clear why Violin has no need of eMLC flash, or perhaps that we could legitimately call our MLC flash eMLC just because of how we use it.

Example 5: A Short-Term Memory Leads to Long-Term Operation

Parameter                       MLC                eMLC
P/E cycles                      3K                 10K
Retention at rated P/E cycles   1 year             3 months
Required ECC                    30 bits (per 1K)   30 bits (per 1K)
Program Time (ave)              1.5ms              1.5ms
Erase Time (ave)                5ms                5ms
Min Blocks per Die              4096               4080

Here the difference between these two parts is that the eMLC part is only guaranteed to retain data for 3 months at its "end of life" and only guaranteed to have 4080 good blocks at its "end of life", i.e. it will only store 7.96GB of data instead of 8GB. That is the only difference, because they are in fact exactly the same flash parts! In this case the difference you are paying for is merely for the manufacturer to warrant a different datasheet for the same part.
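The capacity figure follows directly from the block counts. A quick check, assuming an 8GB die with 4096 equally sized blocks (the function name is illustrative):

```python
def usable_gb(good_blocks, total_blocks=4096, nominal_gb=8.0):
    """Capacity left when only `good_blocks` of the die's blocks are usable."""
    return nominal_gb * good_blocks / total_blocks


print(usable_gb(4096))  # 8.0
print(usable_gb(4080))  # 7.96875
```

Giving up 16 blocks out of 4096 costs about 0.4% of the capacity, which is where the roughly 7.96GB figure comes from.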

At Violin, we allow for the expected number of failed blocks when we format our systems. We scrub the data so that we have no need of 1 year retention, and thus we use MLC flash in such a way that it effectively is eMLC flash. Or if you want to, you can just think of it as vMLC flash.

Example 6: Thinking Outside the Die

The last way to get MLC parts to give better properties is to use more of them. Many vendors talk about multi-dimensional RAID, or RAID on the card/SSD as well as RAID over the cards/SSDs.

For some vendors operating at the controller level, this means classical-style RAID between the different chips on the board. For other vendors that just integrate off-the-shelf parts with some software, it may involve storing extra parity in the same parts as the data.

Many vendors of “cloud” solutions or proponents of “server storage” address the issue of reliability by multiple replications of the data.

However, this approach is far from optimal. You can wind up buying a lot more flash than you need, as much as 3X in a replicated server storage system, and despite what you may think, you are not getting very secure data protection. If part of the RAID coverage is being used to make up for the deficiencies of MLC flash, then you do not really have multi-dimensional RAID; you have RAID and extra ECC, which doesn’t sound as sexy in the marketing material as 2D, 3D or 4D RAID!

Example 7: Use eMLC Flash

There is nothing “wrong” with using eMLC flash. Depending on the price premium of the eMLC, it could very well be cheaper than a lot of the approaches in Example 6. After all, why buy 3X the flash you need, and all those extra servers to hold all those extra PCIe cards, if that eMLC SSD only costs 25% more than an MLC SSD? Remember, I said at the beginning there might be plenty of users for whom using eMLC parts is the right decision. But if you go with eMLC parts, you should do so because it was in your best interests, not because it was in the salesperson’s best interests.
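As a back-of-the-envelope comparison of those two options, the sketch below prices each one relative to a single copy of the data on plain MLC. The 3X and 25% figures come from the paragraph above; the function is illustrative, and it counts flash only, leaving out the extra servers and PCIe slots that replication would also need.

```python
def relative_cost(flash_multiplier, price_premium=0.0):
    """Flash cost relative to one copy of the data on plain MLC (= 1.0)."""
    return flash_multiplier * (1.0 + price_premium)


replicated_mlc = relative_cost(flash_multiplier=3)                   # 3X the flash
single_emlc = relative_cost(flash_multiplier=1, price_premium=0.25)  # 25% premium

print(replicated_mlc)  # 3.0
print(single_emlc)     # 1.25
```

Even before the extra servers are counted, three copies on MLC cost well over twice as much flash as one copy on eMLC at that premium.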

Conclusion

Violin's combined advantages eliminate the need to use eMLC flash parts because, taken together, they give us our own specially designed vMLC flash parts, controller, module and system.