Many vendors make some very interesting claims about their inline deduplication feature and why it is superior to post-processed deduplication. They claim that inline dedupe will:
- Save you space
- Increase the endurance of the flash by eliminating writes
- Increase the performance by not sending writes to their SSDs
The purpose of this blog post is not to weigh in on whether inline or post-processed dedupe is “better”, but to make clear that the supposed advantages of inline dedupe, improved endurance and increased performance, DO NOT EXIST!
How can this be? The key point to understand is that the BLOCK STORAGE deduplication ratio and the WRITE ELIMINATION ratio due to deduplication have nothing whatsoever to do with each other.
So when a vendor says they achieve a 5:1 deduplication ratio, i.e. holding 5GB of user data in only 1GB of physical space, and then goes on to claim either that this increases the endurance of the flash by a factor of 5 or that they can handle a higher write load, they are just making things up. To understand why, consider the following two examples.
The first example is the easiest to follow. Consider a system with 100 blocks of physical storage. If we write blocks containing all zeros to logical addresses 1 to 100, they will all dedupe to a single physical block, giving a dedupe ratio of 100:1. Now if we write a block containing a 1 to address 1, that block has to be written to flash because no block containing a 1 exists anywhere in the system, and sadly this drops the deduplication ratio to only 50:1 (100 logical blocks held in 2 physical blocks). Now keep writing to logical address 1 again and again, with a block containing a 2, then a 3, 4, 5, … for the rest of the life of the array. No new write to address 1 can be deduped, because it is always the only block in the system containing its ever-increasing number. So as time goes on, the storage dedupe ratio stays fixed at 50:1, yet 99.9999999% of all user writes result in writes to the flash. A high dedupe ratio therefore does not guarantee any increase in endurance or any increase in performance.
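The first example can be replayed with a toy model. This is a minimal sketch, assuming an idealized inline-dedupe store (the class `ToyDedupeStore` is hypothetical, not any vendor's design). A block's integer contents stand in for the content fingerprint a real array would compute. A write reaches flash only when no stored block has identical contents. Physical blocks are reference-counted and freed when no logical address points at them.

```python
from collections import Counter


class ToyDedupeStore:
    """Idealized inline dedupe: identical contents share one physical block."""

    def __init__(self):
        self.lba = {}          # logical address -> block contents (stands in for a fingerprint)
        self.refs = Counter()  # block contents -> number of addresses referencing it
        self.user_writes = 0   # writes issued by the host
        self.flash_writes = 0  # writes that actually reached flash

    def write(self, addr, contents):
        self.user_writes += 1
        if self.refs[contents] == 0:
            self.flash_writes += 1  # contents not stored anywhere yet: must hit flash
        self.refs[contents] += 1
        old = self.lba.get(addr)
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:
                del self.refs[old]  # last reference gone: physical block freed
        self.lba[addr] = contents

    def dedupe_ratio(self):
        return len(self.lba) / len(self.refs)  # logical blocks / physical blocks


store = ToyDedupeStore()
for addr in range(1, 101):           # all-zero blocks to addresses 1..100
    store.write(addr, 0)
print(store.dedupe_ratio())          # 100.0

store.write(1, 1)                    # a block containing a 1 goes to address 1
print(store.dedupe_ratio())          # 50.0

before = store.flash_writes
for n in range(2, 10_002):           # overwrite address 1 with 2, 3, 4, ... 10001
    store.write(1, n)
print(store.dedupe_ratio())          # still 50.0
print(store.flash_writes - before)   # 10000: every overwrite reached flash
```

The dedupe ratio never moves from 50:1, yet none of the 10,000 overwrites is eliminated. The longer the loop runs, the closer the fraction of user writes reaching flash gets to 100%.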
Now this second example is a little harder to follow, but it’s worth the effort. Here I show how a system can have a storage dedupe ratio of ~1, i.e. no deduplication at all, and yet eliminate ~99.999999% of all writes. This time we take our 100-block system and write a block containing a 1 to address 1, a 2 to address 2, and so on, until we reach addresses 99 and 100, both of which we write with blocks containing the value 99. So the contents of the blocks in the system look like this:
1 2 3 4 5 6 … 97 98 99 99
We have 100 blocks with only 1 duplicate, for a dedupe ratio of 1.01.
Now, starting at address 99 and working down to address 2, we write 98, 97, 96, … 2, 1. When we write the value 98 to address 99, the write is eliminated because the block at address 98 also contains the value 98. When we write a 97 to address 98, the write is eliminated because the block at address 97 also contains the value 97, and so on all the way down to writing a 1 to address 2.
Therefore, EVERY SINGLE write is eliminated, and the blocks in the system now contain:
1 1 2 3 4 5 … 97 98 99
So we still have 100 blocks with only 1 duplicate, for a dedupe ratio of 1.01.
If we now write the same data we wrote the first time, 1 2 3 4 … 97 98 99 99, to addresses 1 through 100 in ascending order, again EVERY SINGLE write is eliminated (each value being written is already stored at a neighboring address), and the system is restored to its initial state of
1 2 3 4 5 6 … 97 98 99 99
We can repeat this over and over, until eventually the only blocks ever written to flash are the 99 unique blocks from the very first pass, and a system with effectively ZERO STORAGE DEDUPLICATION approaches TOTAL WRITE ELIMINATION.
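The second example can be replayed with the same kind of toy model. This is a minimal sketch, assuming an idealized inline-dedupe store (the class `ToyDedupeStore` is hypothetical). A block's integer contents stand in for its content fingerprint. A write reaches flash only when no stored block has identical contents. The 1,000 down-and-back cycles are an arbitrary choice.

```python
from collections import Counter


class ToyDedupeStore:
    """Idealized inline dedupe: identical contents share one physical block."""

    def __init__(self):
        self.lba = {}          # logical address -> block contents (stands in for a fingerprint)
        self.refs = Counter()  # block contents -> number of addresses referencing it
        self.user_writes = 0
        self.flash_writes = 0

    def write(self, addr, contents):
        self.user_writes += 1
        if self.refs[contents] == 0:
            self.flash_writes += 1  # contents not stored anywhere yet: must hit flash
        self.refs[contents] += 1
        old = self.lba.get(addr)
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:
                del self.refs[old]  # last reference gone: physical block freed
        self.lba[addr] = contents

    def dedupe_ratio(self):
        return len(self.lba) / len(self.refs)  # logical blocks / physical blocks


store = ToyDedupeStore()
for addr in range(1, 101):                  # initial fill: 1 2 3 ... 98 99 99
    store.write(addr, min(addr, 99))
print(store.flash_writes, round(store.dedupe_ratio(), 2))  # 99 1.01

for cycle in range(1000):
    for addr in range(99, 1, -1):           # addresses 99..2 receive values 98..1
        store.write(addr, addr - 1)
    for addr in range(1, 101):              # restore 1 2 3 ... 98 99 99
        store.write(addr, min(addr, 99))

eliminated = 1 - store.flash_writes / store.user_writes
print(store.flash_writes, round(store.dedupe_ratio(), 2))  # still 99 1.01
print(f"{eliminated:.4%}")                                 # 99.9500%
```

Running more cycles pushes the eliminated fraction closer to 100%. Meanwhile the dedupe ratio stays at 100/99 ≈ 1.01.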
These two examples are obviously extreme cases. I picked them because they were easy to explain and hopefully easy to understand.
What about real-world examples?
Can either of those sorts of example actually happen? Probably the easiest way to think about the real world is that “expansive actions” are likely to be highly deduped, while “in-place actions” are likely not to be deduped. What do I mean by “expansive actions”? Actions such as copying VDI images, performing a backup or taking a snapshot are likely to have a lot of their writes deduplicated. That makes sense: after all, those are the actions we think of dedupe being good for. Without dedupe (or snapshots, versioning, linked clones, etc.), these activities would increase the “space used” metric. They can also be thought of as primarily the result of system operations above the app/VM level.
The best example of an “in-place action” is a transaction processing database. It shows why users should not expect, and vendors should not claim, that dedupe will increase the endurance or the performance of a flash system. In a database, all the blocks have sequence numbers that increase each time they are changed, so it is possible that none of the writes the database performs during operation will be deduped. The redo log is an almost perfect example of the first case I presented. The database overwrites a very small file in a circular manner and puts a unique sequence number in each block written, making it almost exactly like the first example. So the longer the database runs, the less the system's storage dedupe ratio tells you about the number of writes eliminated.
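The contrast between expansive and in-place actions can be illustrated with a toy inline-dedupe store. This is a sketch under stated assumptions, not a model of any real array or database. The class `ToyDedupeStore` is hypothetical, and block contents are Python tuples standing in for content fingerprints. The 1,000-block golden image, the 8-block circular redo log and their addresses are made up for illustration, and no real redo log block format is modeled.

```python
from collections import Counter


class ToyDedupeStore:
    """Idealized inline dedupe: identical contents share one physical block."""

    def __init__(self):
        self.lba = {}          # logical address -> block contents (stands in for a fingerprint)
        self.refs = Counter()  # block contents -> number of addresses referencing it
        self.user_writes = 0
        self.flash_writes = 0

    def write(self, addr, contents):
        self.user_writes += 1
        if self.refs[contents] == 0:
            self.flash_writes += 1  # contents not stored anywhere yet: must hit flash
        self.refs[contents] += 1
        old = self.lba.get(addr)
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:
                del self.refs[old]  # last reference gone: physical block freed
        self.lba[addr] = contents

    def dedupe_ratio(self):
        return len(self.lba) / len(self.refs)  # logical blocks / physical blocks


store = ToyDedupeStore()

# Expansive action: clone a hypothetical 1,000-block golden image.
for addr in range(1000):
    store.write(addr, ("golden-image", addr))         # original image contents
flash_before = store.flash_writes
for addr in range(1000):
    store.write(1000 + addr, ("golden-image", addr))  # clone: identical contents
clone_flash = store.flash_writes - flash_before
print(clone_flash)                                    # 0: every clone write eliminated

# In-place action: a hypothetical circular redo log of 8 blocks at 5000..5007.
LOG_BLOCKS, LOG_BASE = 8, 5000
flash_before = store.flash_writes
for seq in range(1, 100_001):
    addr = LOG_BASE + (seq - 1) % LOG_BLOCKS
    store.write(addr, ("redo", seq, "same payload"))  # unique sequence number per block
log_flash = store.flash_writes - flash_before
print(log_flash)                                      # 100000: none eliminated
print(round(store.dedupe_ratio(), 2))                 # 1.99
```

The store ends up reporting a respectable-looking dedupe ratio of about 2:1, entirely thanks to the clone. Every one of the 100,000 redo log writes still reached flash.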
An exception to the rule
To be fair, there is an exception to this rule: one case where quite a lot of a database's writes can be eliminated. Before the redo log is overwritten, it is copied to an archive area. Because that copy is identical to blocks already in the redo log, its writes may be eliminated. To protect against any corruption causing the loss of the log archive and preventing recovery of the database, many databases use what is called log multiplexing. Identical log blocks are written to 2 or more separate redo log files and then copied to separate archive logs. Preferably these copies are on separate disks. Even if they are not, having separate copies ensures that in the event of any form of corruption or I/O error, a separate physical copy is available for database recovery. Er… um… oops.
Suppose you have an always-on inline dedupe system that can’t be turned off and that spreads blocks uniformly across the SSDs regardless of LUN assignment, etc. Then no matter how many multiplexed copies of the log files you try to create, they will all be deduped to a single physical copy. If you are running a critical database on an always-on, globally deduped system such as that of, oh say, XtremIO, you had better be aware that it is going to dedupe exactly the blocks you don’t want it to dedupe. It will thus defeat one of the key reliability features of your database.
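The multiplexing problem can be shown with a toy inline-dedupe store. This is a minimal sketch, assuming idealized global inline dedupe. The class `ToyDedupeStore`, the member base addresses and the log size are all hypothetical. The sketch does not model how a real array places blocks on SSDs or any internal RAID protection, only the collapse of identical contents to one physical block.

```python
from collections import Counter


class ToyDedupeStore:
    """Idealized inline dedupe: identical contents share one physical block."""

    def __init__(self):
        self.lba = {}          # logical address -> block contents (stands in for a fingerprint)
        self.refs = Counter()  # block contents -> number of addresses referencing it
        self.user_writes = 0
        self.flash_writes = 0

    def write(self, addr, contents):
        self.user_writes += 1
        if self.refs[contents] == 0:
            self.flash_writes += 1  # contents not stored anywhere yet: must hit flash
        self.refs[contents] += 1
        old = self.lba.get(addr)
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:
                del self.refs[old]  # last reference gone: physical block freed
        self.lba[addr] = contents


store = ToyDedupeStore()

# Hypothetical layout: two multiplexed members of an 8-block circular redo log,
# deliberately placed at unrelated logical addresses (think: different LUNs).
MEMBER_A, MEMBER_B, LOG_BLOCKS = 10_000, 20_000, 8
for seq in range(1, 1001):
    log_block = ("redo", seq, f"change vector {seq}")  # identical in both members
    offset = (seq - 1) % LOG_BLOCKS
    store.write(MEMBER_A + offset, log_block)  # first copy: reaches flash
    store.write(MEMBER_B + offset, log_block)  # second copy: deduped away

print(len(store.lba))   # 16 logical log blocks across both members
print(len(store.refs))  # 8 physical blocks: each log block is stored once
print(all(store.refs[store.lba[MEMBER_A + i]] == 2 for i in range(LOG_BLOCKS)))  # True
```

Two logical redo log members exist, but every log block is backed by exactly one physical block referenced twice. The copies are therefore no longer physically independent.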
So while dedupe may be a very attractive feature for some use cases, if you are running a database, ALWAYS-ON dedupe that spreads its blocks over the entire storage cluster isn’t a feature, it’s a bug.
If a vendor tells you that their always-on dedupe system goes faster, or will have sufficient endurance, because of deduplication, then one of two things is true:
- Either they understand what I have just explained, and are trying to get you to buy their product based on statements they know to be untrue.
- Or they don’t understand what I have just explained, in which case they are trying to sell you a product they think will not wear out, when it may well.