Data Deduplication

So, I’ve been playing with storage lately, mainly because I’ve been planning on moving my Hyper-V files to a faster drive due to some poor performance. While looking for solutions, I came across the idea of data deduplication.

Now, why is data deduplication an attractive idea here? Because it stores identical ranges (chunks) of data only once. While this may not matter to most people, I keep a VM of each Windows operating system (Vista and Server 2008, and up). That is a lot of duplicate data stored on the same drive.
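To make the idea concrete, here is a minimal Python sketch of chunk-level deduplication. It assumes fixed-size 4 KiB chunks and SHA-256 hashes purely for illustration; Windows Server’s Data Deduplication uses its own chunking and chunk store, which this does not reproduce. The function names `deduplicate` and `rehydrate` are hypothetical, and the “VM images” are random stand-in bytes (`randbytes` needs Python 3.9+).

```python
import hashlib
import random


def deduplicate(files: dict[str, bytes], chunk_size: int = 4096):
    """Split each file into fixed-size chunks and keep each unique chunk once.

    Returns (chunk_store, manifests): chunk_store maps a SHA-256 hex digest to
    the chunk's bytes; manifests maps each file name to its ordered digests.
    """
    chunk_store: dict[str, bytes] = {}
    manifests: dict[str, list[str]] = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first copy of a chunk is stored; repeats become references.
            chunk_store.setdefault(digest, chunk)
            digests.append(digest)
        manifests[name] = digests
    return chunk_store, manifests


def rehydrate(chunk_store: dict[str, bytes], manifest: list[str]) -> bytes:
    """Rebuild a file by concatenating its chunks in manifest order."""
    return b"".join(chunk_store[d] for d in manifest)


if __name__ == "__main__":
    # Hypothetical stand-ins for two VM images sharing a common 16 KiB "OS" base.
    base = random.Random(0).randbytes(16 * 1024)
    files = {
        "vista.vhdx": base + b"vista-only data" * 100,
        "server2008.vhdx": base + b"server-only data" * 100,
    }
    store, manifests = deduplicate(files)
    logical = sum(len(d) for d in files.values())
    physical = sum(len(c) for c in store.values())
    print(f"logical={logical} bytes, physical={physical} bytes, "
          f"saved {1 - physical / logical:.0%}")
    for name, data in files.items():
        assert rehydrate(store, manifests[name]) == data
```

The shared base is stored once, and only the per-VM tails add to the physical size, which is the same effect that makes a library of similar Windows VMs shrink so much.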

In fact, Microsoft estimates that you can save 80-95% of the space used by virtual hard disk (VHD) libraries. That is a significant saving. In practice, I’m seeing about 75% savings, which means that I can reliably host Hyper-V on a single SSD.
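The arithmetic behind “a single SSD is enough” is simple: the space actually used is the logical size multiplied by (1 - savings). The sketch below assumes a hypothetical 1,000 GB of VM files, a 480 GB SSD, and an arbitrary 20% free-space headroom; none of those figures come from my setup, and the function names are made up for illustration.

```python
def physical_size_gb(logical_gb: float, savings: float) -> float:
    """Space consumed on disk after deduplication; savings is a fraction (0.75 = 75%)."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in the range [0, 1)")
    return logical_gb * (1.0 - savings)


def fits_on_drive(logical_gb: float, savings: float, drive_gb: float,
                  headroom: float = 0.2) -> bool:
    """True if the deduplicated data fits while keeping `headroom` of the drive free.

    The 20% default headroom is an arbitrary, hypothetical safety margin.
    """
    return physical_size_gb(logical_gb, savings) <= drive_gb * (1.0 - headroom)


if __name__ == "__main__":
    # Hypothetical: 1,000 GB of VM files on a 480 GB SSD.
    print(physical_size_gb(1000, 0.75))      # 250.0
    print(fits_on_drive(1000, 0.75, 480))    # True  (250 GB <= ~384 GB usable)
    print(fits_on_drive(1000, 0.0, 480))     # False (no deduplication)
```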

Additionally, Microsoft estimates that you can get about 70-80% savings on “distribution shares”, specifically shares used for installers and updates. If you’re planning on rolling out WSUS, it may be a great idea to enable deduplication on the storage used for the updates, as it could save a considerable amount of space (in theory).

However, accessing deduplicated data can be rather intensive, and can be fairly slow on a spinning hard drive, especially when copying a lot of data from the drive. That means that if you want to use Data Deduplication, you should put it on Storage Spaces, a RAID array or an SSD: something that will (potentially) be much faster than a single normal hard drive.
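One way to picture why spinning disks suffer: a deduplicated file is reassembled from chunks that may sit far apart in the chunk store, so a read that used to be sequential can turn into many seeks. The toy model below is my own simplification, not how Windows actually schedules I/O; the chunk offsets, the 10 ms / 150 MB/s hard drive figures and the 0.1 ms / 500 MB/s SSD figures are assumed ballpark values for illustration only.

```python
def count_seeks(chunk_offsets: list[int], chunk_size: int) -> int:
    """Count repositionings needed to read chunks in file order.

    chunk_offsets are hypothetical byte offsets of each chunk inside the chunk
    store, listed in the order the file needs them.
    """
    if not chunk_offsets:
        return 0
    seeks = 1  # the initial positioning
    for prev, cur in zip(chunk_offsets, chunk_offsets[1:]):
        if cur != prev + chunk_size:  # next chunk is not physically adjacent
            seeks += 1
    return seeks


def estimated_read_seconds(chunk_offsets: list[int], chunk_size: int,
                           seek_ms: float, mb_per_s: float) -> float:
    """Toy model: seeks * seek latency + bytes / sequential throughput."""
    seeks = count_seeks(chunk_offsets, chunk_size)
    transfer_s = len(chunk_offsets) * chunk_size / (mb_per_s * 1_000_000)
    return seeks * seek_ms / 1000 + transfer_s


if __name__ == "__main__":
    chunk = 64 * 1024
    contiguous = [i * chunk for i in range(1000)]
    # Same number of chunks, but every other one lives in a distant part of the store.
    scattered = [i * chunk if i % 2 == 0 else 10**9 + i * chunk for i in range(1000)]
    # Assumed, illustrative device figures: (label, seek latency ms, throughput MB/s).
    for label, seek_ms, mb_per_s in (("HDD", 10.0, 150.0), ("SSD", 0.1, 500.0)):
        for name, offsets in (("contiguous", contiguous), ("scattered", scattered)):
            t = estimated_read_seconds(offsets, chunk, seek_ms, mb_per_s)
            print(f"{label} {name}: {t:.2f} s")
```

In this model the seek term dominates on the hard drive once chunks are scattered, while on the SSD it stays small, which matches the advice to keep deduplicated volumes on faster storage.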

Additionally, Data Deduplication is only supported on Windows Server 2012, Windows Server 2012 R2 and Windows Storage Server 2012 R2. That applies to the built-in feature; a third-party implementation could be used on any supported OS.

Unfortunately, this means that it is not supported on Windows Server 2012 R2 Essentials, the operating system that I am using. It isn’t available by default, and enabling it requires both the feature’s files from Windows Server 2012 R2 Standard/Datacenter and hacking the feature in.

I’ll cover that later if there is demand (or if I feel like adding it).

Author: Drashna Jael're
