It has been known for quite a while that ConfigMgr 2012 supports Data Deduplication. A nice blog article about it is available at http://blogs.technet.com/b/configmgrteam/archive/2014/02/18/configuration-manager-distribution-points-and-windows-server-2012-data-deduplication.aspx
You can do yourself a favour by reading Johan Arwidmark’s blog which summarizes it quite nicely at http://www.deploymentresearch.com/Research/tabid/62/EntryId/151/Using-Data-DeDuplication-with-ConfigMgr-2012-R2.aspx
In one of my projects I’m building a test environment to help my customer evaluate ConfigMgr 2012 R2 and to let the customer gather results that will feed into building a ConfigMgr 2012 R2 site infrastructure in production. This allowed me to test Data Deduplication and see if I could gain something from it. Testing something like this is not something I’d recommend doing in a production environment, since a production environment cannot be a playground…
I used Johan’s blog article as my reference in order to test Data Deduplication. Both blogs state that Data Deduplication is supported on 2 folders only:
- Content Library (SCCMContentLib)
- Package share (SMSPKGx$)
This means that Data Deduplication only makes sense on Distribution Points.
I waited a few days to give the server time to do some deduplication, and then I checked the deduplication rate.
A ‘whopping’ 58% rate, quite impressive if you ask me. I do expect a lower rate once a lot of applications are imported into ConfigMgr 2012 R2, but the number of GB saved will still be quite high.
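To make the difference between the rate and the absolute savings concrete, here is a small Python sketch. It assumes the rate is simply the saved space divided by the logical (pre-deduplication) size of the data. The function name `dedup_rate` and all the numbers are hypothetical; they are not measurements from my test environment.

```python
GIB = 1024 ** 3


def dedup_rate(logical_bytes: int, saved_bytes: int) -> float:
    """Return the percentage of the logical data size that deduplication saved.

    Assumption: rate = saved space / logical (pre-deduplication) size.
    """
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return 100.0 * saved_bytes / logical_bytes


# Hypothetical figures for illustration only.
small_rate = dedup_rate(logical_bytes=100 * GIB, saved_bytes=58 * GIB)
large_rate = dedup_rate(logical_bytes=500 * GIB, saved_bytes=200 * GIB)

print(f"small library: {small_rate:.0f}% rate, 58 GB saved")
print(f"large library: {large_rate:.0f}% rate, 200 GB saved")
```

In this made-up example the larger library has the lower rate, yet it saves far more disk space in absolute terms.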
After achieving this result I started thinking about how this can help me achieve a better customer experience, especially during Operating System Deployment. During OSD, even when using an MDT integrated Task Sequence, quite a few packages are used (and an image file too). With default settings, everything is downloaded while the Task Sequence is running. This is fine for most environments, and it is required when running multicast deployments (which I have rarely seen, to be honest). However, in most scenarios it would make sense to let everything run from the distribution point over the LAN. This means that the SMSPKGx$ share needs to be populated. For all packages involved, the following setting needs to be enabled.
Once this is done, the following option becomes available in the deployment settings of the Task Sequence.
The consequence of using this setup is that each package involved will be stored twice on the disk.
But now Data Deduplication kicks in. To quote the TechNet blog:
Data Deduplication is a new feature in Windows Server 2012 with the goal to store more data in less space. This is achieved by finding and removing duplication within files, without compromising its fidelity or integrity.
Deduplication is done by segmenting files into variable sized chunks (32-128KB), identifying duplicate chunks, and maintaining a single copy of each chunk. Redundant copies of the chunk are replaced by a reference to the single copy. The chunks are compressed and then organized into special container files in the System Volume Information folder.
This means that Data Deduplication effectively cancels out the burden of storing the data twice, which is exactly what happens with SCCMContentLib and SMSPKGx$.
You may see different deduplication rates when setting this up in your environment.
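The mechanism from the quote, and the reason a second copy costs almost nothing, can be sketched in a few lines of Python. This is a toy model and not how Windows Server implements it: it uses fixed 64 KB chunks instead of variable sized ones, keeps everything in memory, and the `ChunkStore` class and the two path labels are made up for illustration.

```python
import hashlib
import os
import zlib

# Simplification: fixed-size chunks. The quoted description uses
# variable sized chunks of 32-128 KB.
CHUNK_SIZE = 64 * 1024


class ChunkStore:
    """Toy chunk store: one compressed copy per unique chunk, files hold references."""

    def __init__(self):
        self.chunks = {}  # chunk digest -> compressed chunk data
        self.files = {}   # file path -> ordered list of chunk digests

    def add_file(self, path, data):
        refs = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if it has not been seen before.
            if digest not in self.chunks:
                self.chunks[digest] = zlib.compress(chunk)
            refs.append(digest)
        self.files[path] = refs

    def read_file(self, path):
        # Rebuild the file by following its chunk references in order.
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[path])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# Hypothetical package content, stored once per location.
package = os.urandom(1024 * 1024)

store = ChunkStore()
store.add_file("SCCMContentLib/package", package)
size_after_first = store.stored_bytes()

store.add_file("SMSPKGx$/package", package)
size_after_second = store.stored_bytes()

print(size_after_first, size_after_second)
```

The second copy only adds a list of references, so the chunk storage does not grow, while both paths still read back the full original content.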