Clustering FileServer Data Deduplication on Windows 2016 Step by Step #sofs #winserv #ReFS #WindowsServer2016 #Dedupe

Building a file server in Server 2016 isn't that different than in Server 2012 R2, except that there are more options: ReFS, dedupe, and a lot more. We start with a basic clustered file server using ReFS and Data Deduplication. This is a common scenario and can also be used in Azure.

Data Deduplication can effectively minimize the costs of a server application’s data consumption by reducing the amount of disk space consumed by redundant data. Before enabling deduplication, it is important that you understand the characteristics of your workload to ensure that you get the maximum performance out of your storage.

In this demo I have a two-node cluster, so here is a quick creation of the cluster. This is a demo for file services.

Create Sample Cluster :

#Install the file server and cluster features
Get-WindowsFeature Failover-Clustering
Install-WindowsFeature "Failover-Clustering","RSAT-Clustering" -IncludeAllSubFeature
Restart-Computer -ComputerName Astack16n014,Astack16n015 -Force
#Create cluster validation report
Test-Cluster -Node Astack16n014,Astack16n015
#Create cluster
New-Cluster -Name Astack16R5 -Node Astack16n014,Astack16n015 -NoStorage -StaticAddress ""



Now that the cluster is in place we can start with the basics of the file cluster. The disks need to be sharable, so no local disks.

If you want to build a file server with local disks only, then you should use Storage Spaces Direct; I'll cover this in the next blog post.

We add a shared disk to the cluster. Enable the disk and format the disk.


I format the disk with ReFS, as this is the next-generation file system and it has more options than NTFS.
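The same steps can also be done in PowerShell. This is a sketch; the disk number (1), drive letter (E) and volume label are assumptions for this demo:

```powershell
# Initialize, partition, and format the shared disk with ReFS
Get-Disk -Number 1 | Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter E -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "Data"

# Add the disk to the cluster, where it shows up as Available Storage
Get-ClusterAvailableDisk | Add-ClusterDisk
```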

The next iteration of ReFS provides support for large-scale storage deployments with diverse workloads, delivering reliability, resiliency, and scalability for your data. ReFS introduces the following improvements:
  • ReFS implements new storage tiers functionality, helping deliver faster performance and increased storage capacity. This new functionality enables:
    • Multiple resiliency types on the same virtual disk (using mirroring in the performance tier and parity in the capacity tier, for example).
    • Increased responsiveness to drifting working sets.
    • Support for SMR (Shingled Magnetic Recording) media.
  • The introduction of block cloning substantially improves the performance of VM operations, such as .vhdx checkpoint merge operations.
  • The new ReFS scan tool enables the recovery of leaked storage and helps salvage data from critical corruptions.


The disk is formatted and added to the cluster, showing as Available Storage.


Our next step is adding the File Server role to the cluster.



The question here is: is this a normal file server, or do you want to build a Scale-Out File Server (SOFS) cluster? Currently SOFS is only supported for RDS UPD, Hyper-V, and SQL. Comparing SOFS and a file server:

SOFS = Active – Active File share

Fileserver = Active – Passive File share

We are using the file server for general usage.


Give your file server a name. Remember this is the NetBIOS name and it needs to be in DNS!


The default is a DHCP IP address, but I assume you will set this to a fixed address or make it static in DHCP and DNS.
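Adding the role can also be scripted. A sketch; the role name, cluster disk name, and IP address below are assumptions for this demo:

```powershell
# Create a clustered File Server role (client access point) with a static IP
Add-ClusterFileServerRole -Name "AFS16" -Storage "Cluster Disk 1" -StaticAddress 10.0.0.50
```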


Now that the file server and the disk are added to the cluster, we can start the file server and add some shares to it.

Add the file share.



When adding the file share we see this error: “client access point is not ready to be used for share creation”.

A brand new file server and already broken? Well, no: reading the error message, it says we can't access the NetBIOS name.


When we open the properties of the file server, you can see there is a DNS failure. It can't add the server to DNS, or the registration is not correct.

Just make sure the name is in DNS and an nslookup works.


When adding the file share you get a couple of options; let's pick the SMB Share – Quick option.


Set the file share location; this should be on the shared disk in the cluster. If there are no folders, create the folder first.


I give the folder a name and place it on the right disk.


Here you can pick a couple of options, and some are already ticked. In this case I only use access-based enumeration.


The file server is ready and clients can connect. Access ACLs still must be set, but this depends on the environment.
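The same share can also be created with PowerShell. A sketch; the folder path, share name, scope name, and group below are assumptions:

```powershell
# Create the folder on the clustered disk
New-Item -Path "E:\Shares\Data" -ItemType Directory -Force

# Create the SMB share with access-based enumeration enabled;
# -ScopeName ties the share to the clustered client access point
New-SmbShare -Name "Data" -Path "E:\Shares\Data" -ScopeName "AFS16" `
    -FullAccess "Domain Admins" -FolderEnumerationMode AccessBased
```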

Our next step is to enable Data Deduplication on this share; it has new options in Server 2016.

Data Deduplication

To use Data Deduplication in a cluster, every node must have the Data Deduplication server role installed.

To install Data Deduplication, run the following PowerShell command as an administrator:

Install-WindowsFeature -Name FS-Data-Deduplication
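Since every node needs the feature, you can install it on all nodes at once. A sketch using the demo's node names:

```powershell
# Install the Data Deduplication feature on each cluster node remotely
Invoke-Command -ComputerName Astack16n014,Astack16n015 -ScriptBlock {
    Install-WindowsFeature -Name FS-Data-Deduplication
}
```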


  • Recommended workloads that have been proven to have both datasets that benefit highly from deduplication and have resource consumption patterns that are compatible with Data Deduplication’s post-processing model. We recommend that you always enable Data Deduplication on these workloads:
    • General purpose file servers (GPFS) serving shares such as team shares, user home folders, work folders, and software development shares.
    • Virtualized desktop infrastructure (VDI) servers.
    • Virtualized backup applications, such as Microsoft Data Protection Manager (DPM).
  • Workloads that might benefit from deduplication, but aren’t always good candidates for deduplication. For example, the following workloads could work well with deduplication, but you should evaluate the benefits of deduplication first:
    • General purpose Hyper-V hosts
    • SQL servers
    • Line-of-business (LOB) servers
Before enabling Data Deduplication, we can first check whether there are any savings to be had.

Run this in a command prompt or PowerShell window, where e:\data is the data location we are using for dedupe:

C:\Windows\System32\DDPEval.exe e:\data


Even with a few files there is a saving.

get-volume -DriveLetter e


To enable dedupe, go to Server Manager, Volumes, and select the volume that needs to be enabled.


Select the volume that needs dedupe; other volumes won't be affected. It's important to note that you can't run Data Deduplication on boot or system volumes.


The setting for the number of days can be changed to something that suits you.


When enabling deduplication you need to set a schedule. As you can see above, you can set two different time periods, one for weekdays and one for weekends, and you can also enable background optimization to run during quieter periods. For the rest it is all PowerShell; there is no GUI for this.
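Enabling dedupe can also be done entirely from PowerShell. A sketch; the volume letter and minimum file age are assumptions:

```powershell
# Enable Data Deduplication on the volume for general-purpose file server use
Enable-DedupVolume -Volume "E:" -UsageType Default

# Change how old files must be before they are optimized (days)
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 3

# List the optimization, garbage collection, and scrubbing schedules
Get-DedupSchedule
```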

Get-Command -Module Deduplication will list all the PowerShell commands.


Measure-DedupFileMetadata -Path e:\data


I placed some identical ISO files on the volume and, as you can see, there is a storage saving.

To get the current data, run an update on the dedupe status:

Update-DedupStatus -Volume e:
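After the update, Get-DedupStatus shows the savings. A sketch with a hand-picked set of properties:

```powershell
# Refresh and then display the deduplication savings for the volume
Update-DedupStatus -Volume "E:"
Get-DedupStatus -Volume "E:" |
    Format-List Volume,Capacity,FreeSpace,SavedSpace,OptimizedFilesCount
```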



It is all easy to use and maintain. If you have any cluster questions, just ask; I'm happy to help, and other community and Microsoft people are there as well.


Follow Me on Twitter @ClusterMVP

Follow My blog

Linkedin Profile Robert Smit MVP Linkedin profile


Author: Robert Smit [MVP]

Robert Smit is a Senior Technical Evangelist and a current Microsoft MVP in Clustering since 2009. Robert has over 20 years of experience in IT, in the educational, health-care, and finance industries. He holds the following certifications: MCT (Microsoft Certified Trainer), MCTS (Windows Server Virtualization), MCSE, MCSA, and MCPS, and is currently focused on Hyper-V, Failover Clustering, SQL Server, Azure, and all things related to cloud computing and infrastructure optimization. Follow Robert on Twitter @ClusterMVP or follow his blog.

17 thoughts on “Clustering FileServer Data Deduplication on Windows 2016 Step by Step #sofs #winserv #ReFS #WindowsServer2016 #Dedupe”

  1. Hmmm but refs doesn’t support dedupe in server 2016? Am I missing something?

  2. as of build 1709 it is there.
    Data Deduplication
    •Data Deduplication now supports ReFS: You no longer must choose between the advantages of a modern file system with ReFS and the Data Deduplication: now, you can enable Data Deduplication wherever you can enable ReFS. Increase storage efficiency by upwards of 95% with ReFS.
    •DataPort API for optimized ingress/egress to deduplicated volumes: Developers can now take advantage of the knowledge Data Deduplication has about how to store data efficiently to move data between volumes, servers, and clusters efficiently.

  3. Ahh ok, so you're using the newer version without desktop. It's just that I don't think it's mentioned anywhere in the blog post; normally Windows Server 2016 refers to the “stable/normal” branch. Anyway, that explains it 🙂

  4. Thanks for your answer. I can't find any docs about it, can you give me the link? And for example 5 nodes, 3 VMs on different nodes, for example on 1, 3, 5. Which node will start the job in this situation? Because now we have different resource owners, but one CSV.

  5. CSV enables multiple nodes to have simultaneous read-write access to the same shared storage. When a node performs disk input/output (I/O) on a CSV volume, the node communicates directly with the storage, for example, through a storage area network (SAN). However, at any time, a single node (called the coordinator node) “owns” the physical disk resource that is associated with the LUN. The coordinator node for a CSV volume is displayed in Failover Cluster Manager as Owner Node under Disks.
    So it runs on the coordinator node that holds the CSV, not the VM.

  6. This post should really be taken offline.

    ReFS is the wrong choice for a general purpose fileserver. It’s the right choice for storage of virtual disks, SQL databases, backups (like from Veeam), …

    ReFS without Storage Spaces (Direct) is an excellent path to total data loss. ReFS will happily delete data if it determines there’s some corruption – except with Storage Spaces (Direct) where ReFS can repair data rather than just deleting it.

    Why deduplication? In the typical use case where you want to use ReFS the block cloning feature is more useful than deduplication. Although not as efficient as true deduplication you can save a lot of data and it’s a lot more performant. For that use case. Oh, if you enable dedup on an ReFS volume block cloning no longer works.

  7. Hi Tom, thanks for your comments. Remember this was a post for Server 2016, and at that time it was all new. Nowadays, in Server 2022, things are more evolved, and Storage Spaces Direct is the best example of how ReFS is used.
