Building a file server in Server 2016 isn’t that different than in Server 2012 R2, except that there are more options to choose from, such as ReFS and Data Deduplication. We start with a basic clustered file server that uses ReFS and Data Deduplication. This is a common scenario and can also be used in Azure.
Data Deduplication can effectively minimize the costs of a server application’s data consumption by reducing the amount of disk space consumed by redundant data. Before enabling deduplication, it is important that you understand the characteristics of your workload to ensure that you get the maximum performance out of your storage.
In this demo I have a two-node cluster, built with a quick create of the cluster. This is a demo for file services.
Create Sample Cluster :
#installing the File server and cluster features
Get-WindowsFeature Failover-Clustering
Install-WindowsFeature "Failover-Clustering","RSAT-Clustering" -IncludeAllSubFeature
Restart-Computer -ComputerName Astack16n014,Astack16n015 -Force
#Create cluster validation report
Test-Cluster -Node Astack16n014,Astack16n015
New-Cluster -Name Astack16R5 -Node Astack16n014,Astack16n015 -NoStorage -StaticAddress "10.255.255.41"
Now that the cluster is in place we can start with the basics of the file cluster. The disks need to be shareable, so no local disks.
If you want to build a file server with local disks only, you should use Storage Spaces Direct; I’ll use this in the next blog post.
We add a shared disk to the cluster, bring the disk online, and format it.
I format the disk with ReFS, as this is the next-generation file system and has more options than NTFS.
The next iteration of ReFS provides support for large-scale storage deployments with diverse workloads, delivering reliability, resiliency, and scalability for your data. ReFS introduces the following improvements:
- ReFS implements new storage tiers functionality, helping deliver faster performance and increased storage capacity. This new functionality enables:
  - Multiple resiliency types on the same virtual disk (using mirroring in the performance tier and parity in the capacity tier, for example).
  - Increased responsiveness to drifting working sets.
  - Support for SMR (Shingled Magnetic Recording) media.
- The introduction of block cloning substantially improves the performance of VM operations, such as .vhdx checkpoint merge operations.
- The new ReFS scan tool enables the recovery of leaked storage and helps salvage data from critical corruptions.
The disk is formatted and added to the cluster, showing as Available Storage.
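The disk preparation can also be scripted. The following is a minimal sketch and not the exact steps used in this demo: the disk number (1), the drive letter (E) and the volume label are assumptions, so check the output of Get-Disk on your own node before running anything like it.

```powershell
# Assumption: disk number 1 is the new shared disk; verify with Get-Disk first.
$diskNumber = 1

# Bring the disk online and make it writable
Set-Disk -Number $diskNumber -IsOffline $false
Set-Disk -Number $diskNumber -IsReadOnly $false

# Initialize it with a GPT partition table
Initialize-Disk -Number $diskNumber -PartitionStyle GPT

# Create one partition over the whole disk and format it with ReFS
# (drive letter E and the label 'Data' are example values)
New-Partition -DiskNumber $diskNumber -DriveLetter E -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel 'Data' -Confirm:$false

# Add every disk that is eligible for clustering to Available Storage
Get-ClusterAvailableDisk | Add-ClusterDisk
```

Run this on one node only; the other node sees the disk through the cluster once it has been added.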
Our next step is adding the File Server role to the cluster.
The question here is: is this a normal file server, or do you want to build a SOFS (Scale-Out File Server) cluster? Currently SOFS is only supported for RDS UPD, Hyper-V and SQL. Comparing SOFS and a file server:
SOFS = Active – Active File share
Fileserver = Active – Passive File share
We are using the file server for general usage.
Give your file server a name. Remember this is the NetBIOS name and it needs to be in DNS!
The default is a DHCP address, but I assume you will set this to a fixed address or make it static in DHCP and DNS.
Now that the file server and the disk are added to the cluster, we can start the file server and add some shares to it.
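The same role can be created from PowerShell. This is a sketch under assumptions: the role name AstackFS01, the disk name "Cluster Disk 1" and the address 10.255.255.42 are example values that do not come from this demo, and the File Server role service is assumed to be needed on both nodes.

```powershell
# Assumption: role name, cluster disk name and IP address are example values.
# Make sure the File Server role service is present on every node first.
Invoke-Command -ComputerName Astack16n014,Astack16n015 -ScriptBlock {
    Install-WindowsFeature -Name FS-FileServer
}

# Create the clustered (active/passive) general purpose file server role.
# Run this on a cluster node.
Add-ClusterFileServerRole -Name 'AstackFS01' -Storage 'Cluster Disk 1' -StaticAddress '10.255.255.42'

# For a Scale-Out File Server the equivalent cmdlet would be:
# Add-ClusterScaleOutFileServerRole -Name 'AstackSOFS01'
```

Get-ClusterResource shows the actual name of the disk in Available Storage if it is not "Cluster Disk 1".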
Add the file share.
When adding the file share we see this error: “client access point is not ready to be used for share creation”.
This is a brand-new file server and already broken? Well, no. Reading the error message, it says we can’t access the NetBIOS name.
When we open the properties of the file server you can see there is a DNS failure. It can’t add the server to DNS, or the registration is not correct.
Just make sure the name is in DNS and that an nslookup works.
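A quick way to check this from PowerShell is sketched below. The name AstackFS01 is a hypothetical client access point name, and the last command assumes that the network name resource carries the same name, which you should confirm in the output of the second command.

```powershell
# Assumption: 'AstackFS01' is the client access point name you chose.
$fsName = 'AstackFS01'

# Does the name resolve? (PowerShell counterpart of nslookup)
Resolve-DnsName -Name $fsName

# Show the state of the resources inside the file server role
Get-ClusterGroup -Name $fsName | Get-ClusterResource

# Ask the cluster to register the network name in DNS again.
# Assumption: the network name resource is also called 'AstackFS01';
# use the resource name shown by the previous command.
Update-ClusterNetworkNameResource -Name $fsName
```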
When adding the file share you get a couple of options; let’s pick the SMB Share - Quick option.
Select the file share location; this would be on the shared disk in the cluster. If there are no folders, make the folder first.
I give the folder a name and put it on the right disk.
Here you can pick a couple of options, and some are already selected. In this case I only use access-based enumeration.
The file server is ready and clients can connect. Access ACLs must be set, but this depends on the environment.
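For reference, the wizard steps can be approximated with the SMB cmdlets. This is only a sketch: the folder path, share name, role name and the CONTOSO\FileAdmins group are made-up example values, and it should be run on the node that currently owns the file server role.

```powershell
# Assumption: folder path, share name, role name and group are example values.
$path = 'E:\Shares\Data'

# Create the folder on the clustered disk if it does not exist yet
New-Item -Path $path -ItemType Directory -Force | Out-Null

# Create the share, scoped to the clustered file server name,
# with access-based enumeration enabled
New-SmbShare -Name 'Data' -Path $path -ScopeName 'AstackFS01' `
    -FolderEnumerationMode AccessBased `
    -FullAccess 'CONTOSO\FileAdmins'

# Check the result
Get-SmbShare -Name 'Data' -ScopeName 'AstackFS01' |
    Format-List Name, Path, ScopeName, FolderEnumerationMode
```

The -ScopeName parameter ties the share to the clustered name instead of the node name. File system ACLs on the folder still have to be set separately.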
Our next step is to enable Data Deduplication on the volume behind this share. Data Deduplication received several improvements in Server 2016. Want to know what is new in Windows Server 2016? See https://docs.microsoft.com/en-us/windows-server/storage/whats-new-in-storage
Install Data Deduplication: every node in the cluster must have the Data Deduplication server role installed.
To install Data Deduplication, run the following PowerShell command as an administrator:
Install-WindowsFeature -Name FS-Data-Deduplication
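Because every node needs the role, it can be convenient to install it on both nodes in one go. A small sketch, assuming PowerShell remoting is enabled between the nodes:

```powershell
# Install the Data Deduplication role service on both cluster nodes
Invoke-Command -ComputerName Astack16n014,Astack16n015 -ScriptBlock {
    Install-WindowsFeature -Name FS-Data-Deduplication
}

# Verify the install state on each node
Invoke-Command -ComputerName Astack16n014,Astack16n015 -ScriptBlock {
    Get-WindowsFeature -Name FS-Data-Deduplication
} | Select-Object PSComputerName, Name, InstallState
```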
- Recommended workloads that have been proven to have both datasets that benefit highly from deduplication and have resource consumption patterns that are compatible with Data Deduplication’s post-processing model. We recommend that you always enable Data Deduplication on these workloads:
  - General purpose file servers (GPFS) serving shares such as team shares, user home folders, work folders, and software development shares.
  - Virtualized desktop infrastructure (VDI) servers.
  - Virtualized backup applications, such as Microsoft Data Protection Manager (DPM).
- Workloads that might benefit from deduplication, but aren’t always good candidates for deduplication. For example, the following workloads could work well with deduplication, but you should evaluate the benefits of deduplication first:
  - General purpose Hyper-V hosts
  - SQL servers
  - Line-of-business (LOB) servers
Before enabling Data Deduplication we can first check whether there are any savings to be had.
Run this in a Command Prompt or PowerShell window, where e:\data is the data location that we are using for the dedupe.
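The command itself is not shown in the text. Assuming the Deduplication Evaluation Tool (DDPEval.exe) is what was meant here, which is installed in \Windows\System32 together with the Data Deduplication feature, a run against the data folder would look roughly like this:

```powershell
# Assumption: DDPEval.exe is the evaluation tool intended here.
# It estimates the possible savings for a path without changing any data.
DDPEval.exe E:\data
```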
Even with a few files there is a saving.
get-volume -DriveLetter e
To enable dedupe, go to Server Manager, Volumes, and select the disk that needs to be enabled.
Select the volume that needs dedupe; other volumes won’t be affected. It’s important to note that you can’t run Data Deduplication on boot or system volumes.
The number of days can be changed to something that suits you.
When enabling deduplication you need to set a schedule. You can set two different time periods, for weekdays and weekends, and you can also enable background optimization to run during quieter periods. For the rest it is all PowerShell; there is no GUI for this.
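The Server Manager settings map onto a few cmdlets from the Deduplication module. A minimal sketch, assuming the data volume is E: and using example values for the file age and for the extra schedule (its name, days and times are not from this demo):

```powershell
# Enable deduplication on the data volume for a general purpose file server
Enable-DedupVolume -Volume 'E:' -UsageType Default

# Only optimize files older than 3 days (example value)
Set-DedupVolume -Volume 'E:' -MinimumFileAgeDays 3

# Show the schedules that exist by default
Get-DedupSchedule

# Assumption: an extra weekend optimization window; name and times are examples
New-DedupSchedule -Name 'WeekendOptimization' -Type Optimization `
    -Days Saturday,Sunday -Start '08:00' -DurationHours 10
```

The -UsageType value Default is the one meant for general purpose file servers; HyperV and Backup are the other choices.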
Get-Command -Module Deduplication will list all the PowerShell commands.
Measure-DedupFileMetadata -Path e:\data
I placed some copies of the same ISO files on the volume, and as you can see there is a storage saving.
To get the data, run an update on the dedupe status.
Update-DedupStatus -Volume e:
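If the numbers still show no savings, the optimization job may simply not have run yet. A sketch of how one could start a job by hand and read the result, again assuming the volume is E::

```powershell
# Start an optimization job now instead of waiting for the schedule
Start-DedupJob -Volume 'E:' -Type Optimization

# Watch running or queued jobs
Get-DedupJob

# After the job finishes, show the savings
Get-DedupStatus -Volume 'E:' |
    Format-List Volume, OptimizedFilesCount, InPolicyFilesCount, SavedSpace, FreeSpace
Get-DedupVolume -Volume 'E:' |
    Format-List Volume, Enabled, SavedSpace, SavingsRate
```

Files younger than the configured minimum file age are skipped, so freshly copied test files may not be optimized until that setting is lowered.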
It is all easy to use and to maintain. If you have any cluster questions, just go to https://social.technet.microsoft.com/Forums/windowsserver/en-US/home?forum=winserverClustering and I’m happy to help you there; other community members and Microsoft folks are there too.
Follow Me on Twitter @ClusterMVP
Follow My blog https://robertsmit.wordpress.com
LinkedIn profile: Robert Smit MVP
17 thoughts on “Clustering FileServer Data Deduplication on Windows 2016 Step by Step #sofs #winserv #ReFS #WindowsServer2016 #Dedupe”
Hmmm but refs doesn’t support dedupe in server 2016? Am I missing something?
as of build 1709 it is there.
•Data Deduplication now supports ReFS: You no longer must choose between the advantages of a modern file system with ReFS and the Data Deduplication: now, you can enable Data Deduplication wherever you can enable ReFS. Increase storage efficiency by upwards of 95% with ReFS.
•DataPort API for optimized ingress/egress to deduplicated volumes: Developers can now take advantage of the knowledge Data Deduplication has about how to store data efficiently to move data between volumes, servers, and clusters efficiently.
Ahh ok, so you’re using the newer version without desktop. It’s just that I don’t think it’s mentioned anywhere in the blog post; normally Windows Server 2016 refers to the “stable/normal” branch. Anyway, that explains it 🙂
Can Data Deduplication be enabled in a SOFS?
Forgot to mention that the Virtual environment is VMWare
Yes, as you can see in my blog I use a SOFS also, but you need Server 2016 or higher.
It has nothing to do with the layer below; this is done in the VM.
Thank you, Robert
If I have a 4-node cluster, which node will run dedup on a CSV volume? I can’t find any information about it.
It will run on the node that has the active target; you can see in Failover Cluster Manager who the current owner of the CSV is.
Thanks for your answer. I can’t find any docs about it; can you give me a link? And for example with 5 nodes and 3 VMs on different nodes, say on 1, 3 and 5: which node will start the job in this situation? Because now we have different resource owners, but one CSV.
CSV enables multiple nodes to have simultaneous read-write access to the same shared storage. When a node performs disk input/output (I/O) on a CSV volume, the node communicates directly with the storage, for example, through a storage area network (SAN). However, at any time, a single node (called the coordinator node) “owns” the physical disk resource that is associated with the LUN. The coordinator node for a CSV volume is displayed in Failover Cluster Manager as Owner Node under Disks.
So it runs on the coordinator node that holds the CSV, not on the node that runs the VM.
This post should really be taken offline.
ReFS is the wrong choice for a general purpose fileserver. It’s the right choice for storage of virtual disks, SQL databases, backups (like from Veeam), …
ReFS without Storage Spaces (Direct) is an excellent path to total data loss. ReFS will happily delete data if it determines there’s some corruption – except with Storage Spaces (Direct) where ReFS can repair data rather than just deleting it.
Why deduplication? In the typical use case where you want to use ReFS the block cloning feature is more useful than deduplication. Although not as efficient as true deduplication you can save a lot of data and it’s a lot more performant. For that use case. Oh, if you enable dedup on an ReFS volume block cloning no longer works.
Hi Tom, thanks for your comments. Remember this was a post for Server 2016, and at that time it was all new. Nowadays, in Server 2022, things have evolved, and Storage Spaces Direct is the best example of how ReFS is used.