How to create an Azure Windows Server FCI File Cluster If you don’t want to use Azure Files. #Winserv #Azure #Azurefiles #netapp #oldskool

In the past I wrote a lot about how to build things on a cluster or how to troubleshoot them; there was always something to add to a cluster. But with Azure, that whole workload seemed to become a thing of the past.

It almost feels like Windows Server FCI is a legacy feature, but is it? Well, lots of workloads still use it and not everyone is in the cloud.

But what if you still want to build a cluster in Azure? Yes, SQL Server Always On is still a good and valid option, but what about a failover file server, or some other simple workload? Well, in this blog I show you how to build the cluster; the workload is up to you. For a long time it was not possible to create an FCI in Azure, as there were no shared disks available, and if you wanted to build an FCI you needed some extra software from SIOS: https://us.sios.com/

In this post I create a two-node failover cluster (FCI) with a file server role.

So what do we need to build a cluster in Azure?

  • Two Windows Server 2019 VMs
  • At least one shared premium disk
  • An Azure internal load balancer
  • Some time

Building the two VMs and joining them to the domain needs no explanation; if you need help, just post a comment and I will help.

Two Azure VMs; mine are deallocated for now for a reason, as we need to adjust the disk and this can only be done when the VM is deallocated.
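
My VMs are stopped in the portal, but if you prefer PowerShell, a quick sketch (assuming the resource group and node names used later in this post):

# Deallocate both nodes so the shared disk settings can be changed later on
Stop-AzVM -ResourceGroupName rg-cluster01 -Name node001 -Force
Stop-AzVM -ResourceGroupName rg-cluster01 -Name node002 -Force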


This is just a basic VM with one network card, but make sure you choose a SKU that supports Premium SSD! Without that it won't run, and size does matter.
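
If you are not sure whether a VM size supports Premium SSD, you can check the PremiumIO capability; a small sketch (the region is just an example):

# List VM sizes in a region that support Premium SSD (PremiumIO capability)
Get-AzComputeResourceSku | Where-Object {
    $_.ResourceType -eq "virtualMachines" -and
    $_.Locations -contains "westeurope" -and
    ($_.Capabilities | Where-Object { $_.Name -eq "PremiumIO" -and $_.Value -eq "True" })
} | Select-Object Name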


For my cluster I use a 256 GB disk; I may not need this size, but it is the minimum supported disk size for creating a shared disk.


Enabling shared disks is only available for a subset of disk types; currently only ultra disks and premium SSDs can be shared. Each managed disk that has shared disks enabled is subject to the following limitations: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/disks-shared?WT.mc_id=AZ-MVP-4025011

As you can see there is a maxShares limit. For each disk, you can define a maxShares value that represents the maximum number of nodes that can simultaneously share the disk. For example, if you plan to set up a 2-node failover cluster, you would set maxShares=2. The maximum value is an upper bound; nodes can join or leave the cluster (mount or unmount the disk) as long as the number of nodes is lower than the specified maxShares value.

The maxShares value can only be set or edited when the disk is detached from all nodes; that is why my VMs are deallocated for now.


How do you create such a shared disk? There are multiple ways: create a disk in the disk blade, or run a PowerShell script; it's all up to you.


Creating the disk in the portal is quick and easy, but it can also be done with an ARM template, PowerShell or a CLI script. Personally I often use PowerShell instead of ARM.
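
A minimal PowerShell sketch of what that could look like, assuming a recent Az.Compute module; the resource group and disk name are the ones used in this post, the region is just an example:

# Create a 256 GB Premium SSD with the shared disk option enabled (maxShares = 2)
$diskConfig = New-AzDiskConfig -Location "westeurope" -DiskSizeGB 256 -SkuName Premium_LRS -CreateOption Empty -MaxSharesCount 2
New-AzDisk -ResourceGroupName rg-cluster01 -DiskName demo01 -Disk $diskConfig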


In the Advanced options you can enable the shared disk setting.


There is no other GUI method that can set this

Or, if you have already created and attached a disk to a node, you can create another disk from that node's disks blade. But remember, that does not enable the maxShares option.


A resize does not help you.


There is no option to set this afterwards in the portal, keep that in mind; you can only set it with PowerShell.

A sample, in my case:

$vmDisks1 = get-azdisk -ResourceGroupName rg-cluster01 -DiskName demo01
$vmDisks1.MaxShares=2
$vmDisks1 | Update-AzDisk


As the error shows, the disk needs to be detached, from all machines!
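
A sketch of detaching the data disk from both (deallocated) nodes before updating maxShares, using the names from this post:

# Detach the shared disk from every VM it is still attached to
foreach ($vmName in "node001","node002") {
    $vm = Get-AzVM -ResourceGroupName rg-cluster01 -Name $vmName
    if ($vm.StorageProfile.DataDisks.Name -contains "demo01") {
        Remove-AzVMDataDisk -VM $vm -DataDiskNames "demo01"
        Update-AzVM -ResourceGroupName rg-cluster01 -VM $vm
    }
}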

OK, now the disk has been changed or recreated and has the setting maxShares=2.

We first go to node001 and add the disk to that node


Make sure you attach the same disk to both nodes, as this disk was configured as a shared disk.


Keep in mind that creating a new disk here does not enable maxShares.


Now, on the second node, we add the same disk. As it is a shared disk you can see that one share is now used and one share is still open. And remember, the VMs need to be deallocated!!
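
The same attach action in PowerShell could look like this (a sketch with the names used in this post; host caching is switched off for the shared disk):

# Attach the shared disk to both nodes
$disk = Get-AzDisk -ResourceGroupName rg-cluster01 -DiskName demo01
foreach ($vmName in "node001","node002") {
    $vm = Get-AzVM -ResourceGroupName rg-cluster01 -Name $vmName
    $vm = Add-AzVMDataDisk -VM $vm -Name demo01 -CreateOption Attach -ManagedDiskId $disk.Id -Lun 0 -Caching None
    Update-AzVM -ResourceGroupName rg-cluster01 -VM $vm
}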

Now that the disk has been added to both nodes we can start to build our cluster.

After the VMs are started we install the failover clustering and file server features; see also my other cluster blog: https://robertsmit.wordpress.com/2018/11/29/step-by-step-windows-server-2019-file-server-clustering-with-powershell-or-gui-cluster-ha-azure-windowsadmincenter-windowsserver2019/

Install-WindowsFeature -Name Failover-Clustering,file-services -IncludeManagementTools

Or do this in the GUI, or run it from a domain member server, in my case the DC:

$nodes = ("node001","node002")
Invoke-Command  $nodes {Install-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools}
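
Before creating the cluster it doesn't hurt to run a validation first:

# Validate the cluster configuration (optional but recommended)
Test-Cluster -Node node001,node002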

Now, building the cluster with the wizard is not the best method, as in this case we want to set some options that differ from the defaults.


The distributed network name (DNN) replaces the virtual network name (VNN) as the connection point when used with an Always On failover cluster instance on SQL Server VMs. This negates the need for an Azure Load Balancer routing traffic to the VNN, simplifying deployment, maintenance, and improving failover.

With an FCI deployment, the VNN still exists, but the client connects to the DNN DNS name instead of the VNN name.


Limitations

  • Currently, a DNN with FCI is supported only for SQL Server 2019 CU2 and later on Windows Server 2016 and later.
  • There might be more considerations when you’re working with other SQL Server features and an FCI with a DNN. For more information, see FCI with DNN interoperability.

https://docs.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/failover-cluster-instance-distributed-network-name-dnn-configure
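
For reference, a sketch of what creating the cluster with a distributed management point (a DNN as the CNO) could look like on Windows Server 2019:

# Sketch only: the CNO as a distributed network name, no load balancer needed for the CNO itself
New-Cluster -Name AzCluster001 -Node node001,node002 -NoStorage -ManagementPointNetworkType Distributed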

A distributed network name as the CNO is perfect for SQL workloads.

The big difference here is that our CNO is not a DNN; we create the cluster with a singleton management point and a static address:

New-Cluster -Name AzCluster001 -Node ("node001","node002") -StaticAddress 10.80.0.100 -NoStorage -ManagementPointNetworkType Singleton | Set-ClusterQuorum -NodeAndFileShareMajority \\RDSDC01\cluster


The static IP address that you assign to the CNO is not for network communication. Its only purpose is to bring the CNO online, to satisfy the dependency. Therefore, you cannot ping that IP, cannot resolve the DNS name, and cannot use the CNO for management, since its IP is an unusable IP.

Now that we have created the cluster and set the file share witness, we can make the preparations for the file server.

Adding the Disks
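
Adding the shared disk to the cluster can be done in Failover Cluster Manager or with a quick one-liner:

# Add all disks that are visible to both nodes as cluster storage
Get-ClusterAvailableDisk | Add-ClusterDisk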


Before we move on we first add an Azure internal load balancer; this is needed for access within the Azure subscription.

For this we create the load balancer and configure a backend pool, a health probe and a load balancing rule.


Creating a new load balancer is a quick process, but make sure you choose an internal load balancer with the Standard SKU.

And place this LB on the same virtual network as the cluster nodes.


In the backend pool we add both VMs that are the cluster nodes.


Press save and the cluster nodes are added to the load balancer.


In the load balancer we need to create a health probe. The probe port must match the ProbePort value that we will set on the cluster IP resource later in this post (59999); the SMB traffic on port 445 itself is handled by the load balancing rule, not by the probe.


Set the interval to 10 seconds and you can keep the rest at the defaults; I changed the threshold to 31.


Last, we create a load balancing rule; give it a name and add the backend pool to it.


and the health probe that we just created is also attached.


Keep the floating IP setting disabled.
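
Everything above can also be done in one PowerShell sketch; the vnet, subnet and load balancer names below are placeholders, and the frontend IP is the IP we will give the file server role in the next step:

# Internal Standard load balancer for the clustered file server (names are examples)
$vnet   = Get-AzVirtualNetwork -ResourceGroupName rg-cluster01 -Name "vnet-cluster01"
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "default"

$feip  = New-AzLoadBalancerFrontendIpConfig -Name "fe-fs01" -PrivateIpAddress "10.80.0.211" -Subnet $subnet
$pool  = New-AzLoadBalancerBackendAddressPoolConfig -Name "be-cluster"
$probe = New-AzLoadBalancerProbeConfig -Name "probe-cluster" -Protocol Tcp -Port 59999 -IntervalInSeconds 10 -ProbeCount 2
$rule  = New-AzLoadBalancerRuleConfig -Name "rule-smb" -FrontendIpConfiguration $feip -BackendAddressPool $pool -Probe $probe -Protocol Tcp -FrontendPort 445 -BackendPort 445

New-AzLoadBalancer -ResourceGroupName rg-cluster01 -Name "lb-cluster01" -Location "westeurope" -Sku Standard -FrontendIpConfiguration $feip -BackendAddressPool $pool -Probe $probe -LoadBalancingRule $rule

I still add the two node NICs to the backend pool in the portal, as shown above.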

Now that the load balancer is in place we can create the file server role in the cluster. You can do all of this in a different order, but the PowerShell script at the end of this blog must run after you have configured all of it.

Whether you do this with the wizard or with PowerShell makes little difference, as long as the file server role gets the IP address that we used for the Azure load balancer frontend; that is why we created it. We do this with PowerShell:

Add-ClusterFileServerRole -Storage "Cluster Disk 1" -Name FS01 -StaticAddress 10.80.0.211

Remember, this IP is the same IP that is used as the frontend IP of the Azure load balancer!

But remember, that IP address is the same kind of unusable IP address as the CNO's IP (the cluster IP). You can use it to bring the resource online, but it is not a real IP for network communication. If this is a file server, none of the VMs except the owner node of this VCO can access the file share. The way Azure networking works, it loops the traffic back to the node it originated from, so it only works on the node where the resource is running.


Continuous availability is not supported in Azure.

Our next step is creating the file shares and testing the file server. Using the create file share wizard in the cluster does not work here; create the file share on the node that holds the cluster disk. It may work for you now, but as soon as we configure the rest it will not work any more!!
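
A quick sketch of creating the share on the owner node; the folder, share name and permissions are examples only, and -ScopeName ties the share to the clustered file server name:

# Run this on the node that currently owns the cluster disk
New-Item -Path "F:\Data" -ItemType Directory -Force
New-SmbShare -Name "Data" -Path "F:\Data" -ScopeName FS01 -FullAccess "Everyone"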


Testing the file share on node 2 and it worked.

As you can see it works, BUT I am logged in on node 2 and testing from node 2 as well. Moving the role to node 1 breaks the file server.

As Azure can't handle this out of the box, we need to implement a little fix with PowerShell.


Keep in mind that pinging the CNO or the VCO will not work; the cluster needs the IP to bring the resource online, but it has no further function.


Get the cluster properties:
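
For example:

# Show the cluster properties and check which node currently owns the file server role
Get-Cluster | Format-List *
Get-ClusterGroup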


So the cluster is running and the file server is running, but you can only connect on the node where the file share is hosted. That is not how it should work.

We need to utilize the load balancer in Azure so this IP address is able to communicate with other machines and handle the client-server traffic. This can only be done with PowerShell.

The load balancer is an Azure IP resource that can route network traffic to different Azure VMs. The IP can be a public-facing VIP, or internal only. Each VM needs to have the endpoint(s) so the load balancer knows where the traffic should go. In the endpoint, there are two kinds of ports. The first is a regular port, used for normal client-server communication.

We use port 445 for the SMB file sharing traffic. The other kind of port is a probe port; the default port number for this is 59999. The probe port's job is to find out which node is the active node that hosts the VCO (the file server) in the cluster. The load balancer sends probe pings over TCP port 59999 to every node in the cluster, by default every 10 seconds. When you configure a role in a cluster on an Azure VM, you need to find out what port(s) the application uses, because you will need to add those port(s) to the endpoint. Then you add the probe port to the same endpoint. After that, you need to update the parameters of the VCO's IP address with that probe port. Finally, the load balancer will do the port forwarding and route the traffic to the VM that owns the VCO.

Now we set this for our file cluster, and here comes the complicated part. If you have only one NIC it is easy: the default network is "Cluster Network 1".

The IP resource name can be found with Get-ClusterResource.
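
For example, to list the names needed in the script below:

Get-ClusterResource | Where-Object { $_.ResourceType -like "IP Address" }   # IP resource name
Get-ClusterNetwork                                                          # cluster network name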


*Note: a different IP (.150) shows up here, as I took the screenshot later and rebuilt this setup a couple of times for the blog.*

$ClusterNetworkName = "Cluster Network 1"
$IPResourceName = "IP Address 10.80.0.0"

# The IP address used in the load balancer; this should be the same as the one on the file server cluster role.

$ILBIP = "10.80.0.150"
$params = @{"Address"="$ILBIP";
          "ProbePort"="59999";
          "SubnetMask"="255.255.255.255";
          "Network"="$ClusterNetworkName";
          "OverrideAddressMatch"=1;
          "EnableDhcp"=0}
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple $params

Running this should make everything work.


WARNING: The properties were stored, but not all changes will take effect until IP Address 10.80.0.211 is taken offline and then online again.

So I stopped the cluster and started it again.
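
Instead of restarting the whole cluster, taking just the IP resource offline and bringing the role back online should also do the trick; a sketch with the resource names from this post:

# Bounce the clustered IP resource and bring the file server role back online
Stop-ClusterResource "IP Address 10.80.0.211"
Start-ClusterGroup FS01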


A quick test on my domain controller and test server and it all worked.

As you can see it is rather complicated to run a file cluster in Azure, and the question is: why? There are better options for this, such as Azure NetApp Files.

https://robertsmit.wordpress.com/2019/08/01/starting-with-azure-netapp-files-is-it-better-than-storage-spaces-direct-in-azure-azure-netapp-storagespaces-s2d-diskspd-wvd-cloud-mvpbuzz-wimvp/

Or use Azure Files with Azure AD support:

Step By Step Azure Files share SMB with native AD support

https://robertsmit.wordpress.com/2020/05/11/step-by-step-azure-files-share-smb-with-native-ad-support-and-more-microsoft-azurefiles-smb-snapshotmanagement-azure-cloud-mvpbuzz-wimvp/

Sometimes you just need the cloud mindset and step away from what you have; life can get easier, with less management.

Thanks for your support, and if you use this let me know why; just a quick post in the comments. Thanks!

Follow Me on Twitter @ClusterMVP

Follow My blog https://robertsmit.wordpress.com

Linkedin Profile Robert Smit MVP Linkedin profile

Google  : Robert Smit MVP profile

 

Windows 2008R2 Cluster hotfixes

 

Recently I saw some clusters that were not patched, and/or admins who were not aware that there were hotfixes.

So here is a list of required and optional Windows 2008R2 SP1 hotfixes.

This list is found on several pages on the web and could be handy.

 

Windows & Hyper-V : Required Hotfixes

Validate SCSI Device Vital Product Data (VPD) test fails after you install Windows Server 2008 R2 SP1

http://support.microsoft.com/kb/2531907 (required for 3+ node Hyper-V clusters)

The network connection of a running Hyper-V virtual machine may be lost under heavy outgoing network traffic on a computer that is running Windows Server 2008 R2 SP1

http://support.microsoft.com/kb/2263829

The Cluster service stops unexpectedly on a Windows Server 2008 R2 failover cluster node when you perform multiple backup operations in parallel on a cluster shared volume

http://support.microsoft.com/kb/2494162 

MPIO failover fails on a computer that is running Windows Server 2008 R2

http://support.microsoft.com/kb/2460971

The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2

http://support.microsoft.com/kb/2522766

Performance decreases in Windows Server 2008 R2 when the Hyper-V role is installed on a computer that uses Intel Westmere or Sandy Bridge processors

http://support.microsoft.com/kb/2517329 

Stop error 0x0000007a occurs on a virtual machine that is running on a Windows Server 2008 R2-based failover cluster with a cluster shared volume, and the state of the CSV is switched to redirected access.

http://support.microsoft.com/kb/2494016 

Optional Hotfixes

An update is available for Hyper-V Best Practices Analyzer for Windows Server 2008 R2

http://support.microsoft.com/kb/2485986

“0x0000009E” Stop error when you add an extra storage disk to a failover cluster in Windows Server 2008 R2

http://support.microsoft.com/kb/2520235

A virtual machine online backup fails in Windows Server 2008 R2 when the SAN policy is set to Offline All

http://support.microsoft.com/kb/2521348

Cluster node cannot rejoin the cluster after the node is restarted or removed from the cluster in Windows Server 2008 R2

http://support.microsoft.com/kb/2549472

Cluster service stops when an error occurs in the registry replication process of a failover cluster in Windows Server 2008 R2 or in Windows Server 2008

http://support.microsoft.com/kb/2496034

0x20001 Stop error when you start a Linux VM in Windows Server 2008 R2 SP1

http://support.microsoft.com/kb/2550569

A heap memory leak occurs when an application or service queries the MSCluster_Resource WMI class in Windows Server 2008 R2

http://support.microsoft.com/kb/2580360

Cluster service initiates a failover after a delay of about 80 seconds when you shutdown the active node in Windows Server 2008 R2

http://support.microsoft.com/kb/2575625/en-us?sd=rss&spid=14134

New registration entries are added to the Persistent Reservation table when the physical disk resource that is associated with the CSV is taken offline on a Windows Server 2008 R2-based Failover Cluster

http://support.microsoft.com/kb/2579052/en-us?sd=rss&spid=14134

A transient communication failure causes a Windows Server 2008 R2 failover cluster to stop working

http://support.microsoft.com/kb/2550886

Cluster service leaks memory when the service handles state change notifications in Windows Server 2008 R2 or Windows Server 2008

http://support.microsoft.com/kb/2550894

Hyper-V Export function consumes all available memory in Windows Server 2008 or in Windows Server 2008 R2

http://support.microsoft.com/kb/2547551

Microcode update for Intel processors in Windows 7 or in Windows Server 2008 R2

http://support.microsoft.com/kb/2493989

Corrupted VSS snapshot

http://support.microsoft.com/hotfix/KBHotfix.aspx?kbnum=975688&kbln=en-us

FIX: The guest operating system may crash (STOP 0xd) when you perform a live migration of Hyper-V virtual machines in a Windows Server 2008 R2 environment

http://support.microsoft.com/kb/2636573

What is CAU ? Cluster Update Automation with CAU

#CAU is a great new feature, but how does it fit in your infrastructure?

I already have a WSUS server and I use SCCM, and I use WSUS for my DTAP environment. Do I now need another WSUS server, or can I reuse the old WSUS?

WSUS 3.0SP2 (on W2K8R2): not yet compatible with Windows Server 2012

You can’t use SCCM to pull the Updates.

So basically, install a downstream server for CAU or a primary WSUS; if you have more WSUS servers you can sync the updates with PowerShell to keep the same info on all your other servers.

 

  • Single-click launch of cluster-wide updating operation
  • Or a single PS cmdlet
  • “Updating Run”
  • Physical or VM clusters
  • CAU scans, downloads and installs applicable updates on each node
  • Restarts node as necessary
  • One node at a time
  • Repeats for all cluster nodes
  • Customize pre-update & post-update behavior with PS scripts

 

  • Updates (GDRs) from Windows Update or WSUS
  • Hotfixes (QFEs) from a local File Share
  • Simple customization that installs almost any software update off a local File Share

  • Adds CAU clustered role (see the PowerShell sketch after this list)
  • Just like any other clustered workload
  • Resilience to planned and unplanned failures
  • Not mutually exclusive with on-demand updating
  • Analogy: Windows Update scan on your PC with AU auto-install
  • But possible conflicts with Updating Runs in progress
  • “Configured, but on hold” functionality
  • Compatible with VCO Prestaging
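
A hedged sketch of enabling that self-updating clustered role with PowerShell; the schedule values are just an example:

# Add the CAU self-updating role to the cluster with a sample schedule
Add-CauClusterRole -ClusterName CONTOSO-FC1 -DaysOfWeek Tuesday -WeeksOfMonth 2,4 -EnableFirewallRules -Force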


PowerShell usage:

Sample: fill in the cluster name and the WSUS share.

 

Invoke-CauScan -ClusterName CONTOSO-FC1 -CauPluginName Microsoft.WindowsUpdatePlugin, Microsoft.HotfixPlugin -CauPluginArguments @{}, @{ 'HotfixRootFolderPath' = '\\CauHotfixSrv\shareName'; 'HotfixConfigFilePath' = '\\CauHotfixSrv\shareName\DefaultHotfixConfig.xml' } -RunPluginsSerially -Verbose
Invoke-CauRun -ClusterName CONTOSO-FC1 -CauPluginName Microsoft.WindowsUpdatePlugin, Microsoft.HotfixPlugin -CauPluginArguments @{ 'IncludeRecommendedUpdates' = 'True' }, @{ 'HotfixRootFolderPath' = '\\CauHotfixSrv\shareName'; 'HotfixConfigFilePath' = '\\CauHotfixSrv\shareName\DefaultHotfixConfig.xml' } -MaxRetriesPerNode 2 -StopOnPluginFailure -Force

 

Options: RunPluginsSerially, StopOnPluginFailure, SeparateReboots

  • CAU supports only Windows Server 2012 clusters
  • The CAU tools can be installed on a Windows 8 client via the RSAT package

Make CAU the only tool updating the cluster
Concurrent updates by other tools (e.g., WSUS, WUA, SCCM) might cause downtime.

For a WSUS-based deployment:

WSUS 4.0: needs a workaround with Beta builds (only) http://social.technet.microsoft.com/wiki/contents/articles/7891.how-wsus-and-cluster-aware-updating-are-affected-by-windows-server-8-beta-updates.aspx 
WSUS 3.0SP2 (on W2K8R2): not yet compatible with Windows Server 2012

Think about firewalls on nodes!
Windows Firewall Beta (or non-Windows firewall): create a firewall rule and enable it for domain-scope, wininit.exe program, dynamic RPC endpoints, TCP protocol
Windows Firewall RC: Enable the "Remote Shutdown" firewall rule group for the Domain profile, or pass the “-EnableFirewallRules” parameter to Invoke-CauRun, Add-CauClusterRole or Set-CauClusterRole cmdlets
Make sure GPOs agree

CAU: Understand and Troubleshoot Guide: http://www.microsoft.com/download/en/details.aspx?id=29015

CAU Scenario Overview: http://technet.microsoft.com/en-us/library/hh831694.aspx

CAU Windows PowerShell cmdlets
‘Update-Help’ downloads the full cmdlet help for CAU cmdlets
Online: http://go.microsoft.com/fwlink/p/?LinkId=237675

Starting with Cluster-Aware Updating: Self-Updating: http://blogs.technet.com/b/filecab/archive/2012/05/17/starting-with-cluster-aware-updating-self-updating.aspx

Virtual Machine Density Flexibility in Windows Server 2008 R2 Failover Clustering

Recently Windows Server 2008 R2 Failover Clustering has changed the support statement for the maximum number of Virtual Machines (VMs) that can be hosted on a failover cluster from 64 VMs per node to 1,000 VMs per cluster.  This article reflects the new policy in Hyper-V: Using Hyper-V and Failover Clustering.

Supporting 1000 VMs will enable increased flexibility to utilize hardware that has the capacity to host more VMs per physical server while maintaining the high availability and management components that Failover Clustering provides. 

Number of Nodes in Cluster        | Max Number of VMs per Node | Average Number of VMs per active Node | Max # VMs in Cluster
2 Nodes (1 active + 1 failover)   | 384 | 384 | 384
3 Nodes (2 active + 1 failover)   | 384 | 384 | 768
4 Nodes (3 active + 1 failover)   | 384 | 333 | 1000
5 Nodes (4 active + 1 failover)   | 384 | 250 | 1000
6 Nodes (5 active + 1 failover)   | 384 | 200 | 1000
7 Nodes (6 active + 1 failover)   | 384 | 166 | 1000
8 Nodes (7 active + 1 failover)   | 384 | 142 | 1000
9 Nodes (8 active + 1 failover)   | 384 | 125 | 1000
10 Nodes (9 active + 1 failover)  | 384 | 111 | 1000
11 Nodes (10 active + 1 failover) | 384 | 100 | 1000
12 Nodes (11 active + 1 failover) | 384 | 90  | 1000
13 Nodes (12 active + 1 failover) | 384 | 83  | 1000
14 Nodes (13 active + 1 failover) | 384 | 76  | 1000
15 Nodes (14 active + 1 failover) | 384 | 71  | 1000
16 Nodes (15 active + 1 failover) | 384 | 66  | 1000

 

Note: There is no requirement to have a node without any VMs allocated as a “passive node”.  All nodes can host VMs and have the equivalent to 1 node of capacity unallocated (total, across all the nodes) to allow for placement of VMs if a node fails or is taken out of active cluster membership for activities like patching or performing maintenance. 

It is important to perform proper capacity planning that takes into consideration the capabilities of the hardware and storage to host VMs, and the total resources that the individual VMs require, while still having enough reserve capacity to host VMs in the event of a node failure to prevent memory over commitment.  The same base guidance of Hyper-V configuration and limits of a maximum number of VMs supported per physical server still apply.  This currently states that no node can host more than 384 running VMs at any given time, and that the hardware scalability should not exceed 4 virtual processors per VM and no more than 8 virtual processors per logical processor.  Review this Technet article on VM limits and requirements: Requirements and Limits for Virtual Machines in Hyper-V in Windows Server 2008 R2

Here are some Frequently Asked Questions:

1. Is there a hotfix or service pack required to have this new limit? 

a. No, this support policy change is based on additional testing we have performed to verify that the cluster retains its ability to detect health and fail over VMs at these densities. There are no changes or updates required.

2. 64 VMs per node on a 16 node cluster equals 1024 VMs, so aren’t you actually decreasing the density for a 16 node cluster? 

a. No, the previous policy was to have 64 VMs per node in addition to one node's equivalent of reserve capacity, which is 15 nodes x 64 VMs, equalling 960 with the spare capacity of a passive node. This policy slightly increases the density for a 16-node cluster, and the density for an 8-node cluster is more than twice, and for a 4-node cluster more than 4 times, as high as before.

3. Does this include Windows Server 2008 clusters?

a.  This change is only for Windows Server 2008 R2 clusters.

4. Why did you make this change?

a. We are responding to our customers' requests to have flexibility in the number of nodes and the number of VMs that can be hosted. For VMs running workloads that place relatively small demands on VM and storage resources, customers would like to place more VMs on each server to maximize their investments and lower the management costs. Other customers want the flexibility of having more nodes and fewer VMs.

5. Does this mean I can go and put 250 VMs on my old hardware?

a. Understanding the resources that your hardware can provide and the requirements of your VMs is still the most important thing in identifying the capacity of your cluster or the specific Hyper-V servers.    Available RAM and CPU resources are relatively easy to calculate, but another important part of the equation is capacity of the SAN/Storage.  Not just how many GB or TB of data it can store, but can it handle the I/O demands with reasonable performance?  1000 VMs can potentially produce a significant amount of I/O demand, and the exact amount will depend on what is running inside the VMs.  Monitoring the storage performance is important to understand the capacity of the solution.

Source: http://blogs.msdn.com/b/clustering/archive/2010/06/28/10031803.aspx