There are a lot of good blog posts about how to set up an Availability Group. In two blog posts I will try to break the basic setup and show you what you should not do in your production environment. Just because you can does not mean you should.
This post also shows the most common errors, where to find them, and how to fix them. In the end you will have a working two-node cluster with one Availability Group.
First, how to set up an Availability Group. To make things more complex there are multiple instances; see how they look. A naming convention is really important in complex configurations: a typo is quickly made!
Deploying AlwaysOn Availability Groups requires a Windows Server Failover Clustering (WSFC) cluster. To be enabled for AlwaysOn Availability Groups, an instance of SQL Server must reside on a WSFC node, and the WSFC cluster and node must be online. Furthermore, each availability replica of a given availability group must reside on a different node of the same WSFC cluster. The only exception is that while being migrated to another WSFC cluster, an availability group can temporarily straddle two clusters.
AlwaysOn Availability Groups relies on the Windows Server Failover Clustering (WSFC) cluster to monitor and manage the current roles of the availability replicas that belong to a given availability group and to determine how a failover event affects the availability replicas. A WSFC resource group is created for every availability group that you create. The WSFC cluster monitors this resource group to evaluate the health of the primary replica.
The quorum for AlwaysOn Availability Groups is based on all nodes in the WSFC cluster regardless of whether a given cluster node hosts any availability replicas. In contrast to database mirroring, there is no witness role in AlwaysOn Availability Groups.
The overall health of a WSFC cluster is determined by the votes of quorum of nodes in the cluster. If the WSFC cluster goes offline because of an unplanned disaster, or due to a persistent hardware or communications failure, manual administrative intervention is required. A Windows Server or WSFC cluster administrator will need to force a quorum and then bring the surviving cluster nodes back online in a non-fault-tolerant configuration.
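To see how the quorum looks from inside SQL Server, the two cluster DMVs can be queried. This is a minimal sketch; it assumes AlwaysOn is already enabled on the instance and that the login has VIEW SERVER STATE permission, otherwise the views may return nothing.

```sql
-- Quorum model and quorum state of the WSFC cluster, as seen by this instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Every cluster member (nodes and any witness) with its state and quorum votes.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```

Every node shows up here with its vote, also the nodes that do not host a replica, which matches the quorum description above.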
Primary on an FCI with a replica on a different FCI
I have a lot of SQL instances, all running on one two-node cluster. Not all instances are installed on both nodes, to trip up the installation and to show you the errors you can expect.
After enabling AlwaysOn you can see the difference: the New Availability Group Wizard is no longer grayed out.
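If you prefer a query over looking at a grayed-out menu, something like the sketch below shows whether the feature is switched on. Run it on every instance that will host a replica.

```sql
-- 1 = AlwaysOn Availability Groups is enabled on this instance, 0 = disabled.
-- IsClustered = 1 means the instance is a Failover Clustered Instance (FCI).
SELECT SERVERPROPERTY('ServerName')    AS InstanceName,
       SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled,
       SERVERPROPERTY('IsClustered')   AS IsClustered;
```

Keep in mind that the setting only becomes active after the SQL Server service has been restarted.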
Start the wizard. It will not work on system databases; an AG only works on your own databases!
You must make a full backup of your database before you start (this is always handy).
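A minimal sketch of that preparation is shown below. The database name AG04 and the share are the ones used later in this post; the backup file name AG04_full.bak is only an example. The database also needs the FULL recovery model before it can join an AG.

```sql
-- Check the recovery model; an AG database must use FULL.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'AG04';

-- Only needed when the query above does not return FULL:
-- ALTER DATABASE AG04 SET RECOVERY FULL;

-- Full backup before starting the wizard (file name is an assumption).
BACKUP DATABASE AG04
TO DISK = N'\\mvpfsw01\SQLAG04\AG04_full.bak'
WITH INIT, COMPRESSION;
```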
Now we can add a new replica. Check the server; because it is a clustered instance (FCI), you cannot set it to automatic failover.
We set up a network share for the data synchronization.
Now that we have completed the wizard, we run the validation and go for the finish.
An error? While checking the location? Eh, what: should the database be in the same location on both replicas? Not all my SQL Server clusters are the same, and they do not all use the same drive letters. And because I chose to do this on the same cluster (not supported), I cannot give the other instance the same drive letter. But what if I had another cluster and still did not have the same drive letter? Is there a wizard bypass somewhere? Wizards are nice if you have a default installation. If not: plan B.
TITLE: Microsoft SQL Server Management Studio
Checking for compatibility of the database file location on the secondary replica resulted in an error. (Microsoft.SqlServer.Management.HadrTasks)
The following folder locations do not exist on the server instance that hosts secondary replica MVPSQL201402\SQL2:
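To see which folders the wizard expects on the secondary, you can list the file locations of the database on the primary. A small sketch, assuming the database is the AG04 demo database used later in this post:

```sql
-- Data and log file paths on the primary replica.
-- The wizard expects exactly these folders to exist on the secondary.
SELECT name AS logical_name,
       type_desc,
       physical_name
FROM sys.master_files
WHERE database_id = DB_ID(N'AG04');
```

The logical names returned here are also the ones you need for the MOVE options when you restore the database manually to a different path.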
Placing the database on a CSV in SQL Server 2014
Well, if drive letters are the issue: SQL Server 2014 can store the database on a CSV (Cluster Shared Volume), so no more drive letters.
And it passed the validation. That is great.
Great thinking, but the CSV is connected to all SQL Servers, so the next error is logical.
Yes, the database is already there. What now?
Manually Creating an Availability group for a SQL Server 2014 FCI
This sounds great, but where to start? Should I Bing it? Let Me Bing That For You!
Well, I create it with a SQL script (I'm no SQL master), so your script may look different.
You can run this in one go with SQLCMD, but here I do it step by step.
I have my Availability Group name, my database name, the IP address and the servers:
CREATE AVAILABILITY GROUP SQL001AG04
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
FOR DATABASE AG04 -- , ...
REPLICA ON -- primary:
N'MVPSQL201401\sql001' WITH (ENDPOINT_URL = N'TCP://MVPSQL201401.mvp.local:5023',
    FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    BACKUP_PRIORITY = 50, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
N'MVPSQL201402\SQL2' WITH (ENDPOINT_URL = N'TCP://MVPSQL201402.mvp.local:5022',
    FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    BACKUP_PRIORITY = 50, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO));
ALTER AVAILABILITY GROUP SQL001AG04
ADD LISTENER N'MVPLST01'
(WITH IP ((N'10.255.255.69', N'255.255.255.0')), PORT = 1433);
BACKUP DATABASE AG04 TO DISK = '\\mvpfsw01\SQLAG04\AG04.bak'
WITH INIT, COPY_ONLY, COMPRESSION;
BACKUP LOG AG04 TO DISK = '\\mvpfsw01\SQLAG04\AG04.trn'
WITH INIT, COMPRESSION;
Now we go to the replica server and run the script below.
-- Run this on the replica server!
ALTER AVAILABILITY GROUP SQL001AG04 JOIN;
RESTORE DATABASE AG04 FROM DISK = '\\mvpfsw01\SQLAG04\AG04.bak'
WITH REPLACE, NORECOVERY, NOUNLOAD,
    MOVE 'AG04' TO 'E:\MSSQL11.SQL001\AG04.mdf',
    MOVE 'AG04_log' TO 'E:\MSSQL11.SQL001\AG04_log.ldf';
RESTORE LOG AG04 FROM DISK = '\\mvpfsw01\SQLAG04\AG04.trn'
WITH NORECOVERY, NOUNLOAD;
ALTER DATABASE AG04 SET HADR AVAILABILITY GROUP = SQL001AG04;
But as you can see in the screenshot, it is not working: the secondary replica is down.
The following error is shown: The connection to the primary replica is not active. The command cannot be processed.
A connection timeout has occurred while attempting to establish a connection to availability replica 'MVPSQL201401\sql001' with id [F82BBD94-4F04-4B0A-8B75-28A0899F240C]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
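Besides the SQL Server error log, the replica state DMV keeps the last connection error for each replica. The query below is a sketch of how to read it; run on the primary it shows all replicas, run on a secondary it shows only the local one.

```sql
-- Connection state and last connection error for each availability replica.
SELECT ar.replica_server_name,
       ar.endpoint_url,
       ars.role_desc,
       ars.connected_state_desc,
       ars.last_connect_error_number,
       ars.last_connect_error_description,
       ars.last_connect_error_timestamp
FROM sys.availability_replicas AS ar
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id;
```

The endpoint_url column is worth a second look: it shows the address and port the other replica is actually trying to reach.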
OK, I turned off all the firewalls, checked the network and set the permissions. Now what?
Think: I have two cluster nodes, both running SQL Server. What do the two replicas in the script have in common?
ENDPOINT_URL = N'TCP://MVPSQL201401.mvp.local:5022'
The port: both endpoint URLs used 5022. So I changed the primary from 5022 to 5023 and it worked like a charm.
In the next post I will explain how to check this and how to change it.
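As a rough preview, the port an instance really listens on can be read from the endpoint catalog views. This is only a sketch. The endpoint name Hadr_endpoint in the commented lines is an assumption (it is the name the wizard normally creates), so use the name the first query returns on your instance.

```sql
-- Run on each instance: name, state and TCP port of the mirroring (AG) endpoint.
SELECT e.name,
       e.state_desc,
       e.role_desc,
       t.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
    ON t.endpoint_id = e.endpoint_id;

-- To move the endpoint to another port (endpoint name is an assumption):
-- ALTER ENDPOINT [Hadr_endpoint] AS TCP (LISTENER_PORT = 5023);

-- The ENDPOINT_URL of that replica in the AG has to use the same port:
-- ALTER AVAILABILITY GROUP SQL001AG04
-- MODIFY REPLICA ON N'MVPSQL201401\sql001'
-- WITH (ENDPOINT_URL = N'TCP://MVPSQL201401.mvp.local:5023');
```

When two instances can end up on the same cluster node, giving each instance its own endpoint port avoids the clash.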
So basically, is it better to use the script than the wizard? Well, it depends. For now, in the demo environment running on two different disks, the script is better. It would be better if the wizard asked you about drive letters or storage locations.
Done manually, you have more control over the setup, and if something fails you can fix it before you go further. But you also have to think about a lot of issues yourself.
Next will be Part 2
More Errors and more fixes on SQL Server 2014 Failover Clustered Instance (FCI) with Step-By-Step #AlwaysOn Availability Groups #winserv #FCI
16 thoughts on “Windows Server 2012 R2 with SQL Server 2014 Failover Clustered Instance #Step-By-Step #AlwaysOn AvailabilityGroups What can go Wrong! Part 1”
What happens if you have shared storage and the instance itself fails over rather than the database?
If the instance fails over, the database moves to the new node with it. The database is part of the instance.
I am getting the following error: "The following folder locations do not exist on the server instance that hosts secondary replica". In my case, on the primary server my SQL location is on the E drive and on the secondary server it is on the H drive, but the error shows the wrong path.
Yes, this is common. The best way is to create the path up front: the source path must be the same as the destination path, including the drive letter.
This is by design. The only way around it is to create the AG with T-SQL; that is also on my blog.
When you fail over, what about the SQL IDs? Will they be orphaned, and how do you fix this problem?
This is indeed an issue. But with a clustered FCI you always connect to the same server. One important point: your app must be ready for AlwaysOn. And often there is no automatic failover, only a manual failover, just to control the SQL load.
Thanks for a great blog. Have you experienced problems browsing files on a SQL 2014 CSV, e.g. when attaching or backing up databases with the SSMS GUI? It throws an access error even though all security is set correctly. I would bet on a bug.
Is this on all the nodes or just on one node? You could try this from the node that owns the CSV. I do not have the same problems.
Thanks for the reply. It is the same on the owner of the CSV instance/volumes. I can see that others have the same problem when browsing for a backup. I will Google more and hope I find a solution…
Thanks for your blog. After changing one of the ENDPOINT_URL values from 5022 to 5023, the error message "The connection to the primary replica is not active. The command cannot be processed." is still shown. I do not know the reason :(
Are you sure the port is free and not used by something else? 5024 is also an option. But be aware: if you already created the AG, then when you run the script a second time the items are already in place and need to be deleted manually. All the steps I did are in the blog post. Let me know if you can fix this, or try starting from the beginning: delete the AG and all the listeners in the AD.
Thanks for your reply :) I did so, but it seems it was not an endpoint_url port issue, because I tried many different ports.
And I tried to look into the SQL Server logs; the only useful entry was the same as yours 😦
Actually, there are three nodes in the cluster and two instances in the AG. One instance is a SQL14 FCI installed on a CSV, the other is a SQL14 standalone instance installed on a local disk. When I set up the dry-run environment it succeeded and I did not encounter this problem, but at that time I installed the standalone instance on the system drive (drive C).
Now I am setting up the formal environment, and the only difference is that the standalone SQL14 instance needs to be installed on a non-system drive (drive D). Then I encountered this problem: "The connection to the primary replica is not active. The command cannot be processed."
I turned off all the firewalls and tried different endpoint_url ports, but it still did not work.
The secondary replica machine can ping the primary replica successfully. Now I want to know how to check the network and set the permissions, because I did not perform this step.
Keep in mind that the disk configuration needs to be the same on both AlwaysOn machines. If you are using different drive letters, you need to create the AG with T-SQL. The script lines are in the blog post.