
Building a Microsoft SQL Server
Always On Availability Group
on Google Compute Engine
Tara Kizer and Brent Ozar
Brent Ozar Unlimited

March 2017
Table of contents
Introduction
About the architecture

Before you begin

About the licensing

Building a 2-node availability group on Compute Engine


Creating the VMs Using the UI
Creating the second VM
Creating the VMs by using the command line
Connecting to the VMs through Remote Desktop
Configuring security and networks
Installing and configuring the Windows clustering services
Achieving quorum
Validating the cluster
Setting up the Availability Group
Setting up the backup share
Creating the Availability Group
Adding the listener

Testing your Availability Group


Test #1: Failing over manually - A planned failover with zero data loss
Using the wizard
Using TSQL
Test #2: Failing over automatically - An unplanned failover with zero data loss
Test #3: Failing over manually - An unplanned failover with data loss
Adding the third replica
Failing over
Test #4: Failing back manually - A planned failback with zero data loss
Testing recap

Replacing a server with one of a different size

Introduction
This white paper shows you how to build a Microsoft SQL Server Always On Availability Group
on Google Compute Engine. This paper might be of interest if you're a database administrator,
Windows administrator, or developer and you want to build a Microsoft SQL Server environment
that's highly available by using Always On Availability Groups.

In this paper, you'll learn:

How to build your first Availability Group on Compute Engine.


How to test your work with four failure simulations.
How to tell whether your databases will work well on Compute Engine.

This paper skips a lot of common SQL Server setup tasks that don't change when installing on a
virtual machine (VM) in the cloud.

You'll use PowerShell to accomplish some of the setup and configuration tasks, but you don't
need to be familiar with PowerShell in order to follow along.

About the architecture


You will build a 2-node Always On Availability Group that uses two SQL Server virtual
machines in the same zone, as shown in the following diagram.

To keep it simple, you're not going to deal with network security, VPNs, application servers, and
so on, and you're not going to expose SQL Server to the Internet.

Always On Availability Groups (AGs) can make sense in the cloud for a few reasons:

Automatic failover of multiple databases. This is a common high-availability
requirement, and on premises, DBAs usually solve this with failover-cluster instances
(FCIs). Unfortunately, that requires shared storage, which isn't usually available in
today's infrastructure-as-a-service (IaaS) cloud. There are some workarounds involving
third-party software or UNC paths, but neither of those is a great option in IaaS. Always
On Availability Groups can pull this off without shared storage.

Automatic repair of corrupt data pages. Each replica has its own copy of the
database's data pages. Only logged transactions are sent from one replica to another,
not data pages. When a replica encounters a corrupt data page, it can request a clean
copy of the page from another replica, and heal itself live. It doesn't protect from every
kind of corruption, but it can be a real lifesaver in many situations. Learn more about
automatic page repair in SQL Server Books Online.
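SQL Server exposes the pages it has repaired this way through a dynamic management view, so you can check whether a replica has been quietly healing itself. A minimal sketch (the DMV is real; run it on any replica):

```sql
-- Pages this replica has attempted to repair automatically through the AG
SELECT database_id, file_id, page_id, error_type, page_status, modification_time
FROM sys.dm_hadr_auto_page_repair;
```

An empty result set is the happy case; any rows here are a prompt to run DBCC CHECKDB and investigate the underlying storage.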

Potentially lower downtime for patching. For minimal downtime, patch the secondary
replica, make sure it's working successfully, and then take a brief outage to fail over from
the primary replica to the newly patched secondary. After that, you can patch the former
primary instance. If something goes wrong when patching either replica, simply build a
new one to replace it and join it into the AG.

Possibly easier scalability with lower downtime. When you want to switch to a bigger
instance type, you can provision a new one, add it into the AG, fail over to it, and then
remove the previous instance. You can scale up and down to handle load with this
approach, although this isn't commonly used on a scripted basis to handle daily peaks
and valleys. It's more of a seasonal approach.

Our 2-node Availability Group design can come with a few drawbacks:

Higher cost. Having a warm standby means basically doubling the Compute Engine
costs.
More complexity. This is a little harder to manage than a conventional single SQL
Server VM and requires Windows clustering knowledge.
No disaster recovery. If you want the ability to fail over to another zone or region, you
need to add additional VMs, but you can do that later.

Before you begin
This tutorial makes the following assumptions:

Using SQL Server 2016 for this project. Setup is different for prior versions.
Using Windows Server 2016.
Domain controllers and DNS already set up.
Application servers run on Compute Engine. App servers in other places would require
more complex security and network configuration that's out of scope for this white paper.

About the licensing


On Compute Engine, there are two ways to license your SQL Server.

Pay-per-use licensing. Your Compute Engine VM hourly cost includes licensing.
Google manages the licensing logistics with Microsoft. Your hourly costs are higher, but
you have flexibility to ramp your costs up and down whenever you want.

Bring-your-own-licensing (BYOL). Your Compute Engine VM costs are lower because
the licensing isn't included. You'll need to purchase your own SQL Server licensing from
Microsoft, which means paying up front, and you may not have much flexibility here.
However, for companies with very stable usage needs, or with free/discounted licensing
through Microsoft licensing agreements, this can end up cheaper. If you're installing
Developer Edition, use this approach.

Generally:
If your SQL Server needs are unpredictable, consider using pay-per-use.
If your SQL Server needs are predictable for at least a year, consider buying licenses
and bringing them.

This white paper covers both licensing approaches. If you're not sure which one to use, try
pay-per-use because it's easier to get started. Just remember to shut down your instances
when you don't need them.

Building a 2-node availability group on Compute
Engine
You're going to build two VMs, each running Microsoft SQL Server, but it's a bit more
complicated than what you might be used to in an on-premises setup.

You'll use the following computer names, each a Windows VM running SQL Server:

BOU-SQL1, referred to as node1, server1, or the first replica in this paper, depending on
the context.
BOU-SQL2, referred to as node2, server2, or the second replica in this paper, depending
on the context.

Both of those VMs will be in a Windows cluster. If you've used Windows failover clustering in the
past, you might have used a cluster that required a shared quorum drive, such as the Q drive,
that was used for voting to determine which node was the primary at any given time.

Always On Availability Groups is built on Windows clustering. Even if you don't have a shared
drive (you won't in this example), Windows still needs to determine which node is the primary,
and which nodes are the readable secondaries. Clustering is outside of the scope of this white
paper, but there's one thing you need to know: clustering requires some networking and virtual
server name infrastructure.

Later in this white paper, you'll use a couple of other names and IP addresses:

bou-cluster1 is a virtual name that points to the node that owns the cluster
infrastructure (the Windows plumbing that determines which node is the primary).
TestListener is a virtual name that points to the node that owns the databases: the
writable primary.

These two names will fail over back and forth between nodes.

If you've built labs at home before, you might have put your two SQL Server VMs right next to
each other in the same subnet. For example, maybe you used 192.168.1.10 and
192.168.1.11. Due to some complex networking issues, that doesn't fly here.

However, for this failover process to work in Compute Engine, your two SQL Server virtual
machines have to be in different network subnets, for example:

bou-sql1: IP address 10.0.1.4, subnet 10.0.0.0/16
bou-sql2: IP address 10.1.1.4, subnet 10.1.0.0/16

Note the differences in the IP addresses and subnets. You will need to:
Create the VMs in different subnets.
Set up network rules between the subnets, so that Google can route traffic between
them.
Test to make sure that the communication works.

This white paper provides you with an exact design, complete with IP addresses, that you'll
need to follow to get this tutorial to work. In the real world, you'll want to work closely with your
network team to pick the right IP addresses, subnets, and network security rules so that your
app servers and end users can connect to SQL Server.
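For reference, creating a custom network and the two subnets this design assumes might look roughly like the following in Cloud Shell. This is a sketch: the network and subnet names match the ones used later in this paper, the ranges are the ones listed above, and the firewall rule name is illustrative (exact gcloud flag names vary by SDK version).

```shell
# Custom-mode network with one subnet per SQL Server node
gcloud compute networks create wsfcnet --subnet-mode custom
gcloud compute networks subnets create wsfcsubnet1 --network wsfcnet \
    --region us-central1 --range 10.0.0.0/16
gcloud compute networks subnets create wsfcsubnet2 --network wsfcnet \
    --region us-central1 --range 10.1.0.0/16

# Allow the subnets to talk to each other (tighten this for production)
gcloud compute firewall-rules create allow-wsfc-internal --network wsfcnet \
    --allow tcp,udp,icmp --source-ranges 10.0.0.0/16,10.1.0.0/16
```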

Creating the VMs Using the UI


In the Google Cloud Platform Console, create a new Compute Engine instance.

1. Open the VM Instances page.


2. Click Create Instance.
3. Set the Name for your instance to bou-sql1.

For your first test AG, don't worry too much about the zone. When it's time to design
your production infrastructure, you'll want to put your database servers in the same zone
as your application servers in order to keep latency low. To learn more about what's
available in each zone, head over to Google's list of regions and zones, which lists what
machine types are available in each location.

4. For Machine type, we don't recommend going any smaller than 4 vCPUs with 15 GB of
memory (n1-standard-4), and it's likely that you'll need something larger for production.
Later on in this white paper, you'll see how to choose your production instance size
based on your current workloads.

5. Change Boot disk to a custom image by clicking Change and then either OS images or
Custom images, depending on your licensing.

You will use SQL Server 2016 on Windows 2016 with a 200 GB SSD. Smaller boot
drives run the risk of filling up due to Windows Updates or other additional files.

Important: Picking your licensing model is important because you can't change this quickly later
on. If you pick the wrong licensing model, you'll need to build new SQL Server VMs using the
correct OS instance, join them into the AG, fail over to the new VMs, and then decommission
the old VMs.

Pay per use:

Bring your own licenses:

In this same screen, you'll configure the type and size of your SSD:

This paper keeps the storage simple. When performance tuning, you'll need to think about how
to design the disk layouts for your data, logs, and TempDB.

6. Expand Management, disk, networking, SSH keys and then select Networking.

7. Specify the Network you created and which Subnetwork this VM should be in. Use a
custom, unused Internal IP address with no External IP to avoid exposing SQL Server
to the Internet.
8. Enable IP forwarding.

9. Click Create.

Creating the second VM
Now create your next VM that uses a different subnet. Use the following screenshots to guide
you.

Pay per use:

Bring your own licenses:

Your VM Instances should now look similar to the following screenshot:

Creating the VMs by using the command line


With Google, there are two ways you can run commands:

Using Google Cloud Shell, an online terminal.


Using your command prompt on Windows, Linux, or Mac by installing the Google Cloud
SDK.

This paper doesn't walk you through installing the SDK, but if you're going to do a lot of cloud
work long term, consider doing that. Long term, it's easier to use your local command prompt
and run a few scripts to configure your cloud resources. If you're just experimenting, use the
Google Cloud Shell in your browser and you won't have to worry about network connectivity and
firewalls.

Google Cloud Shell icon in the Google Cloud Platform Console

Pay per use:

gcloud compute instances create bou-sql1 --machine-type n1-standard-4 \
    --boot-disk-type pd-ssd --boot-disk-size 200GB \
    --image-project windows-sql-cloud --image-family sql-ent-2016-win-2016 \
    --zone us-central1-f --subnet wsfcsubnet1 \
    --private-network-ip=10.0.0.4 --can-ip-forward

gcloud compute instances create bou-sql2 --machine-type n1-standard-4 \
    --boot-disk-type pd-ssd --boot-disk-size 200GB \
    --image-project windows-sql-cloud --image-family sql-ent-2016-win-2016 \
    --zone us-central1-f --subnet wsfcsubnet2 \
    --private-network-ip=10.1.0.4 --can-ip-forward

Bring your own licenses:

1. First, create a disk and a Windows image with the guestOSFeatures enabled.

gcloud compute disks create windows-2016-disk-new \
    --size 200 --zone us-central1-f --type pd-ssd \
    --image-family windows-2016 --image-project windows-cloud

From the new disk, create a custom Windows Server image with MULTI_IP_SUBNET
enabled in the guestOSFeatures parameter.

gcloud beta compute images create windows-server-2016-new \
    --source-disk windows-2016-disk-new \
    --source-disk-zone us-central1-f \
    --guest-os-features MULTI_IP_SUBNET

2. Then create the VMs using the image just created.

gcloud compute instances create bou-sql1 \
    --machine-type n1-standard-4 --boot-disk-type pd-ssd \
    --boot-disk-size 200GB --image windows-server-2016-new \
    --zone us-central1-f --subnet wsfcsubnet1 \
    --private-network-ip=10.0.0.4 --can-ip-forward

gcloud compute instances create bou-sql2 \
    --machine-type n1-standard-4 --boot-disk-type pd-ssd \
    --boot-disk-size 200GB --image windows-server-2016-new \
    --zone us-central1-f --subnet wsfcsubnet2 \
    --private-network-ip=10.1.0.4 --can-ip-forward

Connecting to the VMs through Remote Desktop


The VMs have not yet been joined to the domain, which is a requirement for most Always On
setups, so you'll need to create a local user on each and then RDP to them.

To RDP to a VM, you'll need to download the RDP file and then use that to connect. Mac users
can open this file at no charge with the Microsoft Remote Desktop app from the Mac App Store.

Note: If you restart the server, you might need to download a new file because the external IP
might have changed.

Double-click the downloaded RDP file and use the local account you created to connect.

If you get a timeout error, most likely you're dealing with a network firewall somewhere between
you and your cherished virtual machines. Check with your local sysadmins to make sure port
3389 is open out of your own environment. Then, if another team is managing your Cloud
Platform networking, make sure they haven't blocked 3389 there.

Configuring security and networks


1. RDP to each VM and add the appropriate domain users/groups to the
Administrators group. You'll need to use a domain account to set up the cluster, so
be sure to do this step before you get to the next section.
2. If you are using Bring your own licenses, update all components to the latest version,
such as .NET Framework, and then install SQL Server. If you are using Pay-per-use, this
has already been done.
3. Change to static IP by running the following commands in an admin command window. If
you haven't used netsh before, it's a command-line way of reconfiguring your network.
Here's where you can learn more about other things it can do:
https://technet.microsoft.com/en-us/library/cc754516(v=ws.10).aspx.

Server1:

netsh interface ip set address name=Ethernet static 10.0.0.4 255.255.0.0 10.0.0.1 1
netsh interface ip set dns Ethernet static 10.2.0.100

Server2:

netsh interface ip set address name=Ethernet static 10.1.0.4 255.255.0.0 10.1.0.1 1
netsh interface ip set dns Ethernet static 10.2.0.100

4. Join the servers to the domain by executing the following command on both servers:

netdom join %computername% /domain:dbeng.com
/userd:Administrator /passwordd:* /reboot

5. In Google Cloud Shell, run the following commands to add the routes needed for
Availability Groups.

gcloud compute routes create bou-sql1-route --network wsfcnet \
    --destination-range 10.0.1.4/32 --next-hop-instance bou-sql1 \
    --next-hop-instance-zone us-central1-f --priority 1

gcloud compute routes create bou-sql2-route --network wsfcnet \
    --destination-range 10.1.1.4/32 --next-hop-instance bou-sql2 \
    --next-hop-instance-zone us-central1-f --priority 1

gcloud compute routes create bou-sql1-route-listener --network wsfcnet \
    --destination-range 10.0.1.5/32 --next-hop-instance bou-sql1 \
    --next-hop-instance-zone us-central1-f --priority 1

gcloud compute routes create bou-sql2-route-listener --network wsfcnet \
    --destination-range 10.1.1.5/32 --next-hop-instance bou-sql2 \
    --next-hop-instance-zone us-central1-f --priority 1

gcloud compute routes list --filter="network:wsfcnet"

Installing and configuring the Windows clustering services


Always On Availability Groups rely on Windows Server Failover Clustering to determine
ownership. You need to do two things: install these additional services, and then configure them
to group your two virtual machines together into a bond known as a cluster.

1. Add the clustering tools to both servers by running the following command in an admin
PowerShell window:

Install-WindowsFeature Failover-Clustering
-IncludeManagementTools

2. RDP to one of the servers using a domain account that has administrative privileges.
3. Open Failover Cluster Manager.
4. Right click on Failover Cluster Manager and select Create Cluster.

5. Click Next to go to the Select Servers page and add both servers.

6. Click Next and keep the option to run configuration validation tests.
7. Click Next to get to the Validate a Configuration Wizard screen.
8. On the Testing Options page, select Run only tests I select, and then click Next.
9. Unselect Storage on the Test Selection page, because the Storage option will fail in
Compute Engine (as it would for separate standalone physical servers). It is not needed
for Availability Groups. It's for traditional failover-cluster instances (FCIs), where every
node needs to see the shared storage locations where data and log files reside.

10. Click Next twice to run the tests. Make sure none of the tests have failed.

Common issues found during cluster validation include:

Only one network path between replicas. In the old days, you would build a
separate cluster heartbeat network. That's not the case today in the cloud, and
you can ignore this one.
Windows Updates not the same on both replicas. If you configured them to apply
updates automatically, one of them might have applied updates that the other
hasn't downloaded yet. Generally speaking, you don't want to automatically apply
updates to SQL Server.
Pending reboot. You've made changes to one of the servers, and it needs a
reboot to apply. Don't ignore this one.

Important: Before you click Finish, you really want these tests to succeed. Whenever
you look at a cluster, check Failover Cluster Validation to see if the tests have passed. If
they haven't, start to ask questions about whether the tests have ever been run.

11. Click Finish to get back to Create Cluster Wizard.


12. Name the cluster bou-cluster1 on the Access Point for Administering the Cluster
page, and specify the addresses that were used when you created the routes.

13. Click Next twice to create the cluster and then Finish to complete the wizard.

Achieving quorum
Set up a file-share witness to achieve quorum.

1. First create a file share on a server that is not part of this cluster. For example, use a
domain controller: \\DC-WINDOWS\Witness.

2. Right click on the cluster, select More Actions and then Configure Cluster Quorum
Settings.

3. Click Next.
4. Select the option for Select the quorum witness and then click Next.
5. Select the option for Configure a file share witness.
6. Input the path to the file share and then click Next.

7. Click Next after confirming the settings.
8. Click Finish to end the wizard.

Validating the cluster


It is recommended to run cluster validation tests after every cluster change.

1. Right click on the cluster and select Validate Cluster.

2. Run through the wizard, ensuring all tests passed.

At this point, you've built the Windows clustering plumbing. Windows can now manage voting to
understand which nodes own a particular service or function, including AGs.

There can be additional complexities with Windows clustering. For production, you'd also want to
consider things like:

Witness design. This Books Online page discusses Windows Server 2012, but the
basics still apply to Windows Server 2016:
https://technet.microsoft.com/library/jj612870.aspx.
Distributed transactions. Newly supported with Always On Availability Groups in
Windows Server 2016 and SQL Server 2016, but there's an issue that you'll see later in this white paper.

Setting up the Availability Group


Now you're ready to configure an AG. Perform these steps on both nodes to enable the AG
feature.

1. Open SQL Server Configuration Manager.


2. Click SQL Server Services on the left.
3. Right click on the SQL Server service and go to Properties, or just double click on the
SQL Server service.

4. Change the account to dbeng\sql.service and provide the password.
5. On the AlwaysOn High Availability tab, select the checkbox for Enable AlwaysOn
Availability Groups.
6. Click OK and then click Yes in the restart warning dialog box.
7. Click OK in the warning dialog box about saving changes.
8. Restart the SQL Server service.
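If you'd rather script this step, the SQL Server PowerShell module includes a cmdlet that enables the feature and restarts the service in one shot. A sketch, assuming the module is installed and you're in an admin PowerShell window:

```powershell
# Enables Always On Availability Groups and restarts the SQL Server service
Enable-SqlAlwaysOn -ServerInstance "BOU-SQL1" -Force
Enable-SqlAlwaysOn -ServerInstance "BOU-SQL2" -Force
```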

Setting up the backup share


Now that the SQL Server instances are ready for AGs, set up a backup share that the SQL
Server service account can read from and write to. You can do this on both nodes, but it is
recommended that this share be a remote resource so that you have access to all of the backup
files in the case of a restore.

1. For this white paper, use C:\SQLBackup for simplicity on node2 and share it as
SQLBackup. Use the following screenshots to guide you:

There are, of course, some issues with this simple, demonstration approach:

The backups won't be available if this node goes down.
If backups aren't cleaned up regularly, the C drive will fill up.
Backups will be slower on node1 than node2, because it'll be writing the files
across the network.
If node1 is doing a backup, it will slow down node2, because it's pushing data into
its network card.
If you run a backup app that sweeps these files to somewhere else for
safekeeping, the SQL Server will slow down when that happens, because the
backup app will be burning up bandwidth.
Somebody could download or delete these backups if security isn't locked down
tight.

For these reasons, you wouldn't use this approach for production servers. For this
tutorial, this approach keeps things simple and moving forward.

2. Run the following commands in an elevated command prompt on both nodes to open up
ports 1433 and 5022 if you are using Windows Firewall (recommended).

netsh advfirewall firewall add rule name="Open Port 5022 for Availability Groups" dir=in action=allow protocol=TCP localport=5022

netsh advfirewall firewall add rule name="Open Port 1433 for SQL Server" dir=in action=allow protocol=TCP localport=1433

3. On node1, create a test database or restore your database. The recovery model must be
set to Full. Perform a FULL backup as this is a prerequisite.
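As a T-SQL sketch, the steps above might look like this (the database name TestDB is just an example, and the share is the one you created on node2):

```sql
CREATE DATABASE TestDB;
ALTER DATABASE TestDB SET RECOVERY FULL;
-- The first full backup establishes the log chain and satisfies the AG prerequisite
BACKUP DATABASE TestDB
TO DISK = N'\\BOU-SQL2\SQLBackup\TestDB_full.bak' WITH INIT;
```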

Creating the Availability Group


Now it's time to create the AG. We recommend using the UI to create your first AG. You can
save a script at the end of the wizard to see what it takes to create an AG, and reuse that script
in the future.

1. Right click on AlwaysOn High Availability and select New Availability Group Wizard.

2. On the Specify Name page, specify a name for your AG and select the check box for
Database Health Detection and then click Next. In SQL Server 2012 and 2014, a
failover would not occur if the database became inaccessible. In SQL Server 2016, a
failover will occur if the database becomes inaccessible and if you have enabled this
feature.
3. Select the database you want to add to the AG and then click Next.

Notice the prerequisites have not been met for ReportServer and
ReportServerTempDB.

A full backup is required in order to put a database into an AG. SQL Server doesn't
consider a database to be in full recovery until the first full backup has been taken.

4. Add node2. Node1 should already be listed. Enable:

Automatic Failover
Synchronous Commit
Readable Secondary

You can use Yes or Read-intent only for Readable Secondary. Read-intent only
allows only read-intent connections, whereas Yes allows all connections. Yes is needed
if you have applications using database drivers that do not support the
ApplicationIntent=ReadOnly connection parameter.
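For reference, a read-intent connection string pointed at the listener you'll create later might look like this (a sketch; the listener name is the one used in this paper, and the database name is just an example):

```
Server=tcp:TestListener,1433;Database=TestDB;Integrated Security=SSPI;ApplicationIntent=ReadOnly;MultiSubnetFailover=True;
```

MultiSubnetFailover=True is worth noting for this multi-subnet design: drivers that support it try the listener's IP addresses in parallel, which makes post-failover reconnects much faster.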

Don't change anything on the tabs for Endpoints, Backup Preferences, and Listener.
5. Click Next.
6. Select Full and specify the path to the backup share on node2.

If you restored your own database and it is sizable, you could opt to use the previous full
backup and the log backup chain. If you did this, you would restore the database and log
chain and then select the Join only option.

SQL Server 2016 also adds Direct Seeding, which is the ability to seed replicas
automatically without taking backups. You can read about some caveats here:
https://www.brentozar.com/archive/tag/directseeding/.

7. Click Next and make sure all tests passed with Success, with the exception of the
listener test which will show Warning. You can ignore the warning as you will create the
listener later. If any tests failed, resolve those issues so that you can continue.
8. Click Next and then Finish. You could also save the script from this final page by using
the Script drop-down list.

9. After the AG is successfully created, click Close.

Adding the listener
You can now add the listener.

1. Expand the AG, right click on Availability Group Listeners, and select Add Listener.

2. Give the listener a name. This will be the name that you use in connection strings, so
keep that in mind when naming it. You can change it later if needed.

Tip: In production systems, you might consider using your old server's name as your
listener name. For example, if you're migrating away from SQL2012B, then when you
decommission that old server, you could add a new listener with that name. Your
applications wouldn't even know that their databases had moved to an AG. However, if
you do that, you won't be able to build and test the listener ahead of time.

Instead, use a new name for the listener, but when it's time to decommission
SQL2012B, turn that name into a DNS CNAME. Rename the old SQL2012B to a different
name (assuming that you have to keep the box around for a while for safekeeping), and
then work with your network admins to create a new DNS CNAME for SQL2012B that
points to your new listener name. This way, if someone forgets to change the connection
string in their application, it'll still work.

To learn more about CNAMEs, see the following MSSQLTips post:
https://www.mssqltips.com/sqlservertip/2663/using-friendly-names-for-sql-servers-via-dns/

3. Use 1433 for Port and select Static IP for Network Mode. Add the listener IP
addresses by using the Add button.

4. Click OK to create the listener.
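The same listener can be created with T-SQL. A sketch using the AG name and the listener IPs from the routes you created earlier (the /16 subnet masks are assumptions based on the ranges in this design):

```sql
ALTER AVAILABILITY GROUP [TestAG]
ADD LISTENER N'TestListener' (
    WITH IP ((N'10.0.1.5', N'255.255.0.0'),
             (N'10.1.1.5', N'255.255.0.0')),
    PORT = 1433
);
```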

Testing your Availability Group
Whenever you implement a high-availability feature, you need to test four things:

1. A planned failover with zero data loss. How you'll deal with patching, or a planned
migration from one instance type to another.
2. An unplanned failover with zero data loss. For example, when the primary replica
goes offline.
3. An unplanned failover with data loss. When replication from the primary to the
secondary falls behind, and then the primary fails.
4. A planned failback with zero data loss. After scenario #2 happens, and you bring the
former primary back online, you need to re-establish synchronization and deal with the
data differences between the replicas.

The following sections discuss each of these scenarios.

Test #1: Failing over manually - A planned failover with zero data
loss

We recommend that you don't use Failover Cluster Manager to perform failovers for AGs. The
manager is not aware of the synchronization status of the replicas, and using it can lead to an
outage. Make sure all system admins are aware of this issue. It can be especially tricky
because it seems to work: it doesn't throw an error, but that doesn't mean SQL Server won't
have issues when your systems fail over under duress.

To perform failovers, use the Fail Over Availability Group wizard or TSQL. You can initiate the
wizard from a primary replica or a secondary replica.

Using the wizard


1. In SQL Server Management Studio, right click on the AG and select Failover.

2. On the Select New Primary Replica page, verify that the secondary replica is selected
as the server to fail over to, and then click Next.

3. Connect to the secondary server by using the Connect button, then click Connect, and
then click Next.
4. Click Finish to perform the failover and then Close when it finishes.

Management Studio refreshes and shows the new replica role:

When there's a planned, graceful failover, SQL Server automatically starts synchronizing data
over to the other replica (the former primary). You can fail over back and forth, without data loss,
and the replicas are automatically kept in sync.

Using TSQL

1. Connect to the secondary replica that you want to fail over to, and then run the following
command.

ALTER AVAILABILITY GROUP [TestAG] FAILOVER;

2. Refresh Object Explorer for the server you failed over to and see that the role has
changed.

Alternatively, you can use SQLCMD mode in Management Studio.

1. From the Query menu, select SQLCMD Mode.

2. Use :CONNECT syntax to run the commands underneath it on another server.

Notice that the Query Editor window is connected to BOU-SQL1 but that the failover
command will run on BOU-SQL2.

3. Execute both commands and then refresh Object Explorer.

Important: Remember that all sysadmins should know not to use Failover Cluster Manager to
perform an AG failover, and should instead know how to do failovers in Management Studio.
This is really important when an organization has a DBA team and a different team that
performs reboots, such as for Microsoft security patching. Server admins with clustering
experience know to fail over SQL Server instances off the server that they are going to reboot,
so be sure that they know how to do an AG failover.

Test #2: Failing over automatically - An unplanned failover with
zero data loss

1. Initiate an automatic failover by stopping the SQL Server service on the primary replica.
2. Wait 15-30 seconds, and then check Object Explorer for the new primary replica.
Assuming everything went well, the AG failed over without data loss because the replica
it failed over to is configured for synchronous-commit with automatic failovers.

In this scenario, the failover wasn't planned. As long as the two replicas were in synchronous-commit
mode, no data is lost. When the stopped SQL Server service is started back up again, it
automatically starts fetching data from the new primary in a matter of minutes, catches back
up to a synchronized state, and again allows automatic failovers.
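You can also confirm the new primary from T-SQL rather than Object Explorer. A sketch using the standard Always On DMVs (TestAG is the group name used in this guide):

```sql
-- Which replica is primary now, and is each replica online?
SELECT ar.replica_server_name,
       ars.role_desc,                -- PRIMARY or SECONDARY
       ars.operational_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ars.replica_id = ar.replica_id;
```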

However, while one of the two replicas is down, you have no safety net.

Some people aim for zero data loss by designing a two-node AG with synchronous commits.
This works as long as both replicas are up, but SQL Server doesn't automatically shut down when
it drops below a minimum number of replicas. Your users might keep inserting data into the one
remaining replica, thinking that their data is safe, when it's not. Here you have a single point of failure.

This is one reason why monitoring is very important. By default, SQL Server doesn't tell you that
you're down to a single point of failure, or that a replica has stopped synchronizing with other
replicas.
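A monitoring script can close that gap. As a sketch, this query flags any database replica that isn't reporting healthy; you could schedule it in an alerting job (the DMV and column names are standard, but the HEALTHY threshold you alert on is your choice):

```sql
-- List database replicas whose synchronization health is degraded.
SELECT ar.replica_server_name,
       drs.database_id,
       drs.synchronization_state_desc,   -- e.g. SYNCHRONIZED, NOT SYNCHRONIZING
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON drs.replica_id = ar.replica_id
WHERE drs.synchronization_health_desc <> 'HEALTHY';
```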

Test #3: Failing over manually - An unplanned failover with data loss

You could use the 2 servers to test a manual failover with data loss by switching node2 to
asynchronous-commit and then stopping SQL Server on node1. Instead, take it one step
further and add a third asynchronous server to the mix.

Adding the third replica

1. Create a third VM, join it to the domain, add it to the cluster (use Add Node in Failover
Cluster Manager), enable the AG feature, and then add it to the AG by right-clicking
the AG and then clicking Add Replica.

2. On the Connect to Replicas page, connect to the secondary replica, which is now
BOU-SQL1 because BOU-SQL2 is the primary.
3. Click Connect twice, once in the wizard and once in the Connect to Server dialog box,
and then click Next.
4. On the Specify Replicas page, click Add Replica, and enter the third server.
5. Click Connect in the Connect to Server dialog box.
6. Configure the new replica as asynchronous-commit with manual failovers. There
isn't an option to use automatic failover with asynchronous-commit. You need to
initiate the failover if there could be data loss, because AGs will never automatically fail
over in a way that loses data.
7. Decide how you'd like to synchronize the data. If your test database is small or empty, a
Full backup is easy. Point it to the backup share if it's not already filled in and then click
Next.

If you get an error for the AddReplicaDatabaseFileCompatibilityValidator
check on the Validation page, create the folders where the database files reside on the
third server. When the folders have been created, click Re-run Validation. It should
pass the check now.

8. Click Next and then Finish.


9. Click Close when it is done.
10. Refresh Object Explorer and expand the AG to see that the third replica has been
added.

Note: If you encountered any errors along the way for this third VM, go back in the doc
and make sure you ran all the necessary commands. Run the same commands, with the
necessary changes, and follow the same steps as you did for the other two VMs.
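The wizard steps above can also be scripted. A hedged sketch of the equivalent T-SQL, run on the primary; the server name BOU-SQL3 and the endpoint URL are assumptions based on the naming used for the first two VMs:

```sql
-- Add the third server as an asynchronous-commit, manual-failover replica.
-- BOU-SQL3 and the endpoint URL below are assumed names; substitute your own.
ALTER AVAILABILITY GROUP [TestAG]
ADD REPLICA ON N'BOU-SQL3'
WITH (
    ENDPOINT_URL      = N'TCP://BOU-SQL3.yourdomain.local:5022',  -- assumed
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE     = MANUAL
);
```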

Failing over

1. Stop the SQL Server service on both synchronous-commit replicas (the first two VMs).

This step partially simulates a region or zone failure where all of your synchronous
replicas have gone down. It isn't a perfect simulation, because Windows is still up.

2. Connect to the third VM in Management Studio. Notice that the database is in a Not
Synchronizing state because it's unable to reach the other two replicas.

3. Run the ALTER AVAILABILITY GROUP command to fail the AG over to the third
server, but use the option to allow data loss. This replica is asynchronous and might not
be completely up to date with all of the data changes that occurred on the primary replica
when both synchronous replicas went down. Failing over to it simulates what would
happen in the case of a disaster where you lose the primary data center and need to get
production online at the Disaster Recovery site.

ALTER AVAILABILITY GROUP TestAG FORCE_FAILOVER_ALLOW_DATA_LOSS;

4. Refresh Object Explorer for the third server.


5. Start SQL Server on the first 2 VMs.
6. Refresh Object Explorer for the first 2 servers.
7. Check the Always On Dashboard to confirm that the third server is in a good state,
whereas the first 2 servers are not. To do this, right-click the AG and select Show
Dashboard.

The Not Synchronizing state for BOU-SQL1 and BOU-SQL2 isn't temporary: after you perform
a manual failover with data loss, SQL Server knows that there's data on SQL1 and SQL2 that
you might want to recover.

Microsoft has a great checklist for tasks you need to perform after a forced failover:
https://msdn.microsoft.com/en-us/library/ff877957.aspx#FollowUp. It talks about fixing a cluster
quorum, creating database snapshots in case you want to recover the lost data from the former
primaries, enabling synchronization from the new primary, and what might happen if the offline
nodes restart with their own quorum. This information is important if you plan on implementing a
multi-subnet AG.

Test #4: Failing back manually - A planned failback with zero data
loss
There are at least 2 ways to get the first 2 replicas back to a healthy state without rebuilding the
AG:

1. Remove the first 2 servers from the AG, restore the LOG backup chain and then add the
first 2 servers back to the AG
2. Remove the first 2 servers from the AG, restore a FULL backup and the LOG backup
chain, and then add the first 2 servers back to the AG

There can be other ways, depending on factors such as how far behind the AG was at the time of
the failover, how small the databases are, how fast the network is between sites, and so on.
This is where SQL Server 2016's Direct Seeding can come in handy, but you have to be careful
with large databases flooding your network connection, unchecked.
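If you do use Direct Seeding (SQL Server 2016 and later), the moving parts are a seeding option on the replica and a permission grant on the secondary. A sketch, assuming the replica and AG names used in this guide:

```sql
-- On the primary: seed this replica automatically over the network
-- instead of restoring backups manually.
ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON N'BOU-SQL1'
WITH (SEEDING_MODE = AUTOMATIC);

-- On the secondary (BOU-SQL1): allow the AG to create the seeded database.
ALTER AVAILABILITY GROUP [TestAG] GRANT CREATE ANY DATABASE;
```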

Here, you use option 1 for the first server and option 2 for the second server.

1. Remove the first 2 servers from the AG by using the UI or the TSQL command.

UI:

TSQL:

On the third server, use the REMOVE REPLICA option of the ALTER AVAILABILITY
GROUP command to remove a replica.

ALTER AVAILABILITY GROUP [TestAG] REMOVE REPLICA ON N'BOU-SQL2';

2. Refresh Object Explorer on the third server to confirm that only the third replica is in the
AG.

3. Take a log backup on the third server and then restore it on the first server. If any other
log backups have run on the third server since the first 2 servers have been offline,
restore those log backups first.

4. Use the WITH NORECOVERY option when restoring the LOG backups.

Third server:

BACKUP LOG Test
TO DISK = 'C:\SQLBackup\Test_20170909_1022.trn'
WITH INIT;

First server (copy trn file to the 1st server or use a UNC path to reach the file on the third
server):

RESTORE LOG Test
FROM DISK = 'C:\SQLBackup\Test_20170909_1022.trn'
WITH NORECOVERY;

5. Set the third server to synchronous-commit by right-clicking the replica and selecting
Properties.

6. Change Availability mode to Synchronous commit.
7. Add the first server back to the AG by right clicking on the AG and selecting Add
Replica.
8. On the Specify Replicas page, click Add Replica.
9. Specify the 1st server and click Connect.
10. Check the box for it to be a synchronous-commit replica, as you are going to fail over
to it without further data loss. You'll go back and switch the third server to
asynchronous-commit after the first 2 servers are back to synchronizing with each
other.

11. Click Next, select Join only on the Select Data Synchronization page and then click
Next again.
12. Ignore the warning that says "Checking the listener configuration", and then click Next.
13. Click Finish to add the server as a replica to the AG.
14. Click Close to close out the Add Replica to Availability Group wizard.
15. Refresh Object Explorer to see that server1 and server3 are replicas in the AG.
16. Run a full backup and a log backup on server3 and then restore those on server2 using
WITH NORECOVERY.

Server3:

BACKUP DATABASE Test
TO DISK = 'C:\SQLBackup\Test_20170909_1105.bak'
WITH INIT;

BACKUP LOG Test
TO DISK = 'C:\SQLBackup\Test_20170909_1105.trn'
WITH INIT;

Server2:

RESTORE DATABASE Test
FROM DISK = 'C:\SQLBackup\Test_20170909_1105.bak'
WITH NORECOVERY, REPLACE;

RESTORE LOG Test
FROM DISK = 'C:\SQLBackup\Test_20170909_1105.trn'
WITH NORECOVERY;

17. Add server2 as a replica to the AG using the same steps you used for server1.
18. Fail over to server1.
19. Set server3 to asynchronous-commit by right-clicking the AG from the new
primary replica, server1, and then selecting Properties.
20. Set server3 to Asynchronous commit for Availability Mode and switch to Automatic
for Failover Mode for servers 1 and 2.

21. Click OK.

The AG is now back to the original settings before the first 2 servers were taken offline.
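Steps 19 and 20 can also be done in T-SQL from the new primary. A sketch using the replica names from this guide (BOU-SQL3 is an assumed name for the third server); each statement modifies one option on one replica:

```sql
-- Move server3 back to asynchronous-commit...
ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON N'BOU-SQL3'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

-- ...and restore automatic failover for servers 1 and 2.
ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON N'BOU-SQL1'
WITH (FAILOVER_MODE = AUTOMATIC);

ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON N'BOU-SQL2'
WITH (FAILOVER_MODE = AUTOMATIC);
```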

Testing recap
When you build a production AG, youll want to step through these same four tests while taking
screenshots and document the steps. That way, if disaster strikes, you or any member of your
team can better handle the emergency by reading through your checklist.

Replacing a server with one of a different size
You can replace a server in Compute Engine when using an AG. You might want to do this
because of one of these scenarios:

- You have outgrown your hardware and need a bigger server.
- You started off with too big of a server and want to downsize it.
- The workload is higher during certain peak times of the year and you want to temporarily
scale up.

Your AG already has 3 servers in it, assuming you followed the steps in the previous section.
Now, replace server1 with server3. Here are the steps:

1. Fail over to server2.
2. Set server3 to synchronous-commit with automatic failovers.
3. Remove server1 from the AG.

That's it!
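The three steps above can be sketched in T-SQL. BOU-SQL3 is an assumed name for the third server; run the failover on server2 and the remaining statements on the new primary:

```sql
-- 1. Fail over to server2 (run on BOU-SQL2).
ALTER AVAILABILITY GROUP [TestAG] FAILOVER;

-- 2. Make server3 a synchronous-commit, automatic-failover replica.
ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON N'BOU-SQL3'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON N'BOU-SQL3'
WITH (FAILOVER_MODE = AUTOMATIC);

-- 3. Remove server1 from the AG.
ALTER AVAILABILITY GROUP [TestAG] REMOVE REPLICA ON N'BOU-SQL1';
```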
