SQL Always On Issues and Resolutions (SQL Server)
SQL Server AlwaysOn Availability Groups is a high-availability and disaster recovery solution introduced
in SQL Server 2012. Despite its robustness, it comes with its own set of common issues that can occur during
deployment, operation, or failover processes.
Below are the most frequent issues encountered in AlwaysOn Availability Groups and their resolutions.
1. Availability Group Fails to Come Online
Issue: The Availability Group or one of its replicas fails to come online after a failover or restart.
Causes:
Resolution:
● Check SQL Server Service Account Permissions: Ensure that the SQL Server service accounts have
the necessary permissions on the failover cluster nodes and storage devices. The service accounts must
have read/write permissions on the shared resources.
https://www.sqldbachamps.com
● Start the Cluster Service: Verify that the WSFC service is running on all participating nodes. Use the
following PowerShell command to check the status:
Get-Service clussvc
● Network Configuration: Ensure the network configuration, especially the AlwaysOn listener, is correctly set
up and that DNS resolves the listener name properly. Test network connectivity between the nodes.
● Check Replica Synchronization: Make sure all replicas are in a synchronized state. If not, check for network
issues or failures in transporting transaction log records to the secondaries. Use the following query to
check synchronization state:
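-- synchronization_state_desc is reported per database by sys.dm_hadr_database_replica_states
SELECT
    ag.name AS AvailabilityGroupName,
    ar.replica_server_name AS ReplicaName,
    DB_NAME(drs.database_id) AS DatabaseName,
    drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar
    ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag
    ON drs.group_id = ag.group_id;
https://www.sqldbachamps.com Praveen Madupu +91 98661 30093
Sr SQL Server DBA, Dubai
praveensqldba12@gmail.com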
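When a replica does not come online, replica-level health is often a useful first look before drilling into individual databases. The query below is a minimal sketch rather than a definitive diagnostic: it assumes it is run on the primary replica (a secondary generally reports only on itself), and the column selection is an illustrative choice.

```sql
-- Replica-level health as seen from the local instance.
-- operational_state_desc and recovery_health_desc are typically NULL for remote replicas.
SELECT
    ag.name                            AS AvailabilityGroupName,
    ar.replica_server_name             AS ReplicaName,
    ars.role_desc,
    ars.operational_state_desc,
    ars.connected_state_desc,
    ars.recovery_health_desc,
    ars.synchronization_health_desc,
    ars.last_connect_error_number,
    ars.last_connect_error_description
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ars.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag
    ON ars.group_id = ag.group_id
ORDER BY ag.name, ar.replica_server_name;
```

A DISCONNECTED value in connected_state_desc together with a populated last_connect_error_description usually points toward the endpoint, permission, or network checks listed above.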
2. Synchronization Lag on Secondary Replicas
Issue: The secondary replicas are experiencing significant synchronization lag, which may lead to
increased recovery time during a failover.
Causes:
Resolution:
● Monitor Log Generation: Use SQL Server's sys.dm_db_log_stats dynamic management function to monitor
transaction log growth and ensure that the log generation rate is not excessively high. If necessary, review
and optimize the queries or batch processes causing the high log generation.
● Increase Network Bandwidth: Ensure that there is enough bandwidth between the primary and
secondary replicas. If you are running replicas across different geographical locations, consider upgrading
your network links or using technologies like WAN accelerators to reduce network latency.
● Improve Disk Performance: Ensure that the secondary replicas have sufficient I/O throughput. If disk
performance is the bottleneck, consider upgrading to faster disk types (e.g., SSDs) or optimizing storage
configuration.
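As a rough sketch of the log monitoring step, the query below reads sys.dm_db_log_stats for every online user database. It assumes SQL Server 2016 SP2 or later, where this function is available, and the database_id > 4 filter for skipping system databases is an illustrative choice. The function reports log size rather than a rate, so sampling it at intervals and comparing log_since_last_log_backup_mb is one way to approximate how fast log is being generated.

```sql
-- Transaction log usage per online user database, run on the primary replica.
SELECT
    d.name                           AS DatabaseName,
    ls.total_log_size_mb,
    ls.active_log_size_mb,
    ls.log_since_last_log_backup_mb, -- log generated since the last log backup
    ls.log_backup_time,              -- time of the last log backup
    ls.log_truncation_holdup_reason  -- e.g. AVAILABILITY_REPLICA when a secondary lags
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_stats(d.database_id) AS ls
WHERE d.database_id > 4              -- assumption: skip system databases
  AND d.state_desc = N'ONLINE'
ORDER BY ls.log_since_last_log_backup_mb DESC;
```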
Verification:
-- The queue columns are reported per database by sys.dm_hadr_database_replica_states
SELECT
    ag.name AS AvailabilityGroupName,
    ar.replica_server_name AS ReplicaName,
    DB_NAME(drs.database_id) AS DatabaseName,
    drs.log_send_queue_size, -- KB of log not yet sent to the secondary
    drs.redo_queue_size,     -- KB of log not yet redone on the secondary
    drs.last_commit_time
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar
    ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag
    ON drs.group_id = ag.group_id;
3. Slow or Failed Failover
Issue: During a planned or unplanned failover, it takes too long to switch to a secondary replica, or the
failover fails altogether.
Causes:
● The secondary replica is not fully synchronized with the primary replica.
● The transaction log is too large, causing delays in recovery.
● The SQL Server service on the secondary replica is not running or has crashed.
● Incomplete or incorrect failover cluster settings.
Resolution:
● Ensure Synchronization: Before initiating a failover, ensure that the secondary replica is in a
SYNCHRONIZED state. You can check this using the same query as above. If the secondary is not
synchronized, perform a manual log backup and restore, or allow time for the secondary to catch up.
● Monitor Transaction Log Size: If a large transaction log is causing delays, you may need to back up the
transaction log more frequently to keep it from growing too large.
● Check SQL Server Service: Ensure that the SQL Server services are running on the secondary replicas.
If not, investigate the Windows Event Logs and SQL Server logs for errors.
● Review Cluster Settings: Check the AlwaysOn configuration settings, especially the failover threshold,
and ensure that they are appropriate for your workload. Consider increasing the timeout settings if failover
frequently fails.
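The pre-failover checks above can be sketched in T-SQL as follows. This is an illustration under stated assumptions, not a prescribed procedure: the Availability Group name MyAG is hypothetical, the ALTER statements are left commented out so nothing changes by accident, and the timeout value shown is only an example.

```sql
-- 1. Is each database on each replica ready to fail over without data loss?
SELECT
    ar.replica_server_name AS ReplicaName,
    drcs.database_name     AS DatabaseName,
    drcs.is_failover_ready -- 1 = synchronized and safe to fail over
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.availability_replicas AS ar
    ON drcs.replica_id = ar.replica_id
ORDER BY ar.replica_server_name, drcs.database_name;

-- 2. Current failover-related settings for each group and replica.
SELECT
    ag.name                 AS AvailabilityGroupName,
    ag.failure_condition_level,
    ag.health_check_timeout, -- milliseconds
    ar.replica_server_name  AS ReplicaName,
    ar.availability_mode_desc,
    ar.failover_mode_desc,
    ar.session_timeout       -- seconds
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ag.group_id = ar.group_id;

-- 3. Examples only (hypothetical group name MyAG):
-- ALTER AVAILABILITY GROUP [MyAG] SET (HEALTH_CHECK_TIMEOUT = 60000);
-- ALTER AVAILABILITY GROUP [MyAG] FAILOVER; -- run on the target secondary
```

Checking is_failover_ready on the intended target before a planned failover is a cheap way to confirm that the replica really is in the SYNCHRONIZED state.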
4. Availability Group Listener Fails to Come Online
Issue: The Availability Group listener fails to come online, preventing client connections to the
Availability Group.
Causes:
Resolution:
● DNS Registration: Ensure that the listener name is correctly registered in DNS. If necessary, manually
register the listener by running the following command:
ipconfig /registerdns
● Check Permissions: Ensure the SQL Server service account has the required permissions to create DNS entries
for the listener. You can review the DNS registration status using the following PowerShell command:
Get-DnsServerResourceRecord -Name "<ListenerName>" -ZoneName "<DNSZone>"
● Firewall and Port Configuration: Verify that the listener's port (default 1433) is open on all participating
nodes and allowed by firewalls. Also, ensure that no other services are using the same port as the listener.
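To cross-check what SQL Server itself has recorded for the listener, a query along these lines may help. It is a sketch that assumes the listener was created through SQL Server, so its metadata appears in the catalog views; the LEFT JOIN is used so that a listener with no IP rows still shows up.

```sql
-- Listener name, port, and the state of each listener IP address.
SELECT
    ag.name        AS AvailabilityGroupName,
    agl.dns_name   AS ListenerName,
    agl.port       AS ListenerPort,
    lip.ip_address,
    lip.state_desc AS IpState -- e.g. ONLINE, OFFLINE, FAILED
FROM sys.availability_group_listeners AS agl
JOIN sys.availability_groups AS ag
    ON agl.group_id = ag.group_id
LEFT JOIN sys.availability_group_listener_ip_addresses AS lip
    ON agl.listener_id = lip.listener_id
ORDER BY ag.name, agl.dns_name;
```

In a multi-subnet configuration it is normal for only the IP address in the current primary's subnet to be online.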
5. Cluster Node Drops Out of the Cluster
Causes:
Resolution:
● Check Network Connectivity: Ensure that all cluster nodes have consistent network connectivity. You
can use the Failover Cluster Manager or the following PowerShell command to check the status of the
cluster nodes:
Get-ClusterNode
● Review Quorum Settings: Ensure that your quorum configuration is appropriate for the number of nodes
in your AlwaysOn cluster. For a multi-site setup, use Node and File Share Majority or Node and Disk
Majority.
● Investigate Hardware Issues: Check the Windows Event Logs for any hardware or OS failures that may
have caused the node to drop from the cluster. Resolve hardware issues as necessary.
Verification:
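One possible way to verify cluster membership and quorum from inside SQL Server is sketched below. It assumes the instance has the HADR feature enabled and can see the WSFC cluster; it complements rather than replaces Get-ClusterNode and Failover Cluster Manager.

```sql
-- Cluster name and quorum configuration as seen by this instance.
SELECT
    cluster_name,
    quorum_type_desc,
    quorum_state_desc
FROM sys.dm_hadr_cluster;

-- State and vote count of each cluster member (nodes and witness).
SELECT
    member_name,
    member_type_desc,
    member_state_desc, -- UP or DOWN
    number_of_quorum_votes
FROM sys.dm_hadr_cluster_members
ORDER BY member_name;
```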
6. Data Loss After a Forced Failover
Causes:
Resolution:
● Avoid Forced Failovers: As a general rule, avoid forced failovers with possible data loss (the
FORCE_FAILOVER_ALLOW_DATA_LOSS option) unless absolutely necessary. This option should only be used
when the primary replica is permanently offline.
● Monitor Synchronization: Regularly monitor the synchronization state of your replicas to ensure that
secondary replicas are always synchronized and capable of performing a failover without data loss. Use
this query to check for unsent logs:
● Restore from Backup: If data loss occurs, you may need to restore from a recent backup to recover the
lost data.
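The check for unsent log mentioned in the monitoring step can be sketched as follows. This is one possible query, assuming it is run on the primary replica; the filter and ordering are illustrative choices.

```sql
-- Secondary databases that still have log waiting to be sent from the primary.
-- Any rows here represent data that would be lost in a forced failover.
SELECT
    ag.name                  AS AvailabilityGroupName,
    ar.replica_server_name   AS ReplicaName,
    DB_NAME(drs.database_id) AS DatabaseName,
    drs.synchronization_state_desc,
    drs.log_send_queue_size, -- KB of log not yet sent
    drs.last_sent_time,
    drs.last_hardened_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag
    ON drs.group_id = ag.group_id
WHERE drs.is_local = 0       -- rows describing remote (secondary) replicas
  AND drs.log_send_queue_size > 0
ORDER BY drs.log_send_queue_size DESC;
```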
7. Availability Group Replica in "NOT SYNCHRONIZED" State
Issue: One or more replicas in the Availability Group remain in a "NOT SYNCHRONIZED" state,
preventing automatic failover and data protection.
Causes:
Resolution:
● Check Network Latency: Monitor the network traffic between the replicas and ensure that there are no
high latencies or dropped packets. Use tools like Ping and Tracert to diagnose connectivity issues.
● Transaction Log Backups: Ensure that regular transaction log backups are occurring on the primary
replica to keep the log size manageable.
● Disk Performance on Secondary: If the secondary replica is experiencing I/O bottlenecks, consider
upgrading the disks or optimizing the I/O path.
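Before working through the checks above, it can help to see which databases are affected and whether data movement has been suspended. The query below is a minimal sketch, assuming it is run on the primary replica; the database name in the commented ALTER statement is hypothetical.

```sql
-- Databases that are not fully synchronized, with suspend and connection details.
-- Note: SYNCHRONIZING is the normal healthy state for asynchronous-commit replicas.
SELECT
    ar.replica_server_name   AS ReplicaName,
    DB_NAME(drs.database_id) AS DatabaseName,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.is_suspended,
    drs.suspend_reason_desc,
    ars.connected_state_desc,
    ars.last_connect_error_description
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON drs.replica_id = ars.replica_id
WHERE drs.synchronization_state_desc <> N'SYNCHRONIZED'
ORDER BY ar.replica_server_name, DatabaseName;

-- If is_suspended = 1, data movement can be resumed on the affected replica
-- (hypothetical database name):
-- ALTER DATABASE [MyDatabase] SET HADR RESUME;
```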