Log Shipping or Always On as DR for a SQL Server Failover Cluster

https://dba.stackexchange.com/questions/199064

25-12-2020

Question

Which is easier to maintain as disaster recovery (DR) on a remote site for a 2-node SQL Server failover cluster?

Log Shipping
A single instance as an Always On Availability Group replica

Any suggestion will be highly appreciated.

Solution

My experience is that an AlwaysOn high availability group (HAG) is easier to maintain than log shipping. Let's set some variables for our scenario.

1.) Your version and edition of SQL Server is SQL Server 2016 Enterprise Edition. We are not on the latest SQL Server patch level. We need to apply a newly released service pack.

2.) The SQL Server database environment exists at a typical small to medium-sized company (the active/passive AlwaysOn build that you described in your question leads me to this conclusion). This company has a nightly or weekly maintenance window, but like most companies, its service level agreement (SLA) is either four or five nines of uptime (the database must be available either 99.99% or 99.999% of the time).
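To make those SLA numbers concrete, here is the yearly downtime budget each target leaves you. It is a small illustrative calculation that assumes a 365-day year; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability_pct: float) -> float:
    """Maximum yearly downtime, in minutes, allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.99, 99.999):
    print(f"{target}% uptime allows {downtime_budget_minutes(target):.2f} minutes of downtime per year")
```

Four nines leaves roughly 52.56 minutes a year and five nines roughly 5.26 minutes, which is why a failover that users barely notice matters so much here.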

Let's look at how each feature allows you, the DBA, to keep the I.T. manager happy by maintaining those uptime numbers.

An AlwaysOn HAG gives you the flexibility to fail over back and forth between your AlwaysOn replicas. My assumption is that the primary and secondary replicas are not in close physical proximity to each other; perhaps they are in different data centers or in different racks within a data center. Regardless, if I need to perform maintenance on the primary, I can change the availability mode from asynchronous to synchronous commit, wait for the secondary to report SYNCHRONIZED, fail over, perform my maintenance, fail back, and set the availability mode back to asynchronous (https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups). With a properly configured AlwaysOn HAG listener, the end user will experience little to no interruption, and you'll easily maintain that SLA.
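The maintenance sequence above can be sketched as a small runbook generator. This is a minimal illustration under stated assumptions, not a complete procedure: it assumes a two-replica AG whose replicas use manual failover mode, and the names AG1, NODE1 and NODE2 are hypothetical. The emitted `ALTER AVAILABILITY GROUP ... MODIFY REPLICA` and `ALTER AVAILABILITY GROUP ... FAILOVER` statements are standard T-SQL; the "wait" lines are placeholders for checking `sys.dm_hadr_database_replica_states`. Both replicas are switched because synchronous commit only takes effect when the primary and the secondary are both configured for it.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (server to run it on, T-SQL statement or manual action)


def set_mode(ag: str, replicas: List[str], mode: str) -> List[str]:
    """MODIFY REPLICA statements that set each replica's availability mode."""
    return [
        f"ALTER AVAILABILITY GROUP [{ag}] MODIFY REPLICA ON N'{r}' "
        f"WITH (AVAILABILITY_MODE = {mode});"
        for r in replicas
    ]


def planned_maintenance_runbook(ag: str, primary: str, secondary: str) -> List[Step]:
    both = [primary, secondary]
    steps: List[Step] = []
    # 1. Switch both replicas to synchronous commit (issued on the current primary).
    steps += [(primary, sql) for sql in set_mode(ag, both, "SYNCHRONOUS_COMMIT")]
    # 2. Do not fail over until every database reports SYNCHRONIZED.
    steps.append((primary, "-- wait until synchronization_state_desc = 'SYNCHRONIZED'"))
    # 3. A planned manual failover is issued on the target secondary.
    steps.append((secondary, f"ALTER AVAILABILITY GROUP [{ag}] FAILOVER;"))
    # 4. The old primary is now a secondary and can be patched.
    steps.append((primary, "-- perform maintenance (e.g. apply the service pack)"))
    # 5. Wait for resync, then fail back from the original primary.
    steps.append((secondary, "-- wait until synchronization_state_desc = 'SYNCHRONIZED'"))
    steps.append((primary, f"ALTER AVAILABILITY GROUP [{ag}] FAILOVER;"))
    # 6. Return both replicas to asynchronous commit (issued on the primary again).
    steps += [(primary, sql) for sql in set_mode(ag, both, "ASYNCHRONOUS_COMMIT")]
    return steps


for server, action in planned_maintenance_runbook("AG1", "NODE1", "NODE2"):
    print(f"[{server}] {action}")
```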

Now let's take a look at log shipping. Without doing a cut and paste from Microsoft Books Online, there are five steps that have to be performed manually before you can fail over. They can be found here: https://docs.microsoft.com/en-us/sql/database-engine/log-shipping/fail-over-to-a-log-shipping-secondary-sql-server .
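Applying the unapplied log backups in sequence is where a broken log chain bites during a failover. Below is a simplified sketch that orders pending backups and emits `RESTORE LOG ... WITH NORECOVERY` statements. It assumes each log backup's first LSN equals the previous backup's last LSN and that you know the LSN the secondary has restored through; in practice those values would come from msdb backup history or `RESTORE HEADERONLY`. The database and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogBackup:
    path: str        # hypothetical file on the secondary's copy destination
    first_lsn: int   # first LSN covered by this log backup
    last_lsn: int    # LSN where the next log backup should start


def restore_statements(db: str, backups: List[LogBackup], restored_through_lsn: int) -> List[str]:
    """Order the unapplied log backups and emit RESTORE LOG statements.

    Simplified chain rule: each backup must start exactly where the previous
    one (or the secondary's last restore) ended; any gap raises an error.
    """
    pending = sorted(
        (b for b in backups if b.last_lsn > restored_through_lsn),
        key=lambda b: b.first_lsn,
    )
    statements = []
    expected = restored_through_lsn
    for b in pending:
        if b.first_lsn != expected:
            raise ValueError(
                f"log chain broken at {b.path}: expected first LSN {expected}, found {b.first_lsn}"
            )
        statements.append(f"RESTORE LOG [{db}] FROM DISK = N'{b.path}' WITH NORECOVERY;")
        expected = b.last_lsn
    return statements


backups = [LogBackup("a.trn", 100, 200), LogBackup("c.trn", 300, 400), LogBackup("b.trn", 200, 300)]
for stmt in restore_statements("SalesDB", backups, restored_through_lsn=200):
    print(stmt)
```

Failing loudly on a gap is the point: a missing backup means the secondary cannot be brought current, and that is better discovered before the outage than during it.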

Considering that most incidents and/or maintenance windows occur after hours, would you rather have an AlwaysOn high availability group that you can easily fail over back and forth between replicas, or try to get the primary and secondary databases into sync by applying transaction log backups?

Other tips

Log shipping is by far easier to maintain than an Availability Group for the scenario you described.

The ability to fail over back and forth quickly using an AG is great for high availability, but the disaster recovery RTO (recovery time objective) is often much longer because, hey, it's a disaster. While the high availability goal might be to have it all back up in 60 seconds, the disaster recovery goal may be 60 minutes. Log shipping should be able to handle that easily.
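A back-of-the-envelope check of whether log shipping fits a 60-minute DR RTO: the time to restore whatever log backups are still pending, plus the time to recover the database and repoint applications. The inputs are hypothetical placeholders; substitute timings measured in your own environment.

```python
def estimated_dr_minutes(pending_log_backups: int, avg_restore_minutes: float,
                         recovery_and_redirect_minutes: float) -> float:
    """Rough time to bring a log shipping secondary online after a disaster."""
    return pending_log_backups * avg_restore_minutes + recovery_and_redirect_minutes


def meets_rto(estimate_minutes: float, rto_minutes: float) -> bool:
    return estimate_minutes <= rto_minutes


# Hypothetical inputs: 4 unapplied log backups at ~2.5 minutes each,
# plus 10 minutes to recover the database and repoint applications.
estimate = estimated_dr_minutes(4, 2.5, 10)
print(estimate, meets_rto(estimate, 60), meets_rto(estimate, 1))
```

With these numbers the estimate is 20 minutes: comfortably inside a 60-minute DR goal, and nowhere near a 60-second HA goal, which is the distinction this answer is drawing.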

Adding a remote, third AG node in asynchronous commit mode may itself be an easy task, but maintaining that can get tricky. Things I've seen go wrong with my own AGs include:

Failure to automatically seed the database on secondary nodes
Databases showing as "not synchronizing" or "restoring" after a failover
Failure to stop data movement before a planned failover resulting in nodes getting out of sync
SQL Server Service stuck as "Change Pending" during maintenance on an AG node
Databases can end up reverting: https://blogs.msdn.microsoft.com/alwaysonpro/2014/11/25/large-transaction-interrupted-by-failover-secondary-database-reports-reverting/
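Several of those failures show up as a secondary database that is suspended or not in its normal synchronization state. Below is a sketch of that check. It assumes rows shaped like the output of `sys.dm_hadr_database_replica_states` joined with `sys.availability_replicas` (for `availability_mode_desc`); the replica and database names are made up. It only covers secondary databases, and a synchronous replica that is briefly catching up (SYNCHRONIZING) would also be flagged.

```python
from typing import Dict, List

# Normal state of a *secondary* database for each availability mode.
HEALTHY_STATE = {
    "SYNCHRONOUS_COMMIT": "SYNCHRONIZED",
    "ASYNCHRONOUS_COMMIT": "SYNCHRONIZING",
}


def find_problems(rows: List[Dict]) -> List[str]:
    """Flag secondary databases that are suspended or not in their normal state."""
    problems = []
    for row in rows:
        where = f"{row['replica']}/{row['database']}"
        if row["is_suspended"]:
            problems.append(f"{where}: data movement suspended")
        elif row["synchronization_state_desc"] != HEALTHY_STATE[row["availability_mode_desc"]]:
            # e.g. NOT SYNCHRONIZING, REVERTING, INITIALIZING
            problems.append(f"{where}: {row['synchronization_state_desc']}")
    return problems


rows = [
    {"replica": "NODE2", "database": "SalesDB", "availability_mode_desc": "SYNCHRONOUS_COMMIT",
     "synchronization_state_desc": "SYNCHRONIZED", "is_suspended": False},
    {"replica": "DRNODE", "database": "SalesDB", "availability_mode_desc": "ASYNCHRONOUS_COMMIT",
     "synchronization_state_desc": "NOT SYNCHRONIZING", "is_suspended": False},
    {"replica": "DRNODE", "database": "HRDB", "availability_mode_desc": "ASYNCHRONOUS_COMMIT",
     "synchronization_state_desc": "SYNCHRONIZING", "is_suspended": True},
]
print(find_problems(rows))
```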

With Availability Groups, there are a lot more things that can go wrong and a lot more places (DMVs, error logs) to look for troubleshooting information: https://blogs.msdn.microsoft.com/sql_server_team/troubleshooting-high-hadr_sync_commit-wait-type-with-always-on-availability-groups/

Also, here are some issues related to just patching Availability Groups (granted, this list is a little older now): https://www.brentozar.com/archive/2015/02/patching-sql-server-availability-groups/

Oh, and there's the added complexity/setup/maintenance of Windows Server Failover Clusters with AGs, too. (Note: SQL Server 2017+ doesn't always require a WSFC.)

Consider not only the technology part of maintenance, but also the staffing. For full production coverage, you ought to have at least two DBAs with AG experience. That kind of experience is hard to find and commands a higher salary. Compare that to log shipping, where most DBA candidates will have that experience and there's no salary premium that comes with it.

In short, practically everything about Availability Groups is higher maintenance except the process of failing over to the DR node. If you have a tight RTO for your DR node, then you may have to go with async mirroring or even an AG. Likewise, if you have an experienced DBA team that can handle AGs going wrong, then maybe go that route. Otherwise, log shipping will be the easier technology to maintain.

AlwaysOn. If the application is configured properly for the listener, you can have a failover every hour and no one is going to care; in 99% of cases there are no complaints from users. I've also seen legacy apps that don't support it work almost flawlessly during a failover. And you can separate your databases into several groups, each with its own listener, and fail them over independently of each other. Or, if you have virtually no overhead on your VM environment, you can put a problem database into its own group so a failover won't affect other databases.
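Configuring an application properly for the listener usually includes retrying the connection while the listener moves to the new primary. Below is a generic retry-with-backoff sketch. `connect` is any zero-argument callable you supply, and the real exception type depends on your driver (pyodbc, for example, raises its own error classes rather than `ConnectionError`), so the exceptions to retry on are a parameter.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    first_delay_s: float = 2.0,
    backoff: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off while the listener moves."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last connection error
            sleep(first_delay_s * backoff ** attempt)  # 2 s, 4 s, 8 s, ...
    raise AssertionError("unreachable")


# Simulated failover: the first two attempts fail, the third succeeds.
calls = {"n": 0}
waits = []


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("listener not reachable during failover")
    return "connected"


print(connect_with_retry(flaky_connect, sleep=waits.append), waits)
```

The injectable `sleep` is only there so the behavior can be exercised without actually waiting.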

AlwaysOn also supports read-only secondaries. If your developers code a read-only connection into their apps, you can point applications at your secondary for SELECTs. It's a handy feature if your primary is on an overloaded hypervisor with other applications.
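A read-only connection is typically requested by adding `ApplicationIntent=ReadOnly` to a connection string that targets the listener. It is only routed to a secondary if read-only routing is configured on the AG. Below is a sketch that builds ODBC-style connection strings, assuming Microsoft ODBC Driver 17 for SQL Server and Windows authentication; the listener and database names are hypothetical.

```python
def listener_connection_string(listener: str, database: str,
                               read_only: bool = False, port: int = 1433) -> str:
    """Build an ODBC connection string that targets an AG listener."""
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",
        f"Server=tcp:{listener},{port}",
        f"Database={database}",
        "Trusted_Connection=yes",
        "MultiSubnetFailover=Yes",  # faster reconnects when the listener moves
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")  # eligible for read-only routing
    return ";".join(parts) + ";"


print(listener_connection_string("aglistener", "SalesDB"))
print(listener_connection_string("aglistener", "SalesDB", read_only=True))
```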

I haven't used mirroring in a few months, but from what I remember, if you failed over a single database, you needed to make manual changes to DNS.

The one thing I miss about mirroring is the GUI that told you how far behind it was if you had an issue. There were a few times we had replication issues I couldn't figure out, and then I looked at the mirroring admin tool and saw that some database was 50 GB or so behind in sending log data. It also kept a history. If you're working in an organization with a tight budget and no monitoring software, you can use the mirroring history as part of your troubleshooting or to answer management's questions about issues.
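A "how far behind" figure like that converts easily into a rough catch-up estimate: divide the unsent log by how much faster log is being sent than it is being generated. The rates below are hypothetical, and this is a simple model, not the formula the mirroring monitor itself uses.

```python
import math


def catch_up_seconds(unsent_log_kb: float, send_rate_kb_s: float, new_log_kb_s: float) -> float:
    """Time to drain the unsent log while new log keeps arriving."""
    net_drain = send_rate_kb_s - new_log_kb_s
    if net_drain <= 0:
        return math.inf  # the backlog never shrinks at these rates
    return unsent_log_kb / net_drain


# Hypothetical: 50 GB behind, sending 20 MB/s, new log generated at 4 MB/s.
seconds = catch_up_seconds(50 * 1024 * 1024, 20 * 1024, 4 * 1024)
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")
```

If the send rate does not exceed the rate of new log, the estimate is infinite. That is exactly the case where someone needs to be told.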

Log shipping is by far the easier to maintain and troubleshoot technology for DR. I've used both extensively and would much prefer log shipping.

Why?

Because with a little scripting you can easily automate failover procedures. Take a clean, planned failover: you take a tail-log backup, copy it over (using the agent jobs already in place), restore it (again with jobs that already exist), and restore with recovery.
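The planned failover described above can be scripted along these lines. This is a minimal sketch: the database, server, and file names are hypothetical, and the tail-log backup path must be readable from the secondary. The steps handled by the existing log shipping jobs are left as comments because job names vary by setup.

```python
from typing import List, Tuple


def planned_failover_steps(db: str, primary: str, secondary: str,
                           tail_backup: str) -> List[Tuple[str, str]]:
    """(server, action) pairs for a planned log shipping role switch."""
    return [
        # Tail-log backup; NORECOVERY leaves the old primary in RESTORING,
        # so no new transactions can land there after this point.
        (primary, f"BACKUP LOG [{db}] TO DISK = N'{tail_backup}' WITH NORECOVERY;"),
        # Let the existing copy and restore jobs apply every earlier log backup.
        (secondary, "-- run the log shipping copy and restore jobs until no backups are pending"),
        (secondary, f"RESTORE LOG [{db}] FROM DISK = N'{tail_backup}' WITH NORECOVERY;"),
        # Bring the secondary online as the new primary.
        (secondary, f"RESTORE DATABASE [{db}] WITH RECOVERY;"),
        # Stop the old jobs so they do not fail or alert against the new roles.
        (primary, "-- disable the log shipping backup job"),
        (secondary, "-- disable the log shipping copy and restore jobs"),
    ]


for server, action in planned_failover_steps("SalesDB", "NODE1", "DRNODE", "tail.trn"):
    print(f"[{server}] {action}")
```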

There is little that can go wrong with log shipping. And when it does it is easy to fix.

Sure, there is no fancy GUI to monitor log shipping, but monitoring it isn't hard, and out of the box there are agent jobs that check your log shipping jobs.

An AOAG can go down and stay down, and you won't know about it until after you need it, UNLESS you know how to set up the alerting. I know this because I inherited servers with AOAGs that had stopped syncing months before I even knew these servers existed.
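One alert that would catch an AG that quietly stopped syncing compares each secondary's `last_commit_time` (from `sys.dm_hadr_database_replica_states`) with the primary's. Below is a sketch of that comparison; the replica names, timestamps, and 5-minute threshold are hypothetical. On an idle database the lag stays near zero even if nothing is flowing, so pair this with a synchronization-state check.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def ag_lag_alerts(primary_last_commit: datetime,
                  secondary_last_commit: Dict[str, datetime],
                  max_lag: timedelta) -> List[str]:
    """Alert for every secondary whose last commit trails the primary by too much."""
    alerts = []
    for replica, commit_time in sorted(secondary_last_commit.items()):
        lag = primary_last_commit - commit_time
        if lag > max_lag:
            alerts.append(f"{replica} is {int(lag.total_seconds())} s behind the primary")
    return alerts


primary = datetime(2020, 12, 25, 12, 0, 0)
secondaries = {
    "NODE2": datetime(2020, 12, 25, 11, 59, 58),
    "DRNODE": datetime(2020, 12, 25, 11, 0, 0),
}
print(ag_lag_alerts(primary, secondaries, timedelta(minutes=5)))
```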

Log shipping sets up alert jobs; make sure your email alerts are working and, bam, it bothers you endlessly until you fix it.
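The built-in log shipping alerting compares the last backup and last restore times recorded in msdb against configurable thresholds. Below is a rough Python model of that threshold logic. It is not the actual procedure the alert job runs; the timestamps and thresholds are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def log_shipping_alerts(now: datetime,
                        last_backup: datetime, backup_threshold_min: int,
                        last_restore: datetime, restore_threshold_min: int) -> List[str]:
    """Return a message for each threshold that has been exceeded."""
    alerts = []
    backup_age = now - last_backup
    if backup_age > timedelta(minutes=backup_threshold_min):
        alerts.append(f"no log backup for {int(backup_age.total_seconds() // 60)} min "
                      f"(threshold {backup_threshold_min} min)")
    restore_age = now - last_restore
    if restore_age > timedelta(minutes=restore_threshold_min):
        alerts.append(f"no log restore for {int(restore_age.total_seconds() // 60)} min "
                      f"(threshold {restore_threshold_min} min)")
    return alerts


now = datetime(2020, 12, 25, 12, 0)
print(log_shipping_alerts(now, datetime(2020, 12, 25, 11, 50), 60,
                          datetime(2020, 12, 25, 10, 0), 45))
```

Because the alert fires on every run while a threshold is exceeded, it keeps nagging until the chain is healthy again, which is the behavior described above.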

I've used log shipping, mirroring, and AOAG, all for DR, and I much prefer log shipping.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange