Understanding Disaster Recovery and What Should be in your DRP

Written by John Riddoch | Jul 27, 2021 4:28:26 PM

To state the obvious, technology is central to almost every aspect of our personal and professional lives. We rely on it for everything from doing our jobs to buying groceries and communicating with friends and family. We can’t seem to function without it.

So, ask yourself, what would you do if your critical business systems crashed, went offline, or disappeared? Would your team, customers, or service users be supported and able to continue? What impact would it have on them individually or as a business, and as a result, how would your organisation be affected?

Accidents happen; they are part of life. But when an organisation decides a Disaster Recovery Plan isn’t a high priority, it suggests they have not paid attention to, or learned from, several recent and well-documented events, including the OVH Data Centre story in France, the Fastly hosting outage, and the British Airways blackout in 2017 (which we discuss in more detail below).

Yet FEMA reported this year that one in five companies still don’t have a disaster recovery plan, and that between 40% and 60% of smaller businesses never reopen after a disaster.

Whether an outage is the result of something high-profile, such as a cyber-attack or a natural disaster, or something as simple as a power cut or human error, without the appropriate Disaster Recovery procedures in place the least you will experience is a halt to business operations. More likely, you will need to deal with consequences beyond a disruption to service, such as data loss, loss of revenue, fines, unsympathetic customers, and reputational damage.

Wherever you, or your organisation, are on your journey to implementing or optimising your Disaster Recovery Procedures, this article looks to answer a few common questions:

What is Disaster Recovery?
The Cost of IT Disasters
What is a Disaster Recovery Plan and Why Should it be a Priority?
What Should be in a Disaster Recovery Plan (DRP)?

What is Disaster Recovery?

The main goal of Disaster Recovery is to restore all critical business support systems with minimum downtime. In short, it enables an organisation to regain use of all critical IT infrastructure and systems after a disaster has occurred, as quickly as possible.

Disaster Recovery solutions have evolved over the last decade. Combining enhancements in cloud technology, processing speed, and bandwidth, leading solutions are capable of regaining vital systems in hours, and in some cases minutes, through a set of defined tools, policies and procedures allowing an organisation to respond, recover and progress from any setback.

The Cost of IT Disasters

Reports suggest that IT downtime is costing businesses an average of £3.6 million annually, with technical faults costing an average of £4,300 per minute, or £258,000 per hour, depending on the organisation’s size.
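
As a rough illustration of how quickly these figures add up, the sketch below estimates the cost of an outage from its duration. It assumes the £4,300 figure is a per-minute cost (which is consistent with £258,000 per hour, since 4,300 × 60 = 258,000). The function name and the example durations are illustrative only, and real costs will vary by organisation.

```python
# Assumption: the reported average of £4,300 is read as a per-minute cost.
COST_PER_MINUTE_GBP = 4_300


def estimate_downtime_cost(minutes: float,
                           cost_per_minute: float = COST_PER_MINUTE_GBP) -> float:
    """Return a rough outage cost in pounds for a given duration in minutes."""
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    print(estimate_downtime_cost(60))  # a one-hour outage
    print(estimate_downtime_cost(90))  # a 90-minute outage
```

Replacing the default with your own organisation’s cost per minute gives a first estimate of what a given recovery time would cost.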

British Airways are a prime example of a company who have suffered such a disaster.

A power surge at one of BA’s IT hubs lasted only 15 minutes but caused chaos by grounding all flights globally, costing the company £150 million in compensation and millions in PR damage.

Certainly, not every business has the size of infrastructure or commercial weight of British Airways. But organisations of all sizes, including Public Sector, should learn lessons from the processes and fallout of BA’s very public humiliation.

A Scottish Government report highlighted that organisations, especially in the public sector, have an over-reliance on data centres for their security needs.

Yet even data centres trusted by hundreds of thousands of organisations to protect their most sensitive data are susceptible to threats or unplanned downtime. Three-quarters of data centres have experienced a service outage in the last 3 years, with 3 in 10 experiencing an outage that had a significant impact on their clients and their clients’ users.

So, it is imperative that, regardless of where and how your systems and data are stored or hosted, you have taken the actions internally and with your suppliers to remove any potential gaps in the process should the worst happen.

What is a Disaster Recovery Plan and Why Should it be a Priority?

A Disaster Recovery Plan does exactly what the name suggests. It is a defined plan of action detailing the procedures an organisation should follow, and the tools it should use, during and after a disaster to get back to normal.

Its sole purpose is to help protect the organisation by minimising delays and downtime, improving security, and avoiding potentially damaging last-second decision making under pressure.

In case it is not obvious, the reason a DRP should be a priority for organisations is that our dependency on technology only seems to be heading in one direction. As with all things in life, the more we rely on something, the bigger the consequences are when issues arise.

In addition to the day-to-day risks of power cuts, or natural disasters, as the world makes its way out of the Covid-19 pandemic, one thing which seems guaranteed is that a full-time return to the office for all staff is unlikely.

Whilst on one hand this new approach brings opportunities for organisations to reduce costs and improve work-life balance, it introduces several new complexities and dependencies in their IT landscape. With that comes risk, most notably in cyber security: the threat of business systems being impacted or disabled as a result of a malware attack.

In March 2021, the UK government reported that 39% of businesses had experienced a cyber-attack in the previous 12 months, with medium and large companies being targeted more often, at 65% and 64% respectively.

The lack of a sufficient DRP is a contributing cause of business failure, and the speed at which an organisation can respond to disasters depends on how prepared it is.

What Should be in a Disaster Recovery Plan?

Disaster Recovery focuses on restoring all critical systems with minimum downtime, ensuring organisations can adhere to and fulfil the policies they are committed to, through a set of tools and procedures.

As with all challenges, the first key to addressing this one is understanding what the challenge is. When creating a Disaster Recovery Plan, it is important to first define what you need to protect, and why. As a starter, we would suggest reviewing and understanding your policies. Thereafter, it is a case of defining the tools and procedures you require to get things back to where they should be.

Policies:

What is your organisation responsible for and, as a result, what do you absolutely need to prevent in the case of a ‘disaster’? This is something each organisation needs to understand in relation to its own Terms and Conditions, Service Level Agreements, and contracts, but two of the most common areas to consider are operations and security.

Operationally, what have you committed to for your users to ensure they can rely on your solution (guaranteed uptime)? As a result, how quickly do you need to be back online? What are the consequences of failing to meet this level of your agreement?

Normally, users would be entitled to consider financial compensation claims if there is a significant disruption to their service. If, however, the problem persists and downtime exceeds an ‘acceptable’ recovery period, could your organisation be liable for a breach of contract, with potentially much more serious consequences in terms of financial penalties and damage to your reputation?
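
To make the link between a guaranteed uptime figure and an acceptable recovery period concrete, here is a minimal sketch that converts an uptime commitment into a downtime budget. The 99.9% figure, the 30-day period, and the function names are assumptions chosen for illustration; your own agreements will define the actual terms.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in a period by an uptime commitment."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100 - uptime_percent) / 100


def breaches_commitment(outage_minutes: float,
                        uptime_percent: float,
                        period_days: int = 30) -> bool:
    """True if a single outage alone exceeds the period's downtime budget."""
    return outage_minutes > allowed_downtime_minutes(uptime_percent, period_days)


if __name__ == "__main__":
    # Hypothetical commitment: 99.9% uptime over a 30-day period
    print(round(allowed_downtime_minutes(99.9), 1))
    print(breaches_commitment(60, 99.9))
```

Under these assumed terms the budget is roughly 43 minutes per 30 days, so a one-hour outage on its own would breach the commitment. That is the kind of figure your recovery time needs to be measured against.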

From a security perspective, what policies are in place, and does your organisation adhere to them, in relation to end-user privacy agreements, data storage and access? Whilst this is not necessarily a sole focus for disaster recovery (inasmuch as it relates to your security and backup policies), it should be considered when designing your DRP to ensure you have covered all bases properly.

Tools:

By clearly defining your policies, you will identify which systems are critical and what level of responsiveness is required. From there, you need to implement the right technology to ensure normal service can continue, or resume, as quickly as possible in the event of a disaster.
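
One way to record the outcome of that exercise is a simple inventory of systems with their criticality and required responsiveness. The sketch below is illustrative only: the class, the field names, and the example systems and targets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRequirement:
    name: str
    critical: bool
    max_downtime_minutes: int  # how quickly the system must be back online


def recovery_order(systems: list[SystemRequirement]) -> list[SystemRequirement]:
    """Critical systems first, then by the tightest downtime target."""
    return sorted(systems, key=lambda s: (not s.critical, s.max_downtime_minutes))


# Hypothetical inventory, for illustration only
inventory = [
    SystemRequirement("intranet wiki", critical=False, max_downtime_minutes=2880),
    SystemRequirement("customer portal", critical=True, max_downtime_minutes=60),
    SystemRequirement("payments service", critical=True, max_downtime_minutes=15),
]

if __name__ == "__main__":
    for system in recovery_order(inventory):
        print(system.name, system.max_downtime_minutes)
```

Ordering the list this way shows at a glance which systems your technology choice has to bring back first.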

Historically, the most effective method of disaster recovery, resulting in minimal downtime, was to ensure your entire environment was duplicated using synchronously mirrored technology. This means that your entire set-up is effectively running live somewhere else, ready to become the primary set-up at any point.

Sure, this is effective in terms of providing a quick and robust recovery. However, it can be extremely costly, as it essentially doubles every IT-related cost you have, from licensing and hardware costs right down to hosting and power usage – all just in case something goes wrong. It’s like buying two of the same car and having one just sitting in the garage in case the first one breaks down one day.

(Actually, to make that analogy work fully, you would also have to be draining the fuel of the second car and re-filling it every time you topped up the primary car, just to reflect the ongoing costs).

As cloud technology, processing power and bandwidth have all improved, CloudEndure (an AWS solution) has evolved to provide enterprise-grade disaster recovery at a fraction of the cost, by removing the need for a completely duplicate environment.

Rather than running a duplicate environment, CloudEndure continuously replicates your entire environment (systems and applications), using a process called ‘continuous block-level replication’, and stores the copy in a low-cost, cloud-based staging area, ready to be booted up immediately in the event of a disaster. Customers are then charged a significantly lower amount for the staging area (compared to a duplicated environment) and are only charged additionally, on a usage-based tariff, should they experience a disaster and need to increase their resource whilst they get the primary systems back to where they should be.

So, if you are looking for a solution which provides the level of cover you require but are interested in reducing your costs and operational footprint, speak to us today to arrange a proof of concept for your organisation.

Procedures:

OK, so you know what systems and technology you need to keep running following a disaster, and you have a market-leading solution to help you do so. All you need now is to ensure you have two sets of procedures in place to protect your organisation’s technology: procedures to follow should a disaster occur, and procedures to follow regularly to ensure your DR solution is continuing to work as it should.

This is where working with an expert partner can add real value to your approach. Having a provider in your corner who understands the technology you are implementing better than anyone, and who can bring years of experience in implementation and training, will help you to draw up procedures optimised for both the technology you are using and the specific requirements of your business.

Once your solution is implemented, we would always recommend testing, including at least one ‘full disaster invocation’, followed by monthly, quarterly and annual checks of your DR solution, to varying degrees depending on your set-up and resources.
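
A recurring test schedule like this can be tracked with very little tooling. The sketch below is one possible approach, not a prescribed method: the interval lengths in days and the function name are assumptions, and what each check actually covers is left to your own set-up.

```python
from datetime import date, timedelta

# Assumed intervals for the monthly, quarterly and annual checks
CHECK_INTERVALS = {
    "monthly": timedelta(days=30),
    "quarterly": timedelta(days=91),
    "annual": timedelta(days=365),
}


def checks_due(last_run: dict[str, date], today: date) -> list[str]:
    """Return the checks whose interval has elapsed since they were last run.

    A check that has never been run is always reported as due.
    """
    due = []
    for name, interval in CHECK_INTERVALS.items():
        last = last_run.get(name)
        if last is None or today - last >= interval:
            due.append(name)
    return due


if __name__ == "__main__":
    # Hypothetical dates, for illustration only
    history = {
        "monthly": date(2021, 7, 1),
        "quarterly": date(2021, 6, 1),
        "annual": date(2021, 1, 1),
    }
    print(checks_due(history, today=date(2021, 8, 2)))
```

Running something like this on a schedule helps ensure a check is not quietly skipped when the team is busy.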

About Agenor

When considering DR, choose a partner who has a proven methodology and industry experience, and who understands the value of your business and the risks that any data failures, server corruptions and downtime can bring for your organisation.

Having delivered hundreds of successful business-critical technology projects for medium and large enterprises for over 15 years, Agenor, along with AWS CloudEndure, is ideally positioned to support organisations throughout both the public and private sectors in implementing a modern, cost-effective DR solution. From initial scoping through to proof of concept, design, testing, go-live and support, our expert consultants advise you at every stage of the journey to ensure your solution meets your requirements, now and in the future.
