The traditional concept of IT Disaster Recovery (DR), i.e. the solution where an organisation sets up an alternate site where servers, applications and data can be used in case the primary data centre burns down, floods, loses power or otherwise fails, needs to be re-thought completely due to two major developments.
The first one is Cloud computing, resulting in the IT DR responsibility seemingly being transferred to the shoulders of an external supplier. “We’ve outsourced our business continuity challenges to a Cloud vendor” is a popular comment. Don’t be fooled though. As with perhaps everything in life, any benefits usually come with a set of new challenges.
Whilst you may have picked a Cloud partner with ISO 27001 and/or related certifications, you are unlikely to have full control over their operating procedures, any changes in security practices between audits, their mergers and acquisitions, their staff background checking processes, any temporary skill gaps, any disgruntled employees they may have, exactly where on their systems your data resides, and whose data resides alongside it.
Additionally, many customers of Cloud vendors have ‘all their eggs in one basket’, storing their various data environments (e.g. production, test, development and DR) with the same Cloud vendor. This is not always the best choice if we consider the risk of your account being compromised or of that supplier’s systems/infrastructure going out of operation – which happens even to the best of them, as was demonstrated in 2021 when a leading customer management vendor went down for 6-8 hours, taking its clients with it. These were clients who had often stopped even worrying about having a Business Continuity Plan (BCP) including manual work-arounds, because “they had outsourced their BCP to the Cloud” – remember?
Without negating the upsides of Cloud solutions for BCP, one should be conscious of the aforementioned issues as well as further downsides, such as the relatively limited ability to customise the user interface (compared to in-house software). But possibly the biggest downside is the complete and utter reliance on network connectivity. Whilst in a pre-Cloud world your staff may have been able to continue working on local file and mail servers, now they are no longer able to even email the colleague sitting next to them if Internet connectivity is affected. Cloud can absolutely be an excellent choice, as long as the decision is made with all pros and cons in mind.
The second development that has changed the concept of IT DR entirely is the rise in information (including cyber) security threats. The traditional ‘primary site’ vs ‘backup site’ concept makes little sense if malware has worked its way into both environments. Further complicating this risk is not knowing how far the malware has travelled, which leads to reactions such as “let’s initially unplug all systems so we can investigate”. A fire, flood or power outage makes itself heard and seen in an obvious way, but with information security threats, part of the challenge comes with the inability to assess properly what has happened, which components are affected, how to remove the cause and when a patch may become available. Finding an expert cyber security consultancy partner quickly to assist in this process may also be a challenge, particularly in the case of a large-scale cyber attack, where it is likely you will not be the only one seeking their help.
In a nutshell, DR is not as predictable as in the past so having a solid BCP with initial/manual work-arounds and excellent communication procedures and tools is imperative – more so than in the past. However, BCPs and Cyber Incident Response Plans (CIRPs) often exist on paper, rather than actually being embedded across the organisation.
There’s too much focus on ticking boxes to please auditors or clients, too much paperwork, too much required effort to maintain such plans, too little hands-on implementation, too little buy-in, too little enthusiasm from staff, too little actual incident readiness, and too little effort put into preparing staff to think ‘on their feet’ when a disruptive incident occurs.
It affects entire organisations. Senior management ends up with a false sense of security that everything is covered with technical controls, that risks are managed well, and that staff are ready to act if a cyber-attack or other incident were to occur – and that is if management even understands that the broader workforce must play a part in identifying and reducing information security risks. In reality, only a few individuals, such as the BCP manager, the Chief Information Security Officer (CISO) and any IT (Security) staff, keep themselves familiar with the content of the plans and procedures or, even worse, are the only staff who even know these plans exist.
Even if organisation-wide awareness campaigns are occurring, non IT/Security/BCP staff are usually getting on with their normal business without understanding the context and how their daily work might incur risk. Until an immediate trigger occurs (e.g. a real-life cyber incident blocking their data, network or application access), they don’t even think about all the issues that could affect them. Often, information (including cyber) security and business recovery procedures only get written or refreshed for audit or other compliance related purposes. And if staff can avoid being involved, they usually will.
The problem actually starts much earlier than that. BCP managers, CISOs and IT Security staff tend to work in a solitary way, or mainly involve those in an organisation who work directly with them. At best, they may try to have some dialogue with senior management to provide confidence that the risks are managed and ensure the top can go to sleep at night.
It is often challenging to get buy-in, time and attention from middle management and the general workforce who are busy ‘doing their job’. And that’s where the ball stops rolling in many BCP and Cyber Incident Response Planning (CIRP) initiatives.
The result is that mountains of documentation may get produced (including detailed preventative and impact-reducing controls for a range of incidents such as ransomware, DDoS attacks, malware, phishing and social engineering), but these are often written very generically, e.g. using a standard template ‘downloaded off the Internet’.
More ‘fit for purpose’ style documents (including practical manual work-arounds) are preferable, but they are often invested in just once and then easily get out of date. If a real incident occurs, most staff are oblivious to the incident (or confused), thereby increasing the chance of worsening the impacts. They don’t know their role, what to look out for, what treatment options to activate and/or who has the authority to give them instructions. In a nutshell, they’re far from ready.
These problems stem from the following six mistakes…
Only the BCP manager, CISO, IT and/or IT Security staff are fully aware of the plans, and these individuals become ‘single points of success’ without the broader workforce being ready at any time for an incident. Little or no integration exists with broader incident management processes. Or worse, the plans have been written entirely by an external party who hasn’t aligned them with the organisation’s processes, structure, priorities and culture.
In addition to over-dependency on a few skilled internal individuals, there tends to be an over-reliance on (and over-confidence in) external recovery service providers and Cyber Incident Response (CIR) providers. Will their contractual promises and Service Level Agreements (SLAs) survive a substantial influx in demand for their services if many of their clients are affected by the same incident, such as an industry-wide ransomware attack or widespread flooding? Have you discussed with them how they might juggle their various clients’ needs for help and where you are on their priority list? Taking legal action to address their non-compliance and getting compensated weeks or months after the event won’t help you to maintain proper service levels and relationships with your own clients, or your reputation in the marketplace.
Complicated and jargon-filled procedures are sent by technical staff to business divisions, whose staff are expected to understand and adopt them without proper guidance. Staff within the divisions are often unclear about their role in the plans and the purpose of some of the treatment options (e.g. password change policies, phishing attack simulations, BCP exercises and staff training programs). This results in low uptake and attempts to circumvent certain controls, and eventually creates resistance amongst the broader workforce to helping keep the process alive.
Top management, whilst aware of the risks and the need to comply with relevant regulatory requirements, often doesn’t commit sufficient time to truly understand their own role in the processes, palms it off as an ‘IT thing’, isn’t equipped with the skills to actively guide middle management and general staff and doesn’t commit sufficient resources to embed awareness programmes across the organisation.
The CIRP and BCP are built as large documents, which are centrally managed by the BCP manager, the CISO and other Security staff, not regularly maintained and impractical in real incidents because relevant content is difficult to find. Version control (if any) may be impeded by only one person being able to edit the latest version at a time. And when the IT systems are deactivated as a precaution, the CIRP and BCP documents can’t be retrieved, as they sit on a system that is now unavailable.
Simulation tests are timed inconveniently, are repetitive in terms of the scenario, do not include sufficient business context/relevance and/or have a ‘pass/fail’ flavour, causing participants to try to look good in front of bosses rather than trying to find areas of the plan that need improving.
I have observed organisations spending hundreds of thousands of dollars on consultants, only to find they still make these six mistakes. The resulting problems recur every few years when the documents are out of date. Or sooner - and this is much worse - when a real-life flood, fire, data breach or other incident occurs and the plan (and other controls) don’t work - or nobody knows how to activate them.
Equipped with a short, sharp, dependable BCP and CIRP (integrated where possible, in terms of key decision-makers and related teams), your business will be in a far better position to respond confidently in an actual incident, protecting its brand and reputation, meeting its legal responsibilities, and ensuring the needs of its staff, clients and stakeholders are met. To achieve this, senior management needs to commit to these processes ‘all the way’.
The goal is for everyone to be able to sleep soundly at night knowing that, not only are good plans in place, but also that they are up to date, and that everyone knows what to do should an incident occur.
This article was first published in the 2022 Better Boards Conference Magazine.