theregister.com

Heterogeneous stacks, ransomware, and ITaaS: A DR nightmare

Comment Disaster recovery is getting tougher as IT estates sprawl across on-prem gear, public cloud, SaaS, and third-party ITaaS providers. And it's not floods or fires causing most outages anymore - ransomware now leads the pack, taking down systems faster than any natural disaster.

This makes one thing clearer: The more homogeneous and standardized your IT environment, the easier it is to recover from disasters – whatever their cause.

If you're running on your own x86-based servers, with applications deployed as virtual machines or containers, and your networking and storage are software-defined, then you've got a fighting chance. In the event of a failure, you can fail over to a secondary datacenter - either your own or a public cloud environment configured to replicate your infrastructure.

But success depends entirely on how rigorously you keep that environment synchronized: Replicating all changes in real time or near-real time, and regularly testing both failover and failback to ensure they work and meet recovery time objectives.
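As a rough illustration of what checking a drill against its objectives can look like, here is a minimal Python sketch. The `DrillResult` record, the `check_drill` helper, and every target and timing are hypothetical. The replication-lag limit is an added assumption standing in for "real time or near-real time". Nothing here is tied to a specific DR product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    """Outcome of one failover or failback test (hypothetical record)."""
    name: str
    recovery_time: timedelta    # time taken until service was restored
    replication_lag: timedelta  # age of the newest data on the secondary site


def check_drill(result: DrillResult, rto: timedelta, max_lag: timedelta) -> list[str]:
    """Return objective violations; an empty list means the drill passed."""
    problems = []
    if result.recovery_time > rto:
        problems.append(
            f"{result.name}: recovery took {result.recovery_time}, RTO is {rto}"
        )
    if result.replication_lag > max_lag:
        problems.append(
            f"{result.name}: replication lag {result.replication_lag}, limit is {max_lag}"
        )
    return problems


# Illustrative targets and drill timings, not real measurements.
RTO = timedelta(hours=4)
MAX_LAG = timedelta(minutes=15)
drills = [
    DrillResult("failover", timedelta(hours=2), timedelta(minutes=5)),
    DrillResult("failback", timedelta(hours=6), timedelta(minutes=5)),
]

violations = [p for d in drills for p in check_drill(d, RTO, MAX_LAG)]
for line in violations:
    print(line)
```

With these made-up numbers the failover drill passes and the failback drill misses its recovery time objective, which is exactly the kind of gap that only regular testing surfaces.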

The same principle applies for orgs that go all-in on public cloud and use a secondary deployment in a different region. The public cloud provider is responsible for maintaining and operating their infrastructure. If you don't fully trust a single provider's disaster recovery (DR) capabilities, you can establish a secondary site with a different cloud vendor for added protection.

ITaaS: Your DR plan ends where your supplier's begins

The further you depart from this idealized homogeneous environment, the more DR difficulties spiral, and the more of your estate is exposed when disaster strikes. There are two particular areas of concern: IT-as-a-Service (ITaaS) and ransomware. Combine the two and you can enter a world of hurt.

The ITaaS model poses a serious risk for disaster recovery, because you're effectively handing over your DR responsibilities to a third-party provider. As the name suggests, ITaaS is where you don't just rent cloud services or subscribe to software, you outsource whole IT functions to someone else, such as an outside IT services provider.

Organizations with ITaaS in their stack are especially vulnerable when outages strike, since they depend on downstream vendors who may claim to offer robust DR but often fall short when it matters. Without strict, contractually defined recovery requirements, you're entirely reliant on whatever internal plans that supplier says it has in place.

And when the disaster is malware, rather than a physical failure, the entire recovery process becomes more complex; it's just as much incident response as it is disaster recovery. The infrastructure may still be running but can't be trusted. Scans to detect compromised computers and corrupted data are a first step towards restoration, if you can trust the results, followed by wiping or replacing systems, reinstalling clean software, and restoring clean data from an immutable backup vault.
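One small piece of that process, choosing which backup to restore from, can be sketched in Python. This is a simplified illustration under stated assumptions: the `Backup` record, its `immutable` and `scan_clean` flags, and the estimated compromise time are all hypothetical inputs you would have to establish through your own incident response, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Backup:
    """Hypothetical catalog entry for one backup copy."""
    taken_at: datetime
    immutable: bool   # stored in a vault that cannot be altered or deleted
    scan_clean: bool  # a malware scan of this copy found nothing


def pick_restore_point(backups: list[Backup], compromised_at: datetime) -> Optional[Backup]:
    """Return the newest immutable, clean backup taken before the compromise."""
    candidates = [
        b for b in backups
        if b.immutable and b.scan_clean and b.taken_at < compromised_at
    ]
    return max(candidates, key=lambda b: b.taken_at, default=None)


# Illustrative data only.
compromised_at = datetime(2024, 6, 3, 9, 0)
backups = [
    Backup(datetime(2024, 6, 1, 2, 0), immutable=True, scan_clean=True),
    Backup(datetime(2024, 6, 2, 2, 0), immutable=True, scan_clean=True),
    Backup(datetime(2024, 6, 2, 14, 0), immutable=False, scan_clean=True),
    Backup(datetime(2024, 6, 3, 14, 0), immutable=True, scan_clean=False),
]

chosen = pick_restore_point(backups, compromised_at)
print(chosen.taken_at if chosen else "no trustworthy backup found")
```

The sketch deliberately returns nothing when no copy meets all three conditions, since restoring from a backup you cannot trust only reintroduces the problem.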

If you're using ITaaS, you're reliant on your provider to know what it's doing in this emergency, keep you in the loop as it reacts and repairs, and aid you in any cleanup you need to do across your environment. If you're running your own infra alone, the good news is you're more in control; the bad news is, you've got a lot more work to do.

These tasks are anything but straightforward. Even identifying how the malicious software got into your network and what kind it is, so that it can be tracked, stopped, and blocked in future, can be difficult, depending on your antivirus defenses. Sifting through and restoring good data from bad across petabytes or even exabytes of block, file, and object storage is tough without comprehensive, up-to-date, and immutable backups. That includes not just your on-prem workloads, but also data tied up in public cloud platforms and SaaS applications.

And, to re-emphasize an earlier point, if your ITaaS provider itself gets hit with malware, your own operations can take a major hit - collateral damage from their compromised systems.

The risk is even higher when you outsource critical apps and data to niche providers with little track record or resilience testing. Unlike established SaaS giants that have fended off attacks and coped with hardware failures for years, smaller SaaS vendors have yet to prove they can survive a disaster. The UK's NHS and US healthcare operators have learned that lesson the hard way, repeatedly.

Blood, bugs, and backups: How NHS found out the hard way

In June 2024, blood pathology services across two NHS regions in London were brought to a halt after Synnovis — the outsourced provider — was hit by the Qilin ransomware gang. The impact rippled through hospitals, delaying treatments and operations. About three weeks after the attack, more than 2,194 outpatient appointments and 1,134 elective procedures (aka operations) had been postponed.

Synnovis is a joint venture between NHS pathology services and SYNLAB, a German diagnostics giant with more than 27,000 employees. The firm had consolidated multiple local pathology systems under a unified Laboratory Information Management System (LIMS).

SYNLAB presented itself as continuity-conscious, with a CISO and a Security Operations Center in place at the time of the UK attack, yet its other European operations had already been hit by ransomware, including in Italy in April 2024.

The NHS suffered as a result of an attack on its ITaaS supplier. Any healthcare provider that outsources critical functions - whether blood pathology or insurance payment processing - faces similar risks.

For example, also in April last year, Octapharma Plasma was forced to temporarily close all of its more than 150 US plasma donation centers following a cyberattack, disrupting plasma collection and triggering canceled procedures across the country. Three months later, around 350 hospitals across Alabama, Florida, Georgia, and the Carolinas experienced blood supply disruptions after supplier OneBlood was hit by ransomware.

An American Hospital Association spokesperson summed up the growing concern: “We continue to strongly recommend that hospitals and health systems identify all of their life-critical and mission-critical third-party service and supply chain providers, and develop business and clinical continuity procedures and supply chain resiliency to sustain a loss of access to those critical services and supplies for 30 days or longer.”

Another US example: The UnitedHealth Group’s data processing firm Change Healthcare fell victim to a ransomware attack in February last year. Because Change Healthcare is a major processor handling 15 billion healthcare transactions annually, the breach disrupted insurance claims and payments for hundreds of thousands of doctors, hospitals, and pharmacies across the US.

While initial restoration efforts commenced within weeks, many healthcare providers grappled with financial and operational challenges for several months. UnitedHealth Group has provided billions in temporary financial assistance, and full restoration efforts are still ongoing.

The big lesson here applies to all ITaaS suppliers and customers — not just those in the health sector. Many ITaaS suppliers and their upstream customers have disaster recovery plans that are either inadequate or overly focused on physical incidents, rather than digital threats like ransomware or other forms of malware. If they exist at all, these plans are often rooted in traditional thinking: Hardware failure, datacenter outages, or natural disasters.

For suppliers, there's no substitute for robust, immutable backups, a plan for redeploying applications in a known-good, secure state, and tested recovery procedures that cover both physical and cyber incidents. Disaster recovery plans need to include detailed playbooks for both direct hits and collateral damage from upstream failures.

Customers, too, must account for third-party disasters in their own DR planning, especially when they rely on ITaaS for mission-critical operations. Yes, this costs money. And yes, it's frustrating when you've outsourced IT to save money. But ignoring it could cost a lot more.

That isn't to say that managing your own infra entirely by yourself makes you immune to attack or makes restoration a piece of cake. But at least you have full control and visibility, albeit while also having to implement all of the above yourself, and you know it has been implemented. ITaaS can bring benefits, but you may find out the hard way that your provider isn't as on the ball as you had hoped, so factor that into your DR plans – and/or talk to your provider.

Your DRaaS vendor might not cover your actual disaster

There are many disaster-recovery-as-a-service, or DRaaS, suppliers; Gartner's Peer Insights lists 63 of them.

As an example, Cohesity offers DR services with SiteContinuity for orchestration and FortKnox, an offsite immutable vault to guard against ransomware. Rubrik provides similar tools for fast recovery in hybrid environments.

HPE's Zerto protects VMware and Hyper-V workloads across on-prem, AWS, and Azure. It also claims support for some SaaS apps, including Active Directory, Microsoft 365, Dynamics 365, Power BI, and Google Workspace, but coverage depends on setup and may require extra configuration. If your stack falls outside that list, you'll need other options.

Veeam delivers DRaaS through cloud partners, but only for workloads already protected by Veeam and replicated to the provider's infrastructure. When ransomware hit Canada's Eastern Ontario Health Unit, its preconfigured Veeam setup had core systems back online in under two hours, helped by a streamlined VM environment and readily available cloud backups.

Three hard truths about DR your CIO won’t want to hear

The first lesson here is that the more homogeneous your IT environment is, the easier it is to protect against disasters. Every DRaaS supplier requires some degree of homogeneity.

Also, data protection vendors can only protect what they can back up, which is rarely everything. And in our experience, organizations serious about on-prem disaster recovery tend to migrate toward mainstream virtualized or containerized apps that are cloud-compatible by design, allowing them to run in the cloud if needed.

A second lesson is to take ransomware very, very seriously indeed. Assume that you and the suppliers in your IT supply chain will be hit, and make sure tested recovery plans and verified immutable backups are in place.

The third lesson is that, and sorry to bang on about this, you must police your ITaaS suppliers, require them to have valid DR plans, and validate their recovery procedures. Don't just trust, verify - and if you can't verify then don't trust. ®
