COMMENTARY: When a cyber event hits, speed-to-recovery remains paramount. And now, with the ability to simultaneously retrieve clean data and rebuild applications, businesses can get back online more quickly, helping to mitigate the financial carnage, customer backlash, and even real-world harm that can result from an extended IT incident. But adopting these capabilities means breaking old habits.
Organizations that are not well-practiced at recovery and are hit with a large-scale incident typically report outages that can last 24 days or longer. The recovery process often gets split into four steps: get control of the situation; communicate with external stakeholders; analyze and recover the “right” data; and rebuild and relaunch the applications. Each step is critical and interdependent, crossing siloed processes and organizations and, in turn, adding considerable time to resolution.
Complicating this response: the reality that, in an attack, companies cannot trust their IT environments. By the time an organization discovers a breach, the hackers have had plenty of time to infiltrate the network, spreading corruption and wreaking havoc. Organizations must tread carefully. Otherwise, they risk reinfecting themselves. Under these circumstances, data, networks, credentials, core services, and infrastructure are all put under intense interrogation.
It's also what separates a general IT failure, such as the CrowdStrike incident, from a malicious cyberattack. While the CrowdStrike situation disrupted many businesses, it was driven by a reversible mistake rather than by destructive attackers. In a true cyberattack, companies should assume their data and environment have been compromised and must stand ready with an effective recovery readiness response.
Organizations should think of recovery like a medical emergency room. Each new “patient” needs a rapid assessment of its vital signs to determine the scope of the compromise, ensuring that critical cases are routed to specialists for deeper inspection and forensics. These experts require isolated and secure “operating rooms” to start rebuilding a company’s network after an attack.
And ultimately, these different “doctors” need to simultaneously work on interdependent tasks. This could include reestablishing core services, including directory services and access management, while analyzing the state and contents of recovery data. Combined, these improvements will shave valuable time off the recovery process, potentially saving the organization millions of dollars and limiting the reputational fall-out from extended downtime.
The initial response: 36 hours
Under a conventional approach, when a breach gets discovered, the company may declare and launch an emergency response that will last for days. Teams often put in long hours at the office, with a growing mound of pizza boxes and an all-hands-on-deck, 24/7 effort.
During the first session of the “emergency room,” the primary objective involves assembling facts, rather than speculating. Businesses must quickly understand the extent of the infected systems, the integrity of the recovery environment, immediate response actions required, and any early context about the scope of the compromise.
Once the team has the attack under control, it becomes a waiting game for test results that may help identify the “source of the infection” and the proper “inoculation process.” Much as mandated medical specialists are called in to review a case and issue a prognosis, cyber insurers may conduct their own reviews and inadvertently add delays for an organization. And with state-sponsored attacks, the federal government may do its own forensic analysis and add more steps to the process.
Only once these reviews are done can the team begin the more tedious task of reconstructing their applications. And while the specialists start to work on surgical repairs, the application teams are watchfully waiting as the urgency and stress factors start to hit critical levels.
The dog days of data recovery: one week
As the security teams continue their assessments, many will expect the recovery team to quickly produce digital clones of the affected systems. However, under the intense spotlight, any existing flaws in the process will be exposed, severely crippling the recovery efforts. For example, many organizations find they are missing important elements that would allow an application to fully recover. Or the backup copies are too far out of date. Or worse yet, the backup copies reside in a location infected by ransomware.
Fortunately, forward-thinking organizations proactively assess, expose, and remediate these risks by modernizing their plan and operations. The steps include ensuring there are timely, consolidated backups of all important data; establishing air-gapped separation of the backups from the operating network; and frequently testing and validating recovery procedures to make sure the team has muscle memory. These improvements help organizations more quickly surface clean and timely backup datasets, ultimately speeding the rebuilding process.
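As a rough illustration of the three steps above (timely backups, air-gapped separation, and regular recovery testing), a team could encode them as a simple readiness check. This is a minimal sketch, not any vendor's tooling: the `BackupCopy` record, the `readiness_gaps` function, and the 24-hour and 90-day thresholds are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupCopy:
    """Hypothetical record describing one backup copy of an application."""
    application: str
    taken_at: datetime                      # when the backup was captured
    air_gapped: bool                        # stored apart from the operating network
    last_restore_test: Optional[datetime]   # None if recovery was never tested


def readiness_gaps(
    copy: BackupCopy,
    now: datetime,
    max_backup_age: timedelta = timedelta(hours=24),   # assumed threshold
    max_test_age: timedelta = timedelta(days=90),      # assumed threshold
) -> List[str]:
    """Return the readiness problems found for one backup copy."""
    gaps = []
    if now - copy.taken_at > max_backup_age:
        gaps.append("backup is out of date")
    if not copy.air_gapped:
        gaps.append("backup is not air-gapped")
    if copy.last_restore_test is None or now - copy.last_restore_test > max_test_age:
        gaps.append("recovery procedure not recently tested")
    return gaps
```

Run regularly against every important application, a check like this surfaces the flaws before an incident does, rather than under the spotlight of a live recovery.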
The re-launch: the subsequent 14 days
Armed with clean data and validated recovery procedures when an incident occurs, the application teams can then move forward with rebuilding and relaunching the systems. But often, it’s not as simple as restoring the core building blocks of the application. In today's digital world, software and application services are intertwined, and many programs rely on other systems to run appropriately.
And so, companies must understand their digital operating web, then make sure the adjoining systems are recovered along with the core applications. Otherwise, while primary systems are ready to go back online, they are not operational until the ancillary programs are also running safely again.
With “cleanroom” technology available in a cloud-first environment, recovery teams can map out these different connections. They can then run simulated attacks and recoveries to pressure-test their response without adding substantial costs or injecting new IT risks. And so, when the inevitable breach happens, there’s a recipe for rebuilding not only the core systems but also the interconnected software.
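One way to picture that recipe is as a dependency map that yields a safe rebuild order. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the application names and the `DEPENDENCIES` map are hypothetical examples, and a real environment would need to discover these relationships rather than hard-code them.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running first.
DEPENDENCIES = {
    "directory-services": [],
    "access-management": ["directory-services"],
    "database": ["access-management"],
    "order-portal": ["database", "access-management"],
    "reporting": ["database"],
}


def recovery_order(dependencies):
    """Return a rebuild order in which every system follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def blocked_by(app, dependencies, recovered):
    """List the dependencies of `app` that have not been recovered yet."""
    return [dep for dep in dependencies.get(app, []) if dep not in recovered]


if __name__ == "__main__":
    print(recovery_order(DEPENDENCIES))
    print(blocked_by("order-portal", DEPENDENCIES, {"access-management"}))
```

In this example, core services such as directory services come first, and `blocked_by` shows why a restored primary system may still not be operational: its ancillary systems are not yet back.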
Condensing the timeline
When companies can react and recover data and applications simultaneously, they can significantly reduce the typical 24-day recovery timeline. Coordination, seamless handoffs, and well-practiced actions and reactions help companies leverage the common services of a cyber resilient platform to accelerate the clean outcome.
Meanwhile, with cleanroom technology, companies can quickly unpack their secure cloud backup copies to restore instances of the application or run a thorough scan of the data contents to expose and purge the corrupted data. It’s how a team can quickly analyze and sanitize the critical recovery point to accelerate the “clean” outcome.
To extend the medical analogy, no patient should be willing to jump on the operating table blindly and trust a rookie surgeon paging through a how-to book under candlelight to get it right on the first attempt. The same principle applies to successful cyber recovery: practice and experience produce successful outcomes.
Cleanrooms can serve as a secure, isolated cloud “operating room,” letting teams practice, improve, and learn from mistakes, ensuring they are ready for the real emergency when it inevitably comes. Better yet, when they hit the system’s off-switch, it’s not just powered down, it’s purged, ensuring a fresh, sanitized start each time without incurring an escalating medical bill for dark, idle resources.
Organizations should not simply put their trust in hope and candles to weather the event. Today, every business has become increasingly dependent on its digital infrastructure. When it’s knocked out, the disruption permeates across the organization’s employees, customers, supply-chain stakeholders, and shareholders. Stay ready and poised to react and recover.
Brian Brockway, global chief technology officer, Commvault
SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.