
CrowdStrike CEO says 97% of Windows systems back online


Nearly one week after a bungled CrowdStrike software update triggered a massive global outage of Windows systems, 97% of impacted computers are back up and running.

An update from CrowdStrike CEO George Kurtz posted late Thursday claimed that “97 percent of Windows sensors are back online as of July 25” and thanked “tireless efforts of our customers, partners, and the dedication of our team at CrowdStrike.”

To the 3% of customers still scrambling to update systems and get back online, Kurtz said he was “deeply sorry.”

“To our customers still affected, please know we will not rest until we achieve full recovery. At CrowdStrike, our mission is to earn your trust by safeguarding your operations. I am deeply sorry for the disruption this outage has caused and personally apologize to everyone impacted. While I can’t promise perfection, I can promise a response that is focused, effective, and with a sense of urgency.”

In a LinkedIn post on his personal account, Kurtz added that automated recovery techniques, combined with “mobilizing all our resources to support our customers,” have helped speed the recovery of systems.

CrowdStrike's support page stated: “Using a week-over-week comparison, greater than 97 percent of Windows sensors are online as of July 24 at 5pm PT, compared to before the content update.”

BSOD tied to out-of-bounds memory read bug

Earlier in the day, the company posted a preliminary Post Incident Review (PDF) describing the root cause of the faulty Falcon sensor update.

“On July 19, 2024, at 04:09 UTC, a Rapid Response Content update for the Falcon sensor was published to Windows hosts running sensor version 7.11 and above. This update was to gather telemetry on new threat techniques observed by CrowdStrike, but triggered crashes (BSOD) on systems that were online between 04:09 and 05:27 UTC.”

The Post Incident Review went on to say crashes were tied to “a defect in the Rapid Response Content, which went undetected during validation checks.” It stated that when updated content was loaded by the Falcon sensor, “this caused an out-of-bounds memory read, leading to Windows crashes (BSOD).”
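To illustrate the class of bug described, the following is a minimal sketch of a bounds check performed before content is used to index into sensor inputs. It is not CrowdStrike's code: the `ContentRecord` type, the `field_indexes` attribute, and the `validate` and `evaluate` functions are all hypothetical names chosen for illustration, and the review quoted here does not describe the content format.

```python
from dataclasses import dataclass


@dataclass
class ContentRecord:
    """Hypothetical content record: each entry indexes into the sensor's inputs."""
    field_indexes: list[int]


def validate(record: ContentRecord, available_fields: int) -> list[int]:
    """Return every index that would read outside the input array."""
    return [i for i in record.field_indexes if not 0 <= i < available_fields]


def evaluate(record: ContentRecord, inputs: list[str]) -> list[str]:
    """Read the referenced inputs, refusing content that fails the bounds check."""
    bad = validate(record, len(inputs))
    if bad:
        # Reject the content instead of reading past the end of the inputs.
        raise ValueError(f"content references out-of-range fields: {bad}")
    return [inputs[i] for i in record.field_indexes]


inputs = ["proc", "path", "args"]
good = evaluate(ContentRecord([0, 2]), inputs)
rejected = validate(ContentRecord([0, 2, 5]), len(inputs))
```

In a memory-safe language the bad index raises an error. In kernel-mode code written in C or C++, the same unchecked read can crash the whole machine, which is why the check belongs both in pre-release validation and at load time.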

To ensure future updates aren’t met with the same catastrophic fate, CrowdStrike said it would improve Rapid Response Content testing by employing “local developer, content update and rollback, stress, fuzzing, fault injection, stability, and content interface testing.” It added that testing would include new validation checks to prevent similar issues.

In addition to the above, CrowdStrike said it would:

  • Enhance resilience and recoverability
  • Refine its deployment strategy
  • Boost third-party validation

CrowdStrike-Microsoft outage cost $5B to Fortune 500 firms

The faulty CrowdStrike update and ensuing Microsoft system outage carry an estimated price tag of $5 billion in direct losses to Fortune 500 companies alone, according to insurance firm Parametrix.

Key findings of the Parametrix report, titled “CrowdStrike’s Impact on the Fortune 500” (PDF), include:

  • About 25% of Fortune 500 companies experienced disruptions due to the CrowdStrike outage.
  • The most heavily impacted industries were Airlines, Healthcare, and Banking.
  • Insured losses are expected to fall between $0.54 billion and $1.08 billion, representing 10%-20% of the total financial loss.
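The insured-loss range can be cross-checked against the headline figure with simple arithmetic. The sketch below assumes the low end of the range corresponds to the 10% share and the high end to the 20% share; the variable names are illustrative only.

```python
# Figures from the Parametrix findings above, in billions of US dollars.
insured_low, insured_high = 0.54, 1.08
share_low, share_high = 0.10, 0.20

# Total loss implied by each end of the range: insured loss / insured share.
implied_total_low = round(insured_low / share_low, 2)
implied_total_high = round(insured_high / share_high, 2)

print(f"Implied total loss: ${implied_total_low:.2f}B and ${implied_total_high:.2f}B")
```

Both ends of the range imply a total of about $5.4 billion, which is in line with the roughly $5 billion in direct losses cited above.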

Dave Stapleton, CISO at ProcessUnity, said CrowdStrike may face liability and legal issues. “It still remains to be seen if customers will have the ability or desire to bring legal claims,” he said.

He added that concerns have been raised that CrowdStrike should not have released an update on a Friday. “[Any vendor] shouldn't release an update to entire global population at once, all releases require the full suite of tests and shouldn't assume an update is solid based on the success of a previous update,” he said.

Josh Lemon, director, managed detection and response for Uptycs, agreed with Stapleton, adding: "Based on what occurred last week, CrowdStrike appears to simply push updates to all customers at once which is fairly dangerous given the amount of customers they have."

Best practice, Lemon noted, is to push updates to a subset of customers on production systems. Next, monitor to make sure there are no adverse effects. Then push to another subset of customers and continue this process until everything in production is updated, he said.
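The staged approach Lemon describes can be sketched as a simple loop: deploy to a wave of hosts, check their health, and only then widen the rollout. This is a minimal illustration, not any vendor's deployment system. The `staged_rollout` function, its `deploy` and `is_healthy` callbacks, and the stage fractions are all assumptions made for the example.

```python
from typing import Callable, Iterable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Iterable[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Deploy in growing waves and halt at the first wave with an unhealthy host.

    Returns the hosts that received the update and whether the rollout completed.
    """
    updated: list[str] = []
    for fraction in stage_fractions:
        # Cumulative target for this stage; always advance by at least one host.
        target = min(len(hosts), max(len(updated) + 1, round(len(hosts) * fraction)))
        wave = list(hosts[len(updated):target])
        for host in wave:
            deploy(host)
        updated.extend(wave)
        # Monitor the wave before touching any further hosts.
        if not all(is_healthy(host) for host in wave):
            return updated, False
    return updated, len(updated) == len(hosts)


# Usage: ten hypothetical hosts, one of which crashes after the update.
fleet = [f"host-{i}" for i in range(10)]
log: list[str] = []
updated, completed = staged_rollout(
    fleet,
    stage_fractions=[0.1, 0.5, 1.0],
    deploy=log.append,
    is_healthy=lambda host: host != "host-3",
)
```

In this run the rollout stops after the second wave, so five hosts receive the faulty update rather than all ten. A real pipeline would also need a soak time between waves and a rollback path, which are omitted here.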

The three percenters' club

Meanwhile, on Thursday, Delta Air Lines said it is still mitigating broken systems tied to the outage. The company reported that while systems are coming back online, thousands of Delta flights have been canceled.

Mitigating issues tied to the CrowdStrike-Microsoft outage may be harder for some, explained Justin Endres, chief revenue officer at Seclore.

"There are a few reasons recovery is slow for some," Endres said. "For one, the manual intervention to rebuild these systems. Clearly those with a larger number of Windows machines will take longer."

Encryption may also present a challenge when it comes to remediating CrowdStrike-Microsoft devices, Endres said. "Organizations that have invested in security where they are encrypting their computers' hard drives will prove even more challenging to access the (Falcon sensor) file that needs to be deleted."

Separately, the Department of Transportation is investigating Delta over the way passengers were treated during the CrowdStrike-Microsoft outage.

Tom Spring, Editorial Director

Tom Spring is Editorial Director for SC Media and is based in Boston, MA. For two decades he has worked at national publications in the leadership roles of publisher at Threatpost, executive news editor at PCWorld/Macworld and technical editor at CRN. He is a seasoned cybersecurity reporter, editor and storyteller who always aims for truth and clarity.