On July 19, 2024, organizations around the world began to experience the “blue screen of death” in what would soon be considered one of the largest IT outages in history. Early rumors of a mass cyberattack were quickly quashed: it seemed a minor software update was to blame for countless shopping excursions cut short, airline flights grounded and critical surgeries postponed.
Nearly three weeks later, the world is still reeling from the faulty CrowdStrike update, and new details are emerging about what went wrong. On August 6, the company published an in-depth technical root cause analysis and acknowledged shortcomings in its software testing processes.
As the dust settles and we continue to learn more, here are three observations and lessons in digital resilience that every organization can take away from the incident:
1. Prepare for Your Worst Day
The global CrowdStrike outages highlighted the risks of vendor lock-in (over-reliance on any one vendor) and even led some organizations to question their cloud strategies altogether. This scrutiny is important, but it needs to be balanced with practicality.
Virtually every organization today relies on cloud services for some aspect of its business. Keeping “crown jewels” on-prem and distributing workloads across different providers can sometimes limit the blast radius of a failure, but these measures also add complexity and cost. All of these factors must be weighed carefully when building an antifragile organization with a solid IT infrastructure.
As security leaders, we must prepare our organizations to function with limited digital capacity in the event of an outage or service degradation. Anything can happen: in May 2024, an isolated, entirely accidental misconfiguration led Google Cloud to delete a customer account, causing two weeks of downtime for 647,000 users.
As the saying goes, plans are nothing; planning is everything. Evaluate existing disaster recovery and business continuity plans with fresh eyes. Run and stress-test playbooks regularly for a wide range of scenarios. Go through the entire exercise of bringing backups online to see what’s working and what isn’t. Then, do it all over again and again.
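To make that exercise concrete, here is a minimal sketch, in Python, of what an automated restore drill might look like. The "restore-tool" command, staging hostname and health-check URL are illustrative assumptions rather than any specific product's interface; the point is simply to restore a backup into an isolated environment, verify the service answers and record how long recovery actually took.

```python
# Hypothetical restore-drill sketch. The "restore-tool" command, hostnames and
# health-check URL are illustrative assumptions, not a specific product's API.
import datetime
import subprocess


def run_command(cmd: list[str]) -> bool:
    """Run a command and return True on success; a missing tool counts as a drill failure."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).returncode == 0
    except FileNotFoundError:
        return False


def run_restore_drill(backup_id: str, staging_host: str) -> dict:
    """Bring a backup online in an isolated staging environment and record what worked."""
    started = datetime.datetime.now(datetime.timezone.utc)
    results = {"backup_id": backup_id, "started": started.isoformat()}

    # 1. Restore the backup onto a staging host (placeholder command).
    results["restore_ok"] = run_command(
        ["restore-tool", "--backup", backup_id, "--target", staging_host]
    )

    # 2. Verify the restored service actually answers (placeholder health check).
    results["service_healthy"] = run_command(
        ["curl", "--fail", "--silent", f"https://{staging_host}/healthz"]
    )

    # 3. Record how long recovery took so RTO assumptions can be checked against reality.
    elapsed = datetime.datetime.now(datetime.timezone.utc) - started
    results["recovery_seconds"] = elapsed.total_seconds()
    return results


if __name__ == "__main__":
    print(run_restore_drill("nightly-2024-08-01", "dr-staging.example.internal"))
```

Scheduling a drill like this on a regular cadence, and comparing the recorded recovery times against your stated recovery time objectives, is one way to turn the playbook from a document into a habit.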
2. Ask Hard Questions
The CrowdStrike incident has prompted many organizations to examine their third-party dependencies to better understand how vendor outages could impact their operations. Now is also a good time to review critical vendor vetting processes for both existing and future partners. For instance, until last month, you may not have considered the importance of phased software updates. Does the vendor give you the option to roll out updates gradually, first validating a patch on a test server, then deploying it to a small pilot group of users, and to stop mid-way if an issue appears before it impacts the entire organization? How robust are the vendor’s secure development lifecycle and quality assurance processes? How do they test and validate their updates before sending them out into the world? What security certifications do they have to back up these claims?
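To make the phased-update question concrete, here is a minimal sketch, in Python, of what a ring-based rollout could look like. The ring names, host lists, soak time and health check are hypothetical stand-ins, not any vendor's actual update mechanism; the point is that each ring must prove healthy before the blast radius widens, and the rollout halts mid-way the moment it does not.

```python
# Hypothetical phased-rollout sketch. Ring names, host lists, soak time and the
# health check are illustrative assumptions, not any vendor's actual mechanism.
import time

# Deployment "rings": a test server first, then a small pilot group, then everyone else.
ROLLOUT_RINGS = [
    {"name": "test-server", "hosts": ["test-01"]},
    {"name": "pilot-group", "hosts": ["ws-101", "ws-102", "ws-103"]},
    {"name": "production", "hosts": ["ws-201", "ws-202", "ws-203", "ws-204"]},
]


def deploy_update(update_id: str, hosts: list[str]) -> None:
    """Placeholder for pushing the update to a set of hosts."""
    print(f"Deploying {update_id} to {len(hosts)} host(s): {hosts}")


def ring_is_healthy(hosts: list[str]) -> bool:
    """Placeholder health check; in practice this would query monitoring or EDR telemetry."""
    return True


def phased_rollout(update_id: str, soak_seconds: int = 5) -> bool:
    """Roll an update through each ring, halting immediately if any ring degrades."""
    for ring in ROLLOUT_RINGS:
        deploy_update(update_id, ring["hosts"])
        time.sleep(soak_seconds)  # soak period before widening the blast radius
        if not ring_is_healthy(ring["hosts"]):
            print(f"Halting rollout of {update_id}: ring '{ring['name']}' is unhealthy")
            return False
    print(f"Rollout of {update_id} completed across all rings")
    return True


if __name__ == "__main__":
    phased_rollout("content-update-2024-07-19")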
Asking the right questions is critical to building trust. The more customers know, the better they can prepare for the unknown. On the vendor side, clearly defined customer expectations can drive process and quality improvements and, ultimately, ensure more resilient systems.
3. Communicate Openly
Consistent, transparent communication is critical during a crisis. On July 19, in the early hours of the incident, CrowdStrike’s communications were tightly coordinated and centralized—no speculation or mixed messages to be found on social media. Company leadership quickly took responsibility, apologized and kept customers in the loop as they worked to remediate the problem. Despite widespread issues, people respected this transparent approach, and many customer organizations have publicly voiced their continued loyalty to CrowdStrike.
Organizations can apply these valuable crisis communications lessons to their own DR/BCP contingency planning efforts. How will security leaders keep business stakeholders apprised of an unfolding situation? Are the right communication channels in place to quickly mobilize internal teams and get systems back online? What are the best ways to keep customers and partners in the know? What is the corporate social media policy, and who is authorized to speak with members of the press during an incident?
There Will Be Another Black Swan
The CrowdStrike incident surfaced critical questions around software testing and update quality assurance that must be addressed. It also reinforces the inherent dangers of a technological world that, in the words of Thomas Friedman, “we’ve taken from connected to interconnected to interdependent.” This interdependency means that every organization will experience a black swan event at some point. It may come in the form of a critical vendor outage, a ransomware attack or something else. By embracing an “assume breach” mindset and continuously stress-testing contingency plans and processes, your team will be better prepared—mentally and operationally—to face a crisis, respond rapidly and emerge even stronger.
Omer Grossman is the global chief information officer at CyberArk. You can check out more content from Omer on CyberArk’s Security Matters | CIO Connections page.
Editor’s Note: For more insights from CyberArk CIO Omer Grossman on this topic and beyond, check out his appearance on CyberArk’s Trust Issues podcast episode, “Trust and Resilience in the Wake of CrowdStrike’s Black Swan.” The episode is available on most major podcast platforms.