Building a Resilient World: Practical Automation Cybersecurity

The Critical Role of BCP and DR Planning in OT Cybersecurity: Lessons from the CrowdStrike Incident

Written by Vaibhav Malik | Aug 2, 2024 11:00:00 AM

In the wake of the recent global IT outage caused by a faulty CrowdStrike update, the importance of robust business continuity planning (BCP) and disaster recovery (DR) strategies in operational technology (OT) environments has never been more apparent. This incident is a stark reminder that even the most trusted cybersecurity solutions can falter, potentially halting critical industrial systems.

The CrowdStrike Incident: A Wake-Up Call for OT Cybersecurity

On July 19, 2024, a routine content update to CrowdStrike's Falcon sensor crashed the Windows hosts running it, triggering a cascading failure that affected businesses worldwide. The incident disrupted sectors including aviation, healthcare and manufacturing, demonstrating how far the consequences of a malfunctioning cybersecurity tool can reach in our interconnected industrial landscape.

For OT professionals, this event highlights several key points:

  1. The increasing convergence of IT and OT systems
  2. The potential for security tools to become single points of failure in industrial environments
  3. The critical need for comprehensive BCP and DR strategies tailored to OT environments

OT Cybersecurity: A Global Imperative

OT cybersecurity is essential to protecting critical infrastructure and supply chains. The CrowdStrike incident underscores this imperative, showing how a fault in a cybersecurity tool can have real-world impacts on industrial operations.

The Unique Challenges of OT Cybersecurity

Unlike traditional IT environments, OT systems often control physical processes in industries such as manufacturing, energy and utilities. This presents unique challenges:

  1. Safety-Critical Systems: In OT environments, cybersecurity incidents can lead to physical safety hazards, environmental damage or loss of life.
  2. Legacy Systems: Many industrial systems use older technologies that weren't designed with cybersecurity in mind, making them vulnerable to modern threats.
  3. Continuous Operations: Many OT systems require 24/7 operation, making it challenging to implement updates or security measures without disrupting critical processes.
  4. Complex Dependencies: Industrial automation often involves intricate networks of interconnected systems, where a failure in one component can have ripple effects throughout the entire operation.

BCP and DR in OT: More Than Just IT Recovery

In OT environments, business continuity planning and disaster recovery are not merely about restoring data or systems — they are fundamental to maintaining operational integrity, safety and regulatory compliance. Here's why:

Ensuring Operational Continuity

OT systems often control critical infrastructure and essential services. A cybersecurity incident that takes these systems offline can result in significant societal impacts, financial losses and reputational damage.

Maintaining Safety Standards

In industries like oil and gas, chemical processing or nuclear power, system failures can lead to catastrophic safety hazards. Robust BCP and DR plans are essential to ensure rapid recovery and maintain safe operations.

Regulatory Compliance

Many industries that rely on OT systems are subject to strict regulatory requirements. These requirements often mandate BCP and DR plans so that operators can demonstrate resilience and recover quickly from cyber incidents.

Key Components of Effective BCP and DR for OT Cybersecurity

To build resilience against incidents like the CrowdStrike outage, OT cybersecurity professionals should focus on:

Comprehensive Risk Assessment

  • Identify critical OT systems and their dependencies
  • Assess the potential impact of various failure scenarios, including security tool malfunctions
  • Regularly update risk assessments to account for new technologies and threats in the OT landscape
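
One way to make the dependency and impact steps above concrete is to record dependencies as a simple map and walk it to see what a single failure would reach. The following is a minimal Python sketch under stated assumptions: the system names (scada_server, endpoint_agent and so on) and the DEPENDS_ON map are hypothetical examples, not an inventory from any real plant, and a real assessment would also weigh safety and process impact rather than reachability alone.

```python
from collections import deque

# Hypothetical dependency map (illustrative names only).
# Each key depends on every system listed in its value.
DEPENDS_ON = {
    "historian": ["scada_server"],
    "scada_server": ["endpoint_agent", "plc_line_1"],
    "hmi_station": ["endpoint_agent", "scada_server"],
    "plc_line_1": [],
    "endpoint_agent": [],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can look up "who depends on X".
    dependents = {}
    for system, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(system)

    # Breadth-first walk outward from the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    # Scenario: the endpoint security agent malfunctions.
    print(sorted(impacted_by("endpoint_agent", DEPENDS_ON)))
```

In this toy map, a failure of the security agent reaches the SCADA server, the HMI station and the historian, which is the kind of ripple effect a risk assessment should surface before an incident does.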

OT-Specific Redundancy and Diversity

  • Implement redundant systems for critical OT operations
  • Consider a multi-vendor approach for cybersecurity tools to avoid single points of failure
  • Ensure diversity in control systems and network paths
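
The redundancy and diversity checks above can be approximated with a small inventory review. This Python sketch assumes a hypothetical inventory of (function, instance, vendor) records; the function names, instance names and vendor labels are placeholders. It flags any function that has only one instance or that relies on a single vendor, which is a simplification of what a full architecture review would cover.

```python
from collections import defaultdict

# Hypothetical inventory: (function, instance, vendor). Illustrative only.
INVENTORY = [
    ("endpoint_protection", "hmi_station_1", "vendor_a"),
    ("endpoint_protection", "hmi_station_2", "vendor_a"),
    ("engineering_workstation", "ews_1", "vendor_a"),
    ("engineering_workstation", "ews_2", "vendor_b"),
    ("network_path", "ring_a", "vendor_c"),
]


def single_points_of_failure(inventory):
    """Map each weak function to the reasons it lacks redundancy or diversity."""
    instances = defaultdict(set)
    vendors = defaultdict(set)
    for function, instance, vendor in inventory:
        instances[function].add(instance)
        vendors[function].add(vendor)

    findings = {}
    for function in instances:
        reasons = []
        if len(instances[function]) < 2:
            reasons.append("no redundant asset")
        if len(vendors[function]) < 2:
            reasons.append("single vendor")
        if reasons:
            findings[function] = reasons
    return findings


if __name__ == "__main__":
    for function, reasons in single_points_of_failure(INVENTORY).items():
        print(f"{function}: {', '.join(reasons)}")
```

Here endpoint protection is flagged because both stations run the same vendor's tool, so one faulty update could take out both at once. Whether a second vendor is worth the added operational complexity remains a judgment call for each site.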

Regular Testing and Drills in OT Environments

  • Conduct frequent DR drills to test the effectiveness of recovery procedures in OT systems
  • Simulate various scenarios, including cybersecurity tool failures and their impact on industrial processes
  • Update plans based on lessons learned from these exercises, considering the unique aspects of OT environments

Offline Backups and Manual Overrides for Industrial Systems

  • Maintain offline backups of critical OT system configurations and data
  • Develop and maintain procedures for manual operation of key industrial systems
  • Ensure staff are trained in manual override procedures specific to OT environments
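
Offline backups only help if they are intact when needed. As a minimal sketch, assuming backups are stored as ordinary files in a directory, the Python below records a SHA-256 hash for each file when the backup is taken and later reports any file that is missing or has changed. The function names (build_manifest, verify_backup) and the idea of a separately stored manifest are illustrative assumptions, not a prescribed procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large backup images do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir):
    """Return {relative file path: SHA-256 hex digest} for every file found."""
    root = Path(backup_dir)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_backup(backup_dir, manifest):
    """Compare the backup against a previously recorded manifest."""
    current = build_manifest(backup_dir)
    problems = []
    for name, expected in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != expected:
            problems.append(f"changed: {name}")
    return problems
```

A typical use would be to call build_manifest when the backup is written, keep the result apart from the backup media, and run verify_backup during each DR drill; an empty list means every recorded file is present and unchanged.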

Incident Response Planning for OT

  • Develop clear communication protocols for various incident types in industrial settings
  • Establish a cross-functional incident response team that includes OT specialists
  • Create detailed playbooks for different failure scenarios in OT environments

Learning from the CrowdStrike Incident: OT Cybersecurity Perspective

The CrowdStrike outage offers valuable lessons for OT cybersecurity professionals:

  1. Don't Assume Infallibility: Even trusted security tools can fail. BCP and DR plans should account for this possibility in OT environments.
  2. Test Comprehensively in OT Contexts: Regular, thorough testing of BCP and DR plans is crucial, considering the unique aspects of industrial systems.
  3. Maintain Operational Flexibility: The ability to quickly switch to alternative systems or manual operations can be crucial during a cyber incident affecting OT.
  4. Collaborate Across Disciplines: Effective BCP and DR in OT require collaboration between OT, IT and cybersecurity teams.

Conclusion

The CrowdStrike incident serves as a potent reminder of the critical importance of robust BCP and DR planning in OT environments. The potential for widespread disruption grows as our industrial systems become increasingly interconnected and reliant on advanced cybersecurity tools.

By prioritizing comprehensive, well-tested BCP and DR strategies tailored to OT environments, and by relying on globally accepted standards and conformance programs, cybersecurity professionals can build resilience against unforeseen cyber incidents. Doing so maintains the safety, reliability and efficiency of industrial operations and contributes to the security of critical infrastructure globally.

OT cybersecurity is a global imperative. We can work toward a more secure and resilient industrial future by learning from incidents like the CrowdStrike outage and implementing robust BCP and DR strategies.
