ICS Cybersecurity Resilience and the Importance of Remote Laboratory

Written by Deshabhushan Chougule | Feb 15, 2022 10:30:00 AM

Introduction

The digital transformation of industry is having a profound impact on industrial control systems (ICS). Improvements in cost and performance have encouraged the evolution of ICS by bringing IT and OT capabilities into existing systems, resulting in many of today’s “smart” systems, such as the smart electric grid, smart transportation, smart buildings, and Industry 4.0. Thanks to the intensive use of IT and OT at all levels, ICS now offer great flexibility, scalability, and connectivity.

However, these systems were originally designed to be isolated rather than connected to a corporate network or the Internet, so most of them lack security mechanisms to protect them against external attacks. Bringing IT/OT into such systems increases their connectivity, but at the same time their criticality creates a greater need for safety and security resilience.

This evolution has exposed ICS to a series of threats for which they are unprepared and has made them vulnerable to malicious attacks that compromise their security properties (e.g., integrity, confidentiality, authentication, or availability). At the same time, this evolution has extended ICS applications beyond traditional industry into sectors such as oil & gas, power generation and distribution, transport, health, and communications. Attacks on such facilities, especially those categorized as critical infrastructure, would have extremely serious consequences. Therefore, cybersecurity should be a priority in order to avoid incidents that interfere with operations and cause serious economic losses, compromise the safety of people, or cause environmental disasters.

Many cyber events go undetected or unreported. However, there have been notable attacks on ICS, such as the German steel mill attack in 2014, where hackers manipulated the control systems in such a way that a blast furnace could not be properly shut down, which resulted in massive damage. Following another cyber-attack, the multinational pharmaceutical giant Merck reported $385 million in direct financial losses in its 2017 annual report. In this context, cybersecurity is one of the most important aspects of ICS to take into consideration, and robust cybersecurity mechanisms are necessary.

This blog presents some practical and effective steps that ICS providers can take to improve resilience and business continuity in the event of a cyber incident. Since it is not possible to perform experiments on real control systems, it is necessary to rely on labs or testbeds. Most testbeds are research-oriented and simulate the actual process. The contents of a remote lab should be aligned with the standards and recommendations that are generally used in the field. System manufacturers, users, and integrators can draw on the most relevant standards, which define ICS security concepts and requirements.

Cybersecurity Standards for ICS

The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) have published the ISO/IEC 27000 standards on IT security techniques, covering information security management systems and their requirements. In 2017, the US Security Framework Adoption Study reported that 70% of IT organizations regarded NIST’s Cybersecurity Framework as the most popular best practice for IT security, but also reported that it requires significant investment.

There is a range of standards, regulations, and guidelines available in the ICS field. For guidance on how to secure ICS, there is the “Guide to Industrial Control System (ICS) Security” by the National Institute of Standards & Technology (NIST). Another useful document is “Cyber Resiliency Design Principles,” produced by MITRE Corporation, which provides a set of cyber resiliency design principles. However, one of the most prominent is the ANSI/ISA99 standard by the International Society of Automation (ISA). It is an international standard on “Industrial Automation & Control Systems Security,” which the IEC has further used in producing the multi-part ISA/IEC 62443 series.

ISA/IEC 62443 addresses the systems whose compromise can result in any of the following situations:

Endangerment of public or employee safety
Loss of public confidence
Violation of regulatory requirements
Loss of proprietary or confidential information
Economic loss and impact on national security

Beyond national and international recommendations, there are a few region- or sector-specific guidelines that security practitioners must follow. In Europe, government authorities have increased their involvement in ICS cybersecurity. In March 2013, the European Network & Information Security Agency (ENISA) published a study about ICS cybersecurity called “Protecting Industrial Control Systems - Recommendations for Europe,” which details the current situation and gives recommendations for improvement.

In the United States, government organizations are also very active, establishing a framework to assess cybersecurity in critical sectors. In 2016, the Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) published a report covering a total of 245 incidents, in which energy (32% of incidents) and critical manufacturing (27% of incidents) were the most affected sectors. Further, the North American Electric Reliability Corporation (NERC) has defined the Critical Infrastructure Protection (CIP) standards, which are designed to secure the assets required to operate North America’s bulk electric system.

Cybersecurity Resilience Plan for ICS

ICS cybersecurity defense across all industry sectors is inadequate. Unfortunately, the likelihood of a cyber-attack is difficult to estimate. We need a complete approach that covers the relevant factors, which can be categorized as follows:

Size of the Control System: The complexity of automation achieved in the ICS (e.g., whether it is a simple digital system or a distributed control system).
Hardware and Software Integration: The level of third-party hardware integration at the control floor, and the number of enterprise resource planning (ERP) software integrations at the plant floor.
Connectivity: The dependency on legacy fieldbus protocols such as Modbus TCP or Profinet, and the usage of the Internet (including cloud and mobile platforms).
Standardization: Company-wide standard processes and technologies replicate both their strengths and their weaknesses across systems.

In the case of software integration, ICS providers need some degree of trust in third-party original equipment manufacturers (OEMs), as it is necessary to keep the infrastructure up to date with anti-virus (AV) and security updates. However, AV and operating system (OS) patch updates can themselves be a prime target for malware (or a source of unintentional errors).

A real example is McAfee’s AV false positive detection with the 4715 DAT update, which incorrectly deleted different file types en masse (including Excel). As OEMs cannot test their updates against every ICS application, these risks can be managed by designing internal testing procedures and hosting cybersecurity services/support within the organization.

Key elements to consider in ICS Cybersecurity Resilience Programs

Security resilience categories range from “very long downtime with high recovery cost” (due to ineffective backup and recovery strategies, unhardened system designs, lack of firewall, etc.) to “short or no downtime with very low recovery cost” (by doing regular maintenance, controlling the applications and implementing IPsec). Standard procedures and policies are shown in Table 1.

Table 1: Standard procedures and policies

Below are four key areas and actions which are practical and effective for resilient systems:

System Architecture: Design the system architecture with built-in resilience so that it is easy to safeguard.
System Version/Update Management: Keep the system up to date with the latest version and remove obsolete components.
Regular Maintenance and Backup: Maintain the system regularly and improve its ability to recover from any disaster.
Dedicated Support and Resource: Retain standards against pressures of cost, constrained resourcing, and workflow.

A. System Architecture

There are a few things to consider while designing the control system architecture to safeguard it from cybersecurity attacks:

Make it Redundant: Minimize the downtime due to data loss or degraded performance. In critical plants, redundancy can be achieved in many forms using independent standalone systems (e.g., hot or cold standby control systems, automatic failover, etc.). Though designing for redundancy can provide a significant level of resilience against many non-cybersecurity related risks, using identical systems for redundancy can undermine the benefits, because the same malware is likely to affect both.
Make it More Diverse: Minimize the potential damage from a dominant malware attack that exploits common third-party software (e.g., operating systems, browsers, ERP solutions, etc.). This applies to all levels of the application, but especially the operating system; hence the suggestion is to use a range of different third-party software or software versions to host critical control systems (e.g., OS: Windows Server 2019 and 2016; browser: Edge and Internet Explorer 11; Office: 2019 and 2016).

Greater usage of common software creates greater vulnerability, whereas different software products present different vulnerabilities and have different patch cycles. Moreover, many attacks are not launched simultaneously across different platforms. There are some similarities in the threats, but not every OS is vulnerable to a common viral threat. This can be challenging for a few critical applications, like supervisory control and data acquisition (SCADA), which often support only a single OS, but the recommendation rests on the concept that diversity increases overall resilience.
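The diversity argument can be illustrated with a small calculation. The following is a minimal sketch, not a risk model: the host names and inventories are hypothetical, and it assumes the worst case in which a single OS-specific flaw reaches every host running that OS version.

```python
from collections import Counter

# Hypothetical inventories: host name -> OS version hosting a critical function.
uniform_plant = {
    "scada-1": "Server 2019",
    "scada-2": "Server 2019",
    "hmi-1": "Server 2019",
    "hmi-2": "Server 2019",
}
diverse_plant = {
    "scada-1": "Server 2019",
    "scada-2": "Server 2016",
    "hmi-1": "Server 2019",
    "hmi-2": "Server 2016",
}


def worst_case_exposure(inventory):
    """Return the largest fraction of hosts that share one OS version.

    Under the stated assumption, this is the share of the plant that a
    single OS-specific flaw could reach at once.
    """
    if not inventory:
        return 0.0
    counts = Counter(inventory.values())
    return max(counts.values()) / len(inventory)


print(worst_case_exposure(uniform_plant))  # 1.0
print(worst_case_exposure(diverse_plant))  # 0.5
```

In this toy inventory, splitting the hosts across two OS versions halves the share of the plant that one OS-specific threat can affect.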

B. System Version/Update Management

A certain degree of change is required to keep the system up to date (e.g., AV .dat files and software security patches, or system upgrades and replacement of obsolete components). These changes need to be managed so that they do not weaken the functioning of the system.

Sometimes the “fix” is the virus (e.g., McAfee’s Excel false positives). “Bad” .dat files can cause a mess through such false positive detections, and unqualified security patches can bring the control system to a stop.

Recommendations to minimize risks:

Do internal analysis/testing of .dat files and security updates before deployment.
Do not use automatic update tools, as some may accidentally break the control systems.
Keep software and hardware within their supported lifetime, as obsolete systems may contain vulnerabilities (a few of which cannot be rectified).
Use of the most recent software or hardware is also not advisable as it may not be sufficiently tested against control systems.
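The first recommendation, internal testing of .dat files and security updates, can be partly automated on a test node. The following is a minimal sketch, assuming a set of known-good reference files is kept on the test machine; the file names and the simulated "update" are hypothetical stand-ins. It fingerprints the files before and after an update and reports any that were deleted or altered, which is the kind of false positive damage described above.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(before, after):
    """Return (missing, changed) lists of relative paths between snapshots."""
    missing = sorted(path for path in before if path not in after)
    changed = sorted(
        path for path in before if path in after and before[path] != after[path]
    )
    return missing, changed


def demo():
    """Simulate an update that deletes one file and alters another."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical known-good reference files on the test node.
        (root / "report.xlsx").write_bytes(b"production figures")
        (root / "recipe.cfg").write_bytes(b"setpoint=42")
        (root / "hmi.exe").write_bytes(b"binary")
        before = snapshot(root)
        # Stand-in for applying the AV or OS update under test.
        (root / "report.xlsx").unlink()
        (root / "recipe.cfg").write_bytes(b"setpoint=0")
        after = snapshot(root)
        return compare(before, after)


missing, changed = demo()
print("missing:", missing)
print("changed:", changed)
```

An update would be released to production only if both lists come back empty on the test node; any entry is a reason to hold the update and investigate.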

C. Regular Maintenance and Backup

The ability to successfully recover from an attack is one of the most important aspects of resilience. An effective backup system can make the difference between downtime and not being able to recover. Virtual environments have brought many advantages in this regard, including failover replication.

The purpose of a backup is to provide a copy of the software that is sufficient to rebuild the system or function. In addition to regular, automated online/offline backups, it is good practice to periodically back up critical information to low-cost disposable/removable media that can be write-protected and physically relocated (e.g., Blu-ray). Some issues may go unnoticed for long periods of time, so it is important to maintain a deep history of backup data.
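One common way to keep a deep history without storing every backup forever is a tiered retention scheme. The following is a minimal sketch, assuming daily backups and hypothetical retention windows (7 days of dailies, 8 weeks of Sunday backups, roughly 24 months of first-of-month backups); the right windows depend on how long an issue could plausibly go unnoticed in a given plant.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today,
                    daily_days=7, weekly_weeks=8, monthly_months=24):
    """Return the set of backup dates to retain under a tiered policy.

    All window sizes are illustrative assumptions, not recommendations.
    """
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore dates in the future
        if age < daily_days:
            keep.add(d)  # recent dailies
        elif d.weekday() == 6 and age < weekly_weeks * 7:
            keep.add(d)  # Sunday backups inside the weekly window
        elif d.day == 1 and age < monthly_months * 31:
            keep.add(d)  # first-of-month backups for the deep history
    return keep


# Usage: 400 consecutive daily backups ending on a hypothetical "today".
today = date(2022, 2, 15)
history = [today - timedelta(days=i) for i in range(400)]
kept = backups_to_keep(history, today)
print(len(kept), "of", len(history), "backups retained")
```

The monthly tier is what preserves the ability to roll back to a state from before a long-unnoticed problem began, while the daily and weekly tiers keep recent recovery points close together.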

D. Dedicated Support and Resource

Maximum achievable resilience requires effective/relevant standards, processes, and resources. In many companies, it is a battle to retain standards against the pressures of cost, constrained resourcing, and workflow. Getting correct and immediate support is critical in cyber resilience as the cost of inadequately addressed cybersecurity will be extremely high.

Excessive use of third-party software or the acceptance of irrelevant resilience workflows can collectively and unnecessarily weaken cybersecurity defenses. However, the provision of diverse hardware, software, and applications will make it easier for customers to retain the system.

Recommendations to minimize risks:

Learn from the nature of security and integrate the relevant aspects into company standards.
Build cybersecurity collaboration with relevant third-party specialists and the supply chain to maximize defense.
Do not rely entirely on OEMs; instead, manage, test, and roll out security-related updates internally.
Some may prefer virtual environments for offline redundancy options. Ensure that low-cost, high-capacity removable storage is available.

The Importance of Remote Lab in ICS Cybersecurity Resilience Programs

This section presents a laboratory to perform cybersecurity tests remotely for the detection and analysis of vulnerabilities in ICS. In the United States, there is a large-scale testbed program (National SCADA Test Bed-NSTB) dedicated to control system cybersecurity assessment, standards improvement, and training. The proposed internal testbed includes software, controllers, field devices and communication technologies commonly used in real ICS. Automation can work with both real industrial equipment and simulations.

Let’s look at both the physical equipment and simulations used to build the testbed and the setup and tools used for vulnerability tests and for validating AV or security updates. The testbed must have effective backup and recovery strategies, system hardening with firewall exceptions, and an IPsec implementation. If required, user management and application control policies can also be put in place.

Figure 1: Example of a Remote Test Lab for ICS Cybersecurity

The above testbed provides the possibility to perform remote cybersecurity tests using:

ICS Server and Client: It contains the necessary software for the configuration of the SCADA, human machine interface (HMI), and programmable logic controller (PLC). The HMI is designed to control and monitor the physical systems wired to the industrial PLC, whereas SCADA systems are designed for monitoring and storage of the process variables.
OPC Server: An additional communication server through which the open platform communications (OPC) protocol can be implemented using a free tool developed by the Metricon group. It acts as a master, exchanging data with the device or OPC clients every second.
Controller with HART and Profibus Devices: An industrial PLC connected to analog and digital modules to simulate the real system. The devices communicate with PLCs designed for this purpose using HART and Profibus protocols.
Simulators for Modbus TCP, DNP3, IEC104, IEC61850, etc.: Simulation tools through which Modbus TCP, DNP3, IEC104, IEC61850, and other fieldbus protocols such as Profinet can be implemented using free tools developed by the Axon, Triangle Microworks, and Anybus groups.
System Hardening with Firewall Enabled at Plant and Control Network: Helps to limit incoming traffic to the PLC, HMI, and SCADA, guaranteeing that they cannot be reprogrammed or modified by unauthorized users or devices. It also blocks all outgoing traffic to isolate the testing environment.
Microsoft and Antivirus Security Update Nodes: Help to address security vulnerabilities and fully manage the distribution of updates released through the Microsoft, McAfee, or Symantec update servers to computers in the production environment. Microsoft updates are automatically synchronized with Windows Server Update Services (WSUS), whereas antivirus updates are automatically synchronized with McAfee ePolicy Orchestrator (ePO) or Symantec Endpoint Protection Manager (EPM), respectively.
Vulnerability Scanner: To reduce the attack surface and increase the cybersecurity of ICS, it is advisable to perform vulnerability assessments periodically. This type of analysis identifies the vulnerabilities in the system so that they can be understood and patched, and vulnerability scanners (such as OpenVAS or Nessus) are useful tools for this purpose.
Backup and Recovery Server: A file server used to store server images and backups of Microsoft-based operating systems. Acronis Cyber Backup provides data protection along with fast and reliable recovery of apps, systems, and data on any device from any incident.
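The firewall behavior described in the list above (limited incoming traffic, all outgoing traffic blocked) amounts to a default-deny allowlist. The following is a minimal sketch of that logic only; the addresses, networks, and rules are hypothetical, and a real deployment would express them in the firewall's own configuration rather than in Python. Port 502 is the standard Modbus TCP port and 3389 is the standard RDP port.

```python
from ipaddress import ip_address, ip_network

# Hypothetical inbound allowlist:
# (source network, destination host, destination TCP port)
ALLOWED_INBOUND = [
    # Engineering LAN may talk Modbus TCP to the PLC.
    (ip_network("10.10.1.0/24"), ip_address("10.10.2.10"), 502),
    # A single admin workstation may reach the HMI over RDP.
    (ip_network("10.10.1.5/32"), ip_address("10.10.2.20"), 3389),
]

# Empty outbound allowlist: every outgoing connection is denied,
# which keeps the test environment isolated.
ALLOWED_OUTBOUND = []


def is_allowed(src, dst, port, rules=ALLOWED_INBOUND):
    """Default-deny check: permit traffic only if some rule matches it."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    return any(
        src_ip in net and dst_ip == host and port == allowed_port
        for net, host, allowed_port in rules
    )


print(is_allowed("10.10.1.7", "10.10.2.10", 502))     # True
print(is_allowed("192.168.0.9", "10.10.2.10", 502))   # False
print(is_allowed("10.10.2.10", "8.8.8.8", 443, ALLOWED_OUTBOUND))  # False
```

Because anything not explicitly listed is rejected, adding a new device to the testbed requires a deliberate rule change, which is the property that prevents unauthorized reprogramming of the PLC, HMI, or SCADA nodes.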

Conclusion

The increasing use of information and communication technologies in ICS has exposed them to multiple threats for which they are unprepared, making them vulnerable to malicious attacks. In the ever-changing field of cybersecurity, companies need to manage the risks from an expanding attack surface.

This blog presents a few practical and effective measures that companies can take, together with existing standards and frameworks, to further increase cybersecurity resilience. An approach to experimentation in ICS cybersecurity, based on the replication of a simple ICS, is also proposed. The aim is to make resilience an easily defined part of ICS cybersecurity. To achieve this, remote laboratories can provide excellent support, allowing companies to consolidate their experimentation on the real equipment used in industry.
