CrowdStrike outage: lessons for operational resilience

Our insights, observations and key lessons from how firms responded to the CrowdStrike outage and their preparedness to respond to future incidents.

Since the beginning of 2023, we’ve seen a continued trend of third-party related incidents. Between 2022 and 2023, third-party related issues were the leading cause of operational incidents reported to us.

These outages emphasise firms’ increasing dependence on unregulated third parties to deliver important business services. This highlights the importance of firms continuing to become operationally resilient in line with our rules.

By March 2025, firms in scope of PS21/3: Building operational resilience[1] must make sure they can deliver important business services in severe but plausible scenarios, like the CrowdStrike outage, to help minimise the impact on consumers and markets.

Below, we outline our key lessons following this incident, including examples of how firms’ compliance with PS21/3 allowed them to respond effectively, and areas firms should strengthen.

We encourage all firms, regardless of how they were affected by the CrowdStrike incident, to consider these lessons, to improve their ability to respond to and recover from future disruptions.

Background to the outage

On 19 July 2024, CrowdStrike released a Falcon content update for Microsoft Windows hosts, with a defect that caused systems to crash. Many firms use CrowdStrike for device protection, threat intelligence and response services. CrowdStrike's core technology, the Falcon Platform, detects and responds to malicious threats.

As CrowdStrike is widely used, we saw varying degrees of operational impact on regulated firms, with no sector more impacted than others, and minimal consumer harm.

We engaged with firms during the incident to understand the impact on firms and the market, operational responses, and recovery. Following the restoration of services, we engaged with firms to better understand the lessons learnt.

Our general observations following the outage

By investing in operational resilience and following our operational resilience rules (PDF)[2], firms were able to identify consumer and market impacts and prioritise their important business services.

Firms that had mapped their important business services, and the resources necessary to deliver these services, were able to prioritise getting key services back online to reduce the overall impact the incident had on their operations.
Firms benefitted from having tested scenarios that were severe but plausible, including those impacting multiple important business services at the same time.
Firms that had clearly defined and tested communications strategies were able to respond to, and communicate with, customers and stakeholders quickly and efficiently.

Next steps

Firms should consider if their current testing scenarios are adequate and assure themselves that impact would be minimised during operational disruptions.

Detailed insights into how firms across the sector responded

Observations on ensuring the resilience of infrastructure

Firms reflected on the need to identify single points of failure within their infrastructure and technology stack, and to identify the changes, investment and actions needed to make these resilient.
Firms considered various ways to ensure resilience in their infrastructure, such as procuring systems on different builds and devices with different operating systems (OS). Some have also considered updating change management processes for third parties with deep-level system access.
Some firms identified the need to review change management processes for software and content updates.

Next steps

Firms should ensure adequate testing of updates before deployment and consider phasing releases across user groups to support containment of any failures.
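As an illustration only, phasing a release across user groups could be sketched as below. This is a minimal sketch under stated assumptions, not a prescribed approach: the ring sizes, the 2% failure threshold, and the `assign_ring`, `phased_rollout` and `deploy` names are all hypothetical. Each device is placed in a rollout ring using a stable hash, and the release stops before the next ring if too many devices in the current ring fail.

```python
import hashlib

# Hypothetical ring sizes: the share of devices that receives the update in each phase.
RING_SHARES = [0.01, 0.10, 0.39, 0.50]


def assign_ring(device_id, shares=RING_SHARES):
    """Map a device to a rollout ring deterministically, using a stable hash."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for ring, share in enumerate(shares):
        cumulative += share
        if fraction < cumulative:
            return ring
    # Guard against floating-point rounding in the cumulative sum.
    return len(shares) - 1


def phased_rollout(device_ids, deploy, max_failure_rate=0.02):
    """Deploy ring by ring and halt if a ring's failure rate is too high.

    `deploy` is a caller-supplied function (an assumption for this sketch)
    that returns True if the device is healthy after the update.
    Returns a tuple: (list of rings that passed, whether the rollout halted).
    """
    rings = {}
    for device_id in device_ids:
        rings.setdefault(assign_ring(device_id), []).append(device_id)

    completed = []
    for ring in sorted(rings):
        results = [deploy(device_id) for device_id in rings[ring]]
        failure_rate = results.count(False) / len(results)
        if failure_rate > max_failure_rate:
            # Contain the failure: later rings never receive the update.
            return completed, True
        completed.append(ring)
    return completed, False
```

In this sketch a defective update would affect only the devices in the first ring it reaches, because the rollout halts before any later ring is updated.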

Observations on third party management

We found that some regulated firms affected by the outage also provided services that supported other regulated firms' important business services. This increased the impact of the disruption.
Firms that had conducted detailed mapping of third and nth party relationships were able to quickly understand exposure and take mitigating actions to manage the impact.
Firms that had existing relationships and pathways to share information with third-party providers were able to respond more quickly during the outage.
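To illustrate how a map of third and nth party relationships can be used to assess exposure, the sketch below walks a dependency map to find which services rely, directly or indirectly, on a given provider. The map contents and the `all_dependencies` and `exposed_services` names are hypothetical and for illustration only; a real mapping would come from a firm's own records.

```python
from collections import deque

# Hypothetical dependency map: each key relies on the providers listed against it.
DEPENDS_ON = {
    "payments service": ["payment processor", "internal ledger"],
    "online banking": ["cloud host", "internal ledger"],
    "payment processor": ["endpoint security vendor"],
    "cloud host": ["endpoint security vendor"],
    "internal ledger": [],
}


def all_dependencies(node, depends_on):
    """Return every direct and indirect (nth party) dependency of `node`."""
    seen = set()
    queue = deque(depends_on.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # Already visited; this also protects against cycles.
        seen.add(current)
        queue.extend(depends_on.get(current, []))
    return seen


def exposed_services(provider, services, depends_on):
    """List the services that depend on `provider` at any depth."""
    return sorted(
        service
        for service in services
        if provider in all_dependencies(service, depends_on)
    )


if __name__ == "__main__":
    services = ["payments service", "online banking"]
    print(exposed_services("endpoint security vendor", services, DEPENDS_ON))
```

In this example neither service contracts with the endpoint security vendor directly, yet both are exposed through their own providers, which is the kind of indirect dependency that detailed mapping is intended to surface.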

Next steps

Firms may benefit from reviewing third-party management frameworks regularly, and after significant events or incidents, to improve the effectiveness of third-party risk controls. It may be useful to:
- Identify if changes may be required to your third-party categorisation, risk assessment and management processes, due to the potential or actual impact of the incident.
- Review vendors’ performance, service levels, contractual obligations, continuity arrangements and exit plans against your resilience requirements for the third parties, and remediate any gaps identified.
- Consider and understand interdependencies to help identify and limit the impact a disruption may cause.

Observations on incident response and communications

Firms reflected on the need to ensure staff and management are aware of, and familiar with, incident response and crisis management processes. Aligned to this, firms reflected on the importance of ensuring stakeholder contact details are updated and readily available (online and offline).
The timing and completeness of incident notifications to us varied widely across affected firms. Effective engagement was timely and clearly defined the impact of the incident on the firm’s important business services.
Firms that had pre-defined communication plans were able to use these to respond quickly.
Because the incident originated at a third party, firms whose contracts clearly set out responsibilities were able to receive information from affected third parties more effectively.

Next steps

Firms may consider making communications more efficient through pre-approved communication templates, preparation of service status pages, banners, or other communication formats accessible to stakeholders.
Firms may benefit from ensuring third-party contracts clearly set out responsibilities for service monitoring, incident notification and timely updates, during and after incidents, to enable effective incident response where service providers are affected.
Firms may consider conducting a post-incident review following a significant disruption or any event that affects the market. This would include reviewing the overall effects to determine whether any changes are needed to your important business services or impact tolerances, for example, classifying a service as an important business service or revising an impact tolerance.

First published: 31/10/2024 Last updated: 31/10/2024

Operational resilience[3]