As the IT world recovers from the massive outage triggered by CrowdStrike’s Falcon update, CISOs and CIOs would be wise to keep a running ledger of lessons learned. Here are some initial considerations.

Whether you survived the CrowdStrike incident or didn’t use CrowdStrike and are merely watching its impact on others, taking time to learn lessons from this event is vital. After all, if you couldn’t recover easily from this, you may be lost trying to recover from a ransomware attack.

At issue are potential shifts you might want to consider making to your staffing strategies, technical processes, and communication channels and culture, as well as your overall approach to hardening assets. The list of lessons learned from CrowdStrike will likely grow as more information comes to light about the outage’s impact on organizations around the globe. For now, the following look at the recovery process offers insights into how you might want to reconsider or reinforce your strategy around key processes and resources to ensure a more robust response going forward.

Staffing rethink

Recovering from CrowdStrike has been an all-hands-on-deck event. In some instances, companies have needed humans to physically touch and reboot impacted machines in order to recover, an arduous process, especially at scale.

If you have outsourced IT operations to managed service providers, consider that those MSPs may not have enough staff on hand to mitigate your issues along with those of their other clients, especially when a single event has widespread fallout. Instead, you may have only your existing staff to call on to remedy the situation, and you may have to train people unaccustomed to technology tasks to perform key steps to help get your network back online as soon as possible. Alternatively, you may need to consider shipping replacement equipment or finding other ways to reinstall or refresh operating systems, as was the case with CrowdStrike, all of which requires personnel. A thinned-out staff that is over-reliant on service providers risks a poor recovery from incidents, whatever the source.

Tighten up your technical resources

As Microsoft points out in its response to the CrowdStrike incident, beyond getting into safe mode and being able to enter commands, your next hurdle may be something intended to protect your device: BitLocker. When the computer reboots after entering safe mode, if BitLocker is enabled, you will be asked to enter a recovery key. I speak from experience that, more often than not, getting hold of BitLocker recovery keys takes time. They may be backed up in your local Active Directory. They may be printed out and saved in a location that, in the heat of the moment, nobody can remember. Review recovery steps and processes on a regular basis so that your team knows exactly where those recovery keys are and what is required to obtain them. While BitLocker is often mandated for compliance reasons, it also adds a layer of complication you may not be prepared for.
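If your keys are escrowed to Active Directory, they can be pulled with the RSAT ActiveDirectory PowerShell module. Here is a minimal sketch, assuming AD backup of recovery passwords is enabled via Group Policy and your account has rights to read the key objects; the computer name is a placeholder:

```powershell
# Minimal sketch: retrieve BitLocker recovery passwords escrowed to AD.
# Assumes the RSAT ActiveDirectory module; "PC-EXAMPLE-01" is a placeholder.
Import-Module ActiveDirectory

$computer = Get-ADComputer -Identity "PC-EXAMPLE-01"

# Recovery passwords live as child objects of the computer account
# when escrow is enabled via Group Policy; show the newest entry first.
Get-ADObject -SearchBase $computer.DistinguishedName `
             -Filter 'objectClass -eq "msFVE-RecoveryInformation"' `
             -Properties whenCreated, 'msFVE-RecoveryPassword' |
    Sort-Object whenCreated -Descending |
    Select-Object whenCreated, 'msFVE-RecoveryPassword'
```

Knowing where the keys live is only half the battle, though. During this event, we’ve also seen interesting workarounds for getting systems operational.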
Via social media, people such as LetheForgot shared the following: “We went into advanced restart options to launch the command prompt, skip the bitlocker key ask which then brought us to drive X and ran ‘bcdedit /set {default} safeboot minimal’ which let us boot into safemode and delete the sys file causing the bsod.” Another poster recommended: “Even in safe mode, crowdstrike folder access was denied. Used cacls to give more rights to user (bypassing admin) and deleted file.”

If you are wondering why this works without demanding a BitLocker recovery key: the recovery command prompt runs from the X: drive, a RAM-based environment that is not encrypted by default, and flagging the machine for safe mode does not by itself unlock anything you shouldn’t have access to. You still need to provide valid user credentials to get to the C: drive, which brings up the next roadblock in recovering access. Do you have access to the domain controller, or will you need a local username to reach the C: drive and delete the file that must be removed to restore a functional machine? If you have used LAPS or other software that randomizes the local administrator password, you will need access to that resource as well.

Once you get access to the machine, you can delete the offending file with the following command, run from the CrowdStrike driver folder (C:\Windows\System32\drivers\CrowdStrike, per the vendor’s guidance):

del C-00000291*.sys

If you used the safe-boot workaround above, remember to clear the flag afterward with bcdedit /deletevalue {default} safeboot so the machine boots normally again.

The lesson here is not only to review recovery steps often but also to follow community discussions closely for creative technical solutions when a collective IT disaster unfolds.

Build a culture of communication

That brings up another key resource needed during any incident: clear information about what is happening and what to do. Late on the evening of Thursday, July 18, it was clear from comments on social media that something was happening. The underlying culprit, a CrowdStrike update gone faulty, was also identified quickly. In other incidents, you may not be informed so fast. It may not be clear what has happened or which assets have been impacted. Often you will need to reach out to the staff working most closely with the impacted assets to determine what is going on and what actions to take.

Just as often, your first read of the issue, and the actions you initially plan, will not be the actions you ultimately need to take. Or you may find easier steps. You may also need to determine whether a Plan B would be the more beneficial course of action. In this instance, I’ve seen companies decide to move up plans to redeploy computer systems rather than fix the impacted machines: a hardware refresh was already scheduled for the coming weeks, so they simply accelerated it. All of that requires clear communication among everyone involved, a culture you need to build in addition to having incident communication strategies and processes in place.

Reassess strategies in wake of lessons learned

Just as with any incident, cleanup and follow-up are essential. For those who have machines back up and recovered post-CrowdStrike, there are certain items you should review. First, consider reissuing BitLocker recovery keys: if a recovery key was handed out manually during the scramble, reissue and rotate it, as in the sketch below.
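Here is a minimal sketch of rotating the recovery password on a single endpoint, assuming the built-in BitLocker PowerShell module and an elevated session; adapt the escrow step to your environment:

```powershell
# Minimal sketch: rotate a BitLocker recovery password that may have
# been disclosed during manual recovery. Run elevated on the endpoint.
$mount = "C:"

# Note the existing recovery-password protector(s) before rotating.
$old = (Get-BitLockerVolume -MountPoint $mount).KeyProtector |
       Where-Object KeyProtectorType -eq "RecoveryPassword"

# Add a fresh recovery password, then escrow it to Active Directory.
Add-BitLockerKeyProtector -MountPoint $mount -RecoveryPasswordProtector | Out-Null
$newProt = (Get-BitLockerVolume -MountPoint $mount).KeyProtector |
           Where-Object { $_.KeyProtectorType -eq "RecoveryPassword" -and
                          $_.KeyProtectorId -notin $old.KeyProtectorId }
Backup-BitLockerKeyProtector -MountPoint $mount -KeyProtectorId $newProt.KeyProtectorId

# Only after the new key is escrowed, remove the old protector(s).
$old | ForEach-Object {
    Remove-BitLockerKeyProtector -MountPoint $mount -KeyProtectorId $_.KeyProtectorId
}
```

For Entra ID-joined machines, BackupToAAD-BitLockerKeyProtector is the equivalent escrow step.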
If you are considering changes to your infrastructure, rather than ripping out your technology and replacing it with a different operating system, consider the alternative of changing how you deploy software and restricting what is allowed to run on these special-purpose machines. We use antivirus because we place no limit on what our systems are allowed to run; if we spent the time and resources limiting what is allowed to run, machines would be more secure.

Of course, you should also reconsider which operating system is used for which purpose. We’ve seen too many social media posts of blue screens on what are merely overgrown notification displays. Do you truly need a full operating system just to present information, or are there alternative ways to provide it?

And should you rely on vendors to do their own quality control? From Microsoft to now CrowdStrike, it’s unclear whether shrinking budgets for the people tasked with testing are the true root cause of such issues. In the case of CrowdStrike, a logic error in its Falcon update was to blame, CEO George Kurtz wrote. How exactly that came about will need to be sorted out in the fallout.

Even if you weren’t impacted by this event, you may want to review how fast you roll out update files. From vendor updates to definition updates, we arguably trust too much that our vendors have done their due diligence. With many firms cutting budgets, we can no longer take that quality control for granted. Consider establishing update rings, along with your own process of testing and validation when rolling out updates, even for antivirus and protection suites; a rough sketch of what that logic can look like follows below. Ultimately, no software should be completely trusted.
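As a rough, hypothetical sketch of ring-based rollout logic: the two functions below are stubs standing in for whatever your deployment and monitoring tooling actually exposes, and the machine lists are placeholders.

```powershell
# Hypothetical sketch: deploy to a small ring, let it soak, verify
# health, then widen. Deploy-UpdateTo and Test-RingHealth are stubs
# for your real tooling (SCCM, Intune, a vendor console, etc.).
function Deploy-UpdateTo {
    param([string[]]$Computers)
    # Stub: push the update through your deployment tooling here.
}

function Test-RingHealth {
    param([string[]]$Computers)
    # Stub: query monitoring for boot loops, crashes, or agent failures.
    return $true
}

$rings = @(
    @{ Name = "Ring0-IT";    Computers = @("PC-IT-01", "PC-IT-02");   SoakHours = 24 }
    @{ Name = "Ring1-Pilot"; Computers = @("PC-OPS-01", "PC-OPS-02"); SoakHours = 48 }
    @{ Name = "Ring2-Broad"; Computers = @("PC-ALL-01", "PC-ALL-02"); SoakHours = 0 }
)

foreach ($ring in $rings) {
    Write-Host "Deploying to $($ring.Name) ($($ring.Computers.Count) machines)"
    Deploy-UpdateTo -Computers $ring.Computers

    # Soak period: give the ring time to surface problems before widening.
    Start-Sleep -Seconds ($ring.SoakHours * 3600)

    if (-not (Test-RingHealth -Computers $ring.Computers)) {
        Write-Error "Health check failed in $($ring.Name); halting rollout."
        break
    }
}
```

One caveat: some vendor-pushed content updates, reportedly including the CrowdStrike channel file at issue here, bypass customer staging controls entirely, which makes it worth asking vendors how their own rollouts are staged as well.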