#BootHole - "The Rest of the Story" - Patching, Monitoring, Mitigation (Part 2 of 2)
BLUF: Key Take-Aways following today's Eclypsium Webinar on #BOOTHOLE (aka no, it really is bigger than you thought, and now here is how to deal with it!)
These take-aways are worth noting because they may or may not have been discussed or written about prior to today’s webinar. I've also added my own tips and thoughts around the edges. I've used bullet-formatting where possible to keep this read as short and impactful as possible.
Why this vulnerability is “not like the others”
We have been hearing that this one is different than most, but here are some of the reasons why it is so. Understanding these helps put the entire challenge into stark perspective.
For this vulnerability you can’t just have Microsoft automatically push out a patch; too many things will break. Ergo, fully remediating the vulnerability requires a manual process. Further, this manual process can brick systems (and already has) and render recovery disks unusable.
While the buffer overflow is not wormable in and of itself, we need to be reminded of the recent $10,000,000,000.00 event called NotPetya, which did indeed leverage a similar vector to destructively wipe disks. And while that 2017 attack leveraged the MBR (which has now been partially mitigated in more modern systems using Secure Boot/UEFI), attackers have already demonstrated that they have both the tools and techniques needed to target today’s UEFI environment. Put more simply: there is no reason at all that another NotPetya-like wormable/destructive event can’t happen, one that leverages whatever the pervasive remote code execution (RCE) “vulnerability du jour” is and combines it with a #boothole class of vulnerability to target the sub-operating system environment for the sake of evasion, persistence (surviving OS reboots and re-images) or destruction. All a piece of malware needs to do is replace the existing bootloader with a vulnerable one, and name it such that it is the one that runs first. Even if GRUB2 requires a password, it simply won't help, because the password authentication process happens after GRUB2 is already loaded and the damage is done. The researchers at Eclypsium dub this BYOB (#BringyourownBootloader), and there is already malware that does this. And this works for any system, appliance or IoT device that has a signed version of GRUB2!
This vulnerability (and to at least some extent this entire ‘class’ of boothole vulnerabilities being discovered) is platform agnostic. This means it applies on x86, ARM, or Android, again to the extent a signed vulnerable version of GRUB2 is utilized (e.g. in a dual-boot Android device, network appliances, a bitcoin mining rig, <insert imagination here>).
Systems that would not be vulnerable might be ones in which the config file itself is built into the same binary as GRUB2, or ones where the GRUB2 config file is signed in an uncommon manner outside of the usual process, or ones that are not even running Secure Boot. But make no mistake: if you have critical systems not running Secure Boot you already have much bigger security problems, so no… turning off Secure Boot is not a risk mitigation.
There will be many more vulnerabilities of this class discovered now. During the coordinated disclosure effort, which has been monumental, other teams have already been testing for, and finding, other vulnerabilities that do not yet have their own CVEs. The goal here is to not have to repeat such a painful patching process in the future, and also to address the fundamental systemic/process-oriented flaws in the secure boot process itself.
How to actually prioritize and mitigate this risk over time
This is the number one question I personally have been asked, and I think Eclypsium nailed it when someone on the webinar today asked them the same type of question.
I will paraphrase for us here: If an organization or mission already has unpatched critical RCE (Remote Code Execution) vulnerabilities in its OS or apps, it should prioritize those foremost, as is nearly always the case. RCEs are the worst by definition: they are remotely exploitable, they can be wormable (leveraged to create a worming function akin to what we saw in WannaCry and NotPetya), and they allow arbitrary code to run (whatever the attacker wants). RCEs excepted, however, organizations might want to prioritize the risk associated with this #boothole class of vulnerabilities immediately after. But why?
This is a pervasive vulnerability across the majority of modern computing systems (translation: massive diversity and scope of impact). If a system is using UEFI as part of the secure boot process… it is probably vulnerable, unless it is confirmed that it is not.
Attackers already know how to take advantage of this kind of vulnerability, and have already done so to great effect.
The biggest destructive cyber event in modern history leveraged (in part) this class of vulnerability for its primary destructive impact.
Attackers can no longer rely on MBR given UEFI secure boot in Windows 10, for example, so they in fact *must* pivot to using *exactly* this method for best chances of persistence, evasion and destruction.
This vulnerability is not going away any time soon, so attackers will learn to rely on its pervasiveness and persistence as a class of vulnerability.
The rest of the traditional OS and network security stack is getting better and better at keeping pace with the adversary, and even when it fails to do so, attackers can’t persist there for very long before the modern enterprise's new-found visibility and threat-intelligence/anomaly/correlation capabilities uncover any breadcrumbs left. I’ve spent the last 5+ years leading teams working to solve for this space… the bad guys are running out of options.
Addressing the Risk Head On
OK now that you have prioritized accordingly, how do you actually address the risk, plan a patching remediation strategy, and most importantly, mitigate the present risk in the interim? Here’s the short version:
Foremost: realize you are at risk as you read this. This isn’t a ‘future event’, it is a ‘present risk’ and it is pervasive. Frame it accordingly with the rest of the organization as you work to raise their awareness and call to action.
Patching Strategy: As we learned on the Eclypsium webinar today, NSA gently suggests first updating all your firmware for a given class of device to the same level. After that, go for the revocation update. This creates fewer cases you might need to test, as there will be fewer versions floating around in your production environment (that can break if you get this wrong).
Frequently check the Eclypsium.com blog for updates from myriad vendors addressing the vulnerability, patches, etc.
Before jumping to do Linux updates, make certain there are no reports of devices being bricked by them, as we’ve already seen happen. This is a very complex and tricky patching process even for the vendors, and there is a high likelihood of getting it wrong overall. Let things bake a little bit before you take the plunge, unless you have the resources and testing lab to try earlier patch versions out.
Test any new firmware on the specific model of device you have with the specific firmware you have; there is a propensity for bugs on certain models of systems and not others (process, implementation, etc)
Don’t disable Secure Boot in order to ‘get rid of the problem’; you will have just made your device exponentially less secure by doing so.
Pay attention to how many different images/OSes you may need to boot on a given device. You need to address all of those images, including mission critical backup images which may not boot if you haven’t addressed them too (Possible work-around, temporarily disable secure boot during backup image restore efforts). Ask: “Have I updated all of the different images prior to installing the new revocation?”
Microsoft Update can’t save you here… this isn't just another Patch Tuesday ritual. If Microsoft were to attempt to universally push the revocation list update to everyone, it would brick an untold number of systems. Ergo, this is a manual process, and it must be tested before deploying to any device or, riskier still, widely deploying to any large set of similar devices.
Be aware that you likely have (in any given organization) different teams responsible for different operating systems, patching, recovery, etc. Keep this in mind for devices that are dual-boot, for example. Make sure image recovery teams are folded in to the process as well so that their images still boot when needed. (Risk Note: this is extremely important! The very moment you need a system restored is the most costly, high-impact moment of any incident… don’t exacerbate those costs/downtime/mission risk by leaving this part out!)
As a part of this process, indeed some SHIM keys will also need to be revoked. This is non-trivial, as it requires higher and higher authority up the certificate chain, given that there is only so much space on hardware to store new entries. Microsoft tried similar updates to only 5 keys in February and there were challenges. For this, we are talking hundreds. So it will take time, planning, complexity-solving, and careful deployment to fix.
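To make the ordering in the list above concrete, here is a minimal sketch (my own illustration, not something shown on the webinar) of a "ready for revocation?" gate. The inventory structure, field names, and the idea of one target firmware version per device model are all assumptions; your asset data will look different.

```python
from dataclasses import dataclass, field


@dataclass
class BootImage:
    name: str                 # hypothetical label, e.g. "primary OS" or "recovery image"
    bootloader_updated: bool  # True once this image carries a patched bootloader


@dataclass
class Device:
    hostname: str
    model: str
    firmware_version: str
    images: list[BootImage] = field(default_factory=list)


def revocation_blockers(devices, target_firmware):
    """Return {hostname: [reasons]} for devices that are not ready for the
    revocation update. target_firmware maps model -> expected firmware version;
    a model with no entry is treated as not ready."""
    blockers = {}
    for device in devices:
        reasons = []
        # Step 1: firmware for the device class should be at the same level.
        if device.firmware_version != target_firmware.get(device.model):
            reasons.append("firmware not at target level")
        # Step 2: every bootable image, backups included, must be updated first.
        for image in device.images:
            if not image.bootloader_updated:
                reasons.append(f"image not updated: {image.name}")
        if reasons:
            blockers[device.hostname] = reasons
    return blockers
```

A device only drops off the blocker list once its firmware matches the target for its model and every image it may need to boot, recovery images included, has been updated. Only then would you consider applying the revocation update, and only after testing on that specific model.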
If I can’t quickly patch, for all the above reasons, then how do I best mitigate this risk in the interim?
The short answer is to monitor these files on the hardware level and look for changes, anomalies, exploit or payload code, size differences of the same file across a class of devices/firmware, etc. The reason this is so critical is that it buys the organization and mission time against the adversary. When you are dealing with a pervasive and ‘omnipotent’ type of vulnerability like this, time either works for you or against you. Monitoring for malicious sub-OS changes will allow your analysts to pivot from that device to the Operating System, Network, memory and identity/credentials associated with the compromised device. Additionally, these form high fidelity signals from which to prioritize and pivot: if you see exploit code or payload code in a config file, you pretty much know that the most important thing you can do right now, is understand the rest of the stack sitting on top of that device hardware. Period.
The most basic way to do this is:
Note: this is my take on the information provided on the webinar today. Start by looking at the integrity of what’s in the UEFI partition on every system in your environment:
ASK: “Does it have GRUB2?”
If yes: “Does my GRUB2 config file look weird with an exploit or payload?” (If it looks like one, it is one). Take immediate action on this host, and forensically examine for any other indicators of compromise at the OS and network level.
Also important: Inspect other areas of the device's hardware: Firmwares, BIOS, and dozens of other places that badness can hide and happen.
If no, then ask: “Is there any indication of change as to the integrity of what’s in the UEFI boot partition?”
Place that host in a list of suspect hosts and make sure that what you are seeing as a change isn’t the (ironic) result of having patched some devices and not others.
Form a baseline across all device classes, starting with the most critical. Monitor for any UEFI system partition and underlying firmware changes, any bootloaders that aren’t expected to be there, or any config file that looks suspect (or has payload/exploit code, etc.).
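The baseline-and-monitor idea above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not Eclypsium's method: it only covers files on a mounted EFI system partition (on many Linux systems that is /boot/efi, but check yours) and it cannot see firmware or other hardware-level changes. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map every file under root (by relative path) to its size and SHA-256."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            result[path.relative_to(root).as_posix()] = {
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
    return result


def diff_snapshots(baseline: dict, current: dict) -> dict:
    """Report files added, removed, or changed relative to the baseline."""
    shared = set(baseline) & set(current)
    return {
        # New files may be bootloaders that aren't expected to be there.
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        # Same path, different content: config or bootloader was modified.
        "changed": sorted(
            name for name in shared
            if baseline[name]["sha256"] != current[name]["sha256"]
        ),
    }
```

In practice you would store the output of snapshot() (for example as JSON) when a device is in a known-good state, then run it again on a schedule and feed both into diff_snapshots(). Any "added" bootloader or "changed" config file is a reason to put that host on the suspect list, keeping in mind that a legitimate patch will also show up as a change.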
What Tools Can I Use to Do This?
Now that we know how and what we need to do, the question becomes “What tools do I have to gain this visibility and understand my exposure and monitor for past or active signs of compromise?”. There are two ways for now that I am aware of:
Eclypsium’s own solution that monitors, analyzes and alerts on any critical / likely malicious platform (aka hardware level) changes across all devices in an enterprise or operational environment. As they mentioned in the webinar, ask them to do a trial in your environment; from everything I’ve heard from others, it will likely be an eye-opener. Their research was what originally turned me on to this entire vector a while ago, and while we all knew it was going to be a ‘matter of time’ before this vector became mainstream, the scary truth is that it always has been. GRUB2 is a decade old, let alone all the many dozens of other “platform” level vulnerabilities and real-world attacks like NotPetya. Suffice it to say, I would advise an initial effort to, at the very least, gain visibility into this space in your own environment and baseline it. There’s no better way to do that than with their solution. (If there is, I want to know about it, so leave a comment or drop me a line.)
Eclypsium has released scripts which can be used to detect bootloaders that are being revoked by this dbxupdate. Per the README: “This repository was created to contain relevant helpful scripts and any additional tools or information that can assist others in managing their BootHole vulnerability mitigation plans.” It is always great to see a vendor go the extra mile to help those who may not be in a position to procure their solution at least get started in assessing their environment. Helpful Hint: running the scripts with the -v (verbose) flag will also get you hashes of the files being looked at. Helpful, indeed! Look at ways your team can automate these scripts so that you can shorten the amount of dwell time the adversary has on the device. Remember, today’s destructive/ransom attacks take only seconds and minutes to unfold. Dwell-time is everything.
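Once you are collecting file hashes from across a fleet, one simple way to put them to work is to look for the odd one out within a device class. The sketch below is my own illustration and does not call or reproduce Eclypsium's scripts; it assumes you have already gathered (host, device class, file path, SHA-256) tuples by whatever means you have, and the 20% threshold is an arbitrary starting point.

```python
from collections import Counter, defaultdict


def find_outliers(observations, min_share=0.2):
    """observations: iterable of (host, device_class, file_path, sha256).
    Return the observations whose hash is shared by less than min_share of
    the hosts reporting that file within the same device class."""
    # (device_class, file_path) -> {host: sha256}
    groups = defaultdict(dict)
    for host, device_class, file_path, sha256 in observations:
        groups[(device_class, file_path)][host] = sha256

    outliers = []
    for (device_class, file_path), by_host in groups.items():
        counts = Counter(by_host.values())
        total = len(by_host)
        for host, digest in sorted(by_host.items()):
            # A hash seen on only a small fraction of peers is worth a look.
            if counts[digest] / total < min_share:
                outliers.append((host, device_class, file_path, digest))
    return outliers
```

Treat the result as a list of hosts to investigate, not a verdict. A half-patched fleet will produce outliers that are perfectly benign, so compare against your patching records before raising the alarm.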
One last idea here from me: Leverage these scripts as part of your standard IR playbook going forward. More likely than not, an attack targeting boothole will begin on the host OS. So look to make sure the attacker hasn’t pivoted from the OS to the steel by running these scripts on any devices associated with an incident. In fact, if you don’t do anything else and just do this, you are much better off against this vector than you were yesterday. Remember, until all of your devices have their revocation database updated per the patching notes above, this vulnerability is not remediated, and will remain exploitable and available as part of any modern malware attack chain scenario.
Hopefully this blog was helpful in capturing some, not all, of what was shared on the webinar today. It is meant to be a follow-up to the earlier blog that focused more on the scope, background, threat actor landscape, geo-political and other tail-winds surrounding this class of vulnerabilities. While there is some overlap, I suggest it as a read in case you landed here first today. Please share this in your social networks if you found it helpful, and don't forget to sign-up for future low-volume, high-value newsletters at the very bottom of this page.
#bootholevuln #threatresearch #vulnerability #coordinateddisclosure #vulnerabilitydisclosure #Eclypsium #platformsecurity #platformsec #armandaintelligence #CVE202010713 #webinar #takeaways #vulnerabilityresearch