#BootHole - It's Bigger Than you Think (part 1 of 2)
Updated: Aug 6, 2020
EDIT: For "part two" of this story, covering patching, monitoring, and risk reduction strategy for this vulnerability, check out "BOOTHOLE; The Rest of the Story".
EDIT: Updated to draw attention to #RedHat devices that are #bricking if/once the #boothole #patch is applied.
Much energy surrounds the incredible discovery of a hardware-level flaw (CVE-2020-10713) so pervasive and so impactful that it easily eclipses every other bug from 2020. It is a simple buffer overflow in the GRUB2 bootloader used by nearly every Linux system on earth (and countless other systems as well), allowing for complete device takeover and indefinite persistence, with no real mitigations or workarounds. While we don't have smoking-gun attribution for actors currently persisting on devices via this vector, two facts make it extraordinarily likely that criminal or APT actors (or both) have already been leveraging this vector and staying persistent: creating an exploit for it is relatively trivial (there are no Address Space Layout Randomization (ASLR) or Data Execution Prevention (DEP/NX) equivalents in bootloader world), and the vulnerability has been around since GRUB2 (over a decade...and technically since GRUB 1.99). This is especially so given how recent advancements in the OS-level anti-malware and EDR space (predictive AI, context/behavioral AI, deep visibility, etc.) have made commodity-type malware and phishing attacks less and less the go-to method for compromise.
To put #Boothole in perspective, it is important to focus on seven key areas of magnitude:
1) The scope and magnitude of the impact potential
2) The suitability of that potential for various threat actor types
3) The lack of visibility organizations and missions have into sub-operating system activity
4) The difficulty, risk and inertia required by vendors to remediate it (in IT and OT)
5) The general inability to adequately mitigate the risk in the interim
6) The broader tail-winds that are driving APT and criminal actors to hardware level persistence now more than ever
7) What you can actually DO about it as an organization or mission going forward
1) Scope and Magnitude of Impact
There's no other way to say it: This is as big as they come, and arguably bigger than that. I won't spend your time describing why having full control over a device from hardware, OS, and all the way up through applications and secrets and associated user accounts is bad. I also won't waste time describing why a 10-year-old GRUB2 bootloader shared by the majority of computing affects every single organization, vertical, mission, datacenter, cloud service, SaaS service, production environment, patient-operational environment or ICS/OT operation on earth. What I will suggest, however, is that there is potential for this vulnerability, and indeed this entire class of vulnerabilities, to be used in a worst-case "War of the Worlds" scenario. You may recall the aliens in that movie rise up from under the earth's surface. Sure, perhaps they did not begin under the surface, but that is indeed how they made their grand entrance and surprised humanity. As we all know in cyber security over the last 30+ years... it is not the threat you know about that hurts you the most. It's the one you didn't see coming that was already there, the one that won the battle before the battle ever began.
This is that threat if there ever was one. How else would we even end up with the situation we find ourselves in... a 10-year-old bug that has been exposing billions of computers to complete device compromise, and nothing we can do about it overnight?
2) What Kinds of Threat Actors Would Use This Vulnerability?
Here too, the answer is "all of them". But why? The over-arching reason is simple: The modern attack lifecycle is already a converged, blended, multi-objective kill-chain of events. It's already a perfect storm of adaptable capabilities, serving adjustable motives and objectives, along with their associated impacts to an organization. TTPs (Tactics, Techniques and Procedures) of current adversaries are nearly unlimited in scope, capability or objective. I've lived on the front lines of Incident Response. This is no longer a 'trend'...it is in full swing and only going to remain so. So then, why is #boothole in particular so valuable given this context? Simple: omnipotence combined with persistence. There's no reason an adversary can't, with #boothole, compromise an entire enterprise undetected, steal all of its sensitive data, relax all its cloud and internet-facing configurations, extort all its executives, and then encrypt and hold ransom its entire operation on both IT and OT sides, and then after that... *persist* through OS backups and do it all over again. Or why not just leverage the vuln outright and do like these guys did for quick wins? And on the APT side... the simple reason is summed up in two words: Information Warfare. We are fast approaching a new era of 'full on' information warfare, which we'll look at in a moment. But for now, know that the reason #boothole is so attractive is that an attacker can modify data over time, do so undetected, and persist to be able to do so indefinitely, to the attacker's advantage and to the victim's peril. I just defined #InformationWarfare for you, so remember that definition when we get to 6), #tailwinds.
3) The Lack of Visibility
This, too, is bigger than you think. It's true, certainly for #boothole, and it is exponentially more true of an entire class of platform (aka 'hardware') level vulnerabilities. The actual attack surface is beyond massive. Just take a typical modern server blade... there might be several dozen, maybe even a hundred sub operating systems. Half of those are running some form of Linux. All of them sit below the traditional Operating System. Many of them would allow for persistence and vectors to attack the OS. Yet... here most of us are, and as we sit here, we can't even see our own #bootholes on our own devices. There are very few organizations that can actually see the vulnerability, let alone do anything about it. (Don't get me wrong, see 7) below for what you can do regardless.)
It is no surprise that the researchers who created CHIPSEC are the same ones who founded @Eclypsium and the same ones who discovered #boothole. So it is also no surprise, then, that they indeed have a solution that allows their clients to see #boothole and a plethora more at the hardware level. After all, Boothole, and related UEFI vulns, probably make up less than 5% of what the bad guys can do on hardware. We absolutely are decades late in solving for visibility here as an industry, but, better late than never!
4) Difficulty, Risk and Inertia Challenges
#Boothole is not something you just plug up like a normal vulnerability. It is going to take a lot of risk and a lot of momentum to be able to patch this without breaking things. I'll just go out on a limb and suggest that the majority of OT environments will take a year or more for their vendors to get this right without disrupting operations with unplanned downtime, or at a minimum, without rendering vendor backup and restore SLAs useless in the process.
The technical details of this challenge are already presented perfectly in the #boothole blog. The net net is that the difficulty in patching for this vulnerability is a defining characteristic of what makes this (and any of the many similar vulnerabilities in this class that don't yet have CVEs assigned and that are alluded to in the blog) such a nasty and pervasive risk going forward. UPDATE: Case in point, RedHat devices bricking.
5) Can't We At Least Mitigate?
Kind of, but not directly. There is no interim work-around that prevents the exploitation of this vulnerability. When you contrast this with the recent Microsoft DNS server vulnerability, that is what stands out the most. For the DNS vulnerability, there is a work-around mitigation that renders the exploit unviable. For #boothole, however, there is no such mitigation. At best you can do the things we'll discuss in 7) below. But no matter what you monitor for, or how you correlate, or what you do to prevent and detect things like privilege escalation, the bad actors will already be one step ahead and security control of the device will be lost for an indefinite amount of time. That makes the current mitigations mere speed bumps, not road barricades.
6) Geo-Political, Economic, and Societal Tailwinds Driving Actors To Hide on the Steel
This can either be a days-long dissertation, or it can be a few sentences. Or even better, some bulleted points for us to contemplate:
Trade-War: China provides the US hardware. A lot of it. This class of vulnerabilities can be, and is, exploited at any stage of the production and operational lifecycle of a device. There is absolutely no reason why the Chinese government, or Chinese competitors to the West, wouldn’t want to leverage an elegant, easy, persistent, nearly invisible foothold at the device level. Economic and political tensions with China can only serve to accelerate this dynamic.
COVID19 Pandemic: There has never been a global event as profound as COVID19 in the post digital transformation world we live in. Entire new threat actor groups have sprung up as a result. Governments have ramped up espionage efforts tied into COVID19 treatments and vaccine research. Hospitals have been ransacked with ransomware during the toughest of times. #boothole gives any adversary superpowers, and superpowers like Russia (APT29) and China are very much focused on asymmetrical advantages to gain the upper hand in the fight against COVID19.
Traditional OS defenses are getting better: Suffice it to say, we are doing better at detecting and preventing malware and LOL (Living off the Land) threats that target the OS. Predictive AI, behavioral AI, deep visibility, cloud-computing strengths, big data analytics… all of it is coming together finally, and defenses are finally getting faster at keeping up with the pace of the modern, automated adversary. We aren’t there yet. But more and more, it is becoming difficult for actors to rely on malware and LOL techniques and be able to persist in an environment for as long as they’d like. At some point, we cross a threshold, where the “shortest path” to compromise and persistence will be at the hardware level. Every day we get closer to that precipice.
US Elections: Voting machines suffer from a lot of challenges. Well, they suffer from every challenge there is. They are rarely patched, they run Windows and Linux and they are pretty much just a bad idea to begin with at every level. #boothole… just because it even *exists* can cast doubt on whether or not a machine is compromised, or an entire inventory of machines has been. If you tell the voting public that there may be a vulnerability that no county has the ability to even see, let alone do anything about, that exists on every machine in the county… what does that do to the public’s confidence in democracy, when democracy itself is being “stress-tested” now more than ever, bearing the weight of the pandemic, the economic fallout, the social unrest, and a hyper-polarized voting public? #boothole doesn’t even need to be exploited for it to inflict maximum damage. In Brazil, how might the 2020 Mayoral elections go in October, for example, if that population realizes that all of their Linux voting machines could be compromised and they have no good way to tell? Is that a viable condition for any democracy?
Social Unrest: We are on the cusp of a new risk paradigm when it comes to social unrest and political affiliation. Soon, BEC will occur that leverages the specter of being exposed for one’s past statements and political affiliations. The hacktivist that chooses to target an individual for extortion is the very same hacktivist that a) won’t want to get caught, b) will want to persist over time, c) would value the ability to modify emails, photos stored on a device, and other such data, in order to frame an executive and extort them. Call it: “Activist Information Warfare”. It’s coming.
Russian Disinformation Campaigns: Tying in both COVID-19 geopolitics and the elections, and capitalizing on recent social unrest, Russia has been very effective in its disinformation campaigns. Now that this has been exposed in the mainstream, Russia will need to pivot to less-obvious mechanisms of disinformation and related information warfare tactics. Enter #boothole. What better place to hide, to modify data undetected, to gather intelligence, and do so from inside of American news agencies rather than trying to artificially amplify bogus news from elsewhere?
7) What can I actually DO about this vulnerability?
This is the question of the hour. Nay, of the next year and beyond. While the original Blog breaks down strong recommendations and mitigations, many of which require vendors’ actions, here is my own take on which recommendations a) you can control and b) to focus on in the near term:
1) Install all OS updates across desktops, laptops, servers, and appliances. While basic in nature, it is not always common in practice, to urgently force a workforce to update all their computing devices. In light of #boothole, I believe that OS level patching moves to the top of the stack in priority.
2) Do all you can to prevent #privilegeescalation attacks in both Operating Systems and via Application vulnerabilities. Monitor for this continuously. This buys as much time as possible, for that which you can best control. This is the ‘speed bump’ we spoke of earlier. Slow down the attack lifecycle as much as possible to buy time. Besides, it is always a good idea to be able to detect this, so now you have the perfect excuse to motivate SecOps to get tooling.
3) If you have the ability to monitor for changes to, and contents of, the EFI System Partition (where the bootloader lives), take advantage of it and start pivoting from it if you have hunt capability. In the #Eclypsium solution, for example, monitor the “MBR/Bootloader” component. Use this for correlation to anything else of malicious content or behavior on the device, or network activity coming from it, and/or accounts associated with it, priv-esc, etc.
4) Enterprises will need to make sure they have pre-production-stage lab capabilities in sufficient capacity and expertise to address this mission-critical aspect, especially in any OT environments...note this will involve supply-chain/3rd party coordination and vetting/accountability from here on out. As the original blog mentions:
"Further complicating matters, enterprise disaster recovery processes can run into issues where approved recovery media no longer boots on a system if dbx updates have been applied. In addition when a device swap is needed due to failing hardware, new systems of the same model may have already had dbx updates applied and will fail when attempting to boot previously-installed operating systems. Before dbx updates are pushed out to enterprise fleet systems, recovery and installation media must be updated and verified as well."
5) Now is the time to fold this type of risk into your overall supply chain risk management efforts. What are your ICS/OT vendors doing to gain visibility into this class of vulnerabilities? In DOD and other high-side networks, are your CDS (Cross Domain Solutions) actually protected from this vector? In patient operational / medical environments, what life-critical devices should be examined by their respective vendor? How are these devices monitored in their operational context? If and when an active threat campaign is discovered that is using this vulnerability for persistence, how would your organization, and your suppliers, be able to monitor for it and detect it so that you knew to take additional forensic actions on those devices immediately?
As I think and learn of other actions organizations and missions can take to get as far afield as possible, I will update this list. Of course, please do read and apply as much as possible from Eclypsium's list of mitigations, as well as these recommendations, which I will paste here for quick reference:
1. Right away, start monitoring the contents of the bootloader partition (EFI system partition). This will buy time for the rest of the process and help identify affected systems in your environment. For those who have deployed the Eclypsium solution, you can see this monitoring under the “MBR/Bootloader” component of a device.
2. Continue to install OS updates as usual across desktops, laptops, servers, and appliances. Attackers can leverage privilege escalation flaws in the OS and applications to take advantage of this vulnerability so preventing them from gaining administrative level access to your systems is critical. Systems are still vulnerable after this, but it is a necessary first step. Once the revocation update is installed later, the old bootloader should stop working. This includes rescue disks, installers, enterprise gold images, virtual machines, or other bootable media.
3. Test the revocation list update. Be sure to specifically test the same firmware versions and models that are used in the field. It may help to update to the latest firmware first in order to reduce the number of test cases.
4. To close this vulnerability, you need to deploy the revocation update. Make sure that all bootable media has received OS updates first, roll it out slowly to only a small number of devices at a time, and incorporate lessons learned from testing as part of this process.
5. Engage with your third-party vendors to validate they are aware of, and are addressing, this issue. They should provide you a response as to its applicability to the services/solutions they provide you as well as their plans for remediation of this high rated vulnerability.
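For those without a commercial tool, the first recommendation above (monitoring the contents of the EFI system partition) can be roughly approximated with a file-hash baseline. What follows is only a minimal sketch under stated assumptions, not the Eclypsium method: the function names `snapshot` and `diff_snapshots` are hypothetical, and the mount point of the partition is an assumption you must confirm for your own systems (on many Linux distributions it is `/boot/efi`, and reading it usually requires root). It also only sees what the running OS shows it, so treat it as one more speed bump, not a road barricade.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map every file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(baseline, current):
    """Report files added, removed, or modified relative to the baseline."""
    base_paths, cur_paths = set(baseline), set(current)
    return {
        "added": sorted(cur_paths - base_paths),
        "removed": sorted(base_paths - cur_paths),
        "modified": sorted(
            p for p in base_paths & cur_paths if baseline[p] != current[p]
        ),
    }
```

In practice you would take a `snapshot` of the partition while the device is in a known-good state, store it somewhere off the device, and run `diff_snapshots` against a fresh snapshot on a schedule. Any unexpected entry (a changed bootloader binary or configuration file, for instance) is a cue to correlate with priv-esc alerts, network activity, and account activity for that device.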
Scott Scheferman, Owner
Armanda Intelligence LLC