A vulnerability in runC, which allows an attacker to gain host-level code execution by breaking out of a running container, was discovered and reported by Adam Iwaniuk and Borys Poplawski in early January and published as CVE-2019-5736 on 11 February 2019. This vulnerability is highly significant in that it:
- enables container isolation breakout with minimal interaction from an authorized host user;
- typically allows an attacker to obtain root privileges on the host;
- negatively impacts most container environments because many containers run with default Docker security settings and default user (UID 0); and
- affects runC, the most commonly used low-level container runtime in Docker and Kubernetes environments.
The runC container escape vulnerability occurs due to file descriptor mishandling. Upon creation of a new container or upon exec into an existing container by an authorized host user, this vulnerability ultimately allows attackers to overwrite the host’s runC binary with a malicious binary of their choosing. Because runC is generally invoked from the host within the context of the root user, the malicious binary would have root access to the host. This binary then could be used to run any arbitrary code, including code to initiate a reverse shell.
Anyone who maintains container infrastructure should immediately upgrade to the latest version of runC on all container hosts. All major container platform providers have published specific guidance for their users (e.g., EKS, AKS, GKE, and Red Hat).
What are the technical details behind this vulnerability?
The runC container escape vulnerability takes advantage of file descriptor mishandling, as well as a unique aspect of the Linux /proc filesystem, to ultimately overwrite the host runC binary with a malicious binary.
The role of /proc/self/exe
The path /proc/self/exe is a symbolic link to the executable of whichever process accesses it. For example, when ls inspects the link, it points to ls itself:
```
root@10577558e82f:/# ls -al /proc/self/exe
lrwxrwxrwx 1 root root 0 Feb 21 12:14 /proc/self/exe -> /bin/ls
```
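The same link can be read programmatically. This minimal Python sketch (Linux only; the helper name `current_executable` is our own, not part of any library) resolves /proc/self/exe for the calling process. For a Python script the answer is the interpreter binary rather than ls, which shows that the link always describes whoever is asking.

```python
import os
import sys


def current_executable() -> str:
    """Return the executable that /proc/self/exe points to for *this* process."""
    # The kernel resolves /proc/self relative to the process performing the
    # lookup, so the result depends on which program reads the link.
    return os.readlink("/proc/self/exe")


if __name__ == "__main__":
    exe = current_executable()
    print(exe)
    # sys.executable may be a symlink (e.g. in a virtualenv); the kernel link
    # points at the real interpreter binary behind it.
    print(os.path.samefile(exe, sys.executable))
```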
When a user launches a new container or execs into a running container (using runC under the hood), at that moment /proc/self/exe inside the container points to the host’s runC binary. If an entrypoint script (when launching a new container) or a binary such as /bin/bash (when exec’ing into a container) is replaced with a reference to /proc/self/exe, then when the launch or exec takes place, the container re-executes the binary to which /proc/self/exe points at that time: the host’s runC binary. Because this second execution of runC happens within the context of the container, the host runC binary is used, but it loads runC’s required dynamic libraries from the container filesystem.
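The re-execution trick itself is generic Linux behavior. As a harmless illustration, the Python sketch below (the helper name `reexec_self` is hypothetical) starts a child process whose program path is /proc/self/exe. At the instant of `execve` the child is still a copy of the Python interpreter, so the kernel resolves the link to the interpreter and Python simply runs again. During a container launch or exec, the same lookup resolves to the host’s runC binary instead.

```python
import subprocess


def reexec_self(code: str) -> str:
    """Run `code` in a new process started by executing /proc/self/exe."""
    # subprocess creates a child and then calls execve("/proc/self/exe", ...).
    # At that moment the child is still the Python interpreter, so the link
    # resolves to the interpreter binary, which is executed again.
    result = subprocess.run(
        ["/proc/self/exe", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(reexec_self("print(6 * 7)"), end="")
```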
The role of file descriptor mishandling
This situation gives an attacker the opportunity to modify one of the runC libraries on the container filesystem with malicious code. As a result of file descriptor mishandling in runC (specifically, the ability to construct a file descriptor path for the host runC binary that can be opened for writing, which should not be possible), this malicious library can be designed to overwrite the host runC binary with arbitrary code. That code then runs on the host the next time runC is invoked, in whatever user context runC runs under (usually root).
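The mechanism that makes such a path writable is ordinary Linux behavior: opening /proc/self/fd/<N> follows the descriptor back to the underlying file and performs a fresh open, with permissions checked against that file rather than against the access mode of descriptor N. The Python sketch below demonstrates this harmlessly on a temporary file owned by the caller; the helper name `rewrite_through_fd_path` is hypothetical.

```python
import os
import tempfile


def rewrite_through_fd_path(path: str, data: bytes) -> str:
    """Open `path` read-only, then write `data` through its /proc/self/fd path."""
    ro_fd = os.open(path, os.O_RDONLY)
    try:
        fd_path = f"/proc/self/fd/{ro_fd}"
        # Opening the fd path is a new open of the same file, so it may ask for
        # write access even though ro_fd itself was opened read-only.
        rw_fd = os.open(fd_path, os.O_WRONLY | os.O_TRUNC)
        try:
            os.write(rw_fd, data)  # small payload; a single write suffices here
        finally:
            os.close(rw_fd)
        return fd_path
    finally:
        os.close(ro_fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"original contents\n")
    used = rewrite_through_fd_path(tmp.name, b"rewritten\n")
    with open(tmp.name, "rb") as f:
        print(used, f.read())
    os.unlink(tmp.name)
```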
Iwaniuk and Poplawski posted an excellent write-up on the research that led to the discovery of this vulnerability, and proof-of-concept code from runC maintainer Aleksa Sarai is now publicly available.
How exactly can this vulnerability be exploited?
Conditions and delivery mechanisms
The runC container escape vulnerability applies to any unpatched system with default Docker security settings and a container running with UID 0.
An attacker can deliver an exploit for this vulnerability in one of two ways:
- By crafting a malicious image and pushing this image to a target-accessible registry for download; or
- By exploiting a vulnerability in a target application to obtain code execution on the underlying container, and then copying the required malicious library and scripts into that container.
In either of these cases, once the container is prepped with the required malicious library and a binary pointing to /proc/self/exe, an authorized host user must either pull and run the malicious image (case 1) or exec into the container and use the relevant binary (case 2).
Exploit setup
To exploit this vulnerability, an attacker must decide on a delivery mechanism and then perform the following actions in the target container/image:
- Install runC
- Replace a standard runC linked library with one that contains the code to carry out the steps listed under “Exploit execution” below
- Point an entrypoint script, or a binary that is likely to be used by someone exec’ing into a container (e.g., /bin/bash), to /proc/self/exe
- [if delivering the exploit via a crafted image] Upload the malicious image to a target-accessible location
- Wait
Exploit execution
When an authorized host user launches a container from a malicious image or execs into a compromised container and uses the relevant binary, the exploit will fire and perform the following steps:
- Load the malicious linked library
- Open /proc/self/exe in read-only mode to obtain a file descriptor for the host runC binary
- Use that file descriptor to construct a file descriptor path (e.g., /proc/self/fd/<file_descriptor>) that points to the host runC binary
- Stop using /proc/self/exe so that the host runC binary is no longer in use (Linux refuses to open an executable for writing while it is running)
- Open /proc/self/fd/<file_descriptor> in write mode and overwrite the target file (the host runC binary) with a malicious binary
- Profit!
How was it fixed?
The fix has runC copy its own binary into an in-memory file (created with memfd_create) and then re-execute itself from that copy. Under these conditions, /proc/self/exe points to the in-memory copy, so if an attacker tries to overwrite the binary to which /proc/self/exe points, only that copy is affected and the attacker cannot damage the host’s runC binary.
Memory usage implications
This fix does increase memory usage in the container. For larger applications the increase will most likely go unnoticed, but containers designed to run with low memory limits (around 16MB) may require additional memory.
What should I do?
Patch
As mentioned earlier, you should immediately upgrade to the latest version of runC on your container hosts and see guidance from your provider if you’re using a managed container service.
If you cannot patch, mitigate your exposure
If you are unable to upgrade to the newest version of runC, you can take multiple steps to mitigate your exposure to this vulnerability.
Avoid pulling from untrusted registries
This measure may help to reduce the risk of downloading an image that has been tainted with malicious code by an attacker.
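One way to apply this is an allowlist check before an image is pulled or deployed. The Python sketch below extracts the registry host from an image reference using Docker’s usual convention (a first path component that contains `.` or `:`, or equals `localhost`, is a registry; anything else means Docker Hub, `docker.io`) and compares it against a hypothetical `ALLOWED_REGISTRIES` set. It is a simplification and does not validate the full image-reference grammar.

```python
# Hypothetical allowlist; replace with the registries your organization trusts.
ALLOWED_REGISTRIES = {"registry.internal.example.com", "quay.io"}


def image_registry(image_ref: str) -> str:
    """Return the registry host for a reference such as 'quay.io/org/app:1.0'."""
    first, sep, _rest = image_ref.partition("/")
    # The first component names a registry only if it looks like a host;
    # otherwise (e.g. 'ubuntu:18.04' or 'library/ubuntu') it is Docker Hub.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_allowed(image_ref: str, allowed: set = ALLOWED_REGISTRIES) -> bool:
    """True if the image comes from a registry on the allowlist."""
    return image_registry(image_ref) in allowed
```

For example, `is_allowed("quay.io/org/app:1.0")` is true with the sample allowlist, while an image from an unlisted host is rejected.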
How StackRox can help
The StackRox Kubernetes Security Platform ships with a default policy to help you restrict the registries from which your applications are allowed to pull images.
Enable SELinux
SELinux is not enabled in Docker by default, but when enabled it mitigates this vulnerability by preventing container processes from overwriting the host runC binary.
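A quick host-side check might look like the following Python sketch. It assumes the standard /sys/fs/selinux/enforce interface (`1` for enforcing, `0` for permissive, absent when SELinux is disabled) and the `SecurityOptions` list printed by `docker info --format '{{json .SecurityOptions}}'`, where an entry named `selinux` indicates that the daemon runs with SELinux support. The helper names are hypothetical.

```python
import os

SELINUX_ENFORCE = "/sys/fs/selinux/enforce"


def selinux_mode(enforce_path: str = SELINUX_ENFORCE) -> str:
    """Return 'enforcing', 'permissive', or 'disabled' for the host."""
    if not os.path.exists(enforce_path):
        return "disabled"
    with open(enforce_path) as f:
        return "enforcing" if f.read().strip() == "1" else "permissive"


def docker_selinux_enabled(security_options: list) -> bool:
    """True if the daemon's SecurityOptions list contains an entry named selinux."""
    for option in security_options:
        # Entries look like "name=seccomp,profile=default" or "name=selinux".
        fields = dict(part.split("=", 1) for part in option.split(",") if "=" in part)
        if fields.get("name") == "selinux":
            return True
    return False
```

Both conditions matter: the daemon must have SELinux support turned on, and the host must be in enforcing mode for the policy to block the overwrite.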
How StackRox can help
Enabling SELinux is a CIS Docker Benchmark recommendation. StackRox offers push-button evaluation of adherence to industry and regulatory compliance standards, including the CIS Docker Benchmark. With a single scan, StackRox can provide visibility into which hosts across your entire container infrastructure have SELinux security options set and which do not.
Do not run containers with UID 0
Unless a lower-privileged user is specified in the image or at container creation time, Docker containers run as UID 0 (root). Ensuring that your containers do not run as root prevents the container user from being able to overwrite the runC binary on the host.
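To audit existing containers, the configured user can be read with `docker inspect --format '{{.Config.User}}' <container>`. The Python sketch below (hypothetical helper `runs_as_root`) classifies that string: an empty value, `root`, or a numeric UID of `0`, optionally followed by `:group`, means the process starts as UID 0. A named user other than `root` is reported as non-root here even though it could still map to UID 0 in the image’s /etc/passwd, so treat the result as a first pass.

```python
def runs_as_root(user: str) -> bool:
    """Classify a Docker Config.User value such as '', 'root', or '1000:1000'."""
    # Only the part before ':' names the user; the part after it is the group.
    name = user.split(":", 1)[0].strip()
    if name in ("", "root"):
        return True  # empty means the default user, which is root (UID 0)
    return name.isdigit() and int(name) == 0


if __name__ == "__main__":
    for value in ["", "root", "0:0", "1000", "1000:0", "app"]:
        print(repr(value), runs_as_root(value))
```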
How StackRox can help
StackRox ships with a default policy that will alert on any deployments with containers running with UID 0.
If you cannot mitigate your exposure, detect exploitation attempts
Detect installation of runC
When the exploit is delivered by compromising an already running container, an attacker who intends to use the runC container escape vulnerability for privilege escalation must bring runC libraries into the container to carry out the exploit. One way to accomplish this step is simply to install runC in the container itself (Aleksa Sarai used this method in his proof of concept).
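A simple file-level heuristic for this step is to look for a `runc` executable or package record inside a container’s filesystem, where it normally has no reason to exist. The Python sketch below checks a few common binary directories and the Debian/Ubuntu dpkg status file under a given root, such as a container’s root filesystem viewed from the host. The directory list and the helper name `find_runc` are assumptions; the sketch does not guard against symlinked directories inside the scanned tree, and a renamed binary will evade it.

```python
import os
import sys

# Directories where a package manager or an attacker would typically put runc.
BIN_DIRS = ("bin", "sbin", "usr/bin", "usr/sbin", "usr/local/bin", "usr/local/sbin")


def find_runc(rootfs: str) -> list:
    """Return paths under `rootfs` that suggest runC was installed there."""
    findings = []
    for rel in BIN_DIRS:
        candidate = os.path.join(rootfs, rel, "runc")
        # lexists: report the entry without following a final symlink.
        if os.path.lexists(candidate):
            findings.append(candidate)
    # Debian/Ubuntu images record installed packages in the dpkg status file.
    status = os.path.join(rootfs, "var/lib/dpkg/status")
    if os.path.isfile(status):
        with open(status, encoding="utf-8", errors="replace") as f:
            if any(line.strip() == "Package: runc" for line in f):
                findings.append(status)
    return findings


if __name__ == "__main__":
    print(find_runc(sys.argv[1] if len(sys.argv) > 1 else "/"))
```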
How StackRox can help
StackRox ships with a default policy that will identify installation of new packages, including packages such as runC.
Detect modification of binaries
When the exploit is delivered by compromising an already running container, an attacker who intends to use the runC container escape vulnerability for privilege escalation will also need to modify a binary to point to /proc/self/exe. The act of modifying this binary may be detected through either file or process activity on the host, depending on how the attacker chooses to perform the modification.
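On the file side, one generic way to notice such a change is to record cryptographic hashes of important binaries from a known-good image or container and compare them later. The Python sketch below does this with SHA-256; the helper names `snapshot` and `changed_files` are hypothetical, and the set of paths to watch (for example, the shells and entrypoints people exec into) is an assumption left to the caller.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths) -> dict:
    """Map each existing path to the SHA-256 of its contents."""
    return {p: sha256_of(p) for p in paths if os.path.isfile(p)}


def changed_files(baseline: dict, current: dict) -> list:
    """Paths whose hash differs from the baseline, or that have disappeared."""
    return sorted(p for p, h in baseline.items() if current.get(p) != h)
```

A typical use is to take `snapshot(["/bin/bash", "/bin/sh"])` when a container starts and periodically compare a fresh snapshot against it with `changed_files`.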
How StackRox can help
StackRox ships with a default policy that provides visibility into file modifications made in a manner similar to the method used in Sarai’s proof of concept.
Detect reference to /proc/self/exe from within a container
With either delivery mechanism, /proc/self/exe will be used to reference the host runC binary. Referencing /proc/self/exe in this way from within a container is rarely legitimate and is easy to detect.
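For the file side of this check, a scanner can flag entries that point at /proc/self/exe, either as a symbolic link target or as the interpreter on a `#!` line. The Python sketch below (hypothetical helpers `references_proc_self_exe` and `scan`) checks both forms without following links, so it can also be run against a container’s root filesystem from the host. It only recognizes these two literal forms and is a heuristic rather than a complete detector.

```python
import os

TARGET = b"/proc/self/exe"


def references_proc_self_exe(path: str) -> bool:
    """True if `path` is a symlink to, or a script interpreted by, /proc/self/exe."""
    if os.path.islink(path):
        return os.fsencode(os.readlink(path)) == TARGET
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        first_line = f.readline(256)
    if not first_line.startswith(b"#!"):
        return False
    # The interpreter is the first word after "#!"; split() also drops "\n".
    interpreter = first_line[2:].split(maxsplit=1)
    return bool(interpreter) and interpreter[0] == TARGET


def scan(directory: str) -> list:
    """Return entries directly inside `directory` that reference /proc/self/exe."""
    return sorted(
        entry.path
        for entry in os.scandir(directory)
        if references_proc_self_exe(entry.path)
    )
```

Running `scan("/bin")` inside a container, or against the corresponding directory of its root filesystem on the host, lists any shells that have been repointed in either way.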
How StackRox can help
StackRox ships with the capability to identify execution of /proc/self/exe.
Final Thoughts
The runC vulnerability provides a potent path for attack, but you have several means of protecting your container environments: upgrade to the latest version of runC, and use security platforms that automatically detect and prevent attempted exploitation of this vulnerability.