In my previous blog post, I described Menagerie, a new Docker-based platform we are open-sourcing. Menagerie enables scheduling and tracking of batch payloads inside Docker containers. That post focused on architecture and the development process, so it’s now time to move on to deployment and production topics, starting with security, which is naturally close to our hearts. This post loosely builds on the previous one, but it should also be useful to new readers looking to secure Docker-based systems.

Identify and plan

The first stage in securing any system is to identify weak points and create a remediation plan. For Menagerie, we performed the following:
  • Covered the CIS best practices, as detailed in the CIS Docker Benchmark document. To facilitate testing, the good people at Docker provide the docker-bench-security project, which outputs a list of actionable items.
  • Reviewed the data flow in our code and identified potential vulnerabilities, focusing on malicious or erroneous user input.
  • Reviewed the security features introduced across Docker versions and picked the ones that fit. Our guiding principle was that the system runs diverse and somewhat arbitrary payloads, so we wanted to restrict execution as much as possible and avoid any interaction with the host. Details on what we chose follow below.

Remediation

Apply CIS best practices

We ran docker-bench-security inside our Vagrant environment. It yields a list of actionable items, which we reviewed, choosing to fix the points that carry over beyond this specific testbed. We recommend running the test in every deployment environment, since each one may surface points unique to it. Here are some of the things we fixed following the test results:
  • Disable inter-container communication (--icc=false flag to the Docker daemon): allow only explicit connections.
  • Set a read-only rootfs for containers (--read-only flag to docker run).
  • Remap the root user using user namespaces; see additional notes below.
  • Add AppArmor profiles to the infrastructure and engine containers; see additional notes below.
The Vagrant environment includes third-party infrastructure containers: MySQL, RabbitMQ, and docker-registry. These are likely to be deployed separately in a production environment, and securing them is outside the scope of this blog. Suffice it to say there is enough material out there to get you going.
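Triaging the docker-bench-security report is mostly a matter of pulling out the warnings and deciding which ones carry over beyond the testbed. The sketch below assumes the report was saved as plain text with the `[WARN]`/`[PASS]`/`[INFO]` line prefixes the tool prints; the function name and the grouping by CIS section are our own illustration, not part of the tool.

```python
import re
from collections import defaultdict

# docker-bench-security colors its output; strip ANSI escape codes first.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# Numbered check lines look like "[WARN] 2.1  - Check description".
CHECK_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\s+(?P<id>\d+(?:\.\d+)*)\s+-\s+(?P<text>.+)$"
)


def warnings_by_section(report: str) -> dict[str, list[str]]:
    """Group numbered [WARN] findings by their top-level CIS section.

    Indented detail lines (e.g. "[WARN]      * Running as root: db") carry
    no check number and are skipped here.
    """
    grouped = defaultdict(list)
    for raw in report.splitlines():
        match = CHECK_LINE.match(ANSI_ESCAPE.sub("", raw).strip())
        if match is None or match.group("level") != "WARN":
            continue
        check_id = match.group("id")
        section = check_id.split(".")[0]
        grouped[section].append(f"{check_id} {match.group('text')}")
    return dict(grouped)
```

Grouping by section makes it easy to separate host-level items (which may be specific to the Vagrant box) from container runtime items that belong in the engine configs.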

Anticipate and review for code vulnerabilities

This is of course not a Docker issue per se, but it is an important part of the process nonetheless. In Menagerie, the data flow starts with an upload through the web API, continues with writes to a file store, a database, and a message queue, and ends with reads and execution by the backend workers. We inspected the data as it passes through handoffs, writes, file system access, and execution, and made our best effort to ensure that external input is sanitized where needed and cannot be used for an exploit. You can read more about the importance of sanitizing external input.
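As an example of the kind of check this review looks for, here is a minimal sketch of turning an uploaded file name into a path in a file store without allowing path traversal. It is not Menagerie's code: the helper name, the allowed-character policy, and the single store root directory are all assumptions for illustration.

```python
import re
from pathlib import Path

# Hypothetical policy: an alphanumeric first character, then a limited set.
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,127}")


def safe_store_path(store_root: str, filename: str) -> Path:
    """Return a path for an uploaded file inside store_root, or raise ValueError.

    fullmatch (not match with '$') avoids accepting a trailing newline.
    The name may not contain path separators, and the resolved path must
    sit directly under the store root (this also catches symlink tricks).
    """
    if not SAFE_NAME.fullmatch(filename):
        raise ValueError(f"rejected filename: {filename!r}")
    root = Path(store_root).resolve()
    candidate = (root / filename).resolve()
    if candidate.parent != root:
        raise ValueError(f"path escapes store root: {filename!r}")
    return candidate
```

Validating against an allowlist and then re-checking the resolved path is deliberately redundant: either check alone would stop `../etc/passwd`, but the second one still holds if the allowlist is later loosened.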

Apply hardening features

As mentioned, since this system runs diverse engines over arbitrary input, we wanted to make sure that even if something does slip in, the system is hardened enough to avoid significant damage. This includes guarding against, among other things:
  • Unauthorized external access to the API.
  • Escaping the container: access to the host file system and other resources.
  • DoS via excessive resource consumption.
I will go through the details of each hardening effort.
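One simple guard against the DoS item above is a wall-clock limit on every execution; Menagerie's engine config carries a `timeout` value (240 in the apktool example further down). The sketch below shows the general technique with Python's `subprocess` module. It is an illustration, not Menagerie's worker code, and the function name is hypothetical.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(argv: list[str], timeout_s: float) -> tuple[bool, Optional[int]]:
    """Run argv and kill it if it runs longer than timeout_s seconds.

    Returns (timed_out, returncode); returncode is None on timeout.
    subprocess.run kills the child process when the timeout expires. If argv
    is a `docker run` client, killing the client does not necessarily stop the
    container, so a real worker would also stop it (e.g. `docker kill <name>`).
    """
    try:
        done = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True, None
    return False, done.returncode


if __name__ == "__main__":
    # Stand-in for a runaway engine: sleeps 5s against a 1s budget.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 1.0))
```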

Limit API access

When deploying in a production environment, we recommend keeping the service port private and placing a secure web server acting as a reverse proxy (e.g. Nginx) in front of it. This front-end server should be configured to handle SSL communication, require client-side certificates, apply rate limiting, and enable any other relevant security features.
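Keeping the service port private can be as simple as publishing it on the loopback interface only, so that just the local reverse proxy can reach it. Docker's `-p` option accepts an `ip:hostPort:containerPort` form, while specs without an IP publish on all host interfaces by default. The sketch assumes the API container is started with `docker run`; port 8000 is a hypothetical placeholder, not Menagerie's actual port, and both helper names are ours.

```python
def private_publish_args(port: int, host_ip: str = "127.0.0.1") -> list[str]:
    """docker run arguments that publish `port` on host_ip only (loopback by default)."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return ["-p", f"{host_ip}:{port}:{port}"]


def binds_all_interfaces(spec: str) -> bool:
    """True if a -p spec publishes on every host interface.

    Handles the IPv4 forms 'containerPort', 'hostPort:containerPort' and
    'ip:hostPort:containerPort' (optionally with a '/tcp' or '/udp' suffix);
    IPv6 bracket syntax is out of scope for this sketch.
    """
    parts = spec.split(":")
    return len(parts) < 3 or parts[0] in ("", "0.0.0.0")
```

A deployment script could run `binds_all_interfaces` over its publish flags and refuse to start if the API port would be exposed directly.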

Docker volumes

Although this is not a security feature per se, it does bring a security gain: it eliminates any need to mount host directories inside the containers. Anything written inside containers and Docker volumes stays there, and a clean separation can be kept between the data and config each of them handles. Docker volumes are available from version 1.9.
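To keep that separation from eroding, engine run flags can be checked so that `-v` refers only to named volumes, never to host paths. The sketch below is a hypothetical check, not part of Menagerie; the name pattern mirrors the characters Docker's local volume driver accepts, and anything that looks like a host path is rejected conservatively.

```python
import re

# Names accepted by Docker's local volume driver (first char alphanumeric).
VOLUME_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]+")


def classify_mount(spec: str) -> str:
    """Classify a -v spec 'source:target[:options]' as 'volume', 'bind' or 'anonymous'."""
    parts = spec.split(":")
    if len(parts) == 1:
        return "anonymous"  # only a container path, e.g. -v /scratch
    source = parts[0]
    if source.startswith(("/", ".", "~")):
        return "bind"  # host path (or something resembling one): treat as bind
    if VOLUME_NAME.fullmatch(source):
        return "volume"
    raise ValueError(f"unrecognized mount source: {source!r}")


def reject_host_mounts(specs: list[str]) -> None:
    """Raise if any mount spec would expose a host path inside the container."""
    bad = [s for s in specs if classify_mount(s) == "bind"]
    if bad:
        raise ValueError(f"host bind mounts are not allowed: {bad}")
```

The volume name `menagerie-data` used in the checks is a made-up example; a named volume like it would be created beforehand with `docker volume create`.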

Execution flags

Menagerie executes various engine containers via the backend workers. Since this execution is indirect, we wanted to let the system admin impose limitations on it. This is done by specifying additional run flags in the engine config, which are passed to the Docker create/run command. Here is an example from the config we provide for the apktool engine:
{
  "name": "apktool",
  "workers": 2,
  "image": "localhost:5000/apktool:stable",
  "runflags": [
    "--read-only",
    "--net=\"none\"",
    "--security-opt=\"apparmor:apk\""
  ],
  "cmd": "/engine/run.sh",
  "sizelimit": 50000000,
  "inputfilename": "sample.apk",
  "timeout": 240
}
The above disallows networking, makes the container rootfs read-only, and applies an AppArmor profile we created. Note also the ‘sizelimit’ property, which limits the upload size for a specific engine at the frontend level. More on execution flags can be found in the docker run command documentation.
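To make the path from config to execution concrete, here is a hypothetical sketch of how a worker could turn the apktool config into a `docker run` argument list and enforce `sizelimit` on uploads. It is not Menagerie's worker code: it assumes each runflag is shell-style text (which would explain the escaped quotes in the JSON) and that `sizelimit` is in bytes.

```python
import json
import shlex

# The apktool config from above, verbatim.
ENGINE_CONFIG = r'''
{
  "name": "apktool",
  "workers": 2,
  "image": "localhost:5000/apktool:stable",
  "runflags": ["--read-only", "--net=\"none\"", "--security-opt=\"apparmor:apk\""],
  "cmd": "/engine/run.sh",
  "sizelimit": 50000000,
  "inputfilename": "sample.apk",
  "timeout": 240
}
'''


def build_run_argv(cfg: dict) -> list[str]:
    """Turn an engine config into a docker run argument vector.

    Each runflag is split shell-style with shlex, so '--net="none"'
    becomes the single argument '--net=none'.
    """
    argv = ["docker", "run"]
    for flag in cfg.get("runflags", []):
        argv.extend(shlex.split(flag))
    argv.append(cfg["image"])
    argv.extend(shlex.split(cfg["cmd"]))
    return argv


def upload_allowed(cfg: dict, size_bytes: int) -> bool:
    """Frontend-side check of an upload against the engine's sizelimit."""
    return size_bytes <= cfg["sizelimit"]
```

Building an argument list (rather than a single shell string) means the sample name or other input never passes through a shell on the host.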

AppArmor profiles

AppArmor is a Linux security module that implements mandatory access control (MAC) for application hardening. It allows creating execution profiles for applications, stating the expected file-system access, process creation, network access, and more. By creating AppArmor profiles for the base containers and engines, we lock them into specific behavior and limit the freedom of any post-exploit malicious activity. Creating a profile is built around running the application in complain (report-only) mode. The details are out of scope for this blog, but I do recommend reviewing the wiki we wrote on adding a profile to a new engine container. Note that Docker 1.10 added support for seccomp profiles as well, which lock the application/container into a specific set of system calls. This looks like a great tool, which we simply have not yet had a chance to explore.
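Since seccomp came up, here is what an allowlist-style profile in Docker's JSON format looks like when built programmatically. This is a sketch, not something Menagerie ships: the syscall list is hypothetical and far too small for a real container, since a real engine needs many more calls, found by tracing it. The profile would be passed to `docker run` via `--security-opt` (`seccomp=<file>` in current releases).

```python
import json

# Hypothetical allowlist for illustration only; not enough to start a container.
ALLOWED = ["read", "write", "openat", "close", "fstat", "mmap", "brk",
           "exit_group", "rt_sigreturn"]


def seccomp_profile(allowed: list[str]) -> dict:
    """Build an allowlist seccomp profile in Docker's JSON format.

    Any syscall not listed fails with an error (SCMP_ACT_ERRNO).
    Uses the current 'names' list; early releases such as 1.10
    expected one 'name' per syscall entry instead.
    """
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(seccomp_profile(ALLOWED), indent=2))
```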

User namespaces

Although running containers under a non-root UID is supported, it requires explicit work and imposes limitations on the processing code; since we plan to embed arbitrary engines, we cannot always guarantee they will work in that environment. One great kernel feature exposed in Docker 1.10 is the ability to run containers under a separate user namespace. There is an excellent explanation of this in this blog post. Essentially, UIDs inside the container are mapped to an external UID range that is exclusive to this specific namespace. In particular, containers can still run as root internally, but are effectively mapped to an unprivileged UID externally. Working with user namespaces posed a few challenges and exposed a few issues, which we reported to the Docker team. Some points for those of you who want to get hands-on with this feature:
  • At the time of writing, the --read-only flag is not supported in conjunction with user namespaces.
  • The docker cp command had an issue in 1.10 (fixed in 1.10.2): copied files got the wrong user mapping.
  • Dockerfile COPY/ADD instructions implicitly create missing target directories, but without the correct user mapping. This renders those directories unusable; you can avoid it by creating the target directories up front (this should be fixed in a future release).
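The UID mapping itself is simple arithmetic. With `--userns-remap=default`, Docker creates a `dockremap` user and takes its subordinate UID range from `/etc/subuid`; a UID inside the container is offset by the start of that range. The sketch below uses `dockremap:165536:65536`, a commonly seen range used here only as an example, and handles a single range per user for simplicity.

```python
def parse_subid_line(line: str) -> tuple[str, int, int]:
    """Parse an /etc/subuid-style entry 'user:start:count'."""
    user, start, count = line.strip().split(":")
    return user, int(start), int(count)


def container_to_host_uid(container_uid: int, start: int, count: int) -> int:
    """Map a UID inside the user namespace to the host UID it really runs as."""
    if not 0 <= container_uid < count:
        raise ValueError(f"uid {container_uid} is outside the mapped range")
    return start + container_uid
```

With that range, root inside the container (UID 0) is UID 165536 on the host, an unprivileged ID that owns nothing outside the namespace, and container UID 1000 becomes 166536.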

Conclusion

I hope this post provided insights into our hardening process that can help with your own Docker system, and of course that it got you curious to read more about Menagerie. We did our best to make Menagerie as secure as possible, from the code level up to using the best configuration options Docker has to offer. This is a work in progress, and we would love to hear your input and contributions.
