Containers effectively partition the resources managed by a single operating system into isolated groups to better balance the conflicting demands on resource usage between the isolated groups. In contrast to virtualization, neither instruction-level emulation nor just-in-time compilation is required. Containers can run instructions native to the core CPU without any special interpretation mechanisms. None of the complexities of paravirtualization or system call thunking are required either.
By providing a way to create and enter containers, an operating system gives applications the illusion of running on a separate machine while at the same time sharing many of the underlying resources. For example, the page cache of common files (glibc, for instance) may effectively be shared because all containers use the same kernel and, depending on the container configuration, often the same libc library. This sharing can often extend to other files in directories that do not need to be written to.
The savings realized by sharing these resources, while also providing isolation, mean that containers have significantly lower overhead than true virtualization.
Container technology has existed for a long time. Solaris Zones and BSD jails are examples of containers on non-Linux operating systems. Container technologies for Linux have a similarly extensive heritage: Linux-Vserver, OpenVZ, and FreeVPS. While each of these technologies has matured, these solutions have not made significant strides towards integrating their container support into the mainstream Linux kernel.
In Serge Hallyn’s “Linux Security Modules: A containers cookbook”, discover how to strengthen lightweight containers with SELinux and Smack policy. See resources on the right side for more on these technologies.
In contrast, the Linux Resource Containers project (developed and maintained by IBM’s Daniel Lezcano; see resources on the right side for the code) seeks to implement containers by contributing to the mainstream Linux kernel. At the same time, these contributions may be useful for the mature Linux container solutions—offering a common back end for the more mature container projects. This article offers a quick introduction to using the tools created by the LXC project.
To get the most out of this article, you should be comfortable using the command line to run programs like make, gcc, and patch. You should also be familiar with the task of expanding tarballs (*.tar.gz files).
Getting, building, and installing LXC
The LXC project consists of a Linux kernel patch and userspace tools. The userspace tools rely on the new features added to the kernel by the patch in order to offer a simplified set of tools to manipulate containers.
Before being able to use LXC, you need to download Linux kernel source code, apply an appropriate LXC patch, then build, install, and boot it. Then the LXC tools must be downloaded, built, and installed.
I used a patched Linux 2.6.27 kernel (see resources on the right side for links). While the lxc patch to the 2.6.27 Linux kernel probably will not apply to the kernel source from your favorite distribution’s kernel, Linux versions after 2.6.27 may contain significant portions of the functionality presented in the patch; hence, using the latest patch and mainline kernel source is highly recommended. Also, instead of downloading and patching kernel source code, you may retrieve the code using git:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/daveh/linux-2.6-lxc.git
Directions on how to patch, configure, build, install, and boot a kernel can be found at kernelnewbies.org (see resources on the right side for a link).
LXC requires some specific kernel configurations. The easiest way to properly configure the kernel for LXC is to run make menuconfig and then select Container support. This in turn selects a set of other configuration options depending on which features your kernel supports.
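After configuring, it can help to confirm that the container-related options actually ended up in the kernel's .config file. The following is a minimal sketch, not part of LXC: check_kconfig is a hypothetical helper, and the option names are mainline Kconfig symbols that I am assuming correspond to what Container support selects; a patched 2.6.27 tree may use a different set.

```shell
# Report which container-related options are enabled in a kernel .config.
# The option list is an assumption based on mainline Kconfig symbol names.
LXC_KCONFIG_OPTS="CONFIG_NAMESPACES CONFIG_UTS_NS CONFIG_IPC_NS CONFIG_PID_NS CONFIG_NET_NS CONFIG_CGROUPS CONFIG_VETH"

check_kconfig() {
    # $1: path to a kernel .config file
    cfg=$1
    missing=0
    for opt in $LXC_KCONFIG_OPTS; do
        # An option counts as enabled when built in (=y) or as a module (=m).
        if grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "$opt enabled"
        else
            echo "$opt missing"
            missing=1
        fi
    done
    # Exit status 0 only when every listed option was found.
    return $missing
}
```

For example, run check_kconfig .config from the top of the kernel source tree before building.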
LXC environments for you to play with
In addition to a kernel that supports containers, you will need tools that make starting and managing containers a simple task. The primary tools used for container management in this article come from liblxc (see resources on the right side for a link, and also see libvirt for an alternative). This section discusses:
- The liblxc tool
- The iproute2 tool
- How to configure networking
- How to populate a container filesystem (by building a custom Debian one or by running an ssh container)
- How to connect to a container filesystem (SSH, VNC, VT: tty, VT: GUI)
Download and expand liblxc (see resources on the right side), and then, from within the liblxc directory:
./configure --prefix=/
make
make install
If you’re comfortable building a source RPM, one is available (see resources on the right side).
To manage your network interfaces within containers, you need version 2.6.26 or later of the iproute2 package (see resources on the right side). If your Linux distribution lacks this package, download, configure, make, and install it using the instructions from the source tarball.
Another key component of many functional containers is network access. Bridging (connecting Ethernet segments so that they appear to be a single Ethernet segment) is currently the best method of connecting a container to the network. To prepare to use LXC, we will create a bridge (see resources on the right side) and use it to connect our real network interface with the container’s network interface.
To create a bridge named br0:
brctl addbr br0
brctl setfd br0 0
Bring up the bridge interface with the IP address from your pre-existing network interface (10.0.2.15 in this example):
ifconfig br0 10.0.2.15 promisc up
Add your pre-existing network interface (eth0 in this example) to the bridge and remove its direct association with its IP address:
brctl addif br0 eth0
ifconfig eth0 0.0.0.0 up
Any interface added to the bridge br0 will respond to that IP address. Finally, ensure that your default route sends packets to your gateway:
route add -net default gw 10.0.2.2 br0
Later, when you configure the container, you specify br0 as its link to the outside world.
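Because these steps move the host's IP address from one interface to another, it can be worth reviewing them as a single script before running anything. The sketch below only prints the commands from this section with your own values filled in; bridge_setup_cmds is a hypothetical helper, not part of LXC or bridge-utils.

```shell
# Print the bridge setup commands for review; nothing is executed here.
bridge_setup_cmds() {
    # $1: bridge name, $2: existing interface, $3: host IP, $4: gateway
    br=$1; nic=$2; ip=$3; gw=$4
    echo "brctl addbr $br"
    echo "brctl setfd $br 0"
    echo "ifconfig $br $ip promisc up"
    echo "brctl addif $br $nic"
    echo "ifconfig $nic 0.0.0.0 up"
    echo "route add -net default gw $gw $br"
}
```

Calling bridge_setup_cmds br0 eth0 10.0.2.15 10.0.2.2 reproduces the example values used here. Once the output looks right, you could pipe it to sh as root.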
Populate container filesystems
In addition to networking, containers often need their own filesystem. There are several methods to populate a container filesystem depending on your needs. I’ll discuss two:
- Building a custom Debian container
- Running an ssh container
Building a custom Debian container is rather simple using the debootstrap command:
debootstrap sid rootfs https://debian.osuosl.org/debian/
If you’re making a large number of containers, you may save time by downloading the packages into a tarball first:
debootstrap --make-tarball sid.packages.tgz sid https://debian.osuosl.org/debian/
As an example, this produces a .tar file that is about 71MB in size (52MB compressed), while a root directory consumes nearly 200MB. Then, to start building the root directory in rootfs:
debootstrap --unpack-tarball sid.packages.tgz sid rootfs
(The debootstrap manpage has more information on building smaller or more suitable containers.)
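When building several containers from one tarball, the unpack step is the part that repeats. As a rough sketch, the helper below prints one debootstrap invocation per container; rootfs_cmds and the rootfs-NAME directory naming are hypothetical, and the commands are printed rather than run.

```shell
# Print one debootstrap invocation per container name, all unpacking the
# same package tarball. Output is meant to be reviewed, then run as root.
rootfs_cmds() {
    # $1: path to the package tarball; remaining arguments: container names
    tarball=$1; shift
    for name in "$@"; do
        echo "debootstrap --unpack-tarball $tarball sid rootfs-$name"
    done
}
```

For example, rootfs_cmds "$PWD/sid.packages.tgz" web db prints two commands. Passing an absolute tarball path is a precaution, since some debootstrap versions may reject a relative one.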
This will result in an environment (see resources on the right side) that is highly redundant with respect to the host container.
Running an ssh container lets you dramatically reduce the disk space unique to a container’s filesystem. For example, this method uses mere kilobytes to enable running multiple ssh daemons on port 22 of different containers (see resources on the right side for an example). The container does this by using read-only bind mounts of the critical root directories, such as /bin, /sbin, and /lib, to share the sshd package contents from the existing Linux system. A network namespace is used, and barebones read-write contents are created.
The techniques used to generate such lightweight containers are primarily those used to generate chroot environments. The difference lies in the read-only bind mounts and the use of namespaces to enhance the isolation of the chroot environment to the point that it becomes an effective container.
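The read-only bind mounts can be expressed as ordinary fstab lines that point host directories into the container's root. Here is a minimal sketch that generates such lines; ro_bind_fstab is a hypothetical helper, and the directories to share are whatever you pass in.

```shell
# Emit fstab lines that bind-mount host directories read-only into a
# container root filesystem.
ro_bind_fstab() {
    # $1: container rootfs path; remaining arguments: host directories
    root=$1; shift
    for dir in "$@"; do
        # Fields: source, mount point, fs type, options, dump, pass
        printf '%s %s%s none ro,bind 0 0\n' "$dir" "$root" "$dir"
    done
}
```

For instance, ro_bind_fstab "$PWD/rootfs" /bin /sbin /lib > fstab writes three entries, one per shared directory.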
Next you need to select a method for connecting to the container.
Connect to a container
Connecting to your container is the next step. Several methods are available depending on how you choose to configure your container:
- SSH
- VNC (GUI)
- VT: tty (text)
- VT: X (GUI)
Connecting via SSH is good if you do not need a graphical interface to your container. In this case, a simple ssh connection may suffice (see “Running an ssh container” above). This method has the benefit of relying on IP addressing to enable the creation of nearly arbitrary numbers of containers.
If your ssh connection takes a long time to reach the password prompt, the Avahi multicast DNS/Service Discovery daemon may be timing out during DNS lookups.
Connecting via Virtual Network Computing (VNC) lets you add a graphical interface for your container.
Use vnc4server to start an X server that serves only VNC clients. vnc4server must be installed in the container so that it can be run from the container’s /etc/rc.local file, like so:
echo '/usr/bin/vnc4server :0 -geometry 1024x768 -depth 24' >> rootfs/etc/rc.local
This creates an X display with 1024-by-768 resolution and 24-bit color when the container starts. You can then connect to that display with a VNC client.
Connecting via VT: tty (text) is useful if your container shares ttys with its host. In this case, you may use Linux Virtual Terminals (VT) to connect to your container. The simplest use of VTs starts a login process on one of the tty devices, which tend to correspond to Linux VTs. The login process is called getty. To use VT 8, append a getty entry to the container’s inittab:
echo '8:2345:respawn:/sbin/getty 38400 tty8' >> rootfs/etc/inittab
Once the container is started, it will run getty on tty8, allowing users to log in to the container. You can use a similar trick to restart the container using the LXC tools.
This technique does not enable a graphical interface to the container. Furthermore, since only one process at a time can attach to tty8, further configuration would be needed to enable multiple containers.
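One way to approach the multiple-container case is to give each container its own VT number. The sketch below generates the getty entry for a given VT, assuming a sysvinit-style inittab inside each container's root filesystem; getty_inittab_line is a hypothetical helper.

```shell
# Print an inittab entry that respawns getty on the given virtual terminal.
getty_inittab_line() {
    # $1: virtual terminal number, for example 8
    echo "$1:2345:respawn:/sbin/getty 38400 tty$1"
}
```

For example, one container could get getty_inittab_line 8 and another getty_inittab_line 9, each appended to its own inittab. Keeping the numbers distinct is up to you.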
Connecting via VT: X allows you to run a GUI. To run the GNOME Display Manager (gdm) on VT 9, edit rootfs/usr/share/gdm/defaults.conf, replacing
While this enables a graphical interface, it still uses one of a limited number of Linux virtual terminals.
Running LXC tools
Now that you are running a suitable kernel, have installed the LXC utilities, and have a working environment, it’s time to learn to manage instances of that environment. (Hint: Much of this is covered in greater detail in the LXC README.)
LXC uses the cgroup filesystem to manage containers. You must first mount this filesystem before using LXC:
mount -t cgroup cgroup /cgroup
You may mount the cgroup filesystem anywhere you like. LXC will use the first cgroup filesystem listed in /etc/mtab.
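If more than one cgroup filesystem is mounted, it helps to know which mount point comes first. This small sketch scans an mtab-style file for the first entry of type cgroup; first_cgroup_mount is a hypothetical helper, and whether it matches LXC's own lookup exactly is an assumption.

```shell
# Print the mount point of the first cgroup filesystem listed in an
# mtab-style file (fields: device, mount point, type, options, ...).
first_cgroup_mount() {
    # $1: file to scan; defaults to /etc/mtab
    awk '$3 == "cgroup" { print $2; exit }' "${1:-/etc/mtab}"
}
```

Running first_cgroup_mount with no argument checks the live system. Empty output means no cgroup filesystem is mounted yet.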
The rest of this article shows you some LXC basics and some miscellaneous points and discusses low-level access.
The LXC basics
For the basics in using LXC tools, we’ll look at:
- Creating a container
- Getting information about (or listing) existing containers
- Starting system and application containers
- Signalling processes running in a container
- Pausing, resuming, stopping, and destroying a container
Creating a container associates a name with a configuration file. The name will be used to manage a single container:
lxc-create -n name -f configfile
This allows multiple containers to simultaneously use the same configuration file. Within the configuration file, you specify attributes of the container such as its host name, networking, root filesystem, and fstab. After running the lxc-sshd script (which creates a configuration for you), the ssh container configuration looks like:
lxc.utsname = my_ssh_container
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.0.2.16/24
lxc.network.name = eth0
lxc.mount = ./fstab
lxc.rootfs = ./rootfs
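If you want several containers that differ only in host name and address, a small generator can produce a configuration like the one above for each. This is a sketch using only the keys already shown; make_lxc_config is a hypothetical helper, not an LXC tool.

```shell
# Write an LXC configuration to stdout, reusing the keys from the ssh
# container example and varying only name, address, and bridge.
make_lxc_config() {
    # $1: container host name, $2: IPv4 address with prefix length,
    # $3: bridge to attach to (defaults to br0)
    cat <<EOF
lxc.utsname = $1
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = ${3:-br0}
lxc.network.ipv4 = $2
lxc.network.name = eth0
lxc.mount = ./fstab
lxc.rootfs = ./rootfs
EOF
}
```

For example, make_lxc_config web1 10.0.2.17/24 > web1.conf produces a file you can then pass to lxc-create with -f.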
Regardless of the configuration file, containers started with the LXC tools have their own view of the processes in the system, their own mount tree, and their own view of interprocess communication (IPC) resources available.
Apart from these, when a container starts, any type of resource omitted from the configuration is assumed to be shared with the host. This allows administrators to compactly specify the critical differences between the container and its host and enables portability of the configurations.
Listing information about existing containers is crucial to managing them. To show the state of a specific container:
lxc-info -n name
To show the processes that are part of a container:
Starting: LXC differentiates between two types of containers: system and application containers. System containers resemble virtual machines. In contrast to true virtualization, they have lower overhead at the cost of decreased isolation. This is a direct consequence of the fact that the same Linux kernel is utilized by every container. To resemble a virtual machine, a system container starts at the same place a Linux distribution starts: by running the init program:
lxc-start -n name init
In contrast to a system container, an application container only creates separate namespaces needed to isolate a single application. To start an application container:
lxc-execute -n name cmd
Signalling: To send a signal to all processes running inside a container:
lxc-kill -n name -s SIGNAL
Pausing: Pausing a container is conceptually similar to sending the SIGSTOP signal to all the processes in a container. However, sending spurious SIGSTOP signals can confuse some programs, so LXC uses the Linux process freezer available through the cgroup interface:
lxc-freeze -n name
Resuming: To resume a frozen container:
lxc-unfreeze -n name
Stopping: Stopping a container causes all the processes started in the container to die and cleans up the container:
lxc-stop -n name
Destroying: Destroying the container removes the configuration files and metadata that lxc-create associated with the name:
lxc-destroy -n name
Here are a few miscellaneous activities (some related to monitoring) you might like to know.
To view and adjust the priority of a container:
lxc-priority -n name
lxc-priority -n name -p priority
To continually watch state and priority changes of a container:
lxc-monitor -n name
Hit Ctrl-C to stop monitoring the container.
You may also wait for a container to enter one of a set of states separated by |:
lxc-wait -n name -s states
To wait for all states except RUNNING:
lxc-wait -n name -s 'STOPPED|STARTING|STOPPING|ABORTING|FREEZING|FROZEN'
This will, of course, return immediately. Barring unforeseen errors, you should expect lxc-wait to return only when the container has entered one of the given states.
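Typing out a long state list by hand is error-prone, so a helper can build it by exclusion. In this sketch the state names come from the example above, with RUNNING inferred as the one state that example leaves out; states_except is a hypothetical helper.

```shell
# Build an lxc-wait state list containing every known state except the
# ones passed as arguments. The RUNNING entry is an assumption.
LXC_STATES="STOPPED STARTING RUNNING STOPPING ABORTING FREEZING FROZEN"

states_except() {
    out=""
    for s in $LXC_STATES; do
        case " $* " in
            *" $s "*) ;;                     # excluded by the caller
            *) out="${out:+$out|}$s" ;;      # append, joined with |
        esac
    done
    echo "$out"
}
```

For example, lxc-wait -n name -s "$(states_except RUNNING)" waits for any state other than RUNNING.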
LXC uses the cgroup filesystem to manage containers. It’s possible to read and manipulate parts of the cgroup filesystem through LXC. You can manage the CPU usage of each container by reading and adjusting the container’s cpu.shares value like so:
lxc-cgroup -n name cpu.shares
lxc-cgroup -n name cpu.shares howmany
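The cpu.shares value is a relative weight rather than a percentage. When every container is competing for CPU, each gets roughly its own shares divided by the sum of all shares. The sketch below does that arithmetic with integer math; cpu_percent is a hypothetical helper, and the result is only an estimate that applies under full contention.

```shell
# Estimate a container's CPU percentage under full contention:
# 100 * (its shares) / (sum of all containers' shares).
cpu_percent() {
    # $1: this container's shares; remaining arguments: the other
    # containers' shares. "$@" still includes $1, so the total counts it.
    mine=$1
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((100 * mine / total))
}
```

With two containers at 1024 shares each, cpu_percent 1024 1024 prints 50. Lowering one to 512 alongside two others at 1024, cpu_percent 512 1024 1024 prints 20.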
Now that this basic guide has shown you how to get started with Linux Containers tools, you can start crafting your own effective resource partitions.
This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.