LXC 1.0: Troubleshooting and debugging [10/10]

This is post 10 out of 10 in the LXC 1.0 blog post series.

Logging

Most LXC commands take two options:

  • -o, –logfile=FILE: Location of the logfile (defaults to stder)
  • -l, –logpriority=LEVEL: Log priority (defaults to ERROR)

The valid log priorities are:

  • FATAL
  • ALERT
  • CRIT
  • ERROR
  • WARN
  • NOTICE
  • INFO
  • DEBUG
  • TRACE

FATAL, ALERT and CRIT are mostly unused at this time, ERROR is pretty common and so are the others except for TRACE. If you want to see all possible log entries, set the log priority to TRACE.

There are also two matching configuration options which you can put in your container’s configuration:

  • lxc.logfile
  • lxc.loglevel

They behave exactly like their command like counterparts. However note that if the command line options are passed, any value set in the configuration will be ignored and instead will be overridden by those passed by the user.

When reporting a bug against LXC, it’s usually a good idea to attach a log of the container’s action with a logpriority of at least DEBUG.

API debugging

When debugging a problem using the API it’s often a good idea to try and re-implement the failing bit of code in C using liblxc directly, that helps get the binding out of the way and usually leads to cleaner stack traces and easier bug reports.

It’s also useful to set lxc.loglevel to DEBUG using set_config_item on your container so you can get a log of what LXC is doing.

Testing

Before digging to deep into an issue with the code you are working on, it’s usually a good idea to make sure that LXC itself is behaving as it should on your machine.

To check that, run “lxc-checkconfig” and look for any missing kernel feature, if all looks good, then install (or build) the tests. In Ubuntu, those are shipped in a separate “lxc-tests” package. Most of those tests are expecting to be run on an Ubuntu system (patches welcome…) but should do fine on any distro that’s compatible with the lxc-ubuntu template.

Run each of the lxc-test-* binaries as root and note any failure. Note that it’s possible that they leave some cruft behind on failure, if so, please cleanup any of those leftover containers before processing to the next test as unfortunately that cruft may cause failure by itself.

Reporting bugs

The primary LXC bug tracker is available at: https://github.com/lxc/lxc/issues

You may also report bugs directly through the distributions (though it’s often preferred to still file an upstream bug and then link the two), for example for Ubuntu, LXC bugs are tracked here: https://bugs.launchpad.net/ubuntu/+source/lxc

If you’ve already done some work tracking down the bug, you may also directly contact us on our mailing-list (see below).

Sending patches

We always welcome contributions and are very happy to have such an active development community around LXC (Over 60 people contributed to LXC 1.0). We don’t have many rules governing contributions, we just ask that your contributions be properly licensed and that you own the copyright on the code you are sending us (and indicate so by putting a Signed-off-by line in your commit).

As for the licensing, anything which ends up in the library (liblxc) or its bindings must be LGPLv2.1+ or compatible with it and not adding any additional restriction. Standalone binaries and scripts can either be LGLPv2.1+ (the project default) or GPLv2. If unsure, LGPLv2.1+ is usually a safe bet for any new file in LXC.

Patches may be sent using two different ways:

  • Inline to the lxc-devel@lists.linuxcontainers.org (using git send-email or similar)
  • Using a pull request on github (we will then grab the .patch URL and treat it as if they were e-mails sent to our list)

Getting in touch with us

The primary way of contacting the upstream LXC team is through our mailing-lists. We have two, one for LXC development and one for LXC users questions:

For more real-time discussion, you can also find a lot of LXC users and most of the developers in #lxcontainers on irc.freenode.net.

Final notes

So this is my final blog post before LXC 1.0 is finally released. We’re currently at rc3 with an rc4 coming a bit later today and a final release scheduled for tomorrow evening or Thursday morning.

I hope you have enjoyed this blog post series and that it’ll be a useful reference for people deploying LXC 1.0.

Posted in Canonical voices, LXC, Planet Ubuntu | Tagged | 10 Comments

Your own Ubuntu Touch image server

Ubuntu Touch images

For those not yet familiar with this, Ubuntu Touch systems are setup using a read-only root filesystem on top of which writable paths are mounted using bind-mounts from persistent or ephemeral storage.

The default update mechanism is therefore image based. We build new images on our build infrastructure, generate diffs between images and publish the result on the public server.

Each image is made of a bunch of xz compreseed tarballs, the actual number of tarballs may vary, so can their name. At the end of the line, the upgrader simply mounts the partitions and unpacks the tarball in the order it’s given them. It has a list of files to remove and the rest of the files are simply unpacked on top of the existing system.

Delta images only contain the files that are different from the previous image, full images contain them all. Partition images are stored in binary format in a partitions/ directory which the upgrader checks and flashes automatically.

The current list of tarballs we tend to use for the official images are:

  • ubuntu: Ubuntu root filesystem (common to all devices)
  • device: Device specific data (partition images and Android image)
  • custom: Customization tarball (applied on top of the root filesystem in /custom)
  • version: Channel/device/build metadata

For more details on how this all works, I’d recommend reading our wiki pages which act as the go-to specification for the server, clients and upgrader.

Running a server

There are a lot of reasons why you may want to run your own system-image server but the main ones seem to be:

  • Supporting your own Ubuntu Touch port with over-the-air updates
  • Publishing your own customized version of an official image
  • QA infrastructure for Ubuntu Touch images
  • Using it as an internal buffer/mirror for your devices

Up until now, doing this was pretty tricky as there wasn’t an easy way to import files from the public system-image server into a local one nor was there a simple way to replace the official GPG keys by your own (which would result in your updates to be considered invalid).

This was finally resolved on Friday when I landed the code for a few new file generators in the main system-image server branch.

It’s now reasonably easy to setup your own server, have it mirror some bits from the main public server, swap GPG keys and include your local tarballs.

Before I start with step by step instructions, please note that due to bug 1278589, you need a valid SSL certificate (https) on your server. This may be a problem to some porters who don’t have a separate IP for their server or can’t afford an SSL certificate. We plan on having this resolved in the system-image client soon.

Installing your server

Those instructions have been tried on a clean Ubuntu 13.10 cloud instance, it assumes that you are running them as an “ubuntu” user with “/home/ubuntu” as its home directory.

Install some required packages:

sudo apt-get install -y bzr abootimg android-tools-fsutils 
    python-gnupg fakeroot pxz pep8 pyflakes python-mock apache2

You’ll need a fair amount of available entropy to generate all the keys used by the test suite and production server. If you are doing this for testing only and don’t care much about getting strong keys, you may want to install “haveged” too.

Then setup the web server:

sudo adduser $USER www-data
sudo chgrp www-data /var/www/
sudo chmod g+rwX /var/www/
sudo rm -f /var/www/index.html
newgroups www-data

That being done, now let’s grab the server code, generate some keys and run the testsuite:

bzr branch lp:~ubuntu-system-image/ubuntu-system-image/server system-image
cd system-image
tests/generate-keys
tests/run
cp -R tests/keys/*/ secret/gpg/keys/
bin/generate-keyrings

Now all you need is some configuration. We’ll define a single “test” channel which will contain a single device “mako” (nexus4). It’ll mirror both the ubuntu and device tarball from the main public server (using the trusty-proposed channel over there), repack the device tarball to swap the GPG keys, then download a customization tarball from an http server, stack a keyring tarball (overriding the keys in the ubuntu tarball) and finally generating a version tarball. This channel will contain up to 15 images and will start at image ID “1”.

Doing all this can be done with that bit of configuration (you’ll need to change your server’s FQDN accordingly) in etc/config:

[global]
base_path = /home/ubuntu/system-image/
channels = test
gpg_key_path = secret/gpg/keys/
gpg_keyring_path = secret/gpg/keyrings/
publish_path = /var/www/
state_path = state/
public_fqdn = system-image.test.com
public_http_port = 80
public_https_port = 443

[channel_test]
type = auto
versionbase = 1
fullcount = 15
files = ubuntu, device, custom-savilerow, keyring, version
file_ubuntu = remote-system-image;https://system-image.ubuntu.com;trusty-proposed;ubuntu
file_device = remote-system-image;https://system-image.ubuntu.com;trusty-proposed;device;keyring=archive-master
file_custom-savilerow = http;https://jenkins.qa.ubuntu.com/job/savilerow-trusty/lastSuccessfulBuild/artifact/build/custom.tar.xz;name=custom-savilerow,monitor=https://jenkins.qa.ubuntu.com/job/savilerow-trusty/lastSuccessfulBuild/artifact/build/build_number
file_keyring = keyring;archive-master
file_version = version

Lastly we need to actual create the channel and device in the server, this is done by calling “bin/si-shell” and then doing:

pub.create_channel("test")
pub.create_device("test", "mako")
for keyring in ("archive-master", "image-master", "image-signing", "blacklist"):
    pub.publish_keyring(keyring)

And that’s it! Your server is now ready to use.
To generate your first image, simply run “bin/import-images”.
This will take a while as it’ll need to download files from those external servers, repack some bits but once it’s done, you’ll have a new image published.

You’ll probably want to run that command from cron every few minutes so that whenever any of the referenced files change a new image is generated and published (deltas will also be automatically generated).

To look at the result of the above, I have setup a server here: https://phablet.stgraber.org

To use that server, you’d flash using: phablet-flash ubuntu-system –alternate-server phablet.stgraber.org –channel test

Posted in Canonical voices, Planet Ubuntu, Ubuntu Touch | Tagged | 5 Comments

LXC 1.0: GUI in containers [9/10]

This is post 9 out of 10 in the LXC 1.0 blog post series.

Some starting notes

This post uses unprivileged containers, this isn’t an hard requirement but makes a lot of sense for GUI applications. Besides, since you followed the whole series, you have those setup anyway, right?

I’ll be using Google Chrome with the Google Talk and Adobe Flash plugins as “hostile” piece of software that I do not wish to allow to run directly on my machine.
Here are a few reasons why:

  • Those are binaries I don’t have source for.
    (That alone is usually enough for me to put an application in a container)
  • Comes from an external (non-Ubuntu) repository which may then push anything they wish to my system.
    (Could be restricted with careful apt pining)
  • Installs a daily cronjob which will re-add said repository and GPG keys should I for some reason choose to remove them.
    (Apparently done to have the repository re-added after dist-upgrades)
  • Uses a setuid wrapper to setup their sandbox. Understandable as they switch namespaces and user namespaces aren’t widespread yet.
    (This can be worked around by turning off the sandbox. The code for the sandbox is also available from the chromium project, though there’s no proof that the binary build doesn’t have anything added to it)
  • I actually need to use those piece of software because of my work environment and because the web hasn’t entirely moved away from flash yet…

While what I’ll be describing below is a huge step up for security in my opinion, it’s still not ideal and a few compromises have to be made to make those software even work. Those are basically access to:

  • pulseaudio: possibly recording you
  • X11: possibly doing key logging or taking pictures of your screen
  • dri/snd devices: can’t think of anything that’s not already covered by the first two, but I’m sure someone with a better imagination than mine will come up with something

But there’s only so much you can do while still having the cool features, so the best you can do is make sure never to run the container while doing sensitive work.

Running Google chrome in a container

So now to the actual fun. The plan is rather simple, I want a simple container, running a stable, well supported, version of Ubuntu, in there I’ll install Google Chrome and any plugin I need, then integrate it with my desktop.

First of all, let’s get ourselves an Ubuntu 12.04 i386 container as that’s pretty well supported by most ISVs:

lxc-create -t download -n precise-gui -- -d ubuntu -r precise -a i386

Let’s tweak the configuration a bit by adding the following to ~/.local/share/lxc/precise-gui/config (replacing USERNAME appropriately):

lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir
lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file

lxc.hook.pre-start = /home/USERNAME/.local/share/lxc/precise-gui/setup-pulse.sh

Still in that same config file, you’ll have to replace your existing (or similar):

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

By something like (this assume your user’s uid/gid is 1000/1000):

lxc.id_map = u 0 100000 1000
lxc.id_map = g 0 100000 1000
lxc.id_map = u 1000 1000 1
lxc.id_map = g 1000 1000 1
lxc.id_map = u 1001 101001 64535
lxc.id_map = g 1001 101001 64535

So those mappings basically mean that your container has 65536 uids and gids mapped to it, starting at 0 up to 65535 in the container. Those are mapped to host ids 100000 to 165535 with one exception, uid and gid 1000 isn’t translated. That trick is needed so that your user in the container can access the X socket, pulseaudio socket and DRI/snd devices just as your own user can (this saves us a whole lot of configuration on the host).

Now that we’re done with the config file, let’s create that setup-pulse.sh script:

#!/bin/sh
PULSE_PATH=$LXC_ROOTFS_PATH/home/ubuntu/.pulse_socket

if [ ! -e "$PULSE_PATH" ] || [ -z "$(lsof -n $PULSE_PATH 2>&1)" ]; then
    pactl load-module module-native-protocol-unix auth-anonymous=1 
        socket=$PULSE_PATH
fi

Make sure the file is executable or LXC will ignore it!
That script is fairly simple, it’ll simply tell pulseaudio on the host to bind /home/ubuntu/.pulse_socket in the container, checking that it’s not already setup.

Finally, run the following to fix the permissions of your container’s home directory:

sudo chown -R 1000:1000 ~/.local/share/lxc/precise-gui/rootfs/home/ubuntu

And that’s all that’s needed in the LXC side. So let’s start the container and install Google Chrome and the Google Talk plugin in there:

lxc-start -n precise-gui -d
lxc-attach -n precise-gui -- umount /tmp/.X11-unix
lxc-attach -n precise-gui -- apt-get update
lxc-attach -n precise-gui -- apt-get dist-upgrade -y
lxc-attach -n precise-gui -- apt-get install wget ubuntu-artwork dmz-cursor-theme ca-certificates pulseaudio -y
lxc-attach -n precise-gui -- wget https://dl.google.com/linux/direct/google-chrome-stable_current_i386.deb -O /tmp/chrome.deb
lxc-attach -n precise-gui -- wget https://dl.google.com/linux/direct/google-talkplugin_current_i386.deb -O /tmp/talk.deb
lxc-attach -n precise-gui -- dpkg -i /tmp/chrome.deb /tmp/talk.deb
lxc-attach -n precise-gui -- apt-get -f install -y
lxc-attach -n precise-gui -- sudo -u ubuntu mkdir -p /home/ubuntu/.pulse/
echo "disable-shm=yes" | lxc-attach -n precise-gui -- sudo -u ubuntu tee /home/ubuntu/.pulse/client.conf
lxc-stop -n precise-gui

At this point, everything you need is installed in the container.
To make your life easier, create the following launcher script, let’s call it “start-chrome” and put it in the container’s configuration directory (next to config and setup-pulse.sh):

#!/bin/sh
CONTAINER=precise-gui
CMD_LINE="google-chrome --disable-setuid-sandbox $*"

STARTED=false

if ! lxc-wait -n $CONTAINER -s RUNNING -t 0; then
    lxc-start -n $CONTAINER -d
    lxc-wait -n $CONTAINER -s RUNNING
    STARTED=true
fi

PULSE_SOCKET=/home/ubuntu/.pulse_socket

lxc-attach --clear-env -n $CONTAINER -- sudo -u ubuntu -i 
    env DISPLAY=$DISPLAY PULSE_SERVER=$PULSE_SOCKET $CMD_LINE

if [ "$STARTED" = "true" ]; then
    lxc-stop -n $CONTAINER -t 10
fi

Make sure the script is executable or the next step won’t work. This script will check if the container is running, if not, start it (and remember it did), then spawn google-chrome with the right environment set (and disabling its built-in sandbox as for some obscure reasons, it dislikes user namespaces), once google-chrome exits, the container is stopped.

To make things shinier, you may now also create ~/.local/share/applications/google-chrome.desktop containing:

[Desktop Entry]
Version=1.0
Name=Google Chrome
Comment=Access the Internet
Exec=/home/USERNAME/.local/share/lxc/precise-gui/start-chrome %U
Icon=/home/USERNAME/.local/share/lxc/precise-gui/rootfs/opt/google/chrome/product_logo_256.png
Type=Application
Categories=Network;WebBrowser;

Don’t forget to replace USERNAME to your own username so that both paths are valid.

And that’s it! You should now find a Google Chrome icon somewhere in your desktop environment (menu, dash, whatever…). Clicking on it will start Chrome which can be used pretty much as usual, when closed, the container will shutdown.
You may want to setup extra symlinks or bind-mount to make it easier to access things like the Downloads folder but that really depends on what you’re using the container for.

Chrome running in LXC

Obviously, the same process can be used for many different piece of software.

Skype

Quite a few people have contacted me asking about running Skype in that same container. I won’t give you a whole step by step guide as the one for Chrome cover 99% of what you need to do for Skype.

However there are two tricks you need to be aware of to get Skype to work properly:

  • Set “QT_X11_NO_MITSHM” to “1”
    (otherwise you get a blank window as it tries to use shared memory)
  • Set “GNOME_DESKTOP_SESSION_ID” to “this-is-deprecated”
    (otherwise you get an ugly Qt theme)

Those two should be added after the “env” in the launcher script you’ll write for Skype.

Apparently on some NVidia system, you may also need to set an additional environment variable (possibly useful not only for Skype):
LD_PRELOAD=/usr/lib/i386-linux-gnu/mesa/libGL.so.1

Steam

And finally, yet another commonly asked one, Steam.

That one actually doesn’t require anything extra in its environment, just grab the .deb, install it in the container, run an “apt-get -f install” to install any remaining dependency, create a launcher script and .desktop and you’re done.
I’ve been happily playing a few games (thanks to Valve giving those to all Ubuntu and Debian developers) without any problem so far.

Posted in Canonical voices, LXC, Planet Ubuntu | Tagged | 70 Comments

LXC 1.0: Scripting with the API [8/10]

This is post 8 out of 10 in the LXC 1.0 blog post series.

The API

The first version of liblxc was introduced in LXC 0.9 but it was very much at an experimental state. LXC 1.0 however will ship with a much more complete API, covering all of LXC’s features. We’ve actually been rebasing all of our tools (lxc-*) to using that API rather than doing direct calls to the internal functions.

The API also comes with a whole set of tests which we run as part of our continuous integration setup and before distro uploads.

There are also quite a few bindings for those who don’t feel like writing C, we have lua and python3 bindings in-tree upstream and there are official out-of-tree bindings for Go and ruby.

The API documentation can be found at:
https://linuxcontainers.org/lxc/documentation/

It’s not necessarily the most readable API documentation ever and certainly could do with some examples, especially for the bindings, but it does cover all functions that are exported over the API. Any help improving our API documentation is very much welcome!

The basics

So let’s start with a very simple example of the LXC API using C, the following example will create a new container struct called “apicontainer”, create a root filesystem using the new download template, start the container, print its state and PID number, then attempt a clean shutdown before killing it.

#include <stdio.h>

#include <lxc/lxccontainer.h>

int main() {
    struct lxc_container *c;
    int ret = 1;

    /* Setup container struct */
    c = lxc_container_new("apicontainer", NULL);
    if (!c) {
        fprintf(stderr, "Failed to setup lxc_container struct
");
        goto out;
    }

    if (c->is_defined(c)) {
        fprintf(stderr, "Container already exists
");
        goto out;
    }

    /* Create the container */
    if (!c->createl(c, "download", NULL, NULL, LXC_CREATE_QUIET,
                    "-d", "ubuntu", "-r", "trusty", "-a", "i386", NULL)) {
        fprintf(stderr, "Failed to create container rootfs
");
        goto out;
    }

    /* Start the container */
    if (!c->start(c, 0, NULL)) {
        fprintf(stderr, "Failed to start the container
");
        goto out;
    }

    /* Query some information */
    printf("Container state: %s
", c->state(c));
    printf("Container PID: %d
", c->init_pid(c));

    /* Stop the container */
    if (!c->shutdown(c, 30)) {
        printf("Failed to cleanly shutdown the container, forcing.
");
        if (!c->stop(c)) {
            fprintf(stderr, "Failed to kill the container.
");
            goto out;
        }
    }

    /* Destroy the container */
    if (!c->destroy(c)) {
        fprintf(stderr, "Failed to destroy the container.
");
        goto out;
    }

    ret = 0;
out:
    lxc_container_put(c);
    return ret;
}

So as you can see, it’s not very difficult to use, most functions are fairly straightforward and error checking is pretty simple (most calls are boolean and errors are printed to stderr by LXC depending on the loglevel).

Python3 scripting

As much fun as C may be, I usually like to script my containers and C isn’t really the best language for that. That’s why I wrote and maintain the official python3 binding.

The equivalent to the example above in python3 would be:

import lxc
import sys

# Setup the container object
c = lxc.Container("apicontainer")
if c.defined:
    print("Container already exists", file=sys.stderr)
    sys.exit(1)

# Create the container rootfs
if not c.create("download", lxc.LXC_CREATE_QUIET, {"dist": "ubuntu",
                                                   "release": "trusty",
                                                   "arch": "i386"}):
    print("Failed to create the container rootfs", file=sys.stderr)
    sys.exit(1)

# Start the container
if not c.start():
    print("Failed to start the container", file=sys.stderr)
    sys.exit(1)

# Query some information
print("Container state: %s" % c.state)
print("Container PID: %s" % c.init_pid)

# Stop the container
if not c.shutdown(30):
    print("Failed to cleanly shutdown the container, forcing.")
    if not c.stop():
        print("Failed to kill the container", file=sys.stderr)
        sys.exit(1)

# Destroy the container
if not c.destroy():
    print("Failed to destroy the container.", file=sys.stderr)
    sys.exit(1)

Now for that specific example, python3 isn’t that much simpler than the C equivalent.

But what if we wanted to do something slightly more tricky, like iterating through all existing containers, start them (if they’re not already started), wait for them to have network connectivity, then run updates and shut them down?

import lxc
import sys

for container in lxc.list_containers(as_object=True):
    # Start the container (if not started)
    started=False
    if not container.running:
        if not container.start():
            continue
        started=True

    if not container.state == "RUNNING":
        continue

    # Wait for connectivity
    if not container.get_ips(timeout=30):
        continue

    # Run the updates
    container.attach_wait(lxc.attach_run_command,
                          ["apt-get", "update"])
    container.attach_wait(lxc.attach_run_command,
                          ["apt-get", "dist-upgrade", "-y"])

    # Shutdown the container
    if started:
        if not container.shutdown(30):
            container.stop()

The most interesting bit in the example above is the attach_wait command, which basically lets your run a standard python function in the container’s namespaces, here’s a more obvious example:

import lxc

c = lxc.Container("p1")
if not c.running:
    c.start()

def print_hostname():
    with open("/etc/hostname", "r") as fd:
        print("Hostname: %s" % fd.read().strip())

# First run on the host
print_hostname()

# Then on the container
c.attach_wait(print_hostname)

if not c.shutdown(30):
    c.stop()

And the output of running the above:

stgraber@castiana:~$ python3 lxc-api.py
/home/stgraber/<frozen>:313: Warning: The python-lxc API isn't yet stable and may change at any point in the future.
Hostname: castiana
Hostname: p1

It may take you a little while to wrap your head around the possibilities offered by that function, especially as it also takes quite a few flags (look for LXC_ATTACH_* in the C API) which lets you control which namespaces to attach to, whether to have the function contained by apparmor, whether to bypass cgroup restrictions, …

That kind of flexibility is something you’ll never get with a virtual machine and the way it’s supported through our bindings makes it easier than ever to use by anyone who wants to automate custom workloads.

You can also use the API to script cloning containers and using snapshots (though for that example to work, you need current upstream master due to a small bug I found while writing this…):

import lxc
import os
import sys

if not os.geteuid() == 0:
    print("The use of overlayfs requires privileged containers.")
    sys.exit(1)

# Create a base container (if missing) using an Ubuntu 14.04 image
base = lxc.Container("base")
if not base.defined:
    base.create("download", lxc.LXC_CREATE_QUIET, {"dist": "ubuntu",
                                                   "release": "precise",
                                                   "arch": "i386"})

    # Customize it a bit
    base.start()
    base.get_ips(timeout=30)
    base.attach_wait(lxc.attach_run_command, ["apt-get", "update"])
    base.attach_wait(lxc.attach_run_command, ["apt-get", "dist-upgrade", "-y"])

    if not base.shutdown(30):
        base.stop()

# Clone it as web (if not already existing)
web = lxc.Container("web")
if not web.defined:
    # Clone base using an overlayfs overlay
    web = base.clone("web", bdevtype="overlayfs",
                     flags=lxc.LXC_CLONE_SNAPSHOT)

    # Install apache
    web.start()
    web.get_ips(timeout=30)
    web.attach_wait(lxc.attach_run_command, ["apt-get", "update"])
    web.attach_wait(lxc.attach_run_command, ["apt-get", "install",
                                             "apache2", "-y"])

    if not web.shutdown(30):
        web.stop()

# Create a website container based on the web container
mysite = web.clone("mysite", bdevtype="overlayfs",
                   flags=lxc.LXC_CLONE_SNAPSHOT)
mysite.start()
ips = mysite.get_ips(family="inet", timeout=30)
if ips:
    print("Website running at: http://%s" % ips[0])
else:
    if not mysite.shutdown(30):
        mysite.stop()

The above will create a base container using a downloaded image, then clone it using an overlayfs based overlay, add apache2 to it, then clone that resulting container into yet another one called “mysite”. So “mysite” is effectively an overlay clone of “web” which is itself an overlay clone of “base”.

 

So there you go, I tried to cover most of the interesting bits of our API with the examples above, though there’s much more available, for example, I didn’t cover the snapshot API (currently restricted to system containers) outside of the specific overlayfs case above and only scratched the surface of what’s possible to do with the attach function.

LXC 1.0 will release with a stable version of the API, we’ll be doing additions in the next few 1.x versions (while doing bugfix only updates to 1.0.x) and hope not to have to break the whole API for quite a while (though we’ll certainly be adding more stuff to it).

Posted in Canonical voices, LXC, Planet Ubuntu | Tagged | 21 Comments

LXC 1.0: Unprivileged containers [7/10]

This is post 7 out of 10 in the LXC 1.0 blog post series.

Introduction to unprivileged containers

The support of unprivileged containers is in my opinion one of the most important new features of LXC 1.0.

You may remember from previous posts that I mentioned that LXC should be considered unsafe because while running in a separate namespace, uid 0 in your container is still equal to uid 0 outside of the container, meaning that if you somehow get access to any host resource through proc, sys or some random syscalls, you can potentially escape the container and then you’ll be root on the host.

That’s what user namespaces were designed for and implemented. It was a multi-year effort to think them through and slowly push the hundreds of patches required into the upstream kernel, but finally with 3.12 we got to a point where we can start a full system container entirely as a user.

So how do those user namespaces work? Well, simply put, each user that’s allowed to use them on the system gets assigned a range of unused uids and gids, ideally a whole 65536 of them. You can then use those uids and gids with two standard tools called newuidmap and newgidmap which will let you map any of those uids and gids to virtual uids and gids in a user namespace.

That means you can create a container with the following configuration:

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

The above means that I have one uid map and one gid map defined for my container which will map uids and gids 0 through 65536 in the container to uids and gids 100000 through 165536 on the host.

For this to be allowed, I need to have those ranges assigned to my user at the system level with:

stgraber@castiana:~$ grep stgraber /etc/sub* 2>/dev/null
/etc/subgid:stgraber:100000:65536
/etc/subuid:stgraber:100000:65536

LXC has now been updated so that all the tools are aware of those unprivileged containers. The standard paths also have their unprivileged equivalents:

  • /etc/lxc/lxc.conf => ~/.config/lxc/lxc.conf
  • /etc/lxc/default.conf => ~/.config/lxc/default.conf
  • /var/lib/lxc => ~/.local/share/lxc
  • /var/lib/lxcsnaps => ~/.local/share/lxcsnaps
  • /var/cache/lxc => ~/.cache/lxc

Your user, while it can create new user namespaces in which it’ll be uid 0 and will have some of root’s privileges against resources tied to that namespace will obviously not be granted any extra privilege on the host.

One such thing is creating new network devices on the host or changing bridge configuration. To workaround that, we wrote a tool called “lxc-user-nic” which is the only SETUID binary part of LXC 1.0 and which performs one simple task.
It parses a configuration file and based on its content will create network devices for the user and bridge them. To prevent abuse, you can restrict the number of devices a user can request and to what bridge they may be added.

An example is my own /etc/lxc/lxc-usernet file:

stgraber veth lxcbr0 10

This declares that the user “stgraber” is allowed up to 10 veth type devices to be created and added to the bridge called lxcbr0.

Between what’s offered by the user namespace in the kernel and that setuid tool, we’ve got all that’s needed to run most distributions unprivileged.

Pre-requirements

All examples and instructions I’ll be giving below are expecting that you are running a perfectly up to date version of Ubuntu 14.04 (codename trusty). That’s a pre-release of Ubuntu so you may want to run it in a VM or on a spare machine rather than upgrading your production computer.

The reason to want something that recent is because the rough requirements for well working unprivileged containers are:

  • Kernel: 3.13 + a couple of staging patches (which Ubuntu has in its kernel)
  • User namespaces enabled in the kernel
  • A very recent version of shadow that supports subuid/subgid
  • Per-user cgroups on all controllers (which I turned on a couple of weeks ago)
  • LXC 1.0 beta2 or higher (released two days ago)
  • A version of PAM with a loginuid patch that’s yet to be in any released version

Those requirements happen to all be true of the current development release of Ubuntu as of two days ago.

LXC pre-built containers

User namespaces come with quite a few obvious limitations. For example in a user namespace you won’t be allowed to use mknod to create a block or character device as being allowed to do so would let you access anything on the host. Same thing goes with some filesystems, you won’t for example be allowed to do loop mounts or mount an ext partition, even if you can access the block device.

Those limitations while not necessarily world ending in day to day use are a big problem during the initial bootstrap of a container as tools like debootstrap, yum, … usually try to do some of those restricted actions and will fail pretty badly.

Some templates may be tweaked to work and workaround such as a modified fakeroot could be used to bypass some of those limitations but the goal of the LXC project isn’t to require all of our users to be distro engineers, so we came up with a much simpler solution.

I wrote a new template called “download” which instead of assembling the rootfs and configuration locally will instead contact a server which contains daily pre-built rootfs and configuration for most common templates.

Those images are built from our Jenkins server using a few machines I have on my home network (a set of powerful x86 builders and a quadcore ARM board). The actual build process is pretty straightforward, a basic chroot is assembled, then the current git master is downloaded, built and the standard templates are run with the right release and architecture, the resulting rootfs is compressed, a basic config and metadata (expiry, files to template, …) is saved, the result is pulled by our main server, signed with a dedicated GPG key and published on the public web server.

The client side is a simple template which contacts the server over https (the domain is also DNSSEC enabled and available over IPv6), grabs signed indexes of all the available images, checks if the requested combination of distribution, release and architecture is supported and if it is, grabs the rootfs and metadata tarballs, validates their signature and stores them in a local cache. Any container creation after that point is done using that cache until the time the cache entries expires at which point it’ll grab a new copy from the server.

The current list of images is (as can be requested by passing –list):

---
DIST      RELEASE   ARCH    VARIANT    BUILD
---
debian    wheezy    amd64   default    20140116_22:43
debian    wheezy    armel   default    20140116_22:43
debian    wheezy    armhf   default    20140116_22:43
debian    wheezy    i386    default    20140116_22:43
debian    jessie    amd64   default    20140116_22:43
debian    jessie    armel   default    20140116_22:43
debian    jessie    armhf   default    20140116_22:43
debian    jessie    i386    default    20140116_22:43
debian    sid       amd64   default    20140116_22:43
debian    sid       armel   default    20140116_22:43
debian    sid       armhf   default    20140116_22:43
debian    sid       i386    default    20140116_22:43
oracle    6.5       amd64   default    20140117_11:41
oracle    6.5       i386    default    20140117_11:41
plamo     5.x       amd64   default    20140116_21:37
plamo     5.x       i386    default    20140116_21:37
ubuntu    lucid     amd64   default    20140117_03:50
ubuntu    lucid     i386    default    20140117_03:50
ubuntu    precise   amd64   default    20140117_03:50
ubuntu    precise   armel   default    20140117_03:50
ubuntu    precise   armhf   default    20140117_03:50
ubuntu    precise   i386    default    20140117_03:50
ubuntu    quantal   amd64   default    20140117_03:50
ubuntu    quantal   armel   default    20140117_03:50
ubuntu    quantal   armhf   default    20140117_03:50
ubuntu    quantal   i386    default    20140117_03:50
ubuntu    raring    amd64   default    20140117_03:50
ubuntu    raring    armhf   default    20140117_03:50
ubuntu    raring    i386    default    20140117_03:50
ubuntu    saucy     amd64   default    20140117_03:50
ubuntu    saucy     armhf   default    20140117_03:50
ubuntu    saucy     i386    default    20140117_03:50
ubuntu    trusty    amd64   default    20140117_03:50
ubuntu    trusty    armhf   default    20140117_03:50
ubuntu    trusty    i386    default    20140117_03:50

The template has been carefully written to work on any system that has a POSIX compliant shell with wget. gpg is recommended but can be disabled if your host doesn’t have it (at your own risks).

The same template can be used against your own server, which I hope will be very useful for enterprise deployments to build templates in a central location and have them pulled by all the hosts automatically using our expiry mechanism to keep them fresh.

While the template was designed to workaround limitations of unprivileged containers, it works just as well with system containers, so even on a system that doesn’t support unprivileged containers you can do:

lxc-create -t download -n p1 -- -d ubuntu -r trusty -a amd64

And you’ll get a new container running the latest build of Ubuntu 14.04 amd64.

Using unprivileged LXC

Right, so let’s get you started, as I already mentioned, all the instructions below have only been tested on a very recent Ubuntu 14.04 (trusty) installation.
You may want to grab a daily build and run it in a VM.

Install the required packages:

  • sudo apt-get update
  • sudo apt-get dist-upgrade
  • sudo apt-get install lxc systemd-services uidmap

Then, assign yourself a set of uids and gids with:

  • sudo usermod --add-subuids 100000-165536 $USER
  • sudo usermod --add-subgids 100000-165536 $USER
  • sudo chmod +x $HOME

That last one is required because LXC needs it to access ~/.local/share/lxc/ after it switched to the mapped UIDs. If you’re using ACLs, you may instead use “u:100000:x” as a more specific ACL.

Now create ~/.config/lxc/default.conf with the following content:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

And /etc/lxc/lxc-usernet with:

<your username> veth lxcbr0 10

And that’s all you need. Now let’s create our first unprivileged container with:

lxc-create -t download -n p1 -- -d ubuntu -r trusty -a amd64

You should see the following output from the download template:

Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created an Ubuntu container (release=trusty, arch=amd64).
The default username/password is: ubuntu / ubuntu
To gain root privileges, please use sudo.

So looks like your first container was created successfully, now let’s see if it starts:

ubuntu@trusty-daily:~$ lxc-start -n p1 -d
ubuntu@trusty-daily:~$ lxc-ls --fancy
NAME  STATE    IPV4     IPV6     AUTOSTART  
------------------------------------------
p1    RUNNING  UNKNOWN  UNKNOWN  NO

It’s running! At this point, you can get a console using lxc-console or can SSH to it by looking for its IP in the ARP table (arp -n).

One thing you probably noticed above is that the IP addresses for the container aren’t listed, that’s because unfortunately LXC currently can’t attach to an unprivileged container’s namespaces. That also means that some fields of lxc-info will be empty and that you can’t use lxc-attach. However we’re looking into ways to get that sorted in the near future.

There are also a few problems with job control in the kernel and with PAM, so doing a non-detached lxc-start will probably result in a rather weird console where things like sudo will most likely fail. SSH may also fail on some distros. A patch has been sent upstream for this, but I just noticed that it doesn’t actually cover all cases and even if it did, it’s not in any released version yet.

Quite a few more improvements to unprivileged containers are to come until the final 1.0 release next month and while we certainly don’t expect all workloads to be possible with unprivileged containers, it’s still a huge improvement on what we had before and a very good building block for a lot more interesting use cases.

Posted in Canonical voices, LXC, Planet Ubuntu | Tagged | 111 Comments