Category Archives: LXC

Custom user mappings in LXD containers

LXD logo

Introduction

As you may know, LXD uses unprivileged containers by default.
The difference between an unprivileged container and a privileged one is whether the root user in the container is the “real” root user (uid 0 at the kernel level).

The way unprivileged containers are created is by taking a set of normal UIDs and GIDs from the host, usually at least 65536 of each (to be POSIX compliant) and mapping those into the container.

The most common example and what most LXD users will end up with by default is a map of 65536 UIDs and GIDs, with a host base id of 100000. This means that root in the container (uid 0) will be mapped to the host uid 100000 and uid 65535 in the container will be mapped to uid 165535 on the host. UID/GID 65536 and higher in the container aren’t mapped and will return an error if you attempt to use them.

From a security point of view, that means that anything which is not owned by the users and groups mapped into the container will be inaccessible. Any such resource will show up as being owned by uid/gid “-1” (rendered as 65534 or nobody/nogroup in userspace). It also means that should there be a way to escape the container, even root in the container would find itself with just as much privileges on the host as a nobody user.

LXD does offer a number of options related to unprivileged configuration:

  • Increasing the size of the default uid/gid map
  • Setting up per-container maps
  • Punching holes into the map to expose host users and groups

Increasing the size of the default map

As mentioned above, in most cases, LXD will have a default map that’s made of 65536 uids/gids.

In most cases you won’t have to change that. There are however a few cases where you may have to:

  • You need access to uid/gid higher than 65535.
    This is most common when using network authentication inside of your containers.
  • You want to use per-container maps.
    In which case you’ll need 65536 available uid/gid per container.
  • You want to punch some holes in your container’s map and need access to host uids/gids.

The default map is usually controlled by the “shadow” set of utilities and files. On systems where that’s the case, the “/etc/subuid” and “/etc/subgid” files are used to configure those maps.

On systems that do not have a recent enough version of the “shadow” package. LXD will assume that it doesn’t have to share uid/gid ranges with anything else and will therefore assume control of a billion uids and gids, starting at the host uid/gid 100000.

But the common case, is a system with a recent version of shadow.
An example of what the configuration may look like is:

stgraber@castiana:~$ cat /etc/subuid
lxd:100000:65536
root:100000:65536

stgraber@castiana:~$ cat /etc/subgid
lxd:100000:65536
root:100000:65536

The maps for “lxd” and “root” should always be kept in sync. LXD itself is restricted by the “root” allocation. The “lxd” entry is used to track what needs to be removed if LXD is uninstalled.

Now if you want to increase the size of the map available to LXD. Simply edit both of the files and bump the last value from 65536 to whatever size you need. I tend to bump it to a billion just so I don’t ever have to think about it again:

stgraber@castiana:~$ cat /etc/subuid
lxd:100000:1000000000
root:100000:1000000000

stgraber@castiana:~$ cat /etc/subgid
lxd:100000:1000000000
root:100000:100000000

After altering those files, you need to restart LXD to have it detect the new map:

root@vorash:~# systemctl restart lxd
root@vorash:~# cat /var/log/lxd/lxd.log
lvl=info msg="LXD 2.14 is starting in normal mode" path=/var/lib/lxd t=2017-06-14T21:21:13+0000
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2017-06-14T21:21:13+0000
lvl=info msg="Kernel uid/gid map:" t=2017-06-14T21:21:13+0000
lvl=info msg=" - u 0 0 4294967295" t=2017-06-14T21:21:13+0000
lvl=info msg=" - g 0 0 4294967295" t=2017-06-14T21:21:13+0000
lvl=info msg="Configured LXD uid/gid map:" t=2017-06-14T21:21:13+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2017-06-14T21:21:13+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2017-06-14T21:21:13+0000
lvl=info msg="Connecting to a remote simplestreams server" t=2017-06-14T21:21:13+0000
lvl=info msg="Expiring log files" t=2017-06-14T21:21:13+0000
lvl=info msg="Done expiring log files" t=2017-06-14T21:21:13+0000
lvl=info msg="Starting /dev/lxd handler" t=2017-06-14T21:21:13+0000
lvl=info msg="LXD is socket activated" t=2017-06-14T21:21:13+0000
lvl=info msg="REST API daemon:" t=2017-06-14T21:21:13+0000
lvl=info msg=" - binding Unix socket" socket=/var/lib/lxd/unix.socket t=2017-06-14T21:21:13+0000
lvl=info msg=" - binding TCP socket" socket=[::]:8443 t=2017-06-14T21:21:13+0000
lvl=info msg="Pruning expired images" t=2017-06-14T21:21:13+0000
lvl=info msg="Updating images" t=2017-06-14T21:21:13+0000
lvl=info msg="Done pruning expired images" t=2017-06-14T21:21:13+0000
lvl=info msg="Done updating images" t=2017-06-14T21:21:13+0000
root@vorash:~#

As you can see, the configured map is logged at LXD startup and can be used to confirm that the reconfiguration worked as expected.

You’ll then need to restart your containers to have them start using your newly expanded map.

Per container maps

Provided that you have a sufficient amount of uid/gid allocated to LXD, you can configure your containers to use their own, non-overlapping allocation of uids and gids.

This can be useful for two reasons:

  1. You are running software which alters kernel resource ulimits.
    Those user-specific limits are tied to a kernel uid and will cross container boundaries leading to hard to debug issues where one container can perform an action but all others are then unable to do the same.
  2. You want to know that should there be a way for someone in one of your containers to somehow get access to the host that they still won’t be able to access or interact with any of the other containers.

The main downsides to using this feature are:

  • It’s somewhat wasteful with using 65536 uids and gids per container.
    That being said, you’d still be able to run over 60000 isolated containers before running out of system uids and gids.
  • It’s effectively impossible to share storage between two isolated containers as everything written by one will be seen as -1 by the other. There is ongoing work around virtual filesystems in the kernel that will eventually let us get rid of that limitation.

To have a container use its own distinct map, simply run:

stgraber@castiana:~$ lxc config set test security.idmap.isolated true
stgraber@castiana:~$ lxc restart test
stgraber@castiana:~$ lxc config get test volatile.last_state.idmap
[{"Isuid":true,"Isgid":false,"Hostid":165536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":165536,"Nsid":0,"Maprange":65536}]

The restart step is needed to have LXD remap the entire filesystem of the container to its new map.
Note that this step will take a varying amount of time depending on the number of files in the container and the speed of your storage.

As can be seen above, after restart, the container is shown to have its own map of 65536 uids/gids.

If you want LXD to allocate more than the default 65536 uids/gids to an isolated container, you can bump the size of the allocation with:

stgraber@castiana:~$ lxc config set test security.idmap.size 200000
stgraber@castiana:~$ lxc restart test
stgraber@castiana:~$ lxc config get test volatile.last_state.idmap
[{"Isuid":true,"Isgid":false,"Hostid":165536,"Nsid":0,"Maprange":200000},{"Isuid":false,"Isgid":true,"Hostid":165536,"Nsid":0,"Maprange":200000}]

If you’re trying to allocate more uids/gids than are left in LXD’s allocation, LXD will let you know:

stgraber@castiana:~$ lxc config set test security.idmap.size 2000000000
error: Not enough uid/gid available for the container.

Direct user/group mapping

The fact that all uids/gids in an unprivileged container are mapped to a normally unused range on the host means that sharing of data between host and container is effectively impossible.

Now, what if you want to share your user’s home directory with a container?

The obvious answer to that is to define a new “disk” entry in LXD which passes your home directory to the container:

stgraber@castiana:~$ lxc config device add test home disk source=/home/stgraber path=/home/ubuntu
Device home added to test

So that was pretty easy, but did it work?

stgraber@castiana:~$ lxc exec test -- bash
root@test:~# ls -lh /home/
total 529K
drwx--x--x 45 nobody nogroup 84 Jun 14 20:06 ubuntu

No. The mount is clearly there, but it’s completely inaccessible to the container.
To fix that, we need to take a few extra steps:

  • Allow LXD’s use of our user uid and gid
  • Restart LXD to have it load the new map
  • Set a custom map for our container
  • Restart the container to have the new map apply
stgraber@castiana:~$ printf "lxd:$(id -u):1\nroot:$(id -u):1\n" | sudo tee -a /etc/subuid
lxd:201105:1
root:201105:1

stgraber@castiana:~$ printf "lxd:$(id -g):1\nroot:$(id -g):1\n" | sudo tee -a /etc/subgid
lxd:200512:1
root:200512:1

stgraber@castiana:~$ sudo systemctl restart lxd

stgraber@castiana:~$ printf "uid $(id -u) 1000\ngid $(id -g) 1000" | lxc config set test raw.idmap -

stgraber@castiana:~$ lxc restart test

At which point, things should be working in the container:

stgraber@castiana:~$ lxc exec test -- su ubuntu -l
ubuntu@test:~$ ls -lh
total 119K
drwxr-xr-x 5  ubuntu ubuntu 8 Feb 18 2016 data
drwxr-x--- 4  ubuntu ubuntu 6 Jun 13 17:05 Desktop
drwxr-xr-x 3  ubuntu ubuntu 28 Jun 13 20:09 Downloads
drwx------ 84 ubuntu ubuntu 84 Sep 14 2016 Maildir
drwxr-xr-x 4  ubuntu ubuntu 4 May 20 15:38 snap
ubuntu@test:~$ 

Conclusion

User namespaces, the kernel feature that makes those uid/gid mappings possible is a very powerful tool which finally made containers on Linux safe by design. It is however not the easiest thing to wrap your head around and all of that uid/gid map math can quickly become a major issue.

In LXD we’ve tried to expose just enough of those underlying features to be useful to our users while doing the actual mapping math internally. This makes things like the direct user/group mapping above significantly easier than it otherwise would be.

Going forward, we’re very interested in some of the work around uid/gid remapping at the filesystem level, this would let us decouple the on-disk user/group map from that used for processes, making it possible to share data between differently mapped containers and alter the various maps without needing to also remap the entire filesystem.

Extra information

The main LXD website is at: https://linuxcontainers.org/lxd
Development happens on Github at: https://github.com/lxc/lxd
Discussion forun: https://discuss.linuxcontainers.org
Mailing-list support happens on: https://lists.linuxcontainers.org
IRC support happens in: #lxcontainers on irc.freenode.net
Try LXD online: https://linuxcontainers.org/lxd/try-it

Posted in Canonical voices, LXC, LXD, Planet Ubuntu | Tagged | 1 Comment

LXC 2.0 has been released!

LXD logo

Introduction

Today I’m very pleased to announce the release of LXC 2.0, our second Long Term Support Release! LXC 2.0 is the result of a year of work by the LXC community with over 700 commits done by over 90 contributors!

It joins LXCFS 2.0 which was released last week and will very soon be joined by LXD 2.0 to complete our collection of 2.0 container management tools!

What’s new?

The complete changelog is linked below but the main highlights for me are:

  • More consistent user experience between the various LXC tools.
  • Improved checkpoint/restore support.
  • Complete rework of our CGroup handling code, including support for the CGroup namespace.
  • Cleaned up storage backend subsystem, including the addition of a new Ceph RBD backend.
  • A massive amount of bugfixes.
  • And lastly, we managed to get all that done without breaking our API, so LXC 2.0 is fully API compatible with LXC 1.0.

The focus with this release was stability and maintaining support for all the environments in which LXC shines. We still support all kernels from 2.6.32 though the exact feature set does obviously vary based on kernel features. We also improved support for a bunch of architectures and fixed a lot of bugs and other rough edges.

This is the release you want to run in production for the next few years!

Support length

As mentioned, LXC 2.0 is a Long Term Support release.
This is the second time we do such a release with the first being LXC 1.0.

Long Term Support releases come with a 5 years commitment from upstream to do bugfixes and security updates and release new point releases when enough fixes have accumulated.

The end of life date for the various LXC versions is as follow:

  • LXC 1.0, released February 2014 will EOL on the 1st of June 2019
  • LXC 1.1, released February 2015 will EOL on the 1st of September 2016
  • LXC 2.0, released April 2016 will EOL on the 1st of June 2021

We therefore very strongly recommend LXC 1.1 users to update to LXC 2.0 as we will not be supporting this release for very much longer.

We also recommend production deployments stick to our Long Term Support release.

Project information

Upstream website: https://linuxcontainers.org/lxc/
Release announcement: https://linuxcontainers.org/lxc/news/
Code: https://github.com/lxc/lxc
IRC channel: #lxcontainers on irc.freenode.net
Mailing-lists: https://lists.linuxcontainers.org

Try it online

Want to see what a container with LXC 2.0 installed feels like?
You can get one online to play with here.

Posted in Canonical voices, LXC, Planet Ubuntu | Tagged | 7 Comments

NorthSec 2015: behind the scenes

TLDR: NorthSec is an incredible security event, our CTF simulates a whole internet for every participating team. This allows us to create just about anything, from a locked down country to millions of vulnerable IoT devices spread across the globe. However that flexibility comes at a high cost hardware-wise, as we’re getting bigger and bigger, we need more and more powerful servers and networking gear. We’re very actively looking for sponsors so get in touch with me or just buy us something on Amazon!

What’s NorthSec?

NorthSec is one of the biggest on-site Capture The Flag (CTF), security contest in North America. It’s organized yearly over a weekend in Montreal (usually in May) and since the last edition, has been accompanied by a two days security conference before the CTF itself. The rest of this post will only focus on the CTF part though.

the-room

A view of the main room at NorthSec 2015

Teams arrive at the venue on Friday evening, get setup at their table and then get introduced to this year’s scenario and given access to our infrastructure. There they will have to fight their way through challenges, each earning them points and letting them go further and further. On Sunday afternoon, the top 3 teams are awarded their prize and we wrap up for the year.

Size wise, for the past two years we’ve had a physical limit of up to 32 teams of 8 participants and then a bunch of extra unaffiliated visitors. For the 2016 edition, we’re raising this to 50 teams for a grand total of 400 participants, thanks to some shuffling at the venue making some more room for us.

Why is it special?

The above may sound pretty simple and straightforward, however there are a few important details that sets NorthSec apart from other CTFs.

  • It is entirely on-site. There are some very big online CTFs out there but very few on-site ones. Having everyone participating in the same room is valuable from a networking point of view but also ensures fairness by enforcing fixed size teams and equal network bandwidth and latency.
  • Every team gets its very own copy of the whole infrastructure. There are no shared services in the simulated world we provide them. That means one team’s actions cannot impact another.
  • Each simulation is its own virtual world with its own instance of the internet, we use hundreds of LXC containers and thousands of VLANs and networks FOR EVERY TEAM to provide the most realistic and complete environment you can think of.
World map of our fake internet

World map of our fake internet

What’s our infrastructure like?

Due to the very high bandwidth and low latency requirements, most of the infrastructure is hosted on premises and on our hardware. We do plan on offloading Windows virtual machines to a public cloud for the next edition though.

We also provide a mostly legacy free environment to our contestants, all of our challenges are connected to IPv6-only networks and run on 64bit Ubuntu LTS  in LXC with state of the art security configurations.

Our rack

Our rack, on location at NorthSec 2015

 

All in all, for 32 teams (last year’s edition), we had:

  • 48000 virtual network interfaces
  • 2000 virtual carriers
  • 16000 BGP routers
  • 17000 Ubuntu containers
  • 100 Windows virtual machines
  • 20000 routing table entries

And all of that was running on:

  • Two firewalls (DELL SC1425)
  • Two infrastructure servers (DELL SC1425)
  • One management server (HP DL380 G5)
  • Four main contest hosts (HP DL380 G5)
  • Three backup contest hosts (DELL C6100)

On average we had 7 full simulations and 21 virtual machines running on every host (the backup hosts only had one each). That means each of the main contest hosts had:

  • 10500 virtual network interfaces
  • 435 virtual carriers
  • 3500 BGP routers
  • 3700 Ubuntu containers
  • 21 Windows virtual machines
  • 4375 routing table entries

Not too bad for servers that are (SC1425) or are getting close (DL380 G5) to being 10 years old now.

Past infrastructure challenges

In the past editions we’ve found numerous bugs in the various technologies we use when put under such a crazy load:

  • A variety of switch firmware bugs when dealing with several thousand IPv6-only networks.
  • Multiple Linux IPv6 kernel bugs (and one security issue) also related to an excess of IPv6 multicast traffic.
  • Several memory leaks and other bugs in LXC and related components that become very visible when you’re running upwards of 10000 containers.
  • Several more Linux kernel bugs related to performance scaling as we create more and more namespaces and nested namespaces.

As our infrastructure staff is very invested in these technologies by being upstream developers or contributors to the main projects we use, those bugs were all rapidly reported, discussed and fixed. We always look forward to the next NorthSec as an opportunity to test the latest technology at scale in a completely controlled environment.

How can you help?

As I mentioned, we’ve been capped at 32 teams and around 300 attendees for the past two years. Our existing hardware was barely sufficient  to handle  the load during those two editions, we urgently need to refresh our hardware to offer the best possible experience to our participants.

We’re planning on replacing most if not all of our hardware with slightly more recent equivalents, also upgrading from rotating drives to SSDs and improving our network. On the software side, we’ll be upgrading to a newer Linux kernel, possibly to Ubuntu 16.04, switch from btrfs to zfs and from LXC to LXD.

We are a Canadian non-profit organization with all our staff being volunteers so we very heavily rely on sponsors to be able to make the event a success.

If you or your company would like to help by sponsoring our infrastructure, get in touch with me. We have several sponsoring levels and can get you the visibility you’d like, ranging from a mention on our website and at the event to on-site presence with a recruitment booth and even, if our interests align, inclusion of your product in some of our challenges.

We also have an Amazon wishlist of smaller (cheaper) items that we need to buy in the near future. If you buy something from the list, get in touch so we can properly thank you!

Oh and as I briefly mentioned at the beginning, we have a two days, single-track conference ahead of the CTF. We’re actively looking for speakers, if you have something interesting to present, the CFP is here.

Extra resources

Posted in Canonical voices, LXC, LXD, Planet Ubuntu | Tagged | 7 Comments

Getting started with LXD – the container lightervisor

Introduction

For the past 6 months, Serge Hallyn, Tycho Andersen, Chuck Short, Ryan Harper and myself have been very busy working on a new container project called LXD.

Ubuntu 15.04, due to be released this Thursday, will contain LXD 0.7 in its repository. This is still the early days and while we’re confident LXD 0.7 is functional and ready for users to experiment, we still have some work to do before it’s ready for critical production use.

LXD logo

So what’s LXD?

LXD is what we call our container “lightervisor”. The core of LXD is a daemon which offers a REST API to drive full system containers just like you’d drive virtual machines.

The LXD daemon runs on every container host and client tools then connect to those to manage those containers or to move or copy them to another LXD.

We provide two such clients:

  • A command line tool called “lxc”
  • An OpenStack Nova plugin called nova-compute-lxd

The former is mostly aimed at small deployments ranging from a single machine (your laptop) to a few dozen hosts. The latter seamlessly integrates inside your OpenStack infrastructure and lets you manage containers exactly like you would virtual machines.

Why LXD?

LXC has been around for about 7 years now, it evolved from a set of very limited tools which would get you something only marginally better than a chroot, all the way to the stable set of tools, stable library and active user and development community that we have today.

Over those years, a lot of extra security features were added to the Linux kernel and LXC grew support for all of them. As we saw the need for people to build their own solution on top of LXC, we’ve developed a public API and a set of bindings. And last year, we’ve put out our first long term support release which has been a great success so far.

That being said, for a while now, we’ve been wanting to do a few big changes:

  • Make LXC secure by default (rather than it being optional).
  • Completely rework the tools to make them simpler and less confusing to newcomers.
  • Rely on container images rather than using “templates” to build them locally.
  • Proper checkpoint/restore support (live migration).

Unfortunately, solving any of those means doing very drastic changes to LXC which would likely break our existing users or at least force them to rethink the way they do things.

Instead, LXD is our opportunity to start fresh. We’re keeping LXC as the great low level container manager that it is. And build LXD on top of it, using LXC’s API to do all the low level work. That achieves the best of both worlds, we keep our low level container manager with its API and bindings but skip using its tools and templates, instead replacing those by the new experience that LXD provides.

How does LXD relate to LXC, Docker, Rocket and other container projects?

LXD is currently based on top of LXC. It uses the stable LXC API to do all the container management behind the scene, adding the REST API on top and providing a much simpler, more consistent user experience.

The focus of LXD is on system containers. That is, a container which runs a clean copy of a Linux distribution or a full appliance. From a design perspective, LXD doesn’t care about what’s running in the container.

That’s very different from Docker or Rocket which are application container managers (as opposed to system container managers) and so focus on distributing apps as containers and so very much care about what runs inside the container.

There is absolutely nothing wrong with using LXD to run a bunch of full containers which then run Docker or Rocket inside of them to run their different applications. So letting LXD manage the host resources for you, applying all the security restrictions to make the container safe and then using whatever application distribution mechanism you want inside.

Getting started with LXD

The simplest way for somebody to try LXD is by using it with its command line tool. This can easily be done on your laptop or desktop machine.

On an Ubuntu 15.04 system (or by using ppa:ubuntu-lxc/lxd-stable on 14.04 or above), you can install LXD with:

sudo apt-get install lxd

Then either logout and login again to get your group membership refreshed, or use:

newgrp lxd

From that point on, you can interact with your newly installed LXD daemon.

The “lxc” command line tool lets you interact with one or multiple LXD daemons. By default it will interact with the local daemon, but you can easily add more of them.

As an easy way to start experimenting with remote servers, you can add our public LXD server at https://images.linuxcontainers.org:8443
That server is an image-only read-only server, so all you can do with it is list images, copy images from it or start containers from it.

You’ll have to do the following to: add the server, list all of its images and then start a container from one of them:

lxc remote add images images.linuxcontainers.org
lxc image list images:
lxc launch images:ubuntu/trusty/i386 ubuntu-32

What the above does is define a new “remote” called “images” which points to images.linuxcontainers.org. Then list all of its images and finally start a local container called “ubuntu-32” from the ubuntu/trusty/i386 image. The image will automatically be cached locally so that future containers are started instantly.

The “<remote name>:” syntax is used throughout the lxc client. When not specified, the default “local” remote is assumed. Should you only care about managing a remote server, the default remote can be changed with “lxc remote set-default”.

Now that you have a running container, you can check its status and IP information with:

lxc list

Or get even more details with:

lxc info ubuntu-32

To get a shell inside the container, or to run any other command that you want, you may do:

lxc exec ubuntu-32 /bin/bash

And you can also directly pull or push files from/to the container with:

lxc file pull ubuntu-32/path/to/file .
lxc file push /path/to/file ubuntu-32/

When done, you can stop or delete your container with one of those:

lxc stop ubuntu-32
lxc delete ubuntu-32

What’s next?

The above should be a reasonably comprehensive guide to how to use LXD on a single system. Of course, that’s not the most interesting thing to do with LXD. All the commands shown above can work against multiple hosts, containers can be remotely created, moved around, copied, …

LXD also supports live migration, snapshots, configuration profiles, device pass-through and more.

I intend to write some more posts to cover those use cases and features as well as highlight some of the work we’re currently busy doing.

LXD is a pretty young but very active project. We’ve had great contributions from existing LXC developers as well as newcomers.

The project is entirely developed in the open at https://github.com/lxc/lxd. We keep track of upcoming features and improvements through the project’s issue tracker, so it’s easy to see what will be coming soon. We also have a set of issues marked “Easy” which are meant for new contributors as easy ways to get to know the LXD code and contribute to the project.

LXD is an Apache2 licensed project, written in Go and which doesn’t require a CLA to contribute to (we do however require the standard DCO Signed-off-by). It can be built with both golang and gccgo and so works on almost all architectures.

Extra resources

More information can be found on the official LXD website:
https://linuxcontainers.org/lxd

The code, issues and pull requests can all be found on Github:
https://github.com/lxc/lxd

And a good overview of the LXD design and its API may be found in our specs:
https://github.com/lxc/lxd/tree/master/specs

Conclusion

LXD is a new and exciting project. It’s an amazing opportunity to think fresh about system containers and provide the best user experience possible, alongside great features and rock solid security.

With 7 releases and close to a thousand commits by 20 contributors, it’s a very active, fast paced project. Lots of things still remain to be implemented before we get to our 1.0 milestone release in early 2016 but looking at what was achieved in just 5 months, I’m confident we’ll have an incredible LXD in another 12 months!

For now, we’d welcome your feedback, so install LXD, play around with it, file bugs and let us know what’s important for you next.

Posted in Canonical voices, LXC, LXD, Planet Revolution-Linux, Planet Ubuntu | Tagged | 39 Comments

VPN in containers

I often have to deal with VPNs, either to connect to the company network, my own network when I’m abroad or to various other places where I’ve got servers I manage.

All of those VPNs use OpenVPN, all with a similar configuration and unfortunately quite a lot of them with overlapping networks. That means that when I connect to them, parts of my own network are no longer reachable or it means that I can’t connect to more than one of them at once.

Those I suspect are all pretty common issues with VPN users, especially those working with or for companies who over the years ended up using most of the rfc1918 subnets.

So I thought, I’m working with containers every day, nowadays we have those cool namespaces in the kernel which let you run crazy things as a a regular user, including getting your own, empty network stack, so why not use that?

Well, that’s what I ended up doing and so far, that’s all done in less than 100 lines of good old POSIX shell script 🙂

That gives me, fully unprivileged non-overlapping VPNs! OpenVPN and everything else run as my own user and nobody other than the user spawning the container can possibly get access to the resources behind the VPN.

The code is available at: git clone git://github.com/stgraber/vpn-container

Then it’s as simple as: ./start-vpn VPN-NAME CONFIG

What happens next is the script will call socat to proxy the VPN TCP socket to a UNIX socket, then a user namespace, network namespace, mount namespace and uts namespace are all created for the container. Your user is root in that namespace and so can start openvpn and create network interfaces and routes. With careful use of some bind-mounts, resolvconf and byobu are also made to work so DNS resolution is functional and we can start byobu to easily allow as many shell as you want in there.

In the end it looks like this:

stgraber@dakara:~/vpn$ ./start-vpn stgraber.net ../stgraber-vpn/stgraber.conf 
WARN: could not reopen tty: No such file or directory
lxc: call to cgmanager_move_pid_abs_sync(name=systemd) failed: invalid request
Fri Sep 26 17:48:07 2014 OpenVPN 2.3.2 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Feb  4 2014
Fri Sep 26 17:48:07 2014 WARNING: No server certificate verification method has been enabled.  See http://openvpn.net/howto.html#mitm for more info.
Fri Sep 26 17:48:07 2014 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Fri Sep 26 17:48:07 2014 Attempting to establish TCP connection with [AF_INET]127.0.0.1:1194 [nonblock]
Fri Sep 26 17:48:07 2014 TCP connection established with [AF_INET]127.0.0.1:1194
Fri Sep 26 17:48:07 2014 TCPv4_CLIENT link local: [undef]
Fri Sep 26 17:48:07 2014 TCPv4_CLIENT link remote: [AF_INET]127.0.0.1:1194
Fri Sep 26 17:48:09 2014 [vorash.stgraber.org] Peer Connection Initiated with [AF_INET]127.0.0.1:1194
Fri Sep 26 17:48:12 2014 TUN/TAP device tun0 opened
Fri Sep 26 17:48:12 2014 Note: Cannot set tx queue length on tun0: Operation not permitted (errno=1)
Fri Sep 26 17:48:12 2014 do_ifconfig, tt->ipv6=1, tt->did_ifconfig_ipv6_setup=1
Fri Sep 26 17:48:12 2014 /sbin/ip link set dev tun0 up mtu 1500
Fri Sep 26 17:48:12 2014 /sbin/ip addr add dev tun0 172.16.35.50/24 broadcast 172.16.35.255
Fri Sep 26 17:48:12 2014 /sbin/ip -6 addr add 2001:470:b368:1035::50/64 dev tun0
Fri Sep 26 17:48:12 2014 /etc/openvpn/update-resolv-conf tun0 1500 1544 172.16.35.50 255.255.255.0 init
dhcp-option DNS 172.16.20.30
dhcp-option DNS 172.16.20.31
dhcp-option DNS 2001:470:b368:1020:216:3eff:fe24:5827
dhcp-option DNS nameserver
dhcp-option DOMAIN stgraber.net
Fri Sep 26 17:48:12 2014 add_route_ipv6(2607:f2c0:f00f:2700::/56 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:714b::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:b368::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:b511::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:b512::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 Initialization Sequence Completed


To attach to this VPN, use: byobu -S /home/stgraber/vpn/stgraber.net.byobu
To kill this VPN, do: byobu -S /home/stgraber/vpn/stgraber.net.byobu kill-server
or from inside byobu: byobu kill-server

After that, just copy/paste the byobu command and you’ll get a shell inside the container. Don’t be alarmed by the fact that you’re root in there. root is mapped to your user’s uid and gid outside the container so it’s actually just your usual user but with a different name and with privileges against the resources owned by the container.

You can now use the VPN as you want without any possible overlap or conflict with any route or VPN you may be running on that system and with absolutely no possibility that a user sharing your machine may access your running VPN.

This has so far been tested with 5 different VPNs, on a regular Ubuntu 14.04 LTS system with all VPNs being TCP based. UDP based VPNs would probably just need a couple of tweaks to the socat unix-socket proxy.

Enjoy!

Posted in Canonical voices, LXC, Planet Ubuntu | Tagged | 4 Comments