GPU inside a container
LXD supports GPU passthrough, but it is implemented very differently from what you would expect with a virtual machine. With containers, rather than passing a raw PCI device and having the container deal with it (which it can’t), we set up the host with all the needed drivers and only pass the resulting device nodes to the container.
This post focuses on NVidia and the CUDA toolkit specifically, but LXD’s passthrough feature should work with all other GPUs too. NVidia is just what I happen to have around.
The test system used below is a virtual machine with two NVidia GT 730 cards attached to it. Those are very cheap, low-performance GPUs that have the advantage of being available as low-profile PCI cards, which fit fine in one of my servers and don’t require extra power.
For production CUDA workloads, you’ll want something much better than this.
Note that for this to work, you’ll need LXD 2.5 or higher.
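Since the gpu device type only exists in LXD 2.5 and later, it can be worth checking the installed version before going further. The following is a minimal sketch, not part of LXD itself: `version_at_least` is a hypothetical helper name, and the script assumes GNU `sort` with `-V` (version sort) is available, as it is on Ubuntu.

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_at_least() {
    have=$1
    min=$2
    lowest=$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

# Only query LXD when the client is actually installed.
if command -v lxc >/dev/null 2>&1; then
    installed=$(lxc --version)
    if version_at_least "$installed" 2.5; then
        echo "LXD $installed supports the gpu device type"
    else
        echo "LXD $installed is too old for GPU passthrough; 2.5 or higher is needed"
    fi
fi
```

A plain string comparison would wrongly rank 2.20 below 2.5, which is why the sketch uses a version-aware sort.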
Host setup
Install the CUDA tools and drivers on the host:
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt update
sudo apt install cuda
Then reboot the system to make sure everything is properly set up. After that, you should be able to confirm that your NVidia GPU is working properly with:
ubuntu@canonical-lxd:~$ nvidia-smi
Tue Mar 21 21:28:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 0000:02:06.0     N/A |                  N/A |
| 30%   30C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 730      Off  | 0000:02:08.0     N/A |                  N/A |
| 30%   26C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
You can then check that the CUDA tools work properly with:
ubuntu@canonical-lxd:~$ /usr/local/cuda-8.0/extras/demo_suite/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GT 730
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     3059.4

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     3267.4

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     30805.1

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Container setup
First, let’s just create a regular Ubuntu 16.04 container:
ubuntu@canonical-lxd:~$ lxc launch ubuntu:16.04 c1
Creating c1
Starting c1
Then install the CUDA demo tools in there:
lxc exec c1 -- wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
lxc exec c1 -- dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
lxc exec c1 -- apt update
lxc exec c1 -- apt install cuda-demo-suite-8-0 --no-install-recommends
At which point, you can run:
ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Which is expected as LXD hasn’t been told to pass any GPU yet.
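Because passthrough works by exposing the host's device nodes, one rough way to see whether a GPU has been passed is to look for NVIDIA nodes under /dev. This is a small sketch under assumptions: `list_nvidia_nodes` is a hypothetical helper, and the node names it matches (such as /dev/nvidia0 and /dev/nvidiactl) are the ones the NVIDIA driver typically creates, not something this post guarantees.

```shell
#!/bin/sh
# Lists NVIDIA device nodes found in a directory (default: /dev).
# Prints one path per line, or nothing when none are present.
# Hypothetical helper, not an LXD or NVIDIA tool.
list_nvidia_nodes() {
    dir=${1:-/dev}
    for node in "$dir"/nvidia*; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$node" ] && echo "$node"
    done
    return 0
}

list_nvidia_nodes /dev
```

Run on the host, this should list the nodes created by the driver. Run inside the container before any gpu device is added, it would be expected to print nothing, which matches the nvidia-smi failure.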
LXD GPU passthrough
LXD allows for pretty specific GPU passthrough; the details can be found in the LXD documentation for the gpu device type.
Let’s start with the most generic setup, which simply allows access to all GPUs:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu
Device gpu added to c1
ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi
Tue Mar 21 21:47:54 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 0000:02:06.0     N/A |                  N/A |
| 30%   30C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 730      Off  | 0000:02:08.0     N/A |                  N/A |
| 30%   27C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu
Device gpu removed from c1
Now just pass whichever is the first GPU:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu id=0
Device gpu added to c1
ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi
Tue Mar 21 21:50:37 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 0000:02:06.0     N/A |                  N/A |
| 30%   30C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu
Device gpu removed from c1
You can also specify the GPU by vendorid and productid:
ubuntu@canonical-lxd:~$ lspci -nnn | grep NVIDIA
02:06.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1)
02:07.0 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
02:08.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1)
02:09.0 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu vendorid=10de productid=1287
Device gpu added to c1
ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi
Tue Mar 21 21:52:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 0000:02:06.0     N/A |                  N/A |
| 30%   30C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 730      Off  | 0000:02:08.0     N/A |                  N/A |
| 30%   27C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu
Device gpu removed from c1
This adds both cards, as they are exactly the same model in my setup.
For such cases, you can also select a single card by its PCI address:
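If you would rather not read the IDs out of the lspci listing by hand, the bracketed vendor:product pair can be extracted with standard text tools. The sketch below is only an illustration: `gpu_ids` is a hypothetical helper name, and the sample input is copied from the lspci output above rather than read from live hardware.

```shell
#!/bin/sh
# Reads `lspci -nn` style output on stdin and prints the distinct
# vendorid/productid pairs of all VGA controllers.
# Hypothetical helper for illustration only.
gpu_ids() {
    grep 'VGA compatible controller' |
        sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/vendorid=\1 productid=\2/p' |
        sort -u
}

# Sample input taken from the test system's lspci listing.
gpu_ids <<'EOF'
02:06.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1)
02:07.0 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
02:08.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1)
02:09.0 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
EOF
```

On a real host you would pipe `lspci -nn` into `gpu_ids` instead of the sample text. Note that the filter only looks at devices listed as VGA controllers; compute-only cards may be listed under a different class name, in which case the grep pattern would need adjusting.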
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu pci=0000:02:08.0
Device gpu added to c1
ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi
Tue Mar 21 21:56:52 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 0000:02:08.0     N/A |                  N/A |
| 30%   27C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu
Device gpu removed from c1
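Selecting by PCI address is what makes it possible to give each container its own card. As a hedged sketch of how that could be scripted, the helper below only prints the commands rather than running them. `plan_gpu_devices` is a hypothetical name, and `c2` is an assumed second container that does not exist in this walkthrough.

```shell
#!/bin/sh
# Prints (does not run) one "lxc config device add" command per
# "container pci-address" pair read from stdin.
# Hypothetical helper for illustration only.
plan_gpu_devices() {
    while read -r container address; do
        # Skip blank lines.
        [ -n "$container" ] || continue
        echo "lxc config device add $container gpu gpu pci=$address"
    done
}

# c2 is an assumed second container; the addresses are the two
# GT 730 cards from the test system.
plan_gpu_devices <<'EOF'
c1 0000:02:06.0
c2 0000:02:08.0
EOF
```

Printing the commands first gives you a chance to review the mapping before applying it.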
And lastly, let’s confirm that we get the same result as on the host when running a CUDA workload:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu
Device gpu added to c1
ubuntu@canonical-lxd:~$ lxc exec c1 -- /usr/local/cuda-8.0/extras/demo_suite/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GT 730
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     3065.4

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     3305.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     30825.7

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Conclusion
LXD makes it very easy to share one or multiple GPUs with your containers.
You can either dedicate specific GPUs to specific containers or just share them.
There is none of the overhead involved with the usual PCI-based passthrough, and only a single instance of the driver is running, with the containers acting just like normal host user processes would.
This does, however, require that your containers run a version of the CUDA tools that supports whatever version of the NVidia drivers is installed on the host.
Extra information
The main LXD website is at: https://linuxcontainers.org/lxd
Development happens on Github at: https://github.com/lxc/lxd
Mailing-list support happens on: https://lists.linuxcontainers.org
IRC support happens in: #lxcontainers on irc.freenode.net
Try LXD online: https://linuxcontainers.org/lxd/try-it
Hi. How can I use multiple containers with the same GPU? Can I pass through the GPU without using VirtualGL?
Just add the gpu device to multiple containers, that gives the container access to /dev/dri, /dev/nvidia, … so no need for something like virtualgl.
Can you share GPU and use GLX across multiple containers without VirtualGL? Do you have an example of this please? I would be very curious. Thank you.
Very cool, this is going to be a great replacement for my VM-based CUDA development/testing environment.
Could I suggest naming the container something other than “cuda”, perhaps “cuda-container” or something else distinguishable? It’s rather easy to lose track of what the container name is in the commands because “cuda” is such a central term in this context.
Good point, will rename to something else.
Great tutorial! I have to try it and understand it!
Love the tutorial but it doesn’t seem to be working.
Specifically, “lxc config device add c1 gpu gpu” gives the error “error: Invalid device type for device ‘gpu’ ”
Any advice?
Your version of LXD is too old.
@Steven : I had same error response for same command (container name notwithstanding). Did you by chance install the latest cuda toolkit with the run file? I’m wondering if there’s some dependency installed with the package manager install method for this lxc add device command to work. I went with the run file method as I could not get the package manager to work on the lxc host. Unfortunately I cannot test this conjecture.
I got the same error when I was running the version of LXD (2.0) that’s distributed with Ubuntu 16.04. LXD version 2.5 is needed for GPU passthrough to work.
You’ll need to uninstall LXD 2.0 then install the LXD 2.# (currently 2.20) feature release as described at https://linuxcontainers.org/lxd/getting-started-cli/
And for that to work you might also need to enable and configure manual xenial-backports via the guide at https://help.ubuntu.com/community/UbuntuBackports#Enabling_Backports_Manually
Hello, very nice tutorial 🙂
Will this work for Debian 8/9?
Hi, I’m trying to replicate this example, may I ask you what virtualisation platform your VM is using and if there are any specific configuration to allow the VM to see the two CUDA cards?
Many thanks
Paolo
Thank you, very nice.
Had some problems installing the “demo”-package in the container, so I went for “apt install cuda” within the container as well, which worked fine. Also some problems with the keys to the repository.
In the end it all worked fine.
Hey, it seems the GPU id starts from 1, not 0? I tested this on my computer (NVIDIA Tesla K80), and it seems that it does not accept “1-4”.
What would be the equivalent for AMD-based GPUs? I can’t find any equivalent /dev node.