GPU inside a container
LXD supports GPU passthrough but this is implemented in a very different way than what you would expect from a virtual machine. With containers, rather than passing a raw PCI device and have the container deal with it (which it can’t), we instead have the host setup with all needed drivers and only pass the resulting device nodes to the container.
This post focuses on NVidia and the CUDA toolkit specifically, but LXD’s passthrough feature should work with all other GPUs too. NVidia is just what I happen to have around.
The test system used below is a virtual machine with two NVidia GT 730 cards attached to it. Those are very cheap, low performance GPUs, that have the advantage of existing in low-profile PCI cards that fit fine in one of my servers and don’t require extra power.
For production CUDA workloads, you’ll want something much better than this.
Note that for this to work, you’ll need LXD 2.5 or higher.
Host setup
Install the CUDA tools and drivers on the host:
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb sudo apt update sudo apt install cuda
Then reboot the system to make sure everything is properly setup. After that, you should be able to confirm that your NVidia GPU is properly working with:
ubuntu@canonical-lxd:~$ nvidia-smi Tue Mar 21 21:28:34 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 375.39 Driver Version: 375.39 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GT 730 Off | 0000:02:06.0 N/A | N/A | | 30% 30C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ | 1 GeForce GT 730 Off | 0000:02:08.0 N/A | N/A | | 30% 26C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 Not Supported | | 1 Not Supported | +-----------------------------------------------------------------------------+
And can check that the CUDA tools work properly with:
ubuntu@canonical-lxd:~$ /usr/local/cuda-8.0/extras/demo_suite/bandwidthTest [CUDA Bandwidth Test] - Starting... Running on... Device 0: GeForce GT 730 Quick Mode Host to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 3059.4 Device to Host Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 3267.4 Device to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 30805.1 Result = PASS NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Container setup
First lets just create a regular Ubuntu 16.04 container:
ubuntu@canonical-lxd:~$ lxc launch ubuntu:16.04 c1 Creating c1 Starting c1
Then install the CUDA demo tools in there:
lxc exec c1 -- wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb lxc exec c1 -- dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb lxc exec c1 -- apt update lxc exec c1 -- apt install cuda-demo-suite-8-0 --no-install-recommends
At which point, you can run:
ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Which is expected as LXD hasn’t been told to pass any GPU yet.
LXD GPU passthrough
LXD allows for pretty specific GPU passthrough, the details can be found here.
First let’s start with the most generic one, just allow access to all GPUs:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu Device gpu added to c1 ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi Tue Mar 21 21:47:54 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 375.39 Driver Version: 375.39 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GT 730 Off | 0000:02:06.0 N/A | N/A | | 30% 30C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ | 1 GeForce GT 730 Off | 0000:02:08.0 N/A | N/A | | 30% 27C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 Not Supported | | 1 Not Supported | +-----------------------------------------------------------------------------+ ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu Device gpu removed from c1
Now just pass whichever is the first GPU:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu id=0 Device gpu added to c1 ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi Tue Mar 21 21:50:37 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 375.39 Driver Version: 375.39 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GT 730 Off | 0000:02:06.0 N/A | N/A | | 30% 30C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 Not Supported | +-----------------------------------------------------------------------------+ ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu Device gpu removed from c1
You can also specify the GPU by vendorid and productid:
ubuntu@canonical-lxd:~$ lspci -nnn | grep NVIDIA 02:06.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1) 02:07.0 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1) 02:08.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 730] [10de:1287] (rev a1) 02:09.0 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1) ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu vendorid=10de productid=1287 Device gpu added to c1 ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi Tue Mar 21 21:52:40 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 375.39 Driver Version: 375.39 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GT 730 Off | 0000:02:06.0 N/A | N/A | | 30% 30C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ | 1 GeForce GT 730 Off | 0000:02:08.0 N/A | N/A | | 30% 27C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 Not Supported | | 1 Not Supported | +-----------------------------------------------------------------------------+ ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu Device gpu removed from c1
Which adds them both as they are exactly the same model in my setup.
But for such cases, you can also select using the card’s PCI ID with:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu pci=0000:02:08.0 Device gpu added to c1 ubuntu@canonical-lxd:~$ lxc exec c1 -- nvidia-smi Tue Mar 21 21:56:52 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 375.39 Driver Version: 375.39 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GT 730 Off | 0000:02:08.0 N/A | N/A | | 30% 27C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 Not Supported | +-----------------------------------------------------------------------------+ ubuntu@canonical-lxd:~$ lxc config device remove c1 gpu Device gpu removed from c1
And lastly, lets confirm that we get the same result as on the host when running a CUDA workload:
ubuntu@canonical-lxd:~$ lxc config device add c1 gpu gpu Device gpu added to c1 ubuntu@canonical-lxd:~$ lxc exec c1 -- /usr/local/cuda-8.0/extras/demo_suite/bandwidthTest [CUDA Bandwidth Test] - Starting... Running on... Device 0: GeForce GT 730 Quick Mode Host to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 3065.4 Device to Host Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 3305.8 Device to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 30825.7 Result = PASS NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Conclusion
LXD makes it very easy to share one or multiple GPUs with your containers.
You can either dedicate specific GPUs to specific containers or just share them.
There is no of the overhead involved with usual PCI based passthrough and only a single instance of the driver is running with the containers acting just like normal host user processes would.
This does however require that your containers run a version of the CUDA tools which supports whatever version of the NVidia drivers is installed on the host.
Extra information
The main LXD website is at: https://linuxcontainers.org/lxd
Development happens on Github at: https://github.com/lxc/lxd
Mailing-list support happens on: https://lists.linuxcontainers.org
IRC support happens in: #lxcontainers on irc.freenode.net
Try LXD online: https://linuxcontainers.org/lxd/try-it