Proxmox Server vGPU

This guide assumes NVIDIA vGPU-compatible cards and is intended for VxRail-type systems. For consumer card passthrough, see the official Proxmox documentation.

PCI Passthrough

Verifying IOMMU parameters

Verify IOMMU is enabled

iDRAC

  1. Log into the iDRAC
  2. Under Configuration in the iDRAC, select the BIOS settings
  3. Select Processor Settings and ensure that Virtualization Technology is enabled
  4. If you made any changes, click Apply at the bottom of the list and reboot
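
Once the host is back up, the BIOS setting can be cross-checked from the Proxmox shell. The following is a minimal sketch, assuming an x86 host where /proc/cpuinfo exposes the vmx (Intel) or svm (AMD) CPU flag; the function name cpu_has_virt is illustrative. It only confirms the CPU virtualization extensions and says nothing about the IOMMU itself.

```shell
# Succeeds when the given cpuinfo file lists the vmx (Intel) or svm (AMD) flag.
# Defaults to /proc/cpuinfo; the optional argument exists for testing.
cpu_has_virt() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}
```

For example: cpu_has_virt && echo "virtualization extensions visible"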

LifeCycle Controller

TBC

Verify IOMMU Isolation

Load the VFIO passthrough modules at boot time by adding them to /etc/modules:

cat << EOF >> /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
EOF

Rebuild the initramfs for all installed kernels by running update-initramfs -u -k all
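
Appending to /etc/modules with cat >> adds the same lines again every time the step is repeated. A possible alternative is sketched below, assuming a hypothetical helper named ensure_module that only appends a module name when that exact line is missing.

```shell
# Append a module name to a modules file only if that exact line is absent.
# Hypothetical helper; the second argument defaults to /etc/modules.
ensure_module() {
    file="${2:-/etc/modules}"
    grep -qxF -- "$1" "$file" 2>/dev/null || printf '%s\n' "$1" >> "$file"
}

# Example (run manually as root):
#   for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do ensure_module "$m"; done
```

Running the loop twice leaves /etc/modules unchanged the second time, which makes the step safe to repeat.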

Update GRUB
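  1. Edit the file /etc/default/grub and append intel_iommu=on iommu=pt to the existing value, giving GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
  2. Run proxmox-boot-tool refresh (on hosts that do not boot through proxmox-boot-tool, update-grub is the usual equivalent)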

ZFS File systems
  1. Edit the file /etc/kernel/cmdline and add intel_iommu=on to the end of its single line
  2. Run the command proxmox-boot-tool refresh
  3. Reboot the machine
  4. Confirm the IOMMU parameter with cat /proc/cmdline
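
Reading /proc/cmdline by eye is easy to get wrong on long command lines. A small sketch of an exact-match check follows; the function name cmdline_has is an assumption made for illustration.

```shell
# Succeeds when the kernel command line contains the exact parameter $1.
# Reads /proc/cmdline by default; a second argument may name another file.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example: cmdline_has intel_iommu=on || echo "intel_iommu=on is missing from the running kernel command line"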

Reboot the system

Confirm that IOMMU is enabled after reboot by running: dmesg | grep -e DMAR -e IOMMU

Check that the PCIe devices are listed

For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM.

When executing (replacing {nodename} with the name of your node):

# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""

you should get:

┌──────────┬────────┬──────────────┬────────────┬────────┬────────────────────────────────────────┬...
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                            │
╞══════════╪════════╪══════════════╪════════════╪════════╪════════════════════════════════════════╪
│ 0x030200 │ 0x1eb8 │ 0000:3b:00.0 │          3 │ 0x10de │ TU104GL [Tesla T4]                     │
├──────────┼────────┼──────────────┼────────────┼────────┼────────────────────────────────────────┼

CONFIRM that each of the devices has a unique IOMMU group.
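
The same grouping can also be read directly from sysfs. The sketch below assumes the standard Linux layout /sys/kernel/iommu_groups/<group>/devices/<address>; the function names are illustrative. A group that holds several entries is not automatically a problem (they may be functions of the same card), so treat the output as a list of groups to inspect.

```shell
# Print "group address" for every device under an iommu_groups tree.
# Defaults to /sys/kernel/iommu_groups; the argument exists for testing.
list_iommu_devices() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"
        printf '%s %s\n' "${group##*/}" "${dev##*/}"
    done
}

# Print the groups that contain more than one device, one per line.
shared_iommu_groups() {
    list_iommu_devices "$1" | awk '{n[$1]++} END {for (g in n) if (n[g] > 1) print g}' | sort -n
}
```

For example: list_iommu_devices | grep 3b:00.0 shows which group the GPU from the table above belongs to.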

Activate GPU Passthrough

Blacklisting drivers

The standard NVIDIA and nouveau Linux drivers need to be blacklisted.

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf 
echo "blacklist nvidia*" >> /etc/modprobe.d/blacklist.conf

After blacklisting, you will need to reboot.
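
After the reboot it is worth confirming that nouveau is no longer loaded. A minimal sketch, assuming the usual /proc/modules format in which the first field of each line is the module name; module_loaded is a hypothetical helper name.

```shell
# Succeeds when module $1 appears in a /proc/modules-style listing.
# Reads /proc/modules by default; a second argument may name another file.
module_loaded() {
    awk -v m="$1" '$1 == m {found = 1} END {exit !found}' "${2:-/proc/modules}"
}
```

For example: module_loaded nouveau && echo "nouveau is still loaded, check the blacklist"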

Setup Proxmox VE Repositories

Ensure that the relevant repositories are enabled. Most of our systems use the no-subscription repository. You can manage package repositories from the Repositories panel in the Proxmox VE web UI; see the Proxmox documentation for details.

Setup DKMS:

Because the NVIDIA module is separate from the kernel, it must be rebuilt with Dynamic Kernel Module Support (DKMS) for each new kernel update.

To set up DKMS, install the kernel headers package and the DKMS helper package. In a root shell, run:

apt update
apt install dkms libc6-dev proxmox-default-headers --no-install-recommends
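
DKMS can only build the module if headers for the running kernel are present. The sketch below assumes that installed headers are reachable through /lib/modules/<version>/build, which is the usual convention but should be verified on your host; headers_present is an illustrative name.

```shell
# Succeeds when a kernel build tree (headers) exists for kernel version $1.
# $1 defaults to the running kernel; $2 is an optional root prefix for testing.
headers_present() {
    kver="${1:-$(uname -r)}"
    [ -d "${2:-}/lib/modules/$kver/build" ]
}
```

For example: headers_present || echo "no headers found for $(uname -r)". Once the driver is installed, dkms status lists the modules DKMS has built.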

Installing Host Drivers

NOTE: The drivers can be downloaded from the NVIDIA website using the btechts account, or found in the folder \VMware\7 Ent - VXRail\nVidia\ of the software share.

  1. Copy the KVM-based .run file to the host/node
  2. Make the file executable
  3. Install the drivers with the DKMS module enabled

chmod +x NVIDIA-Linux-x86_64-xxx.xxx.xx-vgpu-kvm.run
./NVIDIA-Linux-x86_64-xxx.xxx.xx-vgpu-kvm.run --dkms

After the installer has finished successfully, you will need to reboot your system, either using the web interface or by executing reboot.
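
After the reboot, the host driver can be checked with nvidia-smi. The following is a rough sketch; require_cmd and verify_vgpu_host are hypothetical helper names, and the output of nvidia-smi depends on the card and driver version, so it is not reproduced here.

```shell
# Succeeds when command $1 is on PATH; otherwise prints a hint to stderr.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}

# Run the post-install checks; intended to be called manually on the host.
verify_vgpu_host() {
    require_cmd nvidia-smi || return 1
    # General driver status first, then the vGPU-specific view.
    nvidia-smi && nvidia-smi vgpu
}
```

Call verify_vgpu_host from a root shell; a missing nvidia-smi usually means the installer did not complete or the module failed to build.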

Enable SR-IOV

  1. Create the folder structure /usr/local/lib/systemd/system/
  2. Create the file /usr/local/lib/systemd/system/nvidia-sriov.service
  3. Add the unit definition below to the file
  4. Reload the systemd daemon
  5. Enable and start the SR-IOV service

mkdir -p /usr/local/lib/systemd/system/
cat <<EOF > /usr/local/lib/systemd/system/nvidia-sriov.service
[Unit]
Description=Enable NVIDIA SR-IOV
After=network.target nvidia-vgpud.service nvidia-vgpu-mgr.service
Before=pve-guests.service

[Service]
Type=oneshot
ExecStart=/usr/lib/nvidia/sriov-manage -e ALL

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now nvidia-sriov.service
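
To check the result, systemctl status nvidia-sriov.service shows whether the unit ran, and the virtual functions appear in sysfs as virtfn* links under the physical device. The sketch below counts them, assuming the standard Linux sysfs layout; count_vfs is an illustrative name, and the PCI address in the usage note is taken from the example table earlier and must be replaced with your own. A count of 0 is expected on cards that do not use SR-IOV.

```shell
# Print the number of SR-IOV virtual functions exposed by a PCI device directory.
# $1 is a sysfs device path such as /sys/bus/pci/devices/0000:3b:00.0.
count_vfs() {
    n=0
    for vf in "$1"/virtfn*; do
        # Skip the unexpanded pattern when no virtual functions exist.
        [ -e "$vf" ] || [ -L "$vf" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}
```

For example: count_vfs /sys/bus/pci/devices/0000:3b:00.0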