Nvidia GPU Pass-through on Qubes 4.3 (Fedora 43 Template)

Original forum link: https://forum.qubes-os.org/t/37795
Original poster: hUt4Ke107Y7VyK
Editors: hUt4Ke107Y7VyK
Created at: 2025-12-09 10:15:23
Last wiki edit: 2025-12-10 09:47:23
Revisions: 1 revision

Author's Note: This guide addresses specific dependency conflicts involving the grubby-dummy package that break standard Nvidia driver installations in Qubes. It implements a fully automated "native DNF" solution using an improved custom dummy package and DNF5 hooks, eliminating the need for manual wrapper scripts during updates.


⚠️ WARNINGS AND DISCLAIMERS

  1. Security: This setup weakens isolation and is not recommended for high-security contexts.
  2. Compatibility: This might not work with all hardware configurations.
  3. Stability: Updates might break this setup.

Phase 1: Dom0 Configuration

We need to isolate the GPU from Dom0 so it can be passed to a VM.

  1. Identify your GPU's PCI addresses: Open a terminal in Dom0 and run:

    lspci -nn | grep -i nvidia
    
    Example output:
    05:00.0 VGA compatible controller [0300]: NVIDIA Corporation... [10de:1e04]
    05:00.1 Audio device [0403]: NVIDIA Corporation... [10de:10f7]
    05:00.2 USB controller...
    05:00.3 Serial bus controller...

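  2. Hide the devices from Dom0: Edit /etc/default/grub in Dom0. Find the GRUB_CMDLINE_LINUX line and append the following inside its quotes. Replace the addresses with your own. Use the bus:device.function values at the start of each lspci line (e.g. 05:00.0), not the [10de:xxxx] vendor:device IDs. List every NVIDIA function your card exposes, since many cards have only the VGA and audio functions: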

    rd.qubes.hide_pci=05:00.0,05:00.1,05:00.2,05:00.3
    
  3. Update Grub and Reboot:

    sudo grub2-mkconfig -o /boot/grub2/grub.cfg
    sudo reboot
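Steps 1 and 2 can also be combined with two small helpers. This is a minimal sketch, and both function names are hypothetical (they are not part of Qubes). nvidia_hide_pci_param reads lspci -nn output, keeps every line that mentions NVIDIA, and builds the rd.qubes.hide_pci= parameter from the first column. append_grub_cmdline then adds that parameter inside the quotes of GRUB_CMDLINE_LINUX. The sketch assumes every NVIDIA function in the machine belongs to the GPU you want to pass through. It refuses to run if an rd.qubes.hide_pci= entry already exists, so you can merge the addresses by hand instead.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Qubes); review their output before rebooting.

# Read `lspci -nn` output on stdin and print the rd.qubes.hide_pci= parameter.
# Assumption: every NVIDIA line belongs to the GPU being passed through.
nvidia_hide_pci_param() {
    local addrs
    # The first field of each lspci line is the bus:device.function address.
    addrs=$(grep -i nvidia | awk '{print $1}' | paste -sd, -)
    if [ -z "$addrs" ]; then
        echo "No NVIDIA devices found" >&2
        return 1
    fi
    echo "rd.qubes.hide_pci=$addrs"
}

# Append PARAM inside the quotes of GRUB_CMDLINE_LINUX in FILE.
append_grub_cmdline() {
    local file=$1 param=$2
    if [ -z "$param" ]; then
        echo "No parameter given" >&2
        return 1
    fi
    # Refuse to add a second rd.qubes.hide_pci= entry; merge it by hand instead.
    if grep -q '^GRUB_CMDLINE_LINUX=.*rd\.qubes\.hide_pci=' "$file"; then
        echo "rd.qubes.hide_pci= is already set in $file; merge the addresses by hand" >&2
        return 1
    fi
    cp -- "$file" "$file.bak"   # keep a backup of the original file
    # Insert the parameter just before the closing quote of the line.
    sed -i -E "s|^(GRUB_CMDLINE_LINUX=\"[^\"]*)\"|\1 $param\"|" "$file"
}
```

For example, from a root shell in Dom0 (sudo -i) where the functions are defined, run append_grub_cmdline /etc/default/grub "$(lspci -nn | nvidia_hide_pci_param)". Check the edited line, then continue with step 3. The original file is kept as /etc/default/grub.bak.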
    


Phase 2: TemplateVM Preparation

We will create a specialized template for GPU workloads.

  1. Clone the Template: Clone your standard Fedora 43 template (e.g., fedora-43-xfce) to fedora-43-xfce-gpu.

  2. Configure Template Settings: Open Qube Settings for fedora-43-xfce-gpu:

  3. Install Build Tools: Start the template (or use a temporary Disposable VM) and install the necessary tools:

    sudo dnf install rpm-build rpmrebuild
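Steps 1 and 2 can also be scripted from a Dom0 terminal. The settings list for step 2 is not part of this copy of the guide, so the two settings below are assumptions. They set HVM virtualization mode and the kernel provided by the qube (an empty kernel property); the second is implied by the initramfs regeneration in Phase 4. Compare them with your own Qube Settings before relying on them. The function prepare_gpu_template and its DRY_RUN switch are hypothetical. With DRY_RUN=1, the function only prints the qvm-clone and qvm-prefs commands it would run.

```shell
#!/bin/bash
# Hypothetical Dom0 helper; the qube settings below are assumptions.
# With DRY_RUN set to any non-empty value, commands are printed instead of run.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%q ' "$@"   # %q shows an empty argument as ''
        echo
    else
        "$@"
    fi
}

# Usage: prepare_gpu_template SOURCE_TEMPLATE NEW_TEMPLATE
prepare_gpu_template() {
    local src=$1 dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: prepare_gpu_template SOURCE NEW" >&2
        return 1
    fi
    run qvm-clone "$src" "$dst"
    # Assumed settings: HVM mode and "kernel provided by qube" (empty kernel).
    run qvm-prefs "$dst" virt_mode hvm
    run qvm-prefs "$dst" kernel ''
}
```

To preview the commands, run DRY_RUN=1 prepare_gpu_template fedora-43-xfce fedora-43-xfce-gpu. To apply them, run the same command without DRY_RUN.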
    


Phase 3: The "Super-Grubby" Fix (Solving Dependency Hell)

The Nvidia driver packages have a strict dependency on /usr/bin/grubby. The standard grubby-dummy package in Qubes does not satisfy this requirement in a way that pleases the DNF5 dependency resolver, causing conflicts with upstream packages (like sdubby or the real grubby).

We solve this by building a "Super-Dummy" package that explicitly provides the binary paths and capabilities required by the driver, preventing DNF from trying to pull in conflicting packages.

  1. Create the Spec File: Run this command in the TemplateVM to create the build recipe:

    cat <<EOF > super-grubby.spec
    Name:       grubby-dummy
    Version:    99.0.0
    Release:    2%{?dist}
    Epoch:      1000
    Summary:    Super Dummy for Grubby and Sdubby
    License:    Public Domain
    BuildArch:  noarch
    
    # Claim to provide the packages
    Provides:   grubby = %{version}
    Provides:   sdubby = %{version}
    Provides:   grubby-dummy = %{version}
    
    # Claim to provide the specific binary paths (Virtual Provision)
    Provides:   /usr/bin/grubby
    Provides:   /usr/sbin/grubby
    
    # Block the real packages
    Obsoletes:  grubby < %{version}
    Obsoletes:  sdubby < %{version}
    
    %description
    Dummy package to satisfy Nvidia driver dependencies for /usr/bin/grubby.
    
    %build
    # Nothing to build
    
    %install
    # Create only /usr/bin
    mkdir -p %{buildroot}/usr/bin
    
    # Create the dummy script
    echo '#!/bin/bash' > %{buildroot}/usr/bin/grubby
    echo 'echo "Dummy grubby called - doing nothing."' >> %{buildroot}/usr/bin/grubby
    echo 'exit 0' >> %{buildroot}/usr/bin/grubby
    
    # Make it executable
    chmod +x %{buildroot}/usr/bin/grubby
    
    %files
    /usr/bin/grubby
    
    EOF
    
  2. Build the Package:

    rpmbuild -bb super-grubby.spec
    

  3. Install the Super-Dummy:

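    You might first need to remove the old Qubes grubby-dummy manually. The --nodeps option skips the dependency check that would otherwise refuse to remove a package that other packages require:

    sudo rpm -e --nodeps grubby-dummy
    

    Then install the Super-Dummy. It replaces the existing Qubes dummy and satisfies the grubby and sdubby requirements itself, so DNF no longer tries to pull in the conflicting packages.
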
    sudo dnf install ~/rpmbuild/RPMS/noarch/grubby-dummy-99.0.0-2.fc43.noarch.rpm -y
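Before you install the drivers, you can confirm that the installed package advertises the capabilities declared in the spec file. In this minimal sketch, the hypothetical function check_grubby_provides reads the output of rpm -q --provides grubby-dummy on stdin. It reports any of grubby, sdubby, /usr/bin/grubby and /usr/sbin/grubby that is missing. The list simply mirrors the Provides lines in super-grubby.spec.

```shell
#!/bin/bash
# Hypothetical check: does the installed grubby-dummy advertise every
# capability declared in super-grubby.spec? Reads `rpm -q --provides` output.
check_grubby_provides() {
    local provides missing=0 cap
    # Keep only the capability name (first field), dropping "= version".
    provides=$(awk '{print $1}')
    for cap in grubby sdubby /usr/bin/grubby /usr/sbin/grubby; do
        if ! grep -qxF -- "$cap" <<<"$provides"; then
            echo "missing: $cap"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage in the TemplateVM: rpm -q --provides grubby-dummy | check_grubby_provides && echo "grubby-dummy looks complete".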
    

Phase 4: Automating Updates (DNF5 Hooks)

We need to automate the Dracut configuration and fix the "Split-Brain" issue (where the headless Template crashes if Nvidia EGL is enabled, but the AppVM needs it enabled).

  1. Create the Hook Script: Create /usr/local/bin/qubes-nvidia-hook.sh:

    #!/bin/bash
    set -e
    
    # --- Configuration ---
    NVIDIA_EGL="/usr/share/glvnd/egl_vendor.d/10_nvidia.json"
    NVIDIA_EGL_BACKUP="/usr/share/glvnd/egl_vendor.d/10_nvidia.json.enabled"
    DRACUT_CONF="/usr/lib/dracut/dracut.conf.d/99-nvidia-dracut.conf"
    XORG_CONF="/usr/share/X11/xorg.conf.d/nvidia.conf"
    
    echo ">>> [Nvidia-Hook] Starting post-update configuration..."
    
    # 1. Handle EGL Split-Brain (Template vs AppVM)
    # We save a copy of the config for the AppVM, then disable it for the Template
    if [ -f "$NVIDIA_EGL" ]; then
        # Check if the file is NOT empty (meaning it was just replaced by an update)
        if [ -s "$NVIDIA_EGL" ] && [ "$(cat "$NVIDIA_EGL")" != "{}" ]; then
            echo " -> New Nvidia EGL config detected."
    
            # Snapshot the fresh config to the .enabled file for the AppVM to use
            cp -f "$NVIDIA_EGL" "$NVIDIA_EGL_BACKUP"
            echo " -> Updated AppVM backup ($NVIDIA_EGL_BACKUP)."
    
            # Neuter the active config for the Template (prevents crash on shutdown/boot)
            echo "{}" > "$NVIDIA_EGL"
            echo " -> Disabled EGL for Template (wrote empty JSON)."
        else
            echo " -> EGL config already neutralized."
        fi
    fi
    
    # 2. Fix Dracut Config (omit -> add)
    # The update usually resets this file, so we force-patch it every time.
    if [ -f "$DRACUT_CONF" ]; then
        sed -i 's/omit_drivers/add_drivers/g' "$DRACUT_CONF"
        echo " -> Dracut config patched (omit_drivers -> add_drivers)."
    fi
    
    # 3. Remove conflicting Xorg config (Fixes VM crash/hang on shutdown)
    if [ -f "$XORG_CONF" ]; then
        rm -f "$XORG_CONF"
        echo " -> Conflicting Xorg config removed."
    fi
    
    # 4. Regenerate Initramfs
    # CRITICAL: Target the LATEST installed kernel, not necessarily the running one.
    LATEST_KERNEL=$(ls /lib/modules | sort -V | tail -n 1)
    
    if [ -n "$LATEST_KERNEL" ]; then
        echo " -> Regenerating initramfs for kernel: $LATEST_KERNEL"
        dracut -f --kver "$LATEST_KERNEL"
    else
        echo " -> Warning: Could not detect kernel version. Skipping dracut."
    fi
    
    echo ">>> [Nvidia-Hook] Cleanup complete."
    
  2. Make it Executable:

    sudo chmod +x /usr/local/bin/qubes-nvidia-hook.sh
    

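  3. Register the DNF5 Action: The hook is driven by the DNF5 actions plugin. If the libdnf5-plugin-actions package is not yet installed in the Template, install it first (sudo dnf install libdnf5-plugin-actions). Then create /etc/dnf/libdnf5-plugins/actions.d/nvidia-qubes.actions: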

    # Trigger the fix script after any transaction involving nvidia packages
    # Syntax: trigger:package_filter:direction:option:command
    post_transaction:*nvidia*:in::/usr/local/bin/qubes-nvidia-hook.sh
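After the first transaction that fires the hook, you can confirm that the Template was left in the expected state. The hypothetical function check_nvidia_hook_state checks the same paths the hook script uses. Its optional ROOT argument is an assumption added so the check can inspect a copy of the filesystem instead of /.

```shell
#!/bin/bash
# Hypothetical sanity check for the state qubes-nvidia-hook.sh should leave behind.
# ROOT (optional, default "") lets the check run against a copy of the filesystem.
check_nvidia_hook_state() {
    local root=${1:-} ok=0
    local egl="$root/usr/share/glvnd/egl_vendor.d/10_nvidia.json"
    local dracut_conf="$root/usr/lib/dracut/dracut.conf.d/99-nvidia-dracut.conf"
    local xorg_conf="$root/usr/share/X11/xorg.conf.d/nvidia.conf"

    # The Template's active EGL config should be the "{}" stub...
    if [ -f "$egl" ] && [ "$(cat "$egl")" != "{}" ]; then
        echo "FAIL: $egl is not neutralized"; ok=1
    fi
    # ...and a full copy should be saved for the AppVM.
    if [ ! -s "$egl.enabled" ]; then
        echo "FAIL: $egl.enabled is missing or empty"; ok=1
    fi
    # The dracut config must no longer omit the nvidia drivers.
    if [ -f "$dracut_conf" ] && grep -q omit_drivers "$dracut_conf"; then
        echo "FAIL: $dracut_conf still uses omit_drivers"; ok=1
    fi
    # The conflicting Xorg snippet should be gone.
    if [ -e "$xorg_conf" ]; then
        echo "FAIL: $xorg_conf still exists"; ok=1
    fi
    [ "$ok" -eq 0 ] && echo "OK: hook state looks correct"
    return "$ok"
}
```

In the Template, run check_nvidia_hook_state with no argument. If any line starts with FAIL, run the hook manually as described under Verification & Troubleshooting.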
    

Phase 5: Install Nvidia Drivers

Now that the infrastructure is in place, installing the drivers is standard.

  1. Enable RPM Fusion nonfree repository:

    sudo dnf config-manager setopt rpmfusion-nonfree.enabled=1
    sudo dnf config-manager setopt rpmfusion-nonfree-updates.enabled=1
    

  2. Install Packages:

    sudo dnf install xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda akmod-nvidia kernel-devel
    
    Note: Due to the DNF hook, the initramfs regeneration and config patching will happen automatically at the end of this transaction.

  3. Shutdown the Template:

    sudo poweroff
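akmods may still be building the kernel module in the background when the DNF transaction returns. RPM Fusion's NVIDIA instructions suggest waiting until modinfo -F version nvidia reports a driver version. This minimal sketch polls for the module before you run sudo poweroff. The function name, timeout and polling interval are all hypothetical choices. By default it checks the newest kernel under /lib/modules, using the same expression as the hook script. If the module only appears after the hook has already run, re-run sudo /usr/local/bin/qubes-nvidia-hook.sh so the initramfs is regenerated.

```shell
#!/bin/bash
# Hypothetical helper: poll until the nvidia kernel module is available.
# Usage: wait_for_nvidia_module [KERNEL_VERSION] [TIMEOUT_SECONDS]
wait_for_nvidia_module() {
    # Default to the newest installed kernel, as the hook script does.
    local kver=${1:-$(ls /lib/modules | sort -V | tail -n 1)}
    local timeout=${2:-600} interval=10 waited=0 version
    while [ "$waited" -lt "$timeout" ]; do
        # modinfo -k looks the module up for the given kernel version.
        if version=$(modinfo -k "$kver" -F version nvidia 2>/dev/null) && [ -n "$version" ]; then
            echo "nvidia module $version is ready for kernel $kver"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Timed out after ${timeout}s waiting for the nvidia module ($kver)" >&2
    return 1
}
```

For example, run wait_for_nvidia_module && sudo poweroff. If it times out, use the manual rebuild steps under Verification & Troubleshooting.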
    


Phase 6: AppVM Configuration

We need an AppVM that has the physical GPU attached and knows how to restore the EGL config that we disabled in the Template.

  1. Create the AppVM: Create gpu-personal based on fedora-43-xfce-gpu.

  2. Configure AppVM Settings:

  3. Enable Nvidia EGL (The Split-Brain Fix): The Template has an empty EGL config (to prevent crashes). We need the AppVM to use the valid backup we created.

    Start the AppVM, open a terminal, and edit /rw/config/rc.local:

    sudo nano /rw/config/rc.local
    

    Add this content, and make sure the file is executable (sudo chmod +x /rw/config/rc.local):

    #!/bin/bash
    # Restore Nvidia EGL config for GPU pass-through
    if [ -f /usr/share/glvnd/egl_vendor.d/10_nvidia.json.enabled ]; then
        mount --bind /usr/share/glvnd/egl_vendor.d/10_nvidia.json.enabled /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    fi
    

  4. Reboot the AppVM.
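Once the AppVM is back up, you can check whether the bind mount from rc.local took effect by looking at the content of the active vendor file. The hypothetical function egl_vendor_status distinguishes the {} stub that the hook writes in the Template from a real vendor config. Its name and optional directory argument are assumptions, not part of Qubes or glvnd.

```shell
#!/bin/bash
# Hypothetical check for the AppVM: is the NVIDIA EGL vendor file active?
# Usage: egl_vendor_status [EGL_VENDOR_DIR]
egl_vendor_status() {
    local dir=${1:-/usr/share/glvnd/egl_vendor.d}
    local active="$dir/10_nvidia.json"
    if [ ! -f "$active" ]; then
        echo "missing: $active"
        return 2
    fi
    # The Template hook writes "{}" here; the rc.local bind mount should
    # cover it with the saved .enabled copy.
    if [ "$(cat "$active")" = "{}" ]; then
        echo "stub: rc.local bind mount not active"
        return 1
    fi
    echo "active: NVIDIA EGL vendor config in use"
}
```

Run egl_vendor_status in the AppVM. A stub result usually means rc.local did not run or the .enabled backup was missing in the Template.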


Verification & Troubleshooting

After rebooting the AppVM:

  1. Verify Driver Load: Open a terminal in the AppVM:

    nvidia-smi
    
    You should see your GPU model and memory usage.

  2. Manual Recovery (If updates fail): If the DNF hook ever fails to fire, you can manually trigger the fix in the TemplateVM:

    sudo /usr/local/bin/qubes-nvidia-hook.sh
    

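  3. Manual Module Build: If nvidia-smi fails, check in the Template whether the kernel module was built. After a rebuild, re-run the hook script (sudo /usr/local/bin/qubes-nvidia-hook.sh) so the initramfs is regenerated:

    rpm -qa | grep kmod-nvidia
    # If missing, force a rebuild for the running kernel: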
    sudo akmods --rebuild --kernels $(uname -r)
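The package query above only checks package names and rebuilds for the running kernel, but the hook targets the newest installed kernel. This minimal sketch checks on disk instead. For each kernel directory it looks for a built nvidia.ko (compressed or not) and prints the kernels that lack one. The function name and its optional directory argument are hypothetical. The search does not assume a fixed subdirectory layout.

```shell
#!/bin/bash
# Hypothetical helper: list kernels under MODULES_DIR that have no nvidia module.
# Usage: kernels_missing_nvidia [MODULES_DIR]
kernels_missing_nvidia() {
    local modules_dir=${1:-/lib/modules} kdir
    for kdir in "$modules_dir"/*/; do
        [ -d "$kdir" ] || continue   # glob matched nothing
        # Matches nvidia.ko, nvidia.ko.xz, etc., wherever it was installed.
        if [ -z "$(find "$kdir" -name 'nvidia.ko*' -print -quit)" ]; then
            basename "$kdir"
        fi
    done
}
```

For example: for k in $(kernels_missing_nvidia); do sudo akmods --rebuild --kernels "$k"; done. Then re-run sudo /usr/local/bin/qubes-nvidia-hook.sh.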
    

Edit: Added enabling RPM Fusion nonfree repository to phase 5.