Salt: automating NVIDIA GPU passthrough: fedora 41 revisions


Revision #10

Edited on
2025-03-02
Edited by user
otter2
## 6. Disable nouveau

Revision #9

Edited on
2025-03-02
Edited by user
otter2
This "guide" aims to explore and give a practical example of leveraging SaltStack to achieve the same goal as [NVIDIA GPU passthrough into Linux HVMs for CUDA applications](https://forum.qubes-os.org/t/nvidia-gpu-passthrough-into-linux-hvms-for-cuda-applications/9515/1). Salt is a management engine that simplifies configuration, and QubesOS has its own flavour. Want to see some? This guide assumes that you're done fiddling with your IOMMU groups and have modified grub parameters to allow passthrough.
Let's start with the obvious. `top.sls` is a top file. It describes high state, which is really just a combination of conventional salt formulas. A stray piece of salt configuration can be referred to as a `formula`, although I've seen this term used to refer to different things. `test.sls` is a state file. It contains a configuration written in yaml. `nvidia-driver` is also a state, although it is a directory. This is an alternative way to store state for situations when you want to have multiple state (or not only state) files.

When a state directory is referenced, salt evaluates the `init.sls` state file inside. State files may or may not be included from `init.sls` or other state files.

> Since in this case different formulas are used depending on distribution, it doesn't make much sense to have `init.sls`. In this configuration, you can't just call for `nvidia-driver`, and must specify distribution too: `nvidia-driver.f41`

Yaml configuration consists of states. In this context, state refers to a module - a piece of code that most often does a pretty specific thing. In a configuration, states behave like commands or functions and methods of a programming language. At the same time, salt formulas are distinct from conventional programming languages in their order of execution. Unless you clearly define the order using arguments like `require`, `require_in`, and `order`, you should not rely on states executing in any particular sequence.
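To make the ordering point a bit more concrete, here is a small Python sketch of the idea behind `require`: each state names the states it depends on, and a valid execution order is derived from that graph rather than from the order of the text. This is only an illustration, not how Salt resolves requisites internally. The state IDs mirror labels from this guide, but the dependency graph itself is an assumption made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical requisite graph: each state ID maps to the IDs it requires.
# Note that the entries are deliberately written "out of order".
requires = {
    "nvidia-driver--assert-install": {"nvidia-driver--install"},
    "nvidia-driver--install": {"nvidia-driver--enable-repo"},
    "nvidia-driver--enable-repo": set(),
}


def execution_order(graph):
    """Return an order in which every state runs after the states it requires."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(requires)
print(order)
```

Because every state in this toy graph depends on exactly one predecessor, there is only one valid order: repository first, then install, then the build check.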
Here, I use the qubes-specific `qvm.vm` state module (which is a wrapper around other modules, like `prefs`, `features`, [etc](https://github.com/QubesOS/qubes-mgmt-salt-dom0-qvm?tab=readme-ov-file#qvm-vm).). Almost all values and keys here are the same as you can set and get using `qvm-prefs` and `qvm-features`. For nvidia drivers to work, the kernel must be provided by the qube - that's why the field is empty. Similarly, to pass the GPU we need to set virtualization mode to `hvm` and `maxmem` to 0 (it disables memory balancing).
Now that we have a qube at the ready (you can check it by applying the formula), how do we install drivers? I want to discuss what's going on next, because at the moment of writing (March 2025) the driver installation processes for fedora 41 and debian 12 are different.
To apply a formula, put your state into your salt environment folder together with the jinja file and run
|#|fedora 41|Debian 12|
| --- |--- |--- |
|1.|Prepare the qube|Prepare the qube|
|2.|Enable rpmfusion repositories|Add repository components|
|3.|Grow `/tmp/`, because the default 1G can be too small to fit everything that the driver building process spews out. According to my measurement, the driver no longer needs more than 1G to build. I have decided to leave this step in just in case this problem still occurs with different hardware.|-|
|4.|Install drivers & wait for build|Update, upgrade, install drivers|
|5.|Delete X config, because we don't need it where we're going :sunglasses:|-|
|6.|**optional** : Disable nouveau, because the nvidia install script may fail to convince the system that it should use the nvidia driver.|-|

> :exclamation: Please be aware that both debian and rpmfusion driver package names may be different depending on what graphics card you have. This guide uses the most common modern package names, but you should check it for yourself. This guide also assumes that you are starting from default qubes templates - the Debian installation process changes if dracut or SecureBoot are used by the system.

> :thinking: Debian has the `nvidia-detect` program to tell you which drivers you need. I should be able to parse its output in a state to create a truly hardware-agnostic formula.
That way, the state will be applied to all targets (dom0, prefs.standalone.name), but jinja will edit the state file appropriately for each of them.

## 2. Configure repositories

[details="fedora 41"]
> Wow, new dnf syntax. How is this bulkier and less readable than the old one?

[/details]
[details="debian 12"]
In order to configure Debian repository components, I use the [pkgrepo state](https://docs.saltproject.io/en/latest/ref/states/all/salt.states.pkgrepo.html#module-salt.states.pkgrepo).
```yaml
nvidia-driver--enable-repo:
  pkgrepo.managed:
    - name: deb https://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
    - file: /etc/apt/sources.list
```
[/details]
Since `pkg.installed` doesn't work with dnf yet, I resort to running a command. I declare a jinja variable just before the command and immediately use it instead of writing out the long command, to keep the state tidy. As far as I can tell, `' '.join()` does the same thing it does in python: it converts a list into a string by connecting its elements with `' '`.

Here, I use the `- require:` parameter to wait for other states to apply before installing the drivers. Note that it needs both state (e.g. `cmd`) and label to function.

[details="fedora 41"]
{% set packages = [ 'akmod-nvidia', 'xorg-x11-drv-nvidia-cuda', 'vulkan', ] %}
  cmd.run:
    - name: dnf install -y {{ ' '.join(packages) }}
  pkg.installed:
    - pkgs:
      - akmod-nvidia
      - xorg-x11-drv-nvidia-cuda
{# - vulkan #}
- cmd: nvidia-driver--enable-repo - pkgrepo: nvidia-driver--enable-repo
```
## 6. Wait for the drivers to build
Well, this one is kind of wonky. `loop.until_no_eval` runs the function specified by `- name:` until it returns the value given in `- expected`. Here it is set to try once every 20 seconds for 600 seconds, which totals 10 minutes. `- args:` describes what to pass to the function in `- name:`.

The wonkiness comes from the fact that I run `modinfo -F name nvidia`, which translates into "What is the name of the module with the name 'nvidia'?". It just returns an error until the module is present (i.e. done building), and then returns 'nvidia'.
```yaml
nvidia-driver--assert-install:
In case of fedora I also use `loop.until_no_eval` to wait until the driver is done building. It runs the function specified by `- name:` until it returns the value given in `- expected`. Here it is set to try once every 20 seconds for 600 seconds. `- args:` describes what to pass to the function in `- name:`. Essentially, it runs `modinfo -F name nvidia`, which translates into "What is the name of the module with the name 'nvidia'?". It just returns an error until the module is present (i.e. done building), and then returns 'nvidia'.
[/details]
[details="debian 12"]
```yaml
nvidia-driver--install:
  cmd.run:
    - name: apt update -y && apt upgrade -y
    - require:
      - pkgrepo: nvidia-driver--enable-repo
  pkg.installed:
    - names:
      - linux-headers-amd64
      - nvidia-driver
      - firmware-misc-nonfree
      - nvidia-open-kernel-dkms
      - nvidia-cuda-dev
      - nvidia-cuda-toolkit
{# comment `nvidia-open-kernel-dkms` out to go full proprietary #}
    - require:
      - cmd: nvidia-driver--install
Technically, apt only needs to update metadata before installing, but I also run upgrade because the default debian template is pretty far behind.
[/details]

## 5. Delete X config
```
- [d12.yaml|attachment](upload://bmFmVgEasG0Jw8Mp3dZHHqswVRi.yaml) (1.5 KB)
- [default.yaml|attachment](upload://mSrCZkU1qF5fkxRDeqyr9HqnJQ4.yaml) (650 Bytes)
- [f41.yaml|attachment](upload://5uTlMBj6HUUGR0yt3XGTQpPBYzW.yaml) (1.6 KB)
- [f41-disable-nouveau.yaml|attachment](upload://y4DO96s7LM6Ne29uURUhVxTqcG.yaml) (567 Bytes)

Revision #8

Edited on
2025-03-02
Edited by user
otter2

Revision #7

Edited on
2024-12-07
Edited by user
otter2
│ ├── disable-nouveau.sls │ ├── init.sls │ └── map.jinja │ ├── f41-disable-nouveau.sls │ ├── f41.sls │ └── default.jinja
Let's start with the obvious. `top.sls` is a top file. It describes high state, which is really just a combination of conventional salt formulas. A stray piece of salt configuration can be referred to as a `formula`, although I've seen this word being used in various contexts. `test.sls` is a state file. It contains a configuration written in yaml.

`nvidia-driver` is also a state, although it is a directory. This is an alternative way to store state for situations when you want to have multiple state (or not only state) files. When a state directory is referenced, salt evaluates the `init.sls` state file inside. State files may or may not be included from `init.sls` or other state files.

> Since in this case different formulas are used depending on distribution, it doesn't make much sense to *have* `init.sls`. In this configuration, you can't just call for `nvidia-driver`, and must specify distribution too: `nvidia-driver.f41`

Yaml configuration consists of states. In this context, state refers to a module - a piece of code that most often does a pretty specific thing. In a configuration, states behave like commands or functions and methods of a programming language. At the same time, salt formulas are distinct from conventional programming languages in their order of execution. Unless you clearly define the order using arguments like `require`, `require_in`, and `order`, you should not rely on states executing in the order you write them.

> One thing to note here is that not all modules *are* state modules. There are [a lot](https://docs.saltproject.io/en/latest/py-modindex.html) of them, and they can do various things, but here we only need the state kind.

In addition to state files, you will notice `default.jinja`. [Jinja](https://palletsprojects.com/projects/jinja/) is a templating engine. This means that it helps you generalize your state files by adding variables, conditions and other cool features. You can easily recognize jinja by its fancy brackets: `{{ }}`, `{% %}`, `{# #}`. This file in particular stores variable definitions and is used for configuration of the whole state tree (directory `nvidia-driver`).
- name: {{ prefs.standalone_name }} - name: {{ nvd_f41['standalone']['name'] }}
- template: {{ prefs.template_name }} - label: {{ prefs.standalone_label }} - mem: {{ prefs.standalone_memory }} - vcpus: {{ prefs.standalone_cpus }} - template: {{ nvd_f41['template']['name'] }} - label: {{ nvd_f41['standalone']['label'] }} - flags: - standalone - prefs: - vcpus: {{ nvd_f41['standalone']['vcpus'] }} - memory: {{ nvd_f41['standalone']['memory'] }}
- class: StandaloneVM - prefs: - label: {{ prefs.standalone_label }} - mem: {{ prefs.standalone_memory }} - vcpus: {{ prefs.standalone_cpus }} - pcidevs: {{ devices }} - pcidevs: {{ nvd_f41['devices'] }}
- maxmem: 0 - class: StandaloneVM
Here, I use the qubes-specific `qvm.vm` state module (which in reality is a wrapper around other modules, like `prefs`, `features`, [etc](https://github.com/QubesOS/qubes-mgmt-salt-dom0-qvm?tab=readme-ov-file#qvm-vm).). Almost all values and keys here are the same as you can set and get using `qvm-prefs` and `qvm-features`. For nvidia drivers to work, the kernel must be provided by the qube - that's why the field is empty. Similarly, to pass the GPU we need to set virtualization mode to `hvm` and `maxmem` to 0 (it disables memory balancing).
Now, to the jinja statements. Here, they provide values for keys like label, template, name, etc. Some of them are done this way (as opposed to writing a value by hand) because the value is repeated in the state file multiple times, others are there to simplify configuration. In this state file the jinja variable is imported using the following snippet:
{% from 'nvidia-driver/map.jinja' import prefs,devices,paths %} {% if nvd_f41 is not defined %} {% from 'nvidia-driver/default.jinja' import nvd_f41 %} {% endif %}
Jinja is very similar in its syntax to python. In this case the variable from `default.jinja` gets imported only if it is not already declared in the current context. This allows us to both call this formula as is (without any jinja context) and include it in other formulas (potentially with a custom definition of `nvd_f41`). Note that you need to specify the state directory when importing, and use the actual path instead of dot notation.
```jinja
{% set prefs = {
    'standalone_name': 'fedora-40-nvidia',
    'standalone_label': 'yellow',
    'standalone_memory': 4000,
    'standalone_cpus': 4,
    'template_name': 'fedora-40-xfce',
} %}

{# Don't forget to check devices before running! #}
{% set devices = [
    '01:00.0',
    '01:00.1',
] %}

{% set paths = {
    'nvidia_conf': '/usr/share/X11/xorg.conf.d/nvidia.conf',
    'grub_conf': '/etc/default/grub',
    'grub_out': '/boot/grub2/grub.cfg',
```python
{% set nvd_f41 = {
  'template':{'name':'fedora-41-xfce'},
  'standalone':{
    'name':'fedora-41-nvidia-cuda',
    'label':'yellow',
    'memory':'4000',
    'vcpus':'4',
  },
  'devices':['01:00.0','01:00.1'],
  'paths':{
    'nvidia_conf':'/usr/share/X11/xorg.conf.d/nvidia.conf',
    'grub_conf':'/etc/default/grub',
    'grub_out':'/boot/grub2/grub.cfg',
  },
Here, I declare the dictionary `nvd_f41`. It contains sub-dictionaries for template parameters and standalone qube parameters, a list of pci devices to pass through, and another dictionary for paths. Since we need to pass all devices in the list to the new qube, in the state file I reference the whole list. Jinja behavior differs depending on which delimiter is used. Code in double brackets (called an expression) tells the parser to "print" the resulting value into the state file before the show starts. Statements (`{% %}`) do logic. `{# #}` is a comment.
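Since jinja lookups behave much like python ones, the following plain-python sketch may help to picture what the expressions "print" into the state file. It does not use jinja at all: the dictionary is a hand-copied stand-in for `nvd_f41`, and the string formatting only imitates what an expression such as `{{ nvd_f41['standalone']['name'] }}` would produce.

```python
# Plain-python stand-in for the `nvd_f41` dictionary (values copied from the guide).
nvd_f41 = {
    "template": {"name": "fedora-41-xfce"},
    "standalone": {
        "name": "fedora-41-nvidia-cuda",
        "label": "yellow",
        "memory": "4000",
        "vcpus": "4",
    },
    "devices": ["01:00.0", "01:00.1"],
}

# Imitates `- name: {{ nvd_f41['standalone']['name'] }}`
name_line = "- name: {}".format(nvd_f41["standalone"]["name"])

# Imitates `- pcidevs: {{ nvd_f41['devices'] }}`: the whole list is printed at once
pcidevs_line = "- pcidevs: {}".format(nvd_f41["devices"])

print(name_line)
print(pcidevs_line)
```

The printed list form (`['01:00.0', '01:00.1']`) happens to be valid yaml flow syntax, which is why referencing the whole list in one expression works.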
Now that we have a qube at the ready (you can check it by applying the formula), how do we install drivers? I want to discuss what's going on next, because at the moment of writing (December 2024) this guide is for fedora 41, and it has some distro-specific issues.

[details="How do I apply a formula?"]
To apply a formula, put your state into the folder in your salt environment together with the jinja file and run
(substitute `<name_of_your_state>`)

Salt will apply the state to all targets. When not specified, dom0 is the only target. This is what we want here, because dom0 handles creation of qubes. Add `--skip-dom0` if you want to skip dom0 and add `--targets=<targets>` to add something else.
[/details]
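If you end up calling `qubesctl` from a script, the targeting flags can be assembled along these lines. This is a rough sketch: the helper name `qubesctl_argv` is made up, the base command is the `sudo qubesctl --show-output state.sls <name> saltenv=user` form used in this guide, and joining several targets with commas is an assumption you should check against `qubesctl --help`.

```python
def qubesctl_argv(state, targets=None, skip_dom0=False):
    """Build an argument list for qubesctl (hypothetical helper, illustration only)."""
    argv = ["sudo", "qubesctl", "--show-output"]
    if skip_dom0:
        # Do not apply the state to dom0
        argv.append("--skip-dom0")
    if targets:
        # Assumption: multiple targets are passed as one comma separated value
        argv.append("--targets=" + ",".join(targets))
    argv += ["state.sls", state, "saltenv=user"]
    return argv


# Default: dom0 is the only target
dom0_only = qubesctl_argv("nvidia-driver.f41")

# Skip dom0 and apply the state to the nvidia qube only
qube_only = qubesctl_argv(
    "nvidia-driver.f41", targets=["fedora-41-nvidia-cuda"], skip_dom0=True
)
print(" ".join(qube_only))
```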
> According to my measurement, the driver no longer needs more than 1G to build. I have decided to leave this step in just in case that problem still occurs with different hardware.

6. Install `akmod-nvidia` and `xorg-x11-drv-nvidia-cuda`
8. Wait for the driver to build
9. Delete X config, because we don't need it where we're going :sunglasses:
10. **optional** : Disable nouveau, because the nvidia install script may fail to convince the system that it should use the nvidia driver.
Unless you are willing to write (and call for) multiple states to perform one operation, you might be wondering how to make salt apply only the first state (qube creation) to dom0, and all others to the nvidia qube. The answer is to use jinja:
{% elif grains['id'] == prefs.standalone_name %} <!-- prefs.standalone_name stuff goes here --> {% elif grains['id'] == nvd_f41['standalone']['name'] %} <!-- nvd_f41['standalone']['name'] stuff goes here -->
Pretty self-explanatory. `{free,nonfree}` is used to enable multiple repositories at once. It is a feature of the shell (brace expansion), not of salt or jinja.
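If brace expansion is new to you, this python sketch imitates what the shell does with `rpmfusion-{free,nonfree}{,-updates}` before dnf ever sees the arguments. The `expand` helper is made up for illustration and only covers this simple, non-nested case.

```python
from itertools import product


def expand(prefix, *alternatives):
    """Imitate simple shell brace expansion: prefix followed by every combination."""
    return [prefix + "".join(combo) for combo in product(*alternatives)]


# rpmfusion-{free,nonfree}{,-updates}
repos = expand("rpmfusion-", ["free", "nonfree"], ["", "-updates"])

# With the new dnf syntax every repository becomes a separate `<repo>.enabled=1` argument
setopt_args = [repo + ".enabled=1" for repo in repos]

print(repos)
print(setopt_args)
```

So one short pattern stands for four repositories: free, free-updates, nonfree and nonfree-updates.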
- name: dnf config-manager --enable rpmfusion-{free,nonfree}{,-updates} ``` - name: dnf config-manager setopt rpmfusion-{free,nonfree}{,-updates}.enabled=1 ``` > Wow, new dnf syntax. How is this bulkier and less readable than the old one?
This lasts until reboot. As I already mentioned, you might not need this. On the other hand, it is non-persistent and generally harmless, so why not?
- name: mount -o remount,size=4G /tmp/ - name: mount -o remount,size=2G /tmp/
## 4. Install drivers

This one is very bad. dnf syntax has changed after the update, and the salt-native package management state (`pkg.installed`) doesn't work with it yet, so I am forced to run this as a command.

Here, I use the `- require:` parameter to wait for other states to apply before installing the drivers. Note that it needs both state (e.g. `cmd`) and label to function.

I declare a jinja variable just before the command and immediately use it instead of writing out the long command, to keep the state tidy. As far as I can tell, `' '.join()` does the same thing it does in python: it converts a list into a string by connecting its elements with `' '`.
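For reference, this is what the same `join` looks like in plain python. The package list is the one from this guide; the only assumption is that jinja's `join` behaves like python's here, which matches what I observed.

```python
packages = ["akmod-nvidia", "xorg-x11-drv-nvidia-cuda", "vulkan"]

# ' '.join() glues the list elements together with a single space between them
command = "dnf install -y " + " ".join(packages)

print(command)
```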
nvidia-driver--remove-grubby:
  pkg.purged:
    - pkgs:
      - grubby-dummy
```
## 5. Install drivers
Here, I use the `- require:` parameter to wait for other states to apply before installing the drivers. Note that it needs both state (e.g. `cmd`) and label to function.
```yaml
{% set packages = [
  'akmod-nvidia',
  'xorg-x11-drv-nvidia-cuda',
  'vulkan',
] %}
pkg.installed: - pkgs: - akmod-nvidia - xorg-x11-drv-nvidia-cuda - nvtop cmd.run: - name: dnf install -y {{ ' '.join(packages) }}
- pkg: nvidia-driver--remove-grubby
- pkg: nvidia-driver--install - cmd: nvidia-driver--install
- name: {{ paths.nvidia_conf }} - name: {{ nvd_f41['paths']['nvidia_conf'] }}
- Why not just add a reboot state into the state file before this? Because only dom0 can reboot qubes, dom0 states are always applied first, and there is no way I know of to make it run part of its state, wait until a condition is met, and continue, without multiple calls to qubesctl, unless...
{% from 'nvidia-driver/map.jinja' import prefs,paths %} {% if grains['id'] == prefs.standalone_name %} {% if nvd_f41 is not defined %} {% from 'nvidia-driver/default.jinja' import nvd_f41 %} {% endif %} {% if grains['id'] == nvd_f41['standalone']['name'] %}
- name: {{ paths.grub_conf }} - name: {{ nvd_f41['paths']['grub_conf'] }}
- name: grub2-mkconfig -o {{ paths.grub_out }} - name: grub2-mkconfig -o {{ nvd_f41['paths']['grub_out'] }}
Make sure to change the paths if you're not running fedora 41.
***

# Downloads

## Current version

- [github repo](https://github.com/RandyTheOtter/nvidia-driver/tree/main)
- Direct download:
  - [f41.yaml|attachment](upload://nPz2haLCFV17rmFrz3FNAjUs2Cm.yaml) (1.9 KB)
  - [f41-disable-nouveau.yaml|attachment](upload://yJVnEZLkahAJXKlsjt6gPQEN85H.yaml) (498 Bytes)
  - [default.yaml|attachment](upload://6JKuhy0loihkwT1RA9gi1lYekc8.yaml) (379 Bytes)

## Old version (fedora 40)

This formula differs quite a lot from the new one due to the changes in dnf, resolution of the `grubby-dummy` dependency conflict, and updates to the way I use jinja. Here is the old procedure:

1. Prepare qube
2. Enable rpmfusion repository
3. Grow `/tmp/`, because the default 1G is too small to fit everything that the driver building process spews out. It will fail otherwise.
4. Delete `grubby-dummy`, because it conflicts with `sdubby`. Nvidia drivers depend on it. See [this issue](https://github.com/QubesOS/qubes-issues/issues/9556).
5. Install `akmod-nvidia` and `xorg-x11-drv-nvidia-cuda`
6. Wait for the driver to build
7. Delete X config, because we don't need it where we're going :sunglasses:
8. **optional** : Disable nouveau, because the nvidia install script may fail to convince the system that it should use the nvidia driver.

Old state files. They're kind of uggo, but I like them anyway:

Revision #6

Edited on
2024-12-07
Edited by user
otter2

Revision #5

Edited on
2024-11-08
Edited by user
otter2
If you download the state files, you will find it in a separate file. This is done for two reasons:

Revision #4

Edited on
2024-11-08
Edited by user
otter2
3. **why?** : also I don't think it can work ![qubesctl|690x413, 100%](upload://nuZbEI1jUo4Qf3hyWRlQZUUrGAy.jpeg)

Revision #3

Edited on
2024-11-08
Edited by user
otter2
Now that we have a qube at the ready (you can check it by applying the state), how do we install drivers? I want to discuss what's going on next, because at the moment of writing (November 2024) this guide is for fedora 40 in combination with somewhat modern hardware, and it has some distro-specific issues.
- Why not just add a reboot state into the state file before this? Because only dom0 can reboot qubes, dom0 states are always applied first, and there is no way I know of to make it run part of its state, wait until a condition is met, and continue, without multiple calls to qubesctl, unless...

Revision #2

Edited on
2024-11-08
Edited by user
otter2
In addition to state files, you will notice `map.jinja`. [Jinja](https://palletsprojects.com/projects/jinja/) is a templating engine. This means that it helps you generalize your state files by adding variables, conditions and other cool features. You can easily recognize jinja by its fancy brackets: `{{ }}`, `{% %}`, `{# #}`. This file in particular stores variable definitions and is used for configuration of the whole state directory thingy (nvidia-driver).
- label: {{ prefs.standalone_label }}
- mem: {{ prefs.standalone_memory }}
- vcpus: {{ prefs.standalone_cpus }}
- maxmem: 0
This is pretty much just python in brackets. Notice that you need to specify the state directory when importing, and use the actual path instead of dot notation.
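As a small illustration of the import note above, here is a minimal sketch. The import line matches the one used in the disable-nouveau state of this guide; the state ID `example--show-import` is hypothetical and only there to show an imported value being filled in.

```yaml
{# Import by file path relative to the salt environment, not 'nvidia-driver.map' #}
{% from 'nvidia-driver/map.jinja' import prefs, paths %}

{# Hypothetical state, only for illustration: prints a value taken from prefs #}
example--show-import:
  test.show_notification:
    - text: "The standalone qube is called {{ prefs.standalone_name }}"
```

Dot notation is still what you use on the command line to call a state inside the directory; the slash form is only for jinja imports.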
{# Don't forget to check devices before running! #}
```
Here, I declare the dictionary `prefs`, the list `devices`, and another dictionary `paths`. Since we need to pass all devices from the list to the new qube, in the state file I reference the whole list using a jinja expression (`{{ devices }}`). Dictionaries are used to fill parameters, and dot notation is used to reference specific values in them. Double brackets tell the parser to "print" the value into the state file before the show starts, whereas statements (`{% %}`) do logic. `{# #}` is a comment.

## 1.5 Interlude: what's next?

Now that we have a qube ready (you can check by applying the state), how do we install drivers? I want to discuss what comes next, because at the time of writing (November 2024) this guide targets fedora 40 in combination with somewhat modern hardware, and it has some distro-specific issues. Other distributions might not encounter these, and you need to tweak the state anyway, at least to change the paths and repository. In addition to that, `nvidia-open` drivers may be available to you.

> Tip: To apply a state, put it into the folder in your salt environment together with the jinja file and run `sudo qubesctl --show-output state.sls <name_of_your_state> saltenv=user` (substitute `<name_of_your_state>`)
>
> Salt will apply the state to all targets. When not specified, dom0 is the only target. This is what we want here, because dom0 handles creation of qubes, but what to do if the situation is different? Add `--skip-dom0` if you want to skip dom0 and add `--targets=<targets>` to add something else.

The plan:

1. Prepare qube <-- we're here
2. Enable rpmfusion repository
3. Grow `/tmp/`, because the default 1G is too small to fit everything that the driver build process spews out. It will fail otherwise.
4. Delete `grubby-dummy`, because it conflicts with `sdubby`, and nvidia drivers depend on it.
5. Install `akmod-nvidia` and `xorg-x11-drv-nvidia-cuda`
6. Wait for the build process to finish
7. Delete X config, because we don't need it where we're going :sunglasses:
8. **optional**: Disable nouveau, because the nvidia install script may fail to convince the system that it should use the nvidia driver.

## 2-0.5. How to choose target *inside* the state file

Unless you are willing to write (and call for) multiple states to perform a single operation, you might be wondering how to make salt apply only the first state (qube creation) to dom0, and all the others to the nvidia qube. The answer is to use jinja:

```
{% if grains['id'] == 'dom0' %}

{# Dom0 stuff goes here #}

{% elif grains['id'] == prefs.standalone_name %}

{# prefs.standalone_name stuff goes here #}

{% endif %}
```

That way, the state will be applied to all targets (dom0, prefs.standalone_name), but jinja will edit the state file appropriately for each of them.

## 2. Enable rpmfusion

Pretty self-explanatory. `{free,nonfree}` is shell brace expansion, used to enable multiple repositories at once. This is not salt or jinja-specific.

```yaml
nvidia-driver--enable-repo:
  cmd.run:
    - name: dnf config-manager --enable rpmfusion-{free,nonfree}{,-updates}
```

## 3. Extend `/tmp/`

This lasts until reboot. 4G is probably overkill.

```yaml
nvidia-driver--extend-tmp:
  cmd.run:
    - name: mount -o remount,size=4G /tmp/
```

## 4. Delete `grubby-dummy`

```yaml
nvidia-driver--remove-grubby:
  pkg.purged:
    - pkgs:
      - grubby-dummy
```

## 5. Install drivers

Here, I use the `- require:` parameter to wait for other states to apply before installing the drivers. Note that it needs both the state module (e.g. `cmd`) and the label to function.

```yaml
nvidia-driver--install:
  pkg.installed:
    - pkgs:
      - akmod-nvidia
      - xorg-x11-drv-nvidia-cuda
      - nvtop
    - require:
      - cmd: nvidia-driver--enable-repo
      - cmd: nvidia-driver--extend-tmp
      - pkg: nvidia-driver--remove-grubby
```

## 6. Wait for the drivers to build

Well, this one is kind of wonky. `loop.until_no_eval` runs the function specified by `- name:` until it returns the value given in `- expected:`. Here it is set to retry every 20 seconds for up to 600 seconds, which is 10 minutes. `- args:` describes what to pass to the function in `- name:`.

The wonkiness comes from the fact that I run `modinfo -F name nvidia`, which translates into "What is the name of the module with the name 'nvidia'?". It just returns an error until the module is present (i.e. done building), and then returns 'nvidia'.

```yaml
nvidia-driver--assert-install:
  loop.until_no_eval:
    - name: cmd.run
    - expected: 'nvidia'
    - period: 20
    - timeout: 600
    - args:
      - modinfo -F name nvidia
    - require:
      - pkg: nvidia-driver--install
```

## 7. Delete X config

```
nvidia-driver--remove-conf:
  file.absent:
    - name: {{ paths.nvidia_conf }}
    - require:
      - loop: nvidia-driver--assert-install
```

## 8. Disable nouveau

If you download the state files, you will find this in a separate file. It is done for two reasons:

1. It may not be required
1. I think the vm must be restarted before this change is applied, so first run the main state and apply this one after restarting the qube.
   - Why? No idea.
   - Why not just add a reboot state to the state file before this? Because only dom0 can reboot qubes, dom0 states are always applied first, and there is no way I know of to make it run part of its state, wait until a condition is met, and continue without multiple calls to qubesctl, unless...

> To run a state located inside a state folder, use dot notation, e.g.: `state.sls nvidia-driver.disable-nouveau`

```yaml
{% from 'nvidia-driver/map.jinja' import prefs,paths %}

{% if grains['id'] == prefs.standalone_name %}

nvidia-driver.disable-nouveau--blacklist-nouveau:
  file.append:
    - name: {{ paths.grub_conf }}
    - text: 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.driver.blacklist=nouveau"'

nvidia-driver.disable-nouveau--grub-mkconfig:
  cmd.run:
    - name: grub2-mkconfig -o {{ paths.grub_out }}
    - require:
      - file: nvidia-driver.disable-nouveau--blacklist-nouveau

{% endif %}
```

Make sure to change the paths if you're not running fedora 40.

*not* the ways I know of:

1. **advanced**: Check for conditions in grains and pillars, see [this topic](https://forum.qubes-os.org/t/how-to-make-a-salt-state-on-target-a-require-a-state-on-target-b/18417/7)
3. **why?**: ![qubesctl|690x413, 75%](upload://nuZbEI1jUo4Qf3hyWRlQZUUrGAy.jpeg)
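To see how the target check from section 2-0.5 and the numbered steps could sit in one file, here is a minimal sketch of the overall layout. It is an assumption about how the pieces fit together, not a copy of the attached `init.yaml`: the dom0 branch is left as a placeholder, only two qube-side states are shown, and the `require` on the second one is added purely to illustrate explicit ordering.

```yaml
{# Sketch only: layout is assumed, not copied from the attached init.yaml #}
{% from 'nvidia-driver/map.jinja' import prefs, devices, paths %}

{% if grains['id'] == 'dom0' %}

{# dom0 branch: the qube-creation state from step 1 goes here #}

{% elif grains['id'] == prefs.standalone_name %}

{# Qube branch: steps 2 to 7 go here, chained with require #}
nvidia-driver--enable-repo:
  cmd.run:
    - name: dnf config-manager --enable rpmfusion-{free,nonfree}{,-updates}

nvidia-driver--extend-tmp:
  cmd.run:
    - name: mount -o remount,size=4G /tmp/
    - require:
      - cmd: nvidia-driver--enable-repo

{% endif %}
```

Since dom0 is the only target by default, a file laid out like this needs `--targets=<name_of_your_qube>` on the `qubesctl` call for the second branch to be rendered at all.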
I've got tired and busy, will continue explaining salt later.

Want to check out the complete state? Here you go:

- [init.yaml|attachment](upload://dtQci0ZXEq5syKSEl3R1mE9QN9T.yaml) (2.2 KB) <- how to use note is here
- [disable-nouveau.yaml|attachment](upload://6QPZMtjX68YwUWFXSachbJvpPhB.yaml) (486 Bytes)
- [map.yaml|attachment](upload://i2iojuyKNIN8oaH84Js2fjxEkGJ.yaml) (412 Bytes) rename to map.jinja

Want to check out the complete state? Here you go:

[disable-nouveau.yaml|attachment](upload://6QPZMtjX68YwUWFXSachbJvpPhB.yaml) (486 Bytes)
[init.yaml|attachment](upload://dtQci0ZXEq5syKSEl3R1mE9QN9T.yaml) (2.2 KB)
[map.yaml|attachment](upload://vI6VPNFNPMC5m5UM7ekWpvLljLf.yaml) (468 Bytes)