If you'd like to run an AI model locally, this is how I have been running Ollama in a dedicated appVM. Performance is alright, depending on the size of the chosen model.
Recommended settings for the appVM (a dom0 sketch for applying them follows the list):
private storage max size: 80 GB
initial memory: 16000 MB
max memory: what you can spare
VCPUs: 4
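These can all be set in the Qube Settings GUI. As a rough sketch, assuming the appVM is named ollama-vm (adjust the name and the maxmem value to your setup), the same values can be applied from dom0:

qvm-volume resize ollama-vm:private 80GiB   # private storage max size
qvm-prefs ollama-vm memory 16000            # initial memory in MB
qvm-prefs ollama-vm maxmem 24000            # max memory: whatever you can spare (24000 is just an example)
qvm-prefs ollama-vm vcpus 4                 # VCPUs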
In the template:
sudo pacman -Syu
sudo pacman -S ollama
sudo pacman -S docker docker-compose # optional
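The appVM's rc.local (below) starts the Ollama service itself, so the service should stay disabled in the template. Arch packages normally don't enable their services, but it doesn't hurt to make sure while you are in the template:

sudo systemctl disable --now ollama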
In the appVM:
sudo mkdir -p /rw/bind-dirs/var/lib/ollama
sudo mkdir -p /rw/config/qubes-bind-dirs.d
sudo nano /rw/config/qubes-bind-dirs.d/50_user.conf
binds+=( '/var/lib/ollama' )
binds+=( '/var/lib/docker' )
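The bind-dirs only take effect after a restart. Once the appVM is back up, a quick way to confirm the directories are persisted (findmnt is part of util-linux, so it should already be there):

findmnt /var/lib/ollama
findmnt /var/lib/docker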
sudo nano /rw/config/rc.local
#!/bin/sh

# increase swap size
swapoff /dev/xvdc1
parted -s /dev/xvdc rm 1
parted -s /dev/xvdc rm 3
parted -s /dev/xvdc mkpart primary linux-swap 0% 10G
mkswap /dev/xvdc1
swapon -d /dev/xvdc1
# service is disabled in template
systemctl start ollama

# several AI projects offer docker containers, you could
# run ollama in a docker container instead if you like
# systemctl start docker
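If you prefer the container route mentioned in the comment above, a minimal sketch using the official ollama/ollama image (assuming Docker was installed via the optional template step) could be run once in the appVM; the data ends up under /var/lib/docker, which the bind-dirs entry above already persists:

sudo systemctl start docker
sudo docker run -d --name ollama \
    -v ollama:/root/.ollama \
    -p 127.0.0.1:11434:11434 \
    ollama/ollama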
Restart the appVM, then download a language model and run it:
ollama help
ollama pull llama3.2
ollama run llama3.2
ollama run gives you a chat interface in the terminal; the ollama service also offers an API listening on 127.0.0.1:11434. Have fun and may enough RAM be with you.
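As a quick sketch of using that API (endpoint and fields per the Ollama API docs; adjust the model name to whatever you pulled):

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'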