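This post covers removing the GPU driver from TrueNAS, installing the vGPU driver, and installing a specific version of the Linux driver.
Removing the driver
Use dpkg -l | grep nvidia to list the installed NVIDIA packages that need to go, then remove them with dpkg --purge.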
Update:
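From TrueNAS-SCALE-24.10.* onward, TrueNAS no longer bundles the NVIDIA driver; it is downloaded and installed automatically over the network.
In the end, only the following packages should remain: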
root@XIMCloudNAS[~]# dpkg -l | grep nvidia
ii  libnvidia-container-tools      1.13.4-1  amd64  NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64     1.13.4-1  amd64  NVIDIA container runtime library
ii  nvidia-container-runtime       3.13.0-1  all    NVIDIA container runtime
ii  nvidia-container-toolkit       1.13.4-1  amd64  NVIDIA Container toolkit
ii  nvidia-container-toolkit-base  1.13.4-1  amd64  NVIDIA Container Toolkit Base
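The cleanup can also be scripted. This is a minimal sketch, not the author's procedure: the function name nvidia_pkgs_to_purge is hypothetical, and the keep-list is assumed to be exactly the five container packages shown above. It matches on the package name only, so it is slightly narrower than grep nvidia.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the installed
# packages whose name contains "nvidia" and that are NOT on the keep-list.
nvidia_pkgs_to_purge() {
    awk '
        BEGIN {
            # Assumed keep-list: the five container packages listed in this post.
            keep["libnvidia-container-tools"] = 1
            keep["libnvidia-container1"] = 1
            keep["nvidia-container-runtime"] = 1
            keep["nvidia-container-toolkit"] = 1
            keep["nvidia-container-toolkit-base"] = 1
        }
        $1 == "ii" && $2 ~ /nvidia/ {
            name = $2
            sub(/:.*/, "", name)   # strip an architecture suffix such as ":amd64"
            if (!(name in keep)) print $2
        }
    '
}

# Review the list first, then purge. Both lines are left commented out so that
# nothing is removed by accident:
#   dpkg -l | nvidia_pkgs_to_purge
#   dpkg -l | nvidia_pkgs_to_purge | xargs -r dpkg --purge
```

Printing the list before piping it into dpkg --purge leaves a chance to check that nothing unexpected is about to be removed.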
Configuring an apt proxy
Configure an apt proxy to speed up downloads.
vim /etc/apt/apt.conf.d/proxy.conf
Acquire::http::Proxy "http://192.168.5.1:1088/";
Acquire::https::Proxy "http://192.168.5.1:1088/";
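The same file can be written without an editor. This is a small sketch under the assumption that one proxy serves both HTTP and HTTPS, as in the configuration above; the function name apt_proxy_conf is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an apt proxy configuration for the given proxy URL.
apt_proxy_conf() {
    local proxy_url="$1"
    printf 'Acquire::http::Proxy "%s";\n' "$proxy_url"
    printf 'Acquire::https::Proxy "%s";\n' "$proxy_url"
}

# Usage (as root), with the proxy address used in this post:
#   apt_proxy_conf "http://192.168.5.1:1088/" > /etc/apt/apt.conf.d/proxy.conf
```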
Before TrueNAS-SCALE-24.*, make the apt tools executable:
chmod +x /usr/bin/apt /usr/bin/apt-get /usr/bin/apt-key
On TrueNAS-SCALE-24.10.* and later, use:
install-dev-tools
Installing the vGPU driver
Download: NVIDIA vGPU driver
./NVIDIA-Linux-x86_64-535.161.07-grid.run --tmpdir /tmp
After the installation finishes, run nvidia-smi to confirm that the driver is working.
Configuring the gridd service
cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
vim /etc/nvidia/gridd.conf
# /etc/nvidia/gridd.conf.template - Configuration file for vGPU Licensing Daemon

# This is a template for the configuration file for vGPU Licensing Daemon.
# For details on the file format, please refer to the nvidia-gridd(1)
# man page.

# Description: Set License Server Address
# Data type: string
# Format: "<address>"
ServerAddress=nvidia-dls address

# Description: Set License Server port number
# Data type: integer
# Format: <port>, default is 7070
ServerPort=443
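The two settings can also be applied without opening an editor. This is a sketch only: set_conf_key is a hypothetical helper, it relies on GNU sed (sed -i), and it assumes the value contains neither "|" nor "&", which would confuse the substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets KEY=VALUE in a gridd.conf-style file. An existing
# uncommented "KEY=" line is replaced; otherwise the setting is appended.
set_conf_key() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root); replace the placeholder with your own license server address:
#   set_conf_key /etc/nvidia/gridd.conf ServerAddress "<address of your DLS>"
#   set_conf_key /etc/nvidia/gridd.conf ServerPort 443
```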
Reboot the host and check the license with nvidia-smi -q; if the output shows an expiry date, licensing is working.
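The license check can be turned into a quick pass/fail test. This sketch assumes that nvidia-smi -q reports licensing on a line containing "License Status" and that a licensed guest shows "Expiry" on that line; the exact wording may differ between driver releases, and the function name license_has_expiry is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nvidia-smi -q` output on stdin and succeeds only
# if a "License Status" line mentions an expiry date.
# Assumption: the driver prints something like
#   License Status : Licensed (Expiry: <date>)
license_has_expiry() {
    grep -i 'License Status' | grep -qi 'Expiry'
}

# Usage:
#   nvidia-smi -q | license_has_expiry && echo "license OK" || echo "no license"
```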
Problems you may run into on TrueNAS-SCALE-24.10.* and later
Q1:
[EFAULT] Command /root/tmpj_sx5af4/NVIDIA-Linux-x86_64-550.127.05-no-compat32.run --tmpdir /root/tmpj_sx5af4 -s failed (code 1):
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.127.05...
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release. Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
TrueNAS failed to download and install the driver. Check whether the GPU is a vGPU, or check the network connection.
Q2:
TrueNAS 24.04 disables the apt command, so it has to be re-enabled first. See https://www.truenas.com/community/threads/no-apt-after-update-to-release.99579/post-808108
In DragonFish you can enable apt / toggle "developer" mode by running the command "install-dev-tools" or /usr/local/libexec/disable-rootfs-protection. This makes the boot device read-write and sets an internal flag so that we know the base install has been altered (helps for triaging bug reports).
Q3:
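Configuring a containerd registry mirror on TrueNAS 23.10
Create registries.yaml under /etc/rancher/k3s with the following content:
mirrors:
  "docker.io":
    endpoint:
      - "https://docker.nju.edu.cn/"  # mirror address; I use the Nanjing University open-source mirror
      - "https://registry-1.docker.io"
Then restart the k3s service:
systemctl restart k3s.service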
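This has been confirmed to survive a reboot. It will probably need to be reconfigured after an upgrade, but that is not yet verified.
Reference: https://www.cnblogs.com/rancherlabs/p/14324469.html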
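Since the file may have to be recreated after an upgrade, it can help to generate it from a small function. This is a sketch: k3s_registries_yaml is a hypothetical name, and it assumes a single mirror placed ahead of the official registry, as in the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a registries.yaml that tries the given mirror
# first and falls back to the official Docker Hub registry.
k3s_registries_yaml() {
    local mirror_url="$1"
    cat <<EOF
mirrors:
  "docker.io":
    endpoint:
      - "${mirror_url}"
      - "https://registry-1.docker.io"
EOF
}

# Usage (as root):
#   k3s_registries_yaml "https://docker.nju.edu.cn/" > /etc/rancher/k3s/registries.yaml
#   systemctl restart k3s.service
```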
Q3:
[EFAULT] Failed to render compose templates: base_v1_1_4.utils.TemplateException: Expected [uuid] to be set for GPU inslot [0000:02:00.0] in [nvidia_gpu_selection]
The GPU UUID has not been configured.
Run:
midclt call -job docker.update '{"nvidia": true}'
Reference: https://ixsystems.atlassian.net/browse/NAS-132086
First list the GPUs and their UUIDs:
midclt call app.gpu_choices | jq
Then, for each app that shows the error above, run the command below after making these replacements:
- Replace APP_NAME with the name you entered in the application (example: "plex").
- Replace PCI_SLOT with the PCI slot from the error (example: "0000:2d:00.0").
- Replace GPU_UUID with the UUID retrieved from the command above that matches the PCI slot.
midclt call -job app.update APP_NAME '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU_UUID"}}}}}}'
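Editing the JSON by hand for every app is error-prone, so the payload can be generated instead. This is a sketch under the assumption that only one GPU is assigned per app; gpu_update_payload is a hypothetical name and the UUID in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the app.update payload shown above for one GPU,
# given its PCI slot and UUID.
gpu_update_payload() {
    local pci_slot="$1" gpu_uuid="$2"
    printf '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"%s": {"use_gpu": true, "uuid": "%s"}}}}}}\n' \
        "$pci_slot" "$gpu_uuid"
}

# Usage, with an example app name and a placeholder UUID:
#   midclt call -job app.update plex "$(gpu_update_payload 0000:2d:00.0 GPU_UUID)"
```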
Q4:
root@XIMCloudNAS[...loudMassStorage/XIMCloudSharedStorage]# install-dev-tools
+ FORCE_ARG=
+ [[ '' == \-\-\f\o\r\c\e ]]
+ [[ ! -S /var/run/middleware/middlewared.sock ]]
+ PACKAGES=(make open-iscsi python3-cryptography python3-pip python3-pyfakefs python3-pyotp python3-pytest python3-pytest-asyncio python3-pytest-dependency python3-pytest-rerunfailures python3-pytest-timeout snmp sshpass zstd)
+ PIP_PACKAGES=()
+ '[' -f /usr/local/libexec/disable-rootfs-protection ']'
+ /usr/local/libexec/disable-rootfs-protection
/usr is currently provided by a readonly systemd system extension.
This may occur if nvidia module support is enabled.
System extensions must be disabled prior to disabling rootfs protection.
Solution
$ sudo systemd-sysext unmerge
$ sudo install-dev-tools
$ sudo systemd-sysext merge
https://forums.truenas.com/t/is-install-dev-tools-broken-in-24-10-2/28673/3
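The three commands above can be wrapped so that the system extensions are merged again even when install-dev-tools fails. This is a sketch, not part of the forum answer: the function names and the DRY_RUN switch are hypothetical, and the real run must happen as root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN=1 the command is only printed, which makes
# it possible to review the sequence before running it for real.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

install_dev_tools_with_sysext() {
    # Stop here if the extensions cannot be unmerged; install-dev-tools would
    # fail with the read-only /usr error anyway.
    run systemd-sysext unmerge || return 1
    run install-dev-tools
    local rc=$?
    # Design choice of this sketch: merge again even if install-dev-tools
    # failed, so the NVIDIA extension is not left disabled.
    run systemd-sysext merge || return 1
    return "$rc"
}

# Usage:
#   DRY_RUN=1 install_dev_tools_with_sysext   # print the commands only
#   install_dev_tools_with_sysext             # run them (as root)
```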
10 comments
usami mizugi (author) wrote:
November 9, 2024
{
  "data-root": "/mnt/.ix-apps/docker",
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "iptables": true,
  "storage-driver": "overlay2",
  "default-address-pools": [{
    "base": "172.17.0.0/12",
    "size": 24
  }],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "registry-mirrors": [
    "https://docker.1panel.live"
  ],
  "default-runtime": "nvidia"
}
usami mizugi (author) wrote:
November 13, 2024
https://update.sys.truenas.net/scale/TrueNAS-SCALE-ElectricEel-Nightlies/TrueNAS-SCALE-24.10.1-MASTER-20241111-040152.update?download=1
usami mizugi (author) wrote:
February 7, 2025
{
  "live-restore": true,
  "proxies": {
    "http-proxy": "http://192.168.5.1:1088",
    "no-proxy": "localhost,127.0.0.0/8",
    "https-proxy": "http://192.168.5.1:1088"
  }
}