cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
vim /etc/nvidia/gridd.conf
# /etc/nvidia/gridd.conf.template - Configuration file for vGPU Licensing Daemon
# This is a template for the configuration file for vGPU Licensing Daemon.
# For details on the file format, please refer to the nvidia-gridd(1)
# man page.
# Description: Set License Server Address
# Data type: string
# Format: "<address>"
ServerAddress=nvidia-dls address
# Description: Set License Server port number
# Data type: integer
# Format: <port>, default is 7070
ServerPort=443
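After editing, it can help to read the two values back and confirm they are what you expect. The sketch below is a hypothetical helper (gridd_conf_get is not part of the NVIDIA tooling) and assumes the plain KEY=value layout shown in the template above.

```shell
# Hypothetical helper: print the value of KEY from a gridd.conf-style file.
# Comment lines (starting with #) are skipped; the last assignment wins.
gridd_conf_get() {
    key=$1
    file=$2
    awk -F= -v k="$key" '
        /^[[:space:]]*#/ { next }
        $1 == k { sub(/^[^=]*=/, ""); val = $0 }
        END { if (val != "") print val }
    ' "$file"
}

# Example usage (path from the steps above):
#   gridd_conf_get ServerAddress /etc/nvidia/gridd.conf
#   gridd_conf_get ServerPort /etc/nvidia/gridd.conf
```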
[EFAULT] Failed to render compose templates: base_v1_1_4.utils.TemplateException: Expected [uuid] to be set for GPU in slot [0000:02:00.0] in [nvidia_gpu_selection]
The GPU UUID is not configured.
Run the following commands:
midclt call -job docker.update '{"nvidia": true}'
https://ixsystems.atlassian.net/browse/NAS-132086
midclt call app.gpu_choices | jq
'{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU_UUID"}}}}}}'
For each app that hits the error above, run the command below after making these substitutions:
- Replace APP_NAME with the name you entered for the application (example: “plex”).
- Replace PCI_SLOT with the PCI slot from the error (example: “0000:2d:00.0”).
- Replace GPU_UUID with the UUID retrieved by the command above that matches the PCI slot.
midclt call -job app.update APP_NAME '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU_UUID"}}}}}}'
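Hand-editing the nested JSON is easy to get wrong, so the payload can be generated instead. This is only a sketch: gpu_payload is a hypothetical helper, and it simply reproduces the payload shown above with the two placeholders filled in.

```shell
# Hypothetical helper: build the app.update payload for a single GPU.
# $1 = PCI slot (e.g. 0000:2d:00.0), $2 = GPU UUID
gpu_payload() {
    printf '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"%s": {"use_gpu": true, "uuid": "%s"}}}}}}\n' "$1" "$2"
}

# Example usage (APP_NAME "plex" and the slot are the examples from the list above):
#   midclt call -job app.update plex "$(gpu_payload 0000:2d:00.0 GPU_UUID)"
```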
Q4:
root@XIMCloudNAS[...loudMassStorage/XIMCloudSharedStorage]# install-dev-tools
+ FORCE_ARG=
+ [[ '' == \-\-\f\o\r\c\e ]]
+ [[ ! -S /var/run/middleware/middlewared.sock ]]
+ PACKAGES=(make open-iscsi python3-cryptography python3-pip python3-pyfakefs python3-pyotp python3-pytest python3-pytest-asyncio python3-pytest-dependency python3-pytest-rerunfailures python3-pytest-timeout snmp sshpass zstd)
+ PIP_PACKAGES=()
+ '[' -f /usr/local/libexec/disable-rootfs-protection ']'
+ /usr/local/libexec/disable-rootfs-protection
/usr is currently provided by a readonly systemd system extension. This may occur if nvidia module support is enabled. System extensions must be disabled prior to disabling rootfs protection.
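TrueNAS: updating/installing the NVIDIA GPU driver
This article covers removing the GPU driver on TrueNAS, installing the vGPU driver, and installing a specific version of the Linux driver.
Removing the driver
Use dpkg -l | grep nvidia to list the NVIDIA packages that need to be removed, then remove them with dpkg --purge.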
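The list-then-purge step can be sketched as a small filter. nvidia_pkgs is a hypothetical helper name; it only considers packages in the installed ("ii") state, and the list should be reviewed before anything is purged, since some packages may need to stay.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# installed (status "ii") packages whose name contains "nvidia".
nvidia_pkgs() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Example usage: review the list first, then purge what should go.
#   dpkg -l | nvidia_pkgs
#   dpkg -l | nvidia_pkgs | xargs -r dpkg --purge
```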
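Update:
From TrueNAS-SCALE-24.10.* onward, TrueNAS no longer bundles the NVIDIA driver; it is downloaded and installed automatically over the network.
In the end, keep only the following packages: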
Configuring an apt proxy
Configure an apt proxy to speed up downloads.
On releases before TrueNAS-SCALE-24.*, open:
On TrueNAS-SCALE-24.10.* and later, use:
Installing the vGPU driver
Download: Nvidia vGPU driver
After installation, confirm with the nvidia-smi command.
Configuring the gridd service
Reboot the host and check the license with nvidia-smi -q; an expiry time in the output means the license is in place.
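The license check above can be narrowed to the one relevant line. This is a sketch under an assumption: that the nvidia-smi -q output labels the line "License Status" and includes the word "Expiry" when licensed. license_expiry_line is a hypothetical helper name; adjust the patterns if your driver prints something different.

```shell
# Hypothetical helper: read `nvidia-smi -q` output on stdin and print the
# license line if it mentions an expiry. Exit status is non-zero otherwise.
# Assumption: the line is labelled "License Status" and contains "Expiry".
license_expiry_line() {
    grep -i 'License Status' | grep -i 'Expiry'
}

# Example usage:
#   nvidia-smi -q | license_expiry_line || echo "no license expiry found"
```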
Problems you may run into on TrueNAS-SCALE-24.10.* and later
Q1:
[EFAULT] Command /root/tmpj_sx5af4/NVIDIA-Linux-x86_64-550.127.05-no-compat32.run --tmpdir /root/tmpj_sx5af4 -s failed (code 1): Verifying archive integrity... OK Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.127.05...
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release. Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
TrueNAS failed to download the driver. Check whether the GPU is a vGPU, or check the network.
Q2:
TrueNAS 24.04 disables the apt command, so it has to be turned back on first; see https://www.truenas.com/community/threads/no-apt-after-update-to-release.99579/post-808108
In DragonFish you can enable apt / toggle "developer" mode by running the command "install-dev-tools" or /usr/local/libexec/disable-rootfs-protection. This makes the boot device read-write and sets an internal flag so that we know the base install has been altered (helps for triaging bug reports).
Q3:
Configuring a containerd registry mirror on TrueNAS 23.10
Create registries.yaml under /etc/rancher/k3s with the following content:
mirrors:
  "docker.io":
    endpoint:
      - "https://docker.nju.edu.cn/"  # mirror address; I use the Nanjing University open-source mirror
      - "https://registry-1.docker.io"
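Then restart the k3s service: systemctl restart k3s.service
Confirmed: the setting survives a reboot.
It will probably need to be reconfigured after an upgrade; not yet verified.
Reference: https://www.cnblogs.com/rancherlabs/p/14324469.html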
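The mirror setup above can be scripted so the indentation is always right. A minimal sketch, assuming the same file layout as shown; write_registries is a hypothetical helper name, and the upstream registry is kept as the second endpoint as in the original file.

```shell
# Hypothetical helper: write a registries.yaml with one docker.io mirror,
# followed by the upstream registry as a fallback.
# $1 = target directory, $2 = mirror URL
write_registries() {
    mkdir -p "$1" || return 1
    cat > "$1/registries.yaml" <<EOF
mirrors:
  "docker.io":
    endpoint:
      - "$2"
      - "https://registry-1.docker.io"
EOF
}

# Example usage (path and mirror from the steps above):
#   write_registries /etc/rancher/k3s https://docker.nju.edu.cn/
#   systemctl restart k3s.service
```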
Q3:
[EFAULT] Failed to render compose templates: base_v1_1_4.utils.TemplateException: Expected [uuid] to be set for GPU in slot [0000:02:00.0] in [nvidia_gpu_selection]
The GPU UUID is not configured.
Run the command
Q4:
Solution
$ sudo systemd-sysext unmerge
$ sudo install-dev-tools
$ sudo systemd-sysext merge
https://forums.truenas.com/t/is-install-dev-tools-broken-in-24-10-2/28673/3
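If install-dev-tools fails partway, the system extensions stay unmerged until someone remembers to merge them again. The sketch below wraps the three steps so the merge always runs. with_sysext_unmerged and the SYSEXT override are hypothetical names introduced here for illustration; run it as root.

```shell
# Hypothetical wrapper: unmerge system extensions, run a command, then merge
# them again even if the command fails. SYSEXT can be overridden for a dry run.
SYSEXT=${SYSEXT:-systemd-sysext}

with_sysext_unmerged() {
    "$SYSEXT" unmerge || return 1
    status=0
    "$@" || status=$?
    "$SYSEXT" merge || return 1
    return "$status"
}

# Example usage (as root):
#   with_sysext_unmerged install-dev-tools
```

Setting SYSEXT=echo prints the unmerge/merge steps instead of running them, which is a cheap way to check the order before touching the system.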