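This article covers three tasks on TrueNAS: removing the bundled GPU driver, installing the vGPU driver, and installing a specific version of the Linux driver.
Removing the driver
Use dpkg -l | grep nvidia to list the NVIDIA packages that need to be removed, then remove them with dpkg --purge.
Update:
From TrueNAS-SCALE-24.10.* onward, TrueNAS no longer bundles the NVIDIA driver; it is downloaded and installed automatically over the network instead.
In the end, only the following packages should remain: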

    root@XIMCloudNAS[~]# dpkg -l | grep nvidia
    ii  libnvidia-container-tools      1.13.4-1  amd64  NVIDIA container runtime library (command-line tools)
    ii  libnvidia-container1:amd64     1.13.4-1  amd64  NVIDIA container runtime library
    ii  nvidia-container-runtime       3.13.0-1  all    NVIDIA container runtime
    ii  nvidia-container-toolkit       1.13.4-1  amd64  NVIDIA Container toolkit
    ii  nvidia-container-toolkit-base  1.13.4-1  amd64  NVIDIA Container Toolkit Base

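
The purge step can be scripted so that the container runtime packages listed above are never touched. The following is a minimal sketch, not a tested procedure: the function name nvidia_packages_to_purge is hypothetical, and it assumes that every installed package whose name contains "nvidia" and does not start with libnvidia-container or nvidia-container should be removed.

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the names of
# installed ("ii") packages containing "nvidia", excluding the container
# runtime packages that this article keeps.
nvidia_packages_to_purge() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }' \
        | grep -v -e '^libnvidia-container' -e '^nvidia-container' \
        || true   # an empty result is not an error
}

# Usage on a real system (review the list before purging anything):
#   dpkg -l | nvidia_packages_to_purge
#   dpkg -l | nvidia_packages_to_purge | xargs -r dpkg --purge
```

Printing the list first and piping it to dpkg --purge only after review keeps the destructive step explicit.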
Configuring an apt proxy
Configure an apt proxy to speed up downloads. Open the proxy configuration file:

    vim /etc/apt/apt.conf.d/proxy.conf

and add the following lines:

    Acquire::http::Proxy "http://192.168.5.1:1088/";
    Acquire::https::Proxy "http://192.168.5.1:1088/";

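
If you prefer not to edit the file interactively, the same two lines can be written by a small function. This is a sketch under assumptions: write_apt_proxy is a hypothetical name, and the function overwrites the target file rather than merging with existing content.

```shell
#!/bin/sh
# Hypothetical helper: write_apt_proxy FILE PROXY_URL
# Overwrites FILE with http and https proxy settings for apt.
write_apt_proxy() {
    conf_file=$1
    proxy_url=$2
    cat > "$conf_file" <<EOF
Acquire::http::Proxy "$proxy_url";
Acquire::https::Proxy "$proxy_url";
EOF
}

# Usage with the values from this article:
#   write_apt_proxy /etc/apt/apt.conf.d/proxy.conf "http://192.168.5.1:1088/"
```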
On releases before TrueNAS-SCALE-24.*, enable apt by making the binaries executable:

    chmod +x /usr/bin/apt /usr/bin/apt-get /usr/bin/apt-key

On TrueNAS-SCALE-24.10.* and later, run:

    install-dev-tools

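
The version split above can be expressed as a small lookup. This is only a sketch: apt_unlock_hint is a hypothetical name, it expects a purely numeric version string such as 23.10.2, and it assumes (based on this article's notes for 24.04 and 24.10) that every 24.x release uses install-dev-tools. It prints the command to run and does not execute it.

```shell
#!/bin/sh
# Hypothetical helper: apt_unlock_hint VERSION
# Prints the command this article recommends for enabling apt on VERSION.
apt_unlock_hint() {
    version=$1                 # e.g. "23.10.2" or "24.10.0"
    major=${version%%.*}       # text before the first dot
    case $major in
        ''|*[!0-9]*)
            echo "unknown version: $version" >&2
            return 1
            ;;
    esac
    if [ "$major" -lt 24 ]; then
        echo 'chmod +x /usr/bin/apt /usr/bin/apt-get /usr/bin/apt-key'
    else
        echo 'install-dev-tools'
    fi
}

# Usage:
#   apt_unlock_hint 24.10.0
```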
Installing the vGPU driver
Download: NVIDIA vGPU driver

    ./NVIDIA-Linux-x86_64-535.161.07-grid.run --tmpdir /tmp

When the installer finishes, run nvidia-smi to confirm that the driver is loaded.
Configuring the gridd service
Copy the template and open the new file for editing:

    cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
    vim /etc/nvidia/gridd.conf

Set ServerAddress to the address of your NVIDIA DLS instance (shown as a placeholder below) and ServerPort to 443:

    # /etc/nvidia/gridd.conf.template - Configuration file for vGPU Licensing Daemon

    # This is a template for the configuration file for vGPU Licensing Daemon.
    # For details on the file format, please refer to the nvidia-gridd(1)
    # man page.

    # Description: Set License Server Address
    # Data type: string
    # Format: "<address>"
    ServerAddress=nvidia-dls address

    # Description: Set License Server port number
    # Data type: integer
    # Format: <port>, default is 7070
    ServerPort=443

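
The two settings can also be applied without opening an editor. The sketch below is an assumption-laden illustration, not NVIDIA's documented procedure: set_gridd_option is a hypothetical name, dls.example.internal stands in for your real DLS address, and the function simply drops any existing uncommented KEY= line and appends the new value at the end of the file.

```shell
#!/bin/sh
# Hypothetical helper: set_gridd_option FILE KEY VALUE
# Removes any active "KEY=" line from FILE and appends "KEY=VALUE".
# Comment lines (starting with #) are left untouched.
set_gridd_option() {
    conf_file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp) || return 1
    # grep -v exits non-zero when it prints nothing, which is fine here.
    grep -v "^${key}=" "$conf_file" > "$tmp_file"
    printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
    # Write back in place so the original file's ownership and mode are kept.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}

# Usage (dls.example.internal is a placeholder address):
#   set_gridd_option /etc/nvidia/gridd.conf ServerAddress dls.example.internal
#   set_gridd_option /etc/nvidia/gridd.conf ServerPort 443
```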
Reboot the host, then run nvidia-smi -q to check the license. If the output shows an expiry time, licensing is working.
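
The expiry check can be automated with a simple text filter. This is a sketch that assumes the nvidia-smi -q output contains a "License Status" line that includes the word "Expiry" once the license is acquired; the exact wording can differ between driver releases, so verify it against your own output. The name license_has_expiry is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `nvidia-smi -q` output on stdin and succeeds
# only if a "License Status" line mentions an expiry time.
license_has_expiry() {
    grep -i 'License Status' | grep -qi 'Expiry'
}

# Usage:
#   nvidia-smi -q | license_has_expiry && echo "license acquired"
```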
Troubleshooting on TrueNAS-SCALE-24.10.* and later
Q1:

    [EFAULT] Command /root/tmpj_sx5af4/NVIDIA-Linux-x86_64-550.127.05-no-compat32.run --tmpdir /root/tmpj_sx5af4 -s failed (code 1):
    Verifying archive integrity... OK
    Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.127.05..........
    ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.
    Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
    ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

TrueNAS failed to download and install the driver. Check whether the GPU is a vGPU, or check the network connection.
Q2:
TrueNAS 24.04 disables the apt command, so it has to be re-enabled before use. See https://www.truenas.com/community/threads/no-apt-after-update-to-release.99579/post-808108, which states:
In DragonFish you can enable apt / toggle "developer" mode by running the command "install-dev-tools" or /usr/local/libexec/disable-rootfs-protection. This makes the boot device read-write and sets an internal flag so that we know the base install has been altered (helps for triaging bug reports).
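Q3:
Configuring a containerd registry mirror on TrueNAS 23.10
Create registries.yaml under /etc/rancher/k3s with the following content:

    mirrors:
      "docker.io":
        endpoint:
          - "https://docker.nju.edu.cn/"   # mirror address; I use the Nanjing University open-source mirror
          - "https://registry-1.docker.io"

Then restart the k3s service with systemctl restart k3s.service.
This setting has been confirmed to survive a reboot. It will probably need to be reapplied after an upgrade, but that is not yet verified.
Reference: https://www.cnblogs.com/rancherlabs/p/14324469.html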
Q4:

    [EFAULT] Failed to render compose templates: base_v1_1_4.utils.TemplateException: Expected [uuid] to be set for GPU inslot [0000:02:00.0] in [nvidia_gpu_selection]

The GPU UUID has not been configured. Run:

    midclt call -job docker.update '{"nvidia": true}'

See https://ixsystems.atlassian.net/browse/NAS-132086 for background. List the available GPUs and their UUIDs:

    midclt call app.gpu_choices | jq

The update payload has this shape:

    '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU_UUID"}}}}}}'

For each app that reports the error above, make these substitutions in the command below before running it:
- Replace APP_NAME with the name you entered for the application (example: “plex”)
- Replace PCI_SLOT with the PCI slot from the error (example: “0000:2d:00.0”)
- Replace GPU_UUID with the UUID returned by the command above that matches the PCI slot.

    midclt call -job app.update APP_NAME '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU_UUID"}}}}}}'

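
Substituting the placeholders by hand is error-prone when several apps are affected, so the payload can be generated instead. The sketch below only builds the JSON string shown in this article; gpu_selection_payload is a hypothetical name, and it assumes the PCI slot and UUID contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: gpu_selection_payload PCI_SLOT GPU_UUID
# Prints the app.update payload with the two placeholders filled in.
gpu_selection_payload() {
    pci_slot=$1
    gpu_uuid=$2
    printf '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"%s": {"use_gpu": true, "uuid": "%s"}}}}}}\n' \
        "$pci_slot" "$gpu_uuid"
}

# Usage (APP_NAME and GPU_UUID are the placeholders described above):
#   midclt call -job app.update APP_NAME "$(gpu_selection_payload 0000:2d:00.0 GPU_UUID)"
```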