Nebula GPU for Instances
The Nebula system can leverage Nvidia GPU cards, sharing them with Instances-cn machines or using Nvidia vGPU management software to partition the GPU into multiple vGPU mdev devices, which can then be utilized by Instances-xvm machines.
Officially, Nvidia GPU RTX/T/V Series devices and the Nvidia vGPU Linux_KVM Driver 17+ are supported. However, there are reports that older devices and driver versions may also work, provided you can install and configure mdev device slicing on the Linux distribution of a Sky Node. For more details, please refer to the Nvidia Documentation.
- Before enabling GPU support for instances, ensure that you have installed the correct Nvidia driver for your GPU model and Linux distribution. Refer to Nvidia's installation guide to find the driver version that matches your GPU (a quick pre-install check is sketched after this list).
- Do not use the PASSTHROUGH method to set up your GPU, as Nebula does not support passthrough devices.
- Install the Nvidia drivers and complete this setup before deploying Nebula.
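Before installing anything, it can help to confirm that the Sky Node actually sees the GPU on the PCI bus. This quick check is a suggestion rather than part of the original procedure; the bus address it prints (84:00.0 in the examples below) is reused in later steps.
lspci -nn | grep -i nvidia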
GPU for Instances-cn machines
Containers can use the Nvidia GPU device installed on a Sky Node once the additional Nvidia container toolkit packages have been installed on that node. Run the commands for your Sky Node's Linux distribution; a quick verification step follows the list.
- Debian Linux
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
- RHEL-based distributions
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
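After the packages are installed, a quick way to confirm the toolkit can talk to the driver is the check below. This is a suggested verification, not part of the original procedure, and assumes the nvidia-container-cli binary was pulled in together with nvidia-container-toolkit:
nvidia-container-cli info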
GPU for Instances-xvm machines
Using Nvidia vGPU management software to partition the GPU into multiple vGPU mdev devices for use by Instances-xvm machines involves a more complex setup and requires basic Linux skills. After installing the Nvidia KVM driver on the Sky Node, follow these steps:
- Verify loaded kernel modules:
lsmod | grep nvidia
nvidia_vgpu_vfio       49152  9
nvidia              14393344  229 nvidia_vgpu_vfio
mdev                   20480  2 vfio_mdev,nvidia_vgpu_vfio
vfio                   32768  6 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
- Print the GPU device status with the nvidia-smi command. The output should be similar to the following:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63       Driver Version: 470.63       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          Off  | 00000000:84:00.0 Off |                    0 |
|  0%   46C    P0    39W / 300W |      0MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
- Verify whether the driver has created the mdev_supported_types directory, for example:
ls /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/
nvidia-105  nvidia-106  nvidia-107  nvidia-108  nvidia-109  nvidia-110  [...]
Each nvidia-* folder represents a supported mdev device type and contains information about its function, profile, and other relevant details. For our purpose, the Q1, Q2, Q4, ... profiles are the important ones, since they determine how the GPU card can be sliced into vGPU mdev devices.
For example, if you have an Nvidia RTX 6000 card with 24GB of memory and use the Q1 profile, you can create 24 vGPU devices for 24 Instances-xvm machines. With the Q2 profile, you would create 12 vGPU devices; with the Q4 profile, 6 vGPU devices; and so on.
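If you want to see which profile each nvidia-* type corresponds to and how many instances it can still create, the standard mdev sysfs attributes can be read directly. This is a suggested inspection step, not part of the original procedure, and it assumes the same 0000:84:00.0 bus address used in these examples:
for t in /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/*; do
    echo "$(basename "$t"): $(cat "$t/name") (available_instances: $(cat "$t/available_instances"))"
done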
- Once you have chosen your profile, you can configure the "/nvidia/vgpu.sh" script to partition the Nvidia card and enable the Nexus system to detect the vGPU devices. To do this, create a file named "/nvidia/vgpu.sh" and make it executable:
mkdir /nvidia
touch /nvidia/vgpu.sh
chmod +x /nvidia/vgpu.sh
- Now, use the vi or nano editor to add the following contents. In this example, we are using the Q2 profile to partition the RTX 6000 into 12 mdev devices. Please note the ###Q2 tag.
#!/bin/bash
###Q2
echo "abcdefab-abcd-abcd-abcd-abcdefabcd01" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd02" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd03" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd04" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd05" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd06" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd07" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd08" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd09" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd10" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd11" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
echo "abcdefab-abcd-abcd-abcd-abcdefabcd12" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
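Since only the UUID suffix changes between lines, the same partitioning can also be written as a loop. This is a sketch of an equivalent alternative, assuming the same nvidia-376 type and example UUID scheme as above:
#!/bin/bash
###Q2
# Create 12 vGPU mdev devices with sequential example UUIDs (01..12)
for i in $(seq -w 1 12); do
    echo "abcdefab-abcd-abcd-abcd-abcdefabcd${i}" > /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/nvidia-376/create
done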
- Edit the root user's crontab to add the following entry:
@reboot /nvidia/vgpu.sh;
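If you prefer not to edit the crontab interactively, the entry can also be appended from the shell. A minimal sketch, assuming it is run as root and that root's existing crontab should be preserved:
(crontab -l 2>/dev/null; echo "@reboot /nvidia/vgpu.sh") | crontab -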
- Finally, reboot and deploy Nebula. When provisioning Instances-xvm machines, you'll have the option to select one of the available vGPU devices for your instance.
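As an optional check after the reboot, not part of the original procedure, you can confirm that the mdev devices were created before provisioning any machines; with the Q2 script above, you should see the twelve example UUIDs listed:
ls /sys/bus/mdev/devices/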