The configuration differs depending on two things:
- AMD or Intel based systems.
- ConnectX-3 or ConnectX-6 Mellanox cards.
This example uses:
- Proxmox VE 6.4 (iso file proxmox-ve_6.4-1.iso)
- Mellanox driver v.4.9 (file MLNX_OFED_LINUX-4.9-3.1.5.0-debian10.0-x86_64.iso)
NOTE: I'm not using the currently available Mellanox driver v.5.3 because support for ConnectX-3 was deprecated in it.
Install Proxmox VE 6.4 on the server.
In the BIOS, enable SR-IOV. This option can have different names: AMD-Vi, VT-d, IOMMU or SR-IOV.
Disable Commercial Repo
sed -i "s/^deb/\#deb/" /etc/apt/sources.list.d/pve-enterprise.list
Add PVE Community Repo and upgrade
echo "deb http://download.proxmox.com/debian/pve $(grep "VERSION=" /etc/os-release | sed -n 's/.*(\(.*\)).*/\1/p') pve-no-subscription" > /etc/apt/sources.list.d/pve-no-enterprise.list
apt update
apt upgrade
reboot
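The repository line above takes the Debian codename from /etc/os-release. As a rough sketch of that extraction on its own, assuming the file contains a line of the form VERSION="10 (buster)" (the function name debian_codename is made up for this illustration):

```shell
#!/bin/bash
# Print the codename found between parentheses on the VERSION= line.
# $1: path to an os-release style file (defaults to /etc/os-release).
debian_codename () {
    local file="${1:-/etc/os-release}"
    grep "VERSION=" "$file" | sed -n 's/.*(\(.*\)).*/\1/p'
}
```

Calling debian_codename without arguments reads the system file, which makes it easy to check the value before it is written into the sources list.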
Enable SR-IOV on Linux kernel
Modify file /etc/default/grub
for INTEL
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
for AMD
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream"
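If you are not sure which of the two lines applies, the CPU vendor can be read from /proc/cpuinfo. A small sketch that maps the vendor string to the parameters used in this guide (the function name iommu_params_for_vendor is an assumption for illustration, and only the two vendors covered here are handled):

```shell
#!/bin/bash
# Map a CPU vendor_id string (as shown in /proc/cpuinfo) to the kernel
# parameters used in this guide. Unknown vendors print nothing and return 1.
iommu_params_for_vendor () {
    case "$1" in
        GenuineIntel) echo "intel_iommu=on iommu=pt" ;;
        AuthenticAMD) echo "amd_iommu=on iommu=pt pcie_acs_override=downstream" ;;
        *) return 1 ;;
    esac
}

# Example use on a live system (commented out so the block has no side effects):
# vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo)
# iommu_params_for_vendor "$vendor"
```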
Update GRUB and reboot
update-grub
reboot
Verify dmesg output:
dmesg | grep IOMMU
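Besides dmesg, it can help to confirm that the parameter actually reached the running kernel, since /proc/cmdline shows the command line the kernel was booted with. A minimal sketch, with a made-up function name cmdline_has_iommu:

```shell
#!/bin/bash
# Return 0 if the given kernel command line enables the Intel or AMD IOMMU.
# $1: a kernel command line string, e.g. the contents of /proc/cmdline.
cmdline_has_iommu () {
    case " $1 " in
        *" intel_iommu=on "*|*" amd_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a live system (commented out so the block has no side effects):
# cmdline_has_iommu "$(cat /proc/cmdline)" && echo "IOMMU parameter present"
```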
Install packages needed by Mellanox package:
apt install pve-headers chrpath pkg-config graphviz quilt swig libltdl-dev tk flex automake dpatch gfortran libgfortran4 bison tcl m4 autotools-dev dkms autoconf debhelper ethtool
Download Mellanox driver and mount:
wget https://content.mellanox.com/ofed/MLNX_OFED-4.9-3.1.5.0/MLNX_OFED_LINUX-4.9-3.1.5.0-debian10.0-x86_64.iso
mount -o loop,ro MLNX_OFED_LINUX-4.9-3.1.5.0-debian10.0-x86_64.iso /media/
I won't install all Mellanox packages in this example, because that would require removing the Proxmox packages and installing them again.
dpkg -i /media/DEBS/COMMON/mlnx-ofed-kernel-utils_4.9-OFED.4.9.3.1.5.1_amd64.deb
dpkg -i /media/DEBS/COMMON/mlnx-ofed-kernel-dkms_4.9-OFED.4.9.3.1.5.1_all.deb
dpkg -i /media/DEBS/COMMON/mft_4.15.1-9_amd64.deb
reboot
Update firmware settings
Enable 8 Virtual Functions for ConnectX-3 on Intel based system:
# lspci | grep -i mell
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
# mlxconfig -d 03:00.0 s SRIOV_EN=1
# mlxconfig -d 03:00.0 s NUM_OF_VFS=8
# mlxconfig -d 03:00.0 q | grep -e SRIOV_EN -e NUM_OF_VFS
SRIOV_EN True(1)
NUM_OF_VFS 8
Enable 16 Virtual Functions for ConnectX-6 on AMD based system:
# lspci | grep -i mell
81:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
# mlxconfig -d 81:00.0 s SRIOV_EN=1
# mlxconfig -d 81:00.0 s NUM_OF_VFS=16
# mlxconfig -d 81:00.0 q | grep -e SRIOV_EN -e NUM_OF_VFS
NUM_OF_VFS 16
SRIOV_EN True(1)
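The query output can also be checked from a script instead of by eye. A sketch that pulls a single value out of the text, assuming the "name value" layout shown in the excerpts above (mlxconfig_value is a hypothetical helper, not part of the Mellanox tools):

```shell
#!/bin/bash
# Read `mlxconfig ... q` output on stdin and print the value of one parameter.
# $1: parameter name, e.g. NUM_OF_VFS
mlxconfig_value () {
    awk -v key="$1" '$1 == key {print $2; exit}'
}

# Example use on a live system (commented out so the block has no side effects):
# mlxconfig -d 81:00.0 q | mlxconfig_value NUM_OF_VFS
```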
Only for ConnectX-3: change the Mellanox driver options.
vi /etc/modprobe.d/mlx4_core.conf
options mlx4_core num_vfs=8 port_type_array=1,2 probe_vf=1
Then reboot.
Only for ConnectX-6 - change mellanox driver options
echo 16 > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs
For each Virtual Function, manually create unique IDs (replace 0000:3b:00.1 with the PCI address of your own Virtual Function):
echo Follow > /sys/class/infiniband/mlx5_0/device/sriov/0/policy
echo 11:22:33:44:77:66:77:90 > /sys/class/infiniband/mlx5_0/device/sriov/0/node
echo 11:22:33:44:77:66:77:91 > /sys/class/infiniband/mlx5_0/device/sriov/0/port
echo 0000:3b:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:3b:00.1 > /sys/bus/pci/drivers/mlx5_core/bind
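The address written to unbind and bind has to be the PCI address of the Virtual Function itself. One way to look it up is through the virtfnN symlinks in the device directory. This is only a sketch: the function name vf_pci_addr and the optional third argument (an alternative sysfs root, useful for dry runs) are assumptions made for illustration.

```shell
#!/bin/bash
# Print the PCI address of a Virtual Function by resolving its virtfnN symlink.
# $1: device name (e.g. mlx5_0)
# $2: Virtual Function index (e.g. 0)
# $3: optional sysfs class directory (defaults to /sys/class/infiniband)
vf_pci_addr () {
    local dev="$1" vf="$2" root="${3:-/sys/class/infiniband}"
    local link="$root/$dev/device/virtfn$vf"
    # Fail if the Virtual Function does not exist.
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Example use on a live system (commented out so the block has no side effects):
# vf_pci_addr mlx5_0 0
```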
Or automate it and run it as the last service:
Create file /etc/systemd/system/mlnx_sriov.service
[Unit]
Description=Initialize 16 Virtual Function on mlx5_0
After=openibd.service
[Service]
Type=simple
RemainAfterExit=yes
ExecStartPre=/bin/sh -c 'echo 16 > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs'
ExecStart=/usr/local/sbin/mlnx_sriov.sh mlx5_0 16
[Install]
WantedBy=multi-user.target
Enable service:
systemctl enable mlnx_sriov
Create the script file /usr/local/sbin/mlnx_sriov.sh and make it executable:
chmod u+x /usr/local/sbin/mlnx_sriov.sh
The script (taken from an external source):
#!/bin/bash
# params
# - device name (e.g. mlx5_0)
# - number of virtual functions (e.g. 10)
configure_dev () {
local num_of_vfs="$2"
local devid=$(echo $1 | cut -d_ -f2)
local max_id="0"
local num_vfs_path="/sys/class/infiniband/$1/device/mlx5_num_vfs"
if [[ "$(cat $num_vfs_path)" -lt "$num_of_vfs" ]]; then
echo $num_of_vfs > /sys/class/infiniband/$1/device/mlx5_num_vfs
fi
let "max_id=$num_of_vfs-1"
for vf in $(seq 0 $max_id); do
echo ' ' ' ' Configuring virtual function $vf
# enable the virtual function
echo Follow > /sys/class/infiniband/$1/device/sriov/$vf/policy
# assign GUID to virtual card and port
let "first_part=$vf/100"
let "second_part=$vf-$first_part*100"
local ip_last_seg=$(hostname -i | cut -d. -f4)
let "ip_last_seg_first=$ip_last_seg/100"
let "ip_last_seg_second=$ip_last_seg-$ip_last_seg_first*100"
local guid_prefix="$(printf "%02d" $devid):22:33:$(printf "%02d" $first_part):$(printf "%02d" $second_part):$(printf "%02d" $ip_last_seg_first):$(printf "%02d" $ip_last_seg_second)"
echo "$guid_prefix:90" > /sys/class/infiniband/$1/device/sriov/$vf/node
echo "$guid_prefix:91" > /sys/class/infiniband/$1/device/sriov/$vf/port
# reload driver to make the change effective
pcie_addr="$(readlink -f /sys/class/infiniband/$1/device/virtfn${vf} | awk -F/ '{print $NF}')"
echo $pcie_addr > /sys/bus/pci/drivers/mlx5_core/unbind
echo $pcie_addr > /sys/bus/pci/drivers/mlx5_core/bind
done
}
# if specific devices are provided, only those will be configured
# otherwise, all devices supporting SR-IOV will be configured
if [[ "$#" -eq "0" ]]; then
echo Configuring SR-IOV for all supported devices
for dev in $(ls /sys/class/infiniband); do
totalvfs_path="/sys/class/infiniband/$dev/device/sriov_totalvfs"
if [[ -e "$totalvfs_path" && "$(cat $totalvfs_path)" -gt "0" ]]; then
echo ' ' Configuring for $dev $(cat $totalvfs_path)
#configure_dev $dev $(cat $totalvfs_path)
fi
done
elif ! (( $# % 2 )); then
echo Configuring SR-IOV for specified devices
while (( "$#" )); do
dev=$1
num_of_vfs=$2
echo ' ' Configuring for $dev
configure_dev $dev $num_of_vfs
shift 2
done
else
echo Please use the script in the following two ways:
echo ' ' ./mlnx.sh
echo ' ' ./mlnx.sh mlx5_0 10 mlx5_1 25
fi
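To see what GUIDs the script will generate, the arithmetic can be pulled out into a side-effect-free function. This is a sketch of the same calculation, not a replacement for the script; the name vf_guid_prefix is made up, and the last IP segment is passed in as an argument instead of being read from hostname -i.

```shell
#!/bin/bash
# Print the 7-byte GUID prefix the script builds for one Virtual Function.
# $1: device index (the number after "mlx5_")
# $2: Virtual Function index
# $3: last segment of the host IP address (0-255)
vf_guid_prefix () {
    local devid="$1" vf="$2" ip_last_seg="$3"
    # Each value is split into "hundreds" and "remainder" and printed as
    # two decimal digits, which are also valid hex digits in a GUID.
    printf '%02d:22:33:%02d:%02d:%02d:%02d\n' \
        "$devid" "$((vf / 100))" "$((vf % 100))" \
        "$((ip_last_seg / 100))" "$((ip_last_seg % 100))"
}

vf_guid_prefix 0 3 117   # prints 00:22:33:00:03:01:17
```

The script then appends :90 for the node GUID and :91 for the port GUID. Because the prefix depends on the host IP, two hosts with different last IP segments produce different GUIDs for the same Virtual Function index.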
Reboot
Final checks
Verify that the 8 or 16 Virtual Functions are available after reboot. Example output:
# lspci | grep Mellanox
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
03:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
. . .
. . .
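Counting the lines by hand gets tedious with 16 Virtual Functions. A small sketch that counts them in lspci output and compares against the expected number (check_vf_count is a hypothetical helper; it assumes every Virtual Function line contains the text "Virtual Function", as in the output above):

```shell
#!/bin/bash
# Read lspci output on stdin, count Virtual Function lines and compare
# with the expected number. Returns 0 only if the counts match.
# $1: expected number of Virtual Functions
check_vf_count () {
    local expected="$1" found
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    found=$(grep -c "Virtual Function" || true)
    echo "found $found of $expected Virtual Functions"
    [ "$found" -eq "$expected" ]
}

# Example use on a live system (commented out so the block has no side effects):
# lspci | grep Mellanox | check_vf_count 8
```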
Check that each Virtual Function got a unique IOMMU group, using the script iommu_groups.sh (taken from an external source):
#!/bin/bash
shopt -s nullglob
for d in /sys/kernel/iommu_groups/*/devices/*; do
n=${d#*/iommu_groups/*}; n=${n%%/*}
printf 'IOMMU Group %s ' "$n"
lspci -nns "${d##*/}"
done;
Output example:
# bash iommu_groups.sh | grep Mellanox
IOMMU Group 108 81:00.0 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]
IOMMU Group 131 81:00.1 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [15b3:101c]
IOMMU Group 132 81:00.2 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [15b3:101c]
. . .
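Uniqueness can also be checked mechanically. The following sketch reads the output of iommu_groups.sh and prints every group number that appears more than once (shared_iommu_groups is a hypothetical helper; it assumes each line starts with "IOMMU Group <number>", as in the output above):

```shell
#!/bin/bash
# Read iommu_groups.sh output on stdin and print the numbers of all
# IOMMU groups that contain more than one device, one per line.
shared_iommu_groups () {
    awk '$1 == "IOMMU" && $2 == "Group" {count[$3]++}
         END {for (g in count) if (count[g] > 1) print g}' | sort -n
}

# Example use on a live system (commented out so the block has no side effects):
# bash iommu_groups.sh | grep Mellanox | shared_iommu_groups
```

No output means every listed device sits in its own group. Filtering with grep Mellanox first only detects Virtual Functions that share a group with each other; running it on the full output also shows groups shared with other devices, at the cost of listing unrelated shared groups.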
Now you can add a Virtual Function as a PCI device to your Virtual Machines.