Investigate EL kernel version alignment #43

Open
kmittman opened this issue Apr 6, 2023 · 0 comments
Investigate EL kernel version alignment

NVIDIA-provided precompiled kmod RPMs officially support only RHEL kernels. Each one is built and tested on Red Hat Enterprise Linux for a specific kernel release. This blog post goes into more detail.

A frequently asked question is why, technically, kernels from other RHEL-like distros would not be compatible. The primary blocker is that, to avoid any potential ABI incompatibility, the precompiled design requires an exact match of the kernel version string.
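
The exact-match requirement can be illustrated with a minimal sketch. The function name `has_exact_kernel_match` and the file of supported kernel strings are hypothetical, introduced only for illustration; this is not how the NVIDIA packages implement the check.

```shell
# Hypothetical helper: succeed only if the kernel version string ($1)
# appears verbatim, as a whole line, in a list of kernel versions ($2)
# that kmods were built for.
has_exact_kernel_match() {
    # -F: fixed string, -x: whole-line match, -q: no output
    grep -Fxq -- "$1" "$2"
}

# Example usage (supported.txt is an assumed, hand-made file):
#   has_exact_kernel_match "$(uname -r)" supported.txt && echo "match"
```

Because the comparison is a whole-line string match, a version that differs only in its release suffix counts as a mismatch.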

Let's look at some kernel-core data for:

  • Red Hat Enterprise Linux
  • Rocky Linux
  • Oracle Linux
  • Alma Linux

Pre-requisites

Rocky Linux and Alma Linux both archive packages from previous y-stream releases, so first enable those repos.

On rockylinux:8, define old_releases=('8.6' '8.5' '8.4' '8.3')
On rockylinux:9, define old_releases=('9.0')

rockyvault="https://dl.rockylinux.org/vault/rocky"
# Write one repo file per archived y-stream release
for ver in "${old_releases[@]}"; do
    repo="$rockyvault/$ver/BaseOS/x86_64/os"
    echo -e "[Rocky-Vault-$ver]\nname=Rocky-Vault-$ver\nbaseurl=$repo/\ngpgcheck=1\nenabled=1\ngpgkey=$repo/RPM-GPG-KEY-rockyofficial" | tee "/etc/yum.repos.d/Rocky-Vault-$ver.repo"
done

On almalinux:8, define old_releases=('8.6' '8.5' '8.4' '8.3')
On almalinux:9, define old_releases=('9.0')

almavault="https://repo.almalinux.org/vault"
# This is the AlmaLinux 8 signing key; on almalinux:9, point almagpg at the AlmaLinux 9 key instead
almagpg="https://repo.almalinux.org/almalinux/RPM-GPG-KEY-AlmaLinux-8"
# Write one repo file per archived y-stream release
for ver in "${old_releases[@]}"; do
    repo="$almavault/$ver/BaseOS/x86_64/os"
    echo -e "[Alma-Vault-$ver]\nname=Alma-Vault-$ver\nbaseurl=$repo/\ngpgcheck=1\nenabled=1\ngpgkey=$almagpg" | tee "/etc/yum.repos.d/Alma-Vault-$ver.repo"
done

List kernel packages

dnf list kernel-core --showduplicates

# Filter output
dnf list kernel-core --showduplicates | awk '{print $2}' | grep "\.el" | sort -uV
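
To compare two distros, one option is to save each filtered list to a file and check which version strings they share. The following is a minimal sketch; the function names `common_kernels` and `missing_kernels` and the file names in the usage comment are hypothetical.

```shell
# Versions present in both lists (exact, whole-line matches).
# $1 = first list, $2 = second list, one version string per line.
common_kernels() {
    grep -Fxf "$2" -- "$1" | sort -uV
}

# Versions present in the first list but missing from the second.
missing_kernels() {
    grep -Fxvf "$2" -- "$1" | sort -uV
}

# Example usage (rhel.txt and rocky.txt are assumed to hold the filtered
# output from each distro):
#   common_kernels rhel.txt rocky.txt
#   missing_kernels rhel.txt rocky.txt
```

Like the filter above, this relies on GNU `sort -V` for version ordering.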

Plot EL8 kernels

RHEL8 precompiled status page

-------------A--B--C--D------------------A--B--C--D---
| 1|    8.0 [+][ ][+][ ]    |30|        [+][+][+][+]
| 2|        [+][ ][+][ ]    |31|        [+][+][+][+]
| 3|        [+][ ][+][ ]    |32|        [+][+][+][+]
| 4|        [+][ ][+][ ]    |33|        [+][+][+][+]
| 5|        [+][ ][+][ ]    |34|        [+][+][+][+]
| 6|        [+][ ][+][ ]    |35|        [+][+][+][+]
| 7|        [+][ ][+][ ]    |36|    8.5 [+][+][+][+]
| 8|        [+][ ][+][ ]    |37|        [+][+][+][+]
| 9|    8.1 [+][ ][+][ ]    |38|        [+][+][+][+]
|10|        [+][ ][+][ ]    |39|        [+][+][+][+]
|11|        [+][ ][+][ ]    |40|        [+][+][+][+]
|12|        [+][ ][+][ ]    |41|        [+][+][+][+]
|13|        [+][ ][+][ ]    |42|    8.6 [+][+][+][+]
|14|        [+][ ][+][ ]    |43|        [+][+][ ][+]
|15|    8.2 [+][ ][+][ ]    |44|        [ ][ ][+][ ]
|16|        [+][ ][+][ ]    |45|        [+][+][ ][+]
|17|        [+][ ][+][ ]    |46|        [ ][+][ ][ ]
|18|        [+][ ][+][ ]    |47|        [ ][ ][+][ ]
|19|        [+][ ][+][ ]    |48|        [+][+][ ][+]
|20|        [+][ ][+][ ]    |49|        [ ][ ][+][ ]
|21|        [+][ ][+][ ]    |50|        [+][+][ ][+]
|22|    8.3 [+][ ][+][+]    |51|        [ ][ ][+][ ]
|23|        [+][ ][+][ ]    |52|        [+][+][ ][+]
|24|        [+][ ][+][ ]    |53|        [ ][ ][+][ ]
|25|        [+][ ][+][ ]    |54|    8.7 [+][+][+][+]
|26|        [+][ ][+][+]    |55|        [+][+][+][+]
|27|        [+][+][+][+]    |56|        [+][+][+][+]
|28|    8.4 [+][ ][+][+]    |57|        [+][+][+][+]
|29|        [+][+][+][+]                    
------------------------------------------------------
[+] a kernel-core package with that version is available for the distro; [ ] none is
A. Red Hat Enterprise Linux
B. Rocky Linux
C. Oracle Linux
D. Alma Linux

Plot EL9 kernels

RHEL9 precompiled status page

-------------A--B--C--D--------------
| 1|    9.0 [+][ ][ ][+]
| 2|        [ ][ ][+][ ]
| 3|        [+][ ][ ][+]
| 4|        [ ][ ][+][ ]
| 5|        [+][ ][ ][+]
| 6|        [ ][ ][+][ ]
| 7|        [+][ ][ ][+]
| 8|        [ ][ ][+][ ]
| 9|        [+][+][ ][+]
|10|        [ ][ ][+][ ]
|11|    9.1 [+][ ][+][+]
|12|        [+][ ][+][+]
|13|        [+][ ][+][+]
|14|        [+][+][+][+]
-------------------------------------
[+] a kernel-core package with that version is available for the distro; [ ] none is
A. Red Hat Enterprise Linux
B. Rocky Linux
C. Oracle Linux
D. Alma Linux

Summary

Although some kernel versions overlap, many do not: a kernel may be missing from a distro or carry a different version string. This makes install behavior non-deterministic, because the outcome depends on when the dnf transaction runs.

Put another way, assume the kernels are aligned today and the precompiled install succeeds on machine A (RHEL-like). If a new kernel is released next week, the same install may fail on machine B (RHEL-like) because no compatible kmod package is available for that kernel.
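
This timing dependence can be sketched as a pre-flight check: before a dnf transaction, test whether the newest kernel the distro offers is covered by a kmod build. The function names and the two input files are hypothetical; each file is assumed to hold one version string per line.

```shell
# Newest version string in a list ($1), using GNU version sort.
newest_kernel() {
    sort -uV "$1" | tail -n 1
}

# Hypothetical check: succeed only if the distro's newest kernel
# (from list $1) has an exact match in the kmod list ($2).
newest_is_covered() {
    grep -Fxq -- "$(newest_kernel "$1")" "$2"
}

# Example usage (distro.txt and kmods.txt are assumed files):
#   newest_is_covered distro.txt kmods.txt || echo "no matching kmod yet"
```

The same check can pass one week and fail the next, once the distro publishes a kernel that the kmod list does not yet include.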

As such, attempting to use the precompiled modular streams provided in the CUDA repository on non-RHEL distros results in a degraded user experience and is not supported by NVIDIA.

Instead, sysadmins are encouraged to build DIY precompiled kmod RPMs using the instructions provided in this git repo. Alternatively, the DKMS modular streams may be used.

@kmittman kmittman added documentation Improvements or additions to documentation wontfix This will not be worked on labels Apr 6, 2023
@kmittman kmittman self-assigned this Apr 6, 2023