Bump based CUDA image to ubuntu24.04 #1166

Merged on Dec 4, 2024 (23 commits)
Changes from 4 commits
Commits
d602ff3
Test docker hub ubuntu24.04
DwarKapex Nov 21, 2024
7a93390
Adobt build for ubuntu-24.04
DwarKapex Nov 22, 2024
3f4efa5
Fix build for pax, t5x, gemma
DwarKapex Nov 22, 2024
b2eab65
Use master branch of TF-text
DwarKapex Nov 22, 2024
71ad68b
Fix gemma TF-text urls
DwarKapex Nov 22, 2024
0b452c4
Fix T5x build
DwarKapex Nov 25, 2024
62e7ed7
Address comments
DwarKapex Nov 26, 2024
beb4f82
Fix gemma build
DwarKapex Nov 27, 2024
3c2ec97
Clone airio
DwarKapex Nov 27, 2024
d279373
Merge remote-tracking branch 'origin/main' into vkozlov/move-to-ubunt…
DwarKapex Nov 27, 2024
173ddc5
Update maxtext docker
DwarKapex Nov 27, 2024
92996e3
Uninstall several packages and add PIP_BREAK_SYSTEM_PACKAGES=1 env var
DwarKapex Dec 2, 2024
8993deb
Uninstall several packages and add PIP_BREAK_SYSTEM_PACKAGES=1 env var
DwarKapex Dec 2, 2024
8c10287
Edit remove packages list
DwarKapex Dec 2, 2024
c75c825
Edit remove packages list
DwarKapex Dec 3, 2024
8468c9f
Edit remove packages list
DwarKapex Dec 3, 2024
008b3fc
[skip ci] Resurect amd64/arm64 dockerfiles
DwarKapex Dec 3, 2024
d633578
[skip ci] Resurect amd64/arm64 dockerfiles: fix whitespace error
DwarKapex Dec 3, 2024
81b50cc
[skip ci] Resurect amd64/arm64 dockerfiles: fix whitespace error
DwarKapex Dec 3, 2024
14c52be
Merge branch 'main' into vkozlov/move-to-ubuntu24.04
DwarKapex Dec 3, 2024
96c16a9
Add comment for pip install pip-23.3.1
DwarKapex Dec 3, 2024
8461c7a
Merge branch 'vkozlov/move-to-ubuntu24.04' of github.com:NVIDIA/JAX-T…
DwarKapex Dec 3, 2024
2c1ee0d
remove arch-specific Dockerfiles and add pointer to utopian versions
yhtang Dec 4, 2024
12 changes: 9 additions & 3 deletions .github/container/Dockerfile.base
@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1-labs
-ARG BASE_IMAGE=nvidia/cuda:12.6.2-devel-ubuntu22.04
+ARG BASE_IMAGE=nvidia/cuda:12.6.2-devel-ubuntu24.04
ARG GIT_USER_NAME="JAX Toolbox"
ARG GIT_USER_EMAIL=jax@nvidia.com
ARG CLANG_VERSION=18
@@ -53,12 +53,14 @@ apt_packages=(
liblzma-dev
python-is-python3
python3-pip
+python3-venv
rsync
vim
wget
jq
# llvm.sh
-lsb-release software-properties-common
+lsb-release
+software-properties-common
# GCP autoconfig
pciutils hwloc bind9-host
)
@@ -127,7 +129,11 @@ git apply </opt/pip/pip-vcs-equivalency.patch
git add -u
git commit -m 'Adds JAX_TOOLBOX_VCS_EQUIVALENCY as a trigger to treat all github VCS installs for a package as equivalent. The spec of the last encountered version will be used'
EOF
-RUN pip install --upgrade --no-cache-dir -e /opt/pip pip-tools && rm -rf ~/.cache/*
+# Create a system-wide venv for the pip-installed world inside the containers
+RUN python -m venv --prompt jax /opt/venv && /opt/venv/bin/pip install --ignore-installed --no-cache-dir -e /opt/pip pip-tools
+# Make sure `python` refers to the venv version
+ENV PATH=/opt/venv/bin:${PATH}


###############################################################################
## Install TCPx
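The `ENV PATH=/opt/venv/bin:${PATH}` line added to Dockerfile.base relies on PATH ordering: the first directory that contains an executable named `python` wins. The following is a minimal sketch of that lookup rule only. It uses a temporary directory and a stub script as hypothetical stand-ins for `/opt/venv/bin` and the venv interpreter, so it runs without creating a real venv.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for /opt/venv/bin: a temp directory holding a stub
# `python` executable (not a real interpreter).
venv_bin="$(mktemp -d)/bin"
mkdir -p "${venv_bin}"
printf '#!/bin/sh\necho "stub venv python"\n' > "${venv_bin}/python"
chmod +x "${venv_bin}/python"

# Same shape as `ENV PATH=/opt/venv/bin:${PATH}` in the Dockerfile:
# prepending puts the venv directory ahead of any system interpreter.
PATH="${venv_bin}:${PATH}"
hash -r  # drop any cached command lookups

resolved="$(command -v python)"
echo "python resolves to: ${resolved}"
```

Inside the built image, the equivalent check would be `command -v python`, which should print a path under `/opt/venv/bin` if the `ENV` line took effect.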
4 changes: 2 additions & 2 deletions .github/container/Dockerfile.equinox
@@ -8,7 +8,7 @@ ARG SRC_PATH_EQUINOX=/opt/equinox
## Download source and add auxiliary scripts
###############################################################################

-FROM ${BASE_IMAGE} as mealkit
+FROM ${BASE_IMAGE} AS mealkit
ARG URLREF_EQUINOX
ARG SRC_PATH_EQUINOX

@@ -22,6 +22,6 @@ EOF
## Install accumulated packages from the base image and the previous stage
###############################################################################

-FROM mealkit as final
+FROM mealkit AS final

RUN pip-finalize.sh
1 change: 0 additions & 1 deletion .github/container/Dockerfile.jax
@@ -36,7 +36,6 @@ RUN --mount=type=ssh \
--mount=type=secret,id=SSH_KNOWN_HOSTS,target=/root/.ssh/known_hosts \
<<"EOF" bash -ex
git-clone.sh ${URLREF_JAX} ${SRC_PATH_JAX}
-sed 's/^numpy.*/numpy<2.0.0/' ${SRC_PATH_JAX}/build/requirements.in
git-clone.sh ${URLREF_XLA} ${SRC_PATH_XLA}
EOF

4 changes: 2 additions & 2 deletions .github/container/Dockerfile.levanter
@@ -10,7 +10,7 @@ ARG SRC_PATH_HALIAX=/opt/haliax
## Download source and add auxiliary scripts
###############################################################################

-FROM ${BASE_IMAGE} as mealkit
+FROM ${BASE_IMAGE} AS mealkit
ARG URLREF_LEVANTER
ARG URLREF_HALIAX
ARG SRC_PATH_LEVANTER
@@ -34,6 +34,6 @@ COPY levanter-cache-warn.sh /opt/nvidia/entrypoint.d/
## Install accumulated packages from the base image and the previous stage
###############################################################################

-FROM mealkit as final
+FROM mealkit AS final

RUN pip-finalize.sh
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@

ARG BASE_IMAGE=ghcr.io/nvidia/jax-mealkit:jax
ARG URLREF_MAXTEXT=https://github.com/google/maxtext.git#main
-ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#v2.13.0
+ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master
ARG SRC_PATH_MAXTEXT=/opt/maxtext
ARG SRC_PATH_TFTEXT=/opt/tensorflow-text

@@ -20,15 +20,16 @@ FROM ${BASE_IMAGE} as wheel-builder
FROM wheel-builder as tftext-builder
ARG URLREF_TFTEXT
ARG SRC_PATH_TFTEXT

+RUN pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.18.0
+RUN git-clone.sh ${URLREF_TFTEXT} ${SRC_PATH_TFTEXT}
RUN <<"EOF" bash -exu -o pipefail
-pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.13.0
-git-clone.sh ${URLREF_TFTEXT} ${SRC_PATH_TFTEXT}
cd ${SRC_PATH_TFTEXT}

# The tftext build script queries GitHub, but these requests are sometimes
# throttled by GH, resulting in a corrupted uri for tensorflow in WORKSPACE.
# A workaround (needs to be updated when the tensorflow version changes):
-sed -i "s/# Update TF dependency to installed tensorflow/commit_sha=1cb1a030a62b169d90d34c747ab9b09f332bf905/" oss_scripts/prepare_tf_dep.sh
+sed -i "s/# Update TF dependency to installed tensorflow./commit_slug=6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/" oss_scripts/prepare_tf_dep.sh

# Newer versions of LLVM make lld's --undefined-version check of lld is strict
# by default (https://reviews.llvm.org/D135402), but the tftext build seems to
@@ -38,14 +39,13 @@ echo "write_to_bazelrc \"build --linkopt='-Wl,--undefined-version'\"" >> oss_scr
./oss_scripts/run_build.sh
EOF


###############################################################################
## Download source and add auxiliary scripts
###############################################################################

FROM ${BASE_IMAGE} as mealkit
ARG URLREF_MAXTEXT
-ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#v2.13.0
+ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master
ARG SRC_PATH_MAXTEXT
ARG SRC_PATH_TFTEXT=/opt/tensorflow-text

@@ -56,6 +56,16 @@ RUN echo "tensorflow-text @ file://$(ls /opt/tensorflow_text*.whl)" >> /opt/pip-
RUN <<"EOF" bash -ex
git-clone.sh ${URLREF_MAXTEXT} ${SRC_PATH_MAXTEXT}
echo "-r ${SRC_PATH_MAXTEXT}/requirements.txt" >> /opt/pip-tools.d/requirements-maxtext.in
+for pattern in \
+"s|@git+https://github.com/mlperf/logging.git||g" \
+"s|absl-py|absl-py==2.1.0|g" \
+"s|protobuf==3.20.3|protobuf>=3.19.0|g" \
+"s|tensorflow-datasets|tensorflow-datasets>=4.8.0|g" \
+"s|@git+https://github.com/google/pathways-utils.git||g" \
+; do
+sed -i "${pattern}" ${SRC_PATH_MAXTEXT}/requirements.txt;
+done
+echo "tensorflow-metadata>=1.15.0" >> ${SRC_PATH_MAXTEXT}/requirements.txt
EOF

###############################################################################
@@ -72,4 +82,4 @@ FROM mealkit as final

RUN pip-finalize.sh

-WORKDIR ${SRC_PATH_MAXTEXT}
+WORKDIR ${SRC_PATH_MAXTEXT}
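The maxtext hunk rewrites `requirements.txt` by looping over a list of sed substitutions. The sketch below runs the same loop shape against a small sample file so the effect of each pattern is visible. The sample file contents are hypothetical and are not the real maxtext requirements, and the loop writes through a temporary file instead of using `sed -i` purely so it also runs with non-GNU sed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample standing in for ${SRC_PATH_MAXTEXT}/requirements.txt.
req_file="$(mktemp)"
cat > "${req_file}" <<'REQ'
absl-py
protobuf==3.20.3
mlperf-logging@git+https://github.com/mlperf/logging.git
REQ

# Apply each substitution in order; three of the patterns from the diff.
for pattern in \
  "s|@git+https://github.com/mlperf/logging.git||g" \
  "s|absl-py|absl-py==2.1.0|g" \
  "s|protobuf==3.20.3|protobuf>=3.19.0|g" \
; do
  sed "${pattern}" "${req_file}" > "${req_file}.tmp"
  mv "${req_file}.tmp" "${req_file}"
done

result="$(cat "${req_file}")"
printf '%s\n' "${result}"
rm -f "${req_file}"
```

One property worth noting: a pattern such as `s|absl-py|absl-py==2.1.0|g` is not idempotent, so running the loop twice over the same file would append the pin a second time. In the Dockerfile this is harmless only because the loop runs once, on a fresh clone.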
34 changes: 0 additions & 34 deletions .github/container/Dockerfile.maxtext.amd64

This file was deleted.

4 changes: 2 additions & 2 deletions .github/container/Dockerfile.mjx
@@ -12,7 +12,7 @@ ARG SRC_PATH_L2R=/opt/language-to-reward-2023
## Download source and add auxiliary scripts
###############################################################################

-FROM ${BASE_IMAGE} as mealkit
+FROM ${BASE_IMAGE} AS mealkit
ARG URLREF_MUJOCO
ARG URLREF_MUJOCO_MPC
ARG URLREF_L2R
@@ -49,6 +49,6 @@ EOF
## Install accumulated packages from the base image and the previous stage
###############################################################################

-FROM mealkit as final
+FROM mealkit AS final

RUN pip-finalize.sh
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
ARG BASE_IMAGE=ghcr.io/nvidia/jax-mealkit:jax
ARG URLREF_PAXML=https://github.com/google/paxml.git#main
ARG URLREF_PRAXIS=https://github.com/google/praxis.git#main
-ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#v2.13.0
+ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master
ARG URLREF_LINGVO=https://github.com/tensorflow/lingvo.git#master
ARG SRC_PATH_PAXML=/opt/paxml
ARG SRC_PATH_PRAXIS=/opt/praxis
@@ -25,20 +25,20 @@ FROM wheel-builder as tftext-builder
ARG URLREF_TFTEXT
ARG SRC_PATH_TFTEXT
RUN <<"EOF" bash -exu -o pipefail
-pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.13.0
+pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.18.0
git-clone.sh ${URLREF_TFTEXT} ${SRC_PATH_TFTEXT}
cd ${SRC_PATH_TFTEXT}

# The tftext build script queries GitHub, but these requests are sometimes
# throttled by GH, resulting in a corrupted uri for tensorflow in WORKSPACE.
# A workaround (needs to be updated when the tensorflow version changes):
-sed -i "s/# Update TF dependency to installed tensorflow/commit_sha=1cb1a030a62b169d90d34c747ab9b09f332bf905/" oss_scripts/prepare_tf_dep.sh

+sed -i "s/# Update TF dependency to installed tensorflow./commit_slug=6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/" oss_scripts/prepare_tf_dep.sh
# Newer versions of LLVM make lld's --undefined-version check of lld is strict
# by default (https://reviews.llvm.org/D135402), but the tftext build seems to
# rely on this behavior.
echo "write_to_bazelrc \"build --linkopt='-Wl,--undefined-version'\"" >> oss_scripts/configure.sh

./oss_scripts/run_build.sh
EOF

@@ -50,20 +50,21 @@ FROM wheel-builder as lingvo-builder
ARG URLREF_LINGVO
ARG SRC_PATH_TFTEXT
ARG SRC_PATH_LINGVO

# Preserve the version of tensorflow-text
COPY --from=tftext-builder /opt/manifest.d/git-clone.yaml /opt/manifest.d/git-clone.yaml
COPY --from=tftext-builder ${SRC_PATH_TFTEXT}/tensorflow_text*.whl /opt/

RUN <<"EOF" bash -exu -o pipefail
git-clone.sh ${URLREF_LINGVO} ${SRC_PATH_LINGVO}
EOF


ENV USE_BAZEL_VERSION=7.1.2

# build lingvo
RUN <<"EOF" bash -exu -o pipefail
git-clone.sh ${URLREF_LINGVO} ${SRC_PATH_LINGVO}
pushd ${SRC_PATH_LINGVO}

CPU_ARCH="$(dpkg --print-architecture)"
if [[ "${CPU_ARCH}" == "arm64" ]]; then

# Use aarch distribution of protobufs
patch -p1 <<"EOFINNER"
diff --git a/lingvo/repo.bzl b/lingvo/repo.bzl
@@ -84,13 +85,32 @@ index ce65822d2..d9c0277aa 100644
def icu():
EOFINNER

pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.13.0 /opt/tensorflow_text*.whl
sed -i 's/tensorflow=/#tensorflow=/' docker/dev.requirements.txt
sed -i 's/tensorflow-text=/#tensorflow-text=/' docker/dev.requirements.txt
sed -i 's/dataclasses=/#dataclasses=/' docker/dev.requirements.txt
fi

pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.18.0 /opt/tensorflow_text*.whl
for pattern in \
"s|tensorflow=|#tensorflow=|g" \
"s|tensorflow-text=|#tensorflow-text=|g" \
"s|dataclasses=|#dataclasses=|g" \
"s|==.*||g" \
; do
sed -i "${pattern}" ${SRC_PATH_LINGVO}/docker/dev.requirements.txt
done
for pattern in \
"s|tensorflow-text~=2.13.0|tensorflow-text~=2.18.0|g" \
"s|tensorflow~=2.13.0|tensorflow~=2.18.0|g" \
"s|python_requires='>=3.8,<3.11'|python_requires='>=3.8,<3.13'|" \
; do
sed -i "${pattern}" ${SRC_PATH_LINGVO}/pip_package/setup.py;
done
pip install -r docker/dev.requirements.txt

# Some tests are flaky right now, so we skip running the tests.
BUILD_ARCH="x86_64"
if [[ "$CPU_ARCH" == "arm64" ]]; then
BUILD_ARCH="aarch64";
fi
sed -i 's/manylinux2014_x86_64/manylinux_2_38_'"${BUILD_ARCH}"'/' pip_package/build.sh
SKIP_TESTS=1 PYTHON_MINOR_VERSION=$(python --version | cut -d ' ' -f 2 | cut -d '.' -f 2) pip_package/build.sh
EOF

@@ -108,15 +128,14 @@ ARG SRC_PATH_TFTEXT

# Preserve version information of tensorflow-text and lingvo
COPY --from=lingvo-builder /opt/manifest.d/git-clone.yaml /opt/manifest.d/git-clone.yaml
-COPY --from=lingvo-builder /tmp/lingvo/dist/lingvo*linux_aarch64.whl /opt/
+COPY --from=lingvo-builder /tmp/lingvo/dist/lingvo*-linux*.whl /opt/
RUN echo "lingvo @ file://$(ls /opt/lingvo*.whl)" >> /opt/pip-tools.d/requirements-paxml.in

COPY --from=tftext-builder ${SRC_PATH_TFTEXT}/tensorflow_text*.whl /opt/
RUN echo "tensorflow-text @ file://$(ls /opt/tensorflow_text*.whl)" >> /opt/pip-tools.d/requirements-paxml.in

# paxml + praxis
RUN <<"EOF" bash -ex
-echo "tensorflow==2.13.0" >> /opt/pip-tools.d/requirements-paxml.in
echo "tensorflow_datasets==4.9.2" >> /opt/pip-tools.d/requirements-paxml.in
echo "auditwheel" >> /opt/pip-tools.d/requirements-paxml.in

@@ -131,11 +150,14 @@ for src in ${SRC_PATH_PAXML} ${SRC_PATH_PRAXIS}; do
for pattern in \
"s| @ git+https://github.com/google/flax||g" \
"s| @ git+https://github.com/google/jax||g" \
"s| @ git+https://github.com/google/fiddle||g" \
"s|^tensorflow|#tensorflow|" \
"s|^lingvo|#lingvo|" \
"s|^scikit-learn|#scikit-learn|" \
"s|^protobuf|#protobuf|" \
"s|^numpy|#numpy|" \
"s|^orbax-checkpoint|#orbax-checkpoint|" \
"s| @ git+https://github.com/google/CommonLoopUtils||g" \
; do
sed -i "${pattern}" */pip_package/requirements.txt requirements.in
done
@@ -148,6 +170,7 @@ for src in ${SRC_PATH_PAXML} ${SRC_PATH_PRAXIS}; do
fi
popd
done
+sed -i 's/pysimdjson==[0-9.]*/pysimdjson/' ${SRC_PATH_PAXML}/setup.py
EOF

ADD test-pax.sh /usr/local/bin
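The lingvo build step in the pax Dockerfile derives two values before calling `pip_package/build.sh`: a wheel platform tag based on the Debian architecture name, and the Python minor version parsed from `python --version`. Here is a minimal sketch of both derivations in isolation. The function name `wheel_platform_tag` is hypothetical, and the architecture and version strings are passed in as literals so the sketch does not depend on `dpkg` or a specific Python being installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a Debian architecture name (the kind of value
# `dpkg --print-architecture` prints) to the manylinux tag, using the same
# sed substitution as the Dockerfile applies to pip_package/build.sh.
wheel_platform_tag() {
  local cpu_arch="$1"
  local build_arch="x86_64"
  if [[ "${cpu_arch}" == "arm64" ]]; then
    build_arch="aarch64"
  fi
  echo "manylinux2014_x86_64" \
    | sed 's/manylinux2014_x86_64/manylinux_2_38_'"${build_arch}"'/'
}

tag_amd64="$(wheel_platform_tag amd64)"
tag_arm64="$(wheel_platform_tag arm64)"

# Same cut pipeline as PYTHON_MINOR_VERSION in the Dockerfile, fed a sample
# version string instead of the output of `python --version`.
minor="$(echo 'Python 3.12.3' | cut -d ' ' -f 2 | cut -d '.' -f 2)"

echo "${tag_amd64} ${tag_arm64} ${minor}"
```

Any architecture other than `arm64` falls through to the `x86_64` default, which matches the Dockerfile logic but would mislabel a wheel built on a third architecture.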
53 changes: 0 additions & 53 deletions .github/container/Dockerfile.pax.amd64

This file was deleted.
