[nightly] Create Nightly Pipeline, make docker-nightly-publish.yml & integration.yml more modular #2628

Merged (65 commits) on Dec 19, 2024
Changes from 26 commits
33f14f6
comment out something doesn't work for fork
HappyAmazonian Dec 4, 2024
39a9050
use my iam role
HappyAmazonian Dec 4, 2024
c8233a3
add checkout back
HappyAmazonian Dec 4, 2024
b96628b
use my repo
HappyAmazonian Dec 4, 2024
85b15e5
remov ecreate runner
HappyAmazonian Dec 4, 2024
ec5bf65
fix-tag
HappyAmazonian Dec 4, 2024
7505dff
make everything push to ECR
HappyAmazonian Dec 4, 2024
1f17c17
add mode in tag
HappyAmazonian Dec 5, 2024
3a74892
add condition to push
HappyAmazonian Dec 5, 2024
f63a7fe
remove blank lin
HappyAmazonian Dec 5, 2024
8d43fb3
add call integration workflow
HappyAmazonian Dec 6, 2024
2674540
remove push for testing
HappyAmazonian Dec 6, 2024
4133b2e
fix push condition
HappyAmazonian Dec 6, 2024
850a69a
fix repo name
HappyAmazonian Dec 6, 2024
f182852
change repo for testing in djl
HappyAmazonian Dec 6, 2024
6f01ae3
fix role
HappyAmazonian Dec 6, 2024
5ace20e
fix neuron image, disable pytest capture
HappyAmazonian Dec 6, 2024
e974d65
add docker credential
HappyAmazonian Dec 7, 2024
8de85fa
change env
HappyAmazonian Dec 7, 2024
9808a93
fix neuron docker tag
HappyAmazonian Dec 7, 2024
14185c5
add back aarch build
HappyAmazonian Dec 9, 2024
9f4a91a
fix for PR
HappyAmazonian Dec 9, 2024
3dde6de
merge
HappyAmazonian Dec 9, 2024
0adabd0
add back docker push
HappyAmazonian Dec 9, 2024
7a75c66
fix format
HappyAmazonian Dec 9, 2024
c40f9f5
add the missing tag step
HappyAmazonian Dec 10, 2024
9fb5574
reorg
HappyAmazonian Dec 12, 2024
0bc8a25
fix region var
HappyAmazonian Dec 12, 2024
b02ca24
test split push
HappyAmazonian Dec 12, 2024
a0a1b16
fix cli
HappyAmazonian Dec 13, 2024
7dfb9c3
fix syntax + add time
HappyAmazonian Dec 13, 2024
86bd0d9
fix syntax error
HappyAmazonian Dec 13, 2024
c28e392
remove test for faster test
HappyAmazonian Dec 13, 2024
b1b0871
build,test
HappyAmazonian Dec 14, 2024
277483a
fix matrix value
HappyAmazonian Dec 14, 2024
60cd26d
fix push .github/workflows/docker-nightly-publish.yml
HappyAmazonian Dec 14, 2024
45092e4
fix uri tag
HappyAmazonian Dec 14, 2024
66466fd
fix typo
HappyAmazonian Dec 14, 2024
1dd747b
fix aws permisioon
HappyAmazonian Dec 14, 2024
49322cb
add tests back
HappyAmazonian Dec 16, 2024
0374887
fix typo
HappyAmazonian Dec 16, 2024
086065b
add override image suffix in tag
HappyAmazonian Dec 16, 2024
03f6e17
fix neuron image
HappyAmazonian Dec 16, 2024
186e602
fix condition in neuron ut
HappyAmazonian Dec 16, 2024
5d4ac70
fix neuron uri
HappyAmazonian Dec 16, 2024
3949dcc
fix format
HappyAmazonian Dec 16, 2024
bcd3555
clean
HappyAmazonian Dec 16, 2024
5a9f70c
fix based on comment
HappyAmazonian Dec 16, 2024
f396e74
update default arch value
HappyAmazonian Dec 16, 2024
33ab630
rebase on other pr
HappyAmazonian Dec 17, 2024
74b72fa
test docker publish
HappyAmazonian Dec 17, 2024
53b0d1c
fix permission
HappyAmazonian Dec 17, 2024
26636d7
use sha
HappyAmazonian Dec 17, 2024
468a260
improve scripts
HappyAmazonian Dec 17, 2024
b5eaf03
fix for loop
HappyAmazonian Dec 17, 2024
ea3b518
improve code quality
HappyAmazonian Dec 17, 2024
0faa058
fix path
HappyAmazonian Dec 17, 2024
ab7e10b
fix multiple typo
HappyAmazonian Dec 17, 2024
5761450
merge
HappyAmazonian Dec 18, 2024
ebc7f89
use credential only for ubuntu
HappyAmazonian Dec 18, 2024
cf56d95
enable docker push
HappyAmazonian Dec 18, 2024
1d6b0e4
fix naming
HappyAmazonian Dec 18, 2024
583e8a2
echo
HappyAmazonian Dec 18, 2024
d772470
Merge branch 'master' into nightly-integ-remodel
HappyAmazonian Dec 18, 2024
ce98891
log image under tests for integration tests
siddvenk Dec 19, 2024
65 changes: 49 additions & 16 deletions .github/workflows/docker-nightly-publish.yml
@@ -7,6 +7,11 @@ on:
description: 'release/nightly/temp, default is nightly'
required: true
default: 'nightly'
skip_nightly_integ_test:
description: 'build and push the nightly without running integ test'
required: false
default: false
type: boolean
workflow_call:
inputs:
mode:
@@ -21,6 +26,10 @@ permissions:
id-token: write
contents: read

env:
AWS_ECR_REPO: "185921645874.dkr.ecr.us-east-1.amazonaws.com/djl-ci-temp"
DOCKER_HUB_REPO: "deepjavalibrary/djl-serving"

jobs:
nightly-build:
runs-on: ubuntu-latest
@@ -70,29 +79,25 @@ jobs:
run: |
./gradlew --refresh-dependencies :serving:dockerDeb -Psnapshot
Review comment (Contributor):
Can we move this into the "Build temp docker image" step as a command? I don't think it needs a separate step.

- name: Build and push nightly docker image
if: ${{ inputs.mode == '' || inputs.mode == 'nightly' }}
siddvenk marked this conversation as resolved.
working-directory: serving/docker
run: |
export NIGHTLY="-nightly"
docker compose build --no-cache \
--build-arg djl_version=${{ env.DJL_VERSION }}-SNAPSHOT \
--build-arg djl_serving_version=${{ env.SERVING_VERSION }}-SNAPSHOT \
${{ matrix.arch }}
docker compose push ${{ matrix.arch }}
- name: Build and push temp image
if: ${{ inputs.mode == 'temp' }}
- name: Tag and push temp image to ECR repo
if: ${{ (!inputs.skip_nightly_integ_test && inputs.mode == 'nightly') || inputs.mode == 'temp' }}
working-directory: serving/docker
run: |
export NIGHTLY="-nightly"
docker compose build --no-cache \
--build-arg djl_version=${{ env.DJL_VERSION }}-SNAPSHOT \
--build-arg djl_serving_version=${{ env.SERVING_VERSION }}-SNAPSHOT \
${{ matrix.arch }}
repo="185921645874.dkr.ecr.us-east-1.amazonaws.com/djl-ci-temp"
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $repo
tempTag="$repo:${{ matrix.arch }}-${GITHUB_SHA}"
docker tag deepjavalibrary/djl-serving:${{ matrix.arch }}-nightly $tempTag
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${{env.AWS_ECR_REPO}}
HappyAmazonian marked this conversation as resolved.
tempTag="${{ env.AWS_ECR_REPO }}:${{ matrix.arch }}-${GITHUB_SHA}"
Review comment (Contributor):
This will push new temp images to ECR on every nightly run. Have we configured a retention policy on this ECR repo? We should discuss what the retention period should be.

Review comment (Contributor Author, @HappyAmazonian, Dec 11, 2024):
The ECR repo already exists, and it was created with a 7-day retention policy for temp mode.
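For reference, a retention rule of this kind is expressed as an ECR lifecycle policy. The sketch below only prints such a policy; the rule description, the single-rule layout, and the repository name used in the commented `aws` call are assumptions for illustration, not the policy actually configured on the repo.

```shell
#!/usr/bin/env bash
# Sketch: print an ECR lifecycle policy that expires images N days after push.
# The default of 7 days mirrors the comment above; everything else is illustrative.
lifecycle_policy_json() {
  local days="${1:-7}"
  cat <<EOF
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire temp images ${days} days after push",
      "selection": {
        "tagStatus": "any",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": ${days}
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
}

lifecycle_policy_json 7

# Applying it needs AWS credentials, so it is left as a comment here:
#   aws ecr put-lifecycle-policy --repository-name djl-ci-temp \
#     --region us-east-1 --lifecycle-policy-text "$(lifecycle_policy_json 7)"
```

Passing a different number of days (for example `lifecycle_policy_json 2`) yields a shorter retention window without changing the rule shape.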

Review comment (Contributor Author, @HappyAmazonian, Dec 11, 2024):
ECR storage costs $0.09 per GB per month, or roughly $90 per TB per month.
If we want to be more frugal, I believe a 2-day retention policy would be enough, since the tests only last a few hours.
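The arithmetic behind that estimate can be checked with a short script. This is a rough sketch: the $0.09 per GB-month price is the figure quoted in the comment above, while the seven images per night (one per build target in this workflow) and the 10 GB per image are assumptions chosen only to make the comparison concrete.

```shell
#!/usr/bin/env bash
# monthly_storage_cost GB [PRICE_PER_GB_MONTH]
# Prints the monthly storage cost in dollars with two decimals.
monthly_storage_cost() {
  local gb="$1" price="${2:-0.09}"
  awk -v gb="$gb" -v price="$price" 'BEGIN { printf "%.2f\n", gb * price }'
}

# retained_gb IMAGES_PER_DAY GB_PER_IMAGE RETENTION_DAYS
# Steady-state storage: every day's images are kept for RETENTION_DAYS.
retained_gb() {
  echo $(( $1 * $2 * $3 ))
}

monthly_storage_cost 1000                      # 1 TB taken as 1000 GB -> 90.00
monthly_storage_cost "$(retained_gb 7 10 7)"   # 7-day retention, 490 GB -> 44.10
monthly_storage_cost "$(retained_gb 7 10 2)"   # 2-day retention, 140 GB -> 12.60
```

Under these assumed sizes the difference between 7-day and 2-day retention is a few tens of dollars per month.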

Review comment (Member):
I am not really worried about the cost here. I am worried about the time wasted in saving and retrieving images from ECR. ECR is a fairly slow service.

docker tag ${{ env.DOCKER_HUB_REPO }}:${{ matrix.arch }}-nightly $tempTag
docker push $tempTag
- name: Push nightly to dockerhub
if: ${{ inputs.skip_nightly_integ_test && inputs.mode == 'nightly' }}
run: |
docker push ${{ env.DOCKER_HUB_REPO }}:${{ matrix.arch }}-nightly
- name: Build and push release docker image
if: ${{ inputs.mode == 'release' }}
working-directory: serving/docker
@@ -111,6 +116,34 @@ jobs:
docker tag deepjavalibrary/djl-serving:${{ env.SERVING_VERSION }} deepjavalibrary/djl-serving:latest
docker push deepjavalibrary/djl-serving:latest

run-integration-tests:
if: ${{ inputs.mode == 'nightly' && !inputs.skip_nightly_integ_test }}
needs: [nightly-build, nightly-aarch64]
uses: ./.github/workflows/integration.yml
secrets: inherit
with:
djl-version: temp

push-to-dockerhub:
runs-on: ubuntu-latest
needs: [run-integration-tests]
strategy:
matrix:
arch: [ cpu, cpu-full, pytorch-inf2, pytorch-gpu, tensorrt-llm, lmi ]
steps:
Review comment (Contributor):
It looks like we're in a binary success/fail situation here for all containers: integ tests for every container must pass before any new nightly is published. We should think through what it takes to make this more granular, which may involve splitting the current integration.yml file into a few separate workflows, or adding specific steps here that run tests on a container-by-container basis.
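One way to picture the more granular behaviour is a filter that publishes only the containers whose tests passed. The sketch below is hypothetical: the `arch=status` input format and the `passed_archs` helper are invented for illustration and are not part of this PR or of integration.yml.

```shell
#!/usr/bin/env bash
# passed_archs "arch=status" ...
# Prints, one per line, each arch whose status is exactly "pass".
passed_archs() {
  local entry
  for entry in "$@"; do
    # ${entry#*=} is the text after the first "=", ${entry%%=*} the text before it.
    if [ "${entry#*=}" = "pass" ]; then
      printf '%s\n' "${entry%%=*}"
    fi
  done
}

# Hypothetical per-container results; only cpu and tensorrt-llm would be published.
passed_archs cpu=pass lmi=fail tensorrt-llm=pass
```

A push job could then iterate over that list instead of depending on a single all-or-nothing test job.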

- name: Login to Docker
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Pull Image from ECR and Push it to Dockerhub
run: |
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${{env.AWS_ECR_REPO}}
tempTag="${{env.AWS_ECR_REPO}}:${{ matrix.arch }}-${GITHUB_SHA}"
docker pull $tempTag
docker tag $tempTag ${{ env.DOCKER_HUB_REPO }}:${{ matrix.arch }}-nightly
docker push ${{ env.DOCKER_HUB_REPO }}:${{ matrix.arch }}-nightly

create-runner:
runs-on: [ self-hosted, scheduler ]
steps:
@@ -186,7 +219,7 @@ jobs:
aarch64
docker compose push aarch64
- name: Build and push temp image
if: ${{ inputs.mode == 'temp' }}
if: ${{ inputs.mode == 'temp' || inputs.mode == 'nightly' }}
working-directory: serving/docker
run: |
export NIGHTLY="-nightly"
@@ -195,8 +228,8 @@ jobs:
--build-arg djl_serving_version=${{ env.SERVING_VERSION }}-SNAPSHOT \
aarch64
repo="185921645874.dkr.ecr.us-east-1.amazonaws.com/djl-ci-temp"
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $repo
tempTag="$repo:aarch64-${GITHUB_SHA}"
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${{env.AWS_ECR_REPO}}
tempTag="${{env.AWS_ECR_REPO}}:aarch64-${GITHUB_SHA}"
docker tag deepjavalibrary/djl-serving:aarch64-nightly $tempTag
docker push $tempTag
- name: Build and push release docker image
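The push conditions on the nightly-build steps in this file can be read as a small decision function. The sketch below mirrors the two `if:` expressions on the ECR and Docker Hub push steps as they appear in the diff; the `push_target` name and its return strings are made up for illustration, and the release path is lumped into "other".

```shell
#!/usr/bin/env bash
# push_target MODE SKIP_INTEG_TEST
#   MODE            nightly | temp | release
#   SKIP_INTEG_TEST true | false
# Prints where the freshly built image is pushed first.
push_target() {
  local mode="$1" skip="$2"
  if { [ "$mode" = "nightly" ] && [ "$skip" != "true" ]; } || [ "$mode" = "temp" ]; then
    echo "ecr"        # staged in the temp ECR repo, tagged <arch>-<commit sha>
  elif [ "$mode" = "nightly" ] && [ "$skip" = "true" ]; then
    echo "dockerhub"  # pushed straight to Docker Hub, no integ tests
  else
    echo "other"      # e.g. release, handled by its own step
  fi
}

push_target nightly false
push_target nightly true
push_target temp true
```

Note that `&&` binds tighter than `||`, so `temp` mode always goes to ECR regardless of the skip flag.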
40 changes: 33 additions & 7 deletions .github/workflows/integration.yml
@@ -7,9 +7,15 @@ on:
description: 'The released version of DJL'
required: false
default: ''
schedule:
- cron: '0 15 * * *'

workflow_call:
inputs:
djl-version:
description: 'The released version of DJL'
required: false
type: string
default: ''
env:
AWS_ECR_REPO: "185921645874.dkr.ecr.us-east-1.amazonaws.com/djl-ci-temp"

jobs:
create-runners:
@@ -175,12 +181,18 @@ jobs:
wget https://publish.djl.ai/awscurl/awscurl
chmod +x awscurl
mkdir outputs
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::185921645874:role/github-actions-djl-serving
aws-region: us-east-1
- name: Test
working-directory: tests/integration
env:
TEST_DJL_VERSION: ${{ inputs.djl-version }}
run: |
python -m pytest -k ${{ matrix.test.test }} tests.py
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${{env.AWS_ECR_REPO}}
python -m pytest -s -k ${{ matrix.test.test }} tests.py
- name: Cleanup
working-directory: tests/integration
run: |
@@ -224,11 +236,25 @@ jobs:
python-version: '3.10.x'
- name: Install pip dependencies
run: pip3 install requests numpy pillow wheel
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::185921645874:role/github-actions-djl-serving
aws-region: us-east-1
- name: Build container name
run: ./serving/docker/scripts/docker_name_builder.sh pytorch-inf2 ${{ github.event.inputs.djl-version }}
run: |
./serving/docker/scripts/docker_name_builder.sh pytorch-inf2 ${{ github.event.inputs.djl-version }}
- name: Download models and dockers
run: |
docker pull deepjavalibrary/djl-serving:$DJLSERVING_DOCKER_TAG
if [ "${{ github.event.inputs.djl-version }}" == "temp" ]; then
DOCKER_IMAGE_URI="185921645874.dkr.ecr.us-east-1.amazonaws.com/djl-ci-temp:pytorch-inf2-${GITHUB_SHA}"
else
DOCKER_IMAGE_URI="deepjavalibrary/djl-serving:$DJLSERVING_DOCKER_TAG"
fi
echo "DOCKER_IMAGE_URI=$DOCKER_IMAGE_URI" >>$GITHUB_ENV
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${{env.AWS_ECR_REPO}}
HappyAmazonian marked this conversation as resolved.
echo $DOCKER_IMAGE_URI
docker pull $DOCKER_IMAGE_URI
- name: Run djl_python unit/integration tests on container
working-directory: engines/python/setup
run: |
@@ -241,7 +267,7 @@ jobs:
-v $PWD/:/opt/ml/model/ \
-w /opt/ml/model \
--device=/dev/neuron0:/dev/neuron0 \
deepjavalibrary/djl-serving:$DJLSERVING_DOCKER_TAG \
$DOCKER_IMAGE_URI \
/bin/bash -c "'pip install /opt/ml/model/dist/*.whl pytest' && \
pytest djl_python/tests/neuron_test_scripts/ | tee logs/results.log"
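The image selection in the "Download models and dockers" step above can be isolated into a helper, which makes the two cases easier to see. This is a sketch only: `resolve_image_uri` is a hypothetical function, and the `pytorch-inf2-nightly` tag in the usage lines is an assumed value for `DJLSERVING_DOCKER_TAG`, since the output of docker_name_builder.sh is not shown in this diff.

```shell
#!/usr/bin/env bash
# Repository names copied from the workflow env blocks in this PR.
ECR_REPO="185921645874.dkr.ecr.us-east-1.amazonaws.com/djl-ci-temp"
HUB_REPO="deepjavalibrary/djl-serving"

# resolve_image_uri ARCH DJL_VERSION COMMIT_SHA DOCKER_TAG
# "temp" selects the per-commit image staged in ECR; anything else
# falls back to the Docker Hub image with the given tag.
resolve_image_uri() {
  local arch="$1" version="$2" sha="$3" tag="$4"
  if [ "$version" = "temp" ]; then
    echo "${ECR_REPO}:${arch}-${sha}"
  else
    echo "${HUB_REPO}:${tag}"
  fi
}

# Placeholder sha and tag values, for illustration only.
resolve_image_uri pytorch-inf2 temp abc123 pytorch-inf2-nightly
resolve_image_uri pytorch-inf2 "" abc123 pytorch-inf2-nightly
```

Quoting `"$version"` in the comparison keeps the test well-formed when the version input is empty.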
