Refactor agent (#1)
This PR:
- Refactors the code to be flexible enough to introduce:
  - additional repos and deployment keys
  - additional buckets like the "off-perm" bucket for storing files that no longer need to be in perm. This is to avoid filling up the temp bucket when removing files from perm.
- Removes the loop over branches and uses additional arguments to `git grep` to search over branches.
- Uses GitPython to handle Git commands.
- Turns all code into Python.
- Uses typer and docker compose for a more convenient dev experience.
- Cleans up legacy dependencies like s3cmd.
- Updates the README file with development instructions.

Note that this is a breaking change because environment variables and
deployment key mounts are different.
ben-z authored Jun 29, 2024
1 parent c622956 commit 442510c
Showing 11 changed files with 363 additions and 253 deletions.
5 changes: 5 additions & 0 deletions .env.example
@@ -0,0 +1,5 @@
# Copy this file to .env and replace the values with your own
S3_TEMP_ACCESS_KEY=your_access_key
S3_TEMP_SECRET_KEY=your_secret_key
S3_PERM_ACCESS_KEY=your_access_key
S3_PERM_SECRET_KEY=your_secret_key
2 changes: 2 additions & 0 deletions .gitignore
@@ -158,3 +158,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

/tmp
33 changes: 8 additions & 25 deletions Dockerfile
@@ -1,31 +1,14 @@
 FROM python:3.11-bookworm

+# Install dependencies
+COPY requirements.txt /tmp/requirements.txt
+RUN pip install -r /tmp/requirements.txt --break-system-packages
+
 # Copy files into container
 WORKDIR /app
-COPY /src/main.py /app
-COPY /src/run.sh /app
-COPY requirements.txt /app
-
-# Copy the private SSH key from github provisioning to the container
-RUN mkdir /root/.ssh/
-
-# RUN echo -e "${SSH_DEPLOY_KEY}" > /root/.ssh/id_rsa
-# RUN cat /root/.ssh/id_rsa
-# COPY id_rsa /root/.ssh/id_rsa
-
-# Set permissions for the SSH key
-# RUN chmod 600 /root/.ssh/id_rsa
-
-# Add the Git host to the list of known hosts
-# RUN ssh-keyscan -t rsa github.com >> /root/.ssh/known_hosts
-
-# Clone the repository using the deploy key
-# RUN git clone git@github.com:WATonomous/infra-config.git
+COPY /src /app

-# Install s3cmd
-# RUN pip install s3cmd
+# Add github.com to known hosts
+RUN mkdir /root/.ssh/ && ssh-keyscan -t rsa github.com >> /root/.ssh/known_hosts

-# Run asset-management script
-RUN chmod +x run.sh
-CMD ["./run.sh"]
-# CMD ["echo $PATH"]
+CMD ["python", "agent.py", "run-agent"]
29 changes: 27 additions & 2 deletions README.md
@@ -1,2 +1,27 @@
-# asset-management
-Asset management for WATcloud
+# WATcloud Asset Management System
+
+This repo contains the asset management system for WATcloud.
+Currently, only the agent implementation is in this repo.
+Additional components, including the SDK, the S3 bucket configuration, and deployment code, reside in the internal monorepo [infra-config](https://github.com/WATonomous/infra-config).
+
+## Useful Links
+
+- [Asset Manager Frontend](https://cloud.watonomous.ca/docs/utilities/assets)
+
+## Getting Started (Agent Development)
+
+Copy the `.env.example` file to `.env` and fill in the necessary information.
+
+Create `./tmp/deploy-keys` directory and place the required deploy keys in the directory. The list of deploy keys can be configured in `docker-compose.yml`.
+
+Run the following commands to start the development environment:
+
+```bash
+docker compose up -d --build
+```
+
+Enter the container:
+
+```bash
+docker compose exec agent bash
+```
36 changes: 36 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,36 @@
# This file is used to assist development

services:
  agent:
    build: .
    entrypoint: [ "sleep", "infinity" ]
    init: true
    volumes:
      - ./src:/app
      - ./tmp/deploy-keys:/deploy-keys:ro
    environment:
      - |
        AGENT_CONFIG={
          "repos": {
            "git@github.com:WATonomous/infra-config.git": {"deploy_key_path": "/deploy-keys/infra-config"}
          },
          "buckets": {
            "temp": {
              "endpoint": "https://rgw.watonomous.ca",
              "bucket": "asset-temp",
              "access_key_env_var": "S3_TEMP_ACCESS_KEY",
              "secret_key_env_var": "S3_TEMP_SECRET_KEY"
            },
            "perm": {
              "endpoint": "https://rgw.watonomous.ca",
              "bucket": "asset-perm",
              "access_key_env_var": "S3_PERM_ACCESS_KEY",
              "secret_key_env_var": "S3_PERM_SECRET_KEY"
            }
          }
        }
      # These can be set in the .env file
      - S3_TEMP_ACCESS_KEY=${S3_TEMP_ACCESS_KEY:?}
      - S3_TEMP_SECRET_KEY=${S3_TEMP_SECRET_KEY:?}
      - S3_PERM_ACCESS_KEY=${S3_PERM_ACCESS_KEY:?}
      - S3_PERM_SECRET_KEY=${S3_PERM_SECRET_KEY:?}
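
Reviewer note: the compose file above passes the whole agent configuration as JSON in `AGENT_CONFIG`, and each bucket entry names the environment variables that hold its credentials rather than the credentials themselves. The loader itself lives in `src/utils.py`, which is not shown in this view, so the following is only a sketch of how such a lookup might work. The helper name `load_bucket_settings`, its return shape, and the example credential values are assumptions; only the JSON keys are taken from the compose file.

```python
import json
import os


def load_bucket_settings(bucket_name, environ=None):
    """Hypothetical helper: resolve one bucket's settings from AGENT_CONFIG.

    The JSON keys match docker-compose.yml; everything else is illustrative.
    """
    environ = os.environ if environ is None else environ
    config = json.loads(environ["AGENT_CONFIG"])
    bucket = config["buckets"][bucket_name]
    return {
        "endpoint": bucket["endpoint"],
        "bucket": bucket["bucket"],
        # The config stores variable names, so secrets stay out of the JSON.
        "access_key": environ[bucket["access_key_env_var"]],
        "secret_key": environ[bucket["secret_key_env_var"]],
    }


# Example environment with placeholder credentials (not real values).
example_env = {
    "AGENT_CONFIG": json.dumps(
        {
            "buckets": {
                "temp": {
                    "endpoint": "https://rgw.watonomous.ca",
                    "bucket": "asset-temp",
                    "access_key_env_var": "S3_TEMP_ACCESS_KEY",
                    "secret_key_env_var": "S3_TEMP_SECRET_KEY",
                }
            }
        }
    ),
    "S3_TEMP_ACCESS_KEY": "example-access-key",
    "S3_TEMP_SECRET_KEY": "example-secret-key",
}

temp_settings = load_bucket_settings("temp", example_env)
```

With this indirection, adding a bucket such as the "off-perm" bucket mentioned in the commit message would presumably only need a new entry under `buckets` plus two more environment variables.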
5 changes: 4 additions & 1 deletion requirements.txt
@@ -1 +1,4 @@
-boto3
+boto3>=1.34.136,<2
+GitPython>=3.1.43,<4
+typer>=0.12.3,<1
+requests>=2.32.3,<3
103 changes: 103 additions & 0 deletions src/agent.py
@@ -0,0 +1,103 @@
import logging
import os
from hashlib import sha256
from tempfile import TemporaryDirectory

from utils import app, clone_repos, flatten, get_watcloud_uris, get_bucket


@app.command()
def run_agent():
    logging.info("Starting agent")

    logging.info("Cloning repos")
    repos = list(clone_repos())

    logging.info("Extracting WATcloud URIs")
    watcloud_uris = list(
        # sorting to ensure consistent order for testing
        sorted(flatten([get_watcloud_uris(repo.working_dir) for repo in repos]))
    )

    logging.info(f"Found {len(watcloud_uris)} WATcloud URIs:")
    for uri in watcloud_uris:
        logging.info(uri)

    desired_perm_objects = set(uri.sha256 for uri in watcloud_uris)

    temp_bucket = get_bucket("temp")
    perm_bucket = get_bucket("perm")

    temp_objects = set(obj.key for obj in temp_bucket.objects.all())
    perm_objects = set(obj.key for obj in perm_bucket.objects.all())

    logging.info(f"Found {len(temp_objects)} objects in temp bucket")
    logging.info(f"Found {len(perm_objects)} objects in perm bucket")

    errors = []

    if not desired_perm_objects.issubset(temp_objects | perm_objects):
        errors.append(
            ValueError(
                f"Cannot find the following objects in any bucket: {desired_perm_objects - temp_objects - perm_objects}"
            )
        )

    # Objects that need to be copied to perm bucket
    to_perm = desired_perm_objects - perm_objects
    # Objects that need to be copied from temp bucket to perm bucket
    temp_to_perm = to_perm & temp_objects
    # Objects that need to be moved from perm bucket to temp bucket (no longer referenced)
    perm_to_temp = perm_objects - desired_perm_objects
    # Objects that need to be deleted from the temp bucket (already exists in another bucket)
    delete_from_temp = desired_perm_objects & temp_objects - temp_to_perm

    logging.info(
        f"{len(desired_perm_objects&perm_objects)}/{len(desired_perm_objects)} objects are already in the perm bucket"
    )
    logging.info(f"Copying {len(temp_to_perm)} object(s) from temp to perm bucket:")
    for obj_key in temp_to_perm:
        logging.info(obj_key)
    logging.info(f"Copying {len(perm_to_temp)} object(s) from perm to temp bucket:")
    for obj_key in perm_to_temp:
        logging.info(obj_key)
    logging.info(f"Deleting {len(delete_from_temp)} redundant object(s) from temp bucket:")
    for obj_key in delete_from_temp:
        logging.info(obj_key)

    with TemporaryDirectory() as temp_dir:
        for obj_key in temp_to_perm:
            temp_bucket.download_file(obj_key, os.path.join(temp_dir, obj_key))
            # Verify checksum
            with open(os.path.join(temp_dir, obj_key), "rb") as f:
                checksum = sha256(f.read()).hexdigest()
            if checksum != obj_key:
                errors.append(
                    ValueError(
                        f"Checksum mismatch for object {obj_key} in temp bucket! Not uploading to perm bucket."
                    )
                )
                continue

            perm_bucket.upload_file(os.path.join(temp_dir, obj_key), obj_key)
            temp_bucket.delete_objects(Delete={"Objects": [{"Key": obj_key}]})

        for obj_key in perm_to_temp:
            perm_bucket.download_file(obj_key, os.path.join(temp_dir, obj_key))
            temp_bucket.upload_file(os.path.join(temp_dir, obj_key), obj_key)
            perm_bucket.delete_objects(Delete={"Objects": [{"Key": obj_key}]})

        for obj_key in delete_from_temp:
            temp_bucket.delete_objects(Delete={"Objects": [{"Key": obj_key}]})

    if errors:
        logging.error("Encountered the following errors during execution:")
        for error in errors:
            logging.error(error)
        raise ValueError("Encountered errors during agent execution.")

    logging.info("Agent execution complete")


if __name__ == "__main__":
    app()
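
Reviewer note: the reconciliation in `run_agent` is pure set arithmetic over object keys, so it can be exercised without touching S3. The sketch below restates that arithmetic as a standalone function. The name `plan_moves`, the dictionary return shape, and the single-letter keys are assumptions made for illustration and are not part of the commit.

```python
def plan_moves(desired, temp, perm):
    """Hypothetical helper mirroring the set arithmetic in run_agent.

    desired: keys referenced by WATcloud URIs in the cloned repos
    temp:    keys currently in the temp bucket
    perm:    keys currently in the perm bucket
    """
    to_perm = desired - perm
    temp_to_perm = to_perm & temp
    return {
        # Referenced keys that exist in neither bucket (reported as an error).
        "missing": desired - temp - perm,
        # Referenced keys that only exist in temp: promote them to perm.
        "temp_to_perm": temp_to_perm,
        # Keys in perm that are no longer referenced: move them back to temp.
        "perm_to_temp": perm - desired,
        # Referenced keys present in both buckets: the temp copy is redundant.
        "delete_from_temp": (desired & temp) - temp_to_perm,
    }


plan = plan_moves(
    desired={"a", "b", "c", "e"},
    temp={"b", "c", "x"},
    perm={"c", "d"},
)
# plan["missing"] == {"a", "e"}
# plan["temp_to_perm"] == {"b"}
# plan["perm_to_temp"] == {"d"}
# plan["delete_from_temp"] == {"c"}
```

In this example the unreferenced key `"x"` stays in the temp bucket untouched, which matches the agent: it only deletes a temp object when the same key is already in perm. The parentheses in `delete_from_temp` are added for readability; because `-` binds tighter than `&` in Python, the unparenthesized expression in the commit evaluates to the same set.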