Serverless image augmentation pipeline to generate additional training data from existing images for machine learning use-cases
This repository provides a deployable solution using Infrastructure-as-Code (IaC) templates to help you set up a serverless image augmentation pipeline to generate additional training data from existing images for machine learning use-cases.
One significant challenge in machine learning is obtaining high-quality training data across various applications and domains. The quality and quantity of training data directly influence model performance, generalization, robustness, bias mitigation, and interpretability. Insufficient or low-quality data can hinder a model's ability to learn effectively, generalize to unseen examples, handle diverse conditions, avoid biases, and provide interpretable results.
Image augmentation is a technique that creates additional training data by applying a variety of transformations to the existing images.
Possible transformations include:
- Rotation: Rotating the image by a certain angle, introducing variations in orientation.
- Scaling: Resizing the image to a different scale, simulating different distances or perspectives.
- Translation: Shifting the image horizontally or vertically, mimicking changes in position.
- Flipping: Mirroring the image horizontally or vertically, creating reflections.
- Shearing: Tilting the image along its axis, introducing distortion.
- Zooming: Zooming in or out of the image, simulating changes in perspective.
- Brightness and Contrast Adjustment: Changing the brightness or contrast of the image, altering lighting conditions.
- Noise Injection: Adding random noise to the image, simulating variations in texture or artifacts.
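To make a few of the transformations above concrete, here is a minimal, dependency-free sketch. It assumes a grayscale image represented as a list of rows of integer pixel values in the range 0..255. That representation and all function names are illustrative assumptions, not code from this repository, and a real pipeline would more likely use an imaging library.

```python
import random


def flip_horizontal(img):
    """Flipping: mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotation: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img, delta):
    """Brightness: add `delta` to every pixel, clamped to 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def add_noise(img, amplitude, seed=None):
    """Noise injection: add uniform integer noise in [-amplitude, amplitude]."""
    rng = random.Random(seed)
    return [
        [max(0, min(255, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in img
    ]


# A tiny 2x3 "image" used only for illustration.
image = [
    [10, 20, 30],
    [40, 50, 60],
]

print(flip_horizontal(image))         # [[30, 20, 10], [60, 50, 40]]
print(rotate_90_clockwise(image))     # [[40, 10], [50, 20], [60, 30]]
print(adjust_brightness(image, 200))  # [[210, 220, 230], [240, 250, 255]]
```

Each function returns a new image and leaves its input untouched, so several augmented variants can be derived from the same original.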
However, implementing image augmentation at scale poses challenges in scalability, performance, and cost-efficiency, especially when the training dataset is large.
This repository showcases a deployable automated workflow to implement image augmentation mechanisms on a large set of training images using Serverless technologies on AWS.
The architecture diagram above showcases the serverless image augmentation pipeline.
The serverless solution is built using AWS Serverless Application Model (SAM), an open-source framework for building serverless applications. During deployment, AWS SAM transforms and expands the SAM syntax into AWS CloudFormation syntax, allowing you to easily deploy the entire solution using a few simple commands.
The solution provisions the following AWS resources:
- S3 Bucket 1 for raw training images
- S3 Bucket 2 for augmented training images
- Lambda function written in Python that contains the image transformation logic
- Event trigger to execute Lambda function when images are uploaded to S3 bucket 1
- Users upload original training images into the S3 bucket for raw images, either in batches or all-at-once.
- The upload operation triggers a Lambda function to be executed against the raw images, with Python code that contains the data transformation logic.
- Depending on the volume of raw images uploaded, Lambda scales out to accommodate the load by provisioning multiple concurrent Lambda execution environments for data processing.
- Once the image augmentation process is completed, the augmented images are uploaded to the second S3 bucket for consumption.
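The workflow above can be sketched as a Lambda handler. This is a hedged outline rather than the repository's actual `app.py`: the `OUTPUT_BUCKET` environment variable, the `process_event` and `augment` helpers, and the `_augN` key suffix are all assumptions made for illustration. The S3 event fields and the `get_object` / `put_object` calls follow the standard S3 notification format and boto3 S3 client.

```python
import os
import urllib.parse


def augment(image_bytes, count):
    """Placeholder (hypothetical): returns `count` unmodified copies.

    Real augmentation logic would decode and transform the image here.
    """
    return [image_bytes for _ in range(count)]


def process_event(event, s3_client, output_bucket, count=3):
    """Read each uploaded object, augment it, and write the results."""
    written = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        stem, ext = os.path.splitext(key)
        for i, data in enumerate(augment(body, count)):
            out_key = f"{stem}_aug{i}{ext}"
            s3_client.put_object(Bucket=output_bucket, Key=out_key, Body=data)
            written.append(out_key)
    return written


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised
    # locally without it; the Lambda Python runtime provides boto3.
    import boto3

    s3 = boto3.client("s3")
    # OUTPUT_BUCKET is an assumed environment variable name.
    return {"written": process_event(event, s3, os.environ["OUTPUT_BUCKET"])}
```

Passing the S3 client into `process_event` keeps the core logic testable with a stub client, while each concurrent Lambda execution environment still handles its own batch of records independently.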
To effectively deploy the project using the AWS SAM Framework, you'd require the following prerequisites:
Follow the steps below to deploy the solution using AWS SAM:
- Clone this project into your local environment
- Navigate to the image_augmentation_sam_app folder and run sam build
- Upon successful build, deploy the project with sam deploy --guided
- Adapt the image transformation logic to your specific use-case by editing the Python code in image_augmentation_sam_app/image_augmentation_function/app.py:
def generate_augmented_images(original_img, NUM_OF_IMAGES_GENERATED):
    imgs_distorted = []  # collect the augmented images in this list
    # Insert your image augmentation logic here
    print("Image augmentation completed!")
    return imgs_distorted
Tip: you can use external Python libraries such as Albumentations to implement your image augmentation logic.
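As one possible way to fill in the placeholder, the sketch below generates each variant by applying a random subset of transformations. It is an illustration under stated assumptions: images are modelled as lists of rows of 0..255 integers, the three helper transforms and the extra `seed` parameter are hypothetical additions, and in practice a library such as Albumentations would likely replace the hand-written transforms.

```python
import random


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def shift_right(img, pixels):
    """Translate right by `pixels`, padding the vacated left edge with 0."""
    out = []
    for row in img:
        n = min(pixels, len(row))
        out.append([0] * n + list(row[: len(row) - n]))
    return out


def adjust_contrast(img, factor):
    """Scale each pixel's distance from mid-gray (128), clamped to 0..255."""
    return [
        [max(0, min(255, round(128 + factor * (p - 128)))) for p in row]
        for row in img
    ]


def generate_augmented_images(original_img, NUM_OF_IMAGES_GENERATED, seed=None):
    # `seed` is an assumed extra parameter, added for reproducibility.
    rng = random.Random(seed)
    transforms = [
        flip_vertical,
        lambda im: shift_right(im, 1),
        lambda im: adjust_contrast(im, 1.5),
    ]
    imgs_distorted = []
    for _ in range(NUM_OF_IMAGES_GENERATED):
        # Pick a random, non-empty subset of transforms in random order.
        chosen = rng.sample(transforms, k=rng.randint(1, len(transforms)))
        img = original_img
        for transform in chosen:
            img = transform(img)
        imgs_distorted.append(img)
    print("Image augmentation completed!")
    return imgs_distorted


original = [[10, 20, 30], [40, 50, 60]]
variants = generate_augmented_images(original, 4, seed=42)
```

Because every transform returns a new image, the original stays unchanged and the same seed reproduces the same set of variants.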
- AWS Serverless Application Model (SAM) Documentation
- Global sections of the AWS SAM Template
- Working with AWS Lambda and Lambda Layers in AWS SAM
To remove the solution from your AWS account, follow these steps:
- Navigate to the image_augmentation_sam_app folder and run sam delete.
See CONTRIBUTING for more information.