This web application allows users to generate captions and audio for images using a deep learning model.
- Upload images: Users can upload images through the web interface.
- Caption generation: The application generates captions for the uploaded images using a pre-trained deep learning model.
- Caption display: The generated captions are displayed in the web interface.
- Audio generation: The generated captions are converted to speech audio.
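The four features above form one pipeline: an uploaded image goes in, a caption comes out, and the caption is turned into audio. The sketch below shows that flow in plain Python under stated assumptions. The function name process_image, the CaptionResult type, and the pluggable captioner and tts callables are all hypothetical and are not taken from this repository's backend. In the real app the captioner would wrap the pre-trained model, and tts could be backed by gTTS (for example by writing gTTS(text=caption).write_to_fp(buf) into an io.BytesIO).

```python
import base64
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real backend's functions may differ.
CaptionFn = Callable[[bytes], str]  # image bytes -> caption text (e.g. a ViT-based captioner)
SpeechFn = Callable[[str], bytes]   # caption text -> audio bytes (e.g. gTTS MP3 output)


@dataclass
class CaptionResult:
    caption: str
    audio_b64: str  # audio as base64 so it can be returned inside a JSON response


def process_image(image_bytes: bytes, captioner: CaptionFn, tts: SpeechFn) -> CaptionResult:
    """Run the upload -> caption -> audio pipeline on a single image."""
    if not image_bytes:
        raise ValueError("empty upload")
    caption = captioner(image_bytes).strip()  # normalize stray whitespace from the model
    audio = tts(caption)
    return CaptionResult(caption=caption, audio_b64=base64.b64encode(audio).decode("ascii"))


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without a model or network access.
    fake_captioner = lambda img: "  a dog running on the beach "
    fake_tts = lambda text: text.encode("utf-8")
    result = process_image(b"\x89PNG placeholder", fake_captioner, fake_tts)
    print(result.caption)  # a dog running on the beach
```

Injecting the stages as callables keeps the pipeline testable without loading TensorFlow or calling the text-to-speech service.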
- Frontend: React.js
- Backend: Flask (Python)
- Deep Learning and OCR: TensorFlow/Keras, ViT (Vision Transformer), Tesseract OCR
- Image Processing: PIL (Python Imaging Library)
- Audio Generation: gTTS (Google Text-to-Speech)
- Clone the repository:
git clone <repository_url>
cd image-captioning-web-app
- Install the frontend dependencies:
cd frontend
npm install
- Start the backend (Flask) server:
cd ../backend
python app.py
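Start the React development server from the frontend directory as well (typically with npm start), then open your web browser and navigate to http://localhost:3000 to access the web application.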
- Upload Image: Click on the "Choose File" button to select an image from your local filesystem.
- Generate Caption: After selecting an image, click on the "Upload" button to generate a caption for the image.
- View Caption: The generated caption will be displayed on the web page below the uploaded image.
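The usage steps above can also be exercised without a browser, which is handy for testing the backend on its own. The following is a minimal sketch using only the Python standard library. It assumes the Flask backend listens on port 5000 (Flask's default) and accepts the file at a hypothetical /upload route under a hypothetical form field named image; check app.py for the actual route, port, and field name. The helper names build_multipart and upload_image are illustrative, not part of the project.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_image(path: str,
                 url: str = "http://localhost:5000/upload",  # hypothetical route and port
                 field: str = "image") -> bytes:             # hypothetical form-field name
    """POST an image to the backend the same way the browser form would; return the raw response."""
    p = Path(path)
    body, content_type = build_multipart(field, p.name, p.read_bytes())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With the backend running, upload_image("photo.jpg") returns whatever the server sends back; the response format (for example JSON containing the caption) is not documented here, so inspect it before parsing.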
Contributions are welcome! Please feel free to submit a pull request or open an issue if you encounter any bugs or have suggestions for improvements.