Team HackAttack: Our solution combines state-of-the-art technologies to enhance web accessibility for visually impaired users. Leveraging the Vision Transformer model, we achieve real-time image recognition, accurately identifying objects and text within images. The integration of the Tesseract OCR API enables dynamic text extraction from images, pr

DarkKnightSgh/Dotslash5.0HackAttack

Image Captioning Web Application

This web application allows users to generate captions and audio for images using a deep learning model.

Features

  • Upload images: Users can upload images through the web interface.
  • Caption generation: The application generates captions for the uploaded images using a pre-trained deep learning model.
  • Display captions: The generated captions are displayed on the web interface for users to see.
  • Audio generation: The application converts the generated captions to audio.
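
The feature list above describes a three-stage flow: upload, caption, audio. The following is a minimal sketch of that flow, not the project's actual backend code. The names `describe_image`, `CaptionResult`, `fake_captioner`, and `fake_synthesizer` are hypothetical, and the two stand-in functions only mark where a real captioning model and a text-to-speech call would plug in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CaptionResult:
    caption: str  # text to show under the uploaded image
    audio: bytes  # encoded speech for the same caption


def describe_image(
    image_bytes: bytes,
    captioner: Callable[[bytes], str],
    synthesizer: Callable[[str], bytes],
) -> CaptionResult:
    """Run the upload -> caption -> audio flow for one image."""
    if not image_bytes:
        raise ValueError("uploaded image is empty")
    caption = captioner(image_bytes).strip()
    if not caption:
        raise ValueError("captioner returned no text")
    return CaptionResult(caption=caption, audio=synthesizer(caption))


# Hypothetical stand-ins so the sketch runs without a model or network access.
def fake_captioner(image_bytes: bytes) -> str:
    return f"an image of {len(image_bytes)} bytes"


def fake_synthesizer(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = describe_image(b"abc", fake_captioner, fake_synthesizer)
    print(result.caption)
```

Passing the captioner and synthesizer as arguments keeps the flow testable without loading a model. In a real backend those arguments would wrap the model inference and the speech library.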

Technologies Used

  • Frontend: React.js
  • Backend: Flask (Python)
  • Deep Learning and OCR: TensorFlow/Keras, Vision Transformer (ViT), Tesseract OCR
  • Image Processing: PIL (Python Imaging Library)
  • Audio Generation: gTTS

Installation

  1. Clone the repository:
git clone <repository_url>
cd image-captioning-web-app
  2. Install the frontend dependencies:
cd frontend
npm install
  3. Start the backend server (the Python packages listed under Technologies Used must already be installed):
cd ../backend
python app.py

Access the Web Application

Open your web browser and navigate to http://localhost:3000 to access the web application.

Usage

  1. Upload Image: Click on the "Choose File" button to select an image from your local filesystem.

  2. Generate Caption: After selecting an image, click on the "Upload" button to generate a caption for the image.

  3. View Caption: The generated caption will be displayed on the web page below the uploaded image.
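
The same upload can also be scripted instead of done through the browser. The sketch below uses only the Python standard library and rests on several assumptions that this README does not confirm: the backend listens on Flask's default port 5000, it exposes a hypothetical `/upload` route, it reads the file from a form field named `image`, and it replies with JSON. Adjust all of these to match the actual `app.py`.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumptions: port 5000 is Flask's default development port; the "/upload"
# route and the "image" field name are hypothetical, not taken from the repo.
BACKEND_URL = "http://localhost:5000/upload"
FIELD_NAME = "image"


def build_multipart(field: str, filename: str, data: bytes, content_type: str):
    """Encode one file as multipart/form-data.

    Returns (body, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_image(path: str, url: str = BACKEND_URL) -> dict:
    """POST an image file and return the decoded JSON reply (assumed format)."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body, header = build_multipart(
            FIELD_NAME, os.path.basename(path), fh.read(), content_type
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `upload_image("photo.jpg")` would send the file and return whatever JSON the server produces.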

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue if you encounter any bugs or have suggestions for improvements.
