Similarity Finder is a Python application that identifies and groups similar images within a specified directory. It compares images using image hashing, specifically average hashing, so that near-identical files can be matched without comparing them pixel by pixel. The tool can scan a directory at a single level or recursively, and the similarity threshold can be adjusted to control how sensitive the comparison is. Results are written in JSON format, which makes the grouped images easy to inspect and process further.
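To make the idea of average hashing and a similarity threshold concrete, here is a minimal pure-Python sketch. It is not the project's implementation: the project relies on the imagehash library, and the function names, the greedy grouping strategy, and the "distance less than or equal to threshold" rule below are all assumptions made for illustration. The tiny 2x2 grids stand in for images that have already been resized and converted to grayscale.

```python
from typing import Dict, List, Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Hash a small grayscale grid: one bit per pixel, set when the pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def group_similar(hashes: Dict[str, int], threshold: int) -> List[List[str]]:
    """Greedy grouping (an assumed strategy): join the first group whose
    first member is within `threshold` bits, otherwise start a new group."""
    groups: List[List[str]] = []
    for name, value in hashes.items():
        for group in groups:
            if hamming_distance(hashes[group[0]], value) <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical 2x2 "images"; real inputs would first be resized and converted to grayscale.
images = {
    "a.png": [[10, 200], [220, 15]],
    "b.png": [[12, 190], [210, 20]],  # same pattern as a.png, slightly different values
    "c.png": [[200, 10], [15, 220]],  # inverted pattern
}
hashes = {name: average_hash(pixels) for name, pixels in images.items()}
print(group_similar(hashes, threshold=1))
```

In this sketch a lower threshold demands closer matches, because fewer differing bits are tolerated before two images land in separate groups.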
The primary purpose of the project is to identify and organize similar images within large datasets. Comparing and categorizing images by hand is laborious and quickly becomes impractical as a collection grows. Image hashing reduces each image to a compact fingerprint, so comparisons become fast and repeatable. The tool can be particularly useful for image deduplication, image organization, and content-based image retrieval tasks for individuals and organizations working with extensive image collections.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
To get started with the Similarity Finder project, you need to have the following software installed on your local machine:
- Git - Download & Install Git
- Python (version 3.6 or higher) - Download & Install Python
Here's a step-by-step guide to help you set up a development environment for the Similarity Finder project:
- Clone the repository
git clone https://github.com/blackmonk13/similar_images.git
- Navigate to the project directory
cd similar_images
- Set up a Python virtual environment (optional but recommended)
python3 -m venv env
- Activate the virtual environment
On Windows:
.\env\Scripts\activate
On macOS and Linux:
source env/bin/activate
- Install the dependencies from the requirements.txt file
pip install -r requirements.txt
- Run the project using the command mentioned in the Usage section.
To use the Similarity Finder, run the following command in your terminal:
python -m similar_images -t 1 -r -o json -f output.json path/to/your/image/directory
Replace path/to/your/image/directory with the path to the directory containing images you want to analyze. The -t flag sets the similarity threshold (default is 10), the -r flag enables or disables recursive directory scanning, the -o flag sets the output format (default is json), and the -f flag specifies the path to the output file.
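The difference between single-level and recursive scanning can be sketched with the standard library's pathlib. This is an illustration only: `find_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the set of extensions is an assumption, since the formats the tool actually accepts are not listed here.

```python
from pathlib import Path
from typing import List

# Assumed set of extensions; the formats the tool actually accepts are not listed here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def find_images(directory: str, recursive: bool = False) -> List[Path]:
    """Return image files directly in `directory`, or in all subdirectories too when recursive."""
    root = Path(directory)
    # glob("*") lists only the top level; rglob("*") walks every subdirectory as well.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path for path in candidates
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling `find_images("photos")` would list only files directly inside `photos`, while `find_images("photos", recursive=True)` would also include files in nested folders.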
The application will output a JSON or CSV file containing groups of similar images found in the specified directory.
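Once the output file exists, it can be post-processed with a few lines of Python. The sketch below assumes the JSON is a list of groups, each group being a list of file paths. That layout is a guess made for illustration, not documented behavior, so check a real output file before relying on it. `load_groups` and the sample data are hypothetical.

```python
import json
from typing import List


def load_groups(json_text: str) -> List[List[str]]:
    """Parse the assumed layout (a list of groups, each a list of paths)
    and keep only groups that contain more than one image."""
    groups = json.loads(json_text)
    return [group for group in groups if len(group) > 1]


# Hypothetical file contents, used only to illustrate the assumed layout.
sample = '[["a.png", "copies/a.png"], ["b.png"]]'
for group in load_groups(sample):
    keep, *duplicates = group
    print(f"keep {keep}, review {duplicates}")
```

With a real file, the text would come from something like `Path("output.json").read_text()` instead of the inline sample.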
You can find the latest wheel (.whl) file on the releases page. To deploy the project on a live system, install that wheel with pip using one of the commands below.
In a Unix-like shell, run the following command (it requires curl and jq):
curl -s https://api.github.com/repos/blackmonk13/similar_images/releases/latest | jq -r '.assets[] | select(.name | endswith(".whl")) | .browser_download_url' | xargs pip install
Open PowerShell and run the following command:
(Invoke-WebRequest -Uri "https://api.github.com/repos/blackmonk13/similar_images/releases/latest" -UseBasicParsing | ConvertFrom-Json).assets | Where-Object { $_.name -like "*whl" } | ForEach-Object { pip install $_.browser_download_url }
This command installs the project and its dependencies, so you can run the similar_images command directly from your terminal or Command Prompt.
- Python - Programming Language
- Pillow - Image Processing Library
- imagehash - Image Hashing Library
- ThreadPoolExecutor - Thread pool class from Python's standard concurrent.futures module
- @blackmonk13 - Idea & Initial work
See also the list of contributors who participated in this project.
We welcome contributions from the community! If you'd like to contribute to Similarity Finder, please follow these steps:
- Fork the repository and create a new branch for your changes.
- Commit your changes and push them to your fork.
- Open a pull request against the main branch of the original repository.
Please make sure that your contributions adhere to the project's coding style and guidelines.
Before submitting a pull request, please make sure that:
- Your changes do not introduce any new bugs or regressions.
- Your code is well-documented and easy to understand.