Reddit Data Scraper 📊

A Reddit data scraping tool with a Streamlit web interface. It extracts posts from subreddits and comments from specific posts, and exports the results to CSV.

🚀 Features

📱 User-friendly web interface
🔍 Scrape posts from any subreddit
💬 Extract comments from specific posts
📊 Export data to CSV
⏱️ Time-based filtering
🔄 Caching for better performance

🛠️ Tech Stack

Python - Core programming language
Streamlit - Web interface framework
PRAW - Reddit API wrapper
Pandas - Data manipulation and analysis
python-dotenv - Environment variable management

📋 Prerequisites

Python 3.9 or higher
Reddit API credentials: a client ID and client secret for a registered Reddit app, plus a user agent string

⚙️ Installation

Clone the repository:

git clone https://github.com/pakagronglb/reddit-scraper.git
cd reddit-scraper

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up environment variables by creating a .env file in the project root:

REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_user_agent
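
As a rough sketch of how these three variables might be consumed, the snippet below parses .env-style text and maps the values to the keyword names that praw.Reddit accepts (client_id, client_secret, user_agent). The project lists python-dotenv for loading the file; parse_env_text and credentials_from are hypothetical helpers, not taken from main.py, written so the example runs on the standard library alone.

```python
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def parse_env_text(text):
    """Parse simple KEY=value lines (a stand-in for python-dotenv, assumed here)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def credentials_from(env):
    """Map the variable names above to praw.Reddit keyword arguments."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }
```

With PRAW installed, the result could be passed along as praw.Reddit(**credentials_from(parse_env_text(open(".env").read()))). Failing early on a missing key tends to give a clearer message than an authentication error from the API later on.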

🚀 Usage

Start the application:

streamlit run main.py

Access the web interface at http://localhost:8501
Choose your scraping option:
- Subreddit Posts: Enter subreddit name, post limit, and time filter
- Specific Post: Enter the Reddit post URL
Click "Scrape" and download the results as CSV
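
The two scraping options above can be sketched roughly as follows. This is not the code in main.py: fetch_top_posts, fetch_comments, and rows_to_csv are hypothetical names, the choice of the "top" listing and of the exported columns is an assumption, and the CSV is written with the standard csv module rather than the Pandas export the project lists, so that the sketch has no third-party imports. The reddit argument is expected to be a praw.Reddit instance.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed column set; the real app may export different fields
POST_FIELDS = ["id", "title", "author", "score", "num_comments", "created_utc", "url"]
COMMENT_FIELDS = ["id", "author", "body", "score"]


def author_name(author):
    """PRAW sets author to None when the account was deleted."""
    return "[deleted]" if author is None else str(author)


def fetch_top_posts(reddit, subreddit_name, limit=25, time_filter="week"):
    """Subreddit Posts option: one row per submission in the top listing."""
    listing = reddit.subreddit(subreddit_name).top(time_filter=time_filter, limit=limit)
    rows = []
    for post in listing:
        created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        rows.append({
            "id": post.id,
            "title": post.title,
            "author": author_name(post.author),
            "score": post.score,
            "num_comments": post.num_comments,
            "created_utc": created.isoformat(),
            "url": post.url,
        })
    return rows


def fetch_comments(reddit, post_url):
    """Specific Post option: one row per comment, with the tree flattened."""
    submission = reddit.submission(url=post_url)
    # limit=0 drops the "load more comments" placeholders without fetching them
    submission.comments.replace_more(limit=0)
    return [
        {"id": c.id, "author": author_name(c.author), "body": c.body, "score": c.score}
        for c in submission.comments.list()
    ]


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text, e.g. as the payload of a download button."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, rows_to_csv(fetch_top_posts(reddit, "python", limit=10, time_filter="day"), POST_FIELDS) would produce CSV text for the ten top posts of the day. PRAW's time_filter accepts "hour", "day", "week", "month", "year", and "all".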

🌐 Deployment

Streamlit Cloud

Push your code to GitHub
Visit share.streamlit.io
Connect your repository
Add your Reddit API credentials in Streamlit secrets
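
On Streamlit Cloud the credentials arrive through st.secrets, which supports dictionary-style access, rather than through a .env file. One way to let the same code run in both places is a small lookup that prefers a secrets mapping and falls back to the environment. read_setting is a hypothetical helper, and the sketch assumes the secrets use the same key names as the environment variables.

```python
import os


def read_setting(name, secrets=None, environ=None):
    """Return a credential from a secrets mapping, else from the environment."""
    environ = os.environ if environ is None else environ
    if secrets is not None and name in secrets:
        return secrets[name]
    if name in environ:
        return environ[name]
    raise KeyError(f"{name} is not set in Streamlit secrets or the environment")
```

In the app this might be called as read_setting("REDDIT_CLIENT_ID", secrets=st.secrets). Accessing st.secrets when no secrets file exists can raise an error, so local runs may prefer to leave secrets as None.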

Heroku

Create a Heroku app:

heroku create your-app-name

Set environment variables:

heroku config:set REDDIT_CLIENT_ID=your_client_id
heroku config:set REDDIT_CLIENT_SECRET=your_client_secret
heroku config:set REDDIT_USER_AGENT=your_user_agent

Deploy:

git push heroku main

📝 Configuration

requirements.txt - Project dependencies
.env - Local environment variables
Procfile - Heroku deployment configuration
runtime.txt - Python runtime specification

🔒 Security

Never commit your .env file or .streamlit/secrets.toml
Use environment variables for sensitive data
Keep your Reddit API credentials secure

🤝 Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👏 Acknowledgments

📧 Contact

Your Name - @pakagronglb

Project Link: https://github.com/pakagronglb/reddit-scraper
