A web crawler built with Scrapy
This project implements a web crawler using the Scrapy framework.
- Python 2.7, setuptools, zope.interface
- Twisted, Scrapy
- lxml, BeautifulSoup4, pywin32, pyOpenSSL
- douban: the main program, used to run the crawler
//usage: scrapy crawl ItemName
scrapy crawl bookItem
scrapy crawl movieItem
- fetch_proxies: used to fetch usable IP proxies
- This is designed to keep websites from blocking the crawler
//the script saves the fetched values in the file "proxies.json"
//then paste them into douban/douban-setting.py, replacing the contents of PROXIES
python fetch_free_proxies.py
- If you can't run the crawler, please run "fetch_free_proxies.py" first
- See also the document "How to avoid getting banned"
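
The proxy list is only useful if each request is actually sent through one of the proxies. The sketch below shows one way this could be done with a downloader middleware that picks a random proxy per request. It is an illustration, not the code shipped in this project: the names `load_proxies`, `RandomProxyMiddleware` and `SAMPLE_JSON` are hypothetical, and it assumes that "proxies.json" holds a JSON array of "host:port" strings and that PROXIES is a plain list of proxy URLs. The real layout may differ.

```python
import json
import random

# Hypothetical sample of what "proxies.json" might contain.
# The real file written by fetch_free_proxies.py may be laid out differently.
SAMPLE_JSON = '["10.0.0.1:8080", "10.0.0.2:3128"]'


def load_proxies(json_text):
    """Parse a JSON array of "host:port" strings into proxy URLs."""
    return ["http://" + entry for entry in json.loads(json_text)]


class RandomProxyMiddleware(object):
    """Downloader middleware that assigns a random proxy to each request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # Assumes the project settings define PROXIES as a list of URLs.
        return cls(crawler.settings.getlist("PROXIES"))

    def process_request(self, request, spider):
        # Scrapy routes a request through the URL stored under the
        # "proxy" key of request.meta.
        if self.proxies:
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue handling the request
```

To try something like this, the class would need to be listed in the DOWNLOADER_MIDDLEWARES setting of the project under whatever module path it is saved at. Choosing a proxy at random per request spreads traffic across the list, so a single blocked proxy affects only some of the requests.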