Twitter scraping can be done in a variety of ways. This repository implements one such solution.
- Twitter is a social media platform that doesn't restrict "ethical" bot usage.
- Twitter's search page, https://twitter.com/search-home, gives us access to a large number of tweets matching a particular query. Let's make a search request and see the results!
*Search query on Twitter*
- Now we know that a query for a word can be made with the URL syntax https://twitter.com/search?q= followed by the search term, e.g. https://twitter.com/search?q=python.
- But how do we get the text of a specific tweet?
- Inspecting the page with the browser's dev tools shows that each tweet's text lives in an element with the CSS class "tweet-text".
*Search query on Twitter*
- How can we do it programmatically?
- Selenium allows us to control a browser with simple programmable commands and even fetch elements by class name.
- This means we can scroll down to load more tweets (selenium.webdriver.common.keys.Keys.PAGE_DOWN) and, at the same time, access specific CSS classes (selenium.webdriver.<Chrome/Firefox driver>.find_elements_by_class_name('tweet-text')), as sketched below.
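A minimal sketch of that flow, assuming the legacy Twitter search page and the older Selenium element-lookup API referenced above (not the repository's exact tweet_scrape.py):

```python
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # or webdriver.Firefox()
driver.get("https://twitter.com/search?q=python")  # example query

# Scroll a few times so more tweets get loaded into the page.
body = driver.find_element_by_tag_name("body")
for _ in range(10):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(1)  # give the page time to fetch new tweets

# Every tweet's text sits in an element with the "tweet-text" CSS class.
for tweet in driver.find_elements_by_class_name("tweet-text"):
    print(tweet.text)

driver.quit()
```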
usage: tweet_scrape.py [-h] [-dt DT] [-dp DP] [-src SRC] [-nst NST] [-dest DEST]

optional arguments:

    -h, --help  show this help message and exit
    -dt DT      Driver type; supported drivers: 0 - Chrome, 1 - Firefox. Default: 0
    -dp DP      Driver path. Default: "" - in this case the driver is assumed to be on the PATH
    -src SRC    Source file containing the words that need to appear in the tweets. Default: "words.txt"
    -nst NST    Load tweets for nst seconds per topic and username. Default: 50
    -dest DEST  Destination file, one tweet per line. Default: "tweets.txt"
json_parse.py
- Uses Twitter API credentials that should be defined in a file "credentials.json".
- The script makes an authenticated connection and calls the home_timeline.json endpoint of the API to get tweets. These tweets are parsed into tweets.csv.
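A minimal sketch of that flow, assuming the credentials file holds the four OAuth 1.0a keys (the key names below are assumptions) and using requests_oauthlib; the repository's json_parse.py may differ:

```python
import csv
import json

from requests_oauthlib import OAuth1Session

# Load OAuth keys from credentials.json (these key names are assumptions).
with open("credentials.json") as f:
    creds = json.load(f)

session = OAuth1Session(
    creds["consumer_key"],
    client_secret=creds["consumer_secret"],
    resource_owner_key=creds["access_token"],
    resource_owner_secret=creds["access_token_secret"],
)

# Call the home_timeline.json endpoint of the v1.1 REST API.
resp = session.get(
    "https://api.twitter.com/1.1/statuses/home_timeline.json",
    params={"count": 200},
)
resp.raise_for_status()

# Write one tweet per row to tweets.csv.
with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    for tweet in resp.json():
        writer.writerow([tweet["id_str"], tweet["created_at"], tweet["text"]])
```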