A Federated Index of Virus Metadata and Hyperdata in Public Repositories
Status: Extensible DRAFT API
https://test.pypi.org/project/viral-index/
Requirements:
- python3
- A Google Cloud Platform (GCP) account. Please see GCP's getting started guide if you are new to GCP.
- Install the viral-index module:
python3 -m venv .env
source .env/bin/activate
pip install -q --extra-index-url https://test.pypi.org/simple/ viral-index
- Configure BigQuery access credentials
Usage of this API requires access to GCP BigQuery. To set up authentication, please follow the instructions in the section "Setting up authentication" on this page. Note: when prompted to save the JSON file containing your key, we suggest saving it under a filename without spaces, which makes it easier to set the GOOGLE_APPLICATION_CREDENTIALS environment variable.
N.B.: You may be charged for using this API. Please learn more about BigQuery pricing.
- Write your code to access the index!
>>> from viral_index.client import ViralIndex
>>> viral_client = ViralIndex()
>>> cdd_id = 165276
>>> runs = viral_client.get_SRAs_where_CDD_is_found(cdd_id)
>>> print([r for r in runs])
['SRR2187433', 'SRR533343', 'ERR1915143']
>>>
>>> pig_taxid = 9823
>>> viruses = viral_client.get_viruses_for_host_taxonomy(pig_taxid)
>>> if viruses is not None:
...     for virus in viruses:
...         print(virus)
...
['Rotavirus C', 36427]
['Porcine rubulavirus', 53179]
['Porcine associated porprismacovirus 7', 2170123]
['Porcine enterovirus b/BEL/15V010', 2017720]
[...]
>>>
>>> spacer_seqs=viral_client.get_spacer_seqs(1915496)
>>> print([s for s in spacer_seqs])
[['112', 'CAGCCATCCGCGACGCCACGACAGCGGCCGAGAGTGT', 'GCF_002508705', 'GTDB'], ['1', 'AATCAGCCCGTCGGGGTAGCCAGGGACGCCCTCCA', 'GCF_002508705', 'GTDB'],
[...]
>>> spacer_seq='CACGAGTGCGAAGCATCCAATCCATATGACTACAT'
>>> spacer_tax_ids=viral_client.get_taxid_from_spacer_seq(str(spacer_seq))
>>> print([t for t in spacer_tax_ids])
[['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915496], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915507], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915502], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915504], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915506], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915510], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915499], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915512], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915500], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915495], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915498], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915505], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915508], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915503]]
Additional sample code can be found in python/sample-viral-index-access.py.
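The rows printed by get_taxid_from_spacer_seq above each end with a taxonomy ID. As a minimal sketch, assuming every row keeps the five-element layout seen in that sample output (the other columns are not documented here, so only the last position is interpreted), the distinct taxids can be collected as follows. The helper name unique_taxids is hypothetical and is not part of viral_index.

```python
def unique_taxids(rows):
    """Return the distinct last-column values of the rows, in first-seen order.

    Assumes the row layout seen in the sample output above, where the
    final element of each row is a taxonomy ID.
    """
    seen = set()
    ordered = []
    for row in rows:
        taxid = row[-1]
        if taxid not in seen:
            seen.add(taxid)
            ordered.append(taxid)
    return ordered


# Two rows copied from the sample output above; the third repeats the first
# on purpose to illustrate de-duplication.
sample_rows = [
    ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915496],
    ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915507],
    ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915496],
]
print(unique_taxids(sample_rows))
```

With a live client, the same helper could be applied to the iterable returned by viral_client.get_taxid_from_spacer_seq(...).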
If you get an error like the one below, it's likely that you don't have BigQuery configured properly for your project. See the "Configure BigQuery access credentials" step above.
Access Denied: Project {YOUR_PROJECT_HERE}: User does not have bigquery.jobs.create permission in project {YOUR_PROJECT_HERE}
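Before digging into project permissions, it can help to rule out simpler causes. The sketch below checks only the local setup: whether GOOGLE_APPLICATION_CREDENTIALS is set, points to an existing file, and contains parseable JSON. It cannot detect a missing bigquery.jobs.create permission, which has to be granted in the GCP project itself. The function name check_credentials is hypothetical.

```python
import json
import os


def check_credentials(env=None):
    """Return a list of problems found with the local credentials setup."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"credentials file not found: {path}"]
    problems = []
    if " " in path:
        # The setup notes suggest avoiding spaces in the key filename.
        problems.append("credentials path contains spaces")
    try:
        with open(path) as handle:
            json.load(handle)
    except (OSError, ValueError) as err:
        problems.append(f"credentials file is not readable JSON: {err}")
    return problems


# Report any problems found in the current environment.
for problem in check_credentials():
    print("WARNING:", problem)
```

If this prints nothing and the error persists, the service account most likely lacks BigQuery permissions in the project named in the message.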
- make: run
sudo apt-get -y -m update && sudo apt-get install -y make
or the equivalent command for your system.
- python3
- GCP SDK
- Check out the source code:
git clone https://github.com/NCBI-Codeathons/The_Virus_Index.git
- Set up the python virtual environment:
make .env
- Enable python virtualenv:
source .env/bin/activate
- Set up the GCP credentials:
export GOOGLE_APPLICATION_CREDENTIALS=${PATH_TO_CREDENTIALS_JSON_FILE}
- Write code that uses viral_index.client.ViralIndex
Automated testing is available in TravisCI.
The Makefile has several targets that may be helpful:
- .env: initializes the python virtual environment.
- check_bq: checks command line access to BigQuery (tool availability and authentication).
- check_python_syntax: checks the syntax of python scripts in this repo.
- check_taxadb: checks that taxadb was properly installed.
- check_api: checks that the API can be retrieved from PyPI and runs the demo script.
- init_taxadb: initializes and configures taxadb (needed for the taxonomy utilities).
- deploy: builds a tarball for distribution and uploads it to test.pypi.org (requires twine, contact @christiam).
- setup_bigquery_authentication: sample command lines to set up authentication for BigQuery.
The module's version is stored in setup.py.
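The version declared in setup.py is recorded in the package metadata at install time, so it can be read back from an environment where the module is installed. A minimal sketch, assuming Python 3.8 or later and the distribution name viral-index used in the pip command earlier; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="viral-index"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```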
(Assumes bash and Linux)
- Download and set up taxadb: Run
make init_taxadb
(this will take about 2-3 minutes).
- Initialize python virtual environment: Run
source .env/bin/activate
- Set environment variable:
export TAXADB_CONFIG=${PWD}/etc/taxadb.cfg
- python/name2taxid.py: takes scientific names on standard input or input files (spelling is significant) and outputs NCBI taxonomy IDs.
- python/taxid2lineage.py: takes NCBI taxonomy IDs on standard input (or input files) and outputs the lineage for that given taxid.
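Because both utilities read from standard input, they can also be driven from Python. The following is a sketch only: pipe_lines is a hypothetical helper, the output format of the scripts is not described here and is returned as raw lines, and the commented-out call assumes it is run from the repository root after the setup steps above.

```python
import os
import subprocess
import sys


def pipe_lines(command, lines, extra_env=None):
    """Send lines to a command's standard input and return its output lines."""
    env = dict(os.environ, **(extra_env or {}))
    result = subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
        env=env,
    )
    return result.stdout.splitlines()


# Example call (not executed here; needs the repository checkout and taxadb):
# output = pipe_lines(
#     [sys.executable, "python/name2taxid.py"],
#     ["Sus scrofa"],
#     extra_env={"TAXADB_CONFIG": os.path.join(os.getcwd(), "etc", "taxadb.cfg")},
# )
```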
- Review data in BigQuery and integrate it better with the API