Automatically format Gene ontology data for graphical representation with R

Prerequisites for MacOS

Install Xcode from the App Store (the installation takes quite a long time).

Start Xcode to complete the installation. The computer password will be required.


Prerequisites for Windows

Install UNIX terminal

  • Download the file TerminalUnixSetup.exe to your PC.

    This executable installs a UNIX terminal in your HOME directory.

  • Execute TerminalUnixSetup.exe.

    The UNIX terminal comes with all necessary tools. A tools directory is created in your HOME. It contains the script

Starting the terminal

Select Git Bash in the Windows menu or click the icon on your desktop.


Script for data preparation

Install file on MacOS


  • Save the file in Downloads

  • Open a UNIX terminal (the Terminal application)


Create a tools directory that will contain the script

  • Execute the following command

    mkdir -p $HOME/tools

Install the file in the tools directory

  • Execute the following commands to move the file and make it executable

    mv $HOME/Downloads/ $HOME/tools
    chmod +x $HOME/tools/
  • Close the terminal.

Install file on Windows

The file was already installed by the executable TerminalUnixSetup.exe 😄

First execution

Open a UNIX terminal in the directory that contains the file to analyse

Start the script


If the terminal shows the message below, you can jump to the "Next executions" part.

 [--help|--man|--version]

or [-m|--method] [-c|--correction] input_gene_list.tsv output_curated_gene_ontology.tsv

Otherwise, if you get the message below, some Perl modules need to be installed. The message is printed in French and means "At least one of the required Perl modules is not installed. To use this script you must first execute the following commands":

Au moins un des modules Perl nécessaires n'est pas installé.
Pour utiliser ce script vous devez d'abord exécuter les commandes suivantes:

cpan App::cpanminus
cpanm WWW::Mechanize
cpanm JSON

Create or update the .zshrc and .bashrc files

touch $HOME/.zshrc $HOME/.bashrc

Execute command:

cpan App::cpanminus

This command may ask questions.

Accept all default answers by pressing Enter.

⚠️ Open a new tab in the terminal with cmd+t

Then execute:

cpanm WWW::Mechanize

This step can take some time; be patient.

Finally, execute:

cpanm JSON

Next executions

The script performs a gene ontology analysis using PANTHER and REVIGO from a gene ID list, according to the protocol described by Bonnot et al., 2019. It formats the result so that it can be used by the scripts script_1plot.R and script_2plot.R, which produce the graphical representation of the ontology analysis.

PANTHER now sorts the results in a hierarchical order, so that each GO_id is linked to the so-called parent or main GO_id that characterizes a class of ontology. This information is kept in the file output_go_ids_hierarchy.tsv and is used by the script script_1plot_hierarchy.R.

⚠️ A main GO_id that stands alone does not show up in the list but is taken into account by the R script.

The command is:

$HOME/tools/ [-m|--method] [-c|--correction] input_gene_list.tsv output_curated_gene_ontology.tsv output_go_ids_hierarchy.tsv

The script has 2 options: [-m|--method] and [-c|--correction].

The values for [-m|--method] are:

  • biological_process

  • cellular_component

  • molecular_function

The default value is biological_process.

The values for [-c|--correction] are:

  • fdr

  • bonferroni

The default value is fdr.

input_gene_list.tsv is the file that contains your data; let's call it myfile.tsv for the example.

The extension .tsv means "Tab-separated values". Any file with the data in tab-separated columns is suitable for the script, regardless of its extension (a .txt file can be used).

output_curated_gene_ontology.tsv is the default name for the output file; you can change it.

The command can be:

$HOME/tools/ --method biological_process --correction fdr myfile.tsv my_output_file.tsv my_hierarchy_file.tsv

Or:

$HOME/tools/ -m biological_process -c fdr myfile.tsv my_output_file.tsv my_hierarchy_file.tsv
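If you want to analyse the same gene list with each of the three ontologies, the command lines can be generated in a loop. The sketch below only prints the commands so that you can review them before running them. The function name print_go_commands, the output naming scheme and the SCRIPT_NAME placeholder are assumptions made for illustration; replace the placeholder with the actual name of the script installed in $HOME/tools.

```shell
#!/bin/sh
# Hypothetical helper: print one command line per ontology.
# $1 = path to the installed script, $2 = input gene list
print_go_commands() {
    script_path=$1
    input=$2
    base=$(basename "$input")
    base=${base%.*}    # strip the extension (.tsv, .txt, ...)
    # Output files are named after the input file and written to the
    # current directory (assumed naming scheme, adapt as needed).
    for method in biological_process cellular_component molecular_function; do
        printf '%s\n' "$script_path --method $method --correction fdr $input ${base}_${method}.tsv ${base}_${method}_hierarchy.tsv"
    done
}

# SCRIPT_NAME is a placeholder, not the real file name
print_go_commands "$HOME/tools/SCRIPT_NAME" myfile.tsv
```

Printing the commands rather than executing them keeps the sketch safe to run; once the lines look right, they can be copied into the terminal one at a time.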

Get syntax information with: --man

This displays the manual. To close it, press q.

If you execute the command:

$HOME/tools/ --method biological_process --correction fdr myfile.tsv my_output_file.tsv my_hierarchy_file.tsv

The terminal shows the message below:

Step 1/7 Extract gene ID list from /Volumes/Disk_4To/Donnees_ARA2/Clusters/2w_5clusters/cluster1.txt
Step 2/7 Panther ontology analysis => /tmp/gene_ontology_analysis.txt, /tmp/gene_ontology_analysis.json
export result to /tmp/gene_ontology_analysis.txt
export result to /tmp/gene_ontology_analysis.json
Step 3/7 extract GO ids and FDR from /tmp/gene_ontology_analysis.txt
Step 4/7 REVIGO reduction => /tmp/gene_ontology_analysis_revigo.csv
export result
Step 5/7 Formating /Users/cecile/Downloads/output_curated_gene_ontology.tsv: filter panther result with revigo result
Step 6/7 keep track of GO ids hierarchy from /tmp/gene_ontology_analysis.json into /Users/cecile/Downloads/output_go_ids_hierarchy.tsv
Step 7/7 cleanup: remove temporay files from /tmp

⚠️ Access to REVIGO (step 4/7) can be slow, sometimes too slow. If the script exits with an error such as "request.Error POSTing read timeout...", rerun the same command as often as necessary.
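Rerunning the command by hand can be automated with a small retry loop. This is a minimal sketch, not a feature of the script: the retry function is hypothetical, and it assumes that the script exits with a non-zero status when the REVIGO request times out.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_TRIES DELAY_SECONDS command [args...]
# Runs the command until it succeeds or MAX_TRIES attempts have failed.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "Command failed after $attempt attempts" >&2
            return 1
        fi
        echo "Attempt $attempt failed, retrying in $delay s..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

To use it, put retry, a maximum number of attempts and a delay in seconds in front of the full command shown above, for example retry 5 60 followed by the command. A delay between attempts gives the REVIGO server time to recover.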

Graphical representation of the results

The 2 scripts script_1plot.R and script_2plot.R plot one graph or 2 graphs in the same figure, respectively.

The third script, script_1plot_hierarchy.R, takes the hierarchical information into account.

The scripts can be executed in RStudio. All scripts check the package requirements and install only the missing packages.

Comments in the scripts explain each step.

Each script ends with the command:

InfoSession <- devtools::session_info()

# Save session file
 write.table(InfoSession, file = "InfoSession.txt", 
                quote = FALSE, row.names = FALSE, sep = '\t')

This allows you to keep information about the R session (version of R, RStudio, packages...) to comply with FAIR analysis guidelines.



PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Huaiyu Mi, Dustin Ebert, Anushya Muruganujan, Caitlin Mills, Laurent-Philippe Albou, Tremayne Mushayamaha and Paul D Thomas. Nucl. Acids Res. (2020) doi: 10.1093/nar/gkaa1106


Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800

R script

A Simple Protocol for Informative Visualization of Enriched Gene Ontology Terms. T. Bonnot, MB. Gillard and DH. Nagel. Bio-101: e3429. DOI:10.21769/BioProtoc.3429

R Packages

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Wickham, Hadley, Jim Hester, and Winston Chang. 2021. Devtools: Tools to Make Developing r Packages Easier.

Wilke, Claus O. 2020. Cowplot: Streamlined Plot Theme and Plot Annotations for ’Ggplot2’.


Terese M. and Lecampion C.