Query_genome.fasta Query_lincRNAs.fasta Query_species_abbreviation(four_letter_code) Subject_species_abbreviation(four_letter_code) Query_known_genes.gff
# Example
Subject#1_genome.fasta Query_lincRNAs.fasta Query_species_abbreviation(four_letter_code) Subject#1_species_abbreviation(four_letter_code) Subject#1_known_genes.gff Subject#1_Known_lincRNAs.fasta
Subject#2_genome.fasta Query_lincRNAs.fasta Query_species_abbreviation(four_letter_code) Subject#2_species_abbreviation(four_letter_code) Subject#2_known_genes.gff
Subject#3_genome.fasta Query_lincRNAs.fasta Query_species_abbreviation(four_letter_code) Subject#3_species_abbreviation(four_letter_code)
Subject#4_genome.fasta Query_lincRNAs.fasta Query_species_abbreviation(four_letter_code) Subject#4_species_abbreviation(four_letter_code) Subject#4_Known_lincRNAs.fasta
-
The above shows the columns that should be included in the BLASTing file, which Evolinc-II needs in order to run. Evolinc-II works through the comparisons one line at a time until it has finished, and then builds families from any sequences it found that pass certain criteria.
Therefore, it is essential to create this file correctly and to keep all of the files it lists in the same folder.
-
At a bare minimum, the first four columns are required. The last two columns (genome annotation file and known lincRNAs) are optional and can be included in any combination. All columns must be separated by tabs. If a known lincRNA file is given but no genome annotation file, put two consecutive tabs between the subject abbreviation and the known lincRNA file, which leaves the genome annotation column blank.
Below is an example with real species names, performing an analysis with Arabidopsis thaliana as the query and its close relatives Arabidopsis lyrata, Capsella rubella, and Leavenworthia alabamica as subjects:
Atha_genome.fasta Atha_lincRNAs.fasta Atha Atha Atha_genes.gff
Alyr_genome.fasta Atha_lincRNAs.fasta Atha Alyr Alyr_genes.gff Alyr_known_lincRNAs.fasta
Crub_genome.fasta Atha_lincRNAs.fasta Atha Crub Crub_known_lincRNAs.fasta
Lala_genome.fasta Atha_lincRNAs.fasta Atha Lala
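As a minimal sketch, assuming the column layout described above, the Python below builds BLASTing-file rows and writes them with LF line endings. The names `format_row`, `write_blasting_file`, and `EXAMPLE_ROWS` are hypothetical helpers for illustration, not part of Evolinc-II; the point is to show how a blank annotation column produces two consecutive tabs and how rows avoid a trailing tab.

```python
# Hypothetical helpers (not part of Evolinc-II) for writing a BLASTing file.
# Column order: subject genome, query lincRNAs, query abbreviation,
# subject abbreviation, [genome annotation .gff], [known lincRNAs].


def format_row(genome, lincrnas, query, subject, gff=None, known_lincrnas=None):
    """Return one tab-separated row with no trailing tab."""
    fields = [genome, lincrnas, query, subject]
    if known_lincrnas:
        # A missing annotation becomes an empty field, i.e. two tabs in a row.
        fields += [gff or "", known_lincrnas]
    elif gff:
        fields.append(gff)
    return "\t".join(fields)


def write_blasting_file(path, rows):
    # newline="\n" disables newline translation, so rows end in LF even on Windows.
    with open(path, "w", newline="\n") as handle:
        for row in rows:
            handle.write(format_row(**row) + "\n")


# The Arabidopsis example above, expressed as keyword arguments.
EXAMPLE_ROWS = [
    dict(genome="Atha_genome.fasta", lincrnas="Atha_lincRNAs.fasta",
         query="Atha", subject="Atha", gff="Atha_genes.gff"),
    dict(genome="Alyr_genome.fasta", lincrnas="Atha_lincRNAs.fasta",
         query="Atha", subject="Alyr", gff="Alyr_genes.gff",
         known_lincrnas="Alyr_known_lincRNAs.fasta"),
    dict(genome="Crub_genome.fasta", lincrnas="Atha_lincRNAs.fasta",
         query="Atha", subject="Crub",
         known_lincrnas="Crub_known_lincRNAs.fasta"),
    dict(genome="Lala_genome.fasta", lincrnas="Atha_lincRNAs.fasta",
         query="Atha", subject="Lala"),
]
```

Calling `write_blasting_file("BLASTing_list.txt", EXAMPLE_ROWS)` would write the four Arabidopsis rows; the output file name here is an assumption, so use whatever name your Evolinc-II setup expects.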
-
Do not end a row with a tab; instead, end every row with a newline character (Enter or Return). Windows users should make sure each row ends with a line feed (LF) and not a carriage return (CR). Windows editors typically end lines with CR+LF, and Linux tools do not treat CR as a line break, so the stray CR ends up as part of the last column. In Notepad++, enabling "View > Show Symbol > Show All Characters" displays line feeds as "LF" and carriage returns as "CR"; each row must end with "LF" only. Mac and Linux users just need to make sure they press Enter at the end of each row.
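If you are unsure whether a file follows these rules, a small check can help. The sketch below is a hypothetical helper, not part of Evolinc-II: `find_line_problems` reports CR characters, trailing tabs, and a last row without a terminating LF, and `to_lf` converts CR+LF or lone CR endings to LF. It reads the file in binary mode so that CR bytes are not hidden by newline translation.

```python
# Hypothetical checker (not part of Evolinc-II) for the row-ending rules above.


def find_line_problems(data):
    """Return (line_number, problem) pairs for the raw bytes of a file."""
    problems = []
    lines = data.split(b"\n")
    for number, line in enumerate(lines, start=1):
        if b"\r" in line:
            problems.append((number, "carriage return (CR)"))
        if line.rstrip(b"\r").endswith(b"\t"):
            problems.append((number, "trailing tab"))
    # A file that does not end in LF leaves its last row unterminated.
    if data and not data.endswith(b"\n"):
        problems.append((len(lines), "missing LF at end of row"))
    return problems


def to_lf(data):
    """Convert CR+LF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def check_file(path):
    with open(path, "rb") as handle:  # binary mode keeps CR bytes visible
        return find_line_problems(handle.read())
```

For example, `find_line_problems(b"a\tb\r\nc\t\n")` flags a CR on row 1 and a trailing tab on row 2.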
-
Note that the file must always include a row that searches the query lincRNAs against the query genome itself (the first row in each example above).
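One way to confirm this row is present is sketched below. It is a hypothetical check, not part of Evolinc-II, and it assumes the query-versus-query row can be recognised by identical query and subject abbreviations in columns 3 and 4 (as in "Atha ... Atha" above).

```python
# Hypothetical check (not part of Evolinc-II): confirm the query-vs-query row.
# Assumption: that row is identified by matching query and subject
# abbreviations in columns 3 and 4 (e.g. "Atha ... Atha").


def has_query_self_search(lines):
    """Return True if any tab-separated row pairs the query with itself."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4 and fields[2] == fields[3]:
            return True
    return False


def check_blasting_file(path):
    # newline="" keeps original line endings; rstrip above removes them.
    with open(path, newline="") as handle:
        return has_query_self_search(handle)
```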
-
Also note the abbreviation scheme: for simplicity, Evolinc-II enforces a four-letter code for every species.
-
The four letters chosen can be arbitrary as long as they are unique, but we recommend using the first letter of the genus followed by the first three letters from the species. For example, Arabidopsis thaliana is Atha.
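The recommended scheme can be sketched in Python as follows. `suggest_abbreviation` and `check_abbreviations` are hypothetical helpers for illustration, not Evolinc-II functions, and treating "four letters" as four alphabetic characters is an assumption. Pass `check_abbreviations` one code per distinct species; if two species produce the same suggested code, choose a different four-letter code for one of them.

```python
# Hypothetical helpers (not part of Evolinc-II) for four-letter species codes.


def suggest_abbreviation(species_name):
    """First letter of the genus + first three letters of the species epithet."""
    genus, epithet = species_name.split()[:2]
    return genus[0].upper() + epithet[:3].lower()


def check_abbreviations(codes):
    """Raise ValueError unless every code is four letters and all are unique."""
    for code in codes:
        # Assumption: "four letters" means four alphabetic characters.
        if len(code) != 4 or not code.isalpha():
            raise ValueError(f"{code!r} is not a four-letter code")
    if len(set(codes)) != len(codes):
        raise ValueError("species abbreviations must be unique")
```

For the example above, this yields Atha, Alyr, Crub, and Lala.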