add sample annotation docs

gagneurlab · Oct 7, 2024 · d946b01
1 parent 307b561
Showing 1 changed file with 29 additions and 1 deletion.
diff --git a/docs/source/prepare.rst b/docs/source/prepare.rst
@@ -169,7 +169,7 @@ Calling variants on RNA-seq data may be useful for researchers who do not have a
 The RNA variant calling process uses information from multiple samples (as designated by the ``groups`` variable) to improve the quality of the called variants. However, the larger the group, the more costly the computation in terms of time and resources. To prioritize accuracy, include many samples in each ``DROP_GROUP``; to prioritize speed, split the samples into many smaller groups. Additionally, certain VCF and BED files must be provided to further improve the quality of the called variants (refer to `files-to-download`_).
 
 =====================  =========  ================================================================================================================================================================================================  =========
-Parameter              Type       Description                                                                                                                                                                    Default/Examples
+Parameter              Type       Description                                                                                                                                                                                       Default/Examples
 =====================  =========  ================================================================================================================================================================================================  =========
 run                    boolean    If true, the module will be run. If false, it will be ignored.                                                                                                                                    ``true``
 groups                 list       Same as in aberrant expression.                                                                                                                                                                   ``# see aberrant expression example``
@@ -226,6 +226,34 @@ column order does not matter. Also, it does not matter where it is stored, as th
 specified in the config file. Here we provide some examples on how to deal with certain
 situations. For simplicity, we do not include all possible columns in the examples.
 
+=====================  =========  ================================================================================================================================================================================================  ==========================
+Parameter              Type       Description                                                                                                                                                                                       Default/Examples
+=====================  =========  ================================================================================================================================================================================================  ==========================
+RNA_ID                 character  Unique identifier from an RNA assay.                                                                                                                                                              ``sample1``
+RNA_BAM_FILE           character  Absolute path of the BAM file derived from RNA-seq. A BAM file can belong to only one RNA_ID and vice versa.                                                                                      ``path/to/sample1.bam``                                                                          
+DNA_VCF_FILE           character  Absolute path to the corresponding VCF. The DNA_ID has to match the ID inside the VCF file. In case a multisample VCF is used, write the file name for each sample.                               ``path/to/sample1.vcf``
+DNA_ID                 character  Unique identifier from a DNA assay.                                                                                                                                                               ``sample1``
+DROP_GROUP             list       The analysis group(s) that the RNA assay belongs to. Multiple groups must be separated by commas and no spaces (e.g. blood,WES,groupA). We recommend doing a different analysis for each tissue 
+                                  as gene expression and splicing can be tissue specific.                                                                                                                                           ``group1,group2``
+PAIRED_END             boolean    Either TRUE or FALSE, depending on whether the sample comes from paired-end RNA-seq or not.                                                                                                       ``TRUE``
+COUNT_MODE             character  Either ``Union``, ``IntersectionStrict`` or ``IntersectionNotEmpty``. Refer to the documentation of HTSeq for details.                                                                            ``IntersectionStrict``
+COUNT_OVERLAPS         character  Either TRUE or FALSE, depending on whether reads overlapping different regions are allowed and counted.                                                                                           ``TRUE``
+STRAND                 character  Either yes, no, or reverse: ``no`` means that the sequencing was not strand specific; ``yes`` that it was strand specific, and the first read in the pair is on the same strand as the feature 
+                                  and the second read on the opposite strand; and ``reverse`` that the sequencing is strand specific and the first read in the pair is on the opposite strand to the feature and the second read 
+                                  on the same strand.                                                                                                                                                                               ``no``
+HPO_TERMS              list       Comma-separated phenotypes encoded as HPO terms.                                                                                                                                                  ``HP:0001479, HP:0005591``
+GENE_COUNTS_FILE       character  (Only required for aberrant expression external samples) Location of external gene-level count matrix.                                                                                            ``/path/to/gene_counts/``
+GENE_ANNOTATION        character  (Only required for aberrant expression external samples) Gene annotation used to obtain the count matrix. Must match a key in the geneAnnotation parameter of the config file.                    ``v29``
+GENOME                 character  (Optional) Either ``ncbi`` or ``ucsc`` indicating the reference genome assembly.                                                                                                                  ``ncbi``
+SPLICE_COUNTS_DIR      character  (Only required for aberrant splicing external samples) Location of external files required for aberrant splicing module as explained above.                                                       ``/path/to/splicing_dir/``
+SEX                    character  (Optional) Either ``m``, ``male``, ``f``, ``female`` or ``unknown``. When provided, a sex matching algorithm is run to match provided sex values to BAM files and predict the SEX value for
+                                  unknown samples.                                                                                                                                                                                  ``m``
+TISSUE                 character  (Optional)                                                                                                                                                                                        ``BRAIN``
+DISEASE                character  (Optional)                                                                                                                                                                                        ``AML``
+=====================  =========  ================================================================================================================================================================================================  ==========================
+
+
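Review note: the value constraints listed in the new table can be checked mechanically before a run. The following is a minimal sketch, not part of DROP. It assumes the sample annotation is a tab-separated file with a header row, and the function name ``check_sample_annotation`` is hypothetical. It only encodes the rules stated in the table: the allowed values, no spaces in ``DROP_GROUP``, and a one-to-one mapping between ``RNA_ID`` and ``RNA_BAM_FILE``.

```python
import csv
import io

# Allowed values, taken from the column descriptions in the table above.
ALLOWED = {
    "PAIRED_END": {"TRUE", "FALSE"},
    "COUNT_MODE": {"Union", "IntersectionStrict", "IntersectionNotEmpty"},
    "COUNT_OVERLAPS": {"TRUE", "FALSE"},
    "STRAND": {"yes", "no", "reverse"},
    "GENOME": {"ncbi", "ucsc"},
    "SEX": {"m", "male", "f", "female", "unknown"},
}


def check_sample_annotation(text):
    """Return a list of problems found in a sample annotation table.

    Assumption: `text` is the content of a tab-separated file with a header.
    """
    problems = []
    bam_to_rna = {}  # first RNA_ID seen for each BAM file
    rna_to_bam = {}  # first BAM file seen for each RNA_ID
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column, allowed in ALLOWED.items():
            value = row.get(column) or ""
            # Empty or missing cells are skipped: several columns are optional.
            if value and value not in allowed:
                problems.append(
                    f"line {line_no}: {column}={value!r} not in {sorted(allowed)}"
                )
        # Multiple groups must be separated by commas and no spaces.
        if " " in (row.get("DROP_GROUP") or ""):
            problems.append(f"line {line_no}: DROP_GROUP must not contain spaces")
        rna_id = row.get("RNA_ID") or ""
        bam = row.get("RNA_BAM_FILE") or ""
        if rna_id and bam:
            # A BAM file can belong to only one RNA_ID and vice versa.
            if bam_to_rna.setdefault(bam, rna_id) != rna_id:
                problems.append(f"line {line_no}: {bam} is used by several RNA_IDs")
            if rna_to_bam.setdefault(rna_id, bam) != bam:
                problems.append(f"line {line_no}: {rna_id} has several BAM files")
    return problems
```

Calling ``check_sample_annotation(open(path).read())`` on an annotation file returns an empty list when none of these rules is violated. The sketch does not check file existence or whether ``DNA_ID`` matches the ID inside the VCF.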
 
 Using External Counts
 ++++++++++++++++++++++++++++++++++