Organize ReDeeM full data

If the full experimental protocol was followed (ReDeeM protocol), three modalities are generated. Here are some tips for organizing the data into a convenient hierarchy for downstream analysis.

Locate the folder(s) containing the FASTQ files for one experiment
Generate a Data.summary file (a 4-column, comma-separated text file) in the same folder to annotate each FASTQ file. It is needed for the next step. The columns are: fastq_file_name,sample_name,modelity,trim_parameter

fastq_file_name (The original FASTQ file name)
sample_name (A meaningful name for each sample, usually referring to one 10X lane)
modelity (ATAC, RNA, or Mito. Note: Mito is optional and applies only to sequencing results from an enriched mito library; in that case set mitofq=True in the configuration file. If the mito and ATAC libraries were sequenced together, i.e., there is no sample barcode to separate them, that is still fine: annotate the FASTQ as ATAC and it will be used for the mito analysis as well. In the configuration file, set mitofq=True)
trim_parameter (The number of bases to keep from the beginning of each read; write "notrim" for no trimming. The value depends on the actual sequencing length; the target lengths are listed below)

RNA (R1: 28nt, i7: 10nt, i5: 10nt, R2:90nt)

ATAC(R1: 50nt, i7: 8nt, i5: 24nt, R2:50nt)

ATAC(R1: 150nt, i7: 8nt, i5: 24nt, R2:150nt)
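To make the meaning of trim_parameter concrete, here is a minimal Python sketch of what keeping the first N bases of a read amounts to. It is only an illustration: the function name trim_fastq_record is hypothetical and this is not how prep.py performs the trimming.

```python
def trim_fastq_record(record, keep):
    """Keep the first `keep` bases of one 4-line FASTQ record.

    `record` is a (header, sequence, plus, quality) tuple. `keep` is either
    the string "notrim" or a number of bases, as in the trim_parameter column.
    Hypothetical helper for illustration; not part of REDEEM-V.
    """
    header, seq, plus, qual = record
    if keep == "notrim":
        return record
    n = int(keep)
    # Sequence and quality strings are cut together so they stay the same length.
    return (header, seq[:n], plus, qual[:n])


record = ("@read1", "ACGTACGTAC", "+", "IIIIIIIIII")
trimmed = trim_fastq_record(record, "4")
print(trimmed[1], trimmed[3])  # ACGT IIII
```

A read that is already shorter than the requested length is returned unchanged by the slicing, which is why the parameter only needs to match the target length.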

Below is an example of Data.summary

L508_16_S1_L001_R1_001.fastq.gz,SRN,RNA,notrim
L508_16_S1_L001_R2_001.fastq.gz,SRN,RNA,notrim
L508_16_S1_L002_R1_001.fastq.gz,SRN,RNA,notrim
L508_16_S1_L002_R2_001.fastq.gz,SRN,RNA,notrim
L508_17_S2_L001_R1_001.fastq.gz,INF,RNA,notrim
L508_17_S2_L001_R2_001.fastq.gz,INF,RNA,notrim
L508_17_S2_L002_R1_001.fastq.gz,INF,RNA,notrim
L508_17_S2_L002_R2_001.fastq.gz,INF,RNA,notrim
L508_18_S3_L001_R1_001.fastq.gz,D100,RNA,notrim
L508_18_S3_L001_R2_001.fastq.gz,D100,RNA,notrim
L508_18_S3_L002_R1_001.fastq.gz,D100,RNA,notrim
L508_18_S3_L002_R2_001.fastq.gz,D100,RNA,notrim
L508_7_S1_L004_R1_001.fastq.gz,SRN,ATAC,50
L508_7_S1_L004_R2_001.fastq.gz,SRN,ATAC,notrim
L508_7_S1_L004_R3_001.fastq.gz,SRN,ATAC,50
L508_8_S2_L004_R1_001.fastq.gz,INF,ATAC,50
L508_8_S2_L004_R2_001.fastq.gz,INF,ATAC,notrim
L508_8_S2_L004_R3_001.fastq.gz,INF,ATAC,50
L508_9_S3_L004_R1_001.fastq.gz,D100,ATAC,50
L508_9_S3_L004_R2_001.fastq.gz,D100,ATAC,notrim
L508_9_S3_L004_R3_001.fastq.gz,D100,ATAC,50
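Because a malformed Data.summary is easy to produce by hand, a quick sanity check before running the next step may help. The sketch below is a hypothetical helper (check_summary is not part of REDEEM-V) and assumes the rules described above: four comma-separated columns, a modality of ATAC, RNA or Mito, and a trim value that is either "notrim" or a positive integer.

```python
import csv
import io

VALID_MODALITIES = {"ATAC", "RNA", "Mito"}


def check_summary(text):
    """Return a list of problems found in the content of a Data.summary file."""
    problems = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            problems.append(f"line {lineno}: expected 4 columns, got {len(row)}")
            continue
        fastq, sample, modality, trim = (field.strip() for field in row)
        if not fastq or not sample:
            problems.append(f"line {lineno}: empty file name or sample name")
        if modality not in VALID_MODALITIES:
            problems.append(f"line {lineno}: unknown modality {modality!r}")
        if trim != "notrim" and not (trim.isdigit() and int(trim) > 0):
            problems.append(f"line {lineno}: bad trim parameter {trim!r}")
    return problems


example = (
    "L508_16_S1_L001_R1_001.fastq.gz,SRN,RNA,notrim\n"
    "L508_7_S1_L004_R1_001.fastq.gz,SRN,ATAC,50\n"
    "L508_7_S1_L004_R2_001.fastq.gz,SRN,atac\n"
)
for problem in check_summary(example):
    print(problem)
```

To check a real file, pass its content, for example check_summary(open("Data.summary").read()).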

Navigate to the folder where you will work on the whole dataset
Download REDEEM-V: git clone https://github.com/chenweng1991/REDEEM-V.git
Assign the path: REDEEM_V=ThePathToREDEEM-V #The location where REDEEM-V was downloaded to
Create a configuration file prepdata.ini containing the following information

[Input]
fq_folders= Path_To_Your_FASTQ_Folder  #You can add more folders separated by comma
[Parameters]
mitofq=False # Do you have mitochondrial specific fastq files?
parallel=True
[output]
out=Path_To_Your_Output # Should be a folder that already exists. It can be the current folder
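Before launching the preparation step, it can be useful to confirm that the configuration file parses as intended. The sketch below reads the same layout with Python's standard configparser. How prep.py itself parses the file is not shown here, so treat this as an assumption; read_prep_config and the /data/run1 and /data/run2 paths are hypothetical.

```python
import configparser

# Hypothetical example content following the prepdata.ini layout above.
EXAMPLE = """\
[Input]
fq_folders= /data/run1, /data/run2  #You can add more folders separated by comma
[Parameters]
mitofq=False # Do you have mitochondrial specific fastq files?
parallel=True
[output]
out=. # Should be a folder that already exists
"""


def read_prep_config(text):
    """Parse prepdata.ini content into a plain dictionary."""
    # inline_comment_prefixes makes configparser drop the trailing "# ..." notes.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    raw_folders = parser["Input"]["fq_folders"].split(",")
    return {
        "fq_folders": [f.strip() for f in raw_folders if f.strip()],
        "mitofq": parser["Parameters"].getboolean("mitofq"),
        "parallel": parser["Parameters"].getboolean("parallel"),
        "out": parser["output"]["out"],
    }


config = read_prep_config(EXAMPLE)
print(config)
```

Section names are case-sensitive in configparser, so [Input], [Parameters] and [output] must be written exactly as in the template.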

Run prep.py

python REDEEM-V/PrepData/prep.py prepdata.ini > PrepData.log

Note: this will submit several fastx jobs to the background, so check top or htop for the running jobs

Finally, this will generate one folder for each sample, corresponding to the sample names in Data.summary. Each folder will look like the one below

INF/
├── CellRanger
├── FASTQ
│   ├── ATAC
│   ├── Mito
│   └── RNA
└── Mito
    ├── Enrich
    └── WholeATAC

Note: the FASTQ files are organized in the folders under FASTQ. cellranger will be run under CellRanger, and the mito analysis will be run under Mito
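Once the background jobs have finished, the layout of each sample folder can be checked with a few lines of Python. This is a hypothetical sketch (missing_subfolders is not part of REDEEM-V), and it assumes every sample folder contains exactly the tree shown above; whether FASTQ/Mito and Mito/Enrich are created when mitofq=False is not confirmed here.

```python
import tempfile
from pathlib import Path

# Subfolders taken from the example tree for sample INF.
EXPECTED = [
    "CellRanger",
    "FASTQ/ATAC",
    "FASTQ/Mito",
    "FASTQ/RNA",
    "Mito/Enrich",
    "Mito/WholeATAC",
]


def missing_subfolders(sample_dir):
    """Return the expected subfolders that do not exist under sample_dir."""
    sample_dir = Path(sample_dir)
    return [rel for rel in EXPECTED if not (sample_dir / rel).is_dir()]


# Demonstration on a temporary folder that lacks Mito/WholeATAC.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "INF"
    for rel in EXPECTED[:-1]:
        (sample / rel).mkdir(parents=True)
    print(missing_subfolders(sample))  # ['Mito/WholeATAC']
```

For a real run, call missing_subfolders on each sample folder inside the output path given in prepdata.ini.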
