Organize ReDeeM full data
chenweng1991 edited this page Jul 17, 2023
·
1 revision
If the full experimental protocol (ReDeeM protocol) was followed, three modalities are generated. Here are some tips for organizing the data into a convenient hierarchy for downstream analysis.
- Locate the folder(s) containing the FASTQ files for one experiment
- Generate a Data.summary (a 4-column, comma-separated text file) in the same folder to annotate each FASTQ file. It is needed for the next step
fastq_file_name,sample_name,modelity,trim_parameter
- fastq_file_name (The original FASTQ file name)
- sample_name (A meaningful name for each sample, usually referring to one 10X lane)
- modelity (ATAC, RNA, or Mito. Note: Mito is optional; it is only for sequencing results from an enriched mito library. In that case, set mitofq=True in the configuration file. If the mito and ATAC libraries are mixed together in sequencing, i.e., there is no sample barcode to separate them, that is still fine. The FASTQ is simply annotated as ATAC, but it will be used for mito analysis as well. In the configuration file, set mitofq=True)
- trim_parameter (The number of bases to keep from the beginning of each read; if no trimming is needed, enter "notrim". The value depends on the actual sequencing length; the target lengths are listed below)
- RNA (R1: 28nt, i7: 10nt, i5: 10nt, R2:90nt)
- ATAC(R1: 50nt, i7: 8nt, i5: 24nt, R2:50nt)
- ATAC(R1: 150nt, i7: 8nt, i5: 24nt, R2:150nt)
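To make the trim_parameter column concrete, here is a minimal Python sketch of what keeping the first N bases of a read means. The function name `trim_read` is hypothetical and is not part of REDEEM-V; the pipeline itself runs fastx jobs for this step. The sketch only illustrates the assumed semantics: a number keeps that many leading bases, and "notrim" leaves the read unchanged.

```python
def trim_read(sequence, quality, trim_parameter):
    """Keep the first N bases of a read, or leave it untouched for "notrim".

    Illustrative only: this mirrors the assumed meaning of trim_parameter,
    not the actual REDEEM-V implementation.
    """
    if trim_parameter == "notrim":
        return sequence, quality
    keep = int(trim_parameter)
    # The quality string must be cut to the same length as the sequence.
    return sequence[:keep], quality[:keep]
```

For example, a 150nt ATAC read annotated with 50 would be cut down to its first 50 bases, while the index read annotated with notrim is passed through as-is.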
Below is an example of Data.summary
L508_16_S1_L001_R1_001.fastq.gz,SRN,RNA,notrim
L508_16_S1_L001_R2_001.fastq.gz,SRN,RNA,notrim
L508_16_S1_L002_R1_001.fastq.gz,SRN,RNA,notrim
L508_16_S1_L002_R2_001.fastq.gz,SRN,RNA,notrim
L508_17_S2_L001_R1_001.fastq.gz,INF,RNA,notrim
L508_17_S2_L001_R2_001.fastq.gz,INF,RNA,notrim
L508_17_S2_L002_R1_001.fastq.gz,INF,RNA,notrim
L508_17_S2_L002_R2_001.fastq.gz,INF,RNA,notrim
L508_18_S3_L001_R1_001.fastq.gz,D100,RNA,notrim
L508_18_S3_L001_R2_001.fastq.gz,D100,RNA,notrim
L508_18_S3_L002_R1_001.fastq.gz,D100,RNA,notrim
L508_18_S3_L002_R2_001.fastq.gz,D100,RNA,notrim
L508_7_S1_L004_R1_001.fastq.gz,SRN,ATAC,50
L508_7_S1_L004_R2_001.fastq.gz,SRN,ATAC,notrim
L508_7_S1_L004_R3_001.fastq.gz,SRN,ATAC,50
L508_8_S2_L004_R1_001.fastq.gz,INF,ATAC,50
L508_8_S2_L004_R2_001.fastq.gz,INF,ATAC,notrim
L508_8_S2_L004_R3_001.fastq.gz,INF,ATAC,50
L508_9_S3_L004_R1_001.fastq.gz,D100,ATAC,50
L508_9_S3_L004_R2_001.fastq.gz,D100,ATAC,notrim
L508_9_S3_L004_R3_001.fastq.gz,D100,ATAC,50
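Because a malformed Data.summary is easy to produce by hand, a quick sanity check before running the next step may help. The following is a minimal sketch, not part of REDEEM-V: the function name `check_data_summary` and the validation rules are assumptions derived from the column descriptions above (4 comma-separated columns, a modality of ATAC, RNA, or Mito, and a trim parameter that is either "notrim" or a positive integer).

```python
import csv

# Allowed values taken from the column descriptions; the check is an assumption.
ALLOWED_MODALITIES = {"ATAC", "RNA", "Mito"}


def check_data_summary(lines):
    """Return a list of (line_number, message) problems found in Data.summary rows.

    `lines` can be an open file handle or any iterable of text lines.
    """
    problems = []
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            problems.append((lineno, f"expected 4 columns, got {len(row)}"))
            continue
        fastq, sample, modality, trim = (field.strip() for field in row)
        if not fastq or not sample:
            problems.append((lineno, "empty file name or sample name"))
        if modality not in ALLOWED_MODALITIES:
            problems.append((lineno, f"unknown modality {modality!r}"))
        if trim != "notrim" and not (trim.isdigit() and int(trim) > 0):
            problems.append((lineno, f"bad trim_parameter {trim!r}"))
    return problems
```

Passing an open handle to Data.summary returns an empty list when every row matches the expected shape; otherwise each entry names the offending line.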
- Navigate to the folder where you will work on the whole dataset
- Download REDEEM-V
git clone https://github.com/chenweng1991/REDEEM-V.git
- Assign path
REDEEM_V=ThePathToREDEEM-V #The location where REDEEM-V is downloaded to
- Create a configuration file prepdata.ini including the following information
[Input]
fq_folders= Path_To_Your_FASTQ_Folder #You can add more folders separated by comma
[Parameters]
mitofq=False # Do you have mitochondrial specific fastq files?
parallel=True
[output]
out=Path_To_Your_Output # Should be a folder that already exists. It can be the current folder
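To double-check a prepdata.ini before running the pipeline, the file can be read with Python's standard `configparser`. This is a minimal sketch under stated assumptions: how prep.py itself parses the file is not documented here, and the helper name `read_prep_config` is hypothetical. Note that `configparser` only strips trailing `#` comments when `inline_comment_prefixes` is set, which the sketch does so that the annotated example above parses cleanly.

```python
import configparser

# The example configuration from above, including its inline comments.
EXAMPLE = """\
[Input]
fq_folders= Path_To_Your_FASTQ_Folder #You can add more folders separated by comma
[Parameters]
mitofq=False # Do you have mitochondrial specific fastq files?
parallel=True
[output]
out=Path_To_Your_Output # Should be a folder that already exists
"""


def read_prep_config(text):
    """Parse prepdata.ini text into a plain dict (illustrative, not prep.py's parser)."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    folders = parser["Input"]["fq_folders"].split(",")
    return {
        "fq_folders": [f.strip() for f in folders if f.strip()],
        "mitofq": parser["Parameters"].getboolean("mitofq"),
        "parallel": parser["Parameters"].getboolean("parallel"),
        "out": parser["output"]["out"],
    }
```

Calling `read_prep_config(EXAMPLE)` yields the folder list, the two boolean flags, and the output path, which makes it easy to spot a mistyped section name or a non-boolean flag value.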
- Run prep.py
python REDEEM-V/PrepData/prep.py prepdata.ini > PrepData.log
Note: this will submit several fastx jobs to the background, so check top or htop for the running jobs
- Finally, this will generate one folder for each sample, corresponding to the sample names in Data.summary. Each folder will look like the example below
INF/
├── CellRanger
├── FASTQ
│   ├── ATAC
│   ├── Mito
│   └── RNA
└── Mito
    ├── Enrich
    └── WholeATAC
Note: the FASTQ files are organized in the folders under FASTQ. Cell Ranger will be run under CellRanger, and the mito analysis will be run under Mito
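Once the background jobs have finished, the layout of each sample folder can be checked with a short script. This is a minimal sketch: the helper name `missing_subfolders` is hypothetical, and the list of expected sub-folders is copied from the example tree above. Whether every sub-folder (for instance FASTQ/Mito) is created when mitofq=False is an assumption that should be verified against your own run.

```python
from pathlib import Path

# Sub-folders copied from the example tree; treat the list as an assumption.
EXPECTED_SUBFOLDERS = [
    "CellRanger",
    "FASTQ/ATAC",
    "FASTQ/Mito",
    "FASTQ/RNA",
    "Mito/Enrich",
    "Mito/WholeATAC",
]


def missing_subfolders(output_dir, sample_name):
    """Return the expected sub-folders that do not exist for one sample."""
    sample_dir = Path(output_dir) / sample_name
    return [sub for sub in EXPECTED_SUBFOLDERS if not (sample_dir / sub).is_dir()]
```

An empty list for a sample such as INF means its folder matches the example hierarchy; any names returned point to sub-folders that were not created.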