Background

Accurate measurement of fetal fraction is important for ensuring reliable results in noninvasive prenatal testing. However, measuring fetal fraction can require a large amount of data and incur additional costs. This study therefore proposes an alternative method for measuring fetal fraction with a limited sample size and few sequencing reads. Adaptive machine learning algorithms, customized to each laboratory's environment, were used to measure fetal fraction. The models were tested on pregnant women carrying female fetuses to exclude the bias caused by training on data from women carrying male fetuses. The accuracy of fetal DNA fraction prediction improved as the training sample size increased. When trained with 1,000 samples (male fetuses) and tested with 45 samples (female fetuses), the optimal bin sizes for the read count and read size features were 300 kb and 800 kb, respectively. Compared with the 50 kb bin used by SeqFF, the new 300 kb bin gave a correlation approximately 3-5% higher at 4,000–5,000 training samples. We propose an effective, tailored method for measuring fetal fraction that individual laboratories can use under limited sample collection conditions and with relatively low-coverage sequencing data.

For more details, please see our paper.

`User Manual`

Required Files in the Folders

`The bin info files in RC and RL folders` are named like rc_bin... and rl_bininfo..., and they have no header row. Each line holds a file name and a value, separated by a comma:

Sample1.Fastq.sam.bam.sort.bam.rmdup.bam.sam.rl,19.05270566
Sample2.Fastq.sam.bam.sort.bam.rmdup.bam.sam.rl,17.65618359
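As a quick sanity check of a bin info file, the headerless two-column layout shown above can be read with the standard library. This is only a minimal sketch: the function name `read_bininfo` is hypothetical, and the meaning of the second column (presumably a per-sample reference value such as fetal fraction) is an assumption, since it is not stated here.

```python
import csv
import io


def read_bininfo(handle):
    """Read a headerless 'file_name,value' CSV into a dict (hypothetical helper)."""
    mapping = {}
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        # Column 0 is the rc/rl file name, column 1 is a numeric value.
        mapping[row[0].strip()] = float(row[1])
    return mapping


# The two example records from above, used here as in-memory test data.
example = io.StringIO(
    "Sample1.Fastq.sam.bam.sort.bam.rmdup.bam.sam.rl,19.05270566\n"
    "Sample2.Fastq.sam.bam.sort.bam.rmdup.bam.sam.rl,17.65618359\n"
)
bininfo = read_bininfo(example)
print(len(bininfo), "entries read")
```

For a real file, pass an open file object instead of the `io.StringIO` buffer, e.g. `open(path, newline="")`.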

Read Count (RC) and Read Length (RL) files must use the .rc and .rl formats, with the following headers.

"BIN","CHR","END","COUNT","GC"
chr1_0,chr1,300000,583,0.430783082518
chr1_1,chr1,600000,474,0.444418530072

"BIN","CHR","END","RRL"
chr1_0,chr1,800000,0.255024255024
chr1_1,chr1,1600000,0.262870514821
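Because errors are often caused by a wrong header, it can help to validate .rc and .rl files before training. The sketch below checks the header and parses the rows using only the standard library. The helper name `read_bin_table` is hypothetical and is not part of GenomomFF.

```python
import csv
import io

# Headers exactly as shown in the examples above.
RC_HEADER = ["BIN", "CHR", "END", "COUNT", "GC"]
RL_HEADER = ["BIN", "CHR", "END", "RRL"]


def read_bin_table(handle, expected_header):
    """Parse an .rc or .rl file and fail early if the header is wrong."""
    reader = csv.reader(handle)  # csv strips the double quotes around header names
    header = next(reader)
    if header != expected_header:
        raise ValueError(f"unexpected header {header!r}, expected {expected_header!r}")
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        record = dict(zip(header, row))
        record["END"] = int(record["END"])
        for key in header[3:]:  # COUNT and GC, or RRL
            record[key] = float(record[key])
        rows.append(record)
    return rows


rc_text = (
    '"BIN","CHR","END","COUNT","GC"\n'
    "chr1_0,chr1,300000,583,0.430783082518\n"
    "chr1_1,chr1,600000,474,0.444418530072\n"
)
rl_text = (
    '"BIN","CHR","END","RRL"\n'
    "chr1_0,chr1,800000,0.255024255024\n"
    "chr1_1,chr1,1600000,0.262870514821\n"
)
rc_rows = read_bin_table(io.StringIO(rc_text), RC_HEADER)
rl_rows = read_bin_table(io.StringIO(rl_text), RL_HEADER)
print(rc_rows[0]["BIN"], rc_rows[0]["COUNT"], rl_rows[0]["RRL"])
```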

Required dependencies

`Python 3`

pandas
numpy
scikit-learn

To install a Python library, run pip install <library name>, e.g. pip install pandas.

`R`

doParallel
glmnet
Matrix
MASS
methods

install.packages(c('Matrix', 'glmnet', 'MASS', 'foreach', 'doParallel'))

If an error occurs, please check the file formats, the installed packages, and the paths to Python and R on your system.
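The package and path checks mentioned above can be partly automated. The following is a minimal sketch, assuming that the import name for the scikit dependency is `sklearn` (scikit-learn) and that R is reachable through the `Rscript` executable. Both are assumptions about the setup, not statements about how GenomomFF itself locates them.

```python
import importlib.util
import shutil

# Import names assumed for the Python dependencies listed above.
PYTHON_MODULES = ["pandas", "numpy", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def find_r():
    """Return the path of the Rscript executable on PATH, or None if absent."""
    return shutil.which("Rscript")


missing = missing_modules(PYTHON_MODULES)
print("Missing Python modules:", ", ".join(missing) if missing else "none")
print("Rscript found at:", find_r() or "not on PATH")
```

The R packages themselves still need to be checked from within R, for example by loading each one with `library()`.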

Preparing data for Training & Testing

Please check the file formats, locations, bin info file names, and headers.

Keep all sam files inside the sam folder, e.g. TheragenGenomecare/sam/.
Run the conversion script: python bam_rl_rc.py.
The script converts the sam files to Read Count (rc) and Read Length (rl) format files. This may take a long time, depending on the input data size.
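To illustrate what a read count file contains, the sketch below bins read start positions from SAM records into fixed-size bins and produces rows shaped like the BIN, CHR, END and COUNT columns of the .rc example. It is not the conversion script from this repository: the function names are hypothetical, the GC and read length features are omitted, and the 300 kb default is simply the read count bin size quoted in the Background.

```python
from collections import Counter


def iter_sam_positions(lines):
    """Yield (chromosome, 1-based start position) for each mapped SAM record."""
    for line in lines:
        if line.startswith("@"):  # SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[2], int(fields[3])  # RNAME and POS columns
        if chrom == "*" or pos == 0:  # unmapped read
            continue
        yield chrom, pos


def bin_read_counts(reads, bin_size=300_000):
    """Count reads per (chromosome, bin index); bin 0 covers positions 1..bin_size."""
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


def format_rows(counts, bin_size=300_000):
    """Build (BIN, CHR, END, COUNT) tuples, e.g. ('chr1_0', 'chr1', 300000, 2)."""
    return [
        (f"{chrom}_{index}", chrom, (index + 1) * bin_size, count)
        for (chrom, index), count in sorted(counts.items())
    ]


# Synthetic SAM records (11 mandatory fields each), for illustration only.
sam_lines = [
    "@SQ\tSN:chr1\tLN:248956422",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t300000\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t300001\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
rows = format_rows(bin_read_counts(iter_sam_positions(sam_lines)))
for row in rows:
    print(",".join(str(value) for value in row))
```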

Once the rc and rl files are ready, place them in the training and testing folders together with the corresponding bininfo files.
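Before training or testing, a short script can confirm that a folder actually contains the expected files. This is a sketch under assumptions: the helper `check_folder` is hypothetical, the search is recursive because the exact subfolder layout is not spelled out here, and the bin info files are matched only by the `rc_bin` and `rl_bininfo` prefixes given above.

```python
import tempfile
from pathlib import Path


def check_folder(folder):
    """Count rc/rl data files and bininfo files under a folder (recursive)."""
    folder = Path(folder)
    report = {
        "rc_files": len(list(folder.rglob("*.rc"))),
        "rl_files": len(list(folder.rglob("*.rl"))),
        "rc_bininfo": len(list(folder.rglob("rc_bin*"))),
        "rl_bininfo": len(list(folder.rglob("rl_bininfo*"))),
    }
    # Any category with zero matches is reported as a problem.
    problems = [name for name, count in report.items() if count == 0]
    return report, problems


# Demonstration on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_report, empty_problems = check_folder(root)
    for name in ("Sample1.rc", "Sample1.rl", "rc_bin_demo.csv", "rl_bininfo_demo.csv"):
        (root / name).write_text("")
    full_report, full_problems = check_folder(root)

print("Empty folder problems:", empty_problems)
print("Populated folder problems:", full_problems)
```

Run it once on the training folder and once on the testing folder; an empty problem list means every expected file type was found at least once.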

`Training the Model`

Run python GenomomFF_training.py in a terminal, from the folder where GenomomFF_training.py is located.

This may take a few minutes, depending on your data size.

For 1,000 sets of data, it took around 4 minutes on our system. When GenomomFF_training.py finishes successfully, it creates the rc and rl parameter files inside the training folder. These files are then used for testing the data.

`Testing the Data`

Please check the bininfo files, file formats, and locations inside the testing folder.

Run the testing code

python GenomomFF_testing.py

When the run finishes, a csv file with the correlation values is saved inside the testing folder.
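To interpret or cross-check the reported correlation values, the correlation between predicted and reference fetal fractions can be recomputed by hand. The sketch below implements the Pearson correlation coefficient. Whether the csv file reports Pearson correlation specifically is an assumption, and the numbers used here are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up reference and predicted fetal fractions (percent), for illustration.
reference = [19.1, 17.7, 8.4, 12.0, 10.5]
predicted = [18.2, 16.9, 9.5, 11.1, 11.8]
print(f"Pearson r = {pearson(reference, predicted):.3f}")
```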


This program runs on both Linux and Windows. We recommend Windows with Python 3.7 and R 3.6.

`Any questions, bugs, suggestions or errors are heartily welcome.`

Sunshin Kim (sunshinkim3@gmail.com)

Adh Krish (krishdb38@gmail.com)
