Skip to content

Nano2NGS: A framework for converting long-read sequencing data into NGS-liked mode for targeted mutation and metagenomic data analysis

Notifications You must be signed in to change notification settings

langjidong/Nano2NGS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 

Repository files navigation

Nano2NGS - Developed by Jidong Lang (langjidong@hotmail.com)

Nano2NGS: A Framework for Converting Nanopore Sequencing Data to NGS-Liked Sequencing Data for Hotspot Mutation Detection and Metagenomic Analysis

The Nano2NGS as a data analysis framework for hotspot mutation detection and metagenomic analysis based on long reads from nanopore sequencing. The Nano2NGS’s main idea is characterized by applying nanopore sequencing data to NGS-liked data analysis pipelines. Long reads can be converted into short reads and then processed through existing NGS analysis pipelines in combination with statistical methods. Nano2NGS not only effectively avoids false positive/negative results caused by non-random errors and unexpected insertions-deletions (indels) of nanopore sequencing data, improves the reads efficient utilization, but also breaks the barriers of data analysis methods between short-read sequencing and long-read sequencing. We hope Nano2NGS can serves as a reference method for nanopore sequencing data, and promotes higher application scope of nanopore sequencing technology in scientific research and clinical practice.

The Nano2NGS framework consists of two modules: Nano2NGS-Muta for hotspot mutation analysis and Nano2NGS-Meta for metagenomic analysis.

For the Nano2NGS-Muta modules:

Usage:

Option

    -fq		<Input File>			Input fq.gz file
    -read_len	<To NGS-Liked Read Length>	Extracted Nanopore read to NGS-Liked read length, default is 101bp
    -position_step  <Step Size>			The step size for hotspot site on the NGS-Liked reads, default is 10
    -configure	<Hotspot BED File>		Hotspot bed file, the format likes: chr\tstart\tend\tgene symbol\tmutation\ttranscription ID (optional)
    -process	<Number of process used>	N processes to use, default is 1
    -help					print HELP message

Example:

perl nano2ngs-muta.pl -fq nanopore.fq -read_len 101 -position_step 10 -configure Hotspot.bed -process 8

Note: You should prepare a check-hotspot.list file for calling final target hotspot mutation. The format was just like the hotspot bed file. Why this? Because one probe may contain two or more mutations, the configure file must choose the minimal position, others may be loss. So the check file should contail all the mutation positions.

For the Nano2-Meta modules:

Usage:

Option

    -fq 		<Input_File>    	Input .fastq .fq file's
    -read_len  	        <Extract Read Length>   Extract sequencing read length. Suggest more than 75bp.
    -time		<Number of Swipes>	The number of swipes for the extract the reads
    -help   					print HELP message

Example:

perl nano2ngs-meta.pl -fq test.fq -read_len 100 -time 10

Important Notes and Extensions: The current version of the Nano2NGS framework only converts long reads into a set of short reads, with no error correction for reads and/or bases. Therefore, although Nano2NGS can improve the reads efficient utilization, converted short reads may contain lower or similar error rates to the long reads. We recommend that users should evaluate whether it is suitable to use the Nano2NGS framework or Nano2NGS+NGS data analysis software/algorithms according to the purpose and actual needs of the research and the characteristics of Nano2NGS outputs. The Nano2NGS framework is compatible with NGS data analysis software, but regardless of the analysis algorithm and/or pipelines used, it is necessary to use actual data for limit of detection (LoD) analysis and background noise evaluation, as different analysis algorithms can produce different result features. The Nano2NGS as an analytical framework also requires further development and optimization regarding computing resource consumption, running time, and statistical algorithms, among others, to minimize running time and resource consumption while ensuring the accuracy of analysis. In the future, we will develop algorithms for the detection of copy number variation, structural variation, gene fusion, and gene expression.

BTW, if you have any good suggestions or comments for Nano2NGS, please don't hesitate to contact us, we look forward to communicating and discussing with you!

Publications

Lang J, Sun J, Yang Z, He L, He Y, Chen Y, Huang L, Li P, Li J, Qin L. Nano2NGS-Muta: a framework for converting nanopore sequencing data to NGS-liked sequencing data for hotspot mutation detection. NAR Genom Bioinform. 2022 Apr 21;4(2):lqac033. doi: 10.1093/nargab/lqac033. PMID: 35464239; PMCID: PMC9022462.

About

Nano2NGS: A framework for converting long-read sequencing data into NGS-liked mode for targeted mutation and metagenomic data analysis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published