Read (biology)
In DNA sequencing, a read is an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment. A typical sequencing experiment involves fragmenting the genome into millions of molecules, which are size-selected and ligated to adapters. The resulting set of fragments is called a sequencing library, and sequencing the library produces a set of reads.[1]
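The relationship between fragments, library, and reads can be illustrated with a toy simulation. The sketch below is a simplification under stated assumptions, not a model of any real protocol: the function names `make_library` and `sequence_library`, the example genome string, and the size window are all hypothetical, and adapters, sequencing errors, and strand orientation are ignored.

```python
import random


def make_library(genome, n_fragments, min_len, max_len, seed=0):
    """Cut random fragments from the genome, keeping only those in the size window."""
    if len(genome) < min_len:
        raise ValueError("genome is shorter than the minimum fragment length")
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    library = []
    while len(library) < n_fragments:
        start = rng.randrange(len(genome))
        length = rng.randint(min_len, max_len)
        fragment = genome[start:start + length]
        # Size selection: fragments truncated at the genome end may be too short.
        if len(fragment) >= min_len:
            library.append(fragment)
    return library


def sequence_library(library, read_length):
    """Produce one read per fragment, covering up to read_length bases from its start."""
    return [fragment[:read_length] for fragment in library]


# Hypothetical toy genome; real genomes are millions to billions of bases long.
genome = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGA"
library = make_library(genome, n_fragments=5, min_len=8, max_len=12)
reads = sequence_library(library, read_length=6)

for fragment, read in zip(library, reads):
    print(fragment, "->", read)
```

Here each read covers only part of its fragment because the assumed read length (6) is shorter than the smallest fragment (8). A read would cover the whole fragment if the read length were at least the fragment length.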
Read length
Sequencing technologies vary in the length of the reads they produce. Reads of 20–40 base pairs (bp) are referred to as ultra-short.[2] Typical sequencers produce read lengths in the range of 100–500 bp.[3] Pacific Biosciences platforms, by contrast, were reported in a 2012 platform comparison to produce read lengths of approximately 1500 bp.[4] Read length can affect the results of biological studies.[5] For example, longer reads improve the resolution of de novo genome assembly and the detection of structural variants. It is estimated that read lengths greater than 100 kilobases (kb) will be required for routine de novo human genome assembly.[6] Bioinformatic pipelines for analyzing sequencing data usually take read length into account.[7]
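As an illustration of how a pipeline might inspect read length, the following minimal sketch computes the length of each read in FASTQ-formatted text and labels it using the ranges quoted above. It assumes the common four-line-per-record FASTQ layout. The function names, the "other" label, and the example records are hypothetical and introduced only for this example.

```python
from statistics import mean


def read_lengths(fastq_text):
    """Return the sequence length of each record in four-line FASTQ text."""
    lines = fastq_text.strip().splitlines()
    # In four-line FASTQ, the sequence is the second line of every record.
    return [len(lines[i].strip()) for i in range(1, len(lines), 4)]


def length_class(n_bases):
    """Label a read length using the ranges quoted in the text above."""
    if 20 <= n_bases <= 40:
        return "ultra-short"
    if 100 <= n_bases <= 500:
        return "typical"
    return "other"  # hypothetical catch-all label, not a standard term


def fastq_record(name, sequence):
    """Build one FASTQ record with a placeholder quality string."""
    return f"@{name}\n{sequence}\n+\n{'I' * len(sequence)}\n"


# Three made-up reads of 30, 120 and 1500 bases.
example = (
    fastq_record("read1", "A" * 30)
    + fastq_record("read2", "ACGT" * 30)
    + fastq_record("read3", "G" * 1500)
)

lengths = read_lengths(example)
for n in lengths:
    print(n, length_class(n))
print("mean read length:", mean(lengths))
```

The sequence line is located by position rather than by searching for the `@` header character, because `@` can also appear as a quality score. The sketch would not handle FASTQ files that wrap sequences across multiple lines.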
References
- "Sequencing library: what is it?". Breda Genetics. 2016-08-12. Retrieved 23 July 2017.
- Chaisson, Mark J. (2009). "De novo fragment assembly with short mate-paired reads: Does the read length matter?". Genome Research. 19 (2): 336–346. doi:10.1101/gr.079053.108. PMC 2652199. PMID 19056694. Retrieved 23 July 2017.
- Junemann, Sebastian (2013). "Updating benchtop sequencing performance comparison". Nature Biotechnology. 31 (4): 294–296. doi:10.1038/nbt.2522. PMID 23563421.
- Quail, Michael A. (2012). "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers". BMC Genomics. 13 (1): 341. doi:10.1186/1471-2164-13-341. PMC 3431227. PMID 22827831.
- Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E.; Rosenfeld, Jeffrey A. (23 June 2015). "The impact of read length on quantification of differentially expressed genes and splice junction detection". Genome Biology. 16 (1): 131. doi:10.1186/s13059-015-0697-y. PMC 4531809. PMID 26100517.
- Chaisson, Mark J.P. (2015). "Genetic variation and the de novo assembly of human genomes". Nature Reviews Genetics. 16 (11): 627–640. doi:10.1038/nrg3933. PMC 4745987. PMID 26442640.
- Conesa, Ana; Madrigal, Pedro; Tarazona, Sonia; Gomez-Cabrero, David; Cervera, Alejandra; McPherson, Andrew; Szcześniak, Michał Wojciech; Gaffney, Daniel J.; Elo, Laura L.; Zhang, Xuegong; Mortazavi, Ali (26 January 2016). "A survey of best practices for RNA-seq data analysis". Genome Biology. 17 (1): 13. doi:10.1186/s13059-016-0881-8. PMC 4728800. PMID 26813401.