What Are the Benefits and Disadvantages of Long Reads Scaffolding Techniques?

Long reads: their purpose and place

Martin O Pollard,

Human Genetics - Wellcome Sanger Institute, Hinxton, Cambridge, UK

University of Cambridge - Department of Medicine, Addenbrookes Hospital, Box 157, Hills Road, Cambridge, UK

To whom correspondence should be addressed at: Global Health and Populations Group - Human Genetics, Morgan Building, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1HH, UK. Tel: +44 01223834244; Fax: +44 01223496802; Email: mp15@sanger.ac.uk

Abstract

In recent years long-read technologies have moved from being a niche and specialist field to a point of relative maturity likely to feature frequently in the genomic landscape. Analogous to next generation sequencing, the cost of sequencing using long-read technologies has materially dropped whilst the instrument throughput continues to increase. Together these changes present the prospect of sequencing large numbers of individuals with the aim of fully characterizing genomes at high resolution. In this article, we will endeavour to present an introduction to long-read technologies showing: what long reads are; how they are distinct from short reads; why long reads are useful and how they are being used. We will highlight the recent developments in this field, and the applications and potential of these technologies in medical research, and clinical diagnostics and therapeutics.

When Short Reads Are Not Enough

DNA is an extraordinarily compact storage medium, so small that developing ways to decode the sequence encoded in these molecules has been a topic of research for many years. The first method developed for sequencing DNA, often known as Sanger sequencing (1), was a low throughput process that detected bases by incorporation into a template strand, sequencing fragments of DNA up to 1000 bp long. The breakthrough allowing sequencing at scale finally came with the advent of next generation sequencing (NGS) technology, which employed massively parallel reactions for high throughput. While these technologies have been able to capture sequence from the majority of the genome and have found utility in the study of disease, their short reads and lack of contextual information have limited their utility in genome assembly and in resolving complex and repetitive regions of the genome.

The incremental improvements in read-length that this generation of technology can yield are subject to diminishing returns. Thus, to achieve substantial gains in mapping, assembly and phasing one must consider technology that provides an order of magnitude increase in read-length (2). Practically too, there are many important problems in genetics where a short read of DNA (<1000 base pairs) is insufficient (Table 1 and Fig. 1).

Table 1.

Advantages and applications of long-read sequencing

Limitations of short read data

Access to high GC content regions
Resolution of complex regions of the genome (e.g. MHC ^a)
Repetitive regions where short reads will not map uniquely
Systematic context-specific error modes
Structural variation, and large segmental duplications
Paralogous regions of the genome
Resolution of phase (read-based phasing)

Applications and advantages of long-read sequencing

De novo assembly from long reads to span the low complexity and repetitive regions, to create accurate assemblies (3).
Targeted sequencing of complex genomic and paralogous regions and resolution of phase for clinical applications, e.g. HLA ^b typing, ADPKD ^c (4).
Transcriptomics, allowing full length sequencing of isoforms and examination of splicing (5).
Detection of structural variants (e.g. segmental duplications, gene loss and fusion events)
Single molecule sequencing allows examination of clonal heterogeneity of pathogens, and immunogenic cells
Long-range characterization of methylation patterns

^a MHC: Major histocompatibility complex.

^b HLA: Histocompatibility leucocyte antigen.

^c ADPKD: Autosomal-dominant polycystic kidney disease.

Figure 1.

Behaviour of reads around genomic events. (A) Large insertion: short reads at the edge of the variant are soft-clipped. Reads within the insertion will be either unmapped or mapped incorrectly. Long reads will either span the insertion or have enough context to be marked as inserted sequence. (B) Large deletion: short reads spanning the deletion may be mismapped, or only one read of the pair may be marked as mapped, because the length measured against the reference indicates that the insert size deviates from the expected distribution. Long reads will span the gap and most will have enough context to call the deletion. (C) Copy number variation: where the read-length exceeds the length of the CNV region reads will map correctly. Shorter reads may be collapsed and show up as increased depth in a pileup or be marked as mapping poorly. (D) Inversion: reads will either be represented as a primary alignment with an inverted supplementary alignment or manifest as soft clipping around the edge of the inversion, with a reduction in depth where reads span the edge of the inversion.

Key to achieving high quality results with all long-read technologies is the use of high molecular weight DNA as a starting material. The utility of these methods depends on a long DNA fragment size, with DNA damage and fragmentation limiting the quality of data obtained. Specific protocols for DNA extraction such as the agarose gel protocol for BioNano are ideal to maximize yield from these methods.

Long-Read Technologies

Single molecule real time sequencing

The first long-read sequencing technology to achieve widespread deployment is the single molecule real time (SMRT) sequencing technology from Pacific Biosciences (PacBio). The SMRT system implemented in their Sequel and RS II platforms uses a massively parallel arrangement of polymerases, each bound to a single molecule of target DNA that has been circularized with a pair of hairpin sequencing adaptors (the SMRTbell) (Fig. 2A). Incorporation of labelled bases by a polymerase on the template strand causes fluorescence. The resulting signal is detected by a CCD camera via a zero-mode waveguide (6,7), yielding a combination of signal and time series information. Reads produced by this technology typically peak at 100 Kbp in length and a typical N50 on recent polymerases is ∼20 Kbp.
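The N50 quoted above is a length-weighted summary: it is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal sketch of the calculation follows; the function name and the example read lengths are illustrative assumptions, not values from any PacBio run.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical read lengths in bp, chosen only for illustration.
read_lengths = [2_000, 5_000, 8_000, 20_000, 25_000, 40_000]
print(n50(read_lengths))
```

Because the statistic is weighted by bases rather than by read count, a few very long reads can raise the N50 well above the median read length.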

Figure 2.

Long-read sequencing technologies. (A) PacBio SMRT sequencing. Double stranded DNA is first sheared and size selected to the desired length and then sequencing adaptors are annealed. The adaptors are bound to a sequencing primer and strand displacing polymerase which adheres to the bottom of a well containing a zero mode wave guide. Following a pre-extension period where the polymerase reaction is run in the dark, the fragment is illuminated with a laser and as each base in the sequencing solution is incorporated, the fluorophore is detected and the polymerase reaction displaces it, giving a time and intensity signal which is converted into a base call. (B) Oxford Nanopore Technology passes the DNA molecule through a nanopore attached to the flow cell surface membrane. As each base of the DNA molecule passes through the pore, changes to the current passing through the pore are detected and converted into a signal. The signal detected is passed to a recurrent neural network (RNN) which converts it into base calls. (C) 10X Genomics Chromium technology works by means of an emulsion droplet technology, where gel beads are mixed with high molecular weight genomic DNA and an enzyme. Within each gel bead DNA is sheared and barcoded, creating fragments which can then be sequenced with Illumina sequencing. The presence of the Chromium barcode then provides a mapper or assembler with linked-reads, allowing the relative spatial position of the fragments to be estimated. Components of figure reproduced with permission from Pacific Biosciences, Oxford Nanopore Technologies and 10X Genomics.

One complication of SMRT sequencing is the high error rate of this process relative to short read sequencing, at 11–14% depending on polymerase and chemistry. However, this error mode is stochastic (by contrast with other technologies), and can be mitigated by repeated measurements of the sequence. With PacBio sequencing, this is carried out by repeated forward and reverse sequencing passes over the circularized SMRTbell molecule (Fig. 2A). Adaptor sequences can be removed from the generated sequence to provide enough subreads to generate a highly accurate consensus of each molecule. This process is known as circular consensus sequencing and has been shown to reduce basecalling error substantially (8) whilst also enabling the strand specific calling of base modifications in unamplified DNA (9). When long DNA fragments are sequenced, these may not be parsed more than once in the SMRTbell; in this case, increasing coverage and then calling a consensus across reads can also achieve a reduced error rate, a method often used in polishing assemblies (10).
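Why repeated passes help with stochastic error can be illustrated with a deliberately crude model: assume each pass miscalls a given position independently with probability p, and that the consensus is wrong only when more than half of the passes are wrong. This is an assumption made for illustration, not PacBio's consensus algorithm; it ignores alignment and insertion/deletion errors, and it overestimates the error because wrong passes need not agree on the same wrong base. The value 0.13 is simply a point inside the 11–14% range quoted above.

```python
from math import comb


def majority_error(p, n):
    """Probability that more than half of n independent passes are wrong
    at a single position, given a per-pass error rate p.

    Use an odd n so that ties cannot occur.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Illustrative per-pass error rate (assumption, within the quoted range).
PER_PASS_ERROR = 0.13

for passes in (1, 3, 5, 9):
    print(passes, f"{majority_error(PER_PASS_ERROR, passes):.6f}")
```

Under these assumptions the consensus error falls quickly as passes are added. The same argument does not apply to systematic, context-specific errors, since those are not independent between passes.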

Oxford Nanopore Technologies

The next successful single molecule technology to hit the market was that produced by Oxford Nanopore Technologies (ONT) (11). This technology is based on passing a single strand of DNA through a nanopore with an enzyme attached, and measuring changes in the electrical signal across the pore (Fig. 2B). The signal is then amplified and measured to determine the bases that passed through. As the pore holds several bases at a time (typically 5-mers), overlapping k-mers that cause changes in raw current must be inferred and used to make base calls, a process which can be error prone. By measuring the shape of the molecule passing through the pore ONT not only reads the sequence of the DNA but, like SMRT, is also able to detect base modifications (12). However, unmodelled base modifications and systematic DNA context-specific errors (13) currently limit the utility of the technology.
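The overlapping k-mer view described above can be made concrete with a small sketch. It only enumerates the k-mers that would successively occupy a pore holding five bases; it does not model current levels or base calling, and the function name and example sequence are illustrative assumptions.

```python
def kmers(seq, k=5):
    """Return the overlapping k-mers of seq in order, sliding one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# With four bases there are 4**5 distinct 5-mers that a base caller
# working at this resolution has to tell apart.
NUM_DISTINCT_5MERS = 4 ** 5

# Hypothetical sequence fragment, for illustration only.
print(kmers("ACGTACG"))
print(NUM_DISTINCT_5MERS)
```

Each interior base contributes to five consecutive k-mers, which is why a single base call has to be inferred from several overlapping signal segments rather than read off one measurement.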

Oxford Nanopore MinION technology heralds the promise of a pocket size sequencer, with reads from ONT that can stretch into the hundreds of kilobases with appropriate DNA preparation, and megabase long reads that have been observed when a large number of flow cells have been used. There appears to be no intrinsic read-length limit for ONT, other than the size of DNA fragments. Recent improvements in technology, library preparation and throughput allowed the first human cell line (GM12878) to be sequenced on the MinION earlier this year (14). This study generated ultra-long reads (>800 Kbp), and suggested that the addition of a small amount of ultra-long-read coverage to existing assemblies may substantially improve resolution of contigs and haplotypes. While the error rate is comparable to SMRT sequencing, a component of the error is systematic and context-specific, limiting the ability to correct this by increasing coverage (13) and requiring polishing with other technologies instead.

ONT has developed a distinct strategy to mitigate stochastic error on their platform, focusing on the way that the template strand passes through the pore. ONT cannot simply circularize the DNA. Instead, both the template and complement strands of the DNA molecule are joined by a hairpin loop during library prep (2D) or tethered in such a fashion (1D²) as to permit sequential forward and reverse strand sequencing. Combining these data greatly enhances accuracy and reduces random error.

The use of nanopores as a nucleic acid sequencing technology is not entirely exclusive to ONT; at least one similar but distinct competing technology is also under development by Roche.

10X Genomics Chromium system

An alternative to the aforementioned single molecule sequencing methods is the 10X Genomics Chromium system. Whilst this is not technically a long-read sequencing technology, it is an important member of this ecosystem and can solve similar problems such as mapping, phasing and assembly (Fig. 2C). Chromium has a lower cost compared to ONT and SMRT because of the use of the nearly ubiquitous Illumina short reads in its sequencing process.

The basis of this technology (15) is the barcoding of large fragments of DNA (preferably >100 Kbp) in an initial digital droplet polymerase chain reaction (PCR) step. In each droplet, a single fragment is sheared and then tagged with a semi-unique molecular barcode (Fig. 2C). The resulting fragments are then sequenced like any other Illumina library. The barcode allows for determination of the relative spatial orientation of the tags, and allows phasing and assembly of contigs by combining information across multiple tags (15,16). Additionally, because the data provide spatial orientation across the genome, it is possible to use it to scaffold data from other methods (17).
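How a shared barcode can recover long-range context from short reads can be sketched as follows: reads carrying the same barcode that map close together on the same chromosome are assumed to derive from one original long fragment. This is a simplified illustration and not the 10X Genomics pipeline; the input tuple layout, the `max_gap` threshold and the function name are all hypothetical.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group mapped short reads into putative source molecules.

    reads: iterable of (barcode, chrom, pos) tuples (hypothetical layout).
    Reads sharing a barcode and chromosome are chained while consecutive
    positions are within max_gap bp of each other.
    Returns a list of (barcode, chrom, start, end, n_reads).
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Toy alignments, invented for illustration.
example_reads = [
    ("BC1", "chr1", 1_000),
    ("BC1", "chr1", 30_000),
    ("BC1", "chr1", 900_000),
    ("BC2", "chr1", 5_000),
    ("BC1", "chr1", 60_000),
]
for molecule in infer_molecules(example_reads):
    print(molecule)
```

Because barcodes are only semi-unique, a distance threshold of this kind is one way to separate distinct fragments that happen to share a barcode; the appropriate value would depend on the fragment size distribution of the library.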

Allied technologies

Allied technologies associated with long-read sequencing, such as optical mapping, HiC and similar, have been used to enhance the final results from sequencing. Optical mapping technologies such as BioNano Irys and Saphyr label DNA and then image the labelled DNA to generate genome maps. These genome maps are used to scaffold contigs produced by assembly (18) and also to detect large (>500 bp) structural variants and inversions. HiC can be used to analyse chromosomal conformation and is particularly useful in assigning assembled sequences to chromosomes (19).

The Utility of Long-Read Technology: Recent Developments

High resolution genome assemblies

Accurate assemblies of the genomes of organisms are crucial to understanding organismal diversity, speciation, evolution of species and the impact of genomic diversity on health and disease. The current human genome reference GRCh38 has been assembled from the DNA of multiple donors, and represents a mosaic of haplotypes. However, several studies have suggested that existing human reference genomes may not fully reflect the diversity of global human populations, and may be biased towards diversity in European populations (20–22). This has important implications for human basic and medical research. Assembling the human genome has involved extensive curation with clone-based assembly methods and Sanger sequencing. Long-read technologies provide a high throughput platform for characterization of genomes through highly contiguous assemblies (Fig. 3).

Figure 3.

Long reads span and call variations that short reads cannot. IGV (http://software.broadinstitute.org/software/igv/home) image of (top) PacBio reads from a sample sequenced as part of the GDAP project. The reads span a 6 kb heterozygous LINE-1 element deletion and show clear depth variation. (Bottom) Illumina reads from the same sample cannot be clearly mapped around the deletion; reads in white indicate where reads could not be uniquely mapped.

The early long-read platforms produced reads that were only a few kilobases long with a high per-base cost; however, they quickly carved a niche in the creation and finishing of assemblies. These long reads could close gaps in genomes by spanning the low complexity regions that would otherwise require many costly YAC, BAC and fosmid clones to be created and sequenced. Thus, many of the early tools such as PBJelly were focused on gap closure (23,24). The high per-base error rate also required new assembly algorithms, and new tools were created to polish the final assembly with Illumina reads to eliminate basecalling error (12). Clone based assembly methods were not eliminated entirely either, as they provided useful spatial context, but long reads provided a new way to sequence clones in a high throughput manner (25).

Long-read sequencing methods have contributed to platinum quality reference sequences such as NA12878 (14,26) and the haploid sequences CHM1 (27) and CHM13 (28), as well as filling many gaps in the human reference (18,25,29). Of particular note are the first Chinese (18) and Korean (25) human reference genomes, which have been created to answer questions about population-specific sequence. These sequences have resulted in highly contiguous assemblies, closing a high proportion of gaps in the human genome. These have led to the discovery of population-specific sequences, demonstrating the need for further assemblies from non-European population groups. Recently, higher coverage sequencing (∼60×) of two haploid genomes has also been used to identify substantial structural variation, the vast majority of which has not been recovered from sequencing using NGS technologies (28). Characterization of high resolution population-specific reference genomes from initiatives such as the Genome in a Bottle (GIAB) (30) and the Genome Diversity in Africa Project (GDAP) (31) (Fig. 3) will provide important resources for population and medical genetics, and also allow a clearer understanding of the evolutionary demographic history of different populations by better delineation of phase (31).

Most human assemblies have involved a haploid representation of the genome, where information from the two chromosomes is collapsed into a single sequence. Generation of haplotype representations of the genome can reduce error in the final assembly, particularly in the case of segmental duplications (16,32). While long-read technologies can generate phase information over long contiguous segments, these methods cannot resolve phase over long regions of homozygosity or assembly gaps. Assembly of haploid genomes, therefore, requires additional contextual information, which can be provided by linked-read approaches. More recently, trio based methods (where parents are sequenced using Illumina short reads, with offspring sequenced with long reads) have been used to provide this contextual information by separation of maternal and paternal haplotypes prior to assembly using a father–mother–offspring trio (33). This method has been applied to yield a highly contiguous diploid assembly of an F1 hybrid of two bovine subspecies with a quality surpassing previous cattle reference genomes (33).
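One way such a separation of offspring reads by parent of origin could work is sketched below, under the assumption that it is done with k-mers: k-mers seen in only one parent's short reads act as markers, and each offspring long read is assigned to the parent whose markers it contains more of. This is a toy illustration rather than the method of reference (33); it ignores reverse complements and sequencing error, and all names and sequences are invented.

```python
def kmer_set(seqs, k):
    """Collect every k-mer present in a list of sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def bin_read(read, maternal_only, paternal_only, k):
    """Assign one offspring read to a parental haplotype bin.

    maternal_only / paternal_only are k-mers found in exactly one parent.
    Returns "maternal", "paternal" or "unassigned" (no markers, or a tie).
    """
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"


# Toy data: the parents differ at a single base (T versus A).
K = 4
mother_kmers = kmer_set(["ACGTTGCA"], K)
father_kmers = kmer_set(["ACGTAGCA"], K)
maternal_only = mother_kmers - father_kmers
paternal_only = father_kmers - mother_kmers

for read in ("CGTTGC", "GTAGCA", "ACGT"):
    print(read, bin_read(read, maternal_only, paternal_only, K))
```

Reads falling entirely within regions where the parents are identical carry no markers and stay unassigned, which is where long reads help: a longer read is more likely to span at least one parent-specific site.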

Long reads have been successfully applied to organisms with smaller genomes as well as bacteria and viruses, with the advantage that for some of these the entire genome can be spanned by a single long read (34). The Tree of Life initiative, a collaboration across multiple centres, is in the process of developing high resolution reference sequences for >50 vertebrate species using a combination of long read, short read and linked-read approaches. Another leading project is the large bacterial sequencing project NCTC 3000 at the Wellcome Sanger Institute, which is using PacBio sequencing to sequence complete bacterial genomes (https://www.phe-culturecollections.org.uk/collections/nctc-3000-project.aspx). These relatively small genomes (Escherichia coli is for example 4.6 Mbases) can often have their chromosomes and plasmids assembled into single contigs. The construction of full and accurate assemblies of these organisms allows fine-scale phylogenies to be constructed and is also helpful in the field of epidemiology when tracing the source of an outbreak. A recent example of this was a study where SMRT sequencing was used to identify a reservoir of antibiotic resistant plasmids within hospitals (35).

In addition to DNA sequencing, ONT sequencing has been applied to sequence RNA directly rather than relying on an intermediate cDNA step, allowing direct sequencing of RNA viruses and detection of splice variants and base modifications directly from RNA molecules. An example of this is the recent direct sequencing and assembly of the influenza A virus in a native RNA form without amplification or conversion to DNA (36).

Targeted sequencing

From a clinical point of view targeted sequencing is an area where long reads are likely to have the greatest initial impact. In the diverse, complex and clinically relevant regions such as the histocompatibility leucocyte antigen (HLA) (37), killer cell immunoglobulin-like receptor (KIR) (38) and BRCA; and in pharmacologically relevant genes such as CYP2D6 (39,40), targeted sequencing has allowed clinicians and researchers to characterize areas of the genome which were previously inaccessible using NGS methods. In addition, where diversity is high it has become possible to call and phase variation across the entire gene. This approach has since been used to retype 126 HLA reference samples across 6 loci and is now considered a gold standard for clinical sequencing for stem cell transplants (41).

Typically, when targeting such a region, a long-range PCR reaction is used to specifically amplify the genes of interest. However, recently there have been studies demonstrating the use of pulldowns and CRISPR-CAS9 to capture the region of interest with little or no amplification (42). The advantage of these reduced and non-amplification based approaches is the removal of PCR error as a factor, especially in tandem repeats and GC rich regions (42). Additionally, in the case of CRISPR methods, capture of raw genomic material allows DNA modification information to be read.

Transcriptomics and RNA

In addition to its many uses with DNA, long-read technology has also provided many new insights into the world of transcriptomes and ncRNA by allowing for sequencing of full length isoforms rather than relying on the assembly of sheared NGS fragments, a method prone to a high rate of false positives and ambiguities (43). Direct sequencing of isoforms can be particularly useful in complex polyploid genomes such as the coffee plant (44), where construction of a reference transcriptome is otherwise extremely challenging. In addition to its usefulness in reference transcriptomes, IsoSeq has been used in functional studies to analyse the expression of diverse disease-linked proteins such as TP53 in leukaemia (45).

The MinION platform has recently been used to sequence cDNA; applications of this, such as single cell sequencing of immune cells, illustrate the power of such methods to examine clonal heterogeneity in gene expression and isoform usage, potentially revolutionizing our understanding of the repertoire and functions of immunological cell receptors (46).

Epigenetics

SMRT sequencing technology is able to detect base modification, as it records the base kinetics of the polymerase, when DNA molecules are sequenced directly without PCR. Similarly, Nanopore technology can also detect base modifications due to variation in ionic currents. However, because amplification of DNA would erase base modifications, these methods require relatively large amounts of native, unamplified DNA as input material. Recent innovations that combine bisulphite conversion with SMRT sequencing have allowed direct high throughput analysis of CpG methylation without requiring large quantities of sample (47), providing an avenue for more accurate assessment of CpG islands and allele-specific CpG methylation.

Clinical applications

The advantages of long-read technologies in accessing complex regions of the genome, make these ideal for clinical applications in diagnosis, prognostication and personalized medicine. Early clinical applications have included sequencing of tandem repeats in fragile X syndrome, spinocerebellar ataxia, providing authentic diagnostics and potential for prognostication in clinical genetics. SMRT sequencing has besides been used to resolve structural variants associated with Mendelian disease (48).

Long-read sequencing technologies are quickly moving towards the mainstay of high resolution HLA typing for transplant registries in certain regions (37); with high resolution typing potentially having implications for meliorate matching, and clinical outcomes of patients undergoing transplantation. This is even more important in populations which are poorly represented in electric current reference sequence databases, limiting disambiguation of clinical types when using standard methods for typing. The HLA diversity in Africa projection, which aims to characterize high resolution HLA types across >20 ethno-linguistic in Africa has recently completed sequencing of ∼2000 individuals using long-read sequencing, identifying high levels of novelty in class I and grade Ii HLA types (49). This panel volition provide an important resources for clinical HLA typing in populations of African ancestry, as well as a platform for highly accurate imputation of HLA types in medical genetics research.

Long-range PCR amplicons, combined with barcoding and long-read technology, also allow better delineation of genes from pseudogenes, for example when sequencing PKD1 to diagnose autosomal-dominant polycystic kidney disease, for which the diagnostic accuracy of NGS technologies has been limited (50). SMRT sequencing has also been used to tailor treatment in patients with cancer, by identifying low frequency resistance mutations in BCR-ABL1 that affect treatment efficacy in patients with CML (51). Applications of SMRT sequencing in reproductive medicine, to identify parent-of-origin effects and for pre-implantation diagnosis, have been previously noted (52).

Full sequencing of several virus genomes in a single contig by long-read sequencing has provided unique avenues for identification of resistance mutations for clinical applications. Proof-of-concept studies have generated protocols to examine low frequency (up to 0.25%) mutations associated with resistance of HIV and HCV to drugs, through deep sequencing of full length quasispecies (53). Methylation profiles of pathogens examined using SMRT approaches have also been shown to correlate with pathogenicity and virulence, potentially providing a new avenue for applications in communicable disease surveillance.
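A back-of-envelope calculation shows why deep sequencing is needed at such frequencies. The sketch below assumes that each read independently carries the variant with probability equal to its frequency (a binomial model) and ignores sequencing error entirely. The thresholds used, at least 5 supporting reads and 95% detection probability, are illustrative assumptions rather than values from the cited studies, and the function names are hypothetical.

```python
from math import comb


def detection_probability(depth, freq, min_reads):
    """P(at least min_reads variant reads) under Binomial(depth, freq)."""
    p_fewer = sum(
        comb(depth, k) * freq**k * (1 - freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1 - p_fewer


def depth_for_power(freq, min_reads=5, power=0.95, step=100):
    """Smallest depth, in multiples of step, that reaches the given power."""
    depth = step
    while detection_probability(depth, freq, min_reads) < power:
        depth += step
    return depth


# Depth needed to see a 0.25% variant in at least 5 reads, 95% of the time.
required = depth_for_power(0.0025)
print(required)
```

Under these assumptions the required depth is in the thousands of reads per position. In practice the per-read error rate of the platform sets a floor on the frequencies that can be distinguished from noise, which this model does not capture.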

The Future

Long-read technologies are improving rapidly and may become the mainstay of sequencing; however, their broader application is currently limited by lower throughput, higher error rate and higher cost per base relative to short-read sequencing. Wider use of such technologies in the clinical context may rapidly improve our understanding of cancer, pathogen evolution, drug resistance and genetic diversity in complex regions of the genome that have important implications for clinical care. Parallel development of existing technology to allow high throughput PCR-free sequencing will be important in sequencing difficult regions of the genome (54).

At present, no single long-read technology has any clear advantage from a scientific point of view, and thus it seems likely that the future of long-read sequencing will be decided on commercial rather than scientific terms. Whichever technology captures the market, it is clear that as these technologies become more affordable they will continue to shine a light into previously intractable regions of the genome with ever larger sample sizes and longer read lengths, allowing new discovery in these evolving fields.

Acknowledgements

Uttara Partap for copyediting.

Conflict of Interest statement. M.O.P has presented at a PacBio-sponsored meeting and has received accommodation for presenting at this event.

Funding

Wellcome Trust (grant number 098051 to M.S.S.); the National Institute for Health Research Cambridge Biomedical Research Centre (UK) to M.S.S.; a Wellcome Trust Fellowship (grant number 106289/Z/14/Z to A.J.M.); and the Medical Research Council (MRC) (MR/S003711/1 to D.G.); IAVI with the generous support of USAID (in part), and the Bill & Melinda Gates Foundation; a full list of IAVI donors is available at www.iavi.org. The contents of this manuscript are the responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. Funding to pay the Open Access publication charges for this article was provided by the Wellcome Trust.

References

1. Sanger, Coulson (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol., 441–448.

2. W., Freudenberg (2014) Mappability and read length. Front. Genet., 381.

3. Howe, Clark M.D., Torroja C.F., Torrance, Berthelot, Muffato, Collins J.E., Humphray S., McLaren K., Matthews L. (2013) The zebrafish reference genome sequence and its relationship to the human genome. Nature, 496, 498–503.

4. Hosomichi, Jinam T.A., Mitsunaga, Nakaoka, Inoue (2013) Phase-defined complete sequencing of the HLA genes by next-generation sequencing. BMC Genomics, 14, 355.

5. Wang, Tseng E., Regulski M., Clark T.A., Hon, Jiao, Olson, Stein J.C., Ware (2016) Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat. Commun., 7, 11708.

6. Levene M.J., Korlach, Turner S.W., Foquet, Craighead H.G., Webb W.W. (2003) Zero-mode waveguides for single-molecule analysis at high concentrations. Science, 299, 682–686.

7. Eid, Fehr, Gray, Luong, Lyle, Otto, Peluso, Rank, Baybayan, Bettman (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323, 133–138.

8. Travers K.J., Chin C.-S., Rank D.R., Eid J.S., Turner S.W. (2010) A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res., e159.

9. Flusberg B.A., Webster D.R., Lee J.H., Travers K.J., Olivares E.C., Clark T.A., Korlach, Turner S.W. (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods, 461–465.

10. Chin C.-S., Alexander D.H., Marks, Klammer A.A., Drake, Heiner, Clum, Copeland, Huddleston, Eichler E.E. (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods, 563–569.

11. Deamer, Akeson M., Branton (2016) Three decades of nanopore sequencing. Nat. Biotechnol., 518–524.

12. Simpson J.T., Workman R.E., Zuzarte P.C., David, Dursi L.J., Timp (2017) Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods, 407–410.

13. Krishnakumar, Sinha, Bird S.W., Jayamohan, Edwards H.S., Schoeniger J.S., Patel K.D., Branda S.S., Bartsch M.S. (2018) Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias. Sci. Rep., 3159.

14. Jain M., Koren, Miga K.H., Quick, Rand A.C., Sasani T.A., Tyson J.R., Beggs A.D., Dilthey A.T., Fiddes I.T., et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol., 338–345.

15. Zheng G.X.Y., Lau B.T., Schnall-Levin, Jarosz M., Bell J.M., Hindson C.M., Kyriazopoulou-Panagiotopoulou S., Masquelier D.A., Merrill, Terry J.M., et al. (2016) Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol., 303–311.

16. Weisenfeld N.I., Kumar, Shah, Church D.M., Jaffe D.B. (2017) Direct determination of diploid genome sequences. Genome Res., 757–767.

17. Yeo, Coombe L., Warren R.L., Chu, Birol (2018) ARCS: scaffolding genome drafts with linked reads. Bioinformatics, 725–731.

18. Shi, Guo, Dong, Huddleston, Yang, Han, Gong, et al. (2016) Long-read sequencing and de novo assembly of a Chinese genome. Nat. Commun., 12065.

19. Dudchenko, Batra S.S., Omer A.D., Nyquist S.K., Hoeger, Durand N.C., Shamim M.S., Machol, Lander E.S., Aiden A.P., et al. (2017) De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 356.

20. Brandt D.Y.C., Aguiar V.R.C., Bitarello B.D., Nunes, Goudet, Meyer (2015) Mapping bias overestimates reference allele frequencies at the HLA genes in the 1000 Genomes Project phase I data. G3 (Bethesda), 5, 931–941.

21. Lunter, Goodson (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res., 936–939.

22. Degner J.F., Marioni J.C., Pai A.A., Pickrell J.K., Nkadori, Gilad, Pritchard J.K. (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 3207–3212.

23. English A.C., Richards, Han, Wang, Vee, Qin X., Muzny D.M., Reid J.G., Worley K.C., et al. (2012) Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One, e47768.

24. Worley K.C., English A.C., Richards, Ross-Ibarra, Han, Hughes, Deiros D.R., Vee, Wang, Boerwinkle E., et al. (2014). Improving genomes using long reads and PBJelly 2. Presented at: Plant and Animal Genome Conference XXII.

25. Seo J.-S., Rhie, Kim, Lee, Sohn M.-H., Kim C.-U., Hastie, Cao, Yun J.-Y., Kim, et al. (2016) De novo assembly and phasing of a Korean human genome. Nature, 538, 243.

26. Pendleton M., Sebra, Pang A.W.C., Ummat, Franzen, Rausch, Stütz A.M., Stedman W., Anantharaman, Hastie, et al. (2015) Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods, 780–786.

27. Chaisson M.J., Huddleston, Dennis M.Y., Sudmant P.H., Malig, Hormozdiari, Antonacci, Surti, Sandstrom, Boitano M., et al. (2015) Resolving the complexity of the human genome using single-molecule sequencing. Nature, 517, 608–611.

28. Huddleston, Chaisson M.J., Steinberg K.M., Warren W., Hoekzema K., Gordon, Graves-Lindsay T.A., Munson K.M., Kronenberg Z.N., Vives, et al. (2017) Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res., 677–685.

29. Schneider V.A., Graves-Lindsay, Howe, Bouk, Chen H.-C., Kitts P.A., Murphy T.D., Pruitt K.D., Thibaud-Nissen, Albracht, et al. (2017) Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res., 849–864.

30. Zook J.M., Chapman, Wang, Mittelman, Hofmann, Hide, Salit M. (2014) Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol., 246–251.

31. Gurdasani, Martinez, Pollard, Carstensen, Pomilla and GDAP Investigators (2016). The Genome Diversity in Africa Project: a deep catalogue of genetic diversity across Africa. Presented at: The 66th Annual Meeting of The American Society of Human Genetics, Vancouver, Canada.

32. Chin C.-S., Peluso, Sedlazeck F.J., Nattestad, Concepcion G.T., Clum, Dunn, O'Malley, Figueroa-Balderas, Morales-Cruz, et al. (2016) Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods, 1050–1054.

33. Koren, Rhie, Walenz B.P., Dilthey A.T., Bickhart D.M., Kingan S.B., Hiendleder S., Williams J.L., Smith T.P., Phillippy (2018). Complete assembly of parental haplotypes with trio binning. bioRxiv, 271486.

34. Koren S., Phillippy A.M. (2015) One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr. Opin. Microbiol., 110–120.

35. Weingarten R.A., Johnson R.C., Conlan S., Ramsburg A.M., Dekker J.P., Lau A.F., Khil, Odom R.T., Deming, Park, et al. (2018) Genomic analysis of hospital plumbing reveals diverse reservoir of bacterial plasmids conferring carbapenem resistance. mBio, 9, e02011-17.

36. Keller M.W., Rambo-Martin B.L., Wilson M.M., Ridenour C.A., Shepard S.S., Stark T.J., Neuhaus E.B., Dugan V.G., Wentworth D.E., Barnes J.R. (2018). Direct RNA sequencing of the complete influenza A virus genome. bioRxiv.

37. Mayor N.P., Robinson, McWhinnie A.J.M., Ranade, Eng, Midwinter, Bultitude W.P., Chin C.-S., Bowman, Marks, et al. (2015) HLA typing for the next generation. PLoS One, e0127153.

38. Roe, Vierra-Green, Pyo C.-W., Eng, Hall, Kuang, Spellman, Ranade, Geraghty D.E., Maiers (2017) Revealing complete complex KIR haplotypes phased by long-read sequencing technology. Genes Immun., 127–134.

39. Buermans H.P., Vossen R.H., Anvar S.Y., Allard W.G., Guchelaar H.-J., White S.J., den Dunnen J.T., Swen J.J., van der Straaten (2017) Flexible and scalable full-length CYP2D6 long amplicon PacBio sequencing. Hum. Mutat., 310.

40. Yang, Botton M.R., Scott E.R., Scott S.A. (2017) Sequencing the CYP2D6 gene: from variant allele discovery to clinical pharmacogenetic testing. Pharmacogenomics, 673–685.

41. Turner T.R., Hayhurst J.D., Hayward D.R., Bultitude W.P., Barker D.J., Robinson, Madrigal J.A., Mayor N.P., Marsh S.G.E. (2018) Single molecule real-time DNA sequencing of HLA genes at ultra-high resolution from 126 International HLA and Immunogenetics Workshop cell lines. HLA, –101.

42. Tsai Y.-C., Greenberg, Powell, Hoijer, Ameur, Strahl, Ellis, Jonasson, Mouro Pinto, Wheeler, et al. (2017). Amplification-free, CRISPR-Cas9 targeted enrichment and SMRT sequencing of repeat-expansion disease causative genomic regions. bioRxiv, 203919.

43. Steijger, Abril J.F., Engström P.G., Kokocinski, Abril J.F., Akerman M., Alioto, Ambrosini, Antonarakis S.E., Behr, et al. (2013) Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods, 1177–1184.

44. Cheng, Furtado, Henry R.J. (2017) Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience, 1–13.

45. Lodé L., Ameur, Coste, Ménard, Richebourg, Gaillard J.-B., Le Bris, Béné M.-C., Lavabre-Bertrand, Soussi (2018) Single-molecule DNA sequencing of acute myeloid leukemia and myelodysplastic syndromes with multiple TP53 alterations. Haematologica, 103, e13–e16.

46. Byrne, Beaudin A.E., Olsen H.E., Jain M., Cole, Palmer, DuBois R.M., Forsberg E.C., Akeson, Vollmers (2017) Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nat. Commun., 16027.

47. Yang, Sebra, Pullman B.S., Qiao W., Peter, Desnick R.J., Geyer C.R., DeCoteau J.F., Scott S.A. (2015) Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS). BMC Genomics, 350.

48. Merker J.D., Wenger A.M., Sneddon, Grove, Zappala, Fresard, Waggott, Utiramerur S., Hou, Smith K.S., et al. (2018) Long-read genome sequencing identifies causal structural variation in a Mendelian disease. Genet. Med., 159–163.

49. Pollard, Tommy, Cristina, Gurdasani, Investigators (2017). The MHC diversity in Africa resource: a roadmap to understanding HLA diversity in Africa. Presented at: The 67th Annual Meeting of The American Society of Human Genetics, Orlando, FL, USA.

50. Borràs D.M., Vossen R.H.A.M., Liem, Buermans H.P.J., Dauwerse, van Heusden, Gansevoort R.T., den Dunnen J.T., Janssen, Peters D.J.M., et al. (2017) Detecting PKD1 variants in polycystic kidney disease patients by single-molecule long-read sequencing. Hum. Mutat., 870–879.

51. Cavelier, Ameur, Häggqvist, Höijer, Cahill, Olsson-Strömberg, Hermanson (2015) Clonal distribution of BCR-ABL1 mutations and splice isoforms by single-molecule long-read RNA sequencing. BMC Cancer, 15.

52. Wilbe, Gudmundsson, Johansson, Ameur, Stattin E.-L., Annerén G., Malmgren, Frykholm, Bondeson M.-L. (2017) A novel approach using long-read sequencing and ddPCR to investigate gonadal mosaicism and estimate recurrence risk in two families with developmental disorders. Prenat. Diagn., 1146–1154.

53. Bull R.A., Eltahla A.A., Rodrigo, Koekkoek S.M., Walker, Pirozyan M.R., Betz-Stablein, Toepfer, Laird, et al. (2016) A method for near full-length amplification and sequencing for six hepatitis C virus genotypes. BMC Genomics, 247.

54. Ardui, Ameur, Vermeesch J.R., Hestand M.S. (2018) Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res., 2159–2168.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Source: https://academic.oup.com/hmg/article/27/R2/R234/4996216
