In broad terms there are three things we would like to sequence
In all these cases we get a DNA fragment a few thousand bp long (say, 500–5000 bp)
Each fragment is a physical object
Each strand can be sequenced to produce a read
Usually we have two reads for each fragment
They are sometimes called “forward” and “reverse” reads
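The relation between the two reads of a fragment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the reads are error-free, both have the same length, and the function names `reverse_complement` and `paired_reads` are made up for this example.

```python
# Complement of each base; N (unknown) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_reads(fragment: str, read_length: int):
    """Simulate error-free forward and reverse reads of one fragment."""
    forward = fragment[:read_length]                      # start of one strand
    reverse = reverse_complement(fragment)[:read_length]  # start of the other strand
    return forward, reverse

fragment = "ATGGCGTACCTTGAGC"   # toy fragment, far shorter than a real one
forward, reverse = paired_reads(fragment, read_length=6)
print(forward)  # ATGGCG
print(reverse)  # GCTCAA
```

The reverse read comes from the opposite strand, so it has to be reverse-complemented before it can be compared with the forward strand.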
There are many sequencing methods, based on different physicochemical properties
They are in constant development. We will focus on their key properties
See the Wikipedia article on DNA sequencing for more details
The DNA is split into 4 parts, and each part is mixed with a different chain-terminating dideoxynucleotide (ddATP, ddCTP, ddGTP or ddTTP). The resulting fragments are then separated by electrophoresis.
In a second version, each part is marked with a different fluorophore and all four are then run together in a single capillary electrophoresis, with a light detector
Part of the vector will be sequenced in every read
The result is a chromatogram, stored in files with names ending in .AB1 or .SCF
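A chromatogram is essentially four intensity curves, one per base. The toy sketch below shows the simplest possible way to turn them into a sequence: at each peak, take the channel with the strongest signal. The trace values, the peak positions and the function `call_bases` are all hypothetical; real base callers also detect the peaks themselves and estimate a quality for every call.

```python
def call_bases(traces, peak_positions):
    """Call one base per peak: the channel with the strongest signal wins."""
    calls = []
    for pos in peak_positions:
        best_base = max(traces, key=lambda base: traces[base][pos])
        calls.append(best_base)
    return "".join(calls)

# Hypothetical fluorescence intensities for the four channels (arbitrary units).
traces = {
    "A": [5, 90, 10, 3, 2, 4, 8],
    "C": [2, 4, 6, 80, 7, 3, 1],
    "G": [1, 3, 2, 5, 4, 95, 6],
    "T": [3, 2, 1, 4, 3, 2, 1],
}
peak_positions = [1, 3, 5]  # sample indices where peaks were detected (assumed known)
print(call_bases(traces, peak_positions))  # ACG
```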
Second generation
Generic adaptors are added to the ends and annealed to beads, one DNA fragment per bead
The fragments are then amplified by PCR
Each bead is placed in a single well of a slide
Each well will contain a single bead, covered in many PCR copies of a single sequence
The slide is flooded with one of the four dNTP species
The nucleotide is incorporated wherever it matches the next base of the template
If that base is repeated in the template, several nucleotides are added in the same wash
The addition of each nucleotide releases a light signal, which is recorded by a high-resolution camera
The dNTP mix is then washed away
The next dNTP is then added and the process repeated, cycling through the four dNTPs.
This kind of sequencing generates graphs for each sequence read, showing the signal density for each nucleotide wash
The sequence can then be determined computationally from the signal density in each wash.
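How a sequence is read off the per-wash signals can be illustrated with a minimal sketch. It assumes that the signals are already normalised so that 1.0 corresponds to one incorporated nucleotide, and that the nucleotides are washed in the cyclic order T, A, C, G; both the flow order and the function name `call_flowgram` are assumptions for this example.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Turn per-wash light signals into a base sequence.

    Each signal is assumed to be normalised so that 1.0 means one
    incorporated nucleotide; rounding gives the homopolymer length.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # which nucleotide this wash used
        count = int(signal + 0.5)                     # round to a whole number of bases
        bases.append(nucleotide * count)
    return "".join(bases)

signals = [1.02, 0.05, 2.10, 0.97, 0.03, 1.05, 0.08, 3.04]
print(call_flowgram(signals))  # TCCGAGGG
```

The rounding step is where this kind of sequencing becomes fragile: the longer a run of identical bases, the harder it is to tell, say, 7 from 8 incorporations from the signal alone.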
Sequencing by synthesis
can sequence 0.7 gigabase per run
run time 23 hours
read length 700-800 bases
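These figures can be combined into a rough back-of-the-envelope calculation. The sketch below assumes a mean read length of 750 bases (the midpoint of the stated range) and uses a hypothetical 5 Mb genome to illustrate coverage; neither number comes from the instrument specification.

```python
bases_per_run = 0.7e9      # 0.7 gigabase per run
run_hours = 23             # run time
mean_read_length = 750     # assumed midpoint of 700-800 bases

reads_per_run = bases_per_run / mean_read_length
bases_per_hour = bases_per_run / run_hours

genome_size = 5e6          # hypothetical 5 Mb bacterial genome
coverage = bases_per_run / genome_size

print(f"{reads_per_run:,.0f} reads per run")      # 933,333 reads per run
print(f"{bases_per_hour / 1e6:.1f} Mb per hour")  # 30.4 Mb per hour
print(f"{coverage:.0f}x coverage")                # 140x coverage
```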
also sequencing by synthesis
over 70% of the market
Sequencing by ligation
Supported Oligonucleotide Ligation and Detection (SOLiD)
The advantage of this method is accuracy with each base interrogated twice
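The idea that each base is interrogated twice can be sketched with a two-base encoding, in which every colour call describes a pair of adjacent bases. The sketch below assumes the colour of a pair is the XOR of two-bit base codes (A=0, C=1, G=2, T=3), which reproduces the commonly published four-colour table; the colour numbering and the function names are assumptions for this example.

```python
BASES = "ACGT"  # index in this string is the two-bit code of the base

def encode_colors(seq):
    """Encode each pair of adjacent bases as one of four colours (0-3)."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Rebuild the sequence from a known first base and the colour calls."""
    seq = [first_base]
    for color in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ color])
    return "".join(seq)

reference = "ATGGCA"
variant = "ATGTCA"                 # one real base change (G -> T)

print(encode_colors(reference))    # [3, 1, 0, 3, 1]
print(encode_colors(variant))      # [3, 1, 1, 2, 1]
print(decode_colors("A", encode_colors(reference)))  # ATGGCA
```

A real single-base change alters two adjacent colours, as in the example, whereas a measurement error typically alters only one. This is what lets the method tell the two apart.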
Major disadvantages
| Method | Read length (bp) | Error rate (%) | No. of reads per run | Time per run | Cost per million bases (USD) |
|---|---|---|---|---|---|
| Sanger ABI 3730xl | 600–1000 | 0.001 | 96 | 0.5–3 h | 500 |
| Illumina HiSeq 2500 | 2 × 250 | 0.1 | 1.2 × 10^9 (pairs) | 1–6 days | 0.04 |
| PacBio RS II: P6-C4 | 1.0–1.5 × 10^4 | 13 | 3.5–7.5 × 10^4 | 0.5–4 h | 0.4–0.8 |
| Oxford Nanopore MinION | 2–5 × 10^3 | 38 | 1.1–4.7 × 10^4 | 50 h | 6.44–17.9 |
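To get a feeling for what the error rates mean in practice, one can estimate how many wrong bases to expect in a single read. The sketch below takes read lengths and error rates from the table, using the midpoint wherever a range is given, and assumes that errors are spread uniformly along the read; both are simplifications.

```python
# (method, typical read length in bp, error rate in %), taken from the table;
# where the table gives a range, the midpoint is used (an assumption).
platforms = [
    ("Sanger", 800, 0.001),
    ("Illumina", 250, 0.1),
    ("PacBio", 12500, 13),
    ("Nanopore", 3500, 38),
]

def expected_errors(read_length, error_rate_percent):
    """Expected number of wrong bases in one read, assuming uniform errors."""
    return read_length * error_rate_percent / 100

for name, length, rate in platforms:
    print(f"{name}: about {expected_errors(length, rate):g} expected errors per read")
```

Long reads with high error rates carry hundreds or thousands of errors each, which is why they are usually corrected by combining many overlapping reads.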