We're going five years onward with the next paper. The full citation is
Teasdale MD, van Doorn NL, Fiddyment S, et al. Paging through history: parchment as a reservoir of ancient DNA for next generation sequencing. Philos Trans R Soc Lond B Biol Sci. 2015;370(1660):20130379. doi:10.1098/rstb.2013.0379
A copy of the paper is available via the DOI above.
1. What research question was Teasdale et al. asking?
Can the use of next generation DNA sequencing (NGS) improve identification of the species used to make the parchment of two historical documents, particularly in light of the confusing results that standard DNA sequencing had provided previously?
Please note that Teasdale et al. performed other biocodicological measurements -- but I'm going to discuss those other techniques alongside papers that focus on them. The results of the other techniques did not contradict the NGS results.
Comparative highlights of this publication:
(i) What is being tested --
non-valuable historical archival parchments (PA1 from the 1600s and PA2 from the 1700s) -- similar to Campana et al. but slightly older; dating was done paleographically (i.e. by historical handwriting analysis).
(ii) How the samples were prepared -- 5 × 5 mm square pieces (i.e. still destructive sampling).
(iii) The use of "next generation" sequencing (NGS)* -- *note that NGS isn't necessarily DNA sequencing, but DNA sequencing is what was done here.
This is by far the most notable difference between Campana et al. and this work. The use of this different sequencing technique means both that different data were collected and that different analyses are justified.
What are the differences between next generation sequencing and standard sequencing? If you recall from the prior posting, standard DNA sequencing involved (1) isolating the DNA from the sample (hoping it wasn't too degraded); (2) using targeted PCR to selectively amplify pre-selected areas of the genome; and (3) sequencing those amplified sequences and comparing them to known sequences from specific species (e.g. cow, goat, sheep, human) in order to identify the species involved.
In contrast, next generation sequencing is done by sequencing each and every fragment of DNA that is isolated. The results are then analyzed over multiple rounds using software-based probability "callers" that evaluate each read for quality control, including comparisons to known genomes and, where possible, reconstruction of longer sequences from overlaps. For example, Teasdale et al. were able to recover likely full mtDNA sequences for the sheep used to produce the PA1 and PA2 parchments (see below for why that is important).
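To make the contrast concrete, here is a toy Python sketch of the shotgun idea (my illustration only -- the "genomes," reads, and length threshold are all invented, and the real pipeline uses dedicated aligners and quality-control tools): every surviving read is checked against every reference genome, rather than amplifying pre-selected targets.

```python
# Toy sketch of shotgun-NGS analysis: sequence everything that was isolated,
# filter each read, then assign it to whichever reference genome(s) match.

def quality_filter(reads, min_len=25):
    """Discard reads that are too short to align reliably."""
    return [r for r in reads if len(r) >= min_len]

def assign_species(read, references):
    """Return every species whose reference sequence contains this read."""
    return [sp for sp, genome in references.items() if read in genome]

# Hypothetical miniature "genomes" and reads, purely for illustration.
references = {
    "sheep": "ACGTACGTTTGACCAGGTTACGATCGGATCCAATCG",
    "cow":   "TTGACCAGGTTAGGCCTTAACGGATTACCAGGTCCA",
}
reads = [
    "ACGTACGTTTGACCAGGTTACGATC",   # matches the sheep reference only
    "TTGACCAGGTTA",                # too short -> eliminated by QC
    "GGATCCAATCGAAAAAAAAAAAAAAA",  # matches nothing -> discarded later
]

kept = quality_filter(reads)
hits = {r: assign_species(r, references) for r in kept}
```

In the real workflow the "does the genome contain this read" step is an alignment (allowing mismatches and damage), but the flow -- filter, align, tally per species -- is the same.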
Here is a quote from Illumina, a leader in the next generation sequencing area, that homes in on the key difference:
The critical difference between Sanger sequencing and NGS is sequencing volume. While the Sanger method only sequences a single DNA fragment at a time, NGS is massively parallel, sequencing millions of fragments simultaneously per run. This process translates into sequencing hundreds to thousands of genes at one time. NGS also offers greater discovery power to detect novel or rare variants with deep sequencing.
This is extremely well illustrated by the large number of reads involved in the Teasdale et al. experiments (see Table 1 below).
There are good online resources that compare and contrast the uses of standard sequencing and next generation sequencing.
Note that one similarity between the two methods is that if the DNA is too degraded because of time, too harsh isolation approaches, or poor storage conditions, NEITHER approach will work.
2. What were Teasdale et al.'s results?
a. The differing kind of data produced by NGS, as well as how differently the data are analyzed, is shown in Table 1 and Figure 1 (below).
The incredible volume of reads (as well as the large fraction of reads eliminated for not making the "grade") is indicated in this table. Reads were eliminated when they didn't align to a genome and when they weren't "high quality." Note that the human sequences have already been removed in this table (see point c., below).
So from Figure 1 it can be seen that sequences matching cow, goat, and sheep were all found. However, because all of the recoverable DNA had been repetitively copied using PCR, it was justifiable to count the percentage of sequences unique to each species and to identify the parchment species as the one with the most unique sequences (shown in RED as a percentage of the total).
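The unique-read percentage logic can be sketched in a few lines of Python (the counts below are invented round numbers, not values from Table 1 or Figure 1):

```python
# Call the parchment species from reads that align UNIQUELY to one genome:
# the species with the largest share of unique reads wins.

def call_species(unique_read_counts):
    """Return (best species, per-species share of unique reads)."""
    total = sum(unique_read_counts.values())
    shares = {sp: n / total for sp, n in unique_read_counts.items()}
    best = max(shares, key=shares.get)
    return best, shares

# Hypothetical unique-alignment counts for one parchment.
counts = {"sheep": 9_000, "cow": 600, "goat": 400}
species, shares = call_species(counts)
# species == "sheep"; shares["sheep"] == 0.9 (i.e. 90% of unique reads)
```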
b.
Why weren't the other (non-sheep) sequences seen as contaminants?
Because they weren't working from a narrow, absolutely identified match (like Campana et al.) but instead indiscriminately amplified and sequenced all of the isolated DNA, a percentage analysis was a valid approach. Certainly, some of these sequences (in the blue parts of the histograms) could have been contaminants, and unless their numbers were much higher than the valid sequences this would not necessarily be known -- but see the control experiment that follows.
Teasdale et al. achieved essentially complete coverage of the ovine mtDNA sequence for each parchment. Because of this, they were able to reliably identify those mtDNA sequences that did not match (e.g. bovine, caprine, or other ovine). This comparison showed that (at least for the mtDNA) approximately 4% of the sequences were contaminants in PA1 and approximately 5% in PA2. It can be assumed that the same would hold for the autosomal sequences as well. It appears that this experiment was designed specifically to counter the conclusions of Campana et al., and it does seem to directly contradict them.
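As arithmetic, the contamination check reduces to a simple proportion (the read counts here are hypothetical, chosen only to land near the paper's ~4% and ~5% figures):

```python
# With near-complete ovine mtDNA coverage, any mtDNA read that conflicts
# with the reconstructed sheep sequence can be counted as contamination.

def contamination_rate(total_mtdna_reads, mismatching_reads):
    """Fraction of mtDNA reads that do not match the reconstructed genome."""
    return mismatching_reads / total_mtdna_reads

# Hypothetical counts, NOT from the paper -- chosen to give ~4% and ~5%.
pa1 = contamination_rate(total_mtdna_reads=10_000, mismatching_reads=400)
pa2 = contamination_rate(total_mtdna_reads=10_000, mismatching_reads=500)
# pa1 == 0.04 (~4%), pa2 == 0.05 (~5%)
```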
c.
Were there any human contaminant sequences in this experiment?
Yes, but Teasdale et al. simply removed those sequences before analysis. The supplemental data indicate that human-genome-aligning sequences made up approximately 0.01% of the reads in both PA1 and PA2.
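The human-read filter amounts to dropping any read that aligns to the human reference before the species analysis begins. A minimal sketch (with an obviously fake "aligns to human" test standing in for a real aligner):

```python
# Remove human-aligning reads before species analysis, and report what
# fraction they made up (in the paper this was roughly 0.01%).

def remove_human(reads, aligns_to_human):
    """Return (cleaned reads, percent of reads that were human)."""
    cleaned = [r for r in reads if not aligns_to_human(r)]
    pct_human = 100 * (len(reads) - len(cleaned)) / len(reads)
    return cleaned, pct_human

# Hypothetical tagged reads; the lambda stands in for a real alignment step.
reads = ["SHEEP_r1", "HUM_r2", "SHEEP_r3", "COW_r4"]
cleaned, pct_human = remove_human(reads, lambda r: r.startswith("HUM"))
# cleaned == ["SHEEP_r1", "SHEEP_r3", "COW_r4"]; pct_human == 25.0
```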
d.
Why were Teasdale et al.'s results so different from Campana et al.'s?
The use of next generation sequencing allowed different processes to be applied to the results, removing the confounding data. First, they had a much greater volume of results to work with, so small amounts of contamination could be more easily ignored. Second, Teasdale et al. suggested that standard-sequencing PCR approaches produce many more artifacts than NGS approaches, and that this could be a significant factor. They also hypothesized that the contaminants in the Campana et al. samples may have been selectively amplified by the standard DNA sequencing approaches (perhaps because the contaminating DNA was less degraded than the target DNA); this issue appears less dire with NGS. Finally, Teasdale et al. applied post-sequencing (software-based) quality control that removed a number of confounding amplification results that Campana et al. may have been seeing (see the section discussing the use of FastQC).
TLDR –
Next generation sequencing appears to provide much more clear-cut answers to the species question for parchment, as long as destructive sampling can be tolerated.
A couple more interesting results: Through relative comparison of the number of sequences on the X chromosome, Teasdale et al. were able to say definitively that both parchments had been made from the skin of ewe (female) animals. Because ewe cells have two X chromosomes (XX) while ram cells have only one (XY), comparison to the ratios of modern controls allowed them to show that the cells of both parchments carried two copies of the X chromosome.
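The X-dosage logic can be illustrated with a small sketch (my simplification, not the authors' exact statistic; all counts and lengths below are hypothetical): XX tissue should show X-chromosome read depth close to autosomal depth, while XY tissue should show about half.

```python
# Sex determination by X-chromosome dosage: compare length-normalized
# X read depth to autosomal read depth. XX -> ratio near 1.0; XY -> near 0.5.

def x_dosage(x_reads, autosomal_reads, x_len, autosome_len):
    """Ratio of X-chromosome read depth to autosomal read depth."""
    x_depth = x_reads / x_len
    a_depth = autosomal_reads / autosome_len
    return x_depth / a_depth

def call_sex(ratio, tol=0.15):
    """Classify the dosage ratio against the two expected values."""
    if abs(ratio - 1.0) < tol:
        return "female (XX)"
    if abs(ratio - 0.5) < tol:
        return "male (XY)"
    return "ambiguous"

# Hypothetical counts and lengths, scaled so the depths are comparable.
ratio = x_dosage(x_reads=1_000, autosomal_reads=20_000,
                 x_len=100, autosome_len=2_000)
# ratio == 1.0 -> call_sex(ratio) == "female (XX)"
```

In practice the expected ratios are calibrated against modern control samples of known sex, as the paper did, rather than against the idealized 1.0 and 0.5 used here.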
Finally, because there are now reference panels of single nucleotide polymorphisms (SNPs) for modern sheep breeds, they could say that the polymorphisms found in both parchments "shaded" toward breeds located in the British Isles. Please note that because the number of reads containing SNPs was very, very small, this is by far the most speculative conclusion in the publication (see Figure 2 in the publication). Ovine SNP-containing reads accounted for a mere 0.018% and 0.007% of the total reads for parchment 1 and parchment 2, respectively.
It is cool that the small number they found did not contradict what is almost certainly fact (i.e. that the sheep were raised in, and were related to those raised in, the British Isles).
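For a sense of scale, the quoted SNP-read fractions correspond to proportions like the following (the total read counts here are hypothetical round numbers, not the paper's actual totals):

```python
# The SNP-bearing reads were a tiny slice of the data: 0.018% and 0.007%
# of total reads for the two parchments.

def snp_percent(snp_reads, total_reads):
    """Percentage of total reads that carry an informative SNP."""
    return 100 * snp_reads / total_reads

# Hypothetical totals chosen to reproduce the quoted percentages.
pa1 = snp_percent(snp_reads=1_800, total_reads=10_000_000)  # -> 0.018 (%)
pa2 = snp_percent(snp_reads=700,   total_reads=10_000_000)  # -> 0.007 (%)
```

At those proportions, only a few thousand reads out of millions carry any breed-informative signal, which is why the breed-affinity conclusion is the shakiest one in the paper.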
Finally, I'll also note that both of these conclusions in Teasdale et al. were foregone BEFORE the experiments were put together -- the parchments had previously been identified as sheep using other methods (perhaps follicle analysis -- the paper doesn't say), and it was known that these parchments had been produced in the British Isles.
Such geographical certainty prior to analysis is not available with the Voynich, and thus a greater volume of SNP results (and a greater geographic spread of reference animal genomes) will certainly be required to draw any conclusions.
But that being said, these results are much more promising than Campana et al.'s and likely identify and overcome the issues seen in that earlier paper. Progress!