The Voynich Ninja

Full Version: Biocodicology - A Deeper Dive
Pages: 1 2 3 4 5 6 7 8 9 10
The DNA that interests me is easy to isolate, and the result could settle the question of a modern forgery and narrow down the area where the VMs originated.

In the process of studying the quire binding, I came to strongly suspect that the oldest thread (which I labeled T6) is yarn (that is, animal rather than vegetable fiber).
I am glad to hear there are records. On the matter of confidentiality: of course, only with permission or where the data is already public. Actually, I had the future in mind. People already post DNA data on genealogical sites, and these results extrapolate to their relations: Y-chromosomal data extrapolates to all the males in the line, mitochondrial DNA to everyone in the maternal line, and autosomal matches accumulate as more people are added. So some data might be found publicly even for people who were alive in our target period, if they are accounted for in a tree. Of course, I also foresee some clashes occurring eventually, and I'm sure some surprises will come with unravelling those conflicts.

I figure that in the past (before DNA testing was developed) these conservators' rubbings or dusty detritus would not have been thought worth saving; I hope they are all saved now. As I said, I do think the parchment database would be more useful first but, as Mark was saying, if human DNA were found as well, why not start going down that road to see what can be learned? I am not suggesting it would be a quick process, nor that anything definitive would be learned, but who knows what might be learned until it is investigated. Wladimir's interest in the yarn also seems like a worthwhile pursuit. Even the species of the animal (or of the plant, if it turns out not to be yarn) could be of value, and it would seem that very little material would be needed to determine this.
We're moving five years onward with the next paper.  The full citation is:

Teasdale MD, van Doorn NL, Fiddyment S, et al. Paging through history: parchment as a reservoir of ancient DNA for next generation sequencing. Philos Trans R Soc Lond B Biol Sci. 2015;370(1660):20130379. doi:10.1098/rstb.2013.0379

A copy of the paper is available [link].

1. What research question was Teasdale et al. asking?
Can the use of next generation DNA sequencing (NGS) improve identification of what species was used to make parchment of two historical documents, particularly in light of the confusing results that standard DNA sequencing provided previously?

Please note that Teasdale et al. did other biocodicological measurements -- but I'm going to talk about the other techniques alongside other papers that focus on those techniques.  The results of the other techniques did not contradict the NGS results.

Comparative highlights of this publication:

(i) What is being tested -- historical archival parchments (PA1, 1600s; PA2, 1700s) that are non-valuable (similar to Campana et al. but slightly older); dating was done paleographically (i.e. by historical handwriting analysis).

(ii) How the samples were prepared -- "5 x 5 mm" square pieces (i.e. still destructive sampling).

(iii) The use of "next generation" sequencing (NGS) of DNA.  Note that NGS isn't necessarily DNA sequencing -- but that is what was done here.

This is by far the most notable difference between Campana et al. and this work.  The use of these different sequencing techniques means both that different data were being collected and that a different analysis is justified.

What are the differences between next generation sequencing and standard sequencing?  If you recall from the prior posting, standard DNA sequencing involved (1) isolating the DNA from the sample (hoping it wasn't too degraded); (2) using targeted PCR to selectively amplify pre-selected areas of the genome; and (3) sequencing those amplified sequences and comparing them to known sequences from specific species (e.g. cow, goat, sheep, human) in order to identify the species involved.

In contrast, next generation sequencing is done by sequencing each and every fragment of DNA that is isolated.  The results are then analyzed over multiple rounds using software-based probability "callers" that evaluate each read for quality control, including comparison to known genomes and, where possible, reconstruction of longer sequences from overlaps.  For example, Teasdale et al. were able to get the likely full mtDNA sequences for the sheep involved in producing the PA1 and PA2 parchments (see below for why that is important).
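To make the "reconstruction of longer sequences due to overlaps" idea concrete, here is a toy Python sketch.  It is only an illustration under my own assumptions: the function names and the two example reads are made up, and real NGS pipelines (including, as far as I can tell, the one in the paper) mostly align reads against a reference genome rather than stitching them together this naively.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads if they overlap by at least `min_overlap` bases."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None  # no usable overlap, so the reads stay separate
    return left + right[size:]


# Hypothetical short reads; real reads are typically tens to hundreds of bases.
read_a = "ACGTTGCA"
read_b = "TGCAGGTC"
merged = merge_reads(read_a, read_b)
print(merged)
```

The point is simply that two short, incomplete fragments can yield a longer sequence when their ends agree, which is why even degraded DNA can sometimes give a full mtDNA sequence if there are enough reads.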

Here is a quote from Illumina, a leader in the next generation sequencing area, that homes in on the key difference:

The critical difference between Sanger sequencing and NGS is sequencing volume. While the Sanger method only sequences a single DNA fragment at a time, NGS is massively parallel, sequencing millions of fragments simultaneously per run. This process translates into sequencing hundreds to thousands of genes at one time. NGS also offers greater discovery power to detect novel or rare variants with deep sequencing.

This is extremely well illustrated by the large number of reads involved in the Teasdale et al. experiments (see Table 1 below).

Here is an [link] that compares and contrasts the uses of standard sequencing to next generation sequencing.

Note that one similarity between the two methods is that if the DNA is too degraded because of time, too harsh isolation approaches, or poor storage conditions, NEITHER approach will work.

2. What were Teasdale et al's results?

a.  The different kind of data produced by NGS, and the different way that data is analyzed, are shown in Table 1 and Figure 1 (below).

[attachment=6134]

The incredible volume of reads (as well as the huge number of reads eliminated for not making the "grade") is indicated in this table.  Reads were eliminated when they didn't align to a genome and when they weren't "high quality."  Note that the human sequences have already been removed in this table (see point c., below).


[attachment=6133]


So from Figure 1 it can be seen that sequences matching cow, goat, and sheep were all found.  However, because all the recoverable DNA had been repetitively copied using PCR, it was justifiable to count the percentage of sequences that were unique to each species and to identify the species used for the parchment as the one with the most unique sequences (shown in RED as a percentage of the total).
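The percentage logic described above can be sketched in a few lines of Python.  This is a minimal sketch under stated assumptions: the function names are mine, and the read counts are invented for illustration (they are NOT the values from Table 1 or Figure 1).

```python
from typing import Dict


def species_shares(unique_read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert counts of species-unique reads into percentages of their total."""
    total = sum(unique_read_counts.values())
    if total == 0:
        raise ValueError("no species-specific reads to compare")
    return {species: 100.0 * count / total
            for species, count in unique_read_counts.items()}


def call_species(unique_read_counts: Dict[str, int]) -> str:
    """Pick the species with the largest share of species-unique reads."""
    shares = species_shares(unique_read_counts)
    return max(shares, key=shares.get)


# Hypothetical counts of reads that align uniquely to one reference genome.
hypothetical_counts = {"sheep": 9200, "goat": 500, "cow": 300}

shares = species_shares(hypothetical_counts)
best = call_species(hypothetical_counts)
print(shares, best)
```

With counts like these the minority species would show up as the small "other" portions of the histogram, while the dominant species is taken as the source of the skin.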

b.  Why weren't the other (non-sheep) sequences seen as contaminants?

Because they weren't working with a narrow, absolutely identified match (like Campana et al.) but instead indiscriminately amplified and sequenced all the DNA isolated, a percentage analysis was a valid approach.  Certainly, some of these sequences (in the blue parts of the histograms) could have been contaminants, and unless they greatly outnumbered the valid sequences this would not necessarily be known, but see the control experiment that follows.

Teasdale et al. achieved essentially complete coverage of the ovine mtDNA sequence for each parchment.  Because of this, they were able to reliably identify those mtDNA sequences that did not match (e.g. bovine, caprine, or other ovine).  This comparison showed that (at least for mtDNA) approximately 4% of the sequences were contaminants in PA1 and approximately 5% in PA2.  It can be assumed that the rate would be the same for autosomal sequences as well.  It appears that this experiment was done specifically to counter the conclusions of Campana et al., and it does seem to directly contradict them.
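As a rough sketch of how a contamination figure like that falls out of the mtDNA comparison, the arithmetic is just the share of mtDNA reads that disagree with the reconstructed sequence.  The function name and the read counts below are hypothetical; I picked them only so the results land on the approximate 4% and 5% figures quoted above.

```python
def contamination_percent(consensus_reads: int, mismatching_reads: int) -> float:
    """Percentage of mtDNA reads that do not match the reconstructed sequence."""
    total = consensus_reads + mismatching_reads
    if total == 0:
        raise ValueError("no mtDNA reads to evaluate")
    return 100.0 * mismatching_reads / total


# Hypothetical read counts, chosen to reproduce the approximate percentages.
pa1_like = contamination_percent(consensus_reads=960, mismatching_reads=40)
pa2_like = contamination_percent(consensus_reads=950, mismatching_reads=50)
print(pa1_like, pa2_like)
```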

c.  Were there any human contaminant sequences in this experiment?  

Yes, but Teasdale et al. simply removed those sequences before analysis.  The supplemental data indicates that sequence aligning to the human genome made up approximately 0.01% of the reads in both PA1 and PA2.

d.  Why were Teasdale et al.'s results so different from Campana et al's?  

The use of next generation sequencing allowed different processes to be applied to the results, which removed the confounding data.  First, they had a much greater volume of results to work with, so small amounts of contamination could be more easily ignored.  Teasdale et al. suggested that standard PCR-based sequencing approaches cause many more artifacts than NGS approaches, and that this could be a significant factor.  It is also hypothesized that the contaminants present in the Campana et al. samples may have been selectively amplified by the standard DNA sequencing approach (perhaps because the target DNA from contaminating sources was less degraded).  This issue appears less dire with NGS approaches.  Also, Teasdale et al. applied post-sequencing (software based) quality control that removed a number of confounding amplification results that may have been seen by Campana et al. (see the section discussing the use of FastQC).


TLDR – Next generation sequencing appears to provide much more clear cut answers for the species question for parchment, as long as destructive sampling can be tolerated.

A couple more interesting results:  Through relative comparison of the number of sequences on the X chromosome, Teasdale et al. were able to say definitively that both parchments had been made using skin from ewes (female animals).  Because ewe cells have two X chromosomes (XX) while ram cells have only one (XY), they were able to show, through comparison to ratios from modern controls, that both parchments had sequence counts indicating two copies of the X chromosome.
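Here is a small Python sketch of the X-chromosome reasoning.  It is an illustration under my own assumptions, not the paper's method: the paper compared against ratios from modern controls, whereas this sketch uses a simple fixed threshold, and the chromosome lengths and read counts are rough, made-up numbers.

```python
def x_to_autosome_ratio(x_reads: int, x_length: float,
                        autosome_reads: int, autosome_length: float) -> float:
    """Read density on the X chromosome relative to the autosomes.

    Roughly 1.0 is expected for two X copies (XX) and roughly 0.5 for one (XY),
    because autosomes are always present in two copies.
    """
    x_density = x_reads / x_length
    autosome_density = autosome_reads / autosome_length
    return x_density / autosome_density


def infer_sex(ratio: float, threshold: float = 0.75) -> str:
    """Classify by a hypothetical midpoint threshold between 0.5 and 1.0."""
    return "female (XX)" if ratio >= threshold else "male (XY)"


# Hypothetical lengths (in bases) and read counts, for illustration only.
X_LENGTH = 135e6
AUTOSOME_LENGTH = 2450e6

ewe_like = x_to_autosome_ratio(5400, X_LENGTH, 98000, AUTOSOME_LENGTH)
ram_like = x_to_autosome_ratio(2700, X_LENGTH, 98000, AUTOSOME_LENGTH)
print(infer_sex(ewe_like), infer_sex(ram_like))
```

Normalizing by length matters because the X chromosome is much smaller than the autosomes combined, so raw read counts alone would not be comparable.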

Finally, because there were now reference data on single nucleotide polymorphisms (SNPs) for modern sheep breeds, they could say that the polymorphisms found in both parchments "shaded" toward breeds located in the British Isles.  Please note that because the number of reads with SNPs in them was relatively very, very, very small, this is by far the most speculative conclusion in the publication (see Figure 2 in the publication).  Ovine SNP-containing reads accounted for a mere 0.018% and 0.007% of the total reads for parchment 1 and parchment 2, respectively.  It is cool that the small number they found did not contradict what are almost certainly the facts (i.e. the sheep were raised in, and were related to those raised in, the British Isles).

Finally, I'll also note that both of the conclusions in Teasdale et al. were foregone BEFORE the experiments were put together -- the parchments had been identified as sheep previously using other methods (perhaps, follicle analysis -- the paper doesn't say) and it was known that these parchments had been produced in the British Isles.  Such geographical certainty prior to analysis is certainly not available with the Voynich and thus a greater volume of SNP results (and a greater geographic spread of reference animal genomes) will certainly be required to draw any conclusions.

But that being said, these results are much more promising than Campana et al. and likely identify and overcome the issues seen in that earlier paper.  Progress!
I thought the following seemed interesting:

[link]
(26-12-2021, 01:07 AM)MichelleL11 Wrote: Finally, because there were now reference genomes of single nucleotide polymorphisms (SNPs) for modern sheep breeds they could say that the polymorphisms found in both parchments "shaded" toward breeds located in the British Isles.

One question, by the way: in [link] for the ancestor of today's domestic chickens, does "SNP" also mean "single nucleotide polymorphisms"?

Is this definition of SNPs correct? (For me this is the most understandable one.)

Quote:SNPs are also referred to as "successful point mutations", i.e. genetic changes that have become established to a certain degree in the gene pool of a population, i.e. have become heritable changes therein.
(26-12-2021, 12:32 PM)bi3mw Wrote: One question, by the way: in [link] for the ancestor of today's domestic chickens, does "SNP" also mean "single nucleotide polymorphisms"?
Is this definition of SNPs correct? (For me this is the most understandable one.)
Quote:SNPs are also referred to as "successful point mutations", i.e. genetic changes that have become established to a certain degree in the gene pool of a population, i.e. have become heritable changes therein.

Hi, bi3mw:

Yes, that is what SNP stands for -- note that when spoken, it's pronounced "snip" or "snips."  [link] is another definition that might include some useful distinctions.  One thing I don't like about your first definition is the implication that becoming established in a population is directly linked to function.  A subset of SNPs do have a direct functional impact, particularly if they fall in the part of the gene that encodes the instructions for the protein (e.g. are within the "open reading frame" that is transcribed into messenger RNA and translated into protein).  Then that change is seen in the protein and may give the organism some sort of selective advantage.

Note, however, that many other DNA changes often "go along for the ride" with other sequences that actually impart the "selected for" function.  How closely a SNP is associated with another selected-for trait is called "linkage" and can be measured mathematically with enough crosses (offspring results).  The first DNA maps were actually linkage maps -- but now everything is done through sequencing.  Thus, the existence of SNPs was recognized many years ago, but identifying them and their associations is still very much an ongoing project.  [link] is an article that talks about how SNPs were isolated in the early 2000s, told in the context of human sequences, that you might find interesting.

SNPs underlie much of the individual identification that is done using DNA.  If they are well enough documented, SNPs can be used to predict a fair number of things about the individual organism that the DNA was isolated from.  Groups of SNPs can result in an individual identification (think paternity testing).  As a way to identify individuals, this is similar to using STRs (short tandem repeats).  [link] is a much later article that compares and contrasts the use of SNPs and STRs for commercial livestock (Angus cattle) parentage determination, and thus is more applicable to some of the analysis that could be done with the Voynich.  But once you've got the hang of how SNPs work, you might want to read this general [link] that talks about their limitations.

Hope this helps and happy to answer any other questions anyone may have.
(26-12-2021, 04:33 PM)MichelleL11 Wrote: Hope this helps and happy to answer any other questions anyone may have.

Thanks for answering questions.

I was curious as to how fast DNA degrades. I found the following article:

[link]

To what extent can we expect that human or animal DNA from the time of the manuscript's construction has survived to the present day in strands long enough, even if incomplete, to be suitable for sequencing? (I assume that even if the strands are incomplete, as long as they are long enough, the sequences generated from separate strands can be combined to produce a full sequence.)

Is "touch" DNA likely to degrade much, much faster?
(26-12-2021, 06:05 PM)Mark Knowles Wrote: I was curious as to how fast DNA degrades. I found the following article:

[link]

To what extent can we expect that human or animal DNA from the time of the manuscript's construction has survived to the present day such that there are long enough strands of DNA even if incomplete that they would be suitable for sequencing? (I assume that even if strands are incomplete if they are long enough DNA sequences generated from separate strands can be combined to generate a full sequence.)

Is "touch" DNA likely to degrade much, much faster?

Hi, Mark:

Thanks for that link to the paper reporting an estimated half-life of DNA of 521 years.  It appears this DNA had been "stored" in the bones of moa (extinct giant birds from New Zealand) that had been underground under very similar humidity (groundwater) conditions.  So with those assumptions in place, that is how long it took for half the DNA to decay.  Estimates for other "rates of decay" will need to take into account how the storage conditions vary from the conditions underlying their calculations.  One interesting conclusion is that the best temperature appears to be -5 degrees C (23 degrees F) --- and all libraries are definitely warmer than that (at least I hope so for the poor librarians).
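For a feel of what a 521-year half-life would mean, here is a short Python sketch of simple exponential decay.  Please treat it as a back-of-the-envelope illustration only: it assumes the 521-year figure applies unchanged, which it almost certainly does not for parchment kept in a library, and the function name and the 600-year example age are my own choices.

```python
def fraction_remaining(age_years: float, half_life_years: float = 521.0) -> float:
    """Fraction of the original DNA expected to be left after `age_years`,
    assuming simple exponential decay with the given half-life."""
    if age_years < 0 or half_life_years <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return 0.5 ** (age_years / half_life_years)


# After exactly one half-life, half is left by definition.
one_half_life = fraction_remaining(521.0)

# Hypothetical example: an object roughly 600 years old under the same conditions.
six_centuries = fraction_remaining(600.0)
print(f"{one_half_life:.2f} {six_centuries:.2f}")
```

Under those (very generous) assumptions a bit under half would remain after six centuries, but the real number depends heavily on temperature, humidity, and how the material was processed and stored.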

I don't know very much about touch DNA, since that is a very "forensic" branch of these kinds of analyses which is different than my work.  But a little bit of poking around found the following possible publications:

1. [link], Alketbi
2. [link], MIT Lincoln Laboratory
3. [link], Sharma
4. [link], Menchhoff et al.

I thought this last one may be the closest you're going to get because it involves stored latent fingerprint cards -- so the amount of time involved could be significantly longer than what the other more general articles might consider -- but obviously still a much shorter time period than what would be involved with the VM.

There is at least one publication about touch residue in manuscripts that I haven't looked at closely but that could also be useful in pursuing your questions (I think it may have been referred to in some of the articles linked in this thread).  Also, here is a high-level article that includes references to the article I just reviewed and to one I think I'll be reviewing closely, though probably not next (maybe the one after that):

1. [link], Rudy
2. [link], Gibbons

Hope these articles help clarify the issues.
I must admit that my fascination with the idea of isolating and sequencing a scribe's DNA comes from the potential that information seems to offer for learning a lot about the Voynich manuscript. (I do think DNA analysis of the vellum or ink, if possible, sounds like an excellent idea, so I don't see it as an either/or situation.) Unlike for sheep or cows, there are already massive and ever-growing databases of DNA from contemporary humans, as well as a growing knowledge of what DNA can tell us about a given human being and their characteristics. The prospect of even being able to work out the precise named individual from whom that DNA came is very appealing.

However there are clearly questions that would need to be answered:

1) In some 600 years, how much DNA from the original scribe(s) is likely to have survived? This is why I asked Michelle how rapidly DNA from the touch of the scribe would have degraded, and so how much might be left now.
2) How can the DNA from the scribe be distinguished from other later human DNA on the manuscript? Personally I think when ordinary people reading a manuscript touch that manuscript it is usually in the corner or on the side, whereas a scribe touches a manuscript where there is writing/drawing. So I think if a sample were taken thoughtfully from somewhere the scribe would have left their DNA, but where a normal reader would not have left their DNA the need to compare DNA from 100s of samples would be far less likely to exist. Is it realistically possible that a scribe's DNA could be preserved in the ink? Are there better places to look for a scribe's DNA?
3) Then there is the issue of ensuring that no damage is done to the manuscript. In principle it seems that this should be perfectly possible, as with the carbon dating and the vellum DNA discussed elsewhere. However, if little scribe DNA survives or it is hard to locate, then I can see that being problematic. I agree with other commenters that doing any significant damage to the manuscript would be highly unacceptable.
4) If a scribe's DNA is sequenced what information about the scribe can realistically be found out? This will increase with time as our ability to read DNA improves. At the moment if one sends one's DNA to an ancestry company they tend to generally provide people with a very crude idea as to where your ancestors are from; how narrowly can we expect to be able to narrow down our scribe's geographical origins now and in the future? For example if we could say that the scribe is from "Europe" that information would not be of much use or interest. Alternatively if we could say for example that the scribe is from "Sardinia" that would be much more interesting. In the light of recent discussions determining the scribe's gender would be interesting. Is it conceivable that we might be able to narrow down even further on who specifically the scribe might be and where more precisely they were from? Would it be realistically conceivable that through comparison with other DNA on a large contemporary DNA database that one could identify the scribe through their line of ancestry? So could we say as an example that the following list of people are descended from our scribe and then trace back the family tree to the individual in question, assuming they have surviving descendants?

(I don't expect you, Michelle, to answer these questions as I know there are a lot and they are highly speculative in nature.)
Ever since I first got to know about the Voynich manuscript, it has seemed to me that the carbon dating was far and away the most important development since Wilfrid Voynich rediscovered the manuscript. So external scientific tests and their possibilities rank high with me, though I daresay I am inclined to be overly optimistic about what might be possible.