The Voynich Ninja

Full Version: Biocodicology - A Deeper Dive
Hi, All:

I haven't really shared my background here on the board but I'm going to now to hopefully gain some "expert" credentials in this area.  I have been in the biotech industry basically since the time it was possible to patent biotech inventions (1989).  I do a significant amount of technical drafting support for companies that utilize all the sequencing technologies involved in biocodicology -- but for medical diagnosis and treatments.  Since I'm in the patenting business, I learn many cutting edge techniques -- some of which I can't share because they are confidential.

The biocodicology area has moved significantly since the 2009 article cited in the general chat thread.

Here [link] is an article from 2019 (for free) that is a better representation of the present state of the area. Having done a full review of what's out there for my Voynich work, I do find this review a bit optimistic, but it is a good general introduction to the techniques available.

I will say that biocodicology on the whole is significantly behind what is happening in the medical world -- but that is not surprising given (1) funding challenges, (2) the much lower number of labs performing this work, and (3) the understandable abhorrence of "destructive" analysis for historical objects. Obviously, none of these things is as formidable an issue when it comes to developing medical sequencing techniques.

I'm going to add (over time -- appreciate the patience!) to this thread with reviews of significant publications in this area. In a bit of a spoiler, I will say that the results obtained so far point most strongly to the technical difficulties of performing and interpreting these tests rather than providing any certain answers. But it is a worthwhile exercise and one that I'd like to share with the board (at least in part to try to "pay back" all the technical help I've gotten in my Voynich research from here).

In each case, I will attempt to relate the results of the publication to possibilities for the Voynich (as that was the goal of my review -- which is complete but not in an easily shared form). Hopefully, after the string of review posts is done, the board members can come away with a better understanding of what can and can't be done. If there is any truth in biotechnology, it is that it continually gets better and better at solving significant nucleic acid (DNA, RNA) and protein detection problems. Thus, I fully expect that someday these techniques will be useful to the Voynich -- but the twenty years prediction may not be far off, given the general issues above and the specific issues that will be discussed in the reviews.

The first review post to come soon.
Hi Michelle,

thanks for the introduction to the topic. I am already looking forward to the following posts.

Did I understand correctly that the twenty years that Raymond has quoted refer to the pure development time of the technology? I had assumed that the technology would already be available but that providing the infrastructure (an extensive, accumulated database) would take this long.

Am I correct in assuming that the samples needed from manuscripts will be minimally invasive (μg / μl)?
(14-12-2021, 06:18 PM)bi3mw Wrote: Did I understand correctly that the twenty years that Raymond has quoted refer to the pure development time of the technology ? I had assumed that the technology would already be available but the provision of the infrastructure (extensive, accumulated database) would take this long time.

It's actually a mix. I'll be discussing the technique sensitivities and the slightly counter-intuitive issues surrounding them; sensitivities will increase over time, and definitely have already. But you are correct that the database building (i.e. collecting results to have something to compare against) is likely to take the longer of the two issues to resolve. It is less a technology issue and more an issue of funding, the number of experiments being done, the number of labs involved, and specimen access. Will have more concrete information about all of this in the reviews.

bi3mw Wrote:Am I correct in assuming that the samples needed from manuscripts will be minimally invasive ( μg / μl )?

Generally, yes -- but I'll be discussing the pros and cons of needing to harvest samples in this way as well. As you'll see, it definitely complicates the interpretation of results, but anything more invasive is likely an absolute deal-breaker in terms of obtaining access to many manuscripts.
It all sounds very interesting. I look forward to reading more!
Most welcome!
Before I begin review of the various articles out there in biocodicology it makes sense to establish some underlying biological information that is needed to understand each of the articles.

Each of the techniques that are going to be discussed measure and describe molecules present in living cells and in materials made from cells that were once living.  The chemical structure details aren’t necessarily important, although a tiny bit of an introduction as to what these chemicals do in the cell will help clarify some of the techniques’ challenges.

The techniques measure and characterize either nucleic acids or proteins. All cells are basically protein factories. Proteins are the building blocks and machinery of the body. Proteins are made of strings of chemicals called amino acids. The amino acids determine how the protein "folds" and thereby determine its shape and function in the cell. Here [link] is some more high-level information about proteins. Deoxyribonucleic acid (DNA) can be considered the blueprint for how to make the proteins because it encodes instructions as to which amino acids are needed, and in what order, to make the proteins. It is also called hereditary material because copies of the DNA are passed from cell to cell during growth (and from parent to child organism during reproduction). Importantly, DNA is found not only inside the nucleus (a relatively large, central, membrane-bound area of the cell) but also in mitochondria, which are smaller membrane-bound areas within the cell where energy is produced.

DNA is found in a double helix structure (twisted ladder -- also called "double stranded"). Down the ladder there are only 4 choices of information inputs (called “bases” or “nucleotides”), and it is the identity and order of these strings of nucleotides that provides the information for making the protein. The bases pair in a consistent way to form the rungs of the ladder. If you break the molecule down the middle of the “rungs” you have two copies of the same information, because each rung base always pairs with the same complementary base on the other side of the ladder (together called a “base pair”). Here [link] is a cartoon illustration of this.
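To make the "two copies of the same information" point concrete, here is a minimal sketch of the pairing rule. The pairing itself (A with T, C with G) is standard; the example sequence is made up, and the sketch ignores the reading direction of the strands.

```python
# DNA pairing rule: A always pairs with T, and C always pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)

strand = "ATGCGT"  # made-up example, not a real gene
partner = complement(strand)
print(strand)   # one side of the ladder
print(partner)  # the other side of the ladder

# Either side alone is enough to rebuild the other.
assert complement(partner) == strand
```

Because the rule is fixed, knowing one side of the ladder always tells you the other.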

One type of ribonucleic acid (RNA) found in the cell is an intermediary between the DNA and the protein production machinery. Unlike DNA, RNA has only one side of the ladder (also called "single stranded"). It is called messenger RNA because it provides a copy of the blueprint “message” from the DNA that is transported to the part of the cell where the protein production happens. Note that the RNA sequence is a copy of the DNA sequence because the same complementary base pairing “copy method” is used here. The DNA “unzips” and RNA bases are paired to form a single strand copy. Here [link] is a cartoon illustration of this.
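The copy step can be sketched the same way. One detail not mentioned above: RNA uses the base U (uracil) wherever DNA would use T. The sequences here are made up, and strand direction is again ignored for simplicity.

```python
# RNA base that pairs with each DNA base on the unzipped template strand.
# RNA has U where DNA would have T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template):
    """Build the single-stranded RNA copy by pairing against the DNA template."""
    return "".join(DNA_TO_RNA[base] for base in template)

coding_strand = "ATGCGT"    # made-up side of the ladder that carries the "message"
template_strand = "TACGCA"  # its partner, which the RNA bases pair against

mrna = transcribe(template_strand)
print(mrna)

# The RNA reads the same as the coding strand, with U in place of T.
assert mrna == coding_strand.replace("T", "U")
```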

The primary characterization of DNA or RNA or protein is determining the precise order of the bases or amino acids.  This process is called sequencing.  The precise sequences found are what provide the unique signatures that allow DNA or RNA or protein to be identified as originating from a particular organism.  

However, it is important to understand that for relatively close evolutionary relationships the vast, vast majority of DNA, RNA, and protein sequences are the same no matter what organism they come from. This makes differentiating between relatively close relatives (such as any mammal from any other mammal – sheep from cow from goat from human) a relatively difficult task. This becomes even more difficult when trying to differentiate between even more closely related sources such as one sheep breed from another sheep breed. In many cases, the differentiating aspects between DNA or RNA or protein sequences of one breed from another are simply not known. Figuring out these “differentiation” hot spots (and separating these information-carrying results from the random noise errors that are an inevitable part of any sequencing process and may look like “differentiation”) is a very challenging part of the applicability of these tests.
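A small sketch of what "mostly the same" looks like when two sequences are compared position by position. The two sequences are invented for illustration (they are not real sheep, goat or cattle sequences), and the comparison assumes they are already lined up at equal length, which real data would first need an alignment step to achieve.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two pre-aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

def differing_positions(seq_a, seq_b):
    """Zero-based positions at which the two sequences disagree."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Invented toy sequences standing in for the same gene in two related animals.
sample_a = "ATGACCAACATTCGAAAATC"
sample_b = "ATGACTAACATTCGAAAGTC"

print(percent_identity(sample_a, sample_b))     # 18 of 20 positions agree
print(differing_positions(sample_a, sample_b))  # the only informative positions
```

Only the handful of differing positions carry any identifying information, and a single sequencing error at any other position would look exactly like one of them.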

Note that parchment contains all of these molecules in greatly varying amounts. It is mostly protein -- namely one kind of protein, collagen, a basic building block of skin. But DNA and RNA are present as well. Also note that time degrades all of these molecules, making it progressively harder to determine the sequences. The rate of degradation depends heavily on (1) what is present in the sample, which involves how it was produced (e.g. processing contamination) and how it was handled (e.g. environmental contamination); and (2) the material storage conditions. I'll be discussing how this relates to parchment in future reviews.

Cheatsheet

Molecule   Role             Made of             Structure
Protein    Building block   Amino acids         Variety of folded structures
RNA        Messenger        Nucleotide bases    Single strand
DNA        Blueprint        Nucleotide bases    Double helix
(17-12-2021, 06:45 PM)MichelleL11 Wrote: ....which involves how it was produced (e.g. processing contamination) ....

I think the same applies to the production of parchment:

Quote:From a chemical point of view, the production of leather involves a specific modification of the collagen fiber skeleton, in particular the dermis or corium, by means of substances introduced that change the structure of the raw material and lead to the stabilization of the cross-links of the collagen.

Does this manufacturing process influence the extraction of usable material ?
I am eagerly awaiting the next installment.
(17-12-2021, 07:20 PM)bi3mw Wrote: I think the same applies to the production of parchment:

Quote:From a chemical point of view, the production of leather involves a specific modification of the collagen fiber skeleton, in particular the dermis or corium, by means of substances introduced that change the structure of the raw material and lead to the stabilization of the cross-links of the collagen.

Does this manufacturing process influence the extraction of usable material ?

Absolutely!  This will be a central topic of the first reference I review -- tentatively promised sometime this weekend.
Here is the full citation for the first paper I’m going to review.
Campana, Michael G., Bower, Mim A., Bailey, Melanie J., Stock, Frauke, O'Connell, Tamsin C., Edwards, Ceiridwen J., Checkley-Scott, Caroline, Knight, Barry, Spencer, Matthew, and Howe, Christopher J. 2010. "A flock of sheep, goats and cattle: ancient DNA analysis reveals complexities of historical parchment manufacture." Journal of Archaeological Science 37 (6) 1317-1325.

You can read and download the entire paper at ResearchGate [link], although I don’t recall if I had to give them my e-mail in order to download.

1. What research question was Campana et al. asking?
Can standard DNA sequencing, with reliance on Polymerase Chain Reaction (PCR) amplification, then cloning, followed by sequencing of the results, identify what species was used to make the parchment for a group of documents created in the 1700-1800s?
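As a rough sketch of why PCR amplification matters for old, degraded samples: in the idealized case each PCR cycle doubles the number of copies of the target sequence, so even a few surviving molecules become a measurable amount. The efficiency parameter and the cycle count below are illustrative assumptions, not values from the paper.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copies by (1 + efficiency).

    efficiency=1.0 means perfect doubling; real reactions fall short of that.
    """
    return starting_copies * (1.0 + efficiency) ** cycles

# One surviving target molecule, 30 perfect cycles (illustrative numbers).
print(pcr_copies(1, 30))

# The same reaction at an assumed 80% per-cycle efficiency.
print(pcr_copies(1, 30, efficiency=0.8))
```

The same arithmetic applies to any molecule that matches the target, including a stray one that did not come from the animal skin, which is why PCR is so sensitive to contamination.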

Some important points to note: 
(i) What is being tested -- parchment documents that are relatively new compared to the Voynich and that were considered not valuable;
 
(ii) How the samples were prepared – pieces of “1-200 mm²” were “removed using a scalpel or scissors” (i.e. destructive sampling).

(iii) The use of PCR (repetitive copying of preselected target DNA sequences from whatever DNA was isolated from the sample), followed by cloning and sequencing of those results. Here [link] is more information about PCR.
 
(iv) What genetic sequences were the targets of the PCR?

(A) Cytochrome B coding sequence – cytochrome b is a protein encoded on mitochondrial DNA. This [link] points to the first group to propose using this gene to distinguish one species from another, and this [link] is a free full paper that discusses it.

A few things to note about mitochondrial DNA (mtDNA): it is inherited only from the female parent (from mitochondria in the egg), so it is haploid (only one version, rather than one from each parent), and it is found in a circular double helix form. There has been and continues to be controversy about the use of mtDNA in identification work.

If you’re interested in learning more about the pros and cons of mtDNA, here [link] is an article. Please note that citations of this article indicate continued uncertainty, even as of 2021, about the appropriate use of mtDNA, particularly for taxonomic relationships – which could complicate how the data can be used to move backwards and forwards in time within a single species.

(B) D-loop – this is another area of the mtDNA that has been commonly used to distinguish specific populations within a species from one another.

(C) Short Tandem Repeats (STR) – these are autosomal (standard, nuclear, chromosomal) sequences. There will be two copies (each called an “allele”) at each of these genetic locations in each cell – one inherited from the male parent and one inherited from the female parent.

Because of the presence of many repeats these gene locations are highly polymorphic (lots and lots and lots of variation) so they are used to identify one individual’s DNA from another.  Thus, the point of this analysis was to see if any of the parchment sheets were from the exact same animal but would not (at this time) be used to identify the species.  

Here [link] is a paper which discusses the issues with using STR analysis for animal species identification, in this case, badgers. Note that the test described in [link] could maybe work for something like the Voynich with expansion to include goat – so there has been significant work on this since 2010.
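To illustrate what an STR result actually measures, here is a minimal sketch that counts how many times a short motif repeats back to back. The motif, the flanking sequences and the repeat counts are all invented for illustration; real STR typing measures fragment lengths in the lab rather than reading strings like this.

```python
import re

def longest_repeat_run(sequence, motif):
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Invented example: the two copies of one genetic location in a single animal,
# one inherited from each parent, differing only in the number of repeats.
copy_from_mother = "GGCA" + "GATA" * 7 + "TTCG"
copy_from_father = "GGCA" + "GATA" * 9 + "TTCG"

genotype = sorted(
    longest_repeat_run(copy, "GATA")
    for copy in (copy_from_mother, copy_from_father)
)
print(genotype)
```

A single animal can contribute at most two repeat lengths at a given location, so seeing more than two in one sample suggests DNA from more than one individual.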

2. What were Campana et al.'s results?

a. The species could be identified for only three of the parchments using either the cytochrome b or D-loop results (see Table 1). Campana et al. said these three were definitely bovine and likely the species Bos taurus. One of the parchments only had sequences identified as goat – but four different goat sequences were isolated from that single specimen – calling this result into question.

b. The remaining eleven parchments contained sequences identified to be from multiple species (including humans) and a large amount of what Campana et al. termed PCR “artifacts.” No reliable conclusions, even as to species, could be drawn for these parchments.

c. The cytochrome b and D-loop results did not confirm each other as they had hoped.  All identifications were based on consistent results for EITHER cytochrome b OR D-loop.

d. The STR results were all inconclusive – i.e., they could not draw any conclusions about the relationships of any of the individual animals used for the parchments to each other – there were simply too many “peaks” (results) to be able to see the relationships. In other words, all the parchments in this test looked like they came from multiple animals of different species.
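The difference between a usable and an unusable result can be sketched as a simple tally of which species each cloned sequence matched. The clone lists below are invented for illustration and are NOT data from Campana et al.; the labels and verdict wording are also my own, and the sketch ignores variation between sequences from the same species.

```python
from collections import Counter

def summarize_clones(species_calls):
    """Tally per-clone species calls and say whether they agree with each other."""
    counts = Counter(species_calls)
    if not counts:
        return counts, "no usable sequences"
    if len(counts) == 1:
        return counts, "consistent single-species signal"
    return counts, "mixed signal, species call unreliable"

# Invented clone calls for two imaginary parchments (not from the paper).
parchment_x = ["cattle"] * 6
parchment_y = ["sheep", "goat", "human", "cattle", "sheep", "human"]

for name, calls in [("X", parchment_x), ("Y", parchment_y)]:
    counts, verdict = summarize_clones(calls)
    print(name, dict(counts), verdict)
```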

These results could be due to laboratory errors or contamination (e.g. it is well known that modern bovine sequences are common contaminants because of the widespread use of fetal bovine serum in common lab practices). However, Campana et al. did many control experiments to rule the great majority of these issues out -- including sending all new samples to a second genetic lab that confirmed their results.

****Note I am greatly simplifying the mess of these results and if there is any part of it you want further details about, just ask!****

What was left was the likely conclusion that the method of making parchment resulted in so much cross-contamination of the resulting sheets with so many species’ DNA that it is impossible to focus on only the DNA that actually came from the cells present in the parchment.

TLDR – The Campana et al. results reflect an unhelpful mixture of bovine, ovine, caprine, and human DNA sequences which appear to be cross-contamination from the parchment production process.

Although this could be a result of using later, more “industrially” produced parchment, that cannot be the whole explanation, because at least one other group testing leather had a similar result. At this point in the development of the testing process, it looks like significant modifications of the testing method will have to be made for parchment to get useful results.

As you can imagine, this was not a particularly propitious start for the use of genomic analysis for parchment.

I’ll talk about whether this conclusion from 2010 has held up over time as I go through what has been done since.  Spoiler alert -- things are a little better -- but there are still many, many questions.