28-01-2025, 10:26 AM
I spent some time with the manuscript over the festive break. I have an idea as to what I think is happening with the text, but I'll set that aside for the end. This write-up is a byproduct of the overall investigation; on its own it might be useful to people following their own research paths.
The experiment:
Can we establish a correlation between folios and sections based on the number of words they share?
The method:
I wrangled a Python script that goes through each folio, takes each individual word (above a length of 3 characters) and counts how many times it appears in each other folio. To balance the fact that different folios have different numbers of words, I divide the final score by the number of words in the comparison page. The idea is that we are balancing the count against the event opportunity (although I admit this is likely not a *clean* scoring method).
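For anyone wanting to try something similar, here is a minimal sketch of that scoring as I read the description above. It is not the original script. The function name `score_matrix`, the `folios` input shape and the toy words are all assumptions for illustration, and so is the choice to count each distinct word once per source folio.

```python
from collections import Counter


def score_matrix(folios, min_len=4):
    """Score every ordered folio pair by shared words.

    folios: dict mapping a folio name to its list of word tokens
            (hypothetical input shape, not the original script's).
    min_len: shortest word kept; 4 means "above a length of 3 characters".
    """
    counts = {name: Counter(words) for name, words in folios.items()}
    scores = {}
    for a, words_a in folios.items():
        # Assumption: each distinct word in the source folio is looked up once.
        candidates = {w for w in words_a if len(w) >= min_len}
        row = {}
        for b, words_b in folios.items():
            if a == b or not words_b:
                continue
            # How many times the source folio's words occur in the comparison folio.
            shared = sum(counts[b][w] for w in candidates)
            # Normalise by the size of the comparison folio.
            row[b] = shared / len(words_b)
        scores[a] = row
    return scores


# Toy data only; these are not real transcription counts.
folios = {
    "f1r": ["daiin", "chedy", "ol", "shedy"],
    "f1v": ["chedy", "chedy", "qokeedy", "daiin"],
    "f2r": ["okaiin", "ol", "ar", "otedy"],
}
scores = score_matrix(folios)
print(scores["f1r"]["f1v"])  # 0.75
print(scores["f1v"]["f1r"])  # 0.5
```

Note that with this reading the score from A to B is not the same as the score from B to A, because the divisor is the size of the comparison folio.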
The output & processing:
The output is a big CSV file with one row and one column per folio, and a score in each cell for that folio pair. I imported this into a spreadsheet so that I could do further analysis. I formatted the cells so that they are colored on a gradient based on how much higher they are than the average.
Some advance warning: this method of displaying the results creates an implicit symmetry between an entry and its counterpart. Basically everything is mirrored across the diagonal, and this will create patterns that don't exist. The output spreadsheet looks like the following:
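The CSV and the "how far above average" step could be sketched roughly as below. This is only an illustration under assumptions: the helper names `write_matrix_csv` and `above_average` are made up, the toy `scores` dict stands in for real output, and the gradient itself was done in the spreadsheet rather than in code.

```python
import csv
import io
from statistics import mean


def write_matrix_csv(scores, out):
    """Write a folio-by-folio score matrix to a file-like object."""
    names = list(scores)  # keeps the order the folios were supplied in
    writer = csv.writer(out)
    writer.writerow(["folio"] + names)
    for a in names:
        # The diagonal (a folio against itself) is left blank.
        writer.writerow([a] + [scores[a].get(b, "") for b in names])


def above_average(scores):
    """Return each score minus the mean of all scores in the matrix."""
    avg = mean(v for row in scores.values() for v in row.values())
    return {a: {b: v - avg for b, v in row.items()} for a, row in scores.items()}


# Toy scores only; not real results.
scores = {
    "f1r": {"f1v": 0.75, "f2r": 0.0},
    "f1v": {"f1r": 0.5, "f2r": 0.0},
    "f2r": {"f1r": 0.25, "f1v": 0.0},
}

buf = io.StringIO()
write_matrix_csv(scores, buf)
csv_lines = buf.getvalue().splitlines()
print(csv_lines[0])  # folio,f1r,f1v,f2r

delta = above_average(scores)
print(delta["f1r"]["f1v"])  # 0.5
```

Cells with a positive difference are the ones that would get the darker colors on the sheet.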
![[Image: AP1GczNAoeoFK9D-S74M8UNov2z0ridcCM9zV1EO...authuser=0]](https://lh3.googleusercontent.com/pw/AP1GczNAoeoFK9D-S74M8UNov2z0ridcCM9zV1EOelTmYqNJ52XydpZWAPnYrx9Ao-jYVciN-aMWyORGTkW38Snz7ZfGfl_k0VgHZK8UlVTvQdafilYeiRbKGKrekUQDwJvwHaKp3GL2Mr61KjVi6SmYyz7TCg=w1036-h825-s-no?authuser=0)
The sheet starts from f1r in the top left corner.
The darker the red color, the further the score is above the average.
The bordered lines are quires.
The gaps in the column and row headers are the missing folios.
Observations:
We can see a very clear correlation between two sections in the bottom right; these are exactly the bathing and recipe sections.
Both the bathing section and the recipe section have a very distinct correlation to plants in the second half of the herbal section.
These same plant folios have a strong correlation with the preparation section as well as the bathing and recipe sections.
The second clear block is the plant section itself. We see clear correlation between all entries, and perhaps an indication that the last 3 quires of this section are somehow more tightly correlated than the previous 4.
The quires wrap very nicely around these... "islands of correlation"; we can see the borders of the quires outlining the sections.
A nice example of this is the "vine" plant that appears in f17v, f96v, f99. These folios are linked both by theme and by shared words.
How does this help:
It helps us isolate sections to work on. There is a lot of text in the manuscript and reducing the attack surface can help.
We can also use this to validate folio and quire order. For example, I've always been confused by the location of quire 19 and suspected it belonged somewhere else, but looking at its position on the sheet, it shares the same correlations as its neighbors, the bathing section and the recipe section, meaning it likely belongs where it is.
The speculation part:
The current status of the "is it a cipher or not" debate boils down to either "no, it's the product of a generative method that produces pseudo-language" or "yes, but we don't know how or what" or "something else". If this research shows anything, it shows that the manuscript text shares a logical correlation with its pictographic themes and even the physical structure (the quires). If this is pseudo-language then it follows themes and relations. Also, I don't think these concepts are mutually exclusive; I think you absolutely can use a generative method as an encoding or enciphering mechanism. If I were to state it in a simple way it would be, "you hide a needle in a bunch of haystacks, but you're going to need to manufacture those haystacks". I think the generative mechanism takes words as a seed and produces pseudo-language as an obscured output. I also think there is a sort of pun going on here: it's not just the text being generated, the characters and plants are too, hence why we see so many unrecognisable plants.
The plants are gathered, cut up, thrown into a device and turned into something new, like the words.
If I'd say anything, my theory is that a generative mechanism turns a word into many words, perhaps letter by letter. It adds prefixes and suffixes, modifies letters, adds false roots, etc., so that we end up with what we have: a text poor in individual characters and overburdened with words.
I think the generative mechanism is shown in [link] and is based on a solar quadrant ([link]).
I think the female characters in the center are showing how a word is modified, what that process is.
I think the second outermost circle (containing the repeating pattern of single characters) manages the substitutions.
I think the contents of the generative mechanism are defined by the astral/zodiac section, allowing it to be adjusted based on themes.
I think some settings for the mechanism are indicated by the specific female character and their association to a star, or the star present.
I think we have some working notes in the margins of [link] and [link] that are further clues to the process.
I think sections sharing an above average correlation are encrypted using the same scheme/settings.
What next:
I'd like to focus on identifying the core word roots and re-scoring everything based on that. I think that would help identify how the prefixes and suffixes are defined based on the mechanism state/settings.
Notes:
I added the missing folios and quires based on [link].
The solar quadrant was taken from a reference in the interesting academic paper "The Voynich Manuscript as a Manual for the Habsburgs" by [link].
You can get a PDF of the spreadsheet here: [link]
You can get the spreadsheet itself here: [link]
As is the way with such things, it turns out ReneZ had done this research prior. Head to [link] for more correlation goodness.