The Voynich Ninja

Ruminations on the Voynich Manuscript
Has anybody else read this article/post: Patrick Feaster's "Ruminations on the Voynich Manuscript"?

I haven't yet had time to read and understand it all, but it appears very thorough.
I savored this blog post over the course of a weekend, and then left Prof Feaster a comment thanking him for one of the most epic reads I've had in a long time. I recommend this blog post as required reading for any serious newcomers who want to be up to date on all the important VMs-related research of the past decade or so, and to avoid reinventing the wheel. This guy really has done his homework, and doesn't let any of the major problems that stand in the way of an understandable VMs text go unnoticed or unaddressed. It's clear he is far from convinced by any of the ideas he's read about how the VMs's text came to be, but I trust his judgement on which ideas are worth paying attention to and developing further.

I invited Prof Feaster to visit voynich.ninja and participate in the discussions here. I think he'd have a lot of value to add. He's got a unique background as an expert in historical media and the history of information storage, which is a multidisciplinary STEM and Humanities field that's rather relevant to this mystery. Even if he doesn't end up stopping by, I'm happy to see his blog post get the recognition it deserves here.
There's a handful of good nuggets in the article (e.g. consecutive words beginning with ch-), but it's mainly a lot of diffuse wandering and wondering that only a really determined few will be able to grind their way through.

I couldn't help but wish Patrick had taken on a single point at a time; it would be so much easier to work with.
[attachment=5759]
This is something I posted in a comment on Patrick's blog.

In "§ 5 Patterns of Similarity and Repetition", Patrick shows the two lines that appear in the top plot of the image. The plot is about word similarity in Voynichese paragraph text: the whole text is processed as a single long sequence of words.
Patrick's description of the plot and some of his discussion of it:

Patrick Feaster Wrote: Measurement of average similarity between words n and n+x for x in the range 1-20. Blue: comma breaks disregarded (= fewer but longer words). Orange: comma breaks treated as real word breaks (= more but shorter words).

... 

There’s a sharp peak at n+2, which shows that horizontal pairs of words separated by one intervening word are actually more similar on average than words that are immediately next to each other.  This is followed by a periodic rise and fall at an interval close to the average length of a line.  Such a period would be consistent with vertically patterned word similarity within lines

In the lower part of the image, I added more plots based on crudely filtering out words according to their position in lines. I used a slightly different similarity measure (Levenshtein ratio, instead of the custom measure that Patrick discusses). My dark-blue and red lines correspond to Patrick's: they are close enough to show that the different similarity measure does not have a great impact.
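For anyone who wants to experiment with this kind of plot, here is a minimal Python sketch. It assumes an EVA-style transcription where "." marks a certain word break and "," an uncertain one (the "comma breaks" in Patrick's caption). The function names are hypothetical, and the similarity is plain Levenshtein distance divided by the length of the longer word, which is not necessarily the exact "Levenshtein ratio" normalization behind the plots above.

```python
def tokenize(line: str, comma_is_break: bool) -> list[str]:
    """Split one transcribed line into words.

    Assumes '.' is a certain space and ',' an uncertain one (EVA-style).
    With comma_is_break=False, uncertain spaces are dropped, giving
    fewer but longer words.
    """
    line = line.replace(",", "." if comma_is_break else "")
    return [w for w in line.split(".") if w]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical words, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similarity_profile(words: list[str], max_dist: int = 20) -> dict[int, float]:
    """Average similarity between words n and n+x for x = 1..max_dist,
    treating the whole text as one long sequence of words."""
    profile = {}
    for x in range(1, max_dist + 1):
        pairs = [(words[n], words[n + x]) for n in range(len(words) - x)]
        if pairs:
            profile[x] = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return profile
```

Running similarity_profile on all paragraph words, once tokenized with comma_is_break=False and once with True, corresponds to the two variants (fewer but longer words vs more but shorter words).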

The purple and light-blue lines show what happens if all the first and last words in lines are removed. Some of the observable effects:
  • similarity is increased;
  • the "periodic rise and fall" at 9/18 (dark blue) and 10/19 (orange/red) disappears;
  • the N+2 peak is amplified.
This possibly shows that the vertical patterns pointed out by Patrick are due to line-initial and line-final words being "special": roughly, each of the three sets of words (line-initial, inside-line, line-final) is made of relatively homogeneous words which differ from the words of the other two sets. By removing line-initial and line-final words, only inside-line words are left and this results in greater overall similarity.
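The filter behind the purple and light-blue lines is simple to express in code. A minimal sketch, assuming each transcribed line has already been split into a list of words; the function name is hypothetical:

```python
def inner_words(lines: list[list[str]]) -> list[str]:
    """Drop the first and last word of every line and concatenate what is left.

    Lines with only one or two words contribute nothing.
    """
    out: list[str] = []
    for words in lines:
        out.extend(words[1:-1])
    return out
```

The resulting flat list is then treated as one long text, exactly like the unfiltered word sequence, before computing similarity by distance.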

The yellow and green lines show the result of removing the fourth word from lines containing at least seven words (i.e. an inside-line word is removed). Effects:
  • similarity is decreased;
  • I would have expected the periodic peaks to still be observable, but it seems this is only the case with the green line (since one word was removed, the peak shifts from position 10 to position 9);
  • the N+2 peak is still visible, though attenuated, in the green line (i.e. when uncertain spaces are considered); it disappears in the yellow line (uncertain spaces ignored).
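Likewise, the fourth-word removal used for the yellow and green lines can be sketched as follows (same assumption of one list of words per line; the function name is hypothetical):

```python
def drop_fourth_word(lines: list[list[str]], min_len: int = 7) -> list[str]:
    """Remove the 4th word (index 3) from every line with at least min_len
    words, then concatenate all lines into one sequence."""
    out: list[str] = []
    for words in lines:
        if len(words) >= min_len:
            words = words[:3] + words[4:]
        out.extend(words)
    return out
```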

Hacking lines like I did is certainly brutal. I guess that the less smooth behaviour of the resulting plots when compared with the dark-blue and red lines is due to this. It would be nice to come up with a method of analysis that further investigates the details without altering the nature of the text.

The feature that I am mostly curious about is the N+2 peak. Something similar was pointed out by Torsten in his paper about "Co-Occurrence Patterns in the Voynich Manuscript": in Table 2 he showed that identical words are more likely to occur at a distance of 2 (1.07%), rather than immediately next to each other (0.96%).
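A rough way to obtain figures of this kind, assuming the text is a single list of words. The denominator used here (all word pairs at distance d) is an assumption and may not match how Torsten normalized his Table 2; the function name is hypothetical.

```python
def identical_at_distance(words: list[str], d: int) -> tuple[int, float]:
    """Count positions n where words[n] == words[n + d], and return that count
    together with its share of all word pairs at distance d."""
    n_pairs = len(words) - d
    if n_pairs <= 0:
        return 0, 0.0
    hits = sum(words[n] == words[n + d] for n in range(n_pairs))
    return hits, hits / n_pairs
```

Comparing the rate at d=1 with the rate at d=2 gives the kind of contrast described above (0.96% vs 1.07%).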
As you note, Torsten had an interesting paper on distance and similarity, noting potential patterns. He also showed that some languages also had patterns (though different). I really wanted him to test a huge range of texts to understand more about these kinds of patterns, but it never happened.

On your purple and light-blue lines: is it that there are two patterns? Line-initial words are similar to one another, which causes the 9/18 and 10/19 rises, while line-internal words are similar to one another but dissimilar to line-initial ones? Thus, by removing the line-initial and line-final words, the longer repeat disappears and the main N+2 peak is strengthened because line-internal words make up a greater share of all words?

(I've still yet to work through the whole of Patrick's post! There's so much and I've been so distracted with other things.)
(16-08-2021, 05:01 PM) Emma May Smith Wrote: On your purple and light-blue lines: is it that there are two patterns? Line-initial words are similar to one another, which causes the 9/18 and 10/19 rises, while line-internal words are similar to one another but dissimilar to line-initial ones? Thus, by removing the line-initial and line-final words, the longer repeat disappears and the main N+2 peak is strengthened because line-internal words make up a greater share of all words?

Hi Emma,
yes, I agree with your idea. This also applies to line-final words, which so often contain EVA:m/g, glyphs that are rare elsewhere.

I also agree with your explanation of why the N+2 peak is amplified in the purple and light-blue lines. Possibly, the opposite happens in the yellow and green lines, where the N+2 pattern is disturbed by the removal of the 4th word; this has a greater impact when uncertain spaces are ignored and there are fewer "inner" words.
Hello MarcoP,

I'm wondering how much of the downward slope can be ascribed to inconsistencies in vocabulary between pages; most of it, I guess. Without them, the peak at +2 would then be only a bias against "stuttering", which is normal in natural languages.
(17-08-2021, 04:13 PM) nablator Wrote: Hello MarcoP,

I'm wondering how much of the downward slope can be ascribed to inconsistencies in vocabulary between pages; most of it, I guess. Without them, the peak at +2 would then be only a bias against "stuttering", which is normal in natural languages.

Hi Nablator,
my superficial opinion is that the 20-word window is too small for variability between pages to have a great impact. As a cause, I'm thinking more of intra-page phenomena, like [link]. But it would certainly be interesting to look more into this.
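One way to look into nablator's question, sketched under assumptions: each page is given as a separate list of words, and only pairs of words on the same page are compared, so no pair ever straddles a page break. For brevity this uses difflib's SequenceMatcher.ratio() from the Python standard library as a stand-in similarity; it is not the same as the Levenshtein ratio used in the plots. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def within_page_profile(pages: list[list[str]], max_dist: int = 20) -> dict[int, float]:
    """Average similarity of words n and n+x (x = 1..max_dist), using only
    pairs that lie on the same page."""
    profile = {}
    for x in range(1, max_dist + 1):
        sims = [
            SequenceMatcher(None, words[n], words[n + x]).ratio()
            for words in pages
            for n in range(len(words) - x)
        ]
        if sims:
            profile[x] = sum(sims) / len(sims)
    return profile
```

If the downward slope over distances 1-20 survives this restriction, that would suggest page-to-page vocabulary differences are not its main cause.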

Your observation about normal natural languages made me curious to see what happens with English.

[attachment=5762]

Apparently, the lines are quite flat, with the exception of a rising slope at positions 1, 2, 3 (the no-stuttering effect). Interestingly, the very repetitive King James Genesis shows a peak at position 3. I counted that identical words at distance 3 occur 800 times; at distance 2 they occur 315 times. Most of the distance 3 cases appear to be of the form Article, Noun, Preposition, Article:

and the name of the second river
shall keep the way of the lord to do
you give me a possession of a buryingplace
ye shall eat the fat of the land 

while N+2 cases tend to be and...and sequences:

the sheep and go and feed them 
himself and jacob and jacob fed 
with the ass and i and the lad will
without form and void and darkness was
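Examples like these can be pulled out automatically. A minimal sketch, assuming the English text has been lower-cased, stripped of punctuation (as the examples above appear to be) and split into one list of words; the function name is hypothetical:

```python
def repeat_contexts(words: list[str], d: int, limit: int = 5) -> list[str]:
    """Return up to `limit` word windows that start and end with the same
    word, d positions apart (e.g. d=3 for 'the name of the')."""
    found = []
    for n in range(len(words) - d):
        if words[n] == words[n + d]:
            found.append(" ".join(words[n:n + d + 1]))
            if len(found) == limit:
                break
    return found
```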
Some researcher, I can't recall who, had a blog post comparing the statistical occurrence of both vords and the glyphs comprising them, in line-initial, line-final, mid-line, and label positions. The statistical patterns strongly supported the idea that these are four distinct populations of data, each with its own unique set of preferences and disinclinations that's not much like any of the other three. What surprised me most was the dissimilarity between line-initial and label positions, even when the first lines of paragraphs and Grove words were ignored. Can someone kindly link me that blog post, or at least remind me who wrote it?

The reason I bring this up is that I wonder whether, and how, label vords can meaningfully be brought into this discussion. If I can collect the right tools and learn how to use them correctly, I'd love to do an experiment like this:

  1. Select a page of the VMs that has a lot of unambiguous labels
  2. For each label in turn, find, count, and mark all types occurring in the manuscript that are at a Levenshtein edit distance of one from this label
  3. Tally how many tokens of each type occur in line-initial, mid-line, line-final, and label positions
  4. Use the total vord count for line-initial, mid-line, line-final, and label positions in that entire Currier language / hand / thematic section / quire to calculate the expected counts if the null hypothesis were true. The null hypothesis is that vords at a Levenshtein edit distance of one from any given label vord are equally likely to occur in all of these positions, in proportion to each position's share of the total text.
  5. If the counts deviate from those predicted to a statistically significant degree, deem the null hypothesis falsified, and consider the idea that label vords are more morphologically similar to vords in certain positions than others.
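The five steps above could be prototyped roughly as follows. This is a sketch under several assumptions: paragraph lines are already tokenized into lists of EVA words, labels are given as a separate list, one-word lines are arbitrarily counted as line-initial, and the test is a chi-square goodness-of-fit test compared against the textbook critical value 7.815 (3 degrees of freedom, alpha = 0.05). All function names are hypothetical.

```python
from collections import Counter

POSITIONS = ("initial", "middle", "final", "label")
# Chi-square critical value for 3 degrees of freedom (4 positions - 1), alpha = 0.05.
CHI2_CRIT_DF3 = 7.815


def positioned_tokens(lines: list[list[str]], labels: list[str]) -> list[tuple[str, str]]:
    """Tag every token with its position. One-word lines count as 'initial'."""
    tagged = []
    for words in lines:
        for i, w in enumerate(words):
            if i == 0:
                tagged.append((w, "initial"))
            elif i == len(words) - 1:
                tagged.append((w, "final"))
            else:
                tagged.append((w, "middle"))
    tagged.extend((w, "label") for w in labels)
    return tagged


def is_one_edit(a: str, b: str) -> bool:
    """True if a and b are exactly one insertion, deletion or substitution apart."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def compare_label_neighbours(label: str, tagged: list[tuple[str, str]]):
    """Steps 2-5 for one label: observed and expected position counts of its
    edit-distance-1 neighbours, plus the chi-square statistic."""
    hits = Counter(pos for w, pos in tagged if is_one_edit(label, w))
    observed = [hits[p] for p in POSITIONS]
    totals = Counter(pos for _, pos in tagged)  # step 4: position shares
    n, total = sum(observed), sum(totals.values())
    expected = [n * totals[p] / total for p in POSITIONS]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return observed, expected, chi2
```

A label whose chi2 exceeds CHI2_CRIT_DF3 would be a candidate for step 5. Two caveats: the chi-square approximation is unreliable when expected counts fall below about 5, which will happen for many rare labels, and testing many labels one by one calls for some correction for multiple comparisons (or pooling the neighbours of all labels on a page before testing).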
(18-08-2021, 03:37 PM) RenegadeHealer Wrote: Some researcher, I can't recall who, had a blog post comparing the statistical occurrence of both vords and the glyphs comprising them, in line-initial, line-final, mid-line, and label positions. The statistical patterns strongly supported the idea that these are four distinct populations of data, each with its own unique set of preferences and disinclinations that's not much like any of the other three.
I'm not sure whether it's what you have in mind, but I made an argument something like this in section four of the "Ruminations" post to which Emma linked, including a comparative chart of beginning and ending glyphs:
[Image: voynich-starting-ending-glyphs1.jpg]
In the same place, I also compared the most frequent label vords with the most frequent vords in "internal" line positions within paragraphs, which drew out some additional contrasts (e.g., when common label vords end in certain ways, they tend also to be well-attested as line-internal vords, whereas when they end in other ways they tend not to be).
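One possible way to quantify that kind of contrast, offered as a sketch rather than a reconstruction of the chart: group label types by their last k characters (k=2 here, a crude proxy for an "ending", since EVA endings such as -aiin are longer) and measure what share of each group is also attested as a line-internal word. Inputs and the function name are hypothetical.

```python
from collections import defaultdict


def ending_overlap(label_types: set[str], internal_counts: dict[str, int],
                   k: int = 2, min_count: int = 1) -> dict[str, float]:
    """For each ending (last k characters), the share of label types with that
    ending that occur at least min_count times as line-internal words."""
    groups: dict[str, list[str]] = defaultdict(list)
    for w in label_types:
        groups[w[-k:]].append(w)
    return {
        end: sum(internal_counts.get(w, 0) >= min_count for w in ws) / len(ws)
        for end, ws in groups.items()
    }
```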

It would be interesting to see the results of the experiment you describe!