The Voynich Ninja


I haven't been able to share the slides of the presentation I did at Voynich Day since it is 40 MB. This is because I recorded the audio into the slides themselves so I have to work out how to undo that. I'm also working on how to explain the work better but in the meantime, here's a summary of the presentation.

The aim was to outline line patterns beyond the immediately apparent ones, such as gallows usually being the paragraph-start initial; p and f appearing mostly on the Top Row; final m being disproportionately at Line End; initial a and ch rarely being at Line Start, etc. Do we see other distinct behaviour in initials and finals (and sometimes word-middle) at Line Start, Line End, and Top Row? Does this vary by scribe or do we see similar trends?

Line Behaviour at Key Positions (Line Start, Line End, Top Row)

My method is rather unsophisticated compared to Emma's; it's just comparing percentages. For the line patterns, I first took three large separate chunks for comparison: Herbal A by Lisa's Scribe 1; Balneological by Lisa's Scribe 2; and Stars by Lisa's Scribe 3. This was both to be able to confirm when there is/isn't a cross-scribal tendency and to reduce distortions in the stats where scribes might cancel each other out, etc.

I also split the text according to its position: for each scribal section, I separated top row words, line start words, line end words, and the "pure middle" (though it does include the bottom row mid-line words) from each other. This was to help spot any "Top Row effects", "Line End effect", etc, and also limit them interfering with each other.
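A minimal sketch of this kind of positional split, in Python. The bucket rules are an assumption made for illustration (the post does not spell out every edge case, such as one-word lines), and the toy paragraph is invented rather than taken from a transliteration.

```python
from collections import defaultdict

def split_by_position(paragraph):
    """Bucket the words of one paragraph by position.

    `paragraph` is a list of lines; each line is a list of word strings.
    Assumed rules: the first word of every line goes to 'line_start', the
    last word to 'line_end' (a one-word line counts as line start only),
    the remaining top-row words to 'top_row_middle', and everything else
    to 'middle'.
    """
    buckets = defaultdict(list)
    for row, line in enumerate(paragraph):
        for col, word in enumerate(line):
            if col == 0:
                buckets["line_start"].append(word)
            elif col == len(line) - 1:
                buckets["line_end"].append(word)
            elif row == 0:
                buckets["top_row_middle"].append(word)
            else:
                buckets["middle"].append(word)
    return dict(buckets)

# Invented toy paragraph in EVA-like spelling, purely for illustration.
toy_paragraph = [
    ["pchor", "opchy", "qopchy", "daiin"],
    ["ychol", "chol", "chor", "dam"],
    ["dchor", "shol", "daiin"],
]
for name, words in split_by_position(toy_paragraph).items():
    print(name, words)
```

Keeping the buckets disjoint is what stops, say, a Top Row effect from leaking into the Line Start counts.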

So for example, ch is 24% of middle initials in Herbal A. Ceteris paribus, we'd expect it to be 24% of Line Start (LS) initials too, which would be about 300 ch. We see only 64, so there are over 200 "missing" instances. This makes initial ch "averse" to Line Start.
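The arithmetic behind the ch example can be sketched as follows. The 24% share and the 64 observed instances come from the post; the Line Start total of 1250 is an assumption back-derived from "about 300" (300 / 0.24), and the function names are hypothetical.

```python
def initial_glyph(word):
    """Return the initial glyph, treating EVA 'ch' and 'sh' as single glyphs."""
    return word[:2] if word.startswith(("ch", "sh")) else word[:1]

def initial_share(words, glyph):
    """Share of words in a bucket whose initial glyph is `glyph`."""
    return sum(initial_glyph(w) == glyph for w in words) / len(words)

def expected_vs_actual(baseline_share, position_total, actual):
    """Compare a positional count with what the mid-line share would predict."""
    expected = baseline_share * position_total
    return expected, expected - actual, actual / expected

# 0.24 and 64 are from the post; 1250 is an assumed Line Start total.
expected, missing, ratio = expected_vs_actual(0.24, 1250, 64)
print(round(expected), round(missing), round(ratio, 2))  # prints: 300 236 0.21
```

A ratio well below 1 marks a glyph as "averse" to the position, and a ratio well above 1 marks it as "attracted".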

General issues that could distort/affect statistics include uncertain word breaks (likely affecting Line End initials in particular); transcription errors; choosing to focus on the scribal level rather than the folio or even smaller level; and my general incompetence.

The three scribes did often show similar tendencies, even if the size of their gaps can vary. Key similarities include for initials:

At Line Start: all three hate initial ch and are attracted to initial y, d, s, hence the clusters ych, dch, etc. Simultaneously, all were attracted to /ch/ and /sh/ as word-middle glyphs, and to /ai/ and /ee/.
At Line End: all were averse to initial q, ch, and sh.
At Top Row (middle): all were averse to initial ch and attracted to initial o. Clusters like /opch/ are common.
At Paragraph End: all were averse to initial q and preferred initial ch.

For finals:

At Line Start: all three were averse to final y and attracted to final n
At Line End: all averse to final y and attracted to final m
At Top Row (middle): all were attracted to final y and averse to final n.

Despite these similarities, there were also some striking differences. The top ones included:

At Line Start: Scribe 1 in Herbal A is attracted to initial o and q. But Scribe 2 and Scribe 3 are averse to initial o (with some variance in the subclusters), and Scribe 3 in Stars hates initial q
At Line End: Scribe 1 in Herbal A absolutely loves initial da (as Marco points out, this can reflect Patrick's findings of d becoming more prevalent as you go rightward). Meanwhile, Scribe 2 and Scribe 3 are developing a fondness for rare clusters often beginning with l, r, etc and we see words that are either wholly or mostly exclusive to the Line End position.
For finals, Scribe 1 in Herbal A really likes final or at Line Start. This isn't a passion particularly shared by the others, but Scribe 3 in Stars is a little attached to final r for Paragraph Start.

I tried some reckless mapping of "missing" glyphs to "surplus" glyphs. I won't reproduce this here. But sometimes there was a vague resemblance between the missing and surplus word types:

Line Start is the obvious example. We see tons of missing initial ch, and simultaneously tons of surplus word-middle ch in clusters like ych, dch etc.
Top Row is similar. In its middle, initial ch vanishes, and simultaneously clusters like opch or qopch appear.

These may well be the same word types but the numbers don't always match up, and the finals are often different.

Other times, there is little resemblance between missing word types and surplus word types:

At Line Start, Scribe 2 and Scribe 3 tend to have large deficits for initial oka/ota/oke/ote words (with some exceptions), and for the q version e.g. initial qoka type words. But we don't see any clear simple surplus word types at Line Start that could be replacing them in sufficient numbers like initial ych etc might replace initial ch. If they are replaced by different word forms with the same meaning, we'd need to look at more creative mutations like initial s or d, which carry large surpluses at Line Start.
The Line End patterns mentioned above: Scribe 1 in Herbal A has a love affair with initial da words, while Scribes 2 and 3 are exploring initial l, r words like "lol"... yet the missing word types don't look very similar and so are hard to match.

"Vertical Impact Effect"
This was based on looking at how often glyphs are immediately under each other at Line Start. On folio 10r, you can see two lines - the 9th and 10th - that both start with o. I call this a "vertical pair" and denote it as o-o.

What's odd is that we rarely see this vertical pair in Scribe 1's Herbal A. It's really odd since Scribe 1 loves starting lines with initial o (and the others hate it). My calculations (hopefully right) were that we should see over 40 o-o vertical pairs. Yet there's this one and...well you can check it out on Voynichese.com.

Simultaneously and suspiciously, we see a similarly sized surplus of o-q vertical pairs.
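One way to put numbers on vertical pairs is sketched below. The baseline is an assumption: it treats the upper and lower Line Start initials as independent, which may not be how the "over 40" figure in the post was calculated. The toy paragraph is invented.

```python
from collections import Counter

def initial_glyph(word):
    """Return the initial glyph, treating EVA 'ch' and 'sh' as single glyphs."""
    return word[:2] if word.startswith(("ch", "sh")) else word[:1]

def vertical_pairs(paragraphs):
    """Count (upper, lower) Line Start initials for consecutive lines in a paragraph."""
    pairs = Counter()
    for paragraph in paragraphs:
        starts = [initial_glyph(line[0]) for line in paragraph if line]
        pairs.update(zip(starts, starts[1:]))
    return pairs

def expected_pairs(pairs):
    """Expected count per pair if upper and lower initials were independent."""
    total = sum(pairs.values())
    upper, lower = Counter(), Counter()
    for (up, low), n in pairs.items():
        upper[up] += n
        lower[low] += n
    return {(up, low): upper[up] * lower[low] / total
            for up in upper for low in lower}

# Invented toy paragraph: line starts run o, q, o, q, y.
toy = [[["ochor", "daiin"], ["qokol"], ["otol", "chol"], ["qotchy"], ["ykol", "dam"]]]
observed = vertical_pairs(toy)
expected = expected_pairs(observed)
for pair in [("o", "o"), ("o", "q")]:
    print(pair, "observed", observed[pair], "expected", expected[pair])
```

Pairs are only formed inside a paragraph here, so the last line of one paragraph is never paired with the first line of the next.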

Scribe 1 in Herbal A really dislikes q-q vertical pairs despite loving q as a Line Start initial. Scribe 2 in Balneological also shows a distaste for q-q. Both show a fondness for q-ch and q-sh, despite ch and sh being averse to Line Starts.

We also see Scribe 1 in Herbal A being attracted to the y-o vertical pairs, and Scribe 2 in Balneological liking d-q vertical pairs. And y in Scribe 3 in Stars seems to like hanging out too much on the bottom row of paragraphs. There were other patterns but those were the ones I thought most worth highlighting.

This seemed really bizarre. It seems the lower glyph in the pair is conditioned on what the upper glyph is. Assuming lines were written in order, that is. Why might it occur?

Is there an innate anti-duplication sentiment in the scribes where they hate reusing the same glyph immediately below another at Line Start (in the midline, there's more space to play around with)? But we don't see such a marked tendency with y-y or d-d. And s-s actually performs well.
Is it about space saving or avoiding clashes with the glyphs below (e.g. q-t is messy)? But surely o-o is fine in this regard.
Is it something to do with a role each glyph plays at Line Start? But what?

More general questions and thoughts
Does it also show scribal awareness, adapting to the circumstances and implied understanding of what they are writing? Or do the strong patterns imply there was a system of rules for them to blindly follow?

I couldn't think of a "natural", e.g. plaintext or linguistic reason for the Vertical Impact behaviour. They may well be at play for the other Line pattern behaviour above but it was hard for me to imagine they could be the main overarching cause.

Could the cause be that the text is meaningless? If meaningless, it doesn't really matter what glyphs are where. But these patterns would require a system with some strong and seemingly arbitrary rules, e.g. "Avoid writing an o directly under another o at Line Start and write a q instead; avoid writing a q under another q and write ch/sh instead, etc"

The same thing would apply if we consider the Line Start initials to be meaningless nulls attached to the real word as part of a cypher hypothesis.

If they are not nulls, are they real? And does that mean the "shorter" word in the midline is an abbreviation, e.g. ychol becomes chol (which, as Koen noted, is a really weird way to abbreviate)?

And lastly as part of the cypher paradigm, if the "mappings" reflect homophones and abbreviations, the apparent interchangeability of glyphs may pose the risk of running out of plaintext letters and making it illegible for even the authorized reader, unless we posit some further internal distinctions or external references.

Thanks, Tavie, this helps a lot. Note about the abbreviation that I was thinking exclusively about natural language. We have a very strong tendency to leave the first letter intact, since it is crucial to the identity of the word. Conversely, non-initial vowels are the easiest to omit.

I think it could still be abbreviation (shortening) in the sense that the omitted glyph is the most optional one. Like some kind of marker that can be left out or whatever.

Regarding analysis, I wonder if it would help to write some kind of "profile" for each glyph? Where is it preferred, where is it left out...

Hi tavie, I find all Line Patterns fascinating, so it was lovely to hear your talk. There's clearly a problem with line start and line end words which goes very deep.

The Vertical Impact Behaviour is definitively new. I've never seen nor heard anybody discuss this before. I can't think of a linguistic reason, nor even any good reason. It seems to be a purely(?) stylistic choice. But I'm not sure a) why that would be a thing, and b) how the text allows it. I hope to hear more about it if there's anything else you think could be researched on the same topic.

Can we have a table of the scribes and the vertical patterns they like and dislike? Just all the information in a single glance. It would seem---at least I think---that all the alternating pairs are in a single group of glyphs. This group agrees with what Marco and I called the "Weak" group, and which Guy identified as vowels. This at least allows for some follow-up theories as to what might be happening, or starts to put a boundary round the phenomenon.

Here is [link] highlighting o-words and q-words in Currier A: it makes it easy to check the line-start vertical patterns discussed by tavie, in particular the fact that lines starting with o- tend to be followed by lines starting with q-, and that line-start repetitions o-o and q-q in two consecutive lines are avoided.

I collected a few examples, but please check other pages as well, I think this pattern is extremely impressive:
[attachment=8984]

In addition to the exception in [link] mentioned by tavie in her talk, I found two more exceptions for o-o in Currier A, f58r/v, text pages by Scribe 3. These two are also bizarre (one has oqot-, like 10r, the other has a line that is aligned to the right).
[attachment=8983]

Hi Tavie,

this is a great article. I would suggest that you publish your results in a peer-reviewed journal like Cryptologia.

There is some related research that comes to my mind.

The first mention of line effects is the paragraph about "The Line Is a Functional Entity" in the paper of Prescott H. Currier from 1976. Currier points out: "The first point is that the line is a functional entity in the manuscript on all those pages where the text is presented linearly. There are three things about the lines that make me believe the line itself is a functional unit. The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally." [Currier 1976].

There is also the paper of Andreas Schinner from 2007. Schinner points to long range correlations within the Voynich text: "Interpreting normal texts as bit sequences yields deviations of little significance from a true (uncorrelated) random walk. For the VMS, this only holds on a small scale of approximately the average line length; beyond positive correlation build up: the presence/absence of a symbol appears to increase/decrease the tendency towards another occurrence" [Schinner 2007, p. 105]. In my eyes these results demonstrate statistically what you call a "Vertical Impact Effect".

There is also an interesting analysis of Elmar Vogt from 2012. Vogt demonstrates that: "1. The first word with i = 1 of a line is longer than average [...], 2. The second word with i = 2 is shorter [...]" [Vogt 2012: p. 4]. In fact, the second glyph group is shorter than the first group in 48% of the lines and longer in only 32%. In my eyes this research also indicates that the first word plays a special role.
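A comparison of this kind can be sketched in a few lines. Word length is measured here in transliteration characters, which is an assumption (EVA characters do not map one-to-one onto glyphs), and the toy lines are invented.

```python
def first_vs_second(lines):
    """Shares of lines whose second word is shorter than, longer than,
    or as long as the first word. Lines with fewer than two words are skipped."""
    usable = [line for line in lines if len(line) >= 2]
    if not usable:
        raise ValueError("no line has at least two words")
    shorter = sum(len(line[1]) < len(line[0]) for line in usable)
    longer = sum(len(line[1]) > len(line[0]) for line in usable)
    n = len(usable)
    return shorter / n, longer / n, (n - shorter - longer) / n

# Invented toy lines in EVA-like spelling.
toy_lines = [
    ["ychol", "chol", "daiin"],
    ["daiin", "qokeedy"],
    ["shol", "chor", "dam"],
    ["dchor", "or"],
    ["dam"],
]
print(first_vs_second(toy_lines))  # prints: (0.5, 0.25, 0.25)
```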

I researched this effect back in 2014: the starting point of my research on the VMS was the idea that it was necessary to find a way to limit the number of words to look at. My idea was to check only words occurring exactly 7 or 8 times in the VMS (see Timm 2014, p. 13). This way it was possible to check each word in detail. At the same time there is no reason to believe that these words behave differently than any other words. One word that exists only 8 times in the VMS is the word 'dsheey' and one word with seven occurrences is the word 'dalam'. What surprised me was that 'dsheey' exists in seven out of eight instances at the start of a line and 'dalam' exists in six out of seven cases at the end of a line! I found it stunning that even for rarely used words such striking patterns exist. This was in fact the first pattern I was able to spot in the text. This pattern convinced me that it is possible to describe patterns for the Voynich text. In my eyes there are not only line end glyphs like EVA-m but also line start glyphs like EVA-y, EVA-o, EVA-d and EVA-s (see the tables IX., X. and XI on p. 85 in Timm 2014).
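The filtering step described above (pick words with a fixed number of occurrences, then look at where in the line they fall) can be sketched as follows. This is an illustration rather than the 2014 procedure itself; the function name and the toy lines are hypothetical, and the toy example uses counts of 2 and 3 only because it is so small.

```python
from collections import Counter, defaultdict

def positional_profile(lines, counts=(7, 8)):
    """For every word whose total frequency is in `counts`, tally how often
    it stands first or last in a line."""
    freq = Counter(word for line in lines for word in line)
    profile = defaultdict(lambda: {"total": 0, "line_start": 0, "line_end": 0})
    for line in lines:
        for i, word in enumerate(line):
            if freq[word] in counts:
                entry = profile[word]
                entry["total"] += 1
                entry["line_start"] += int(i == 0)
                entry["line_end"] += int(i == len(line) - 1)
    return dict(profile)

# Invented toy lines; with real data the default counts=(7, 8) would apply.
toy_lines = [
    ["dsheey", "chol", "dalam"],
    ["dsheey", "chor", "daiin"],
    ["chol", "dsheey", "dalam"],
]
for word, entry in positional_profile(toy_lines, counts=(2, 3)).items():
    print(word, entry)
```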

These line start patterns then lead to further patterns. For instance, the second glyph group in a line occurs as a subgroup of the first group twice as often (2.6%) as is the case for any other groups in a given line (1.3%) (see p. 20 in Timm 2014). My conclusion for this finding was "Since EVA-y, EVA-i, EVA-d or EVA-s were added frequently to the first glyph group within a line and since the gallow glyphs k, t, p, and f were added regularly to the initial glyph group within a paragraph the average length of the first words within a line increases." [Timm 2014, p. 19].
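One possible reading of the "subgroup" statistic is sketched below. It assumes a group counts as a subgroup of the preceding group when it is a proper substring of it (e.g. chol inside ychol); the exact definition used in Timm 2014 may differ. The toy lines are invented.

```python
def subgroup_rates(lines):
    """Rate at which a group is a proper substring of the group before it,
    reported separately for the first/second position and for all later
    adjacent positions in a line."""
    first_hits = first_n = other_hits = other_n = 0
    for line in lines:
        for i in range(len(line) - 1):
            prev, cur = line[i], line[i + 1]
            hit = cur != prev and cur in prev
            if i == 0:
                first_n += 1
                first_hits += int(hit)
            else:
                other_n += 1
                other_hits += int(hit)
    first_rate = first_hits / first_n if first_n else 0.0
    other_rate = other_hits / other_n if other_n else 0.0
    return first_rate, other_rate

# Invented toy lines in EVA-like spelling.
toy_lines = [
    ["ychol", "chol", "chor"],
    ["dchor", "shol", "ol"],
    ["daiin", "aiin", "ar"],
]
print(subgroup_rates(toy_lines))
```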

I have three ideas regarding paragraph and line initial glyphs.

My first idea was that the paragraph initial gallows glyph had some influence on the text. For instance, the words in the first line of a paragraph contain more gallows glyphs and are on average longer (see Timm 2014 p. 29). One idea might be that the initial gallows glyph also had some influence on line initial glyphs.

My second idea was that paragraph and line initial glyphs are used on purpose to avoid too much repetition in these places. If the author had the impression that he was repeating sequences too often, he could start a new paragraph. So maybe the author was using features like paragraph and line initial glyphs to introduce new features into the text and in this way generate some unique words. In this case these observations are "an unintended side effect of the self citation method used to generate the text. The source for the first word in each line could only be found within the previous lines. Since the first and the last word in each line are easy to spot, the most obvious way is to pick them as a source for the generation of a group at the beginning or at the end of a line." [Timm 2014, p. 20].
However, if the first or the last words are used frequently as source words for generating new words in the same position, this could lead to easy-to-spot repeated features. Therefore the introduction of some line initial glyphs could be useful to overcome this problem.
"For the second glyph group it is also possible to select the first group as a source. Since the first group in a line usually has a prefix (see o and 9 underlined in red in figure 8) the simplest change is to remove this prefix. And it is indeed possible to find examples of such changes. For instance for the first paragraph on page <f3r> there are two occurrences in which the leading 9 ("y") is also removed for the second glyph group:" [Timm 2014, p. 19]. Moreover it is even possible to demonstrate that "the second glyph group in a line occurs twice as often as a subgroup of the first group (2.6%) than this is the case for any other groups in a given line (1.3%)" [Timm 2014, p. 20].

My third idea for the usage of paragraph and line initial glyphs was that maybe it was some kind of aesthetic preference of the author of the VMS, like the scribe also prefers EVA-y in word initial and word final position. Or the way the scribe highlighted every second star on [link] and on the following folios.

(08-08-2024, 12:11 PM) Emma May Smith Wrote: The Vertical Impact Behaviour is definitively new. I've never seen nor heard anybody discuss this before. I can't think of a linguistic reason, nor even any good reason.

Agreed that this is an intriguing new discovery, and all the more so because it's such a counterintuitive pattern that I doubt most people would ever have thought to test for it!

At one point, just after reading Emma and Marco's paper on word-break combinations, I briefly checked to see whether there might be any analogous anomalies among line-break combinations (end of one line - start of next line). As far as I could tell, there mostly weren't. But as I recall, there was at least one notable exception (calculated across the manuscript as a whole): lines ending [m] appear to be followed by lines beginning [q] only around 58% as often as they "should." Given that lines beginning [q] also feature prominently in this Vertical Impact Effect, I wonder how these two apparent anomalies might be interacting with each other.
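A line-break check of this sort can be sketched as an actual/expected ratio. The sketch assumes independence between how a line ends and how the next one starts, uses single transliteration characters as the end and start glyphs, and ignores paragraph boundaries; the toy lines are invented.

```python
def line_break_ratio(lines, end_glyph="m", start_glyph="q"):
    """Actual/expected ratio for 'line ends with end_glyph, next line starts
    with start_glyph'. A value of 0.58 would mean 58% as often as expected."""
    breaks = [(upper[-1][-1], lower[0][0]) for upper, lower in zip(lines, lines[1:])]
    n = len(breaks)
    ends = sum(e == end_glyph for e, _ in breaks)
    starts = sum(s == start_glyph for _, s in breaks)
    both = sum(e == end_glyph and s == start_glyph for e, s in breaks)
    if ends == 0 or starts == 0:
        raise ValueError("one of the two features never occurs")
    expected = ends * starts / n
    return both / expected

# Invented toy lines: breaks are (m,q), (m,o), (n,q), (m,o).
toy_lines = [
    ["daiin", "dam"],
    ["qokol", "dam"],
    ["otol", "daiin"],
    ["qokal", "dam"],
    ["okol", "chol"],
]
print(round(line_break_ratio(toy_lines), 2))  # prints: 0.67
```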

(08-08-2024, 06:52 PM) pfeaster Wrote: (calculated across the manuscript as a whole): lines ending [m] appear to be followed by lines beginning [q] only around 58% as often as they "should." Given that lines beginning [q] also feature prominently in this Vertical Impact Effect, I wonder how these two apparent anomalies might be interacting with each other.

Hi Patrick,
I wrote a quick script to check the different sections, and I get the impression that q-starting lines and m-ending lines are anti-correlated. E.g. Q13/Bio has many q-starting lines and few m-ending lines (the ratio is about 3:1). Q20/Stars shows the opposite: 1 q-starting line for 3 m-ending lines. All sections show a strong unbalance, so the two features are unlikely to appear consecutively in any one section, but could be expected to be frequent if one considers the text as a whole, since both features are overall frequent on the average.
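The pooling effect described here can be illustrated with a small calculation. This is not the quick script mentioned above; the section counts are invented to mimic the 3:1 and 1:3 ratios, and the function name is hypothetical.

```python
def pooled_vs_stratified(sections):
    """Expected number of 'm-ending line followed by q-starting line' breaks.

    `sections` maps a section name to (n_breaks, m_ends, q_starts).
    'pooled' applies the independence baseline to the whole text at once;
    'stratified' applies it inside each section and adds the results up.
    """
    n = sum(s[0] for s in sections.values())
    m = sum(s[1] for s in sections.values())
    q = sum(s[2] for s in sections.values())
    pooled = m * q / n
    stratified = sum(m_ends * q_starts / n_breaks
                     for n_breaks, m_ends, q_starts in sections.values())
    return pooled, stratified

# Invented counts: one section q-heavy (3:1), the other m-heavy (1:3).
toy_sections = {
    "q-heavy section": (1000, 100, 300),
    "m-heavy section": (1000, 300, 100),
}
pooled, stratified = pooled_vs_stratified(toy_sections)
print(pooled, stratified, stratified / pooled)  # prints: 80.0 60.0 0.75
```

With these made-up numbers, a text with no line-break interaction at all inside either section would still show the pair only 75% as often as the pooled baseline predicts.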

[attachment=8985]

I think it's striking that o-o and q-q vertical repetitions are absent from HA with so few exceptions. Typically, Voynich patterns appear as preferences rather than hard rules. This is of course different at word level, where word structure has rules with only a minimum of exceptions.

(09-08-2024, 10:46 AM) MarcoP Wrote: I wrote a quick script to check the different sections, and I get the impression that q-starting lines and m-ending lines are anti-correlated. E.g. Q13/Bio has many q-starting lines and few m-ending lines (the ratio is about 3:1). Q20/Stars shows the opposite: 1 q-starting line for 3 m-ending lines. All sections show a strong unbalance, so the two features are unlikely to appear consecutively in any one section, but could be expected to be frequent if one considers the text as a whole, since both features are overall frequent on the average.

Thanks for this! Yes, if those two features vary so much by section, that would account for the "anomaly" in a (sadly) less interesting way.

The next-most-anomalous case I'd noticed before was that lines ending [n] are followed by lines beginning [Sh] about 140% as often as they "should" be -- but maybe that's equally an illusion based on differences among sections?

If these few cases of apparent line-break combination patterns turn out not to be significant, that would make tavie's "Vertical Impact Effect" all the more striking and mystifying. Lines seem to run from left to right and secondarily from top to bottom, and yet the start of each new line would be constrained significantly by the start of the previous line -- but not at all by the end of the previous line.

Are there any "Vertical Impact Effects" associated with the ends of lines? Or do adjacent lines appear to be linked by their starting points, but not by where they end up?

(08-08-2024, 01:41 AM) tavie Wrote: Scribe 1 in Herbal A really dislikes q-q vertical pairs despite loving q as a Line Start initial. Scribe 2 in Balneological also shows a distaste for q-q. Both show a fondness for q-ch and q-sh, despite ch and sh being averse to Line Starts.

There is also no ch-ch vertical pair anywhere in paragraphs: a rare rule without any exception! Not too surprising, however, because, as you said, ch is averse to Line Starts, so there would only be around 4 if paragraph lines were randomly shuffled.
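A shuffle baseline like this can be estimated by simulation. The sketch below shuffles lines within each paragraph, which is an assumption (shuffling all paragraph lines together would be another reading), and the toy paragraph is invented.

```python
import random

def same_start_pairs(paragraph, glyph="ch"):
    """Count consecutive lines that both start with `glyph`."""
    starts = [line[0].startswith(glyph) for line in paragraph]
    return sum(a and b for a, b in zip(starts, starts[1:]))

def shuffled_expectation(paragraphs, glyph="ch", trials=1000, seed=0):
    """Average number of glyph-glyph vertical pairs when the lines of each
    paragraph are put in random order."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for paragraph in paragraphs:
            lines = list(paragraph)
            rng.shuffle(lines)
            total += same_start_pairs(lines, glyph)
    return total / trials

# Invented toy paragraph: two of its four lines start with ch.
toy = [[["chol", "daiin"], ["chor"], ["daiin", "dam"], ["okol"]]]
print(same_start_pairs(toy[0]), shuffled_expectation(toy))
```

For the toy paragraph the exact expectation is 0.5 (three adjacent positions, each with probability 1/6 of holding two ch lines), so the simulated value should land close to that.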

3 c-c vertical pairs, all with ch and a benched gallows glyph (not real pairs I guess):
<f76v.10,+P0> cphdor...
<f76v.11,+P0> cheor...

<f106v.43,+P0> chedy...
<f106v.44,+P0> cthey...

<f116r.28,+P0> cheol...
<f116r.29,+P0> cthan...

Thanks for everyone's comments and suggestions; there's a lot of thought-provoking stuff!

This isn't the final version of the table, since I need to go through and check for consistency in how mis-alignments (like Marco's second Scribe 3 example with the right-aligned text) and large image breaks are treated, but I don't think this will affect the larger gaps. Thanks also to Marco for the suggestion of having a column for the actual/expected ratio.

Looking at the ones with the largest surpluses, it's interesting to see that there isn't much reciprocity. e.g.

o-q has the huge extra surplus in Scribe 1's Herbal A of about 38, while q-o has only about 8 extra. q-o is the most popular q-? pair but Scribe 1 appears disproportionately to avoid q-q by turning to q-ch.
d-q has about 19 extra in Scribe 2's Balneological, while q-d is reasonably close to the expected number. d-q is the most popular d-? pair but q-d is not for q-?. It is beaten in raw numbers by q-s, and Scribe 2 also appears disproportionately attracted to q-ch and q-sh.
y-o has about 17 extra in Scribe 1's Herbal A, while o-y is reasonably close to the expected number. y-o is the most popular y-? pair but o-y is easily beaten by o-q for the most popular o-? pair.


tavie

Koen G

Emma May Smith

MarcoP

Torsten

pfeaster

MarcoP

pfeaster

nablator

tavie