The Voynich Ninja

It looks like Stolfi leaned towards seeing EVA-m and g as shortened versions of in-clusters. To me this feels right intuitively. But I wonder, has anyone looked into this further? Are there any objections or better solutions?

For reference, here is the section I'm referring to, from [link]:

Quote:About m and g
It seems that the letter m is inordinately common at the end of lines, and before interruptions in the text due to intruding figures. The letter m, like the IN groups, is almost always preceded by a or o (862 tokens in 950, 91%). We note also that dam and am are the most common -am words, just as daiin and aiin are the most common -aiin words. Perhaps m is an abbreviation for iin (and/or other IN groups), used where space is tight.

On the other hand, the truth may not be that simple. Of the 950 tokens that contain m, 56 (5.8%) are preceded by ai or aii rather than a alone.

The rare letter g, like m, occurs almost exclusively at the end of words (24 tokens out of 27); however, unlike m, it is not preceded by a. We note that g looks like an m, except that the leftmost stroke is rounded like that of an a. Perhaps g is an abbreviation of am?

There are 32 tokens that end in m, but not as am, om, or im. It is possible that these tokens are actually instances of g that were mistakenly transcribed as m --- a fairly common mistake.
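A minimal sketch of how counts like the ones in the quote could be checked, assuming the transcription has already been reduced to a plain list of EVA word tokens. The helper names and the `sample` tokens are made up for illustration and are not taken from any transcription file.

```python
from collections import Counter

def preceding_glyph_counts(tokens, glyph="m"):
    """Count which glyph immediately precedes `glyph` in each token containing it.

    Only the first occurrence of `glyph` in a token is considered; a token that
    starts with `glyph` is counted under the key "" (nothing precedes it).
    """
    counts = Counter()
    for tok in tokens:
        pos = tok.find(glyph)
        if pos == -1:
            continue
        counts[tok[pos - 1] if pos > 0 else ""] += 1
    return counts

def share_preceded_by(tokens, glyph="m", preceders=("a", "o")):
    """Fraction of `glyph` tokens in which `glyph` follows one of `preceders`."""
    counts = preceding_glyph_counts(tokens, glyph)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[p] for p in preceders) / total

# Invented sample tokens, for illustration only.
sample = ["dam", "am", "otam", "daiin", "qokaim", "chom"]
counts = preceding_glyph_counts(sample)
share = share_preceded_by(sample)
```

On a real token list the same two calls would give the "preceded by a or o" share and the leftover cases (such as m after i) that Stolfi flags.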

Koen, I’d like to start by saying your last video, talking about the small (and potentially even smaller) Voynichese glyph set, was pretty insightful, and gave me a lot to chew on mentally. I’m attempting to teach myself Hebrew and Arabic, so this has kind of Baader-Meinhof’d my mind into thinking about positional variation in other writing systems, and other forms of symbolic notation.

@-JKP- was the one who schooled me in medieval scribal abbreviations. Apparently the little loopty-loo final stroke of EVA=[m], EVA=[g], and the gallows glyphs, U+A76C or U+A76D, typically was shorthand for -is or -Cis at the end of a word. Many uses of this stroke occurred in the final word of a line, in order to save space, and time spent writing. I also remember -JKP- saying that this stroke was an ancestor to our present-day use of the hyphen to break a word at the end of a line. Apparently when the last word of a line was a partial word ending in a pigtail, there was little consistency in how much, if any, of the remainder of the word could be found at the beginning of the next line.

I’m with you and Prof Stolfi on EVA=[m] being a line-final variation, and ligature, of EVA=[iin]. I’m with Emma May Smith on EVA=[a] and EVA=[y] being equivalent. As for EVA=[g], my guess is a variation and ligature of EVA=[dy] or EVA=[edy], but that’s pure intuition. Lol. The problem with the latter equivalence is that those word-final ngrams are so common, at least in Currier B, and EVA=[g] is so rare. It raises the question to me of why EVA=[dy] and EVA=[edy] weren’t always abbreviated as EVA=[g].

I think any proposed Voynichese glyph and ngram equivalence is testable and falsifiable, as long as there are enough occurrences of both in a well-made transcription of the Manuscript. In other words, I think textual analytical statistics can be used to support a hypothesis that one unit of Voynichese writing is equivalent to another, by showing that they occur in highly similar environments, with the exception of one crucial factor, more often than would be expected by chance alone. The results could give some idea of the general rules governing what textual environments seem to strongly favor one over the other. Nablator’s and Patrick Feaster’s “heat maps” of the line, paragraph, and page positions favored by various Voynichese ngrams come to mind. ’Twould be mighty interesting to me if the heat maps for EVA=[aiin] and EVA=[am] filled in each other’s gaps when superimposed.

Throw in a pinch of LAAFU to my magic potion, and you can see how a way to cut a line short for lack of space might have been a very useful shortcut to the VMS’s creators.
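A rough sketch of the positional comparison suggested above, far coarser than a real heat map: it only sorts words into first, middle, and last position in their line. It assumes each line is already a list of EVA words. The function name and `sample_lines` are invented for illustration.

```python
from collections import Counter

def line_position_counts(lines, suffix):
    """Count words ending in `suffix` by position in the line: first, middle, last."""
    counts = Counter({"first": 0, "middle": 0, "last": 0})
    for words in lines:
        for i, w in enumerate(words):
            if not w.endswith(suffix):
                continue
            if i == len(words) - 1:
                counts["last"] += 1    # a one-word line counts as "last"
            elif i == 0:
                counts["first"] += 1
            else:
                counts["middle"] += 1
    return counts

# Invented sample lines, for illustration only.
sample_lines = [
    ["daiin", "chol", "okaiin", "dam"],
    ["qokaiin", "shedy", "otam"],
    ["aiin", "okam", "chedy", "daiin"],
]
aiin = line_position_counts(sample_lines, "aiin")
am = line_position_counts(sample_lines, "am")
```

If the two endings really were variants of each other, one would hope to see the [am] counts pile up exactly where the [aiin] counts thin out. Whether that holds on the actual text is the open question.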

(20-10-2024, 04:23 PM)Koen G Wrote: It looks like Stolfi leaned towards seeing EVA-m and g as shortened versions of in-clusters. To me this feels right intuitively. But I wonder, has anyone looked into this further? Are there any objections or better solutions?

This seems to be tied specifically to an attempt to explain characters that appear mostly at the ends of lines in terms of abbreviations resorted to for lack of space. That is, there doesn't seem to be any reason to assume [m] and [g] might be abbreviations apart from their unusual distribution.

I'm inclined to suspect they aren't abbreviations, on these grounds:

(1) There are few enough graphemes in Voynichese as it is. Reducing the number yet further feels a bit like a step in the wrong direction.

(2) Many glyphs have stronger or weaker "positional preferences"; [m] and [g] are just a bit more conspicuous about it than some others. Explaining these two cases as abbreviations wouldn't resolve the broader "positional preference" problem.

(3) It's easy to find [m] and [g] in places where the scribe had plenty of room left in which to write, and [n] in places where space was very tight -- see for example [n] actually overlapping parts of plant illustrations on [link] and f49v. If the scribe had the option of writing [m] instead of [iin] (or whatever), why not take it there?

(4) The number of [e]s preceding [g] tops out at the same point as the number preceding [b] and [d]: maximum [eeeg], [eeeb], [eeed]. And as Stolfi observes, there are also plenty of [m]s that are preceded by [i] or [ii]. If [m] and [g] were abbreviations whose purpose was to obviate the need to write out multiple [i]s or [e]s, they must have been employed very inefficiently!
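Point (4) is easy to check mechanically. Here is a small sketch, assuming a list of EVA words is available. The function name and the `sample_words` are hypothetical and only there to show the mechanics.

```python
import re

def max_e_run_before(words, glyph):
    """Longest run of consecutive 'e' immediately before `glyph` in any word."""
    pattern = re.compile(r"(e*)" + re.escape(glyph))
    longest = 0
    for w in words:
        for m in pattern.finditer(w):
            longest = max(longest, len(m.group(1)))
    return longest

# Invented sample words, for illustration only.
sample_words = ["cheeeg", "okeed", "sheeeb", "qoteeed", "dag"]
runs = {g: max_e_run_before(sample_words, g) for g in ("g", "b", "d")}
```

Running this over a full transcription would show whether the [e] runs before [g], [b], and [d] really do top out at the same length.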

Yes, the two glyphs are comparable with Latin abbreviations for d- and r-. Actually, the downward stroke is often used for truncation, and can stand for anything (but '-is' is frequent, in particular in -tis, which looks somewhat like EVA:k). But several Voynichese characters look like Latin abbreviations, even when they don't seem to behave like abbreviations (e.g. EVA:k,t,sh).

[link] gives a few examples for 'd' abbreviations similar to g (but the ductus is different). Most of his examples for 'r' are based on the capital version R. Here are a couple of examples from [link] ('moribus', 'aeris').

[attachment=9324]

I remember discussing the two Voynich characters in the past, but I don't remember the details. Their position at the end of line does suggest abbreviations, but Patrick's point about general left/right preferences is a good one. A possible approach to test abbreviations could be going page by page and checking the words that correspond to the same word prefix (the part before m/g) and seeing what can appear in place of m/g.

Yes, there does seem to be a disagreement or reasons to question the mapping. Emma [link] final m might be replacing final r.

My own headcanon is that final m is an abbreviation for more than one final cluster, but that it's a less desirable abbreviation used predominantly near line end or image breaks because it has lower information content, and that radical transformation is going on elsewhere in the word ... if (and that's a big 'if') the assumption below holds.

I've looked at it operating on the assumption that we should expect Line End words to reasonably resemble words elsewhere, i.e. in the midline. There could be reasons against this, e.g. Line End words may disproportionately contain words that are found more at the end of sentences, which could distort the statistics. Or if it's all meaningless.

But keeping that assumption, if final m is replacing final n, we could expect final n word types to be "missing" at Line End to a reasonably similar amount as the amount of extra final m word types. The attached table shows expectations based on mid-line performance (with top row excluded due to its unusual behaviour).

[attachment=9326]

In summary:

- For Scribe 3 in the Stars section, final m has the largest gap by far: over 160 extra instances at Line End. None of the "missing" finals reaches even half of that. By numbers alone, this would suggest that final m word types are replacing at least three of the four common finals, if not all of them. Incidentally, most of the missing final n word types are final kaiin or kain.
- For Scribe 2 in the Balneological section, final m is again the only final with a large surplus at Line End. Numbers-wise, it's conceivable that it is replacing most of final n and final y word types.
- For Scribe 1 in Herbal A, we do see a difference in Line End behaviour. Final m isn't the only one with a massive surplus: we see final n actually beat it, mostly final daiin. So it seems harder to argue that final m is replacing final n here. (NB: final y's score is deceptively low because it is torn between final dy having a surplus of over 50 and final ey being short by a similar amount.)
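For anyone wanting to reproduce this kind of comparison, here is a simplified sketch of the "expected from mid-line" idea. It is not the exact method behind the table: it counts tokens rather than word types, does not exclude the top row, and treats everything that is neither first nor last in its line as mid-line. The function name and the sample lines are invented.

```python
from collections import Counter

def line_end_surplus(lines, finals=("m", "n", "y", "r", "l")):
    """Observed minus expected line-end counts for each word-final glyph.

    Expected counts assume line-end words end in each glyph at the same rate
    as mid-line words (everything that is neither first nor last in its line).
    """
    mid, end = Counter(), Counter()
    for words in lines:
        if not words:
            continue
        for w in words[1:-1]:
            mid[w[-1]] += 1
        end[words[-1][-1]] += 1
    n_mid, n_end = sum(mid.values()), sum(end.values())
    if n_mid == 0:
        return {}  # nothing to base an expectation on
    return {f: end[f] - n_end * mid[f] / n_mid for f in finals}

# Invented sample lines, for illustration only.
sample_lines = [
    ["daiin", "chedy", "qokaiin", "otam"],
    ["shol", "okaiin", "chedy", "dam"],
    ["qokeey", "daiin", "okal", "chedy"],
]
surplus = line_end_surplus(sample_lines)
```

A positive value is a surplus at Line End and a negative value is a "missing" final, in the sense used in the summary above.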

The problem, though, is that even if we "swap" the finals, there is often a mismatch between the missing word types and the surplus word types. It's easier if the missing final is preceded by a. That's what makes final aiin quite an appealing swap for final am. But even for that kind of mapping, it's not that simple. Final kaiin/kain tend to be the driving force behind missing final n at Line End, but we don't see final kam to the same degree. It's much worse if we're looking at mapping final m with a final that isn't (visibly!) adjacent to an /a/, e.g. final dy or final or.

There are similar problems trying to map the differences between other locations' behaviour as well, putting a lot of pressure on the assumption above. We can keep the assumption if we allow for more radical transformation, but that leads to the other problems I talked about on Voynich Day in terms of reducing glyph range, word type diversity, and intelligibility.

Is it possible that certain contexts "reminded" the scribe to do the m-replacement more than others? Or is it really all over the place?

Patrick: I agree with your points apart from the first one. There I strongly disagree :)

I don't see the point in having an extra glyph if it behaves like eva-m. We might be getting closer to understanding the system if we figure out what m may be equivalent to. Whatever role m may be assigned in a solution cannot be that of a full letter in the first place, so it is already not a stand-alone part of the alphabet. Or am I missing something?

(20-10-2024, 04:23 PM)Koen G Wrote: It looks like Stolfi leaned towards seeing EVA-m and g as shortened versions of in-clusters. To me this feels right intuitively. But I wonder, has anyone looked into this further? Are there any objections or better solutions?

Eva-m and Eva-g appear in different contexts.

g looks like y with an extra loop. It could also be seen as a combination of d and y.
Replacing g with y generally results in valid words, but replacing g with dy does not, because g often follows d.

So, my favourite hypothesis is that g is just a beautified version of y that is used preferably at line ends.
Replacing g with in (or similar) does not generate a lot of valid words.
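The replacement test described here can be sketched in a few lines. This is only an illustration under my own assumptions: it looks at word-final occurrences only, "valid" means "attested elsewhere in the same word list", and the function name and the tiny `vocabulary` are made up.

```python
def substitution_hit_rate(vocabulary, old, new):
    """Share of word types ending in `old` that are still attested word types
    once that ending is replaced by `new`."""
    vocab = set(vocabulary)
    candidates = [w for w in vocab if w.endswith(old)]
    if not candidates:
        return 0.0
    hits = sum((w[: len(w) - len(old)] + new) in vocab for w in candidates)
    return hits / len(candidates)

# Invented word list, for illustration only.
vocabulary = ["chedg", "dag", "oteg", "chedy", "otey", "daiin"]
rate_y = substitution_hit_rate(vocabulary, "g", "y")
rate_dy = substitution_hit_rate(vocabulary, "g", "dy")
```

In the toy list, g-to-y succeeds for two of three words, while g-to-dy fails everywhere. One of the failures is a word where g already follows d (chedg would become cheddy), which is the objection raised above.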

A similar argument can be made for m. However, it is a bit less straightforward. There are a lot more m (over 1000) than g (around 160).
It might seem that m could be a beautified version of r, but the parallel with g suggests that it could rather be a beautification of l. Both options lead to valid words if one replaces the m.

The thing that leaves me more in doubt here is that there are zodiac labels that end with m, and the equivalent words ending with r or l also exist as zodiac labels. (This is certainly true for r, but I am not sure about l.)
This was already discussed here in the past.
Neither the pro nor the con arguments are very compelling, but for me these are the more likely explanations.

The above is part of a 'grander' scheme about the shapes of word final characters.
These all tend to be a c or an i-shape with an additional swirl.

Of the four swirls that one can imagine:
- starting at the top and going up
- starting at the top and going down
- starting at the bottom and going up
- starting at the bottom and going down

... the first three are frequent and the fourth does not exist, for both c- and i- shapes.
Instead, there is a fifth option which is first going up and then swirling down.
These are the m and g characters.

For me, this is one of the fundamental building blocks of the Voynich writing system.

I ran the experiment I described above. It's quite possible I made errors, so be careful. This is based on ZL_ivtff_2b.txt with uncertain spaces. Each page was processed separately and only word types were considered: each time that Xm (or Xg) has a match XY (where instead of m/g there is a different character sequence Y after X), Y is counted once.

[link], otam is matched by otaiin, and dam is matched by both daiin and daldalol. This results in two counts for -iin and one count of -ldalol.
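For readers who want to try the same thing, here is a minimal reconstruction of the per-page matching step as described above. It is a sketch, not the script that produced the results: the function name is hypothetical, parsing of the IVTFF file is left out, and skipping a bare m/g word with an empty prefix is my own assumption.

```python
from collections import Counter

def count_replacements(page_words, glyph="m"):
    """For each word type X+glyph on a page, count every Y such that X+Y is
    another word type on the same page (Y may be the empty string)."""
    types = set(page_words)          # word types only, duplicates ignored
    counts = Counter()
    for word in types:
        if not word.endswith(glyph):
            continue
        prefix = word[: -len(glyph)]
        if not prefix:
            continue                 # assumption: a bare glyph has no usable prefix
        for other in types:
            if other != word and other.startswith(prefix):
                counts[other[len(prefix):]] += 1
    return counts

# The example from the post: otam/otaiin and dam/daiin/daldalol on one page.
page = ["otam", "otaiin", "dam", "daiin", "daldalol", "dam"]
result = count_replacements(page)
```

Summing the per-page counters over all pages (for instance with `sum(counters, Counter())`) would give totals of the kind listed in the frequency tables.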

These are the top 10 matches found for m and g (counts and frequency wrt all matches).

==> results.m.freq <==
_r 281 11.69372
_iin 239 9.94590
_l 233 9.69621
_in 164 6.82480
_ir 89 3.70370
_ 50 2.08073
_ly 41 1.70620
_ldy 30 1.24844
_n 28 1.16521
_ry 24 0.99875

==> results.g.freq <==
_ 15 4.70219
_l 13 4.07524
_y 13 4.07524
_dy 12 3.76176
_r 11 3.44828
_in 11 3.44828
_iin 10 3.13480
_s 8 2.50784
_ol 6 1.88088
_or 6 1.88088

The top result for 'g' (and 6th for 'm') is the empty string, where removing -g/-m results in a word that occurs in the page (e.g. [link]).
[attachment=9327]

(20-10-2024, 09:13 PM)Koen G Wrote: Patrick: I agree with your points apart from the first one. There I strongly disagree. I don't see the point in having an extra glyph if it behaves like eva-m. We might be getting closer to understanding the system if we figure out what m may be equivalent to. Whatever role m may be assigned in a solution cannot be that of a full letter in the first place, so it is already not a stand-alone part of the alphabet. Or am I missing something?

I suppose it depends on what the overall encoding scheme is. It's hard to imagine [m] functioning in a simple substitution cipher as anything but a null, abbreviation, hyphenation, punctuation, or ornamentation. But if the script works in some other way, I could imagine [m] serving a real purpose linked to some situation that happens to occur mostly at the ends of lines -- something beyond just marking or embellishing them.

Let's imagine just for the sake of argument that Voynichese is designed to encode text in blocks of two plaintext letters, and that plaintext words are never split across line boundaries or image intrusions. If the quantity of plaintext characters in a line (or before an image intrusion) is even, there's no problem. But if it's odd, there will be a single character left over at the end of the string that needs to be encoded separately by itself. The special loop that forms [m] and [g] could then be a way of indicating that a block contains an empty second slot.

I don't mean that as a particularly serious proposal in itself -- just as one example of a type of explanation. But to turn your statement around: it seems to me we might also be getting closer to understanding the system if we can figure out a plausible scenario in which [m] could serve a unique and meaningful purpose in spite of its weird distribution.
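Purely to make the two-letter-block scenario above concrete, here is a toy sketch. It says nothing about how Voynichese actually works: the function name, the `_` padding marker, and the sample words are all invented, and it assumes blocks run across word boundaries within a line.

```python
def split_into_blocks(line_words, pad="_"):
    """Split the letters of one line into blocks of two; if a single letter is
    left over at the end, mark the empty second slot with `pad`."""
    letters = "".join(line_words)
    blocks = [letters[i:i + 2] for i in range(0, len(letters), 2)]
    if blocks and len(blocks[-1]) == 1:
        blocks[-1] += pad            # stands in for the special loop of [m]/[g]
    return blocks

even_line = split_into_blocks(["ab", "cd"])
odd_line = split_into_blocks(["ab", "cde"])
```

In such a scheme the padded block can only ever appear at the end of a line (or before an image intrusion), which is the kind of distribution being discussed.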


Koen G

RenegadeHealer

pfeaster

MarcoP

tavie

Koen G

ReneZ

ReneZ

MarcoP

pfeaster