The Voynich Ninja
Another link with Roman numerals? - Printable Version




Another link with Roman numerals? - Koen G - 03-10-2021

The use of Arabic numerals has been promoted in Europe since the 10th century, but it took until the age of printing for them to (almost) entirely replace Roman numerals. One reason for the lasting popularity of Roman numerals in certain contexts would be that they are harder to tamper with, so they kept being used for things like accounting. (So far I've only seen anecdotal mention of this though. May need confirmation.)


The idea would be that they wrote for example xxvij. Because of the "j", nothing could be added. By contrast, something like 27 could be changed to 271 with minimal effort and drastic effects.
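The convention can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses purely additive notation (iiii rather than iv), lowercase glyphs, and a hypothetical helper name `to_guarded_roman`; actual scribal practice varied.

```python
def to_guarded_roman(n):
    """Write n (>= 1) as an additive lowercase Roman numeral whose final i becomes j."""
    pairs = [(1000, "m"), (500, "d"), (100, "c"), (50, "l"), (10, "x"), (5, "v"), (1, "i")]
    parts = []
    for value, glyph in pairs:
        count, n = divmod(n, value)   # how many times this glyph is repeated
        parts.append(glyph * count)
    numeral = "".join(parts)
    # The closing "j" marks the end of the minim run, so no extra "i" can be appended unnoticed.
    if numeral.endswith("i"):
        numeral = numeral[:-1] + "j"
    return numeral


print(to_guarded_roman(27))  # xxvij
```

Note that a numeral ending in x or v gets no such guard in this sketch, since only the final minim has a distinct closing form.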

Obviously, one often-mentioned commonality of Roman numerals with Voynichese is low character entropy, or "low positional information". If I give you the alphabetically sorted set EVA [a, d, i, i, n], you know exactly how to combine them into a vord. The position of glyphs in clusters is inherent to the system and hardly provides information. In Voynichese as well as in Roman numerals, this is not an absolute truth (IV does not equal VI), but the contrast with regular texts and Arabic numerals respectively is similar.

Now I'm not saying Voynichese is designed for accounting or to prevent tampering, but the writing system also shows some resemblances to the practice with Roman numerals described above. There are some glyphs that look like they have something added to them, and those tend to "guard" edges of vords.

* The "c" or "a" with a swoop, in EVA called [y], guards both the left and right sides of vords. According to Voynichese.com, only 3% of [y] tokens do not appear at the edge. I think the actual percentage is even lower, since many examples seem to involve uncertain spaces.
* If you add an upward swoop to the [i] minim, you get what EVA calls [n]. This extended form of the minim could be said to "guard" the right side of words.
* EVA [m] is another glyph with an added swoop, and it appears almost exclusively at the end of vords. Exceptions often occur at the ends of lines, where some compression may have taken place.
* EVA [q] seems to guard the left side of vords. But [o] can do this as well, which is an argument against the system: if you have a word starting with [o], you could change it by adding [q] in front.
* 83% of [s], a curve with a swoop, is found at the first or last position of vords (this is [s] as a standalone glyph, not as part of the capped bench. I just did some quick calculations on Voynichese.com but there may be errors).
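
Edge percentages like the ones in this list could be recomputed from any transliteration with a few lines of Python. This is only a sketch: the word list is made up for illustration (it is not real transliteration data), `edge_share` is a hypothetical helper name, and each EVA character is treated as one glyph, so an [s] that belongs to the bench [sh] would have to be filtered out beforehand.

```python
def edge_share(words, glyph):
    """Return the fraction of `glyph` tokens found at the first or last position of a word."""
    total = 0
    at_edge = 0
    for word in words:
        for index, char in enumerate(word):
            if char != glyph:
                continue
            total += 1
            if index == 0 or index == len(word) - 1:
                at_edge += 1
    return at_edge / total if total else 0.0


# Toy input only; replace with words from a real transliteration file.
toy_words = ["ytaiin", "chedy", "qokeedy", "dykal"]
print(edge_share(toy_words, "y"))  # 0.75
```

Uncertain spaces would still affect the result, since the function trusts whatever word breaks it is given.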

To be clear, this is not an attempt to read Voynichese as Roman numerals. But it is interesting to see some common tendencies between both systems.


RE: Another link with Roman numerals? - pfeaster - 03-10-2021

Interesting line of speculation!  One point that occurs to me is that the extension of final minims wasn't limited to Roman numerals.  To give just one other example, in much early modern French handwriting final "n" is written as (in effect) "ij," and final "m" as "iij."  A couple quick examples pulled from the closest document at hand: "ou environ" --

   

and "ch[asc]un an" --

   

In this case, the result is to disambiguate final "n" from final "u," which instead ends with an upward curve.  Meanwhile, a word-initial minim of "m" or "n" circles up and around from below, while a word-initial minim of "v/u" descends from above.  But "u" and "n" *within* words aren't disambiguated, so if the goal was to disambiguate, it was inconsistent at best.

Maybe such features later made Roman numerals attractive as a way to make records more difficult to tamper with, but I doubt that's why they came about in the first place.

What other thoughts do people have about how these features came about and what functions they might have served?


RE: Another link with Roman numerals? - Koen G - 03-10-2021

I cleaned up the formatting for you :)

Differentiation within clusters is also an interesting option. Though if we focus on the [i] minim cluster in Voynichese, it becomes clear that the mere presence of a swoop does not differentiate much, since it's always present. Of all words with [i], 85% end the cluster in [n] and 11% end it in [r]. So in most cases you add an upwards swoop from the bottom of the last minim. The differentiation occurs when you also add an upwards swoop, but connect it elsewhere?

EVA [y] is also an interesting example. It occurs almost always in a "free" position (word-initial, final or by itself), but has much more flexibility than [n].


RE: Another link with Roman numerals? - pfeaster - 03-10-2021

(03-10-2021, 07:07 PM)Koen G Wrote: Differentiation within clusters is also an interesting option. Though if we focus on the [i] minim cluster in Voynichese, it becomes clear that the mere presence of a swoop does not differentiate much, since it's always present. Of all words with [i], 85% end the cluster in [n] and 11% end it in [r]. So in most cases you add an upwards swoop from the bottom of the last minim. The differentiation occurs when you also add an upwards swoop, but connect it elsewhere?
The value of the flourish itself for disambiguation could vary depending on the quantity of minims.  If we analyze [n] as [i] closed with a swoop, and [r] as [i] closed with a different kind of swoop, then the probabilities of one or another swoop-type would be, in Currier A:

[...ai-]
>[n] as [...an] = 3.64%
>[r] as [...ar] = 38.18%

[...aii-]
>[n] as [...ain] = 59.95%
>[r] as [...air] = 28.57%

[...aiii-]
>[n] as [...aiin] = 95.30%
>[r] as [...aiir] = 2.16%

The differences in Currier B are similar but not quite as extreme.  I'm also ignoring other options, such as [l] and [m].  But for [ai-] or [aii-], the choice of [n] to close the sequence seems less of a foregone conclusion than it is for [aiii-].
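
For anyone who wants to repeat this kind of tally, here is a rough Python sketch. It is not the procedure used for the figures above: the word list is invented, `closer_shares` is a hypothetical name, and the shares are normalized over [n] and [r] only, so they sum to 100% per cluster length and will not match percentages that keep other closers in the denominator. The closing glyph is counted as one of the minims, so [ar] and [an] have one minim, [air] and [ain] two, and so on.

```python
import re
from collections import Counter, defaultdict

# Word-final [a] + zero or more [i] + a closing [n] or [r].
CLUSTER = re.compile(r"a(i*)([nr])$")


def closer_shares(words):
    """Map minim count -> {closing glyph: share among n/r closers}."""
    counts = defaultdict(Counter)
    for word in words:
        match = CLUSTER.search(word)
        if match:
            minims = len(match.group(1)) + 1  # the closer counts as a minim too
            counts[minims][match.group(2)] += 1
    return {
        minims: {closer: n / sum(tally.values()) for closer, n in tally.items()}
        for minims, tally in counts.items()
    }


# Toy input only; real counts would come from a Currier A transliteration.
toy_words = ["dar", "dar", "dan", "dain", "daiin", "daiin", "daiir"]
for minims, shares in sorted(closer_shares(toy_words).items()):
    print(minims, shares)
```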

For [e] clusters, the most obvious morphological equivalent of [n] is [b], which likewise becomes more probable the more "hatchmarks" there are leading up to it (e.g., [...eeeb]) -- an interesting parallel behavior, in spite of how rare [b] is.

Early modern French cursive would probably also show an uneven distribution of "ending flourishes" based on minim quantities.  For example, a cluster of three minims would almost always be [iij], corresponding to modern [m], [in], or perhaps [ui]; the Latin word "diu" would be about the only likely exception.  But [j] would still always be most common as the final glyph, regardless of the quantity of minims -- it would just be more common to varying degrees.  So this differs from the situation with [i] clusters in Voynichese, where [ar] is more common than [an], but [aiin] is more common than [aiir].  I think you're right to suggest that if there's a connection, it's one based on general principles, and not on specifics.  But my sense is that the practice of writing, for example, "iij" for "iii" arose out of conventions of alphabetic writing, rather than being unique to the writing of numbers, so those "general principles," whatever they might be, are probably to be sought there -- or maybe in the physical mechanics of writing itself, irrespective of what it's intended to represent.


RE: Another link with Roman numerals? - Koen G - 04-10-2021

Right, I agree that word-final flourishes should be considered as a wider phenomenon, of which the use in Roman numerals is but one example. I find them extra interesting though, since like Voynichese, Roman numerals tend to sort glyphs in a specific order. This is especially true if you avoid subtractive notation (so write IIII instead of IV).

If I give you, sorted alphabetically, six different glyphs from a Roman numeral, you can reconstruct the "word" I took them from (assuming additive notation):
CDILVX -> DCLXVI
If I give you six different EVA glyphs, chances are you will be able to reconstruct the "word" I took them from by putting them in the right order as well:
ainoqt -> qotain
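
The reconstruction in both examples amounts to sorting glyphs by a fixed slot order. A minimal Python sketch, with assumptions stated loudly: `ROMAN_ORDER` is the usual descending order for additive notation, while `EVA_TOY_ORDER` is a hypothetical order read off the single example above and is not a claim about Voynichese word structure in general.

```python
ROMAN_ORDER = "MDCLXVI"   # descending value, additive notation assumed
EVA_TOY_ORDER = "qotain"  # hypothetical: taken from the one example word only


def reconstruct(glyphs, slot_order):
    """Rebuild a 'word' from an unordered bag of glyphs using a fixed slot order."""
    rank = {glyph: position for position, glyph in enumerate(slot_order)}
    # sorted() is stable, so repeated glyphs (e.g. several minims) stay together.
    return "".join(sorted(glyphs, key=lambda glyph: rank[glyph]))


print(reconstruct("CDILVX", ROMAN_ORDER))    # DCLXVI
print(reconstruct("ainoqt", EVA_TOY_ORDER))  # qotain
```

With subtractive notation this fails by design: the bag {I, V} sorts to VI, never IV, which is the sense in which position carries little information in the additive system.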


None of this explains the behavior of [y] though, since apart from mostly occurring at the final position, it is also common word-initially. We cannot say that it behaves like a capital letter, because those tend to be limited to the first position. It also doesn't seem to function as a disambiguator - in words like [ytchor] and [ytaiin], what is there to distinguish?


RE: Another link with Roman numerals? - nablator - 04-10-2021

(04-10-2021, 12:35 AM)Koen G Wrote: None of this explains the behavior of [y] though, since apart from mostly occurring at the final position, it is also common word-initially.

Common prefixes [oydlrs] (sometimes detached) are also common endings; [q] and [n] are exceptions. Gallows and benches are special in different ways.


RE: Another link with Roman numerals? - pfeaster - 04-10-2021

(04-10-2021, 12:35 AM)Koen G Wrote: I find them extra interesting though, since like Voynichese, Roman numerals tend to sort glyphs in a specific order. This is especially true if you avoid subtractive notation (so write IIII instead of IV).


That's true.  If we analyze the situation with Roman numerals, the basic reason for the sorting is of course that numbers are written in order by place value.  Even though Roman numerals themselves aren't a place-value notation, they were integrated into systems of counting and calculation that centered on place-values, such as finger-counting on two hands, counting-boards based on columns with different place values, and the Latin words for numbers (e.g., "tria milia trecenti triginta tres").  With additive notation, the units will be written only with V and I; the tens only with L and X; the hundreds only with D and C; with, in each case, a maximum of one token of the first glyph type followed by a maximum of four tokens of the second glyph type.  With subtractive notation, we sometimes also find a single I, X, or C preceding a higher-value numeral.  But the number can still be parsed in terms of descending place values, e.g., MCDXCII = M / CD / XC / II.  

Subtractive forms that would violate this rule, such as ID for 499 or IIX for 8, seem always to have been nonstandard and rare, even though they can be found all the way back to Classical Antiquity.  They do, however, suggest one mechanism by which glyphs that usually appear at the end of a "word" could also appear at the beginning, even before glyphs that ordinarily fill a "higher" slot.
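
The place-value parsing described above can be expressed as a single regular expression. This is a sketch under assumptions: it accepts both additive forms (up to four repeats) and the standard subtractive pairs, and `split_by_place` is a hypothetical helper name. Forms such as ID or IIX fail to parse, which matches their status as rule-breaking.

```python
import re

# One group per place: thousands, hundreds, tens, units.
PLACE_VALUE = re.compile(
    r"(M*)"
    r"(CM|CD|D?C{0,4})"
    r"(XC|XL|L?X{0,4})"
    r"(IX|IV|V?I{0,4})"
)


def split_by_place(numeral):
    """Return [thousands, hundreds, tens, units] or None if the numeral breaks the rule."""
    match = PLACE_VALUE.fullmatch(numeral.upper())
    return list(match.groups()) if match else None


print(split_by_place("MCDXCII"))  # ['M', 'CD', 'XC', 'II']
print(split_by_place("IIX"))      # None
```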

When numbers are written in Roman numerals separately from one another -- say, in columns listing folio numbers, or inserted into texts (e.g., "xii folia") -- there's no need to disambiguate beginnings and endings.  But I don't have a good sense for how situations were handled in which multiple numbers would have been written next to each other in a line.  My understanding is that there wasn't a compact notation equivalent to, for example, 4+5, but I don't know whether there might have been some convention equivalent to comma-separated values, either in the era of scriptio continua or later.

If we consider a straightforward substitution cipher in which each plaintext letter is represented by an integer expressed in Roman numerals, along the lines of the "pen test" shown by J. K. Petersen, it's not obvious how to proceed.  According to the "pen test" key, HELLO would, I suppose, be enciphered as xii ix xv xv xviii.  Obviously the spacing here is crucial; xiiixxvxvxviii would be illegible.  The first pair of letters could be clearly separated by writing them xij ix.  Presuming final i were always written j, the other numbers could only be divided in the specific way they are right up to the last one: xviii.  Since final x and v have no distinct forms (as far as I know), this could be parsed as x viii, x v iii, xviii, or xv iii.  But I notice that the "pen test" key shown by Petersen doesn't assign iii, v, or viii to anything, which could perhaps rule out every option but xviii.  It's also missing a few letters (V, X, Y, Z), so I'm not sure the key is complete as given, but the fact that D = xxviii (28) and P = xxix (29) would be consistent with certain numbers having been strategically skipped.  I hesitate to spend too long analyzing this cipher arrangement without knowing its source or whether it's complete, but it looks as though whoever designed it was making a conscious effort to minimize parsing ambiguities.  That's not to say there wouldn't be any ambiguity -- F and R are both assigned to xx (20), and I and T are both assigned to xxiii (23) -- but that's not a *parsing* ambiguity.  Assigning x to Q was clever, since a reader could assume that (for example) xxv is A rather than QL, and that xxviij is D rather than QO, even if everything were run together.  If there's more to the key than shown, all this analysis could be incorrect, of course.  But the design quirks in the "pen test" cipher as shown do seem consistent with an effort to avoid parsing ambiguities.

If parsing ambiguities were recognized as a problem for such numerical ciphers, another way to eliminate them would have been to add a custom flourish after each number, e.g., xv'xviii' = LO, perhaps also with different flourishes for word and syllable breaks, e.g., xv'xviii,xxv'vii'xxviii,vi'ix"xii'xviii'xv'xxviii, = LO AND BE-HOLD.  Or one could switch between alternate sets of numeral glyphs.  Or, as Petersen suggests, insert nulls as dividers.  If there are other early examples of simple numerical substitution ciphers, and especially ones that were actually put into practice, it would be interesting to learn whether they made any provisions like these.  It's hard to imagine them not being necessary.
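
To make the parsing problem concrete, here is a small Python sketch that lists every way a run-together string can be cut into valid code numbers. Assumptions: `PARTIAL_KEY` contains only the letter assignments mentioned in this post, not the full "pen test" key, and `segmentations` is a hypothetical helper name.

```python
# Partial key: only the assignments quoted in this post (assumed incomplete).
PARTIAL_KEY = {
    "A": "xxv", "B": "vi", "D": "xxviii", "E": "ix", "F": "xx", "H": "xii",
    "I": "xxiii", "L": "xv", "N": "vii", "O": "xviii", "P": "xxix",
    "Q": "x", "R": "xx", "T": "xxiii",
}
KEY_CODES = set(PARTIAL_KEY.values())


def segmentations(text, codes):
    """Return every way to split `text` into consecutive pieces that are all in `codes`."""
    codes = frozenset(codes)
    longest = max(len(code) for code in codes)

    def walk(start):
        if start == len(text):
            return [[]]  # one valid way to split the empty remainder
        results = []
        for end in range(start + 1, min(len(text), start + longest) + 1):
            piece = text[start:end]
            if piece in codes:
                results.extend([piece] + rest for rest in walk(end))
        return results

    return walk(0)


print(segmentations("xviii", KEY_CODES))                        # [['xviii']]
print(segmentations("xviii", KEY_CODES | {"iii", "v", "viii"}))  # four parses
```

With the partial key, xviii has a single reading; adding the unassigned iii, v, and viii back in yields four, which is the ambiguity the skipped numbers appear to avoid.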

As with so many models, this seems so close to Voynichese in some ways, and yet so far from it in others.  One difference may be that an individual Roman numeral is unidirectional, starting with the higher places and moving towards the lower ones.  Voynichese text is usually continuous -- broken into vords, but (except for labels) with other vords before and after them, with lots of statistical interdependencies across the vord breaks.  It often seems to behave more like a punctuated loop, where glyphs have a strongly preferred sequence (e.g., qokedyqokedyqokedyqokedy), and where vord breaks help break the text into visually manageable chunks at a more or less consistent point within the loop.  If Voynichese were organized in terms of a punctuated loop, then the "beginnings" and "ends" of words wouldn't be at opposite ends of anything -- they'd be the *closest* positions to one another.

(04-10-2021, 12:35 AM)Koen G Wrote:
None of this explains the behavior of [y] though, since apart from mostly occurring at the final position, it is also common word-initially. We cannot say that it behaves like a capital letter, because those tend to be limited to the first position. It also doesn't seem to function as a disambiguator - in words like [ytchor] and [ytaiin], what is there to distinguish?

In a punctuated loop, [y] could conceivably "sort" to the vicinity of the vord break, but sometimes falling on one side, sometimes on the other (or, in those cases where there happen to be two [y] in a row, with one on *either* side of the break).   Without looking, would you guess [qokchod.ychear] or [qokchody.chear]?  Or [dar.ytey] or [dary.tey]?  Could it be the spacing that's inconsistent or variable, and not the role of [y] itself?


RE: Another link with Roman numerals? - MarcoP - 04-10-2021

Another well-known property of word-initial 'y' that I think is worth mentioning here is that it has a strong preference to appear at the beginning of lines. About 30% of y-words appear in that position, vs an expected ~10% (assuming lines include 10 words on average).
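
The comparison between observed and expected shares can be sketched as follows. The lines below are invented toy data, and `line_initial_stats` is a hypothetical helper; the baseline is simply the number of lines divided by the number of words, which is about 10% when lines average 10 words.

```python
def line_initial_stats(lines, glyph="y"):
    """Return (share of glyph-initial words that start a line, baseline share for any word)."""
    lines = [line for line in lines if line]  # ignore empty lines
    word_count = sum(len(line) for line in lines)
    glyph_words = 0
    glyph_words_line_initial = 0
    for line in lines:
        for position, word in enumerate(line):
            if word.startswith(glyph):
                glyph_words += 1
                if position == 0:
                    glyph_words_line_initial += 1
    observed = glyph_words_line_initial / glyph_words if glyph_words else 0.0
    baseline = len(lines) / word_count if word_count else 0.0
    return observed, baseline


# Toy input only: each inner list is one line of words.
toy_lines = [["ykal", "dar", "chey", "ol"], ["ytaiin", "shedy", "ykeey", "dal"]]
print(line_initial_stats(toy_lines))
```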