The Voynich Ninja

Full Version: Peter Bakker on the VMS
(06-08-2020, 10:24 AM)Helmut Winkler Wrote:
JKP,

some friendly advice: You really should start reading up on things before you start posting.

A Cisiojanus (sic, and not Cisiojanis) is a liturgical calendar in a mnemotechnically useful form and has not the least to do with encryption.


I know this. Some of the medieval calendars I have seen have a portion of a Cisiojanus (usually Latin but sometimes German) on the folio for the specific month that is mentioned in the text.


But this manuscript (Cod. Pal. germ. 597) is not the normal context for the Cisiojanus. It is not a calendar. It is not a philosophical text on mnemonics. It is not a grammar book for explaining syllables. It is a manuscript full of ciphertext, personal messages, and alchemical references (salt and sulphur). There is ciphertext at the top of the Cisiojanus folio and there are many pages of cipher following it. There are also many pages cut out (possibly cipher messages that have been sent?).

Also, when I have seen the Cisiojanus in calendars (which is where I have generally seen it), it is not broken up into syllables even though there is a syllabic concept underlying it. In pal.germ.596 it is broken up into syllables.

I apologize for misspelling it. Most of the time I get the spelling right. But I don't apologize for mentioning the syllabification.


I think nablator's explanation is probably correct. It's simple and practical. Maybe they did it to count syllables (although they are not perfect syllables and the text is in a dialect that doesn't match the syllables in the more classical versions). Some words are broken across lines. They could have used slashes or dots or just written it as normal text as I have seen in other manuscripts, but they chose to spread it apart.

If someone is using extensive sections of ciphertext and the Cisiojanus seems out of place, I still have to ask myself why is THIS version spaced in this way? Were they planning to use the idea of it? It is preceded and followed by ciphertext.

By the late 1500s, ciphers were obfuscating or adding spaces. This is earlier, 1426... did the germ of the idea start here??
(04-08-2020, 01:39 PM)DONJCH Wrote: I see gratr in German translates to burr or razor edge - the band is German.
I like this sort of music. German metal is outstanding!

(offtopic) "If you like metal, you're my friend"

BTW, I run a metal [link unavailable] with a Voynich-inspired logo!
(05-08-2020, 08:52 PM)nickpelling Wrote: ...if (as with the Voynich Manuscript), you look at a pre-1500 text or ciphertext with spaces, there is no reason to think that the spaces are anything but word separators.

TLDR: Voynichese does not seem to be a simple substitution cipher of a European language, nor a simple verbose cipher of such a language.

I am only too happy to follow what Nick says and consider spaces as word separators.
But, though Bakker's conclusion of meaninglessness is not well-grounded, I think he has a good point here:

Bakker Wrote:The average word length for the Voynich manuscript appears quite similar to those of English and Latin, which again suggests that it would be written in an alphabet rather than an abjad, and that the language type is not polysynthetic like Greenlandic, nor isolating like Chinese.
...
The Voynich text too adheres to Zipf’s law, which might suggest that it is in fact a natural language. But, since computer analyses did not result in a convincing match with any other language, probably not a known language with transliterated letters.

One way to look at this is conditional entropy: this property is not altered by simple-substitution ciphers, so it gives us a way to compare written languages with Voynichese without making assumptions about any specific character-by-character mapping. Voynichese has a number of glyphs that is comparable with that of alphabetic writing systems. Of course this number varies with different transliterations, but conditional entropy remains low with various encodings. Low entropy together with a similar alphabet size suggests that Voynichese must be compressed to make it comparable with written natural languages (I think this was pointed out by Rene). If the VMS were a verbose cipher, decoding it would be such a compression. But another effect would be to reduce word length.
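
A minimal sketch of the invariance mentioned above, in Python. It computes the conditional entropy of a character given the previous character as H(pairs) - H(first characters), then checks that a one-to-one substitution leaves the value unchanged. The sample sentence and the substitution table are invented for illustration; they are not Voynich data, and this is not necessarily the exact measure used for the plot.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def conditional_entropy(text):
    """H(next char | previous char), computed as H(pairs) - H(first chars)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)

# Illustrative sample and an arbitrary one-to-one substitution (both assumptions).
sample = "the quick brown fox jumps over the lazy dog and the cat"
alphabet = sorted(set(sample) - {" "})   # spaces are left untouched
shifted = alphabet[3:] + alphabet[:3]    # rotate the alphabet by three places
table = str.maketrans("".join(alphabet), "".join(shifted))
ciphered = sample.translate(table)

h_plain = conditional_entropy(sample)
h_cipher = conditional_entropy(ciphered)
print(round(h_plain, 4), round(h_cipher, 4))
```

The two printed values agree because the substitution only renames symbols: every pair count and every single-character count is preserved, so the entropies computed from them are the same.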

The following plot shows average word length (X axis) and conditional entropy (Y axis). The plot was cropped to the area close to the Voynich samples. Language files come from [link unavailable].

[attachment=4652]

Koen recently experimented [link unavailable] with replacements that transform EVA into something with an almost normal entropy level. The average word length for EVA ("ZL" in the plot) is ~5.2; for the Currier-D'Imperio system (CD) it's 4.1. The X axis of this plot shows that this word length is comparable with:
ENM (Middle English) 4.0
ITA (Italian) 4.3
GRC (Greek) 4.8
LAT (Latin) 5.8

Koen's transformations take average word length down to 3.5, close to Persian (written in an abjad) and Vietnamese (a monosyllabic language).

So it is not clear that Voynichese can be mapped to a European language in such a way that  both word-length and entropy are right. Fixing entropy breaks average word-length.
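
The word-length half of this trade-off can be illustrated with a toy example. The sketch below collapses a few glyph groups into single symbols, the kind of step that decoding a verbose cipher would involve, and reports the average word length before and after. The token list and the replacement table are invented for illustration; they are not Koen's actual transformations.

```python
def average_word_length(words):
    """Mean number of characters per word token."""
    return sum(len(w) for w in words) / len(words)

def collapse_groups(word, groups):
    """Replace each multi-glyph group with a single placeholder symbol."""
    for group, symbol in groups.items():
        word = word.replace(group, symbol)
    return word

# Hypothetical EVA-like tokens and a hypothetical replacement table.
tokens = ["qokeedy", "chedy", "daiin", "shol", "qokaiin", "otedy", "ol", "chor"]
groups = {"aiin": "N", "qo": "Q", "ch": "C", "sh": "S", "ee": "E", "dy": "D"}

collapsed = [collapse_groups(w, groups) for w in tokens]
before = average_word_length(tokens)
after = average_word_length(collapsed)
print(f"before: {before:.2f}, after: {after:.2f}")
```

Any replacement of this kind can only shorten words, which is why raising entropy by merging glyph groups pulls the sample leftwards on the word-length axis.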

So, Bakker is right: Voynichese is not a direct transliteration of a European language. This is something we already know: we have seen enough one-to-one translations.
Koen's experiment suggests that a verbose cipher of a European language does not fit either. A verbose-cipher could still be a component of the cipher system, but it should be paired with something that shortens words. Low entropy and a limited number of different characters are not compatible with abbreviations like those we see in medieval manuscripts, but maybe something like an abjad where some sounds are not written could be an option.
If, following the similarity with Vietnamese, one interprets Voynichese words as syllables of a European language, Nick's axiom is violated; moreover, one has to conclude that labels are not what they seem to be (i.e. whole words related with illustrated items).

So, when we find languages that appear to share some properties with Voynichese, we should also check conditional entropy, if the idea is a simple-substitution cipher or a direct phonetic encoding (which is a very similar concept).
That's a very clear summary, Marco, and I certainly agree. It's a bit like playing whack-a-mole, isn't it. You find a way to solve one problem, only to see that another problem has now gotten worse.

The labels are one thing I would not get too hung up on. We don't know if they are supposed to be nouns labelling the thing they are with. They also seem to have different properties than the main text, although I must admit I never understood the degree to which this is the case.

I see only one way they could get away with really short labels, and that is if the name of the labelled thing is known from the text (where it is written in full) or from general culture. For example, in Byzantine mosaics it was sufficient to label Mary "Mother of God" with 2x2 letters, effectively halving the amount of letters required. Example: [link unavailable]

The effect is similar, I guess, to an abjad that is then expanded by verbose cipher, in the sense that you don't write certain things, and then expand what is left in a systematic way. This sounds overly complex, though I guess with the right rules it would not become a one-way cipher, if your text is one that can be reconstructed from the "abjad phase".
Thank you for your comment, Koen!

(10-08-2020, 03:52 PM)Koen G Wrote: The labels are one thing I would not get too hung up on. We don't know if they are supposed to be nouns labelling the thing they are with. They also seem to have different properties than the main text, although I must admit I never understood the degree to which this is the case.

I see only one way they could get away with really short labels, and that is if the name of the labelled thing is known from the text (where it is written in full) or from general culture.

Labels are fairly consistent with the main text. The subject was discussed in [link unavailable].

[Link unavailable]: different prefixes in labels and main text are shown. The main differences are that labels have less frequent q- and more frequent o- ch- sh-. Since q- appears to be a prefix that is sometimes added to o- words, the differences in q- and o- appear to be consistent. What happens with ch- and sh- is more difficult to explain, but it does not seem to be enough to conclude that labels are not words. I think it's the kind of variability that one can expect when comparing a specific category of words (e.g. nouns) with a larger vocabulary.
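
A comparison of this kind can be sketched in a few lines of Python: count how often each word-initial prefix occurs in a list of labels and in a list of main-text words, then compare the shares. The two word lists below are tiny invented samples and the prefix inventory is an assumption; a real comparison would use a full transliteration.

```python
from collections import Counter

PREFIXES = ("q", "o", "ch", "sh")  # the prefixes discussed in the post

def prefix_shares(words, prefixes=PREFIXES):
    """Fraction of words starting with each prefix (first match wins)."""
    counts = Counter()
    for word in words:
        for prefix in prefixes:
            if word.startswith(prefix):
                counts[prefix] += 1
                break
    return {p: counts[p] / len(words) for p in prefixes}

# Invented samples, not real label or paragraph data.
main_text = ["qokeedy", "qokaiin", "daiin", "chedy", "otedy", "qol", "shedy", "ol"]
labels = ["otol", "okary", "chol", "oteedy", "shor", "dary"]

main_shares = prefix_shares(main_text)
label_shares = prefix_shares(labels)
for p in PREFIXES:
    print(f"{p}-: main {main_shares[p]:.2f}, labels {label_shares[p]:.2f}")
```

With real data, adding the q- and o- shares together for each set would be one way to check whether the two differences cancel out, as suggested above.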

[Link unavailable]: Stolfi shows that label tokens are longer than main-text tokens (fewer short words and more long words in labels). The distribution of word-types is a nearly identical binomial for labels and main-text.
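
For readers who want to try this on their own word lists, here is a rough Python sketch that tabulates the length distribution of distinct word types and sets it beside a binomial with the same mean. The word list is invented, and both the shift by one (so that a one-glyph word counts as zero) and the mean-matching fit are assumptions made here for illustration, not necessarily Stolfi's procedure.

```python
import math
from collections import Counter

def length_distribution(words):
    """Relative frequency of each word length among distinct word types."""
    types = set(words)
    counts = Counter(len(w) for w in types)
    return {k: counts[k] / len(types) for k in sorted(counts)}

def binomial_pmf(n, p):
    """Probabilities of 0..n successes for a binomial(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Invented word list; repeated tokens are counted once as a single type.
words = ["ol", "or", "dal", "chol", "shol", "daiin", "chedy", "otedy",
         "qokedy", "qokain", "qokeedy", "chol", "daiin"]

observed = length_distribution(words)
lengths = [len(w) for w in set(words)]
n = max(lengths) - 1                       # model (length - 1) on 0..n
p = (sum(lengths) / len(lengths) - 1) / n  # match the observed mean
expected = binomial_pmf(n, p)

# Columns: word length, observed share of types, binomial share.
for k, prob in enumerate(expected):
    print(k + 1, round(observed.get(k + 1, 0.0), 3), round(prob, 3))
```

Running the same tabulation separately on label types and main-text types would show whether the two curves overlap.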

If labels are abbreviated, the same words appear to also be abbreviated in the main text: I am not aware of evidence suggesting that they are written in full in the main text. But yes, some form of general abbreviation / truncation / abjad together with a verbose cipher seems like a possibility.

Personally, I find it difficult to believe that labels are not significant while the rest of the text is. I find total meaninglessness or total meaningfulness more plausible. But this could just be my biased subjective preference. Of course, everything is possible.
Would considering them all as syllables (both "words" and labels) fix the issue?

I must admit the whack-a-mole analogy is quite apt and has me a little confused.

I know this contradicts Nick's idea about spaces -- but as the discussion clearly shows something has to give and I would be interested in reactions to this approach.

There is definite precedent for labels in figures being strung-out syllables that add up to a word when taken as a whole.
[attachment=4655]

e.g. TONI separate from  TRUA, VO separate from  CES, and FUL separate from GURA in this figure.  Obviously, those who have looked at more manuscripts than I have may have a different opinion, but it doesn't seem rare to me.
[link unavailable]

And I can understand Marco's concern with spaces being syllabic rather than word separators given the work he and Emma have done, but I don't think the shown relationships are invalidated merely because the spaces mark syllables rather than full words.  Certainly a good chunk of the separators would still be between whole words -- just not all of them.  Of course, figuring out which is which would be the tricky part, but we knew this was not going to be easy, given the numbers.  Could the relationship analysis be somehow leveraged to try to propose a pattern that shows which spaces are between syllables and which are between words?*

*Note I don't expect an answer for this, I'm just throwing ideas out to get more discussion for better understanding.
(10-08-2020, 03:52 PM)Koen G Wrote: I see only one way they could get away with really short labels, and that is if the name of the labelled thing is known from the text (where it is written in full) or from general culture.
Another way (or both combined) would be to concatenate several labels to get a reasonable length.
(10-08-2020, 06:06 PM)MichelleL11 Wrote: Would considering them all as syllables (both "words" and labels) fix the issue?

I must admit the whack-a-mole analogy is quite apt and has me a little confused.

I know this contradicts Nick's idea about spaces -- but as the discussion clearly shows something has to give and I would be interested in reactions to this approach.

This does not necessarily contradict what Nick wrote.
I am thinking of Guy-Stolfi's [link unavailable] and [link unavailable] about the binomial distribution of dictionary word lengths in Voynichese and Vietnamese. These are monosyllabic languages: words are one syllable long.
In this view, the near-overlap of Koen's modified EVA with Vietnamese in the graph I posted above seems interesting.
The Friedmans concluded that Voynichese cannot be a simple substitution cipher more than half a century ago, so this really shouldn't be news. And Voynich words are way too short to be 'pure' verbose cipher, so that too is not something that anyone is arguing for.

At the same time, the tightly coupled groups of glyphs we all see - qo ar al or ol aiin aiir am ee etc - behave exactly how you would expect verbose ciphers to behave. Add in abbreviation, and we have a complicated landscape to navigate our way across.

Me, I'm still bemused by our inability to properly account for the first glyph of each line; for the differences between A, B, labelese, etc; for Neal keys; and so forth. And so when people try to claim that they can understand Voynichese without being able to link what they see with these strange behaviours, I'm naturally skeptical to the extreme.

So in many ways spaces are arguably the least puzzling aspect of Voynichese, because the historical timeline suggests that they are indeed spaces. Sorry if that's inconvenient. :-|
(10-08-2020, 06:06 PM)MichelleL11 Wrote: ...
There is definite precedent for labels in figures being strung-out syllables that add up to a word when taken as a whole.
...


For a long time I sensed resistance to the idea that the VMS might be broken up into components that are not specifically word-related. My feelings about it...

-JKP- 29-08-2019, 09:37 PM Wrote:Manuscripts are full of syllables.

Many charts and especially wheel diagrams are full of syllables (especially in the semi-compotus manuscripts). Lullian diagrams often have syllables. Maps often have syllables broken across topology. Greek manuscripts and basically all Greek art are full of syllables. Most eastern scripts are based on syllables, not on individual letters. I have lots of samples of syllables from western manuscripts.

I've been thinking syllables for a very long time. I was quite surprised when I brought it up in the early days of the forum (or maybe it was on someone's blog), that there was active resistance to the idea. To me syllables are high on the list, not just a vague possibility.

I've already posted examples of syllables broken across imagery, but here are some more...

It occurs frequently in Greek icons

[attachment=4658]

but it is also fairly common in Latin manuscripts and charts.

Breaking syllables across imagery happens frequently in maps and fairly regularly in other kinds of images. But sometimes it is done even if there are no drawing components in the way as in this example of a place-name on a 14th-century map:

[attachment=4656]

In this blog, halfway down, is another example of syllables broken across the streams of three rivers:

[link unavailable]


I have also seen numerous passages where the syllables are marked with initials or with numbers. Here is an example from an 11th-century  manuscript where the syllables are marked with double and single letters. Notice that one is a gamma symbol rather than a Latin letter. I see this quite often in early medieval manuscripts, where Greek letters are used, or Greek letters are mixed with Latin letters:

[attachment=4657]

Liber Floridus has examples of labels broken into syllables.

Syllables were used in indexing systems:


[link unavailable]

Even though I see syllabification quite frequently, I haven't noticed any specific system for marking syllables. It looks like each scribe created his own or used one that was very local. It's similar to the situation with musical notation... there are dozens (maybe hundreds) of different notation systems in the medieval period. They weren't standardized until the Renaissance and early modern period (I'm guessing the invention of the printing press and mass-market books had a strong influence).


Marking syllables with various symbols was common. I don't know exactly when they started splitting the syllables apart with a space in between, but I'm pretty sure I have examples going back to at least the 10th century and possibly earlier. Not only on maps, but in documents that include songs and notes.


The concept of syllables was certainly in medieval awareness. I am not sure why it took so long for the concept to be applied to ciphers (I'm always keeping my eyes open for clues), but it may have happened sometime in the 14th or 15th century. As can be seen from examples, they ran words together, broke words across lines, and broke words across pieces of imagery much earlier than that.