The Voynich Ninja

Full Version: What are the characteristics of Labelese?
Missing from the list (and the discussion) is the idea of abbreviation. A word such as EVA oty is almost too small (or, put another way, contains too little information) to be a complete word in the fully linguistic sense. If these labels are in some way abbreviated, many of the difficulties people encounter when trying to explain all Voynichese behaviours disappear.
09-08-2018, 12:00 PM
Thank you, nablator, that was my point: the labels may not be full words (they might be abbreviated or rendered in shorthand in a number of ways). The context would be enough to recognize the reference if you created the manuscript, but it would be hard to discern for an "outsider".

As for ways they might be abbreviated, the pic above shows one way. Another way would be text that stretches across labels (in other words, in some instances, the labels may be broken up). A third would be a sort of rhyming mnemonic as was used to help remember oral history or grammatical conjugations.


The same concepts apply in a visual sense to Lullian diagrams. To most people the diagrams look mysteriously unreadable, but if you already know the information behind them, then they provide a way to remember scriptural passages and other information in a very succinct format.
P.S., Nick, just so you know, I didn't post the above as a commentary on your comment. I copied and pasted it because I didn't want to have to type it all in again.
Well, abbreviation would indeed be a good way to unintentionally generate homography. What we shouldn't forget is that readers can become very good at understanding words from context. An identical abbreviation might be read differently depending on the surrounding subject matter.

Still, Marco, I get what you're saying. There's also the matter of rampant almost-identical words, which makes the problem all the more complex.
(01-08-2019, 11:30 PM)Koen G Wrote: Well, abbreviation would indeed be a good way to unintentionally generate homography.

I am far from sure that this is the case with normal Latin abbreviations. But I am not a palaeographer, so I may be wrong of course.

[attachment=3108]

For instance, in the fragment above, the symbol -3 has three different functions in the three words:
coq3 (coque)
maīb3 (manibus)
ide3 (idem)
But the symbol on its own typically stands for -m; b3 and q3 are codified forms of the frequent endings -bus and -que, so -3 is fully disambiguated by the preceding character.
The "m" / "n" positional ambiguity with macron (manibus / mamibus / mainbus / maimbus) rarely produces alternatives that match more than a single word.
Also, many abbreviations cluster at the end of words, where grammatical constraints limit the number of possible endings. You can have some residual ambiguity, but it will often be limited to different inflections of a single word. Even if most of the abbreviation symbols are ambiguous, you don't get many "true" homographs.
Turning the name of a plant and the name of something entirely different into identical words will not be that frequent: we are not speaking of different inflections, but different root words. I guess that in a Latin abbreviated text this would happen in at most half a dozen cases in 1000 labels; in the VMS we are talking of more than 100 cases in about 850 single-word labels.

More importantly, abbreviations like those in medieval European manuscripts would not generate anything like Voynichese (e.g. the low entropy values). As Rene wrote:

(21-03-2019, 06:48 AM)ReneZ Wrote: There is absolutely no reason to think that the effect of abbreviating a text, either by leaving out characters, by replacing frequent combinations by a single sign, or a combination of that, will:
- reduce the entropy in any significant manner
- introduce the word patterns we see in the Voynich MS text.

See also what Anton wrote.

But of course one can consider different forms of "abbreviation".

For instance, the method proposed by Hauer and Kondrak seems to produce a comparable level of ambiguity. I ran a simple test on 1000 Latin words:
  • vowels were removed (with the exception of word-initial occurrences)
  • the remaining characters were sorted alphabetically
So "manibus" is encoded as "bmns".
This results in 14% "collisions" (different source words being mapped into identical coded words), a number comparable with the overlaps in Voynichese single-word labels. Of course, this method also results in a considerable decrease in entropy values.
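The vowel-dropping-and-sorting test described above can be sketched in a few lines of Python. This is a hypothetical reconstruction of the procedure, not Marco's actual code; the function names and the tiny sample word list are my own.

```python
# Sketch of the collision test: drop non-initial vowels, sort the surviving
# letters alphabetically, and count distinct source words whose encodings clash.
from collections import defaultdict

VOWELS = set("aeiou")

def encode(word: str) -> str:
    """Keep a word-initial vowel, drop the other vowels, then sort the letters."""
    kept = word[0] + "".join(c for c in word[1:] if c not in VOWELS)
    return "".join(sorted(kept))

def collision_rate(words) -> float:
    """Fraction of distinct words that share their encoded form with another word."""
    buckets = defaultdict(set)
    for w in set(words):
        buckets[encode(w)].add(w)
    colliding = sum(len(ws) for ws in buckets.values() if len(ws) > 1)
    return colliding / len(set(words))

print(encode("manibus"))  # -> "bmns", as in the example above

# Toy illustration: "manibus" and "nimbus" both encode to "bmns".
sample = ["manibus", "nimbus", "coque", "idem", "aqua"]
print(collision_rate(sample))
```

On a real 1000-word Latin sample, `collision_rate` would be the statistic reported above (14%).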

(01-08-2019, 11:30 PM)Koen G Wrote: What we shouldn't forget is that readers can become very good at understanding words from context. An identical abbreviation might be read differently depending on the surrounding subject matter.

This is certainly true. You can use context to decode Hauer and Kondrak's anagrams: I guess it will be difficult at the beginning, but with some practice the level of residual ambiguity might be acceptable. But their method also results in a drastic reduction of the Type-Token-Ratio (about a 10% decrease with W=200). How can different source words be mapped into homographs and still produce a high TTR?
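For reference, the moving-window Type-Token Ratio mentioned above (the W=200 figure) can be sketched as follows: slide a window of W tokens across the text, compute types/tokens in each window, and average. This is a generic reconstruction of the statistic, not the code actually used in the test.

```python
# Moving-window Type-Token Ratio (MATTR): average over all windows of W
# consecutive tokens of (number of distinct types in the window) / W.
def mattr(tokens, w=200):
    if len(tokens) < w:
        w = len(tokens)  # fall back to a single window on short texts
    ratios = [len(set(tokens[i:i + w])) / w
              for i in range(len(tokens) - w + 1)]
    return sum(ratios) / len(ratios)

# Toy example with a small window; real comparisons would use W=200.
text = "the cat sat on the mat and the dog sat too".split()
print(round(mattr(text, w=5), 3))
```

Mapping several source words onto one homograph shrinks the number of types per window, which is why the roughly 10% MATTR decrease is hard to square with Voynichese's high TTR.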
(02-08-2019, 08:36 AM)MarcoP Wrote: I ran a simple test on 1000 Latin words:
  • vowels were removed (with the exception of word-initial occurrences)
  • the remaining characters were sorted alphabetically
So "manibus" is encoded as "bmns".

This results in 14% "collisions" (different source words being mapped into identical coded words), a number comparable with the overlaps in Voynichese single-word labels. Of course, this method also results in a considerable decrease in entropy values.

This is certainly true, but the character set is reduced, so not only are all entropy values reduced, the "theoretical maximum" entropy values are reduced as well.
This makes it very difficult to compare entropy values across different character-set sizes.
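This normalization point can be illustrated with a short sketch (hypothetical code, not from the thread): the character entropy of a text is bounded above by log2 of its alphabet size, so dividing the measured entropy by that bound gives a value that is at least roughly comparable across character sets.

```python
# Character entropy is bounded by log2(alphabet size); normalizing by that
# bound allows a rough comparison between texts with different alphabets.
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy (bits/char) of the single-character distribution."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def normalized_entropy(text: str) -> float:
    """Entropy divided by its theoretical maximum log2(k) for k distinct chars."""
    k = len(set(text))
    return char_entropy(text) / math.log2(k) if k > 1 else 0.0

# A uniform 4-letter text reaches its maximum: normalized entropy 1.0.
print(normalized_entropy("abcd" * 10))
```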

In the very early days of the internet I had done something very similar, in order to try to generate word patterns. I can't remember exactly, but I believe that I just rearranged all characters in every word to put them in alphabetical order. Maybe I also enforced strict vowel-consonant alternation (to the extent possible). The number of collisions was surprisingly small.
Some reduction in entropy was expected (the combinations become more limited), but that reduction, too, was surprisingly small.
(02-08-2019, 08:36 AM)MarcoP Wrote: How can different source words be mapped into homographs and still produce a high TTR?
You make a good point. If the VM is encoded in any way that generates homographs, we'd expect a relatively low TTR. Unless perhaps we assume that it started at a top-tier TTR in the first place, like classical Latin or even something like Sanskrit. Or one of the many possible candidates I haven't tested yet. (I'm not saying this is likely the case, but maybe theoretically possible?)

Now about the labels specifically, there is nothing wrong with apparent homonyms on the one hand and high TTR on the other. You know that high TTR is influenced by inflection, conjugation and other grammatical processes which change the appearance of a word. But in labels we might expect words to mostly appear in their base forms (exceptions exist, like the labels with Rotae Fortunae I've posted before). If we can assume a set of labels to be dominated by base forms, like nominative singular and plural which most languages have, then TTR-increasing factors are mostly absent.
...we also discussed the possible use of abbreviation in the text extensively, a couple of years back!
Back then, I wrote exactly the same (based on extensive research) as I will write now: the statistics do not change in the case of abbreviations.

But I do not see the point in talking about homographs at all.
If they are there, they are there; if not, they are not. What would it signal, or what is the value of detecting the possibility of homographs?
(02-08-2019, 09:58 AM)Koen G Wrote:
(02-08-2019, 08:36 AM)MarcoP Wrote: How can different source words be mapped into homographs and still produce a high TTR?

You make a good point. If the VM is encoded in any way that generates homographs, we'd expect a relatively low TTR.

The reduction could be counterbalanced by generating several different ciphertexts for the same cleartext.
bi3mw:

Isn't it natural that labels should consist of a lot of unique words (if they make sense)? Of course, the author(s) could have "simulated" this property, but I think that's unlikely.

--------

If you are interested, you can read what I have written on Nick's blog on this subject. Suffice it to say I believe that some labels are nulls, or null words as I term them, and others are real, genuine non-null words. I have guessed a figure of 60% null labels and therefore 40% non-null labels, but this is really to give an idea of the order of magnitude of the number of null labels. I have suggested that one could possibly generalise this to Voynichese as a whole. I will write up my findings in full when I have time.