(01-08-2019, 11:30 PM)Koen G Wrote: Well, abbreviation would indeed be a good way to unintentionally generate homography.
I am far from sure that this is the case with normal Latin abbreviations. But I am not a palaeographer, so I may be wrong of course.
[attachment=3108]
For instance, in this fragment the symbol -3 has three different functions in the three words:
coq3 (coque)
maīb3 (manibus)
ide3 (idem)
But the symbol typically stands for -m; b3 and q3 are codified forms of frequent endings, so -3 is fully disambiguated by the preceding character.
The "m" / "n" positional ambiguity with macron (manibus / mamibus / mainbus / maimbus) rarely produces alternatives that match more than a single word.
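As a toy illustration of why this ambiguity is mostly harmless, one can generate all the m/n readings of an abbreviated form and filter them against a lexicon. The three-word lexicon below is a hypothetical stand-in, not a real dictionary, but the idea is the same:

```python
# Toy illustration: expanding the macron in an abbreviated "maibus", where
# the macron can stand for a missing m or n either before or after the
# marked vowel. The lexicon is a deliberately tiny hypothetical stand-in.
candidates = (["ma" + c + "ibus" for c in "mn"]    # mamibus, manibus
              + ["mai" + c + "bus" for c in "mn"])  # maimbus, mainbus

lexicon = {"manibus", "coque", "idem"}  # hypothetical mini-lexicon

valid = [w for w in candidates if w in lexicon]
print(valid)  # -> ['manibus']: only one reading survives
```

With a full dictionary the candidate set is larger, but the filtering step usually still leaves a single match.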
Also, many abbreviations cluster at the end of words, where grammatical constraints limit the number of possible endings. You can have some residual ambiguity, but it will often be limited to different inflections of a single word. Even if most of the abbreviation symbols are ambiguous, you don't get many "true" homographs.
Turning the name of a plant and the name of something entirely different into identical words will not happen that often: we are not speaking of different inflections of one word, but of different root words. I guess that in an abbreviated Latin text this would happen in at most half a dozen cases per 1000 labels; in the VMS we are talking of more than 100 cases in about 850 single-word labels.
More importantly, abbreviations like those in medieval European manuscripts would not generate anything like Voynichese (e.g. the low entropy values). As Rene wrote:
(21-03-2019, 06:48 AM)ReneZ Wrote: There is absolutely no reason to think that the effect of abbreviating a text, either by leaving out characters, by replacing frequent combinations by a single sign, or a combination of that will:
- reduce the entropy in any significant manner
- introduce the word patterns we see in the Voynich MS text.
See also what Anton wrote.
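For reference, the entropy values in question are usually the unigram character entropy h1 and the conditional (second-order) entropy h2. A minimal sketch of both measurements; the sample strings are purely illustrative:

```python
# Standard character-entropy measurements: h1 (unigram) and h2 (conditional
# on the preceding character). The one-line sample is only to show the
# mechanics; meaningful values require a sizeable text.
from collections import Counter
from math import log2

def h1(text):
    """Unigram character entropy, in bits per character."""
    n = len(text)
    return -sum(c / n * log2(c / n) for c in Counter(text).values())

def h2(text):
    """Conditional entropy of a character given the preceding character."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    # H(next | prev) = -sum P(prev, next) * log2 P(next | prev)
    return -sum(c / n * log2(c / firsts[a]) for (a, b), c in pairs.items())

sample = "in manibus tuis sortes meae"
abbrev = " ".join(w[0] + "".join(c for c in w[1:] if c not in "aeiou")
                  for w in sample.split())
print(h1(sample), h2(sample))
print(h1(abbrev), h2(abbrev))
```

Running the two measurements on a long text before and after abbreviating it is a quick way to check the claim empirically.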
But of course one can consider different forms of "abbreviation".
For instance, the method proposed by Hauer and Kondrak seems to produce a comparable level of ambiguity. I ran a simple test on 1000 Latin words:
- vowels were removed (with the exception of word-initial occurrences)
- the remaining characters were sorted alphabetically
So "manibus" is encoded as "bmns".
This results in 14% "collisions" (different source words being mapped into identical coded words), a number comparable with the overlaps in Voynichese single-word labels. Of course, this method also results in a considerable decrease in entropy values.
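For concreteness, the test can be sketched as follows. The word list here is a tiny stand-in for the 1000-word Latin sample, so the rate it prints is illustrative, not the 14% figure:

```python
# Sketch of the encoding test described above: drop non-word-initial vowels,
# then sort the remaining letters alphabetically. The word list is a small
# illustrative stand-in, not the original 1000-word Latin sample.
from collections import defaultdict

def encode(word):
    kept = [c for i, c in enumerate(word) if i == 0 or c not in "aeiou"]
    return "".join(sorted(kept))

def collision_rate(words):
    buckets = defaultdict(set)
    for w in set(words):
        buckets[encode(w)].add(w)
    # a word "collides" when its code is shared by another source word
    colliding = sum(len(ws) for ws in buckets.values() if len(ws) > 1)
    return colliding / len(set(words))

words = ["manibus", "manus", "minus", "amor", "arma", "coque", "idem"]
print(encode("manibus"))      # -> bmns
print(collision_rate(words))  # 4 of the 7 words share a code with another
```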
(01-08-2019, 11:30 PM)Koen G Wrote: What we shouldn't forget is that readers can become very good at understanding words from context. An identical abbreviation might be read differently depending on the surrounding subject matter.
This is certainly true. You can use context to decode Hauer and Kondrak's anagrams: I guess it will be difficult at the beginning, but with some practice the level of residual ambiguity might be acceptable. But their method also results in a drastic reduction of the Type-Token-Ratio (about a 10% decrease with W=200). How can different source words be mapped into homographs and still produce a high TTR?
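To make the TTR point concrete, here is a sketch of a moving-average TTR computation and of the effect of an alphagram cipher (sorting each word's letters, as in Hauer and Kondrak's encoding). The vocabulary and corpus are synthetic, so the numbers only show the direction of the effect, not the 10% figure:

```python
# Sketch: moving-average type-token ratio (MATTR) with window W=200, before
# and after an alphagram cipher collapses distinct words into homographs.
# The vocabulary and random corpus are synthetic, for illustration only.
import random

def mattr(tokens, window=200):
    """Moving-average type-token ratio over windows of the given size."""
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

random.seed(0)
# "amor", "roma", "mora", "ramo" are mutual anagrams: sorting their letters
# maps all four to the single alphagram "amor".
vocab = ["amor", "roma", "mora", "ramo", "aqua", "vita", "lux"]
tokens = [random.choice(vocab) for _ in range(1000)]
ciphered = ["".join(sorted(w)) for w in tokens]

print(mattr(tokens), mattr(ciphered))  # the ciphered TTR is lower
```

Since the cipher is a many-to-one mapping, every window can only lose types, so the ciphered TTR is necessarily less than or equal to the original, and strictly lower as soon as two anagrams co-occur in a window.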