Jorge_Stolfi > 11 hours ago
(Today, 06:26 AM)MarcoP Wrote: That’s an interesting parallel and a great opportunity to remember how peculiar Voynichese is. First, character entropy tells us that Voynichese isn’t a phonetic rendering of a European language: we know that Voynichese glyphs do not behave like vowels and/or consonants in French, English, German, Latin, Greek, Italian etc.
Quote:Second, while in ordinary European languages it is possible that an initial character can be replaced with a different initial, this is much rarer than in Voynichese (where the phenomenon is systematic).
dashstofsk > 11 hours ago
(Yesterday, 05:29 PM)ololololo Wrote: What do you think?
Ruby Novacna > 10 hours ago
(Today, 06:26 AM)MarcoP Wrote: ...we know that Voynichese glyphs do not behave like vowels and/or consonants in French, English, German, Latin, Greek, Italian etc.
... the differences with European natural languages are so huge that I think you can get the idea in any case.
ololololo > 10 hours ago
(Today, 06:26 AM)MarcoP Wrote: In natural languages, this phenomenon of letter substitution is not universal (but it is inherent in almost all of them). This applies only to some examples.
(Yesterday, 06:55 PM)Ruby Novacna Wrote: We are dealing with words that differ by only one consonant, such as, for example, in French the words "bateau" (boat), "château" (castle), "gâteau" (cake), and "râteau" (rake).
Do they all mean the same thing?
And what about the verbs: "disait" (was saying), "gisait" (was lying), "lisait" (was reading), "misait" (was betting), "visait" (was aiming)?
That’s an interesting parallel and a great opportunity to remember how peculiar Voynichese is. First, character entropy tells us that Voynichese isn’t a phonetic rendering of a European language: we know that Voynichese glyphs do not behave like vowels and/or consonants in French, English, German, Latin, Greek, Italian etc.
Second, while in ordinary European languages it is possible for an initial character to be replaced with a different initial, this is much rarer than in Voynichese (where the phenomenon is systematic). In [link], I only find 190 pairs that differ only in the first character, while I find 1004 for the top 1500 Voynichese words.
For instance, daiin has these variants among the top 1500 word types:
daiin 850 / saiin 127
daiin 850 / kaiin 79
daiin 850 / raiin 64
daiin 850 / taiin 45
daiin 850 / oaiin 25
daiin 850 / laiin 14
daiin 850 / paiin 7
daiin 850 / yaiin 5
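For anyone who wants to repeat this kind of count, here is a minimal sketch of one way to do it. It is not necessarily how the figures above were obtained: it assumes words are plain transliteration strings, that a "pair" is unordered, and that "first character" means the first character of the string rather than the first glyph. The function name `initial_variant_pairs` and the sample list are made up for illustration.

```python
from collections import defaultdict

def initial_variant_pairs(word_types):
    """Return all unordered pairs of word types that differ only in their first character."""
    by_tail = defaultdict(list)
    for word in set(word_types):
        if word:
            # Words sharing everything after the first character land in the same group.
            by_tail[word[1:]].append(word)
    pairs = []
    for group in by_tail.values():
        group.sort()
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                pairs.append((group[i], group[j]))
    return sorted(pairs)

# Illustrative sample only; a real run would use the top N word types of a text.
sample = ["daiin", "saiin", "kaiin", "chedy", "shedy", "chol"]
pairs = initial_variant_pairs(sample)
print(len(pairs), pairs)
```

Run on the top 1500 word types of a French text and of a Voynich transliteration, the two counts can be compared directly, keeping in mind that the result depends on the transliteration alphabet used.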
Third, word frequencies for matching pairs in French often differ by a whole order of magnitude.
E.g. t/b matches:
ton 15513 / bon 11483
tout 47221 / bout 4571
tete 11999 / bete 1706
This of course makes sense, since the words in a pair are typically semantically unrelated and the match is "coincidental". This also happens for most of the 'daiin' examples above, but it usually doesn't happen for ch/sh.
Also, in most cases the t/b replacement doesn’t work in French:
trouver 16833 brouver?
temps 16785 bemps?
toujours 14336 boujours?
…
Compare with the top ten Voynichese ch/sh matches:
chedy 507 / shedy 434
chol 395 / shol 185
chey 352 / shey 278
chor 211 / shor 95
cheey 185 / sheey 149
cheol 173 / sheol 108
chy 164 / shy 98
chdy 146 / shdy 45
chckhy 140 / shckhy 60
cheor 93 / sheor 47
Here the replacement almost always works and typically results in comparable frequencies (max/min < 3).
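The max/min criterion can be checked directly against the counts quoted in this post. The sketch below only re-uses those numbers; the helper names `frequency_ratio` and `comparable` are made up here, and the threshold of 3 is the one mentioned above.

```python
# Counts copied from the lists quoted in this thread.
CH_SH = {
    ("chedy", "shedy"): (507, 434),
    ("chol", "shol"): (395, 185),
    ("chey", "shey"): (352, 278),
    ("chor", "shor"): (211, 95),
    ("cheey", "sheey"): (185, 149),
    ("cheol", "sheol"): (173, 108),
    ("chy", "shy"): (164, 98),
    ("chdy", "shdy"): (146, 45),
    ("chckhy", "shckhy"): (140, 60),
    ("cheor", "sheor"): (93, 47),
}
T_B = {
    ("ton", "bon"): (15513, 11483),
    ("tout", "bout"): (47221, 4571),
    ("tete", "bete"): (11999, 1706),
}

def frequency_ratio(a, b):
    """Ratio of the larger count to the smaller one (always >= 1)."""
    return max(a, b) / min(a, b)

def comparable(pairs, limit=3.0):
    """Pairs whose two frequencies are within a factor of `limit` of each other."""
    return [words for words, (a, b) in pairs.items() if frequency_ratio(a, b) < limit]

for name, table in (("ch/sh", CH_SH), ("t/b", T_B)):
    print(name, len(comparable(table)), "of", len(table), "pairs below max/min = 3")
```

With these figures, nine of the ten ch/sh pairs fall below 3 (chdy/shdy is about 3.2), against one of the three French t/b pairs.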
EDIT: as always, it's possible that I made errors, but the differences with European natural languages are so huge that I think you can get the idea in any case.
ReneZ > 7 hours ago
(11 hours ago)Jorge_Stolfi Wrote: Character entropy is totally dependent on the encoding of phonemes. The character entropy of Italian would increase noticeably if one replaced "ch"->"k", "gn"->"ñ", "gl"->"ł", "sc"->"š", etc. Even more if one replaced every open "e" by "ɛ", every open "o" by "ɔ", and marked every stressed vowel with a diacritic.
(11 hours ago)Jorge_Stolfi Wrote: Word entropy is less dependent on the encoding. But even that is affected by orthography, e.g. by word splitting and joining. The word entropy of Italian would be lower if oblique pronouns were split from the verb ("ditemelo" -> "dite me lo") and compounds were split into components ("automobile" -> "auto mobile", "solamente" -> "sola mente", etc.). And it is also lower if the text is heavily abbreviated or in shorthand, so that many word types are merged ("pasto", "pesto", "posta" -> "pst" etc.)
Anyway, IIRC the word entropy of Voynichese was about 10 bits per word, which was well within the range of European languages.
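To see how much the encoding matters, one can measure the same text under two spellings or two tokenizations. The sketch below is only an illustration under stated assumptions: it uses plain zeroth-order Shannon entropy (single-symbol frequencies, not the conditional entropy often quoted for Voynichese), the `DIGRAPHS` table is a hypothetical recoding built from the substitutions mentioned above, and naive string replacement ignores phonetic context. The sample sentence is made up and far too short to give meaningful numbers.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Zeroth-order Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical recoding table (assumption): digraph -> single letter.
DIGRAPHS = {"ch": "k", "gn": "ñ", "gl": "ł", "sc": "š"}

def recode(text):
    """Replace each digraph by a single letter, ignoring phonetic context."""
    for digraph, letter in DIGRAPHS.items():
        text = text.replace(digraph, letter)
    return text

def char_entropy(text):
    """Entropy over characters, with spaces removed."""
    return shannon_entropy(text.replace(" ", ""))

def word_entropy(text):
    """Entropy over whitespace-separated word tokens."""
    return shannon_entropy(text.split())

# Made-up sample; a real comparison needs a long text.
sample = "che gnocchi scelgo per gli amici che vengono"
print("chars, original:", char_entropy(sample))
print("chars, recoded: ", char_entropy(recode(sample)))
print("words:          ", word_entropy(sample))
```

The same `word_entropy` function can be applied to a text before and after splitting or joining words, which is the orthographic effect described above.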
Jorge_Stolfi > 2 hours ago
(7 hours ago)ReneZ Wrote: Secondly, making such [digraph-for-letter] substitutions means it is no longer Italian.
Quote:Word entropy can barely be measured for the Voynich MS text. Word pair entropy is completely out of reach. For single-word entropy, the uncertainty about the alphabet, the handwriting itself and the word spacing means that it cannot be estimated reliably. As an example, one will get a highly variable number of hapax legomena from the text, depending on which transliteration (of the same text) one uses.
Jorge_Stolfi > 1 hour ago
(2 hours ago)Jorge_Stolfi Wrote: Let me just say that, by my estimate, there are still about 3000 word breaks that were completely omitted either by the Scribe or by the transcriber [me].
Mauro > 1 hour ago
(2 hours ago)Jorge_Stolfi Wrote: Anyway, let me insist again that statistics (glyph and word frequencies, Zipfness, entropies, correlations etc) are not properties of languages, but of texts.
Jorge_Stolfi > 32 minutes ago
(1 hour ago)Mauro Wrote: Indeed it's almost always possible to determine the language of an unknown text just by comparing basic statistics: for example, this is the output of a program I wrote some time ago which categorizes texts according to their statistics. Here I'm comparing a book in Italian ("L'amore di Loredana") with a panel of texts in different languages, by calculating the root-mean-square distance of just the bigram distributions.
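The program mentioned in the quote is not shown in the thread, so the following is only a guess at the general idea, not a reconstruction of it. It assumes bigrams are counted within whitespace-separated words, that each distribution is normalized to relative frequencies, and that the RMS is taken over the union of bigrams seen in either text. All function names are made up for illustration.

```python
import math
from collections import Counter

def bigram_distribution(text):
    """Relative frequencies of character bigrams, counted inside each word."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()} if total else {}

def rms_distance(p, q):
    """Root-mean-square difference between two bigram distributions."""
    keys = set(p) | set(q)
    if not keys:
        return 0.0
    squared = sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys)
    return math.sqrt(squared / len(keys))

def closest_language(unknown_text, panel):
    """Name of the reference text in `panel` (name -> text) nearest to the unknown text."""
    target = bigram_distribution(unknown_text)
    return min(panel, key=lambda name: rms_distance(target, bigram_distribution(panel[name])))

# Toy strings only; a real panel would hold book-length texts.
panel = {
    "reference A": "la casa della nonna",
    "reference B": "the house of the grandmother",
}
print(closest_language("la casa della nonna", panel))
```

A text compared with itself gives distance 0, so the toy call picks "reference A". How well this separates languages on real texts depends on text length and spelling conventions.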
Mauro > 8 minutes ago
(32 minutes ago)Jorge_Stolfi Wrote: (1 hour ago)Mauro Wrote: Indeed it's almost always possible to determine the language of an unknown text just by comparing basic statistics: for example, this is the output of a program I wrote some time ago which categorizes texts according to their statistics. Here I'm comparing a book in Italian ("L'amore di Loredana") with a panel of texts in different languages, by calculating the root-mean-square distance of just the bigram distributions.
Generally true, ... provided that the unknown text is written in the "official" orthography, and contains a sufficiently large fraction of "normal" prose text. In that case, even the letter frequencies could distinguish Italian from English.
But the VMS is definitely not written in the official orthography of any language. Even if it were in unencrypted Italian, it would be in an orthography that is not a simple letter-by-letter mapping of the "official" one. It would use its own alphabet and digraphs.
And it would probably have its own quirks of spelling, like attaching the articles to nouns, detaching oblique pronouns from the verbs, marking stressed vowels in a different way, making heavy use of abbreviations (like "cãtaaŕ" for "cantare") etc.
And the text may make heavy use of certain words that are rare in typical novels and contain unusual digraphs. Like, a Medieval Italian herbal may have an excess of occurrences of the digraph "rb", and an excess of word-initial "h", if it repeats the old word "herba" often enough.
All the best, --stolfi
Quote:Anyway, let me insist again that statistics (glyph and word frequencies, Zipfness, entropies, correlations etc) are not properties of languages, but of texts.