oshfdk > Yesterday, 04:16 PM
(Yesterday, 03:57 PM)eggyk Wrote: "the frequency of 'e' in English" is effectively saying "the frequency of 'e' in English texts on average". Just because a certain text can be deliberately changed from the norm does not mean the norm doesn't exist. There is still a distribution of frequencies from all existing texts, and those meaningfully differ from language to language.
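One way to make the "distribution of frequencies over texts" idea concrete is to compute the frequency of a letter separately for each text and then summarize the spread. The following is a minimal Python sketch, not anyone's actual method from this thread; the function names and the two sample sentences are invented for illustration, and a real comparison would need a large corpus per language.

```python
from statistics import mean, pstdev


def letter_frequency(text: str, letter: str) -> float:
    """Relative frequency of `letter` among the alphabetic characters of `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return letters.count(letter.lower()) / len(letters)


def frequency_summary(texts, letter):
    """Per-text frequencies plus their mean and population standard deviation."""
    freqs = [letter_frequency(t, letter) for t in texts]
    return freqs, mean(freqs), pstdev(freqs)


# Toy inputs, invented for illustration only.
samples = [
    "The weather here resembles the season everywhere else.",
    "A ship is not what it was; its captain ran off long ago.",  # avoids 'e' on purpose
]
freqs, avg, spread = frequency_summary(samples, "e")
print(freqs, avg, spread)
```

The second sample is a deliberate outlier: its frequency of 'e' is zero, yet the mean over both texts is still well defined. Both positions in the discussion are visible here: a single text can sit far from the average, and the average over many texts still exists.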
Jorge_Stolfi > Yesterday, 07:49 PM
(Yesterday, 03:57 PM)eggyk Wrote: Of course there are statistics for a language, and those statistics can be useful for comparing languages.
Quote:"the frequency of 'e' in English" is effectively saying "the frequency of 'e' in English texts on average" ... There is still a distribution of frequencies from all existing texts
Quote:The frequency of the word written "t h e" is significantly higher in an average English text than in an average French text.

JoãoFelipe > Yesterday, 07:59 PM
Labyrinthinesecurity > Yesterday, 09:28 PM
(Yesterday, 03:29 PM)Jorge_Stolfi Wrote: (Yesterday, 02:30 PM)Labyrinthinesecurity Wrote: Concretely, we propose 4 criteria that any Voynich script "generator" (like Naibbe) should meet to really look like Voynich. These are necessary criteria, not sufficient ones.
But your criteria are based on features that distinguish Voynichese from four (4) languages - two Western Indo-European ones and two Semitic ones.
Natural languages are MUCH more varied than that. There are the East Asian monosyllabic languages, agglutinative languages, languages with vowel harmony like Turkish and Hungarian, languages with definite articles written as postfixes, languages with and without noun inflections for gender and number and noun-adjective agreements, languages with postpositions instead of prepositions... Then there is sandhi, which may be realized in writing (like "a" changing to "an" in English).
But a bigger problem is that character statistics depend on the spelling system much more than on the language. For instance, tones in romanized Chinese may be encoded as diacritics on a vowel, or as a numeric suffix 1-4. The second choice would radically change the statistics of suffixes...
And, finally, statistics are a property of a text, not of a language. There is no such thing as "the frequency of 'e' in English" or "the most common English word". Someone wrote a whole novel in English without using 'e' even once -- and readers don't notice unless they are told. In a materia medica the most common word may well be "take" or "cures", and the word "the" may hardly be used...
All the best, --stolfi
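The point about spelling systems can be illustrated with a small Python sketch. It spells the same toy syllables two ways, with a numeric tone suffix and with a tone diacritic on the vowel, and counts the final character of each written word. Everything here is an assumption made for illustration: the six-syllable lexicon is made up, and `spell_diacritic` uses a simplified rule (mark the last vowel) rather than the full pinyin placement rules.

```python
import unicodedata
from collections import Counter

# Hypothetical toy lexicon: (toneless syllable, tone number 1-4).
SYLLABLES = [("ma", 1), ("ma", 2), ("ma", 3), ("ma", 4), ("shi", 4), ("ni", 3)]

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def spell_numeric(base, tone):
    """Encode the tone as a digit suffix, e.g. ('ma', 3) -> 'ma3'."""
    return f"{base}{tone}"


def spell_diacritic(base, tone):
    """Encode the tone as a diacritic on the last vowel (simplified rule)."""
    for i in range(len(base) - 1, -1, -1):
        if base[i] in "aeiou":
            marked = unicodedata.normalize("NFC", base[i] + TONE_MARKS[tone])
            return base[:i] + marked + base[i + 1:]
    return base


def final_char_counts(words):
    """Count the last written character of each word."""
    return Counter(word[-1] for word in words)


numeric = [spell_numeric(b, t) for b, t in SYLLABLES]
diacritic = [spell_diacritic(b, t) for b, t in SYLLABLES]
numeric_finals = final_char_counts(numeric)
diacritic_finals = final_char_counts(diacritic)
print(numeric_finals)
print(diacritic_finals)
```

With the numeric spelling every word ends in one of four digits; with the diacritic spelling the same words end in tone-marked vowels. The language is unchanged, but the word-final statistics are completely different.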
oshfdk > Yesterday, 10:03 PM
(Yesterday, 09:28 PM)Labyrinthinesecurity Wrote: In fact, regardless of the meaning of a Voynich grapheme (a letter or something more complex), the point is that a generator will have to match the 4 weird signatures...
Labyrinthinesecurity > Yesterday, 10:33 PM
(Yesterday, 10:03 PM)oshfdk Wrote: (Yesterday, 09:28 PM)Labyrinthinesecurity Wrote: In fact, regardless of the meaning of a Voynich grapheme (a letter or something more complex), the point is that a generator will have to match the 4 weird signatures...
This assumes that the patterns are the result of a generator and not just a feature of the source material. For example, while Naibbe fails some of the criteria, can you say that it's impossible for it to conform to all 4 given the right type of plaintext?
oshfdk > Yesterday, 10:35 PM
(Yesterday, 10:33 PM)Labyrinthinesecurity Wrote: My personal opinion is that it could be possible for a Naibbe variant to match the 4 signatures simultaneously, but I don't know how one could achieve that as of today.
Labyrinthinesecurity > 6 hours ago
(Yesterday, 10:35 PM)oshfdk Wrote: (Yesterday, 10:33 PM)Labyrinthinesecurity Wrote: My personal opinion is that it could be possible for a Naibbe variant to match the 4 signatures simultaneously, but I don't know how one could achieve that as of today.
Which of the criteria will be the hardest to fine-tune Naibbe for? Suppose Naibbe is optimized so that the transitions between Naibbe tokens match transitions between words in the Voynich MS, will this solve most of the problems?
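As a rough illustration of what "transitions match" could mean in practice, here is a Python sketch that estimates adjacent-pair (bigram) frequencies for two token sequences and measures how far apart they are with the total variation distance. This is only one possible way to score a match and is not taken from Naibbe or from the proposed criteria; the function names and the placeholder tokens `w1`, `w2`, `w3` are hypothetical, and no real transliteration data is used.

```python
from collections import Counter


def transition_probs(tokens):
    """Relative frequency of each adjacent (previous, next) token pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Placeholder sequences standing in for a target text and a generator's output.
target = ["w1", "w2", "w1", "w2", "w3"]
generated = ["w1", "w2", "w3", "w1", "w2"]

distance = total_variation(transition_probs(target), transition_probs(generated))
print(distance)
```

A generator tuned to drive this distance toward zero would reproduce word-to-word transitions, but that says nothing by itself about signatures measured at other levels, such as within-word or line-position statistics.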
oshfdk > 1 hour ago
(6 hours ago)Labyrinthinesecurity Wrote: M. Greshko took care to make Naibbe of plausible algorithmic complexity for late Medieval scholars. Yet, it is at the upper limit of reasonable complexity for that time, I believe. The most difficult challenge when "forcing" Naibbe into matching the signatures will be to prevent a complexity explosion.
We can always imagine "ad hoc" generators that will match the signatures, but we must always keep in mind that they must be reasonably simple and realistic.