(29-11-2024, 11:31 AM)oshfdk Wrote: (27-11-2024, 04:19 PM)Mauro Wrote: However, I have recently developed a word grammar or, better, a family of grammars, which I would like to share, together with a comparison with the grammars proposed by ThomasCoon, Zattera and Stolfi.
Looks great! I'd like to better understand the implications of your results wrt statistical properties of Voynichese. Could you publish the wordlist that was used as the basis of the grammar/efficiency computations? Sorry for my ignorance if there is already some "standard" list of words for this task. I assume I could just take the EVA file and split by periods, but then there are many variables to consider: like what to do with ambiguous readings, ligatures, half spaces, weirdos, etc.
Sure, I can publish the word lists (and the raw outputs of the various grammars). Right now I have them in Excel files; would it be okay if I upload them onto Google? (I ask because Excel files are often viewed with suspicion, since they can contain dangerous macros, but there are no macros in those files.)
And I can certainly try with the RF1a-n transcription. Can you point me to a link? Ideally it should be a single .txt file without any metadata or added remarks (that would save a lot of asinine work).
Well, in the meantime I wrote down some other considerations (a kind of qualitative Bayesian analysis of where "word chunks" could lead). It's just in draft form (and formats badly, sigh), but it may be interesting.
------
IF: LOOP grammar is valid, THEN:
Voynichese words can be generated by an algorithm which cycles through a short ‘word chunk’ slot alphabet, plus, possibly, a header and a final tail. The exact definition of the chunk slot alphabet is ambiguous, because different choices and variations are possible (more on this problem later), but in any case each slot can hold a certain amount of information. For instance:
HEAD = “q” holds one bit of information (“q” or null)
CHUNK SLOT1 = “ch, sh” holds about 1.6 bits (log2(3) ≈ 1.58: “ch”, “sh” or null)
And so on.
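The bit counts above are just log2 of the number of choices per slot. A minimal Python sketch (the slot alphabets below are only illustrative, not the actual LOOP grammar):

```python
import math

# Hypothetical slot alphabets ("" stands for the null choice).
# Illustrative only, NOT the actual LOOP grammar.
slots = {
    "HEAD":  ["q", ""],
    "SLOT1": ["ch", "sh", ""],
}

def slot_bits(options):
    # Information capacity of one slot: log2 of the number of choices.
    return math.log2(len(options))

for name, options in slots.items():
    print(f"{name}: {slot_bits(options):.2f} bits")

# Total capacity of a word template is the sum over its slots.
total = sum(slot_bits(o) for o in slots.values())
print(f"word capacity: {total:.2f} bits")
```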
Thus, each Voynich word can be thought of as a sequence of ‘fields’, each one holding some bits of information.
Note: this reminds me of the control registers in a microprocessor, where a sequence of bit fields is used to specify different functions. For instance (a completely made-up example): bit 0 = serial communication interface enabled/disabled, bits 1-4 = speed of the serial line, bit 5 = parity even/odd, etc.
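The register analogy can be made literal with a couple of shifts and masks. A toy sketch, using the made-up field layout from the note above (nothing here corresponds to real hardware):

```python
# Toy control-register layout, mirroring the made-up example in the text:
#   bit 0    : serial interface enabled/disabled
#   bits 1-4 : speed of the serial line (0-15)
#   bit 5    : parity (0 = even, 1 = odd)

def pack(enabled, speed, parity_odd):
    assert 0 <= speed <= 15
    return (enabled & 1) | (speed << 1) | ((parity_odd & 1) << 5)

def unpack(reg):
    return {
        "enabled":    reg & 1,
        "speed":      (reg >> 1) & 0xF,
        "parity_odd": (reg >> 5) & 1,
    }

reg = pack(enabled=1, speed=9, parity_odd=0)
print(bin(reg), unpack(reg))
```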
Now, as an aside, I have always been “agnostic, leaning towards meaningless” about the Voynich text, but the LOOP grammar tilts my ‘evidence balance’ towards “agnostic, but meaningful more probable than before”. So I asked myself:
IF the text is meaningful:
1. What could the fields encode?
2. What are the chances of ever discovering what they actually encode?
One big problem is the ambiguity in the definition of the chunks, already noted above. Are there two slots [“e ee eee”, “o”], or one slot [“e eo ee eeo eee eeeo”]? This is a terrible complication and, unfortunately, it greatly decreases the chances of finding a solution.
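The two readings in that example can be compared mechanically; interestingly, they do not even generate quite the same set of strings (a small Python check, with "" standing for the null choice):

```python
from itertools import product

# Reading 1: two nullable slots, ["e","ee","eee"] and ["o"]
reading1 = {e + o for e, o in product(["e", "ee", "eee", ""], ["o", ""])}

# Reading 2: one nullable slot holding the combined chunks
reading2 = {"e", "eo", "ee", "eeo", "eee", "eeeo", ""}

print(sorted(reading1 - reading2))   # reading 1 also allows a bare "o"
print(len(reading1), len(reading2))  # 8 vs 7 choices: slightly different capacities, too
```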
The other big problem is, of course, that the encoded information could be any number of different things, and each one could be encoded in a number of possible ways. This decreases the chances of a solution by another large factor.
So my answer to the second question is, unfortunately, that the chances for a solution are very, very (you may add more ‘very’ at your pleasure) low.
Now for the first question, I had a little fun trying to imagine some possibilities. The list, of course, is not exhaustive by any means (imagination is the limit!). And notice: the ordering of the list has no meaning, it’s just as things came to my mind.
· It’s a syllabary. I actually tried to investigate a little further, and I even found a reasonable way to convert fields to syllables, but then I realized the most frequent word in the Voynich (daiin) has 2 chunks, which would be two syllables, while in all the languages for which I have statistics I can trust (English, Italian, Spanish, Latin, Classic Greek, Koine Greek, (rather old) German, (rather old) French) all the most frequent words, by far, have only one syllable. On the upside, if it’s a syllabary, the chances of finding a solution do not decrease much beyond the baseline (just divide by the number of all possible languages). The ‘decoding’ worked roughly like this: chunks such as ‘aain’ encode CV/VC/V syllables (the slots can be arranged to get two fields with about 3 bits of information, enough for vowels, and one field for a consonant, with ~14 possible consonants, which would be more or less enough for Latin, much less for English). Chunks such as ‘Cedy’ would encode CVC syllables: it’s possible to get a field for 5 vowels (but only in the first syllable; choices are limited to three after the first) and two fields for the consonants (but one of them is limited to about 9 choices).
· It’s a nomenclator cipher (which includes “it’s a constructed language”). Each word could be an index into the nomenclator table (or into the dictionary of the constructed language). I.e., with a nomenclator ‘qokeey’ could mean “Bible, Revelation, Chapter 1, 2nd column, 5th word from top”, or anything like that. The worst problem I see with this are the points in the Voynich where a word is repeated four times in a row. Yes, one could conceive of formulaic phrases (“Sanctus! Sanctus! Sanctus! Sanctus!”), but four words in a row are really a lot. If it’s a nomenclator, the chances for a solution drop to nil (there is nothing recognizable with certainty in the manuscript, and every attempt to find cribs has failed miserably). So it’s an unfalsifiable hypothesis anyway.
· It’s mathematics, which would require words to encode mostly numbers plus some other symbols/features/maybe some actual real-language words. In the same vein: it could be an accounting book, or a list of astronomical observations, or a plotting language (someone proposed this already, I don’t remember who or where), or music (but it looks way too complicated to encode music, unless it was for an orchestra xD), or a myriad of other mostly-numerical things. I think all these possibilities are highly improbable, even if they all get rid of the pesky “four words in a row” problem. Chances to find a solution in this case? Essentially zero: essentially an unfalsifiable hypothesis.
· It’s meaningless, after all. But one must then explain how the consistency of the word grammar was maintained all along the text, even surviving a radical change of ‘language’ (from Currier A to Currier B): I think this is a problem particularly for the Timm & Schinner ‘self-citation’ idea (which I really loved, btw). One possibility is that the words (or at least most of them) were generated before writing the text, and the sequence of words was then somehow chosen (with some ‘more complicated’ words sprinkled in), but this looks rather weird to me. Another possibility is the writer being affected by some kind of mental condition (akin to some forms of autism), so that the word structure came naturally to him. This would be consistent with all the data (including the weird illustrations and diagrams), but, unfortunately, it’s another unfalsifiable hypothesis.
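The bit budget in the syllabary bullet above can be sanity-checked with a couple of logarithms. This is just arithmetic on the inventory sizes quoted there (5 vowels, ~14 consonants), not a decoding:

```python
import math

vowels = 5        # a e i o u
consonants = 14   # roughly enough for Latin, as noted above

vowel_bits = math.log2(vowels)          # ~2.32 bits per vowel field
consonant_bits = math.log2(consonants)  # ~3.81 bits per consonant field

cv_bits = vowel_bits + consonant_bits       # CV/VC syllable: ~6.1 bits
cvc_bits = 2 * consonant_bits + vowel_bits  # CVC syllable: ~9.9 bits
print(f"CV/VC: {cv_bits:.1f} bits, CVC: {cvc_bits:.1f} bits")
```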
PS: as a corollary, if the ‘word chunks’ hypothesis is valid, then statistics based on bigram frequencies and the like are just a consequence of the underlying grammar, and have no meaning by themselves.
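This corollary is easy to illustrate: generate words from a toy slot grammar and a bigram table falls out of the slot adjacencies alone, not out of any underlying ‘language’ (the slots below are hypothetical, not the actual LOOP grammar):

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical nullable slots ("" = null); NOT the actual LOOP grammar.
slots = [["q", ""], ["ch", "sh", ""], ["e", "ee", "eee", ""], ["o", ""], ["dy", "y", ""]]

def random_word():
    # One choice per slot, concatenated in slot order.
    return "".join(random.choice(s) for s in slots)

words = [w for w in (random_word() for _ in range(10_000)) if w]
bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
# Every frequent bigram here is an artefact of slot adjacency, nothing more.
print(bigrams.most_common(5))
```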