The Voynich Ninja
[Article] "The Strange Quest to Crack the Voynich Code" - Printable Version




RE: "The Strange Quest to Crack the Voynich Code" - -JKP- - 19-02-2020

What kind of word statistics?

If the word statistics record the length of the words, then it matters.

If the word statistics record patterns of glyphs that occur in the transition from one word to the next, then it matters.

If the word statistics record the balance of so-called vowels to consonants, then it matters.


If all you are doing is counting spaces, then it doesn't matter EXCEPT that spaces might be a character (or a null) in Voynichese, in which case it does matter (because a word break might be character-based in Voynichese).


RE: "The Strange Quest to Crack the Voynich Code" - nablator - 19-02-2020

(19-02-2020, 05:40 PM)-JKP- Wrote: What kind of word statistics?
Any statistics that use networks of words as input and no information on word constituents: for example clustering coefficient, characteristic path length, MATTR, WPPA, ...
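To illustrate a statistic that treats words as opaque tokens, here is a minimal Python sketch of MATTR (moving-average type-token ratio). The function name `mattr`, the window size and the sample tokens are assumptions made for illustration; they are not taken from any particular transliteration file or published study.

```python
def mattr(tokens, window=5):
    """Moving-average type-token ratio: the mean of
    (distinct tokens / window) over every run of `window` consecutive tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) < window:
        # Text shorter than one window: fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical sample. Tokens are only compared for equality,
# they are never split into glyphs.
sample = "daiin chedy qokeedy daiin shedy chedy daiin qokain".split()
score = mattr(sample, window=4)
```

The calculation depends only on where the word breaks are and on which tokens count as identical, not on how a word decomposes into glyphs.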


RE: "The Strange Quest to Crack the Voynich Code" - -JKP- - 19-02-2020

How can you measure word clustering (the relationship of words to their neighbors) without taking into account the word constituents?


RE: "The Strange Quest to Crack the Voynich Code" - nablator - 19-02-2020

(19-02-2020, 06:30 PM)-JKP- Wrote: How can you measure word clustering (the relationship of words to their neighbors) without taking into account the word constituents?
Words are just nodes in a network, connected to other nodes: for the calculation to be made, no information about them other than connectivity needs to be kept. To build the network it is not necessary to decide whether EVA-qo and EVA-iin are "indivisible" or not (whatever that means), only where words are, and which words look similar enough to be considered identical.
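A rough Python sketch of that idea, under stated assumptions: two word types are linked whenever they occur side by side, and the statistic is the average local clustering coefficient. The linking rule and the function names `build_adjacency_network` and `average_clustering` are hypothetical choices for illustration; published studies may build their networks differently.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency_network(tokens):
    """Undirected graph: two word types are linked if they ever occur side by side."""
    neighbors = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops from immediate repetitions
            neighbors[left].add(right)
            neighbors[right].add(left)
    return neighbors


def average_clustering(neighbors):
    """Mean local clustering coefficient over all nodes that have at least one edge.
    Nodes with fewer than two neighbors contribute 0."""
    if not neighbors:
        return 0.0
    total = 0.0
    for adj in neighbors.values():
        k = len(adj)
        if k < 2:
            continue
        # Count pairs of neighbors that are themselves linked.
        links = sum(1 for a, b in combinations(adj, 2) if b in neighbors[a])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbors)


# Hypothetical token stream; only token identity is used.
network = build_adjacency_network("daiin chedy qokeedy daiin shedy chedy".split())
clustering = average_clustering(network)
```

Nothing in the code looks inside a token. The only decisions fed into it are the word boundaries and the rule for treating two written forms as the same node.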


RE: "The Strange Quest to Crack the Voynich Code" - Torsten - 19-02-2020

(18-02-2020, 09:27 PM)nickpelling Wrote: At least with Linear B pretty much all the experts agreed that it was a syllabary. With Voynichese, we still have not even the outline of a proof as to what constitutes a single token.

This is indeed an interesting question. One of the first steps in deciphering is to determine the type of the writing system. Normally it would be helpful to count the number of distinguishable glyphs (see [link]). Writing systems using fewer than 30 glyphs are usually alphabetic, systems with 50–100 glyphs are likely syllabic, and writing systems with hundreds of glyphs are likely logographic. In the case of the VMs, the count of 20-30 different glyphs suggests that the script is alphabetic.
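The rule of thumb above can be written down as a small Python sketch. The function name `guess_script_type` is hypothetical, the cut-off of 200 for "hundreds of glyphs" is an assumption, and counts that fall between the stated ranges are reported as indeterminate rather than guessed.

```python
def guess_script_type(glyph_count):
    """Rule-of-thumb classification of a writing system by its glyph inventory size."""
    if glyph_count < 30:
        return "alphabetic"
    if 50 <= glyph_count <= 100:
        return "syllabic"
    if glyph_count >= 200:  # assumed reading of "hundreds of glyphs"
        return "logographic"
    return "indeterminate"  # falls in a gap the rule of thumb does not cover


# 20-30 distinct glyphs, as estimated for the VMs, mostly lands in "alphabetic".
vms_guess = guess_script_type(25)
```

The result is only as good as the glyph count going in, which is exactly the open question of what constitutes a single token.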

However, in the case of the VMs nothing is as it seems at first glance. For instance, "the word length distribution in the manuscript follows a binomial distribution with an underrepresentation of short and long words, an unusual characteristic in a natural language" ([link], p. 2). Another unusual result is that "similarly shaped glyphs can replace each other" (Timm 2015, p. 4). So why does the Voynich manuscript look familiar at first glance but behave so strangely on closer inspection?
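One way to check a claim like the binomial word-length distribution is sketched below in Python. It assumes that word length is measured in transliteration characters, and it fits the binomial by matching the mean (modelling length minus 1 as Binomial(n, p) with n set to the longest length minus 1). Both the fitting method and the function names are illustrative assumptions, not the procedure used in the cited paper.

```python
from collections import Counter
from math import comb


def length_distribution(tokens):
    """Relative frequency of each word length (in transliteration characters)."""
    counts = Counter(len(t) for t in tokens)
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}


def binomial_fit(observed):
    """Binomial model for (length - 1), with p chosen to match the observed mean."""
    if not observed:
        return {}
    n = max(observed) - 1
    mean = sum(length * freq for length, freq in observed.items())
    p = (mean - 1) / n if n > 0 else 0.0
    return {k + 1: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Hypothetical tokens, for illustration only.
observed = length_distribution(["o", "ar", "ol", "dal"])
expected = binomial_fit(observed)
```

Comparing `observed` and `expected` length by length shows where short or long words are under- or over-represented relative to the model.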


RE: "The Strange Quest to Crack the Voynich Code" - -JKP- - 20-02-2020

(19-02-2020, 09:32 PM)nablator Wrote:
(19-02-2020, 06:30 PM)-JKP- Wrote: How can you measure word clustering (the relationship of words to their neighbors) without taking into account the word constituents?
Words are just nodes in a network, connected to other nodes: for the calculation to be made, no information about them other than connectivity needs to be kept. To build the network it is not necessary to decide whether EVA-qo and EVA-iin are "indivisible" or not (whatever that means), only where words are, and which words look similar enough to be considered identical.

The bolded text is the part I am having trouble with, with regard to relating these kinds of calculations to the VMS script (and medieval Latin and Indic scripts in general).

In medieval script 9lt9 and 9lt9 can mean two completely different things. One might be conlitum, the other might be comlantus. They are written the same; they are interpreted differently depending on which words are nearby. This is NORMAL in the 15th century.

The same is true of this symbol in languages that use Latin conventions: EVA-ch. It can variously be interpreted as cr, tr, cc, ci, ce, ec, et, er, te, or tc (or other combinations). Interpretation depends on context.


These are not oddballs or uncommon in medieval script. They are normal and common. Their interpretation is NOT based on shape entirely. It is heavily based on context. So how do you determine which nodes are the same for something like the VMS, which might use the same conventions? Looking similar means very little if you can't read the glyphs. Especially since we don't know if height or tail-length changes the interpretation of a glyph. There are Asian languages where a tiny little serif changes the meaning of a syllable. Even in Hebrew, the distinction between Hey and Chet is difficult for some people to see and much more subtle than the variations among VMS glyphs.


RE: "The Strange Quest to Crack the Voynich Code" - RenegadeHealer - 20-02-2020

(20-02-2020, 12:40 AM)-JKP- Wrote: Even in Hebrew, the distinction between Hey and Chet is difficult for some people to see and much more subtle than the variations among VMS glyphs.

Dotless Arabic comes to mind also. Certain Arabic consonants are indistinguishable without the dots, which are always added after the line portion of a word is written. Thus a single word in dotless Arabic, without any context, can be highly ambiguous.


RE: "The Strange Quest to Crack the Voynich Code" - Alin_J - 20-02-2020

(20-02-2020, 12:40 AM)-JKP- Wrote: These are not oddballs or uncommon in medieval script. They are normal and common. Their interpretation is NOT based on shape entirely. It is heavily based on context. So how do you determine which nodes are the same for something like the VMS, which might use the same conventions? Looking similar means very little if you can't read the glyphs. Especially since we don't know if height or tail-length changes the interpretation of a glyph. There are Asian languages where a tiny little serif changes the meaning of a syllable. Even in Hebrew, the distinction between Hey and Chet is difficult for some people to see and much more subtle than the variations among VMS glyphs.

This can be true for any text where we don't know the language or anything about it. However, statistics can still say a lot when starting from a basic assumption, for example that the meaning of a word is not context-dependent. Going from there and only considering words that are written exactly the same way in the text could then be a reasonable starting point. If you deal with an unknown language you have to start from an assumption such as this; it is the only way to go. If it is a correct one, the results will probably reveal something. But of course, with an unknown writing you can never be sure of anything from the start.


RE: "The Strange Quest to Crack the Voynich Code" - -JKP- - 20-02-2020

Alin_J Wrote: This can be true for any text where we don't know the language or anything about it. However, statistics can still say a lot when starting from a basic assumption, for example that the meaning of a word is not context-dependent. Going from there and only considering words that are written exactly the same way in the text could then be a reasonable starting point. If you deal with an unknown language you have to start from an assumption such as this; it is the only way to go. If it is a correct one, the results will probably reveal something. But of course, with an unknown writing you can never be sure of anything from the start.


But that assumption has been at the basis of computational attacks for decades and has not yielded anything that illuminates the meaning (or lack of meaning) of the VMS text.

I am not against computational attacks. I think some of them have value but...

Isn't it time we questioned whether "similarity" in shape equals "sameness" in meaning?

ESPECIALLY considering that similarity in shape and sameness in meaning were NOT equivalent in medieval scripts?


RE: "The Strange Quest to Crack the Voynich Code" - nablator - 20-02-2020

(20-02-2020, 12:40 AM)-JKP- Wrote: Especially since we don't know if height or tail-length changes the interpretation of a glyph. There are Asian languages where a tiny little serif changes the meaning of a syllable. Even in Hebrew, the distinction between Hey and Chet is difficult for some people to see and much more subtle than the variations among VMS glyphs.

There is a way to tell when small variations are significant in a long text written in a (real) language: compare a large amount of text and see if these variations appear globally or if they are more likely to be a local quirk or a mistake by a tired or clumsy scribe. The length of tails is probably not meaningful because different scribes had different habits, as the study (to be published) by Lisa Fagin Davis will show. Another example: V101 transliterates some loosely written a as ei in Q20, which may be wrong. Also, choosing distinct symbols/letters in the transliteration alphabet for some variants of glyphs, when the actual writing in the VMS shows a continuum, is probably a wrong choice.
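A very rough Python sketch of the "global or local" check, under stated assumptions: each observation is a (section, variant) pair, where the section could be a quire or a scribe's hand, and the measure is the largest share of a variant's occurrences found in any single section. The function name `variant_concentration`, the data layout and the sample observations are all hypothetical.

```python
from collections import Counter, defaultdict


def variant_concentration(observations):
    """Map each variant to the largest share of its occurrences
    that falls inside one single section."""
    per_variant = defaultdict(Counter)
    for section, variant in observations:
        per_variant[variant][section] += 1
    return {
        variant: max(sections.values()) / sum(sections.values())
        for variant, sections in per_variant.items()
    }


# Hypothetical observations, invented for illustration only.
observations = [
    ("Q20", "ei"), ("Q20", "ei"),
    ("Q13", "a"), ("Q20", "a"), ("Q1", "a"), ("Q8", "a"),
]
shares = variant_concentration(observations)
```

A share close to 1.0 means the variant is confined to one section, which fits a local habit better than a global feature of the script. The shares are not normalised for section size, so a real analysis would have to compare them with how much text each section contains.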