The Voynich Ninja

Full Version: A key to understand the VMS
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Sam and all,
There's enough in the imagery to support the possibility of a southeast Asian language, though some readers may be unaware of the extent to which southeast Asian centres of manufacture and trade (and thus religious and diplomatic travellers) had regular commerce with more westerly regions, or how early those connections had developed before c. 12th century AD. I've written a bit about it because, in my opinion, the Voynich botanical images begin with the Moluccas' cloves+Ravensara, and the next few appear to me to be ordered from east to west in terms of natural occurrence. Ordering along a trade route is the most likely explanation.

One matter that troubles me in reading analytical discussions of the written text is that there seems to be so little allowance made for the disparity between modern, standardised orthography and grammar and what we'd expect given conditions 'on the ground'. If a non-native tried to copy a text in a foreign script, or to record the sounds of another language, then even if they developed a regular system and had a perfect ear, the stats would surely not be exactly parallel to what we get using modern transcriptions, orthography and grammars. As proof, one need only consider the way that Chinese or Indian or Islamic sources record names of western people and places - and the way Latins tried to do the same in the east.

 Interpreting those efforts sometimes needs near-intuition - so what troubles me is the apparent expectation of perfect consistency and regularity in this 15thC text.  I wonder whether the expectation owes its foundation to the idea that the text must be encrypted, and thus that it must use consistent and standardised spelling and grammar. 

So - is there a 'flexibility'/'subjective impressions' factor in entropy calculations? :)
The recording of spoken language could go both ways. Normally in a "written spoken language" medieval text, we'd expect entropy to be a bit higher than in a standardized one, because spoken language is less consistent. Modern English spelling might be an exception to that, because it's full of entropy ;)

But in order to decrease the entropy significantly, we'd need a simplifying factor. That's why I have often suggested the possibility of a pidgin that was written down for some occasion. The problem here is that pidgins are normally spoken phenomena, and there won't be any records available to us for comparison.
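
A minimal Python sketch of the two figures usually meant by "entropy" in these discussions: first-order character entropy and second-order (conditional) entropy. The function names and both sample strings are illustrative assumptions, not taken from any transliteration or from anyone's actual calculation in this thread; the "variable" sample simply respells the same word inconsistently.

```python
from collections import Counter
from math import log2

def char_entropy(text: str) -> float:
    """First-order Shannon entropy, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())

def conditional_entropy(text: str) -> float:
    """Second-order entropy H(next char | current char), in bits."""
    pairs = Counter(zip(text, text[1:]))   # adjacent character pairs
    firsts = Counter(text[:-1])            # how often each char starts a pair
    total = sum(pairs.values())
    return -sum(n / total * log2(n / firsts[a]) for (a, _b), n in pairs.items())

# Hypothetical toy samples: one word spelled consistently vs. inconsistently.
consistent = "daiin daiin daiin daiin daiin daiin"
variable = "daiin dain daiin daien dayn daiin"

for label, sample in (("consistent", consistent), ("variable", variable)):
    print(label, round(char_entropy(sample), 3), round(conditional_entropy(sample), 3))
```

In this toy case the inconsistent spelling raises both figures, which is the direction argued above. A real comparison would need a full transliteration and a reference text of matched length.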

Hence, the best approach might be something like what Bax did. Try to tease out bits of vocabulary one by one and build your understanding from there. But to do that correctly, one needs to understand the imagery first :)
(23-02-2017, 02:26 AM)Diane Wrote: Sam and all,
...

One matter that troubles me in reading analytical discussions of the written text is that there seems to be so little allowance made for the disparity between modern, standardised orthography and grammar and what we'd expect given conditions 'on the ground'. If a non-native tried to copy a text in a foreign script, or to record the sounds of another language, then even if they developed a regular system and had a perfect ear, the stats would surely not be exactly parallel to what we get using modern transcriptions, orthography and grammars. As proof, one need only consider the way that Chinese or Indian or Islamic sources record names of western people and places - and the way Latins tried to do the same in the east.


The problem is not one of orthography.

The problem with the VMS text (in addition to the high level of repetition) is that it is positionally inflexible in a way that is uncharacteristic of natural languages.

I'm familiar with some of the Asian languages and I can read and write a small amount of Korean (and read a bit of Japanese and Chinese), not a lot, but enough to understand their structure, and while the VMS does resemble Asian languages or Turkic languages in some ways (more than it does most other language groups), one has to account for the fact that certain glyphs occur only at the ends of words, the beginnings of words, or in the middle. And that's not even counting the line dynamics, which show a similar pattern.

It is excessively rigid. Even syllabic languages, based mainly on consonant-vowel combinations, even abjads (like old Hebrew with word stems that are used as "base" words to build related words) do not act like the VMS in terms of WHERE you can put the glyphs. They show a great deal more variety, even when applied to highly repetitive formats like prayers, poems, and lists.
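
One way to put a number on the positional rigidity described above is to tabulate, for each glyph, what share of its occurrences fall at the start, middle, or end of a word. The sketch below is only an illustration: `positional_profile` is a hypothetical helper, the sample is a handful of EVA word forms mentioned in this thread rather than a real transliteration, and it treats every EVA character as one glyph (so "ch" counts as two), which is itself an assumption.

```python
from collections import Counter, defaultdict

def positional_profile(words):
    """Map each glyph to its share of occurrences at word start, middle, and end."""
    counts = defaultdict(Counter)
    for word in words:
        last = len(word) - 1
        for i, glyph in enumerate(word):
            if i == 0:
                slot = "initial"      # a one-letter word counts as initial
            elif i == last:
                slot = "final"
            else:
                slot = "medial"
            counts[glyph][slot] += 1
    profile = {}
    for glyph, c in counts.items():
        total = sum(c.values())
        profile[glyph] = {s: c[s] / total for s in ("initial", "medial", "final")}
    return profile

# Toy sample only: word forms quoted in this thread, not a corpus.
SAMPLE = ["daiin", "chol", "chedy", "qokaiin", "qokeedy", "shedy", "okaiin", "qokol"]

for glyph, shares in sorted(positional_profile(SAMPLE).items()):
    print(glyph, {slot: round(share, 2) for slot, share in shares.items()})
```

Running the same tabulation over a natural-language text of similar size would give the comparison the argument actually needs; eight words prove nothing by themselves.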
(22-02-2017, 09:47 PM)Torsten Wrote: Vietnamese is a monosyllabic language. Therefore the words can be written with three letters. For words with two, three or four letters the number of available changes is limited. Therefore the effect is explainable in Vietnamese. In the case of the VMS we can find, besides [qokedaiin], also the words [qokeedaiin], [qokedain], [qotedaiin] and [okedaiin]. Even if we interpreted glyphs like [i], [ii] and [iii] as diacritical marks, the explanation for Vietnamese would not fit the VMS.
Moreover, in the case of the VMS the network of similar words is more homogeneous than for Vietnamese. In some way all words are similar to each other.

One reason for this effect is that common word types ending with "iin", "ol" and "dy" are combined with common prefixes like "d", "ch" and "qo". The following table combines all typical 'suffixes' and 'prefixes' and in this way describes the main landmarks within the network of similar words for the VMS:

prefix  aiin     ol     dy
none    aiin     ol     dy
d-      daiin    dol    ddy
ch-     chaiin   chol   chedy
o-      okaiin   okol   okedy
qo-     qokaiin  qokol  qokedy
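
The "network of similar words" can be illustrated by linking words that differ by a single insertion, deletion, or substitution. The sketch below is not the method used in the quoted post; it is a plain Levenshtein distance with hypothetical helper names, applied only to the five word forms quoted above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]

def neighbours(words):
    """Return, for each word, the other words at edit distance exactly 1."""
    words = sorted(set(words))
    return {w: [v for v in words if edit_distance(w, v) == 1] for w in words}

# Word forms quoted in the post above.
WORDS = ["qokedaiin", "qokeedaiin", "qokedain", "qotedaiin", "okedaiin"]

for word, near in neighbours(WORDS).items():
    print(word, "->", near)
```

Each of the four variants is one edit away from [qokedaiin], so in this tiny sample it sits at the centre of a star. Whether the whole vocabulary forms one connected network is a separate question that needs a full word list.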

First it was "no natural language has a word network".  Now it's "no natural language with polysyllabic words has a word network".  Looks like you're losing ground, but despite this you seem to have no interest in considering further evidence.  Anyway, I've already talked about how a polysyllabic language could have the word network property, but despite the fact that you even quoted some of this below you did not actually address it in any way.  Instead you're just mechanically repeating the same points over and over again.

Quote:Nick Pelling has described this effect this way "a reconstructed Voynichese 'dictionary' would, to a modern computer scientist’s eyes, look very much as if it had been generated or permuted by some means."

When Voynicheros start putting their own words in other people's mouths, I take it as a sign that their arguments are weak and cannot stand on their own merits.

Where are the actual modern computer scientists who "would" say such a thing?  I suspect there aren't any.

Meanwhile, an actual computer scientist like Jorge Stolfi has done a large amount of research indicating that the VMS is a meaningful text in some exotic natural language... but who cares, right?  Let's just make things up and pretend that computer scientists would actually say them.

Quote:
Quote:
Quote:But does this mean that we should assume that the text of the VMS represents a monosyllabic language? One feature that doesn't seem to fit is the existence of composed word types like 'olchedy' alongside words like 'ol' and 'chedy'. BTW: the Vietnamese text also contains repeated phrases like 'người đàn', a feature that is missing in the VMS.

I agree that it's not the same in every respect.  The main similarity is the rigid phonotactic structure which allows the smaller words to be connected into a network.

In Mandarin Chinese, the two-syllable words are disconnected from the one-syllable word network because there are no words of intermediate length to bridge the gap between the two sets of words.  

...

To oversimplify a bit, in order to form a network in Mandarin Chinese you would need to be able to go from words with a CV structure to words with a CVCV structure, which obviously can't be done with an edit distance of 1.  But in some languages you can go CV --> CCV --> CVCV.

This might not be exactly the same as Voynichese either but I think it's another step closer.

The problem is that we don't know if the VMS contains language or not.

Again, you did not actually address my point here at all.

Quote:The Ethnologue catalogue of world languages currently lists 7099 living languages. Therefore it is no surprise if it is possible to find for a single feature of the VMS a language with a similar feature.

Yeah, no kidding.  This seems like an obvious point to me as well, but try explaining this to all the people who say that the VMS cannot be a natural language text because it has this or that property.

Quote:What characteristic features of the VMS exist besides the network of similar words?

One feature is the weak word order. In a text using human language grammatical relations should exist between words, and these relations should result in words used together multiple times. Therefore, the lack of repetitive phrases is surprising for the VMS. Moreover since the weak word order exists beside the network of similar words the existence of both features together is a challenge anyway.

There are many things that could be said about the VMS word order, but do you really believe that there are no natural languages with "weak word order"?

Quote:Another feature typical for the VMS is the change from Currier A to Currier B. Typical for the sections using Currier A are word types similar to [daiin] and [chol], and typical for sections using Currier B are word types similar to [chedy] and types starting with [qo]. There is no clean distinction between Currier A and Currier B. Therefore it is not possible to explain this feature as two distinct languages. The following table shows the frequencies for some words typical for Currier A like [daiin] and [chol] and for Currier B like [chedy], [qokaiin] and [qokeedy]. This way it is possible to demonstrate a steady development from Currier A to Currier B.

section                 daiin  aiin  qokaiin  chol  qokol  cheody  chedy  shedy  qokeedy  total word count
Herbal in Currier A       403    33        1   228     24       8      1      0        0              8087
Pharmaceutical (A)         99    39        2    45     20      18      1      1        0              2529
Astronomical               23    38        0     8      1       8      4      0        0              2136
Cosmological               36    56       18    19      5       7     24     17        4              2691
Herbal in Currier B        72    72       20    13     10       7     62     35        9              3233
Stars (B)                 122   193      114    62     13      33    190    113      137             10673
Biological (B)             84    32       88    14     28       0    210    247      153              6911
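
Because the sections differ so much in size, the shift is easier to judge as a rate than as a raw count. The sketch below uses only the numbers from the table above; `per_thousand` is a hypothetical helper name, and normalising per 1000 words is a choice made here for illustration, not something stated in the quoted post.

```python
COLUMNS = ["daiin", "aiin", "qokaiin", "chol", "qokol",
           "cheody", "chedy", "shedy", "qokeedy", "total"]

# Counts copied from the table above; the last value is the section's total word count.
ROWS = {
    "Herbal in Currier A": [403, 33, 1, 228, 24, 8, 1, 0, 0, 8087],
    "Pharmaceutical (A)":  [99, 39, 2, 45, 20, 18, 1, 1, 0, 2529],
    "Astronomical":        [23, 38, 0, 8, 1, 8, 4, 0, 0, 2136],
    "Cosmological":        [36, 56, 18, 19, 5, 7, 24, 17, 4, 2691],
    "Herbal in Currier B": [72, 72, 20, 13, 10, 7, 62, 35, 9, 3233],
    "Stars (B)":           [122, 193, 114, 62, 13, 33, 190, 113, 137, 10673],
    "Biological (B)":      [84, 32, 88, 14, 28, 0, 210, 247, 153, 6911],
}

def per_thousand(word: str) -> dict:
    """Occurrences of `word` per 1000 words, for each section in table order."""
    idx = COLUMNS.index(word)
    return {section: round(1000 * row[idx] / row[-1], 1) for section, row in ROWS.items()}

for word in ("daiin", "chedy", "shedy"):
    print(word, per_thousand(word))
```

Seen as rates, [daiin] falls and [chedy] and [shedy] rise together as one moves down the table, which is the co-occurrence pattern the quoted post points to.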

The table shows that a word like [shedy] is frequent only in sections where the word [chedy] is also frequent. This hints at another striking feature of the VMS: similarly spelled word types co-occur.

I don't really disagree with much of this.  The A vs. B distinction is not due to different languages or even different dialects, and it's too systematic to attribute solely to differences in vocabulary.  Different features of the grammar are used in different sections, and to some extent this can be considered a gradual change as one moves from the beginning to the end of the manuscript.  I think I know what's going on here, since a similar phenomenon can be observed in other texts, but to be honest I don't really feel like sharing this yet.

Quote:There are more interesting features of the VMS. For instance, the line is a functional unit.


"LAAFU" (line as a functional unit) mostly amounts to extra letters added to the beginning of each line, and sometimes to the ends of lines as well.  Of course, when you describe it accurately it doesn't sound like such a strong argument against the natural language hypothesis anymore.

Quote:The shape of letter determines in some way how the letter is used within a word or within a line or within a paragraph. ...

What's more likely is that how the words are structured has influenced the design of the script.

Quote:In other words, we are looking for a system with many interesting features at the same time. It uses the same or similar words but not the same or similar word sequences. Additionally, this system changes over time. Do these features really describe language?

Yes, the properties of the VMS text are like those of an unencrypted natural language text, and cannot be explained in any other way that is known and can be demonstrated.

Now, how many of these properties that you have mentioned can be found in sample texts generated by your auto-copying code?  My guess is: none.
Quote:SamG
"Yes, the properties of the VMS text are like those of an unencrypted natural language text, and cannot be explained in any other way that is known and can be demonstrated."

I prefer to think of it as having properties that make it LOOK like unencrypted natural language (Latin, to be exact), especially in the way a, o, and Latin shapes are incorporated into the text. But, as I said up-thread (and have been saying all along), it is not structured like natural language, at least not if the spaces are interpreted literally (and even if you try to process the spaces in a different way, the positional rigidity remains).
Quote:Where are the actual modern computer scientists who "would" say such a thing?  I suspect there aren't any.

Actually, the conclusion was first drawn by William Friedman, and (with my own computer scientist hat on) it's abundantly clear why that should be. Stolfi's word-generation paradigm is merely one of many attempts by computer scientists to algorithmically codify Voynichese.
My point in 2013, however, was quite the reverse: that nobody in the 15th century sat down and formed an abstract word-generation grammar.
If I Write Like This, There Is Also Extreme Initial Positional Rigidity In the Glyphs That I Use. And If I Were To Intro Duce Spa Ces Be Tween Sy La Bels And Write Mor Fo Ne Ti Ca Li It Wud E Ven Be Com Mor So.

What I wrote above only accounts for one aspect of Voynichese's rigidity: a set of glyphs that appears only at the beginning of words. Add to that the possibility of proclitics and you can also account for certain standard "prefixes". Example: aWord, theWord, hisWord...

In a language with word endings that can be abbreviated, like Latin, it would also be possible to account for rigidity at the end of words, for example by writing 9 instead of -us.

That leaves one problem: it would imply that Voynichese has a very limited phoneme inventory, and I think it does. Do keep in mind though, that for example what we call EVA "sh" and "ch" could actually be more varied, just like there may be more to "iiin" or "eee" combinations.
(23-02-2017, 09:58 AM)-JKP- Wrote:
Quote:SamG
"Yes, the properties of the VMS text are like those of an unencrypted natural language text, and cannot be explained in any other way that is known and can be demonstrated."

I prefer to think of it as having properties that make it LOOK like unencrypted natural language (Latin, to be exact), especially in the way a, o, and Latin shapes are incorporated into the text.

And interestingly enough, these glyphs that resemble vowels in the Roman alphabet also seem to behave as vowels, in that they occur in every word and delimit groups of 1-2 consonants.  To me that suggests an obvious explanation: they are vowels.
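
The claim above can be checked mechanically once a candidate vowel set is fixed. The sketch below assumes, purely for illustration, that EVA a, o, e, i and y are the vowel-like glyphs; that set, the helper names, and the three sample words are all hypothetical choices. It also counts EVA characters rather than glyphs, so "ch" registers as a run of two.

```python
import re

def consonant_runs(word: str, vowels: str) -> list:
    """Split a word on vowel-like glyphs and return the non-empty consonant runs."""
    return [run for run in re.split(f"[{re.escape(vowels)}]+", word) if run]

def vowel_report(words, vowels="aoeiy"):
    """For each word: does it contain a vowel-like glyph, and how long is its longest consonant run?"""
    report = {}
    for w in words:
        runs = consonant_runs(w, vowels)
        report[w] = {
            "has_vowel": any(ch in vowels for ch in w),
            "longest_run": max((len(r) for r in runs), default=0),
        }
    return report

# Toy sample: word forms mentioned earlier in the thread.
for word, info in vowel_report(["qokeedy", "chol", "daiin"]).items():
    print(word, info)
```

A different assumed vowel set will give different answers, so the result says as much about the assumption as about the text.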

And what other plausible explanation is there, really?  Some unknown and undemonstrated cipher mechanism that magically produces ciphertext with any and all properties we might desire?

Quote:But, as I said up-thread (and have been saying all along), it is not structured like natural language, at least not if the spaces are interpreted literally (and even if you try to process the spaces in a different way, the positional rigidity remains).

I disagree that it is overly rigid.  Rigid phonotactic structures are common in many languages.  I think e.g. Mandarin Chinese is probably as rigid as Voynichese or maybe more so, but the issue is that Voynichese is differently rigid.  I don't know of any language that works exactly like Voynichese, but then I don't think that two different languages ever have exactly the same phonotactic structure.
(23-02-2017, 10:41 AM)Sam G Wrote: And interestingly enough, these glyphs that resemble vowels in the Roman alphabet also seem to behave as vowels, in that they occur in every word and delimit groups of 1-2 consonants.  To me that suggests an obvious explanation: they are vowels.

And what other plausible explanation is there, really?  Some unknown and undemonstrated cipher mechanism that magically produces ciphertext with any and all properties we might desire?


Have you looked at how many "o" glyphs there are and how they are positioned? Do you truly believe these are all vowels?
(23-02-2017, 10:41 AM)Sam G Wrote:
(23-02-2017, 09:58 AM)-JKP- Wrote:
Quote:SamG
"Yes, the properties of the VMS text are like those of an unencrypted natural language text, and cannot be explained in any other way that is known and can be demonstrated."

I prefer to think of it as having properties that make it LOOK like unencrypted natural language (Latin, to be exact), especially in the way a, o, and Latin shapes are incorporated into the text.

And interestingly enough, these glyphs that resemble vowels in the Roman alphabet also seem to behave as vowels, in that they occur in every word and delimit groups of 1-2 consonants.  To me that suggests an obvious explanation: they are vowels.

Suggesting that some symbols represent vowels assumes that the symbols represent either letters or sounds.
Both are natural assumptions, but I have very severe doubts about them.

The following symbols:  q  f  p  m  y   are demonstrably not to be identified with letters.
That's five out of (say) 25. How confident can one be that the others are?
And even if they are, what to make of a mixture of letters and non-letters?