The Voynich Ninja

Full Version: A key to understand the VMS
(20-02-2017, 08:15 PM)Torsten Wrote: In Vietnamese it is indeed possible to build a network of similar words. This is the case because many smaller networks combine into a larger network. The reason is that Vietnamese is a monosyllabic language that uses tone to distinguish lexical or grammatical meaning, so many short words are used. OK, a large network of similar words is not a feature unique to the VMS.

Well, there you go. My main point is that statements of the form "no language has such-and-such property" need to actually be checked against a wide variety of languages.

Quote: But does this mean that we should assume that the text of the VMS represents a monosyllabic language? One feature that doesn't seem to fit is the existence of compound word types like 'olchedy' beside words like 'ol' and 'chedy'. BTW: the Vietnamese text also contains repeated phrases like 'người đàn', a feature that is missing in the VMS.

I agree that it's not the same in every respect. The main similarity is the rigid phonotactic structure which allows the smaller words to be connected into a network.

In Mandarin Chinese, the two-syllable words are disconnected from the one-syllable word network because there are no words of intermediate length to bridge the gap between the two sets of words.

But some languages with rigid phonotactic structures do have such intermediate length words. Many languages of mainland Southeast Asia have words that are "sesquisyllabic", or "one and a half syllables".

To oversimplify a bit, in order to form a network in Mandarin Chinese you would need to be able to go from words with a CV structure to words with a CVCV structure, which obviously can't be done with an edit distance of 1. But in some languages you can go CV --> CCV --> CVCV.
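
The step argument above can be checked with a small sketch. It assumes that "edit distance" means Levenshtein distance (single-character insertions, deletions and substitutions) and that word shapes are written as plain strings of C and V; both are assumptions made for illustration, not a description of anyone's actual tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed metric)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Direct jump from a one-syllable shape to a two-syllable shape
direct = edit_distance("CV", "CVCV")

# The same jump with a sesquisyllabic-style intermediate shape
via_ccv = [edit_distance("CV", "CCV"), edit_distance("CCV", "CVCV")]

print(direct, via_ccv)
```

The direct jump costs two edits, while each step through the intermediate CCV shape costs one, which is why intermediate-length words can bridge the two sets.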

This might not be exactly the same as Voynichese either but I think it's another step closer.
Quote:Interesting to see that lines using a word up to eight times exist in the VMS. Maybe I should use it to illustrate this fact.

I will go into a discussion with you if you show that you've read the underlying article.
Quote: Well, there you go. My main point is that statements of the form "no language has such-and-such property" need to actually be checked against a wide variety of languages.

Thank you Sam, for getting the thread to this point.

If Torsten checked his other charts (the ones regarding edit distance and line position) against more languages, we might have better grounds to conclude this whole discussion one way or the other.
(21-02-2017, 06:57 PM)Emma May Smith Wrote: If Torsten checked his other charts (the ones regarding edit distance and line position) against more languages, we might have better grounds to conclude this whole discussion one way or the other.

Can you name a language and a source for a sample text?
(21-02-2017, 11:07 AM)Davidsch Wrote:
Quote: Interesting to see that lines using a word up to eight times exist in the VMS. Maybe I should use it to illustrate this fact.

I will go into a discussion with you if you show that you've read the underlying article.

You write about words repeated within a line. One of my findings was that similar words can be found above each other twice as often as side by side.

You write that the VMS does not contain a randomly created language. I also came to the conclusion that the text of the VMS is far from random. For instance, I wrote about "a pattern for the usage of similar words. They are not randomly distributed within the VMS but are used on the same pages next to each other".

I don't see any contradiction.
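
A comparison of similar words "above each other" versus "side by side" could be counted roughly as follows. This is only a sketch under stated assumptions: "similar" is taken to mean at most one edit apart (identical words included), "above each other" is taken to mean the same word position in consecutive lines, and the three sample lines are an invented arrangement of word types, not a transcription. The method actually used for the finding above may differ.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                    # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                         # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substituted character
    return a[i:] == b[i + 1:]          # b has exactly one extra character


def count_similar_pairs(lines):
    """Return (horizontal, vertical) counts of similar neighbouring words."""
    horizontal = vertical = 0
    for k, line in enumerate(lines):
        # neighbours within the same line
        horizontal += sum(within_one_edit(a, b) for a, b in zip(line, line[1:]))
        # words at the same position in the following line (assumed alignment)
        if k + 1 < len(lines):
            vertical += sum(within_one_edit(a, b) for a, b in zip(line, lines[k + 1]))
    return horizontal, vertical


# Invented toy arrangement, for illustration only
sample_lines = [
    ["daiin", "chol", "chor"],
    ["dain", "chol", "qokedy"],
    ["dol", "okedy", "qokeedy"],
]
pair_counts = count_similar_pairs(sample_lines)
print(pair_counts)
```

Because words in the manuscript are not set in fixed columns, matching by word index is a crude stand-in for "above"; a fairer count would probably compare each word with the nearest words in the line above.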
(21-02-2017, 08:40 PM)Torsten Wrote:
(21-02-2017, 06:57 PM)Emma May Smith Wrote: If Torsten checked his other charts (the ones regarding edit distance and line position) against more languages, we might have better grounds to conclude this whole discussion one way or the other.

Can you name a language and a source for a sample text?

If people in this thread name some, will you do it?
(21-02-2017, 11:22 PM)Emma May Smith Wrote:
(21-02-2017, 08:40 PM)Torsten Wrote:
(21-02-2017, 06:57 PM)Emma May Smith Wrote: If Torsten checked his other charts (the ones regarding edit distance and line position) against more languages, we might have better grounds to conclude this whole discussion one way or the other.

Can you name a language and a source for a sample text?

If people in this thread name some, will you do it?

I would do it for one sample text in one language. If you want to check any text class in any language I would give you the source code instead.
Quote:TORSTEN:
You write about words repeated within a line. One of my findings was that similar words can be found above each other twice as often as side by side.

You write that the VMS does not contain a randomly created language. I also came to the conclusion that the text of the VMS is far from random. For instance, I wrote about "a pattern for the usage of similar words. They are not randomly distributed within the VMS but are used on the same pages next to each other".

I don't see any contradiction.

I agree that the text is not random.
A year ago I still thought the text was written in a language; the horizontal repeats (together with my other research) show that the text is not a language.

The difference between my conclusion and yours is this: my conclusion is based on comparing languages and texts.
Not one, not three, but about 50-100 large corpora from different languages and periods have been compared.
Based on that, you can draw conclusions.

You found vertical repeats as well. Yes, I can see them, but the relation between the vertical and the horizontal repeats is in my opinion far-fetched: for example, the words have not been aligned vertically. But perhaps I will run stats on those as well in the future.
A language that might satisfy the criteria of words forming a network and having both one-syllable and two-syllable words could be the Mon language.  It was once widely spoken in Southeast Asia but is now spoken only by ethnic minorities in Thailand and Burma.

Unfortunately, you can probably forget about finding a convenient sample text for this language.  There's a dictionary, which might be good enough, but the interface is clunky, and there doesn't seem to be a way to extract a complete word list.

But here's an example I was able to come up with quickly for how a two-syllable word can be built up from a one-syllable word:

  la - Mule, donkey.
  lak - To go through, reach the end of, arrive at ultimately.
  klak -  To be dun-coloured, dirty, dusty.
  kəlak - To splash about, be scattered.

It looks like Mon probably has a lot of words like that, though what fraction of the words could be connected into a complete network I don't know.
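
The open question of what fraction of a word list can be connected into one network could be estimated along these lines. The sketch assumes that two words are linked when they are at most one edit apart and that the input is a plain list of spellings; the function names are made up for illustration, and the only data used are the four Mon words listed above.

```python
from collections import deque


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                    # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                         # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substituted character
    return a[i:] == b[i + 1:]          # b has exactly one extra character


def largest_component_fraction(words) -> float:
    """Share of distinct words that fall into the largest connected network."""
    distinct = list(dict.fromkeys(words))  # drop duplicates, keep order
    if not distinct:
        return 0.0
    unseen = set(distinct)
    best = 0
    while unseen:
        queue = deque([unseen.pop()])      # start a new component
        size = 1
        while queue:                       # breadth-first search over neighbours
            current = queue.popleft()
            neighbours = [w for w in unseen if within_one_edit(current, w)]
            for w in neighbours:
                unseen.remove(w)
                queue.append(w)
            size += len(neighbours)
        best = max(best, size)
    return best / len(distinct)


chain = ["la", "lak", "klak", "kəlak"]
print(largest_component_fraction(chain))           # whole chain is connected
print(largest_component_fraction(["la", "klak"]))  # no bridge between the two
```

The pairwise comparison is quadratic in the number of words, which is fine for a dictionary-sized list but would need indexing for a large corpus.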
(21-02-2017, 10:41 AM)Sam G Wrote:
(20-02-2017, 08:15 PM)Torsten Wrote: In Vietnamese it is indeed possible to build a network of similar words. This is the case because many smaller networks combine into a larger network. The reason is that Vietnamese is a monosyllabic language that uses tone to distinguish lexical or grammatical meaning, so many short words are used. OK, a large network of similar words is not a feature unique to the VMS.

Well, there you go.  My main point is that statements of the form "no language has such-and-such property" need to actually be checked against a wide variety of languages.

Vietnamese is a monosyllabic language. Therefore its words are short and can often be written with about three letters. For words with two, three or four letters, the number of possible changes is limited, so the effect is explainable for Vietnamese. In the case of the VMS, beside [qokedaiin] we also find the words [qokeedaiin], [qokedain], [qotedaiin] and [okedaiin]. Even if we interpreted glyphs like [i], [ii] and [iii] as diacritical marks, the explanation for Vietnamese would not fit the VMS.
Moreover, in the case of the VMS the network of similar words is more homogeneous than for Vietnamese. In some way, all words are similar to each other.

One reason for this effect is that common word types ending with "aiin", "ol" and "dy" are combined with common prefixes like "d", "ch" and "qo". The following table combines the typical 'suffixes' and 'prefixes' and in this way describes the main landmarks within the network of similar words for the VMS:

prefix  aiin     ol     dy
none    aiin     ol     dy
d-      daiin    dol    ddy
ch-     chaiin   chol   chedy
o-      okaiin   okol   okedy
qo-     qokaiin  qokol  qokedy

Nick Pelling has described this effect as follows: "a reconstructed Voynichese 'dictionary' would, to a modern computer scientist’s eyes, look very much as if it had been generated or permuted by some means." The result is a network of similar words in both cases, but in my eyes the reason for this result is different in each case.
 
Quote:
Quote: But does this mean that we should assume that the text of the VMS represents a monosyllabic language? One feature that doesn't seem to fit is the existence of compound word types like 'olchedy' beside words like 'ol' and 'chedy'. BTW: the Vietnamese text also contains repeated phrases like 'người đàn', a feature that is missing in the VMS.

I agree that it's not the same in every respect.  The main similarity is the rigid phonotactic structure which allows the smaller words to be connected into a network.

In Mandarin Chinese, the two-syllable words are disconnected from the one-syllable word network because there are no words of intermediate length to bridge the gap between the two sets of words.  

But some languages with rigid phonotactic structures do have such intermediate length words.  Many languages of mainland Southeast Asia have words that are "sesquisyllabic", or "one and a half syllables".

To oversimplify a bit, in order to form a network in Mandarin Chinese you would need to be able to go from words with a CV structure to words with a CVCV structure, which obviously can't be done with an edit distance of 1.  But in some languages you can go CV --> CCV --> CVCV.

This might not be exactly the same as Voynichese either but I think it's another step closer.

The problem is that we don't know whether the VMS contains language at all. The Ethnologue catalogue of world languages currently lists 7099 living languages. Therefore it is no surprise that, for any single feature of the VMS, a language with a similar feature can be found.

What characteristic features of the VMS exist besides the network of similar words?

One feature is the weak word order. In a text using human language, grammatical relations should exist between words, and these relations should result in words being used together multiple times. Therefore, the lack of repeated phrases in the VMS is surprising. Moreover, the weak word order exists alongside the network of similar words, and explaining both features together is a challenge in any case.
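
One simple way to put a number on "repeated phrases" is the share of word pairs that occur more than once. The sketch below assumes a text is given as a flat list of word tokens; the function name and both sample sequences are invented for illustration and are not taken from any transcription or corpus.

```python
from collections import Counter


def repeated_phrase_share(tokens, n=2):
    """Fraction of n-word sequences in `tokens` that occur more than once."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)  # occurrences of repeated n-grams
    return repeated / len(ngrams)


# Invented toy sequences, for illustration only
with_phrases = "the cat saw the dog and the cat saw the bird".split()
without_phrases = "daiin chol chedy qokaiin shedy ol".split()

print(repeated_phrase_share(with_phrases))
print(repeated_phrase_share(without_phrases))
```

Any real comparison would need texts of similar length, since the share of repeated pairs grows with text size.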

Another feature typical of the VMS is the change from Currier A to Currier B. Typical for the sections using Currier A are word types similar to [daiin] and [chol], and typical for the sections using Currier B are word types similar to [chedy] and types starting with [qo]. There is no clean distinction between Currier A and Currier B. Therefore it is not possible to explain this feature as two distinct languages. The following table shows the frequencies of some words typical for Currier A, like [daiin] and [chol], and for Currier B, like [chedy], [qokaiin] and [qokeedy]. In this way it is possible to demonstrate a steady development from Currier A to Currier B.

section              daiin  aiin  qokaiin  chol  qokol  cheody  chedy  shedy  qokeedy  total word count
Herbal in Currier A    403    33        1   228     24       8      1      0        0              8087
Pharmaceutical (A)      99    39        2    45     20      18      1      1        0              2529
Astronomical            23    38        0     8      1       8      4      0        0              2136
Cosmological            36    56       18    19      5       7     24     17        4              2691
Herbal in Currier B     72    72       20    13     10       7     62     35        9              3233
Stars (B)              122   193      114    62     13      33    190    113      137             10673
Biological (B)          84    32       88    14     28       0    210    247      153              6911

The table shows that a word like [shedy] is only frequent in sections where the word [chedy] is also frequent. This hints at another striking feature of the VMS: similarly spelled word types co-occur within the VMS.
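
Because the sections differ a lot in size, the raw counts are easier to compare as rates per 1000 words. The sketch below does that for [chedy] and [shedy] only, using the counts and section totals copied from the table above; the choice of "per 1000 words" as the unit is an assumption made for readability.

```python
# (chedy count, shedy count, total word count) per section, copied from the table
counts = {
    "Herbal in Currier A": (1, 0, 8087),
    "Pharmaceutical (A)": (1, 1, 2529),
    "Astronomical": (4, 0, 2136),
    "Cosmological": (24, 17, 2691),
    "Herbal in Currier B": (62, 35, 3233),
    "Stars (B)": (190, 113, 10673),
    "Biological (B)": (210, 247, 6911),
}


def per_thousand(count: int, total: int) -> float:
    """Occurrences per 1000 words of a section."""
    return 1000 * count / total


rates = {
    section: (per_thousand(chedy, total), per_thousand(shedy, total))
    for section, (chedy, shedy, total) in counts.items()
}

for section, (chedy_rate, shedy_rate) in rates.items():
    print(f"{section:<20} chedy {chedy_rate:6.2f}  shedy {shedy_rate:6.2f}")
```

Normalised this way, both words stay near zero in the Currier A sections and are highest in the Biological section, although the increase is not strictly monotone in table order (Herbal in Currier B comes out slightly above Stars for [chedy]).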

There are more interesting features of the VMS. For instance, the line is a functional unit. The shape of a letter determines in some way how the letter is used within a word, within a line or within a paragraph. ...

In other words, we are looking for a system that shows many interesting features at the same time. It uses the same or similar words but not the same or similar word sequences. Additionally, this system changes over time. Do these features really describe a language?