The Voynich Ninja
The incompatibility of Voynichese with natural human language - Printable Version




RE: The incompatibility of Voynichese with natural human language - elieD - 15-04-2020

(13-04-2020, 08:12 PM)-JKP- Wrote: Elie, you seem to have taken that very personally (and ignored the comment I made about the content of your paper), but I did not post it to disparage you. You assumed that. It's better to look at the big picture and not interpret everything on the forum as a personal jab.

I posted the link because Nick Pelling recently blogged about preprint servers and I don't think the one you used was on the list, so I provided a link so that people on the forum would be aware that there are more than what he mentioned and that it was the one you chose.


I do not judge the content of a paper by where it is posted. I judge it by what is written.

Yeah, I was a little on the defensive, sorry. I wrote my answer before reading your second comment on my paper.
To get back to the core subject, my paper sadly doesn't prove anything.
One way to get more information on the origins of the VMS would be to understand how someone could come up with these characters. Characters usually don't come from nowhere, except in made-up languages of course. In real languages, they usually derive from another language. From Ancient Greek to Latin, we got α -> a, β -> B, etc. The idea of my paper was that, if a language was close enough to the VMS to have such an association, I could detect it: if we had words like "βα" in the original language, I had an algorithm that could map back B > β and a > α. We have AncientGreek => Latin, so we should have X => Voynichese. This algorithm would have given X. The idea is to find X. Even if we only found X1 with X => X1, or only X2 with X => X1 => X2, knowing that X2 is related to Voynichese in one way or another would be an enormous success. My algorithm did find a lot of X's; for example, we have X1 => XFrench and X1 => XSpanish. My algorithm found that Spanish and French are related based on character association. Sadly, even though Greek and Latin are related, my algorithm didn't find this association. But it doesn't really matter: for each language, I only need to find at least one close language.

Theoretically, all languages are linked to other languages (we don't often create a language from nothing and then destroy it as if it never existed, without it evolving). It doesn't necessarily mean we have one tree, but usually we don't have one language unlinked to all others. This is what I studied in my paper. And indeed the algorithm worked for a lot of languages (Latin-based, Germanic, and even Asian languages). But the algorithm mostly failed for the outliers of my statistical analysis (except Old French and Scottish Gaelic). Why so?

Outliers are human languages, but they have specificities. There are old languages (Old French) and languages from many countries (Brazil, the Philippines, England, etc.), mostly tribal ones, but most importantly, all close languages, all outliers, seem to be, at some point, the transcription of someone talking. Which would explain a lot for the VMS.
Now, there are two possibilities, I think. Either Voynichese was forged by someone to transcribe an unknown language he/she encountered, or Voynichese was a real language that derived from others, and I still think the VMS is transcribed from someone talking. But I tried, and I didn't find a way to confirm the second hypothesis. So here are the leads, in my opinion, for the two possibilities:

(1) VMS was forged to transcribe an unknown language. The main idea is "Hey, this guy speaks a language we don't know, we must invent new characters for his new language". In this situation, we can't trust the characters. They refer to nothing we know; they were invented for the occasion. In this case, it will require a tremendous amount of work to translate the VMS. If I were in this situation, I would invent letters and say "these N letters, put together, are for this syllable". We need to reconstruct this association. So the idea would be to identify clusters of letters, and to associate groups of letters from one language with groups of letters from another language, the same way I did with characters, but this time for hypothetical syllables. This would probably be more robust for an analysis than a computerized transcription like V101 or EVA, because errors like "m" in V101 being 'iin' in EVA would be gathered into clusters in this model, so individual errors on segmented letters wouldn't matter.

(2) Voynichese was a real language that derived from others. In this case, I think our best chance is to look at the characters. I already tried a computer algorithm with more than 80 languages to find a character association, so the quantitative analysis is probably a dead end. We need a qualitative analysis. We need people looking at every set of characters that has existed in the world to find which could be the closest to Voynichese. People even looked at Komi Zyrian. This kind of lead seems promising to me. The other solution would be to have one expert in each language do what I did in my paper between the VMS and Old French. I managed to translate more than 15% of unique VMS words, and it only gave garbage, so the only way to know whether the algorithm finds an association between the VMS and another language is probably to have an expert (a native speaker of the language) look at it. And all of that was with V101, but it would need to be done with EVA as well. It's highly time-consuming and I would not recommend doing that without any further qualitative analysis of the origins of the VMS.
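
To make the "character association" idea a bit more concrete, here is a minimal sketch. It is not the algorithm from the paper (which is not described in this thread); it swaps in a simple frequency-rank heuristic as a stand-in. The function names and the toy word lists are made up for illustration. It pairs characters of two word lists by frequency rank, then measures what fraction of unique source words land in the target vocabulary after substitution, which is roughly the kind of "15% of unique words" figure mentioned above.

```python
from collections import Counter


def rank_mapping(source_words, target_words):
    """Pair each source character with the target character of the same frequency rank.

    Hypothetical stand-in heuristic, not the method used in the paper.
    """
    src_counts = Counter("".join(source_words))
    tgt_counts = Counter("".join(target_words))
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    src_ranked = sorted(src_counts, key=lambda c: (-src_counts[c], c))
    tgt_ranked = sorted(tgt_counts, key=lambda c: (-tgt_counts[c], c))
    return dict(zip(src_ranked, tgt_ranked))


def apply_mapping(word, mapping):
    """Substitute every character; unmapped characters become '?'."""
    return "".join(mapping.get(ch, "?") for ch in word)


def coverage(source_words, target_words, mapping):
    """Fraction of unique source words found in the target vocabulary after substitution."""
    target_vocab = set(target_words)
    unique = set(source_words)
    hits = sum(1 for w in unique if apply_mapping(w, mapping) in target_vocab)
    return hits / len(unique)


# Toy data (invented): a "Greek-like" list and its "Latin-like" counterpart.
greek_like = ["ααα", "ααβ", "αβγ"]
latin_like = ["aaa", "aab", "abc"]

mapping = rank_mapping(greek_like, latin_like)
print(mapping)
print(coverage(greek_like, latin_like, mapping))
```

On real corpora a pure frequency-rank pairing is very fragile, so a high coverage score would still need a human check, as argued above.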

===> What's important to understand is that, of all the outliers I got, only a few could be understood with the second method. Take Mbya Guarani, for example: it is based on oral transmission. You have X => MbyaGuarani, but you have almost no information on X. You can't do an analysis based on that. Suppose we had X (oral) => Voynichese (written) and X (oral) => X1 (oral) => X2 (oral) => (...) => X15 (written). It could mean we wouldn't be able to associate X15 and Voynichese at all, whatever method we use. In this case the VMS would probably remain a mystery forever; this is truly the worst scenario. Voynichese was written down at some point to give us the VMS, but if the sounds someone made, which someone else wrote down, don't represent anything at all for us today, we can't translate the VMS (even if it meant something at some point). The VMS would be a dead end, an unsolvable mystery forever (almost: our remaining chances would be artificial general intelligence and a method to go back from X15 to X; all languages have things in common, but for Voynichese, as for some outliers, we can't hope to base our algorithms on that to understand it). The first method could in fact work under all hypotheses; it's just that it's probably extremely hard to implement and validate, and to have enough computational power to test it efficiently.

To summarize everything, I would just say: how would you attack Mbya Guarani or Tagalog if you only had a romanized text of these languages at your disposal? Could you translate it? I think this is the real question to ask ourselves. Look at that: [link]. Translate this without any other information. One of my ideas would be to try finding key words like names or cities, no matter how they are represented. In this transcription of Mbya Guarani, you have for example "Ko'apy roju rire ma , ou kuri cheñora , Patricia Madre Tierra oje'ea .", which is an enormous amount of information. Now imagine this with other characters, not knowing the real association between visual symbols and characters (V101 and EVA). This would also be a great way to find a solution. But even with this, you would only know how to map unknown characters to known characters; it doesn't mean you would be able to understand Mbya Guarani. But from this you could at least understand the sounds made when speaking Voynichese, and go from that to the original language.
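
As a small illustration of the "find key words like names or cities" idea on the romanized sentence quoted above, here is a rough sketch. It assumes whitespace tokenisation and that capital letters survive the transcription; neither assumption carries over to Voynichese, which is exactly why the task is so much harder there. The function name is made up.

```python
def capitalised_candidates(sentence):
    """Return tokens that start with a capital letter, ignoring the first token.

    Assumption: capitals mark names; the sentence-initial capital carries no information.
    """
    tokens = sentence.split()
    return [tok for tok in tokens[1:] if tok[0].isupper()]


sentence = "Ko'apy roju rire ma , ou kuri cheñora , Patricia Madre Tierra oje'ea ."
candidates = capitalised_candidates(sentence)
print(candidates)  # ['Patricia', 'Madre', 'Tierra']
```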



RE: The incompatibility of Voynichese with natural human language - Aga Tentakulus - 16-04-2020

A library has more than Mickey Mouse and Playboy

I...
I understood the manuscript.
But nobody understands me.
The general opinion is the rule.
No rule without an exception.
Me!

More tomorrow when I sober up.

Noblesse oblige


RE: The incompatibility of Voynichese with natural human language - Alin_J - 29-05-2020

(11-03-2020, 08:55 PM)MarcoP Wrote: Thank you, Jonas!
As a percentage, I consider the number of perfectly reduplicating consecutive word couples vs the total number of consecutive word couples (N.words-1).
According to the numbers you provided, we have 64/40,000, i.e. 0.16%, about one fifth of the VMS average. But, as pointed out by your expected/actual argument, it is clear that these occurrences are very significant.

It seems to me that the majority of the cases occur in the position you described: around line breaks (a line ending with word W, followed by a line starting with word W). This is something I have never seen: definitely interesting.
Possibly even more interesting, the text also features several instances of quasi reduplication. E.g.:
tuuriasi, tuumiasi.
mustien mutien
selvältä selältä

My first impression is that, also in this case, we are far below VMS rates, still it is very exciting to see a text featuring both systematic reduplication and quasi-reduplication!
Thank you again for mentioning Kalevala!
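
For readers who want to reproduce this kind of count, here is a minimal sketch of the measure described above (identical consecutive word pairs divided by N.words-1), plus one possible reading of "quasi reduplication". Treating quasi reduplication as "exactly one character substituted, inserted or deleted" is an assumption made for this example, not a definition taken from the thread, and the function names are made up.

```python
def reduplication_rate(tokens):
    """Share of consecutive word pairs whose two words are identical."""
    pairs = list(zip(tokens, tokens[1:]))  # N.words - 1 pairs
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, longer = sorted((a, b), key=len)
    # Try deleting each character of the longer word in turn.
    return any(longer[:i] + longer[i + 1:] == short for i in range(len(longer)))


# The three Kalevala examples quoted above, with punctuation stripped.
examples = [("tuuriasi", "tuumiasi"), ("mustien", "mutien"), ("selvältä", "selältä")]
print([within_one_edit(a, b) for a, b in examples])

# The quoted figure: 64 reduplications over 40,000 pairs, as a percentage.
print(64 / 40000 * 100)
```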

Hey, since I happen to have just analysed it, I thought I should also mention that the Japanese language sample (N-JPN) in [link] contains 454 perfectly reduplicating words among the first 40710 tokens. By your measure, this is about 1%, isn't it?

Because the type-token ratio is so low for this text, this is about the same as you would find if the tokens were shuffled randomly, so by my way of counting it is not as significant as in the Kalevala. But it is by how you count. There are still no runs of repetition longer than pairs, though.
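
The comparison with a random shuffle can be sketched as follows. This is an illustration, not the script actually used for the numbers above; the function names and the toy token list are invented. For a uniformly random permutation of N tokens in which type w occurs f_w times, the expected number of identical adjacent pairs is sum_w f_w (f_w - 1) / N, so a text with few types and many repeats gets a high baseline even without any real reduplication.

```python
from collections import Counter


def observed_repeats(tokens):
    """Number of adjacent pairs with two identical tokens."""
    return sum(a == b for a, b in zip(tokens, tokens[1:]))


def expected_repeats_if_shuffled(tokens):
    """Expected identical adjacent pairs under a uniform random permutation."""
    n = len(tokens)
    if n < 2:
        return 0.0
    counts = Counter(tokens)
    return sum(f * (f - 1) for f in counts.values()) / n


def longest_run(tokens):
    """Length of the longest run of the same token (2 means 'pairs only')."""
    best = run = 0
    prev = object()  # sentinel that compares unequal to every token
    for tok in tokens:
        run = run + 1 if tok == prev else 1
        best = max(best, run)
        prev = tok
    return best


# Invented toy sample, just to exercise the functions.
toy = "to to ke i i ke to".split()
print(observed_repeats(toy), expected_repeats_if_shuffled(toy), longest_run(toy))

# The figure quoted above: 454 repeats over 40710 - 1 adjacent pairs.
print(454 / (40710 - 1))
```

If the observed count is close to the shuffled expectation, the repeats are unsurprising; if it is far above it, they are significant in the expected/actual sense.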


RE: The incompatibility of Voynichese with natural human language - MarcoP - 29-05-2020

Thank you, Jonas, that's interesting!
I plotted a similar figure in the second chart [link], but I did not examine the contents of the file then.
It seems that the vast majority of the repetitions are mono-syllabic e.g. "to to" "i i" "ke ke". I would be curious to know if the repetitions are an effect of the conversion to the Latin alphabet.


RE: The incompatibility of Voynichese with natural human language - Alin_J - 29-05-2020

(29-05-2020, 06:31 PM)MarcoP Wrote: Thank you, Jonas, that's interesting!
I plotted a similar figure in the second chart [link], but I did not examine the contents of the file then.
It seems that the vast majority of the repetitions are mono-syllabic e.g. "to to" "i i" "ke ke". I would be curious to know if the repetitions are an effect of the conversion to the Latin alphabet.

Yes, you did, I missed that. Thank you.
According to the description file, this Japanese text has "Limited set of phonemes. Extremely shallow orthographic depth. Strong CVC structure. Originally logographic and syllabic script." and is "Romanised with the Hepburn system, without capitalisation."
Whatever this all means (it doesn't make much sense to me at this point), maybe it could hint towards a conversion effect.


RE: The incompatibility of Voynichese with natural human language - -JKP- - 30-05-2020

There are some African languages that have similarities to Japanese. I can't offhand remember exactly which ones because there are many African languages and it's difficult to trace back those that were in general use in the 15th century. Aramaic has some of these properties, as well. Old Ge'ez is abjadic (it has Semitic origins, like Hebrew). There are a number of African languages that are syllabic.

Just as some Middle Eastern languages are still written in both Latin and Cyrillic scripts, or both Arabic and Cyrillic scripts, some North African languages were written with both Ge'ez and Latin scripts.


The Ge'ez script has a form of benching, although the benching is not the predominant feature of the script. To me, what stands out about it is its logical regularity. The benched letter sits on top of the bench (it doesn't cut through it). In both visual and mechanical terms, it is less similar to the VMS glyphs than the form of benching that occurs in Greek.

Ge'ez is used to write a number of languages (Ge'ez is one of the scripts used in Ethiopia and some of the surrounding regions). Ethiopia was on European maps long before most other parts of Africa (with pictures of little European castles that were generally not indicative of the actual architecture). I have frequently seen Ethiopia on medieval mappae mundi. It was a trade center and a Christian pilgrimage destination.


RE: The incompatibility of Voynichese with natural human language - davidjackson - 05-06-2020

(30-05-2020, 04:38 AM)-JKP- Wrote: There are some African languages that have similarities to Japanese.
You mean similar use of tonal levels, as used in Asian languages? Independent development, surely.


RE: The incompatibility of Voynichese with natural human language - -JKP- - 05-06-2020

Japanese is not tonal (Chinese is tonal and some African languages are tonal and some are not). Japanese is generally easier for westerners to learn than Chinese.

Japanese is similar to some of the African languages in the sense of being syllabic. I don't think anyone has demonstrated any direct linguistic connection between African and east Asian languages, but the subject comes up from time to time.


I mentioned it because many of the researchers who have done computational attacks that indicate similarities with Japanese syllabic structure may not be aware that a number of African languages also share a syllabic structure.


RE: The incompatibility of Voynichese with natural human language - Aga Tentakulus - 06-06-2020

No need to go to Japan

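s isch cho, chäschüechli und s chuchichäschtli, casch au cho. ;)
(Swiss German: "It has come, little cheese tarts and the little kitchen cupboard, you can come too.")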