The Voynich Ninja

Full Version: Hypervector Analysis of the Voynich Manuscript
Hello All

My name is Darrin Vallis. I'm a computer and electrical engineer, amateur historian and Voynich researcher. I recently implemented a mathematical method of hypervector analysis comparing Voynich to fifty other languages. My results are shown below. Voynich structure puts it in the Caucasian family of languages.

A quick summary of the method is described in [link unavailable].

You can read complete details of the mathematics and implementation at [link unavailable].

Regards
Darrin Vallis

[Image: pcacomp-1024x622.jpg]
Quote:All files were converted to lower case, punctuation and numeric type removed and text normalized to 120 character lines. Languages in non-Latin characters were machine transliterated before normalization. Takeshi Takahashi’s transliteration of the Voynich manuscript into Latin text using the EVA alphabet was also normalized and included in the files.
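A minimal sketch of the preprocessing quoted above, for anyone trying to replicate it. The function name `normalize` and the choice to drop every character outside a-z (which would also discard accented letters) are assumptions; the article's own code is not shown here, and transliteration of non-Latin scripts is assumed to have happened beforehand.

```python
import re


def normalize(text, width=120):
    """Lower-case, strip punctuation and digits, and re-wrap to fixed-width lines.

    Assumption: anything that is not a-z or whitespace is removed.
    """
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)  # drop punctuation, digits, other symbols
    text = " ".join(text.split())         # collapse runs of whitespace to one space
    # Cut the cleaned text into consecutive lines of `width` characters.
    return [text[i:i + width] for i in range(0, len(text), width)]


if __name__ == "__main__":
    for line in normalize("Hello, World! 123", width=5):
        print(repr(line))
```

The output alphabet is then exactly the twenty-six letters plus the space character.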


I have a couple of questions in terms of interpreting your diagram...

Did you use consonants and vowels to generate the comparison?

Word length?

Exactly what was compared?
A hypervector was generated for each language. They are shown compared on the plot. You didn't read the linked article. I wrote:

Now we apply Kanerva’s theory. Assign random hypervectors to the twenty six letters of the alphabet and a space character. Parse any large text sample with a sliding three letter window, building a series of trigrams. Combining the trigrams yields a hyperdimensional vector signature for the text sample. Languages which are similar will have hypervectors close together. (Like the previous V and W vectors) Dissimilar languages will have orthogonal vectors.
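The paragraph above can be sketched in code roughly as follows. This is an illustration under stated assumptions, not the implementation from the linked article: the dimensionality (10,000), the use of +1/-1 vectors, and the way trigrams are "combined" (rotate each letter's vector by its position, multiply elementwise, then sum over the text) follow a common hyperdimensional-computing recipe and are not confirmed by the post. All function names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # twenty-six letters plus space
DIM = 10_000  # assumed dimensionality; the post does not state one


def make_item_memory(dim=DIM, seed=0):
    """Assign one random +1/-1 hypervector to each letter and the space."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(dim)] for ch in ALPHABET}


def rotate(vec, k):
    """Cyclically shift a vector by k positions (marks a letter's position)."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else vec[:]


def trigram_vector(tri, items):
    """Bind three letters: rotate by position, then multiply elementwise."""
    a, b, c = (items[ch] for ch in tri)  # KeyError if a char is outside ALPHABET
    ra, rb = rotate(a, 2), rotate(b, 1)
    return [x * y * z for x, y, z in zip(ra, rb, c)]


def text_vector(text, items):
    """Sum the vectors of every trigram seen by a sliding three-letter window."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    dim = len(next(iter(items.values())))
    total = [0] * dim
    for tri, n in counts.items():
        for i, v in enumerate(trigram_vector(tri, items)):
            total[i] += n * v
    return total


def cosine(u, v):
    """Cosine similarity: near 1 for similar texts, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    items = make_item_memory(dim=2000, seed=1)
    a = text_vector("the quick brown fox jumps over the lazy dog", items)
    b = text_vector("the quick brown fox jumps over the lazy cat", items)
    print(cosine(a, b))
```

Two samples that share most of their trigrams end up with a high cosine similarity, while samples with no trigrams in common land close to orthogonal. Note that the signature depends only on which symbols follow which, so the input has to be normalized to the same twenty-seven symbols first.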
There are some errors in the language categories: Romance is an alternative name for Italic. Welsh isn't Italic, it's Celtic. Altaic is no longer accepted as a language family. Malayalam and Kannada are Dravidian, not Indic. Georgian is Kartvelian, not Caucasian. Hittite is Anatolian. Though, to be fair, maybe none of that matters if texts in the various languages you analysed cluster in the way you found them to.

For an unknown script, there is no difference between plaintext and simple substitution.

By specifying Mandarin (as opposed to Chinese), am I right in assuming you used something like Pinyin instead of Chinese characters (as well as phonetic transcriptions for Japanese and Korean scripts)?

But curiously, you find Voynichese in a Caucasian language cluster, along with the North West Caucasian languages, Adyghe and Abaza. I reached a similar conclusion ([link unavailable], [link unavailable]). The closest match I found was with Abkhaz, another North West Caucasian language. The most likely explanation, though, is simply that Abkhaz has only two vowel phonemes and there's no clear vowel branch in Voynichese.

Although I think the Voynich manuscript text is probably meaningless, in the event it is a meaningful plaintext/simple substitution cipher, I concluded that it must be in a language phonetically similar to North West Caucasian languages. The practical problem, as you've pointed out, is that very few people know anything about them. The best way forward is to try to match Voynichese glyphs to phonemes/Cyrillic letters before involving a native speaker.
(09-08-2020, 05:17 AM)dvallis Wrote:
A hypervector was generated for each language. They are shown compared on the plot. You didn't read the linked article. I wrote:

Now we apply Kanerva’s theory. Assign random hypervectors to the twenty six letters of the alphabet and a space character. Parse any large text sample with a sliding three letter window, building a series of trigrams. Combining the trigrams yields a hyperdimensional vector signature for the text sample. Languages which are similar will have hypervectors close together. (Like the previous V and W vectors) Dissimilar languages will have orthogonal vectors.


Yes, I did read the article. Twice. But EVA is not an alphabet and it is not Voynichese. It is a mnemonic system based on common Latin characters.

I find your graph interesting even without the Voynich-related data, but it's not a graph that includes Voynichese, it's a graph that includes a mnemonic transmutation of Voynichese intentionally made more similar to natural language so that it is easier to type, read, and remember.

Additionally, EVA makes assumptions about which glyphs and symbols are singles and which are ligatures.

That's fine, the position of EVAchese on the language graph might be useful information, but it's important to label it in a way that makes it clear that it's not Voynichese. It's EVAchese. Voynichese might end up somewhere else on the graph. It probably wouldn't be far away from EVAchese (specifically Takahashi EVAchese), but it still might be different.
Quote:All files were converted to lower case, punctuation and numeric type removed and text normalized to 120 character lines.


Something I forgot to ask, was Hebrew derived from Romanized Hebrew, vowel-Hebrew, or abjadic Hebrew (without points/vowels)?

I was also curious about the source-text for Mandarin. Mandarin is tonal. Depending on how it is transliterated, the tones are sometimes lost, which creates ambiguous words that look the same but have different meanings, even though they might be written differently from one another in other character systems.
I understand that parsing text according to trigrams is not the same as evaluating it based on consonants and vowels. I brought in the subject of consonants and vowels mainly because there are choices in EVA, and also some mistakes in the Takahashi transcription, based on natural-language assumptions.
Hello Darrin, and thank you for your post. Your analysis looks very interesting. 
Could you please explain your methodology in detail, so that one would be able to replicate and build on your findings? I am particularly interested in how you dealt with comparing Voynich glyphs to alphabetic characters. Since we don't know the sound values of the Voynich glyphs, I imagine that something like the trigram-frequency distributions was compared among the languages instead. It appears to me then that one has to apply some sort of sorting algorithm to allow the vectors to be compared objectively. It strikes me that in doing so, one might be able to deduce sound suggestions for the Voynich glyphs, based on which sounds are most common among the family of languages that sit close together in the plot?
(09-08-2020, 12:54 AM)dvallis Wrote: analysis comparing Voynich to fifty other languages
Sorry, but it sounds too good to be true. The author easily finds 50 texts in 50 languages to compare to Voynich? The 15th century Ossetic texts? I hope this is not a distraction with no future.
(09-08-2020, 09:31 AM)Ruby Novacna Wrote:
(09-08-2020, 12:54 AM)dvallis Wrote:
analysis comparing Voynich to fifty other languages

Sorry, but it sounds too good to be true. The author easily finds 50 texts in 50 languages to compare to Voynich? The 15th century Ossetic texts? I hope this is not a distraction with no future.


How do you know it was easy? Much of the research I post has taken me years to find. Perhaps this is also true of Darrin's research.