The Voynich Ninja
Inside the Voynich Network: graph analysis - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Inside the Voynich Network: graph analysis (/thread-4998.html)



RE: Inside the Voynich Network: graph analysis - Philipp Harland - 18-11-2025

Seems like a very interesting method. I don't know if it's in the literature or not, but it's fascinating nonetheless. Is it truly research-grade, i.e. can it produce non-trivial results that couldn't be produced without it? It seems to be working pretty well for the VMS.


RE: Inside the Voynich Network: graph analysis - Rafal - 18-11-2025

Quimqu, I have one more question. How do your methods treat languages with declension?

Take for example:

Latin: Homo homini lupus est
English: Man is a wolf to man

In English we have the word "man" repeated twice. In Latin it is "homo" and "homini".

Do your methods know that "homo" and "homini" are in fact the same word? I guess they don't. They treat them like "homo" and "lupus", as totally different words, right?

Would it mean that there are fewer patterns in languages with declension?


RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 18-11-2025

(18-11-2025, 01:42 PM)Rafal Wrote: Would it mean that there are fewer patterns in languages with declension?

Good question!  Experiments with the same text in different languages (with very different grammar -- which unfortunately is not the case for Spanish vs. Portuguese) would be a big step toward answering this question.

All the best, --stolfi


RE: Inside the Voynich Network: graph analysis - quimqu - 18-11-2025

(18-11-2025, 01:42 PM)Rafal Wrote: Quimqu, I have one more question. How do your methods treat languages with declension?

Take for example:

Latin: Homo homini lupus est
English: Man is a wolf to man

In English we have the word "man" repeated twice. In Latin it is "homo" and "homini".

Do your methods know that "homo" and "homini" are in fact the same word? I guess they don't. They treat them like "homo" and "lupus", as totally different words, right?

Would it mean that there are fewer patterns in languages with declension?

Yes, any difference between words is treated as a different word. It would be good to try to join the declined forms (or plurals); I must think about how to do it. But the thing is, we don't know the functions of the Voynich words, so I really don't know if we should do this. Are qokedy, qokeedy and qokeeedy really from the same root?


RE: Inside the Voynich Network: graph analysis - quimqu - 18-11-2025

(17-11-2025, 02:50 PM)Jorge_Stolfi Wrote: I have the following versions of the Pentateuch

Hello Jorge,

I have also searched for the Pentateuch in Spanish, French, English (King James Version) and German (Luther's version). I started analysing the graphs, which might take almost 20 hours. But what stunned me is the following (tokens are words):

English: 157604 tokens, 4703 unique.
French: 150940 tokens, 7570 unique.
German: 143146 tokens, 7317 unique.
Hebrew: 66311 tokens, 20976 unique.
Latin: 96870 tokens, 14001 unique.
Mandarin Other: 193335 tokens, 2267 unique.
Mandarin Union: 174380 tokens, 2178 unique.
Russian: 112011 tokens, 12443 unique.
Spanish: 138777 tokens, 8572 unique.
Vietnamese: 146634 tokens, 4213 unique.


Does this make sense?
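
For reference, a minimal sketch of how such counts could be obtained, assuming simple lowercasing and whitespace tokenization (punctuation handling is ignored here, and the file name is only an example, not my actual pipeline):

Code:
def token_stats(path):
    """Return (total tokens, unique types) for a whitespace-tokenized text file."""
    with open(path, encoding="utf-8") as f:
        tokens = f.read().lower().split()
    return len(tokens), len(set(tokens))

total, unique = token_stats("pentateuch_la.txt")  # hypothetical file name
print(f"Latin: {total} tokens, {unique} unique.")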


RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 18-11-2025

(18-11-2025, 03:29 PM)quimqu Wrote: Pentateuch

English: 157604 tokens, 4703 unique.
French: 150940 tokens, 7570 unique.
German: 143146 tokens, 7317 unique.
Hebrew: 66311 tokens, 20976 unique.
Latin: 96870 tokens, 14001 unique.
Mandarin Other: 193335 tokens, 2267 unique.
Mandarin Union: 174380 tokens, 2178 unique.
Russian: 112011 tokens, 12443 unique.
Spanish: 138777 tokens, 8572 unique.
Vietnamese: 146634 tokens, 4213 unique.


Does this make sense?

It is expected that both numbers will vary a lot depending on the language. 

Hebrew (like Arabic) has many inflected word forms that require paraphrasing when translated into other languages.  For instance Arabic "yatakātabūna" = "both men wrote to each other".

Latin has no articles, and uses declensions instead of the prepositions "of", "to", "from".  English and the Romance languages have articles and use prepositions instead of declensions.  (And Portuguese, unlike Spanish, contracts many preposition+article pairs into single words.)  German has articles and is halfway between English and Latin with respect to prepositions.

In English the subject pronoun is mandatory, whereas in most other IE languages it is implied by the verbal inflection: Italian "però canta" = "however he sings".  English also uses auxiliary verbs where other languages use inflections for some tenses: Italian "canterà" = "[he] will sing", Portuguese "cantara" = "[he] had sung".

In Italian and Spanish, the oblique pronouns are often attached to the verb: Italian "portiamocelo" = "let's take it with us".  In Portuguese they may be hyphenated to the verb, or sometimes inserted in the middle of it: "cantá-lo-ei" = "I will sing it".

Russian, IIUC, has no articles, and it too uses declensions instead of some prepositions.

A large fraction of the nouns and verbs that are single words in European or Semitic languages are two-word compounds in Mandarin and Vietnamese.  In traditional script (and in your files) those compounds are not marked; the two parts are written as separate words/characters. (They are often hyphenated in pinyin transcriptions, but based on some Western language.) Moreover, the words are single syllables with a rigid structure.  Hence the number of tokens is expected to be higher and the number of lexemes smaller.

German, on the other hand, has the habit of merging nominal phrases into single words: "Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz" = "Law for the transfer of monitoring duties for the labeling of beef with information about cattle identification".

Also, in Mandarin and Vietnamese there is no clear division of roots into nouns, verbs, adjectives, and adverbs, which is a characteristic feature of Indo-European languages.  They are somewhat like English (which I do not consider to be an IE language, partly for that reason), where you can say "that is a big stone", "they were going to stone him", "it is a stone building", "the floor was stone cold".

All the best, --stolfi


RE: Inside the Voynich Network: graph analysis - Rafal - 18-11-2025

Quote:Does this make sense?

It makes a lot of sense. Compare English to Latin and Russian.

English: no declension, more words, fewer unique words
Latin/Russian: declension, fewer words, more unique words

I did some tests myself once. Languages with declension use fewer words because each word carries more meaning. In my example, "homini" means "to man"; English needs two words where Latin does it with one word in the proper form.

Hebrew is weird in that list. I don't know the language, but the Internet says it has no declension. Yet it has a small number of words and a lot of unique ones. I suspect some mistake could have happened here.


RE: Inside the Voynich Network: graph analysis - Rafal - 18-11-2025

Quote:It would be good to try to join the declined forms (or plurals); I must think about how to do it

I believe there is no easy way to do it.

There may be several declension patterns in a language, plus a lot of exceptions, so writing general rules probably wouldn't work. You would basically need, for each language, a database linking the forms of the same word.

I don't know if such things exist.


RE: Inside the Voynich Network: graph analysis - quimqu - 18-11-2025

(18-11-2025, 04:22 PM)Rafal Wrote:
Quote:It would be good to try to join the declined forms (or plurals); I must think about how to do it

I believe there is no easy way to do it.

There may be several declension patterns in a language, plus a lot of exceptions, so writing general rules probably wouldn't work. You would basically need, for each language, a database linking the forms of the same word.

I don't know if such things exist.

I was thinking of calculating word similarity and, given a high threshold, joining words. If I set a threshold of one or two character changes, I could reduce the graph dimension and the number of different tokens. But this needs to be tested and proven... I really don't know if it can be that easy... (Guess not.)
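
A minimal sketch of that idea, assuming plain Levenshtein edit distance and a greedy first-match merge (the function names, the threshold and the example tokens are only illustrative, not my actual pipeline):

Code:
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def merge_tokens(tokens, max_dist=2):
    """Greedily map each token to the first earlier token within max_dist edits."""
    reps, mapping = [], {}
    for tok in tokens:
        for rep in reps:
            if levenshtein(tok, rep) <= max_dist:
                mapping[tok] = rep
                break
        else:
            reps.append(tok)
            mapping[tok] = tok
    return mapping

print(merge_tokens(["qokedy", "qokeedy", "qokeeedy", "chedy"]))
# {'qokedy': 'qokedy', 'qokeedy': 'qokedy', 'qokeeedy': 'qokedy', 'chedy': 'chedy'}

Note that such a greedy merge depends on the order of the tokens and can easily conflate words that only happen to be spelled similarly.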


RE: Inside the Voynich Network: graph analysis - Mauro - 18-11-2025

(18-11-2025, 04:49 PM)quimqu Wrote: I was thinking of calculating word similarity and, given a high threshold, joining words. If I set a threshold of one or two character changes, I could reduce the graph dimension and the number of different tokens. But this needs to be tested and proven... I really don't know if it can be that easy... (Guess not.)

It's probably not useful. E.g., in Italian 'casa' and 'case' are related ('house' and 'houses'), but 'caso' ('case') is not. I doubt it can be done without knowing the language.