Inside the Voynich Network: graph analysis - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Inside the Voynich Network: graph analysis (/thread-4998.html)

RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 05-11-2025

(05-11-2025, 05:22 PM)quimqu Wrote: The author is anonymous. Here you can find the entry in Spanish Wikipedia. [link not shown]

Ah, thanks! So it is almost certainly a single author.

RE: Inside the Voynich Network: graph analysis - quimqu - 05-11-2025

(05-11-2025, 04:59 PM)Jorge_Stolfi Wrote: The most interesting result is the proximity of the Spanish "head" texts to the Portuguese "tail" ones, and vice-versa. Those are essentially distinct texts, in rather different languages and quite different spellings, technically by distinct authors (Machado in the latter, and Tapía translating Machado in the former). What they had in common was the higher-level nature and style of the work (grammatical variety, clause length, predominant verbal tenses, etc.), the general topic (which determined proper names and common concepts and actions) and whatever part of the author's style could survive the translation.

I think it is interesting because it suggests that the graph captures the structure of the text independently of the language. Maybe a good test would be to run a text translated into languages that are not as close as Portuguese and Spanish, and see how the graph behaves. Because then, if the dots are close, we will have a tool to study the Voynich text independently of the alphabet or the "language".

RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 06-11-2025

(05-11-2025, 11:37 PM)quimqu Wrote: Maybe a good test would be to run a text translated into languages that are not as close as Portuguese and Spanish, and see how the graph behaves. Because then, if the dots are close, we will have a tool to study the Voynich text independently of the alphabet or the "language".

Good idea! Here are some versions of the Pentateuch (first five books of the Old Testament) in various languages. From my [link not shown]:
There is also a file "main.wds" that has the "main.src" digested into a more uniform format, with one word or punctuation per line. The first letter identifies the entry ("a" for text word, "p" for punctuation, "s" for symbol, etc.), with the object itself after a space. This file may be easier to convert to your format than the "main.src" one.

At the top of each of these files there are a few "@chars" lines that specify the characters that can appear as parts of words (@chars alpha), of punctuation (@chars punct), and of numbers or other symbols (@chars symbol).

The files are all in the ISO-latin-1 encoding, so you must pipe them through "recode latin-1..utf8" if your software expects Unicode. And be sure to use "wget" or the "Save link as..." button of your browser, rather than opening the file in the browser and copy-pasting the contents.

Let me know if you need help. (For instance, I think I can convert the Chinese file to phonetic pinyin. I have the recipe saved somewhere. But I don't know whether the conversion will be 100% correct...)

All the best, --stolfi

RE: Inside the Voynich Network: graph analysis - quimqu - 06-11-2025

To go further into the Voynich graph analysis, I created a directed graph where every Voynich "word" is a node and each link connects words that appear next to each other inside the same paragraph. I did not connect words across paragraph boundaries. Then I looked at how this network behaves. Several things were found. There is not much new here, but it is a new approach that gives the same results.
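The graph construction described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used in the thread: the function name `build_transition_graph`, the choice of occurrence counts as edge weights, and the toy token lists are all assumptions made for the example.

```python
from collections import Counter


def build_transition_graph(paragraphs):
    """Count directed edges (a, b) where word a immediately precedes word b
    inside the same paragraph. No edge crosses a paragraph boundary."""
    edges = Counter()
    for words in paragraphs:
        # zip pairs each word with its successor within this paragraph only
        for a, b in zip(words, words[1:]):
            edges[(a, b)] += 1
    return edges


# Toy input (hypothetical, not real transcription data): two short paragraphs.
paragraphs = [
    ["ol", "shedy", "qokedy", "ol", "shedy"],
    ["qokedy", "ol"],
]
graph = build_transition_graph(paragraphs)
for (a, b), count in graph.most_common():
    print(f"{a} -> {b}: {count}")
```

Note that the last word of the first toy paragraph ("shedy") and the first word of the second ("qokedy") do not add to the "shedy -> qokedy" edge, since the pairs are formed one paragraph at a time.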
With this analysis, we can say the Voynich text behaves like a system that keeps recycling a few basic pieces under certain rules, not like plain gibberish. According to the results, the next word is not random: it depends on the previous two or three words. This means the text follows a kind of rhythm or local grammar rather than fixed repeated phrases. I tested it in three ways:
An additional plot to show the strongest repeating word transitions in the Voynich text, where a few tokens like ol, shedy, and qokedy form the main loop of tightly connected words:

RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 06-11-2025

(06-11-2025, 09:57 AM)quimqu Wrote: a few tokens like ol, shedy, and qokedy form the main loop of tightly connected words

I am guessing that an edge directed from A to B counts the times that A appears before B in some parag. Is that correct?

RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 06-11-2025

(06-11-2025, 09:57 AM)quimqu Wrote: the next word is not random: it depends on the previous two or three words. ... Conditional entropy drops sharply when more context is known

So far this is a property of almost any text written in any natural language. And even of encrypted text, if each original word type is mapped to one encrypted word type. Or to a small number of types.

Because of this property, a Markov model of order 2 (that chooses the next word at random, with frequencies based on the last two generated words) can produce pseudo-English (or pseudo-Mongolian) that is pretty much indistinguishable from the real thing -- to anyone who does not know the language.

Therefore, the results of any word-level analysis of the VMS should be compared to those obtained from a sample of pseudo-Voynichese generated by a Markov of order 2 or 3. Comparing to a simple random stream of words (as produced by a Markov of order zero) is not very useful.

A millennium ago, Jacques Guy wrote such a generator, which he called "monkey". Alas I can find neither the program nor its output. Maybe some other old-timer kept a copy?

All the best, --stolfi

RE: Inside the Voynich Network: graph analysis - quimqu - 06-11-2025

(06-11-2025, 10:19 AM)Jorge_Stolfi Wrote: I am guessing that an edge directed from A to B counts the times that A appears before B in some parag. Is that correct?

Yes, exactly. In the graph, a directed edge from A to B means that word A appears immediately before word B somewhere in the same paragraph.

RE: Inside the Voynich Network: graph analysis - quimqu - 06-11-2025

(06-11-2025, 10:36 AM)Jorge_Stolfi Wrote: So far this is a property of almost any text written in any natural language. And even of encrypted text, if each original word type is mapped to one encrypted word type. Or to a small number of types.

Well, that would be good news for those who expect the Voynich text to make some sense.

RE: Inside the Voynich Network: graph analysis - Jorge_Stolfi - 06-11-2025

Here are some additional files which you may find useful. In [link not shown]:

Legitimate texts:
Code: lines words bytes file

Fake texts generated (mostly) from the above by a Markov of order 2:
Code: lines words bytes file

Same, but one word per line:
Code: lines words bytes file
Code: lines words bytes file

The ".wdp" files have one word per line. The ".txt" files are the same but filled as parags to 72 columns. In all files, the end of a parag is denoted by a word "=" in a line by itself. The file encoding is ISO-latin-1.

The Voynichese files are derived from a recent copy of Rene's transcription, which uses lowercase EVA. Line breaks and plant intrusions in the original text are not recorded. Parag breaks were inferred from the "locators" ("P+", "P=", etc.) Unfortunately the sections, especially Herbal-B, are rather short.

The English files are derived from Culpeper's Herbal. Only the plant descriptions from the Herbal section proper were taken, omitting the "Place", "Time", and "Vertues" subsections, and the marginal notes. Punctuation, numbers, and symbols (including "&") are omitted.
The words were all mapped to lowercase. The word characters are thus [a-z] plus apostrophe "'", "°" for abbreviation period, and "~" for hyphen.

Hope it helps, --stolfi

RE: Inside the Voynich Network: graph analysis - ReneZ - 07-11-2025

(06-11-2025, 10:36 AM)Jorge_Stolfi Wrote: A millennium ago, Jacques Guy wrote such a generator, which he called "monkey". Alas I can find neither the program nor its output. Maybe some other old-timer kept a copy?

I had a copy for quite a while, but no longer. I have also played with my own: [link not shown] The 'fun' results are in Annex A.
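
For readers who want to reproduce the kind of order-2 word generator discussed in this thread, here is a minimal sketch. It is not Jacques Guy's "monkey" nor any poster's actual program. The function names, the `<s>` and `</s>` padding markers, the `max_words` cap, and the toy token stream are assumptions made for illustration. The only convention taken from the thread is that a word "=" marks the end of a paragraph.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical padding markers, not part of the data


def read_paragraphs(tokens):
    """Split a one-word-per-line token stream into paragraphs at '=' markers."""
    paragraphs, current = [], []
    for tok in tokens:
        if tok == "=":
            if current:
                paragraphs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        paragraphs.append(current)
    return paragraphs


def train_markov(paragraphs, order=2):
    """Map each context (tuple of `order` words) to the list of words seen after it.
    Repeated followers are kept, so random.choice samples by observed frequency."""
    model = defaultdict(list)
    for words in paragraphs:
        padded = [START] * order + list(words) + [END]
        for i in range(order, len(padded)):
            model[tuple(padded[i - order:i])].append(padded[i])
    return model


def generate_paragraph(model, order=2, rng=random, max_words=200):
    """Generate one pseudo-paragraph by walking the model from the start context."""
    context = (START,) * order
    out = []
    while len(out) < max_words:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + (nxt,)
    return out


# Toy token stream (hypothetical), in the one-word-per-line style with "=" separators.
tokens = "ol shedy qokedy = qokedy ol shedy = ol shedy qokedy ol".split()
paragraphs = read_paragraphs(tokens)
model = train_markov(paragraphs)
fake = generate_paragraph(model, rng=random.Random(0))
print(" ".join(fake))
```

To feed it one of the ".wdp" files, reading the lines with `open(path, encoding="latin-1")` and stripping whitespace should be enough, given the ISO-latin-1 encoding mentioned above. Any graph or entropy statistic computed on the real text can then be recomputed on the generated paragraphs for comparison.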