The Voynich Ninja


(09-02-2020, 11:21 AM)MarcoP Wrote: Here the differences between Voynich sections are striking: the most common word in Quire 20 (aiin, 1.8%) is less than half as frequent as the most common word in Herbal A (daiin, 5.0%). As discussed by Timm and Schinner, the fact that the most common words are different in the different sections is even more puzzling.

I have also noticed that the shares of the most frequent words in the Voynich manuscript vary more from section to section than they do when I make the same comparison in texts of known languages. But then the text samples I have analysed are mostly fiction: stories or novels. I have not compared against more technical texts, such as manuals or encyclopedias, so I don't know whether this is significant for determining if the text is meaningless or not.
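
A comparison of this kind can be sketched in a few lines of Python. This is only a minimal illustration, assuming each section is already available as a list of transliterated words; the function name `top_word_share` and the toy word lists are made up for the example and are not taken from any transliteration file.

```python
from collections import Counter

def top_word_share(words):
    """Return the most frequent word and its share of all word tokens."""
    counts = Counter(words)
    word, n = counts.most_common(1)[0]
    return word, n / len(words)

# Toy stand-ins for two sections (hypothetical data, not real counts).
section_a = "daiin chol daiin chor".split()
section_b = "aiin chedy shedy qokedy aiin ol ar al".split()

for name, words in [("A", section_a), ("B", section_b)]:
    word, share = top_word_share(words)
    print(f"section {name}: {word} {share:.1%}")
```

Running the same function over equally sized chunks of a known-language text would give the baseline to compare against.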

(08-02-2020, 12:34 PM)Alin_J Wrote: But then would a medieval scribe have the patience and/or the motivation to generate it?

My answer to your question is that it is the most effective way to generate some dummy text. Imagine that someone in medieval times wanted to write a mysterious book. His goal is to write a book nobody is able to read. What is the best way to manually generate large amounts of dummy text?

A) One possibility would be to copy some text using an exotic script and language. To do so, the writer has to find enough text in this exotic script. But the danger would be that someone who knows the script comes along and reads the text. If this happens, the mystery is solved and the book of secrets would no longer be mysterious.

B) Another possibility would be to invent gibberish by generating new words. But inventing something new is never an easy task. This is true even for generating gibberish off the top of one's head. If we take the generation of passwords as an example, we find that the most common passwords are easy to guess. The most commonly used are words like 'password', 'qwerty', or '12345678'. These passwords are used because it is far easier to copy or repeat something already known than to invent something new.

C) A third possibility would be to copy some existing text and to obfuscate the copying process by adding various small changes. In a similar way, a plagiarist might purposefully add some changes and mistakes while copying the source text.

The self-citation method is a variant of method C). The main difference is that the source words are chosen from previously written lines of text. This method is not only very efficient; it has further advantages. Since the idea of this method is to modify copied words, it is impossible to misspell a word. Moreover, it is possible to generate words of a given length to fill out the lines to the margin. This is exactly what we observe in the case of the VMs: there are no corrections and the text fits perfectly into its margins.

What, then, is the most effective way to fill a page with some text? The easiest way to test all three methods is to generate some sample texts. This type of experiment could be done by writing three pages full of text and measuring the time needed to do so:

In case A) copy some text from an external source. Copying errors are not allowed.
In case B) use arbitrary letter sequences without copying a previously written word and without repeating the word generation method. Copying errors and repeated words are not allowed.
In case C) start with a given word or phrase like 'some repeated words' and copy previously written words while replacing at least one letter with a different one. Copying errors and repeated words are allowed in this case.
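
Case C) can also be tried out in software. The sketch below is only a toy model of the rule stated above (copy a previously written word and replace one letter with a different one); it is not the program used by Timm & Schinner, and the function name `self_citation_text`, the Latin alphabet, and the uniform random choices are all assumptions made for illustration.

```python
import random

def self_citation_text(seed_words, n_words,
                       alphabet="abcdefghijklmnopqrstuvwxyz", rng=None):
    """Extend non-empty seed_words to n_words by copying earlier words
    and replacing exactly one letter with a different one."""
    rng = rng or random.Random(0)
    words = list(seed_words)
    while len(words) < n_words:
        source = rng.choice(words)        # copy a previously written word
        pos = rng.randrange(len(source))  # pick the letter to change
        # Choose a letter that differs from the original at that position.
        new_letter = rng.choice([c for c in alphabet if c != source[pos]])
        words.append(source[:pos] + new_letter + source[pos + 1:])
    return words[:n_words]

# Start from the phrase suggested for case C).
text = self_citation_text("some repeated words".split(), 20)
print(" ".join(text))
```

Because every new word is derived from an existing one, the output quickly fills with near-duplicates of the seed words. Repeated words can occur here, which case C) allows.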

(09-02-2020, 02:57 PM)Alin_J Wrote: I have also noticed that the shares of the most frequent words in the Voynich manuscript vary more from section to section than they do when I make the same comparison in texts of known languages. But then the text samples I have analysed are mostly fiction: stories or novels. I have not compared against more technical texts, such as manuals or encyclopedias, so I don't know whether this is significant for determining if the text is meaningless or not.

Hi Jonas,
I agree about the importance of focussing on technical/scientific texts! Anton recently started a thread about the analysis of a specific type of medieval text: [link]. We have not made much progress yet, but exploring this "genre" will certainly result in interesting information, if we are persistent enough to collect and compare a sufficient number of documents.

The variability among Voynich sections does not only affect the most frequent words, but goes down to something as basic as digraphs. For instance, the HerbalA and BioQ13 sections have similar length.
The EVA sequence 'ed' occurs 12 times in HerbalA and 1928 in BioQ13.
The EVA sequence 'dy' occurs 510 times in HerbalA and 2089 in BioQ13.
The combined EVA sequence 'edy' occurs 5 times in HerbalA and 1771 in BioQ13.
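
Counts like these are easy to reproduce on any EVA transliteration. The following is a minimal sketch, assuming a section is available as a list of EVA words and that sequences are counted inside words only (not across word spaces); whether the figures above were counted that way is an assumption, and `count_sequence` and the sample words are hypothetical.

```python
def count_sequence(words, seq):
    """Count occurrences of seq inside each word (overlaps included)."""
    total = 0
    for word in words:
        start = word.find(seq)
        while start != -1:
            total += 1
            start = word.find(seq, start + 1)
    return total

# Toy sample in EVA, not real manuscript data.
sample = "chedy qokedy daiin shedy dy".split()

for seq in ("ed", "dy", "edy"):
    print(seq, count_sequence(sample, seq))
```

Applied to two sections of similar length, the per-sequence counts can be compared directly, as in the figures quoted above.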

The fact that 'chedy' is one of the 3 most frequent words in Currier B sections like Q20 and Q13 and not in HerbalA appears to be related to the most basic differences in digraph distribution. These differences appear to reflect a progressive drift in the writing/encoding system and are extremely unlikely to depend on the contents.

The digraph drift is nicely illustrated in the graphs at the bottom of [link] by Rene Zandbergen (they are based on the CUVA transliteration system).

Marco, after watching a lot of researchers of this book come and go over the past few years, I find that one of the most reliable indicators of whether a new researcher has anything helpful to add to the discussion of the text is whether or not they understand the statistical drift in the Currier languages and what it implies. If a new idea doesn't take this well-documented phenomenon into account, or is incompatible with it, it's probably not an idea with much merit.

I think the most parsimonious conclusion that can be drawn from all the statistical properties of the VMS text which aren't like typical written works, is that the writing system as a whole was ad hoc, unique to the VMS's writer. My grasp of the statistical anomalies of the VMS is far from expert, but enough to make me rule out any notion that this writing system was ever used by many people or transmitted across generations. What I see in the early pages (Herbal A) is a new writing system that doesn't have all of its bugs worked out yet. As the author went on, he found his system problematic for his purposes, and made subtle changes to make it a better fit. This still leaves room for the text to be meaningful or meaningless, linguistic or non-linguistic notation.

(09-02-2020, 09:58 PM)Torsten Wrote:
Quote:But then would a medieval scribe have the patience and/or the motivation to generate it?
My answer to your question is that it is the most effective way to generate some dummy text. Imagine that someone in medieval times wanted to write a mysterious book. His goal is to write a book nobody is able to read. What is the best way to generate manually large amounts of dummy text?

I am not arguing that self-citation isn't the most efficient way to create a hoax text. My argument was that, with my analysis results indicating that the algorithm was probably more complex than this self-citation method, the author must have had a lot of patience/motivation to create a hoax text.

(10-02-2020, 05:12 PM)Alin_J Wrote:
I am not arguing that self-citation isn't the most efficient way to create a hoax text. My argument was that, with my analysis results indicating that the algorithm was probably more complex than this self-citation method, the author must have had a lot of patience/motivation to create a hoax text.

Keep in mind: "We don't argue that the text was created by a computer program and we also don't argue that our program is able to simulate the complexity of human behavior" (Timm & Schinner 2019, p. 15). We have kept the algorithm as simple as possible. Our goal was to demonstrate that even our simple implementation "reproduces the intriguing key properties of the original text, including the presence of long-range correlations, the 'binomial-like' word length distribution, and both of Zipf’s laws" (Timm & Schinner 2019, p. 2). We also say: "Of course, it is possible to pinpoint quantitative differences between the real VMS and the used facsimile text (most likely any facsimile text). An example is the quantitative deviation of the <q>-prefix distribution from the original VMS text" (Timm & Schinner 2019, p. 15).

On the fact that humans, and not an algorithm, created the MS: when Lisa Fagin Davis’ findings on the five hands are published, it would be interesting to see whether one can assign different preferences to the different hands, such as: scribe1 prefers these glyphs, scribe2 prefers those methods of substitution, scribe3 likes to cut things out, scribe4 likes to compose words. If one could model “decision trees” (which the actual scribes would not have followed intentionally) for the different scribes that match the idiosyncrasies of their texts, that would strongly support the theory.

If it was an algorithm, it was executed by humans. If some features turned out to be preferences of the different scribes, I don't think that would tell us much about the writing/encoding/generating system they used.

(07-03-2020, 10:15 PM)MarcoP Wrote: If it was an algorithm, it was executed by humans. If some features turned out to be preferences of the different scribes, I don't think that would tell us much about the writing/encoding/generating system they used.

The notion of an "algorithm" might be misleading. One could think rather of a set of rules for producing text (or, filling a page with symbols) that is applied in different ways by different persons. They don't even have to be aware of those differences, yet an analysis of their respective work might reveal them (and we can come up with an actual algorithm to mimic them). I think this has consequences for the settings we can assume for the production of the manuscript.

(08-03-2020, 12:34 AM)Ben Trovato Wrote: One could think rather of a set of rules for producing text (or, filling a page with symbols) that is applied in different ways by different persons -

With five different scribes, it is very likely that there were rules. These rules had to be transferable from one scribe to another. One would even expect a strict set of rules, but that is at odds with the variations in the text. There was obviously room for variation among the scribes. This could provide information about the nature of the rules. Which flexible systems can be considered for the beginning of the 15th century?


Alin_J

Torsten

MarcoP

RenegadeHealer

Alin_J

Torsten

Ben Trovato

MarcoP

Ben Trovato

bi3mw