The Voynich Ninja
[Conference] Voynich conference - Some questions - Printable Version

+--- Thread: [Conference] Voynich conference - Some questions (/thread-3894.html)

Voynich conference - Some questions - Torsten - 11-11-2022

Since I cannot attend the Voynich conference, I am publishing my questions about some of the papers in this thread. I will also send the questions to the authors via email. I suggest that this thread not be used to discuss the questions, at least until the conference is over.

Claire Bowern and Daniel Gaskell - Enciphered after all? Word-level text metrics are compatible with some types of encipherment.

In her paper from 2021, Claire Bowern explains the differences in word frequencies in the VMS by a) two different methods of encoding at least one natural language, b) different scribes, and c) different topics.
Do you suggest that these three interpretations are all true, or that they contradict each other?

In the paper by Sterneck et al. you warn that "topic modeling relies on word frequencies and expects consistency across texts" [Sterneck et al. 2021, p. 4]. However, the Voynich text is not consistent across its sections. Looking at the text itself, it becomes evident that "no obvious rule can be deduced which words form the top-frequency tokens at a specific location, since a word type dominating one page might be rare or missing on the next one." [Timm & Schinner 2019, p. 3].
Why do you assume that different word frequencies indicate topics if noticeable frequency changes occur even between folios?

Tokens containing the sequence 'ed' are common in Currier B and an exception in Currier A.
If you assume that the word frequency changes are caused by different topics, how do you explain the differences between folios sharing the same type of illustrations, such as Herbal A and Herbal B?

In Timm & Schinner 2019 the Voynich text is analyzed beyond the paragraph level. The paper concludes that reordering the sections with respect to the frequency of the token <chedy> replaces the seemingly irregular mixture of two separate languages with the gradual evolution of a single system from "state A" to "state B".
Why is this alternative explanation not addressed?

You argue that some codes could increase the predictability of word formations.
Have you tested whether some of the codes would result in text with statistical properties similar to the Voynich text?

In your paper from 2021 you argue that full reduplication is still in the realm of plausibility for natural language text. The paper states that the number of full word repeats goes up to 4.8 % for natural languages.
Given that you now describe the text as extremely predictable (whatever that means), do you still stand behind your statement about full word repeats in natural languages?

Jürgen Hermes - Polygraphia III: The cipher that pretends to be an artificial language.

The Voynich text changes from page to page: a token dominating one page might be rare or missing on the next one. The words even depend on their position within a page or line.
How do you explain that the words depend on the page if, as you say, the words were randomly selected from a code book?

How do you explain that words containing /ed/ like <chedy> are far more common in Currier B than in Currier A?

Kevin Farrugia, Colin Layfield and Lonneke van der Plas: Demystifying the Scribes behind the Voynich Manuscript using Computational Linguistic Techniques.

An alternative model is a gradual evolution of a single system from "state A" to "state B", namely by reordering the sections with respect to the frequency of the word <chedy> [see Timm & Schinner 2019].
Why was such an alternative model not used to cross-check the results?

Andrew Caruana, Colin Layfield and John Abela - An Analysis of the Relationship between Words within the Voynich Manuscript.

You mention the fact that skewed word pairs exist in the Voynich text.
How many of the skewed word pairs concatenate into an existing word, e.g. <ol> and <chedy> into <olchedy>?
How many of the skewed word pairs consist of similar words like <chol>/<shol>, <daiin>/<dain> or <chedy>/<shedy>?
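Both counts could be estimated mechanically. A minimal Python sketch of the two checks; the mini-vocabulary below is a hypothetical stand-in, and a real run would use all word types from a full transliteration:

```python
def differ_by_one(w1, w2):
    """True if w1 and w2 differ by a single substitution, insertion,
    or deletion (e.g. <chol>/<shol> or <daiin>/<dain>)."""
    if w1 == w2 or abs(len(w1) - len(w2)) > 1:
        return False
    if len(w1) == len(w2):  # single substitution
        return sum(a != b for a, b in zip(w1, w2)) == 1
    short, long_ = sorted((w1, w2), key=len)  # single insertion/deletion
    return any(short == long_[:i] + long_[i + 1:] for i in range(len(long_)))

def concatenates(w1, w2, vocab):
    """True if the concatenation of a word pair is itself a known word."""
    return w1 + w2 in vocab

# Hypothetical mini-vocabulary for illustration only:
vocab = {"ol", "chedy", "olchedy", "chol", "shol", "daiin", "dain", "shedy"}
```

For example, `differ_by_one("chol", "shol")` and `concatenates("ol", "chedy", vocab)` both return `True` with this toy vocabulary.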

Massimiliano Zattera - A new transliteration alphabet brings new evidence of word structure and multiple "languages" in the Voynich manuscript.

Only a very limited number of letters co-occur in certain positions within a word. For instance, EVA-q is followed by EVA-o in 97.5 % of cases, and EVA-n occurs after EVA-i in 97.4 % of cases. A common idea is therefore to interpret glyph sequences like /qo/ and /iin/ as ligatures or single letters. But even then the resulting glyph set is very predictable: a group of EVA-i occurs after EVA-a in 94 % of cases, and a sequence /qo/ is followed by a gallows glyph in 84 % of cases.
Doesn't this behavior suggest that these restrictions are a feature of the Voynich text rather than an artifact of the transliteration alphabet?
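The quoted percentages are simple conditional bigram frequencies and can be recomputed from any transliteration. A minimal sketch; the input string below is a made-up toy, not real Voynich text:

```python
def follow_rate(glyphs, first, second):
    """Fraction of occurrences of `first` that are immediately followed
    by `second` (e.g. how often EVA-q is followed by EVA-o)."""
    pairs = list(zip(glyphs, glyphs[1:]))
    total = sum(1 for a, _ in pairs if a == first)
    if total == 0:
        return 0.0
    return sum(1 for a, b in pairs if a == first and b == second) / total

# Toy stand-in for an EVA transliteration, with word separators stripped:
text = "qokeedy.qokeey.qokaiin.qotedy".replace(".", "")
print(follow_rate(text, "q", "o"))
```

A real measurement would run this over a complete transliteration file, which is where the 97.5 % and 84 % figures would come from.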

Lisa Fagin Davis - Voynich Paleography

At the BSA Annual Meeting in 2020 you used the fact that "The very common character combination qo is almost completely absent in the zodiac pages and the rosettes page, but appears everywhere else" from René Zandbergen's website as a second method to cross-check your identification of scribe 4.
Why did you not mention this fact in your paper written later in 2020? Why did you instead announce that you would ask Prof. Claire Bowern to search for a pattern you already knew?

In your publications from 2020 you claim that you used the software Archetype to identify the five different scribes.
However, your screenshot of Archetype shows only 44 (43 + 1) pages uploaded into Archetype. How did you identify the scribes for the other 180 pages without uploading them into Archetype?

In your paper from 2020 you argue that EVA-k is sometimes written in one stroke and sometimes in two strokes, and that only scribes 2 and 4 wrote EVA-k with two strokes.
How do you explain instances of EVA-k where an overlapping crossbar or a gap indicates that it was written in two strokes by your scribes 1, 3, and 5?

Applying Latin paleography to the Voynich manuscript assumes scribes with some experience in writing that script. However, the text in the VMS is the only known example of its kind and represents a unique writing system. It is therefore possible that the writing system was used only to write the text we see in the Voynich manuscript.
Isn't it therefore possible that the scribes were inexperienced in writing Voynichese at the start? Shouldn't we check for a scribe writing slowly and carefully at the start and becoming more fluent over the course of writing?

RE: Voynich conference - Some questions - cbowern - 18-11-2022

A quick note that these questions (at least the ones directed at Daniel Gaskell and me) don't actually have anything to do with the papers we are presenting at the conference, which investigate the types of character distributions that occur in gibberish and in different patterns of (more and less plausible) enciphered natural language. That is, our papers are agnostic about what Voynichese is (hence "Enciphered after all" vs "Gibberish after all" as titles). Our aim is to establish comparanda, not to advance a particular a priori hypothesis. Too much Voynich study consists of forming an opinion and picking data to support it. More generally, we're looking at the weight of evidence for/against contrasting hypotheses.

Hoping the questions for all papers during the conference are on point and in the spirit of collective inquiry.

RE: Voynich conference - Some questions - Torsten - 18-11-2022

My questions are about the character distributions in the Voynich manuscript, such as the fact that a token dominating one page might be rare or missing on the next one, the statistical differences between Herbal in Currier A and Herbal in Currier B, and the differences between Currier A and Currier B in general. What do you want to compare the character distributions with, if not with character distributions typical for Voynich text? Even if we disagree over a hypothesis, it should be possible to discuss the types of character distributions that occur in the Voynich text.

RE: Voynich conference - Some questions - hermesj - 24-11-2022

Dear Torsten,
I think I chose a misleading term with "randomly selected". After all, a true random process was also difficult to realise in the Middle Ages. What I mean is that the writer can choose any cipher from a number of different words to replace a single letter. This human choice is difficult to simulate (Even more difficult than the autocopying process, which you managed well of course).
Did the codebook consist only of a series of loose pages? Were there only a few of them on the table at any time of the writing process? Were they (partly) replaced for every new page / folio / quire? Were some used over and over again (in the same order), etc.? From these possible selection processes, I think one can also explain the differences between Currier A and B, by simply selecting other (in the Trithemian system columns of) cipher words. 
It is still important for me to say that I am not claiming that the methodology I put into play is actually responsible for the Voynich Manuscript. My point is only to discuss whether it is merely improbable or impossible.

RE: Voynich conference - Some questions - Torsten - 25-11-2022

Dear Jürgen,

thank you for your response. Essentially, the "Polygraphia" hypothesis is based on the assumption that each token of the Voynich manuscript (VMS) encrypts a single letter of an underlying clear text. To decipher the text, the Voynichese words must be correctly arranged in a table with 26 rows (corresponding to the 26 letters of the Latin alphabet) and an unknown number of columns. This means the encryption is homophonic, i.e., a particular ciphertext word always decrypts to the same letter (but not vice versa). This way the encoded text depends on the plain text as well as on the code book. (Side note: Wouldn't this already make it hard to explain the observed patterns in line positioning, with the line as a functional unit? Isn't the idea behind the cipher to prevent tells like frequently used words by using all the words in each row equally?)

To illustrate what I mean by my statement that the text changes from page to page, the following list contains the top-frequency words for pages f103r-f105v:

f103r qokeey  (26), shedy (18),   shey (15),  chedy (11), qokeedy (10) ... chey (7) daiin (2)
f103v shedy   (15), shey  (14),     ol (11), qokeey (11),    chey (10) ... daiin (8) chedy (6)
f104r qokaiin  (9), ar    (8),    okar  (7),   aiin  (7),     chol (6) ... daiin (5), chedy (5), chey (4), qokeey (1), shedy (1)
f104v chedy   (10), aiin  (10),  cheey  (7),     ol  (6),   qokeey (5) ... daiin (4), chey (4), shedy (1)
f105r chedy    (8), al     (7),     ar  (6),     or  (5),    daiin (5) ... chey  (3), qokeey (1), shedy (1)
f105v aiin    (14), daiin (13), otaiin (11),     ar  (7),       al (6) ... chedy (3), chey (1)

The top-frequency words for the pages f103r and f103v are <qokeey> and <shedy>. Both types are rarely used on f104r, f104v, and f105r, and missing on f105v. However, even though the top-frequency words change from page to page, all six pages have some words in common. For instance, <daiin>, the most frequent word of the VMS, occurs on every one of the 6 pages, but only on f105v is it among the top 3 words. Another word occurring on each of these 6 pages is <chey>.
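Per-page counts like the list above can be reproduced with a few lines of Python. A minimal sketch; the token lists below are hypothetical placeholders, and a real run would parse a transliteration file and group tokens by folio:

```python
from collections import Counter

# Hypothetical placeholder tokens for two folios, for illustration only:
pages = {
    "f103r": ["qokeey", "shedy", "qokeey", "shey", "chedy", "qokeey"],
    "f105v": ["aiin", "daiin", "otaiin", "aiin", "daiin", "aiin"],
}

for folio, tokens in pages.items():
    # Counter.most_common(n) returns the n highest-frequency words per page.
    print(folio, Counter(tokens).most_common(3))
```

Running this per folio over a full transliteration would make the page-to-page turnover of top-frequency words directly visible.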

If the code tables were indeed replaced for every page, shouldn't we expect a different set of code words along with these changes, and not only frequency changes? Maybe we could explain this observation by assuming that the code table was only partly replaced, or that the words were chosen differently from the same code table. But how could such a code table result in the observed binomial word-length distribution for types as well as for tokens? For the sake of the argument, let's assume that the word-length distribution within each code row was indeed binomial. Why are words of average length used more frequently in the VMS? The scribe could choose any word, and to reduce his effort I would expect him to prefer shorter words, not words of average length.

To the observation that words containing /ed/ are more common in Currier B than in Currier A: the following table lists the number of tokens and types containing the glyph combination "ed".

             "ed"-tokens       total  most frequent         "ed"-types      total    most frequent
            count    in %      tokens    "ed"-type         count   in %     types        type
Herbal A       12     0.2%      8,087        1                12   0.5%      2499    403  daiin
Pharma A       17     0.7%      2,529        3 cheedy         15   1.4%      1113     99  daiin
Astro          28     1.3%      2,136        1                28   4.5%       620     12  daiin
Cosmo         257     9.5%      2,691       24 chedy         121  10.0%      1213     56   aiin
Herbal B      528    16.3%      3,233       62 chedy         182  14.4%      1263     73     or
Recipes B    2073    19.4%     10,673      190 chedy         517  16.7%      3093    193   aiin
Bio B        1925    27.8%      6,911      247 shedy         323  20.9%      1546    247  shedy

The table shows that word types containing "ed" also exist in Currier A. There are only a few "ed"-types, and they are all rarely used. In Currier B, on the other hand, word types containing "ed" are not only common; some of them are also frequently used. The type <chedy> occurs only twice in Currier A but represents the third most frequent word type in the VMS. The frequency counts thus confirm the general principle that high-frequency tokens also tend to have high numbers of similar words [see Timm & Schinner 2019, p. 6]. This observation indicates that a word like <chedy> was not introduced and then immediately used frequently. Instead, as <chedy> is used more and more often, the frequency of similar words like <shedy> or <qokeedy> increases as well. But shouldn't we expect from the "Polygraphia" hypothesis that code words depend only on the encoded text and the code book, and should therefore occur independently of each other?
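The token percentages in the table correspond to a straightforward count. A minimal sketch; the two token lists are hypothetical samples standing in for a Currier A and a Currier B section:

```python
def ed_share(tokens, seq="ed"):
    """Count tokens containing `seq` and their percentage of all tokens."""
    hits = sum(1 for t in tokens if seq in t)
    pct = 100.0 * hits / len(tokens) if tokens else 0.0
    return hits, pct

# Hypothetical samples, for illustration only:
currier_a = ["daiin", "chol", "chor", "shol", "daiin", "cthy"]
currier_b = ["chedy", "shedy", "qokeedy", "qokaiin", "chedy", "ol"]
print(ed_share(currier_a))  # expect very few "ed" tokens
print(ed_share(currier_b))  # expect a much larger share
```

Run per section over a real transliteration, this reproduces the "ed"-token columns of the table.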

This behavior arises because the VMS text is self-similar. There are differences between Currier A and B; in the same way, there are differences from quire to quire, from page to page, and also from paragraph to paragraph. Within a paragraph there are the paragraph-initial gallows glyphs "p" and "f", line-initial glyphs (e.g. "d" and "s"), and line-final glyphs ("m"). This repeats on the word level, with glyphs preferred in word-initial and word-final position ("q" vs. "y").