The Voynich Ninja

Discussion of "A possible generating algorithm of the Voynich manuscript"

I don't know if it can be replaced so easily.
It will depend on the sound. Like, for example,
"f or v," "p, b," "t, d," or "i, j, y," maybe "e as an ä."
But I have to draw the line somewhere.
(28-04-2020, 11:38 PM)Torsten Wrote: With other words it is only possible to replace a glyph with a similar one! For instance it is possible to replace [ch] with [sh].

This is exactly my point.

There is nothing arbitrary about the appearance of similar words. The term 'similar' in this context is misleading: the glyphs can only be called similar because they appear in similar contexts.

r and s look similar but they can only very rarely replace each other.
r and l can, but they don't look all that similar.

The rules are actually quite complicated.

o and a are similar and can replace each other in some contexts.

y looks similar to both, and can replace the other two in even fewer contexts. Near the end of a word, a can be followed by many things, but y cannot.

The rules for changes are manifold and complex, and are observed throughout the MS.

The summary remains that there is no doubt that similar words appear near each other, or on the same page, but there is no evidence that suggests that this is the result of an intentional process to copy words from previous words while making arbitrary changes.
This is pretty much how I feel.

I think Torsten's papers are full of very good observations that should be read by every person who is serious about studying the VMS.

You can count on two fingers the number of people who are trying to explain not just the composition of a token but how to get from one token to the next. But... I can think of other methods that might result in self-similar text that do not rely on autocopying, so I don't necessarily agree with an autocopying conclusion until those other possibilities are eliminated.
The problem with the auto-copy hypothesis is that, in the end, it does not *really* work.

This is difficult to test, because it is not formulated that precisely. There is only the vague notion of 'small changes'. As soon as one makes a more precise definition of what counts as a small change, it becomes testable.

What one can do is take one page of the MS (and one could repeat this for all pages) and check the distribution of Levenshtein distances, to try to come up with a definition that could work.

The hypothesis says that every word is created by taking an earlier word, making a small change to it (or no change), and then writing that word down.

This is not tested in any of the papers I have seen. Instead there are many statistics. Interesting statistics, no doubt about that. But these statistics do not allow one to distinguish between the two options:
- the author deliberately generated words according to the above principle
- the appearance of similar words is a side effect of something else.

So let's do the test. I have taken folio f58r. This has a lot of text in three paragraphs.

For every word, one can make the comparison with all previous words, and find the one with the smallest distance.
Using the most recent ZL transliteration (version 1r, You are not allowed to view links. Register or Login to view.), and ignoring uncertain spaces (*), this has 344 words. Of these, exactly 25% have a minimum L distance of three or more.
If one does not count the words in the first lines of paragraphs (as per Torsten's latest algorithm), there remain 321 words, of which 73 have a minimum L distance of three or more. That is 22.7%.
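
For anyone who wants to repeat this check on other pages, the computation is easy to script. The sketch below is only an illustration of the procedure described above: the file name f58r_words.txt is a placeholder for a plain list of Eva words in reading order, and the distance is plain character-level Levenshtein over Eva.

Code:
# For each word on the page, find the smallest Levenshtein distance to any
# earlier word, then report how many words need 3 or more edits.
# Input (placeholder): "f58r_words.txt", one Eva word per line, reading order.

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertion/deletion/substitution = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

with open("f58r_words.txt") as f:
    words = [w.strip() for w in f if w.strip()]

# The first word has no predecessor, so it is skipped.
min_dists = [min(levenshtein(w, earlier) for earlier in words[:i])
             for i, w in enumerate(words[1:], 1)]

far = sum(d >= 3 for d in min_dists)
print(f"{far} of {len(min_dists)} words ({100 * far / len(min_dists):.1f}%) "
      f"are 3 or more edits away from their closest predecessor")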

Now one may argue whether an L distance of 3 is 'small' or not. I would argue that it is not. We also have to keep in mind that all changes have to follow complicated rules, and the average word token length is around 5. Some examples of L distance 3 changes on f58r:

olchokal   from  olchear
dShor       from  Sholy
ytalody     from  otaly
airaldy      from  arary
olaraly      from  otaly

There are also numerous words, even further down the page, that do not look like any previous word.
There are 22 words with an L distance of 4 or more.
Line 27 has  Sheetchy
Line 33 has  ShocTHhy
Line 37 has  chkaiinolfcham    (L distance 8)

Now it has been argued that the L distance should be computed over glyphs, not over Eva characters, but in almost all of Torsten's papers it has been computed using Eva, and this actually makes no difference in the majority of cases.

ch and sh differ only by one  (ch and Sh)
s and sh differ only by one (s and Sh)
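
For those who want to check the glyph question themselves, the simplest approach is to fold the relevant multi-character Eva groups into single symbols before computing the distance. The sketch below reuses the levenshtein() helper from the earlier sketch; which groups count as one glyph is itself a modelling choice, and the mapping shown here is only an illustrative assumption, not a standard.

Code:
# Fold selected multi-character Eva groups into single placeholder symbols,
# so that the edit distance is counted over "glyphs" rather than raw letters.
# The grouping below is only an example; longer groups must be folded first.
GLYPH_MAP = [("cth", "T"), ("ckh", "K"), ("cph", "P"), ("cfh", "F"),
             ("ch", "C"), ("sh", "S"), ("iin", "N"), ("ain", "A")]

def to_glyphs(word: str) -> str:
    for group, symbol in GLYPH_MAP:
        word = word.replace(group, symbol)
    return word

# ch vs sh differ by one edit either way, as noted above:
print(levenshtein("chedy", "shedy"))                        # raw Eva   -> 1
print(levenshtein(to_glyphs("chedy"), to_glyphs("shedy")))  # as glyphs -> 1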

Note (*): the choice to ignore uncertain spaces is unfavourable to Torsten's hypothesis. Treating these as word spaces instead would have only a minor impact on the result; ignoring them simply happened to be how the file I used was set up.
Having looked at this in some detail, the impact is only small: there will still be a large fraction of words with a significant L distance.
(01-05-2020, 07:07 PM)ReneZ Wrote: There is nothing arbitrary about the appearance of similar words. The term 'similar' in this context is misleading: the glyphs can only be called similar because they appear in similar contexts.

The changes are not as simple as you suggest and they are for sure not arbitrary. For instance we write that "the shape of a glyph must be compatible with the shape of the previous one and is also influenced by its position within a word or a line" (You are not allowed to view links. Register or Login to view.). This also implies that the changes are context dependent (see also You are not allowed to view links. Register or Login to view.).


(01-05-2020, 07:07 PM)ReneZ Wrote: The summary remains that there is no doubt that similar words appear near each other, or on the same page, but there is no evidence that suggests that this is the result of an intentional process to copy words from previous words while making arbitrary changes.

This is the evidence we present in our paper: the self-citation method "is executable without additional tools even by a medieval scribe and reproduce the statistical key properties of the Voynich manuscript; in particular, both of Zipf’s laws are fulfilled" (see Timm & Schinner 2020, p. 1). This is because the self-citation method is a recursive process, and long-range correlations are also to be expected as a result of a recursive process. The self-citation method is thus not only able to reproduce the statistical key properties of the Voynich manuscript but also to explain the long-range correlations that have been uncovered and discussed by several researchers over the last decade (see Timm & Schinner 2020, p. 6).
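
For readers who want to see what the Zipf check means in practice: the first law says that token frequency falls off roughly as a power of the frequency rank, while the second concerns how many distinct words occur exactly n times. A rough sketch of the first check follows (Python with numpy; the input file is a placeholder for any word list, and a least-squares slope in log-log space is only a crude diagnostic, not the analysis used in the paper).

Code:
# Crude check of Zipf's first law (frequency ~ rank^-s) on a word list.
# Placeholder input: "vms_words.txt", one word per line.
from collections import Counter
import numpy as np

with open("vms_words.txt") as f:
    tokens = [w.strip() for w in f if w.strip()]

freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1, dtype=float)

# Fit log(freq) = intercept + slope * log(rank); slope near -1 is "Zipfian".
slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"fitted Zipf exponent: {-slope:.2f} (the first law predicts a value near 1)")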

One of the counterarguments against my paper from 2014 was that, as a result of the self-citation method, each word should be similar to at least one other word. The network of similar words demonstrates that this is indeed the case (see Timm & Schinner 2020, p. 6). In other words, the existence of the network of similar words was predicted by the self-citation method. This outcome also illustrates that well-reasoned criticism can be helpful to validate and improve a theory.


(02-05-2020, 08:10 AM)ReneZ Wrote: The hypothesis says that every word is created by taking an earlier word, making a small change to it (or no change), and then writing that word down.

The hypothesis says "replacing one or more glyphs with similar ones" (Timm & Schinner 2020, p. 9). There is nothing said about small changes.

I have already rejected your 'small changes' argument You are not allowed to view links. Register or Login to view. and You are not allowed to view links. Register or Login to view..


(02-05-2020, 08:10 AM)ReneZ Wrote: This is not tested in any of the papers I have seen.

See the section You are not allowed to view links. Register or Login to view. "Graphs for individual pages" on "You are not allowed to view links. Register or Login to view.".  There you will find the network graph for folio You are not allowed to view links. Register or Login to view. and the Gephi project for You are not allowed to view links. Register or Login to view.. It is possible to use Gephi to filter the core network and all word types connected to each other.

The core network with ED=1 for page You are not allowed to view links. Register or Login to view. contains 133 out of 284 word types (= 47%). They represent 205 tokens out of 367 (= 56%).
187 out of 284 word types are connected (ED=1) to at least one other word type (= 66%). They represent 269 tokens out of 367 (= 73%).
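
These counts can be reproduced for any page once a word list is available: build a graph whose nodes are the word types of the page, connect two types whenever their edit distance is 1, and then look at the degrees and the components. The sketch below (Python with networkx, reusing the levenshtein() helper and the words list from the sketch earlier in this thread) reads 'core network' as the largest connected component, which is an assumption that may not match the paper's exact definition.

Code:
# ED=1 similarity network over the word types of one page.
# "Core network" is read here as the largest connected component (assumption).
from collections import Counter
from itertools import combinations
import networkx as nx

token_counts = Counter(words)              # `words` from the earlier sketch
G = nx.Graph()
G.add_nodes_from(token_counts)
for a, b in combinations(token_counts, 2):
    if abs(len(a) - len(b)) <= 1 and levenshtein(a, b) == 1:
        G.add_edge(a, b)

def report(label, types):
    tokens = sum(token_counts[t] for t in types)
    print(f"{label}: {len(types)}/{len(token_counts)} types, "
          f"{tokens}/{sum(token_counts.values())} tokens")

report("connected to at least one other type", [t for t in G if G.degree(t) > 0])
report("largest connected component", max(nx.connected_components(G), key=len))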


(02-05-2020, 08:10 AM)ReneZ Wrote: You are not allowed to view links. Register or Login to view.There are also numerous words, even further down the page, that do not look like any previous word.
There are 22 words with an L distance of 4 or more.
Line 27 has  Sheetchy
Line 33 has  ShocTHhy
Line 37 has  chkaiinolfcham    (L distance 8)

These words don't exist on folio f58r. You probably mean [sheeetchy], [schocthhy] and the sequence [chkaiin olfcham] (see You are not allowed to view links. Register or Login to view.).

[sheeetchy] and [schocthhy] occur only once in the VMS. There are a few similar words on folio f58r. The source word for [schocthhy] is probably [sheeetchy], and the source word for [sheeetchy] is probably [cheekey] in line 25. These words probably go back to the word [shopchy] in line 1 (see You are not allowed to view links. Register or Login to view.).

Line 1    [shopchy]
Line 10  [chepchey]   sh->ch, o->e, ch->che                                  ED=3
Line 25  [cheekey]     che->chee, p->k, che->e                              ED=3
Line 27  [sheeetchy]  ch->sh, ee->eee, e->ch                                ED=3
Line 32a  [schocthhy] prefix [s] added, sh->ch, e->o, etchy->cthhy  ED=4

The source word for [chkaiin] is probably a word like [skain] or [chaiin]. The word [olfcham] probably combines two source words like [ol] and [chal] or [ol] and [cham] by using rule 3 "Combine two source words to create a new word" (see You are not allowed to view links. Register or Login to view.).

Your example demonstrates that changing more than two glyphs is not uncommon and that it would be a mistake to expect only 'small changes'.
I'll play lawnchair academic for a few minutes and tell you how I'd go about testing the autocopy hypothesis. I'd put an ad on college campuses pretending to be a props and costumes designer for a movie set in medieval Europe or a fantasy world like it, and looking for arts students who wanted to make a bit of side money helping create a realistic-looking mysterious codex shown in close-up. An exclusion criterion would be any familiarity with the VMs.

Teams of five students would be assembled and given a replica of the VMs with all the images but no text, just faintly indicated boxes where text should go. They'd get taught the glyph set and how to write each VMs glyph with a quill pen, given some examples of ways to combine these to make vords, and given a simple explanation and demonstration of how to use the autocopy method to generate lines of vords. They'd then take turns generating fake text until the codex was full.

After collecting the work of five such teams from different colleges who had no knowledge of each other and finished the project while following directions, each codex would be transcribed and parsed statistically, with the same metrics for the original VMs and for the output of Torsten Timm's app. If the VMs differs significantly from these human autocopied specimens on at least one metric that the autocopy hypothesis purports to explain, I'd deem the autocopy hypothesis falsified.

Is there any historical precedent for a large amount of pseudo-text generated by analog methods that would qualify as variations of the autocopy method? If yes, how do these pseudo-texts break down statistically? If no, why not? Torsten Timm has consistently said that his autocopy method is one of the few practical pre-modern ways to generate large amounts of gibberish. At the beginning of his first paper he also hints (quite rightly) at some plausible motivations for a medieval person to pull such a scam. If both the method and the motivation are rather common, or at least intuitive, then there should be historical examples of such a thing happening, even if only fragments, or none, of the pseudo-text itself survives. Such a find would bolster the autocopy hypothesis immensely.

I would be very interested to hear what an expert on the history of asemic writing would have to say about the VMs. Also any of the people who design fake languages for fantasy worlds for a living. I would fully expect that both would conclude the VMs was meaningless gibberish; when you hammer for a living, everything looks like a nail. The value would be in the small details they noticed as being similar, or dissimilar, to the material they typically work with or create.
Klingon  Wink
(02-05-2020, 10:40 PM)Torsten Wrote: The hypothesis says "replacing one or more glyphs with similar ones" (Timm & Schinner 2020, p. 9). There is nothing said about small changes.


Torsten, you are quibbling about words, and you are contradicting yourself.

You write:

Quote:Line 25  [cheekey]     che->chee, p->k, che->e                              ED=3


This is not 'replacing glyphs with similar ones'. It is applying small changes (L distance 1) as you have been writing for years.

Whether certain words exist on any page is a question of the accuracy of the transliteration used. No two transliterations are the same, and the differences may be errors in either one. I know only too well that there are many in my own, but there are also many in the Takeshi transliteration.
As I wrote in an earlier post today, the similarity between the two is 97.5% at the glyph level. The similarity in identifying word spaces is much lower.

I am sure that my figure of 22.7% of all changes requiring 3 or more 'steps' is correct, and for me this invalidates the hypothesis. Others may judge differently.

Renegadehealer wrote:

Quote:Torsten Timm has consistently said that his autocopy method is one of the few practical pre-modern ways to generate large amounts of gibberish.

Are you sure it is practical? Have you looked at the details of the algorithm inside the app?
This algorithm is what the medieval scribe would have done.
I have not looked into it, but I understand that Nick Pelling has, and he commented that it has quite a number of very specific choices.

In my opinion this concept is modern and would not fit a medieval mind. The best a medieval person could do would be to throw a die, and this makes the process slow.
(02-05-2020, 10:40 PM)Torsten Wrote: These words don't exist on folio f58r. You probably mean [sheeetchy], [schocthhy]

Indeed, these were typos in my post. I should have used copy/paste Rolleyes
(03-05-2020, 08:30 AM)ReneZ Wrote: Are you sure it is practical? Have you looked at the details of the algorithm inside the app?
This algorithm is what the medieval scribe would have done.
I have not looked into it, but I understand that Nick Pelling has, and he commented that it has quite a number of very specific choices.

In my opinion this concept is modern and would not fit a medieval mind. The best a medieval person could do would be to throw a die, and this makes the process slow.

Fair point, Rene. I did read both of TT's papers, but I must admit I glossed over the appendices with the details of the algorithm. I have no experience coding (unless you count a little bit of BASIC on an Apple ][e back in 1988  Big Grin ), but flowcharts and sets of logical steps aren't completely over my head. I'll have another look at that algorithm, and maybe even try doing it for a page or two if my kids and patients aren't bugging me.

Torsten, you have a general point that I very much agree with: Generating pseudo-language is a known and predictable human behavior. There are some rather entertaining (and often borderline offensive) examples on YouTube and Reddit of strangers being asked, "Give me your best imitation of someone speaking [foreign language you don't understand]". Adriano Celentano's song You are not allowed to view links. Register or Login to view. is probably the most famous example of pseudo-English — listening to this song as a native English speaker is a bizarre experience: I should understand this but I don't.

Koen had a thread where he invited people to compose pages of pseudo-text. I didn't meet the target text length before I ran out of time to finish it, but I must admit, it wasn't that hard to do. It wasn't ridiculously easy either; I had to concentrate to make sure that my text had just the right amount of repetitiveness. Too many or too few unique words, and my text wouldn't look like real language. I did find myself automatically looking back at previous lines I'd composed, looking for old words to repeat, perhaps with slight variations in the spelling. Was this enough to qualify my method as true "autocopying"? I'm attaching the text file to this post for your statistical enjoyment, since I never posted it then.

I'm going to do some research to see if any neuroscientists have approached the question, "What behavioral patterns, and neurological pathways, are involved in the spontaneous production of fake language?" And how are those behaviors evidenced in the properties of the gibberish produced? By answering these questions, it should be possible to design a statistical analysis tool that can tell the difference between real (but not understood) and fake human language, with a fairly good degree of confidence. If textual analysis can tell with a high degree of confidence whether the same author wrote two different pages (You are not allowed to view links. Register or Login to view., for example), I can imagine that looking for the patterns seen in fake language, but not typically in real language, would be very doable. I sardonically predict that any such algorithm run on the VMs will return wholly inconclusive and ambiguous results: "50% confidence that the VMs is gibberish" or something unhelpful like that.

The problem I see is that different people's idiosyncrasies and life experiences will cause them to favor different micro-patterns in the fake language they compose. One would have to analyze many people's long pseudo-text compositions in order to tease out which patterns are common to nearly all human-composed fake language. And it seems, from what Rene is saying, that there's an apples-to-oranges equivocation in the debate about the autocopy theory, at least as currently formulated. The algorithm described is highly specific and nuanced, the way computer programs are and need to be. But the computer program is supposed to be a model for an analog process that occurred inside the head of a medieval human being, probably somewhat unconsciously. The debate inevitably goes something like this:
  • Attacker: "A medieval person could not have formulated or followed those directions."
  • Defender: "Those directions are just a loose guide. The scribe had a lot of freedom to interpret the directions, and didn't necessarily interpret them exactly that way."
Fair enough. But what is the sine qua non of the autocopy process, then? What general behaviors must be present, in some way, for the process to count as true autocopying? And then, what evidence do those core behaviors leave in the resulting pseudo-text that distinguishes it from pseudo-text (or real text) generated a wholly different way? Autocopying defined too broadly becomes meaningless, no pun intended. Autocopying loses utility as a concept, becoming dangerously close to a truism, when it can be stretched to cover pretty much any not-understood pattern that looks like language and contains internal repeating patterns.

Rene, I'm also inclined to think that if the VMs is meaningless and was stochastically generated, its writer(s) used dice, spinning volvelles, or drawing lots to generate the vords. Possibly in combination with autocopying, since analog random event generation is slow. But how much of each? For what parts? And how would we know?