The Voynich Ninja

Full Version: Discussion of "A possible generating algorithm of the Voynich manuscript"
(01-09-2019, 11:11 AM)Torsten Wrote: This is exactly the point. EVA is a stroke-based transcription alphabet and therefore it is necessary to parse the strokes into tokens. The question remains how to parse EVA into tokens.

But EVA is glyph-based, not stroke-based. Benched glyphs are interpreted as ligatures because their component glyphs often appear separately. A stroke-based transliteration that would make sense for a study such as yours would be:
EVA-i = \ (1 stroke)
EVA-e = C (1 stroke)
EVA-o = C> or C] or C) (2 strokes)
EVA-a = C\ (2 strokes)
EVA-ch = CT (2 strokes)
EVA-Sh = CT^ (3 strokes)
EVA-cKh = CIPT (4 strokes)
EVA-cTh = CIQT (4 strokes)
etc.

Why would there be a need to tokenize?
Yes, EVA is glyph-based.

The number of strokes for some of the glyphs is debatable. ch might be two strokes or three. It could be long-cee-shape + r-shape or two short-cee-shapes with a bar.

The same goes for the 4 char and the gallows chars. The horizontal stem (the upper stem) might be a separate bar.


If gallows chars were Latin characters, the bar would probably not be counted as a separate bar, it's more like a serif to join one shape to the next or a ligature, but the gallows are probably invented characters (despite their similarity to certain Latin chars) and thus the analogy might not hold for VMS chars, the bar might be its own entity.


Has anyone even done a stroke-based transcript?
(01-09-2019, 04:28 PM)-JKP- Wrote: Has anyone even done a stroke-based transcript?

Philip Neal was working on this but I don't know how far he got.
(01-09-2019, 04:31 PM)ReneZ Wrote:
(01-09-2019, 04:28 PM)-JKP- Wrote: Has anyone even done a stroke-based transcript?

Philip Neal was working on this but I don't know how far he got.


Thanks, René. I was curious about whether someone had tackled it. It's not hard to produce a stroke-based font or to set up the code to take an existing transcript and break it into its basic components (this seems to me the easiest way to do it, rather than hand-generating a transcript). The bigger challenge is deciding which parts are considered "separate" from the VMS designer's point of view and then relating that to the text itself.
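As a rough illustration of that approach, here is a minimal Python sketch that breaks EVA words into the stroke notation proposed earlier in the thread. The table covers only the glyphs listed there. The lowercase keys, the name to_strokes, the choice of "C>" for o, and the decision to leave unlisted glyphs undecomposed are all assumptions for illustration, not an established transcription scheme.

```python
# Stroke values copied from the list proposed earlier in this thread.
# Assumption: the transcript uses lowercase EVA, so keys are lowercased.
# Assumption: "C>" is picked for EVA-o out of the three variants offered.
STROKES = {
    "i": "\\",
    "e": "C",
    "o": "C>",
    "a": "C\\",
    "ch": "CT",
    "sh": "CT^",
    "ckh": "CIPT",
    "cth": "CIQT",
}

# Try the longest EVA sequences first so "ckh" is not read as "c" + "k" + "h".
KEYS_BY_LENGTH = sorted(STROKES, key=len, reverse=True)


def to_strokes(word):
    """Return a list of (eva_unit, strokes) pairs for one EVA word.

    Units without an entry in STROKES are kept with strokes=None,
    because the thread leaves their decomposition open.
    """
    out = []
    pos = 0
    while pos < len(word):
        for key in KEYS_BY_LENGTH:
            if word.startswith(key, pos):
                out.append((key, STROKES[key]))
                pos += len(key)
                break
        else:
            # No table entry matched at this position: keep the raw character.
            out.append((word[pos], None))
            pos += 1
    return out


print(to_strokes("chol"))
```

In this notation each character of a stroke string stands for one stroke, so a stroke count per unit is just the length of that string. Where a unit "really" ends is exactly the open question raised above, and the table would have to change with each answer.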

There's no doubt that there is a curve-line pattern to the glyphs, but whether this is related to meaning, or whether the designer started with common basic shapes like a o i and went from there for simple visual similarity, I don't know. Invented alphabets often have patterns of certain repeated shapes.


And then there is the separate issue of whether the way the glyphs are sequenced is related to shape (something that I've been looking into for a while).
To be precise, EVA is neither a pure stroke transcription nor a pure glyph transcription. *Sigh* It's actually somewhere in the middle, where some component shapes are glyphs and some are strokes.

But even though verbose cipher glyph pairs make sense to my mind, I really struggle to see how such strongly-structured super-common pairs can be explained linguistically. Even though it is so visually tempting to read EVA o and a as vowels (it's why the EVA letters were chosen, after all), the strong pairing is just too mechanical, too unlinguistic to work.

What, then, of Torsten's ligatures? Some, like EVA qo, won't raise many objections. EVA ch and sh were designed in, along with c-gallows-h, so no problem there either. But ar / al / or / ol / am aren't so straightforward: because if you follow these through to their logical conclusion (as I tried to do in Curse), you have to also read o- +g as similar pairs: and from there to y- + gallows. At which point pretty much everything becomes strongly paired.

More generally, the or / ar / ol / al ligature group seems visually inconsistent with the an / ain / aiin / aiiin group. In particular, given that r starts with exactly the same linear downstroke as i, there really should be no stroke harmony adjacency behaviour difference between their usage. But the statistics shout otherwise very loudly. How is it that the hypothetical autocopyist was able to maintain that statistical consistency over so many pages of text, and yet evolve the patterns of usage (from Currier A to B) so subtly?

But even though I can see why it is visually appealing to imagine that EVA -dy can be freely substituted for EVA -dy, it is oddly asymmetric that EVA -dy is almost never substituted for EVA -d. Similarly, if EVA d- starts words with a curve, why do we see EVA dy- so rarely? Even something as apparently obvious as -dy hides a wealth of behaviors that are all linked together.
I'm extremely suspicious of "a" and "o" and have never assumed they are vowels but I've noticed the vast majority of substitution "solutions" do make this assumption (in fact their ideas practically revolve around treating them as vowels).

What is interesting is that almost everyone who offers these substitution solutions tends to treat them as generic vowels (keeping them as vowels, but changing them from a and o to whatever works for their transliteration). I often wonder if this is because they instinctively sense that they don't WORK as substitution for natural languages, but the desire to keep them as vowels causes the solver to mutate them into whatever vowel suits the occasion.


But... if you take every vowel-like shape in the VMS and turn it into a consonant-like shape and vice-versa, what you end up with is something from which substitution solvers could probably wrestle out a transliteration in much the same way as they do now (different words would arise, of course, but it would probably be the same general result).
(01-09-2019, 08:57 PM)nickpelling Wrote: What, then, of Torsten's ligatures? Some, like EVA qo, won't raise many objections. EVA ch and sh were designed in, along with c-gallows-h, so no problem there either. But ar / al / or / ol / am aren't so straightforward: because if you follow these through to their logical conclusion (as I tried to do in Curse), you have to also read o- +g as similar pairs: and from there to y- + gallows. At which point pretty much everything becomes strongly paired.


This only demonstrates that it is an error to generalize observations. Instead it is necessary to check each pattern on its own. To do so it is necessary to look into the details (see [link]) as well as into the statistics (see [link], p. 70). By checking the details it will become obvious that y- + gallows are not used as similar pairs.

(01-09-2019, 08:57 PM)nickpelling Wrote: More generally, the or / ar / ol / al ligature group seems visually inconsistent with the an / ain / aiin / aiiin group. In particular, given that r starts with exactly the same linear downstroke as i, there really should be no stroke harmony adjacency behaviour difference between their usage. But the statistics shout otherwise very loudly.


That one factor is in place doesn't mean that this is the only factor.

The harmony adjacency behavior is a design preference the scribe had. It is not a rule the scribe could switch on and off. Therefore it is expected that we also see his design preferences in the statistics for <ol>, <or>, <ar>, and <al>.

(01-09-2019, 08:57 PM)nickpelling Wrote: How is it that the hypothetical autocopyist was able to maintain that statistical consistency over so many pages of text, and yet evolve the patterns of usage (from Currier A to B) so subtly?


The method for generating the text is the same for all pages (see [link], p. 3ff): "when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail" ([link], p. 3).
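The quoted observation can be checked mechanically on any transcript. The sketch below is one possible reading, not the procedure used in the paper: it assumes "differ in only one detail" means a Levenshtein distance of 1 between EVA strings (so a change inside ch or sh is counted per character), and the function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def top_three_similar(page_words):
    """True if any two of the three most frequent words are one edit apart."""
    top = [word for word, _ in Counter(page_words).most_common(3)]
    return any(edit_distance(a, b) == 1 for a, b in combinations(top, 2))


# Hypothetical word list for one page; the counts are invented, not transcript data.
sample_page = ["daiin", "daiin", "daiin", "dain", "dain", "chol"]
print(top_three_similar(sample_page))
```

Running top_three_similar over every page and taking the share of pages that return True would give a figure to compare with "more than half of the pages". Ties in word frequency make the choice of the top three somewhat arbitrary, so that would need a stated rule.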

Concerning the evolution from Currier A to Currier B please read the explanation in [link] (p. 6) and also the answers I have given [link] and [link].

(01-09-2019, 08:57 PM)nickpelling Wrote: But even though I can see why it is visually appealing to imagine that EVA -dy can be freely substituted for EVA -dy, it is oddly asymmetric that EVA -dy is almost never substituted for EVA -d.


Even if you repeat your oddly asymmetric argument it is still wrong (see [link]). Many things can happen but not all will happen. For example, every participating team can win a championship, but only one team does win.

Even if there are multiple ways to change a source word it is only possible to write one word at a time. This means we only see the variants the writer has written and not the variants he could have written. 

Moreover, glyphs like <d> are rarely used in word final position. Instead glyphs with a tail like <y> or <n> are preferred in this position.
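The claim about word-final position is easy to tabulate from any transcript. A minimal sketch, assuming words are plain EVA strings and treating the last EVA character as the final glyph; final_glyph_shares is a hypothetical helper name and the sample words are only there to show the call.

```python
from collections import Counter


def final_glyph_shares(words):
    """Map each word-final EVA character to its share of all word endings."""
    finals = Counter(word[-1] for word in words if word)
    total = sum(finals.values())
    return {glyph: count / total for glyph, count in finals.items()}


# Hypothetical sample, not transcript data.
print(final_glyph_shares(["daiin", "chedy", "shedy", "chol"]))
```

On a full transcript, comparing the share for d with the shares for y and n would show how strong the preference actually is.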


(01-09-2019, 08:57 PM)nickpelling Wrote: Similarly, if EVA d- starts words with a curve, why do we see EVA dy- so rarely? Even something as apparently obvious as -dy hides a wealth of behaviors that are all linked together.


Currier wrote in 1976 "only a very limited number of letters occur with each other in certain positions of a `word`" ([link]). In other words "the shape of a glyph must be compatible with the shape of the previous one, and is also influenced by its position within a word or a line" ([link], p. 10).
Sorry, but this is all, again, very old stuff, and the discussion is an everlastingly repeating thing, almost like the Voynich text itself...

Torsten, would it be possible for you to generate a 35,000 word text with your method and attach it here? I would like it for statistical analysis.
(19-09-2019, 11:30 AM)Koen G Wrote: Torsten, would it be possible for you to generate a 35,000 word text with your method and attach it here? I would like it for statistical analysis.
I have a file here with 34980 words (3879 lines in the conf.properties of Timm's Generator.)