The Voynich Ninja

Full Version: Discussion of "A possible generating algorithm of the Voynich manuscript"
(29-08-2019, 04:49 PM)nablator Wrote: There is no test that can identify meaninglessness in general. Trying an infinite number of known and unknown codes, ciphers and steganography methods with all possible combinations and variants, keys, parameters, on an infinite number of meaningful texts is impossible.

An infinite number of methods would usually not apply in any given context - unless we one day receive something from the infinity of the outside world.
(29-08-2019, 07:18 AM)nickpelling Wrote: Torsten: re-reading your 2019 article and looking at your source code, you term certain high-frequency pairs of glyphs (such as ‘ol’ and ‘dy’) as ‘ligatures’, and yet treat them in the same way as ‘o’ and ‘l’ (i.e. individual glyphs).

* Why was it necessary for you to treat these ‘ligatures’ in a special way in your paper?


The answer to your first question is given in a blogpost [link]:
In this blogpost you have pointed out that it is not easy to transcribe the VMS. There are different transcription alphabets available and none of them is perfect. One transcription alphabet is EVA. Since it is stroke based, it is necessary to parse EVA. You also argue that EVA isn't final and that it would be an error to treat EVA as an exact representation of Voynichese (see [link]).

As an example of parsing a word in EVA you suggest that 'qokeedy' should be parsed into [qo][k][ee][dy] (see [link]). To parse EVA you obviously define tokens like 'qo', 'ee', or 'dy'. Nothing else happens in the text generator. The program also defines tokens, and it even parses EVA 'qokeedy' in the same way as you do. The only difference is that you explain rare words as copying errors, whereas I explain the whole text as generated by modifying copied words.
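A minimal sketch of this kind of greedy longest-match parsing (the token list below is illustrative only, not the actual table from the generator's source code):

Code:
# Minimal sketch: greedy longest-match parsing of an EVA string into tokens.
# The token list is illustrative only, not the actual table from the source code.
TOKENS = ["qo", "ee", "dy", "ch", "sh", "o", "k", "e", "d", "y"]

def parse_eva(word):
    """Split an EVA word into tokens, always preferring the longest match."""
    result, i = [], 0
    while i < len(word):
        for token in sorted(TOKENS, key=len, reverse=True):
            if word.startswith(token, i):
                result.append(token)
                i += len(token)
                break
        else:  # no token matched: keep the single stroke as-is
            result.append(word[i])
            i += 1
    return result

print(parse_eva("qokeedy"))  # ['qo', 'k', 'ee', 'dy']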

The answer to your question is therefore that a computer-readable representation of the VMS text was necessary and we have chosen EVA for this purpose. In our eyes EVA is the best representation available today. But since EVA is stroke based, it is necessary to parse these strokes into ligatures or tokens.


(29-08-2019, 07:18 AM)nickpelling Wrote: They are not consistently physically joined on the page, so are not actual ‘ligatures’.


There remains the question of whether stroke groups should be called ligatures. We had already used the term token to distinguish between word types and word tokens; therefore we used the word 'ligature' here. Some of the stroke groups do look like ligatures. This is for instance the case for EVA 'ch', 'sh', 'ckh', 'cth', 'cph', and 'cfh'. But since 'ch' consists of two 'e'-strokes connected by a dash, it would also be possible to interpret 'ch' as two 'e'-characters. In the same way it would be possible to interpret groups like 'chh', 'ckhh' or 'cckhh'. The goal behind EVA was to represent such groups. I would therefore agree that a definition of the term 'ligature' would be helpful.

(29-08-2019, 07:18 AM)nickpelling Wrote: * Why would the original person doing the auto-copying have treated them in a special way?


Your question suggests that it is possible to distinguish between normal and special behavior. I would argue that there is no such difference. For each common word there is at least one other word differing from it by only a single quill stroke. For example, in addition to the word 'daiin', the words 'dain' and 'daiiin' are also present in the text. This means the script used to write the VMS relies on similarly shaped glyphs, and the shape of a glyph is therefore important (see Timm 2015, p. 36).
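This "single quill stroke" claim can be checked mechanically over any transliteration word list. A rough sketch, treating one EVA character as one stroke (a simplification, since one EVA letter does not always equal one stroke):

Code:
# Sketch: find word pairs in a transliteration word list that differ by a
# single character - a rough proxy for "a single quill stroke" in EVA.
def neighbors(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Yield every string one insertion, deletion or substitution away."""
    for i in range(len(word) + 1):
        for c in alphabet:
            yield word[:i] + c + word[i:]        # insertion
    for i in range(len(word)):
        yield word[:i] + word[i + 1:]            # deletion
        for c in alphabet:
            yield word[:i] + c + word[i + 1:]    # substitution

words = {"daiin", "dain", "daiiin", "chol", "chor"}  # toy word list
for w in sorted(words):
    print(w, "->", sorted(n for n in set(neighbors(w)) if n in words and n != w))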

(29-08-2019, 07:18 AM)nickpelling Wrote: * Why are there so many manually added substitution cases in your code on github?


It is possible to generate other words which exist in the VMS by replacing similarly shaped glyphs (see Timm 2015, p. 5). This is in my eyes the main idea behind the VMS, and it is the main idea behind the text generation method used by the self-citation text generator. The substitution cases define which glyphs or tokens are similar to each other. There is no other way to make the information about similarly shaped glyphs available. Even if someone were to write an OCR program for the VMS, it would have to use some definition of how to handle shape differences.
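A minimal sketch of what such a substitution table looks like in practice; the glyph pairs below are invented for illustration, the actual cases are defined in the github source:

Code:
# Sketch: a table of "similarly shaped" glyphs and a function deriving variant
# words from a source word. The pairs listed here are invented for illustration;
# the real substitution cases are defined in the generator's github source.
SIMILAR = {"o": ["a"], "a": ["o"], "r": ["l"], "l": ["r"], "k": ["t"], "t": ["k"]}

def variants(word):
    """Return all words reachable by substituting one similar glyph."""
    out = set()
    for i, g in enumerate(word):
        for s in SIMILAR.get(g, []):
            out.add(word[:i] + s + word[i + 1:])
    return out

print(sorted(variants("okar")))  # ['akar', 'okal', 'okor', 'otar']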

Your other two questions only serve to convey your personal opinion. Even if we disagree, it should be possible to discuss the arguments based on facts.
Ok, I am now attempting to make my point a bit clearer against the brick wall of "it's meaningless".

Consider a thought experiment, of the kind Einstein used; see for example [link]. You should definitely read at least the beginning first.

"Thought experiments are usually rhetorical. No particular answer can or should be found. The purpose is to encourage speculation, logical thinking and to change paradigms."

My concern is that by dismissing the possibility of the VMS text being meaningful, you remove all possibilities that exist for our text.
By doing so, and by focusing on the text being meaningless, we will *not* find anything. In fact you will find exactly this: nothing.

You can do good research by starting with an open mind, keeping all options open, and reporting your dry findings.
Then, when you are at a point of no return and there is really no other conclusion possible, you can start drawing conclusions.
But I really do not see that here: there is research and there are conclusions, but the two are a whole gap apart.

Furthermore, having "A possible generating algorithm of the Voynich manuscript" shows that you succeeded in creating some sort of system,
but it does not justify the conclusion that the text is meaningless, so that conclusion cannot be drawn from the system.
And perhaps that conclusion even makes your system, in the context of that conclusion, false!
Torsten: my proposed explanation for tightly bound pairs of letters begins with a specific, historically attested cipher mechanism - verbose cipher. This is a starting point that requires pairs of shapes to be locked together.

Your autocopying hypothesis, however, has no requirement for letters to be locked together. You have formulated a secondary explanation for this based on stroke harmony adjacency rules: which strokes can follow which other strokes determines, in your secondary system, what can be substituted for what.

Except... these secondary rules only work in those places where you want them to work - i.e. in transitions/replacements you think are 'justified' (because they appear in Voynichese), but not in places where you don't want them to work (because these don't / rarely appear in Voynichese).

An example: if o can happily be followed by r 2723 times, what is stopping other shapes from going in between them?

or 2723
oar 85
olr 14
odr 5
oer 1
okr 1

Similarly for o and l:

ol 5488
odl 20
ool 19
okl 15
otl 9
orl 7

Similarly for d and y:

dy 6696
dchy 61
dey 15
dshy 12
dly 7
dky 5
dpy 4
doy 3
day 2

Only some of these glyph insertions are 'prohibited' by the stroke harmony adjacency rules; at the same time, to my eyes, many seem completely possible, yet have very low instance counts in the actual manuscript.

My conclusion is therefore that your (well, Elias Schwertfeger's) stroke harmony rules give a very incomplete account of the letter-to-letter statistics we actually see, and hence don't offer a workable explanation in and of themselves for why certain glyph pairs are so tightly bound that they resist insertions so vigorously. For isn't the idea of autocopying based around shape insertions and removals?
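Counts like the tallies above are easy to reproduce mechanically, for anyone who wants to check other pairs. A minimal sketch, assuming a plain-text EVA transliteration in a file (the filename here is hypothetical):

Code:
# Sketch: tally a tightly bound pair against every single-glyph insertion
# between its halves, as in the or/oar/olr/... counts above. The filename is
# hypothetical; any plain-text EVA transliteration would do.
import re
from collections import Counter

text = open("voynich_eva.txt", encoding="utf-8").read()  # hypothetical file

def insertion_counts(left, right):
    counts = Counter({left + right: len(re.findall(left + right, text))})
    for mid, n in Counter(re.findall(left + "(.)" + right, text)).items():
        counts[left + mid + right] = n
    return counts

for pair, n in insertion_counts("o", "r").most_common():
    print(pair, n)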
(30-08-2019, 02:15 PM)nickpelling Wrote: Torsten: my proposed explanation for tightly bound pairs of letters begins with a specific, historically attested cipher mechanism - verbose cipher. This is a starting point that requires pairs of shapes to be locked together.

Your autocopying hypothesis, however, has no requirement for letters to be locked together. ...

This is the main difference between Torsten's description of the text and my perception of the text, and I'm glad you brought it up, Nick, but I still like what Torsten is doing, regardless of whether I agree with an autocopying hypothesis.

I have a different idea for why proximate tokens are so similar, so I don't think there's only one possible explanation. I think there are a couple (I can't think of many, but I can think of a couple), but it takes more than a quick forum post to explain them.
I suspect that deletions are also a tad problematic. Why would a hypothetical autocopyist often replace word-final -dy with word-final -y (which surely to anyone's eyes amounts to a deletion) but almost never with word-final -d? There's no obvious secondary stroke harmony restriction here, so I guess this constraint must be an additional (and arbitrary) rule that limits what a hypothetical autocopyist can do.
Additions, too: for even though a word-final -y can be replaced with a word-final -dy in the autocopying schema, presumably there's some other additional rule that prohibits word-initial d- from being replaced by word-initial dy-?
The github source has yet other special cases for page-initial gallows, line-initial letters, and line-final letters (e.g. -m). So from my perspective, the autocopying hypothesis doesn't travel particularly light: it needs a lot of assumptive baggage to help it shape its output closer to what we actually see.
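To make the point concrete, here is what such position-dependent constraints look like once written out explicitly. This is only a sketch encoding the examples from the last few posts, not the actual rules in the github source:

Code:
# Sketch: position-dependent replacement constraints written out explicitly.
# These encode the examples from the posts above, not the actual rules in the
# generator's github source.
def replacement_allowed(old, new, position):
    """Decide whether `old` may be replaced by `new` at a given word position."""
    if position == "word-final":
        if old == "dy" and new == "y":    # -dy -> -y: frequent
            return True
        if old == "dy" and new == "d":    # -dy -> -d: effectively never
            return False
        if old == "y" and new == "dy":    # -y -> -dy: allowed
            return True
    if position == "word-initial":
        if old == "d" and new == "dy":    # d- -> dy-: prohibited
            return False
    return False  # anything not listed needs its own special case

print(replacement_allowed("dy", "y", "word-final"))    # True
print(replacement_allowed("d", "dy", "word-initial"))  # False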
(30-08-2019, 10:14 PM)-JKP- Wrote: This is the main difference between Torsten's description of the text and my perception of the text, and I'm glad you brought it up, Nick, but I still like what Torsten is doing, regardless of whether I agree with an autocopying hypothesis.

I also kind of like what Torsten is doing, but at the same time I can't help but suspect that he has significantly oversold his autocopying hypothesis. In my opinion, the only reason that his generator produces even moderately Voynich-like output is that he has had to add in a large number of special cases and conditioning constraints, for which he has no obvious extrinsic justification beyond 'they seem to make things work better' (e.g. his 'ligatures' seem to me to fall exactly in this category).
(30-08-2019, 02:15 PM)nickpelling Wrote: Torsten: my proposed explanation for tightly bound pairs of letters begins with a specific, historically attested cipher mechanism - verbose cipher. This is a starting point that requires pairs of shapes to be locked together.


Nick: What do you mean by letters? You know very well that EVA is stroke based (see [link] and [link]). The purpose of EVA is also to represent words like 'coy', 'dcheocy', 'qokhy', 'kccky', and 'chckhhhy'. By using one symbol for 'ch' it is simply not possible to transliterate such words. That a stroke-based alphabet is needed to represent the text already says something about the VMS.

(30-08-2019, 02:15 PM)nickpelling Wrote: Your autocopying hypothesis, however, has no requirement for letters to be locked together. ...


The idea behind the self-citation method is that "the scribe generated the text by copying and varying words from previously written sections" (Timm & Schinner 2019, p. 10). Therefore the rules to modify a word or a sequence of glyphs become important. Self-citation is a dynamic process. Moreover, the process is recursive: a word generated by self-citation is at the same time the result of the text generation method and a possible source for generating new words.

What you write about the self-citation method doesn't fit the description we have given in our paper. We clearly say that there is no rule to insert glyphs: "The rules to modify a source word normally don't affect the order of the glyphs" (Timm & Schinner 2019, p. 10).

Sometimes it looks like a glyph was added. But in most cases only an already existing element was duplicated. For instance the "ligature <ch> consists of two <e>-glyphs connected by a dash. Since <e> can appear repeatedly, it is also possible to add an additional <e>-glyph, leading to words like <cheol> and <sheol>" (Timm & Schinner 2019, p. 9).

There is one counterexample: the introduction of <chedy> on page [link]. The word before <chedy> is <tchey>. To introduce <chedy> it was only necessary to repeat <chey> and to add a <d> before <y>. But <chedy> fits very well with other words in Currier A: it is similar to <chey> as well as to <cheody>. Therefore <chedy> was used more and more frequently and Currier A changed into Currier B (see Timm & Schinner 2019, p. 6). See also the details for <chey> and <chedy> at [link].
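For readers following along, a minimal sketch of the recursive copy-and-modify loop described above, with placeholder seed words and modification rules (the real tables are in the paper and the github source):

Code:
# Minimal sketch of the recursive self-citation loop: each new word is a
# modified copy of a word already written, and immediately becomes a possible
# source itself. Seeds and modification rules are placeholders, not the
# paper's actual tables.
import random

SIMILAR = {"o": "a", "a": "o", "r": "l", "l": "r"}

def modify(word):
    """Apply one change: substitute a similar glyph or duplicate an existing one."""
    i = random.randrange(len(word))
    if word[i] in SIMILAR and random.random() < 0.5:
        return word[:i] + SIMILAR[word[i]] + word[i + 1:]
    return word[:i] + word[i] + word[i:]  # duplicate the glyph at position i

def self_citation(seeds, n):
    text = list(seeds)
    while len(text) < n:
        text.append(modify(random.choice(text)))  # new words feed back in
    return text

print(self_citation(["daiin", "chol"], 12))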
Torsten: EVA is a stroke-based transcription alphabet, nothing more. It was designed to help communication between researchers who are exploring different groupings of strokes into tokens, e.g. for the kind of edge cases you cite. It was never designed to be interpreted as a decryptive or a linguistic alphabet. Please stay on topic.

I shall have to re-read your 2019 paper more closely, since my understanding of your autocopying hypothesis is clearly based on the previous iterations you posted in previous years.
(31-08-2019, 08:40 AM)nickpelling Wrote: Torsten: EVA is a stroke-based transcription alphabet, nothing more. It was designed to help communication between researchers who are exploring different groupings of strokes into tokens, e.g. for the kind of edge cases you cite. It was never designed to be interpreted as a decryptive or a linguistic alphabet. Please stay on topic.


This is exactly the point. EVA is a stroke-based transcription alphabet, and therefore it is necessary to parse the strokes into tokens. The question remains how to parse EVA into tokens.

Following Currier, the glyphs used to write the VMS are mostly based on a curve or on a line as the first quill stroke: "We have the fact that you can make up almost any of the other letters out of these two symbols <i> and <e>." ([link]). Moreover, for each common word there is at least one other word differing from it by only a single quill stroke. For example, in addition to the word <daiin>, the words <dain> and <daiiin> are also present in the text (Timm & Schinner 2019, p. 3).

One criterion for parsing EVA into tokens is repeated <i> and <e> strokes. For instance, Currier describes a series with <in>, <iin>, <iiin>, <il>, <iil> etc. Another criterion is connected strokes: Currier describes <ch>, <sh>, <cth>, <ckh>, <cph>, <cfh>, but stroke sequences like <qo>, <ikh>, <cck> or <ckhhh> also belong in this category. A third criterion is groups that are used in the same way as <in>, <iin>, <iiin> and also represent a cycle. One example of such a cycle are the tokens <ol>, <or>, <al>, and <ar>.

This means the parsing of tokens is based on observations, and it is possible to describe objective criteria for them. Therefore it shouldn't be a surprise that we both parse <qokeedy> into [qo][k][ee][dy] (see [link]).
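These three criteria can be stated as explicit token classes. A rough sketch, with illustrative class membership only (see the paper for the full lists):

Code:
# Sketch: the three parsing criteria above as explicit token classes.
# Class membership is illustrative; see the paper for the full lists.
import re

REPEATED = re.compile(r"i+[nlrm]|e+")                                     # 1. repeated strokes
CONNECTED = ["cth", "ckh", "cph", "cfh", "ikh", "cck", "ch", "sh", "qo"]  # 2. connected strokes
CYCLE = ["ol", "or", "al", "ar"]                                          # 3. cycles

def classify(token):
    if token in CONNECTED:
        return "connected strokes"
    if token in CYCLE:
        return "cycle"
    if REPEATED.fullmatch(token):
        return "repeated strokes"
    return "single glyph"

for t in ["iin", "ch", "ol", "k"]:
    print(t, classify(t))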