Voynich text generator - Printable Version

RE: Voynich text generator - Sam G - 27-02-2016

(27-02-2016, 12:12 PM)Torsten Wrote: @Sam: The VMS is not as homogeneous as you suggest. With the pages in Currier A in mind, the pages in Currier B would also look strange.

But your results don't match either group. For instance, in B-language texts we tend to see lots of the words chey-cheey-chedy, in A-language we find chol-chor-chy, and in certain parts of both sections we find cheol-cheor-cheody (along with the variants of all these words that begin with sh). This is a greatly oversimplified description of how these words are distributed in the VMS, but it seems that words from this class are common in all portions of the actual text, whereas they are very rare in your automatically generated text.

Quote:René Zandbergen has given a set of features on his website (see [link]):
- The first character of each paragraph is one of a very small subset of characters.
- The first character of each line does not have the same frequency distribution as the rest of the text.
- With very few exceptions, the characters p and f only occur in the top lines of paragraphs.
- The second order character entropy is anomalously low.
- Words in the MS tend to follow certain word patterns, i.e. there are some weak positional rules, and fairly strong rules about character combinations.
- There are almost no repeating phrases.
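The second-order entropy item in this list can be measured on any sample text. The following is a minimal sketch, assuming the text is available as a plain string in some transliteration. The function name `conditional_entropy` is an illustrative choice and is not taken from the app or from Zandbergen's site.

```python
import math
from collections import Counter

def conditional_entropy(text):
    """Entropy in bits of a character given the previous one, H(X2 | X1)."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent character pairs
    firsts = Counter(text[:-1])            # counts of the first character of each pair
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        p_pair = n / total                 # P(a, b)
        p_cond = n / firsts[a]             # P(b | a)
        h -= p_pair * math.log2(p_cond)
    return h

# Usage: compare a generated sample against a transcription sample.
# The string here is only a placeholder.
print(conditional_entropy("daiin chedy qokeedy daiin chol"))
```

A lower value means the next character is more predictable from the current one, which is the sense in which the list calls the VMS value anomalously low.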

With my app I only want to demonstrate that it is possible to generate a text with such features using an algorithm that simulates my auto-copy hypothesis.

Okay, but I think "words almost never begin with <e>" is also an important property of the text, among many other things.

Quote:The main problem was to model human ingenuity in the algorithm while keeping the algorithm as simple as possible. Therefore the algorithm only produces a pseudo text with features similar to the VMS.

If you can only reproduce some features of the VMS, but not all of them (including features that are found throughout the text), then I don't see how you can seriously claim that your method was used. Reproducing a few of its properties would seem pretty trivial. Any arbitrary text chosen at random could probably be said to possess some properties of the VMS text.

Quote:If I would generate the same text as in the VMS Emma would be right with her objection that my algorithm only reproduces the movements of the scribe.

It should at least blend in with the others. Your text is clearly way different even at a glance, and I'm sure any statistical test would also demonstrate this.

Another thing is that your method requires some "seed" text to begin with, which you are taking from the VMS itself, but where does your VMS creator get his original text from?

RE: Voynich text generator - Torsten - 27-02-2016

@sam

> If you can only reproduce some features of the VMS, but not all of them

What does 'all' mean? With 'all' as the definition, you can always add a new feature to your 'all' list as long as the algorithm doesn't produce the same text as the VMS. Yes, you can argue that the algorithm doesn't produce the same text. But all I want to do is demonstrate that with the auto-copy hypothesis it is possible to generate a text with similar features.

> Okay, but I think "words almost never begin with <e>" is also an important property of the text.

In the transcription such words exist:
[link]
It would be an interesting question whether all these words are transcription problems or not. Anyway, it would be no big deal to add a rule to the generator that prevents 'e' at the beginning of a word. But the question was whether it is possible to generate a text with features similar to those of the VMS. Do you really think such a rule makes a big difference to the answer?
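A rule of the kind mentioned here could look like the sketch below. Both function names are hypothetical and nothing is known here about how the app is structured internally. The first function measures how often words begin with 'e', and the second drops such candidates before they are written.

```python
def starts_with_e_ratio(words):
    """Fraction of words that begin with 'e' (0.0 for an empty list)."""
    words = list(words)
    if not words:
        return 0.0
    return sum(w.startswith("e") for w in words) / len(words)

def reject_initial_e(candidates):
    """Hypothetical generator rule: discard candidate words that begin with 'e'."""
    return [w for w in candidates if not w.startswith("e")]

# Usage with made-up candidates:
print(reject_initial_e(["chedy", "eody", "daiin"]))
```

Measuring the ratio on both the generated text and a transcription would show whether the rule closes the gap that is being discussed.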

> Another thing is that your method requires some "seed" text to begin with, which you are taking from the VMS itself, but where does your VMS creator get his original text from?

Using the script already requires that the scribe had practiced writing in it before he started the VMS. All the characters used in the script had to be defined before he could start writing in it. Therefore, I think the scribe started on a page that was later not included in the VMS. My guess would be that he started with a single word, "daiin".

RE: Voynich text generator - Sam G - 27-02-2016

(27-02-2016, 04:52 PM)Torsten Wrote: @sam

> If you can only reproduce some features of the VMS, but not all of them

What does 'all' mean? With 'all' as the definition, you can always add a new feature to your 'all' list as long as the algorithm doesn't produce the same text as the VMS. Yes, you can argue that the algorithm doesn't produce the same text.

Properties found in all sections of the text would be a reasonable definition to start with, I think. Then maybe differing sets of rules to get A language text vs. B language, etc.

Quote:But all I want to do is demonstrate that with the auto-copy hypothesis it is possible to generate a text with similar features.

Alright, well I don't think it demonstrates a whole lot. You basically have a system for modifying the VMS that preserves some but not all of its properties. It certainly does not show or even suggest that the VMS was produced by this method. What does it prove?

Quote:> Okay, but I think "words almost never begin with <e>" is also an important property of the text.

In the transcription such words exist:
[link]
It would be an interesting question whether all these words are transcription problems or not.

They're not all transcription errors, but some of them are, and in any case such words are very rare in the VMS but very common in your text.

Quote:Anyway, it would be no big deal to add a rule to the generator that prevents 'e' at the beginning of a word. But the question was whether it is possible to generate a text with features similar to those of the VMS. Do you really think such a rule makes a big difference to the answer?

Well, it would be one more similar feature if "words almost never begin with <e>" were a true statement about your text. If you were trying to show that your method was in fact used to create the VMS, then it seems like you would want to match as many features as possible, but I guess you're not actually trying to show that. I actually don't understand what you're trying to do.

Quote:> Another thing is that your method requires some "seed" text to begin with, which you are taking from the VMS itself, but where does your VMS creator get his original text from?

Using the script already requires that the scribe had practiced writing in it before he started the VMS. All the characters used in the script had to be defined before he could start writing in it. Therefore, I think the scribe started on a page that was later not included in the VMS. My guess would be that he started with a single word, "daiin".

Well then it seems like text generated earlier should have different properties than that generated by later iterations of this process, right? Have you attempted to show this for the VMS, or otherwise show which portions were used to generate which other portions, i.e. put them in order?

RE: Voynich text generator - Torsten - 27-02-2016

Hello Sam

> What does it prove?

To determine statistical ratios (entropy, random walk etc.) I need some sample text. I could write some sample text on my own. The question is whether you would trust me if I showed you some text and said it was generated this or that way. With the app I can give you a starting combination, and you can verify the sample text yourself.

> Well then it seems like text generated earlier should have different properties than that generated by later iterations of this process, right?

This is a well-known fact for the VMS. Currier described two languages in 1976 (see [link]).

Reddy and Knight "show the proportion of words in each page that are classified as the B language" in figure 2 (see Figure 2 in [link]).
[Image: Figure2.png]

Figure 2 suggests that if you reordered the pages of the VMS according to the proportion of words in the B language, you would indeed recover the initial order of the pages.
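The reordering described here amounts to sorting pages by a single number. A minimal sketch follows. The page labels and proportions are invented for illustration and are not values from Reddy and Knight, and `order_by_b_proportion` is a hypothetical name.

```python
def order_by_b_proportion(pages):
    """Return page labels sorted by their share of B-language words, lowest first."""
    return sorted(pages, key=lambda label: pages[label])

# Hypothetical proportions, for illustration only.
example = {"page_x": 0.80, "page_y": 0.05, "page_z": 0.40}
print(order_by_b_proportion(example))
```

Whether the resulting order reflects the order of writing is exactly the claim under discussion. The sort only makes the proposed ordering explicit.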

Montemurro and Zanette describe a "network of relationships between the sections" of the VMS (see Figure 4A in [link]).
[Image: Figure4a.jpg]

Their result suggests that you should reorder the sections as follows: Herbal in Currier A, Pharmaceutical section, Astronomical section, Cosmological section, Herbal in Currier B, Stars and Biological section.

Let me illustrate this with the most frequent word in Currier B, 'chedy'. This word is missing from pages in Currier A (see [link]).

But the last word in f1r.P1.1 is 'sholdy' (see [link]).

The first step is that the 'l' was removed from 'sholdy', resulting in 'shody' and 'chody' (see [link]).

The next step is that an 'e' was added in front of the 'o', resulting in 'sheody' and 'cheody' (see [link]).

The next step is that 'eo' was replaced with 'ee', resulting in 'sheedy' and 'cheedy' (see [link]).

The last step is to replace 'ee' with 'e', which results in 'shedy' and 'chedy' (see [link]).
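The four steps above can be written out as string operations. This is only a sketch of the chain as described in this post, not code from the app, and `derive_chain` is a hypothetical name. Each step is applied to the first matching position in the word.

```python
def derive_chain(word):
    """Apply the four described changes in order and return every intermediate form."""
    steps = [
        lambda w: w.replace("l", "", 1),     # remove 'l':          sholdy -> shody
        lambda w: w.replace("o", "eo", 1),   # add 'e' before 'o':  shody  -> sheody
        lambda w: w.replace("eo", "ee", 1),  # replace 'eo' by 'ee': sheody -> sheedy
        lambda w: w.replace("ee", "e", 1),   # replace 'ee' by 'e':  sheedy -> shedy
    ]
    chain = [word]
    for step in steps:
        chain.append(step(chain[-1]))
    return chain

print(derive_chain("sholdy"))
print(derive_chain("choldy"))
```

Starting from 'sholdy' this yields the 'sh' forms listed above, and starting from 'choldy' it yields the corresponding 'ch' forms ending in 'chedy'.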

Since 'chedy' is the most frequent word in Currier B, the bigram 'ed' is typical for Currier B. 'chedy' does not occur on pages in Currier A because the scribe did not yet know that he would later write 'ed' instead of 'od'. This is why the change from 'od' to 'ed' is a good marker for distinguishing Currier A pages from Currier B pages.
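Using 'od' versus 'ed' as a marker can be expressed as a simple count per page. The sketch below assumes a page is given as a list of transliterated words. The function names are hypothetical, and no threshold for calling a page A or B is implied.

```python
def count_containing(words, fragment):
    """Number of words that contain the given fragment."""
    return sum(fragment in w for w in words)

def ed_share(words, ed="ed", od="od"):
    """Share of 'ed' words among words containing 'ed' or 'od'; None if neither occurs."""
    n_ed = count_containing(words, ed)
    n_od = count_containing(words, od)
    total = n_ed + n_od
    return n_ed / total if total else None

# Usage with made-up word lists:
print(ed_share(["chedy", "qokedy", "shody", "chol"]))
print(ed_share(["chol", "chor", "daiin"]))
```

The fragments are parameters so that a comparison such as 'edy' against 'ody', which comes up later in the thread, needs no other change.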

RE: Voynich text generator - Sam G - 28-02-2016

(27-02-2016, 08:10 PM)Torsten Wrote: To determine statistical ratios (entropy, random walk etc.) I need some sample text. I could write some sample text on my own. The question is whether you would trust me if I showed you some text and said it was generated this or that way.

Probably.

Quote:> Well then it seems like text generated earlier should have different properties than that generated by later iterations of this process, right?

This is a well-known fact for the VMS. Currier described two languages in 1976 (see [link]).

Reddy and Knight "show the proportion of words in each page that are classified as the B language" in figure 2 (see Figure 2 in [link]).

Figure 2 suggests that if you reordered the pages of the VMS according to the proportion of words in the B language, you would indeed recover the initial order of the pages.

Montemurro and Zanette describe a "network of relationships between the sections" of the VMS (see Figure 4A in [link]).

Their result suggests that you should reorder the sections as follows: Herbal in Currier A, Pharmaceutical section, Astronomical section, Cosmological section, Herbal in Currier B, Stars and Biological section.

I can see how that makes sense, especially with Bio having less in common with Herbal A than the Stars section does. It's not how the pages are presently ordered and, I think, cannot be how they were intended to be ordered (Herbal A must have been first and Stars last), though you could argue that they needn't have been written in that order.

Quote:Let me illustrate this with the most frequent word in Currier B, 'chedy'. This word is missing from pages in Currier A (see [link]).

I think it's funny that you're using <chedy> as an example after I just pointed out its rarity in your sample text, but anyway...

Quote:But the last word in f1r.P1.1 is 'sholdy' (see [link]).

The first step is that the 'l' was removed from 'sholdy', resulting in 'shody' and 'chody' (see [link]).

The next step is that an 'e' was added in front of the 'o', resulting in 'sheody' and 'cheody' (see [link]).

The next step is that 'eo' was replaced with 'ee', resulting in 'sheedy' and 'cheedy' (see [link]).

The last step is to replace 'ee' with 'e', which results in 'shedy' and 'chedy' (see [link]).

Since 'chedy' is the most frequent word in Currier B, the bigram 'ed' is typical for Currier B. 'chedy' does not occur on pages in Currier A because the scribe did not yet know that he would later write 'ed' instead of 'od'. This is why the change from 'od' to 'ed' is a good marker for distinguishing Currier A pages from Currier B pages.

No, the word <chedy> alone cannot account for the distribution of <ed>, if that's what you're saying. There are many other common words that contain <ed>, such as <okedy>, <okeedy>, <qokedy>, <qokeedy>, etc., as well as rare words that contain <ed>, and these words are also very rare or entirely absent in the A language text. Also, <ed> and <od> coexist in most of the B language text, and <od> only drops out in the Bio section (if I recall correctly). It's difficult to see how your text copying method could have produced such a distribution unless your scribe was consciously thinking about these particular pairs of letters and deliberately adjusting how often they were used. Otherwise, as I understand your method, all these words should have been treated independently and we shouldn't see any correlation like this.

RE: Voynich text generator - Torsten - 28-02-2016

Quote:I think it's funny that you're using <chedy> as an example after I just pointed out its rarity in your sample text, but anyway...

The purpose of the app is to demonstrate that it is possible to write a pseudo text with such a simple method and that the method can result in features similar to the VMS. It's a misunderstanding to believe that the method will always result in the same text. You can generate as many different sample texts with the app as you like. Some of these texts will have more in common with the VMS and some of them less.

Quote:No, the word <chedy> alone cannot account for the distribution of <ed>, if that's what you're saying. There are many other common words that contain <ed>, such as <okedy>, <okeedy>, <qokedy>, <qokeedy>, etc.

These words are just copies of 'chedy'. A prefix like 'o' added to 'chedy' or 'cheedy' will in most cases result in 'okedy' or 'okeedy' (see the grid in [link], p. 78, or pages with many 'qokeedy' words such as [link] and [link]).

In the same way a prefix 'o' added to 'daiin' can result in 'okaiin', 'o' added to 'chol' can result in 'okol', 'l' added to 'chedy' can result in 'lkedy' etc.
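The examples in the two paragraphs above share a pattern: when a prefix is attached, a leading 'ch' or 'd' is replaced by 'k'. The sketch below encodes that pattern. It is a rule inferred from these few examples only, the name `add_prefix` is hypothetical, and the posts say the change happens "most times" rather than always.

```python
def add_prefix(prefix, word):
    """Attach a prefix, turning a leading 'ch' or 'd' into 'k' (rule inferred from the examples)."""
    for head in ("ch", "d"):
        if word.startswith(head):
            return prefix + "k" + word[len(head):]
    return prefix + word  # no known leading group: simply concatenate

for prefix, word in [("o", "chedy"), ("o", "cheedy"), ("o", "daiin"), ("o", "chol"), ("l", "chedy")]:
    print(prefix, word, add_prefix(prefix, word))
```

A real generator would presumably apply such a change with some probability instead of deterministically, but that detail is not specified in the thread.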

Quote:Also we have <ed> and <od> coexisting in most of the B language text

Indeed, since the pages in Currier B were written after the pages in Currier A, the scribe was also able to use the Currier A forms there. This is a hint that the pages in Currier B were written after the pages in Currier A.

Quote: Otherwise, as I understand your method, all these words should have been treated independently

The words in the VMS are related to each other. "Similarly spelled word types occur with predictable frequencies. They occur with comparable frequency, whereas types which contain less frequent glyphs or bigrams in most cases occur less frequently" (see [link], p. 2).

"An example for a path between 'daiin' and 'ol' is 'daiin' - 'dain' - 'dan' - 'dar' - 'ar' - 'al' - 'ol'. An example for a path between 'daiin' and 'chedy' is 'daiin' - 'chdaiin' - 'chedaiin' - 'chedan' - 'chedan' - 'cheda' - 'chedy'" (see [link], p. 7). Since the words are copied from each other, the 'da' in 'daiin' is still part of 'chedy' as 'dy' (note: 'a' at the end of a word in most cases changes into 'y'). This is the reason for the rigid word structure of the VMS.
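One way to make the idea of a path concrete is to measure the edit distance between neighbouring words on it. The sketch below uses plain Levenshtein distance, which is a choice made here for illustration and is not the similarity measure used in the quoted paper. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]

def step_distances(path):
    """Edit distance between each pair of neighbouring words on a path."""
    return [edit_distance(x, y) for x, y in zip(path, path[1:])]

path = ["daiin", "dain", "dan", "dar", "ar", "al", "ol"]
print(step_distances(path))
```

For the first quoted path, from 'daiin' to 'ol', every step is a single-character change under this measure.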

Or see the chapter 'Affinities between Words and Sections of the Text' in the Montemurro and Zanette paper [link].

RE: Voynich text generator - Sam G - 01-03-2016

(28-02-2016, 08:46 PM)Torsten Wrote:
Quote:Also we have <ed> and <od> coexisting in most of the B language text

Indeed, since the pages in Currier B were written after the pages in Currier A, the scribe was also able to use the Currier A forms there. This is a hint that the pages in Currier B were written after the pages in Currier A.

So then why does <od> finally drop out almost entirely in the Bio section?

[link]

Is the VMS author consciously thinking about eliminating it?

RE: Voynich text generator - Torsten - 01-03-2016

Quote:So then why does <od> finally drop out almost entirely in the Bio section?

'od' can also occur as the result of a prefix 'o' in words like 'odain', 'odol' etc. Therefore I would compare the usage of 'ody' and 'edy':

[link]

There is a switch from 'od' to 'ed'. If my auto-copying hypothesis is correct, this effect was boosted by the copying process. Maybe the reason for this switch was that it is easier to write 'e' than 'o'. The "Bio" section also contains more repetitions than the other sections. To me it seems that the scribe was working less carefully here.

In my eyes the "Bio" section was the last section written by the scribe of the VMS. Maybe he was less motivated while writing these pages. After writing so many pages this would be understandable. But we can't know for sure about his motivation.

RE: Voynich text generator - Sam G - 01-03-2016

(01-03-2016, 02:29 PM)Torsten Wrote: There is a switch from 'od' to 'ed'. If my auto-copying hypothesis is correct, this effect was boosted by the copying process. Maybe the reason for this switch was that it is easier to write 'e' than 'o'. The "Bio" section also contains more repetitions than the other sections. To me it seems that the scribe was working less carefully here.

Well, there are still plenty of instances of <o> in the Bio section, just not <od>, so this explanation does not really make sense to me.

RE: Voynich text generator - Torsten - 01-03-2016

Quote:Well, there are still plenty of instances of <o> in the Bio section, just not <od>, so this explanation does not really make sense to me.

We can't know for sure what his motivation was. Maybe there was no motivation and it's just a switch.

Anyway, the usage of 'o' and 'e' is different for Currier A pages and Currier B pages:
[link]

The reason for this difference is the shift from 'chol' in Currier A to 'chedy' in Currier B.

Pages in Currier A prefer words similar to 'chol':
[link]

Pages in Currier B prefer words similar to 'chedy':
[link]