Sam G > 27-02-2016, 02:32 PM
(27-02-2016, 12:12 PM)Torsten Wrote: @Sam: The VMS is not as homogeneous as you suggest. With the pages in Currier A in mind, the pages in Currier B would also look strange.
Quote:René Zandbergen has given a set of features on his website:
- The first character of each paragraph is one of a very small subset of characters.
- The first character of each line does not have the same frequency distribution as the rest of the text.
- With very few exceptions, the characters p and f only occur in the top lines of paragraphs.
- The second order character entropy is anomalously low.
- Words in the MS tend to follow certain word patterns, i.e. there are some weak positional rules, and fairly strong rules about character combinations.
- There are almost no repeating phrases.
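One of the listed features, the low second-order character entropy, can be estimated with a few lines of Python. This is only a rough sketch: it assumes that "second-order entropy" means the conditional entropy of a character given the previous one, and the function names are made up for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(text):
    """Estimate H(next char | previous char) in bits from bigram counts.

    Uses H(X2 | X1) = H(X1, X2) - H(X1), where X1 ranges over every
    character that has a successor in the text.
    """
    pairs = Counter(zip(text, text[1:]))   # adjacent character pairs
    firsts = Counter(text[:-1])            # first members of those pairs
    return entropy(pairs) - entropy(firsts)
```

A fully predictable string such as "abababab" gives a value of zero, while a string in which some character has more than one possible successor gives a positive value. Real measurements would need a full transcription and a decision on how to treat spaces and line breaks.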
With my app I only want to demonstrate that it is possible to generate a text with such features using an algorithm that simulates my auto-copy hypothesis.
Quote:The main problem was to model human ingenuity in the algorithm while keeping the algorithm as simple as possible. Therefore the algorithm only produces a pseudo-text with features similar to those of the VMS.
Quote:If I generated the same text as in the VMS, Emma would be right with her objection that my algorithm only reproduces the movements of the scribe.
Torsten > 27-02-2016, 04:52 PM
Sam G > 27-02-2016, 05:44 PM
(27-02-2016, 04:52 PM)Torsten Wrote: @sam
> If you can only reproduce some features of the VMS, but not all of them
What does 'all' mean? With 'all' as the definition, you can always add a new feature to your 'all' list as long as the algorithm doesn't produce the same text as in the VMS. Yes, you can argue that the algorithm doesn't produce the same text.
Quote:But all I want to do is to demonstrate that with the auto-copy hypothesis it is possible to generate a text with similar features.
Quote:> Okay, but I think "words almost never begin with <e>" is also an important property of the text.
In the transcription such words exist.
It would be an interesting question whether all these words are transcription problems or not.
Quote:Anyway, it would be no big deal to add a rule to the generator that prevents 'e' at the beginning of a word. But the question was whether it is possible to generate a text with features similar to those of the VMS. Do you really think that such a rule makes a big difference for the answer to this question?
Quote:> Another thing is that your method requires some "seed" text to begin with, which you are taking from the VMS itself, but where does your VMS creator get his original text from?
The use of the script already requires that the scribe had practiced writing in that script before he started with the VMS. It was necessary to define all the characters used in the script before starting to write in it. Therefore, I think that the scribe started on a page that was later not included in the VMS. My guess would be that he started with a single word, "daiin".
Torsten > 27-02-2016, 08:10 PM
Sam G > 28-02-2016, 06:31 PM
(27-02-2016, 08:10 PM)Torsten Wrote: To determine statistical ratios (entropy, random walk etc.) I need some sample text. I could write some sample text on my own. The question is whether you would trust me if I showed you some text and said it was generated this or that way.
Quote:> Well then it seems like text generated earlier should have different properties than that generated by later iterations of this process, right?
This is a well-known fact for the VMS. Currier described two languages in 1976.
Reddy and Knight "show the proportion of words in each page that are classified as the B language" in their Figure 2.
Figure 2 suggests that if you reordered the pages of the VMS according to the proportion of words in the B language, you would indeed reveal the initial order of the pages.
Montemurro and Zanette describe a "network of relationships between the sections" of the VMS (see their Figure 4A).
Their result suggests that you should reorder the sections in the following order: Herbal in Currier A, Pharmaceutical section, Astronomical section, Cosmological section, Herbal in Currier B, Stars and Biological section.
Quote:Let me describe it with the most frequent word in Currier B as an example. This word is 'chedy'. It is missing from pages in Currier A.
Quote:But the last word in f1r.P1.1 is 'sholdy'.
The first step is that the 'l' was removed from 'sholdy', resulting in 'shody' and 'chody'.
The next step is that an 'e' was added in front of the 'o'. This step results in 'sheody' and 'cheody'.
The next step is that 'eo' was replaced with 'ee'. This results in 'sheedy' and 'cheedy'.
The last step is to replace 'ee' with 'e', which results in 'shedy' and 'chedy'.
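The four steps can be written down as a small chain of string edits. This is only an illustrative sketch, not the generator app itself: the rule definitions and the `derive` helper are made up here, and treating the 'ch' forms as a plain 'sh' to 'ch' substitution is an assumption.

```python
# Each rule edits only the first occurrence, mirroring one step of the chain.
STEPS = [
    ("remove 'l'", lambda w: w.replace("l", "", 1)),
    ("add 'e' in front of 'o'", lambda w: w.replace("o", "eo", 1)),
    ("replace 'eo' with 'ee'", lambda w: w.replace("eo", "ee", 1)),
    ("replace 'ee' with 'e'", lambda w: w.replace("ee", "e", 1)),
]

def derive(word):
    """Apply the steps in order; return (label, sh-form, ch-form) per step."""
    chain = []
    for label, step in STEPS:
        word = step(word)
        # Assumption: the 'ch' variant differs only in its first 'sh'.
        chain.append((label, word, word.replace("sh", "ch", 1)))
    return chain

for label, sh_form, ch_form in derive("sholdy"):
    print(f"{label}: {sh_form} / {ch_form}")
```

Starting from 'sholdy', the loop prints the pairs shody/chody, sheody/cheody, sheedy/cheedy and shedy/chedy, in that order.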
Since 'chedy' is the most frequent word in Currier B, the bigram 'ed' is typical for Currier B. The reason that 'chedy' does not occur on pages in Currier A is that the scribe did not yet know that he would later write 'ed' instead of 'od'. This is why the change from 'od' to 'ed' is a good marker to distinguish between Currier A and B pages.
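To make the 'od' versus 'ed' marker concrete, here is a crude sketch that scores each page by the share of 'ed' words among all words containing 'ed' or 'od', and then sorts the pages by that score. It is not the classification used by Reddy and Knight. The page labels and word lists are invented toy data, not a real transcription, and `ed_share` and `order_pages` are hypothetical helper names.

```python
def ed_share(words):
    """Share of 'ed' words among words containing 'ed' or 'od'.

    Returns None when a page has neither, so it can be skipped.
    """
    ed = sum("ed" in w for w in words)
    od = sum("od" in w for w in words)
    return ed / (ed + od) if ed + od else None

def order_pages(pages):
    """Sort page labels from most 'od'-like to most 'ed'-like."""
    scored = {page: ed_share(words) for page, words in pages.items()}
    usable = [page for page, score in scored.items() if score is not None]
    return sorted(usable, key=scored.get)

# Invented toy data for illustration only.
toy_pages = {
    "page_x": ["chedy", "shedy", "chody"],
    "page_y": ["sholdy", "shody", "daiin"],
    "page_z": ["chedy", "qokedy"],
}
print(order_pages(toy_pages))
```

With the toy data the order is page_y, page_x, page_z. Whether such an ordering says anything about the order of writing is exactly the point under discussion here.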
Torsten > 28-02-2016, 08:46 PM
Quote:I think it's funny that you're using <chedy> as an example after I just pointed out its rarity in your sample text, but anyway...
Quote:No, the word <chedy> alone cannot account for the distribution of <ed>, if that's what you're saying. There are many other common words that contain <ed>, such as <okedy>, <okeedy>, <qokedy>, <qokeedy>, etc.
Quote:Also we have <ed> and <od> coexisting in most of the B language text
Quote: Otherwise, as I understand your method, all these words should have been treated independently
Sam G > 01-03-2016, 08:49 AM
(28-02-2016, 08:46 PM)Torsten Wrote: Quote:Also we have <ed> and <od> coexisting in most of the B language text
Indeed, since the pages in Currier B were written after the pages in Currier A, the scribe was also able to use the Currier A form there. This is a hint that the pages in Currier B were written after the pages in Currier A.
Torsten > 01-03-2016, 02:29 PM
Quote:So then why does <od> finally drop out almost entirely in the Bio section?
Sam G > 01-03-2016, 09:13 PM
(01-03-2016, 02:29 PM)Torsten Wrote: There is a switch from 'od' to 'ed'. If my auto-copying hypothesis is correct, this effect was boosted by the copying process. Maybe the reason for this switch was that it is easier to write 'e' than 'o'. The "Bio" section also contains more repetitions than the other sections. To me it seems that the scribe was working less carefully here.
Torsten > 01-03-2016, 10:39 PM
Quote:Well, there are still plenty of instances of <o> in the Bio section, just not <od>, so this explanation does not really make sense to me.