27-02-2016, 02:32 PM
(27-02-2016, 12:12 PM)Torsten Wrote: @Sam: The VMS is not as homogeneous as you suggest. With pages in Currier A in mind, the pages in Currier B would also look strange.
But your results don't match either group. For instance, in B-language texts we tend to see lots of the words chey-cheey-chedy, in A-language we find chol-chor-chy, and in certain parts of both sections we find cheol-cheor-cheody (along with the variants of all these words that begin with sh). This is a greatly oversimplified description of how these words are distributed in the VMS, but it seems that words from this class are common in all portions of the actual text, whereas they are very rare in your automatically generated text.
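A rough way to check this on any text is to measure what share of its words belongs to this class. The sketch below assumes an EVA transliteration with whitespace-separated words. The name `class_share` and the word list (only the words I named above, plus their sh- variants) are my own illustration, not a complete definition of the class.

```python
from collections import Counter

# Words named in the post; the sh- variants are built by swapping the initial "ch".
CH_WORDS = ["chey", "cheey", "chedy", "chol", "chor", "chy",
            "cheol", "cheor", "cheody"]
WORD_CLASS = set(CH_WORDS) | {"sh" + w[2:] for w in CH_WORDS}


def class_share(text):
    """Return the fraction of whitespace-separated words found in WORD_CLASS."""
    words = text.split()
    if not words:
        return 0.0
    counts = Counter(words)
    hits = sum(counts[w] for w in WORD_CLASS)
    return hits / len(words)
```

Running `class_share` on a VMS section and on the generated text would put a number on the difference I am describing.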
Quote:René Zandbergen has given a set of features on his website:
- The first character of each paragraph is one of a very small subset of characters.
- The first character of each line does not have the same frequency distribution as the rest of the text.
- With very few exceptions, the characters p and f only occur in the top lines of paragraphs.
- The second order character entropy is anomalously low.
- Words in the MS tend to follow certain word patterns, i.e. there are some weak positional rules, and fairly strong rules about character combinations.
- There are almost no repeating phrases.
With my app I only want to demonstrate that it is possible to generate a text with such features using an algorithm that simulates my auto-copy hypothesis.
Okay, but I think "words almost never begin with <e>" is also an important property of the text, among many other things.
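Properties like this one, and the low second-order entropy from the list, are simple to measure on any candidate text. Here is a minimal sketch, assuming whitespace-separated words in a transliteration such as EVA. The function names are hypothetical, and I am taking "second-order entropy" to mean the conditional entropy of a character given the one before it, counted within words only, which is one common reading of the term.

```python
import math
from collections import Counter


def initial_share(text, ch="e"):
    """Return the fraction of words that begin with the character ch."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w.startswith(ch) for w in words) / len(words)


def conditional_entropy(text):
    """Return H(next char | previous char) in bits, from pairs inside words."""
    pairs = Counter()   # counts of (previous, next) character pairs
    firsts = Counter()  # counts of the previous character in those pairs
    for w in text.split():
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
            firsts[a] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return -sum(n / total * math.log2(n / firsts[a])
                for (a, b), n in pairs.items())
```

A generated text that blends in should give values close to those of the real VMS pages for both measures.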
Quote:The main problem was to model human ingenuity into the algorithm and to keep the algorithm as simple as possible. Therefore the algorithm only produces a pseudo text with features similar to the VMS.
If you can only reproduce some features of the VMS, but not all of them (including features that are found throughout the text), then I don't see how you can seriously claim that your method was used. Reproducing a few of its properties would seem pretty trivial. Any arbitrary text chosen at random could probably be said to possess some properties of the VMS text.
Quote:If I generated the same text as in the VMS, Emma would be right in her objection that my algorithm only reproduces the movements of the scribe.
It should at least blend in with the others. Your text is clearly way different even at a glance, and I'm sure any statistical test would also demonstrate this.
Another thing is that your method requires some "seed" text to begin with, which you are taking from the VMS itself, but where does your VMS creator get his original text from?