The Voynich Ninja

Full Version: [split] (lack of) word groups
Pages: 1 2 3 4
(03-07-2019, 03:16 PM)Koen G Wrote:
I collected some numbers already, you can see them here (final sheet xMATTR):
[link]

Of the texts I checked so far, here's the percentage that have no repeating strings of the following lengths:

2: 1% 
3: 6% 
4: 20% 
5: 32% 
6: 48% 

So about half of these texts never repeat a string of 6 words exactly (taking into account the 1000-word window I always used, which seems reasonably large). 
What may be more interesting is that 9 of my txt files never even repeat a 4-word string. 
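
For anyone who wants to run a similar check, here is a minimal sketch. It assumes whitespace-separated words and assumes that the xMATTR figure is a moving-average type-token ratio computed over n-word strings instead of single words (the post does not spell out the exact calculation). The function name ngram_mattr is hypothetical. A result of exactly 1.0 means no n-word string repeats inside any window.

```python
from collections import Counter


def ngram_mattr(tokens, n, window=1000):
    """Moving-average type-token ratio over n-word strings.

    Assumption: each window of `window` words is scored as
    (distinct n-word strings) / (all n-word strings in the window),
    and the scores are averaged over all window positions.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        raise ValueError("text is shorter than n words")

    # A window of `window` words holds window - n + 1 n-word strings.
    # Texts shorter than the window are scored as one single window.
    per_window = max(1, min(window, len(tokens)) - n + 1)

    counts = Counter(grams[:per_window])
    total = len(counts) / per_window
    num_windows = 1

    # Slide the window one word at a time, updating counts incrementally.
    for start in range(1, len(grams) - per_window + 1):
        leaving = grams[start - 1]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[grams[start + per_window - 1]] += 1
        total += len(counts) / per_window
        num_windows += 1

    return total / num_windows
```

Calling ngram_mattr(words, 4) == 1.0 on a word list would then correspond to "never repeats a 4-word string within the 1000-word window".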


Additionally, one file (attached) does apparently not even repeat 3-word strings. It's only 1762 words, which might explain a bit.

Still, I would seriously question the claim that natural languages cannot be without repeating "phrases". If the phrase is 4 words long, I have 9 counterexamples. Make the phrase 5 words long and 20% of checked texts don't repeat any.
As for the VM data, the full transcriptions do show some repeats of 4-word strings and longer. I suspect this is because of one of the messier parts of the transcription, like "o o o o o o o" or something along those lines. The isolated sections (Herbal A, Q13...) correctly give no hits for strings longer than 3 words, so it's better to focus on those.

If I take Q13 and sort by the number of repeated 3-word strings, I have 12 non-VM texts that
- repeat fewer 3-word strings than Q13
- repeat no 4-word strings

In the table, 1 means "no repeats over the 1000-word window"

[attachment=3064]
(03-07-2019, 04:15 PM)Koen G Wrote:
Additionally, one file (attached) does apparently not even repeat 3-word strings. It's only 1762 words, which might explain a bit.

Still, I would seriously question the claim that natural languages cannot be without repeating "phrases". If the phrase is 4 words long, I have 9 counterexamples. Make the phrase 5 words long and 20% of checked texts don't repeat any.

The text of the VMS contains many frequently used words like <daiin>, <ol>, and <chedy>. A word can only be part of a repeated sequence if it occurs multiple times. It is therefore necessary to check whether your text samples contain enough frequently used words. This is, for instance, not the case for your 1762-word text sample. The sample is much smaller than the VMS text and uses many rare words. Of the first 20 words, as many as 14 occur only once. But even in this sample it is possible to find repeated two-word sequences like "per locum" and "ad senem". It is also possible to find words distributed evenly over the entire text, like 'et' or 'ad'. They are probably function words (conjunctions, articles, etc.). In the VMS text no such words exist (see Timm & Schinner 2019, p. 5).
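
Both checks mentioned above are easy to reproduce. The following is only a sketch, assuming the sample has already been split into words; the function names hapax_in_prefix and repeated_bigrams are hypothetical, and the example sentence is made up for illustration (it is not taken from the attached file).

```python
from collections import Counter


def hapax_in_prefix(tokens, k=20):
    """Count how many of the first k words occur only once in the whole text."""
    counts = Counter(tokens)
    return sum(1 for word in tokens[:k] if counts[word] == 1)


def repeated_bigrams(tokens):
    """Return every two-word sequence that occurs more than once, with its count."""
    counts = Counter(zip(tokens, tokens[1:]))
    return {" ".join(pair): c for pair, c in counts.items() if c > 1}


# Made-up illustration, not the attached sample.
sample = "ad senem per locum et ad senem venit per locum".split()
print(hapax_in_prefix(sample, k=5))
print(repeated_bigrams(sample))
```

A sample with many words that occur only once leaves little room for repeated sequences of any length, which is the point about sample size and word frequency.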
I agree that a number of problematic properties should ideally be considered together. But this knife cuts both ways. This thread was prompted by Rugg's statement: "But the words in the Voynich Manuscript [link] in their order. That reason alone is enough to eliminate all known languages from being candidates..."

This reason alone is clearly not enough to eliminate natural languages.
(03-07-2019, 06:16 PM)Koen G Wrote:
I agree that a number of problematic properties should ideally be considered together. But this knife cuts both ways. This thread was prompted by Rugg's statement: "But the words in the Voynich Manuscript [link] in their order. That reason alone is enough to eliminate all known languages from being candidates..."

This reason alone is clearly not enough to eliminate natural languages.

We all see the same manuscript and we all describe the same features. There is no dispute about whether we describe the VMS text correctly or whether it is possible to describe another interesting feature. If you look into past mail threads, this was not always the case:
[link]
[link]
[link]
[link]

Today there is only this fruitless discussion in which it is not allowed to eliminate natural languages or to argue that the only purpose of the VMS text was to look like language. Why is it unthinkable in your eyes that the text doesn't mean anything? Why is it unthinkable in your eyes that the VMS is just some kind of art like the [link]?
I'm just trying to focus on specific problems; I can't answer them all at once. I have shown that the presence of recurring phrases is not completely universal.
22% of 456 medieval texts don't repeat any 5-word phrase. And my corpus, which I assembled before for different reasons, still contains 25 non-VM texts which never repeat a 4-word phrase.

My objection is that people like Rugg say a lot about what natural language can and cannot do without actually having tested this. They are faulty assumptions. Those won't get us very far either...
(03-07-2019, 07:44 PM)Torsten Wrote:
Why is it unthinkable in your eyes that the VMS is just some kind of art like the [link]?
Well, because such a thing would be a complete anachronism in the 15th century.
(03-07-2019, 08:32 PM)davidjackson Wrote:
Well, because such a thing would be a complete anachronism in the 15th century.

Exactly. It is possible that we are no longer able to extract any meaning from it, but to make such a large text purely for art is... well... most members of this forum had already been born when the Codex Seraphinianus was made!

Torsten, you do have a point that low phrase repetition will mostly be observed in languages with a higher TTR than the VM. This is demonstrated in the graph below, where I took those examples with a three-word phrase repetition close to that of the VM (horizontal axis). The TTR over 1000-word windows is on the vertical axis. 

[attachment=3065]

As you can see, most "few-phrase-repeating" texts score higher on TTR than the VM. But again, counterexamples exist. I attach the best hit from my corpus (Wolfram Lieder).

(Note that despite containing roughly 500 texts, the corpus still covers only three big languages: Latin, German and Greek, so it is still far from representative.)
(03-07-2019, 08:32 PM)davidjackson Wrote:
(03-07-2019, 07:44 PM)Torsten Wrote:
Why is it unthinkable in your eyes that the VMS is just some kind of art like the [link]?
Well, because such a thing would be a complete anachronism in the 15th century.

This is simply wrong. See [link]

(03-07-2019, 08:00 PM)Koen G Wrote:
I'm just trying to focus on specific problems; I can't answer them all at once. I have shown that the presence of recurring phrases is not completely universal.

22% of 456 medieval texts don't repeat any 5-word phrase. And my corpus, which I assembled before for different reasons, still contains 25 non-VM texts which never repeat a 4-word phrase.

My objection is that people like Rugg say a lot about what natural language can and cannot do without actually having tested this. They are faulty assumptions. Those won't get us very far either...

For what reason did you need a proof that [link]s is describing language correctly? Why didn't you point to a language where [link] and [link] are missing?
Torsten, those are small inscriptions; that is a completely different thing from a massive manuscript. Even if the VM was a real enciphered text, it would still be the largest pre-modern example. It's several orders of magnitude larger than common medieval "fake text".

About your other questions, I don't know; like I said, I can't solve all of Voynichese's problems. But I did show that some of the conceptions Rugg seems to have about what real texts can and cannot do are simply wrong.