The Voynich Ninja
Sequential word repetitions in the VMS - Printable Version


RE: Sequential word repetitions in the VMS - Wladimir D - 11-02-2016

I read somewhere that the language of the manuscript is a secret language of pharmacists.
In this case, there may be a dictionary (a list) of technological processes, covering production, a list of diseases, and the method of application. Each technological process corresponds to a "code word".
After the "code word", the text lists the ingredients taken from the plant in the figure. These ingredients are then to be processed according to the technological process. The rest of the text only fills the page.


RE: Sequential word repetitions in the VMS - Davidsch - 22-02-2016

I was doing research on this quite some time ago, and I came to the conclusion that this word repetition is normal.

As written above, there are languages where words, and also parts of words, occur many times in a row.
Words can also appear to be the same but differ in interpretation.

A comma or an end of line between the repeated words is also a possibility, and lastly the text could stand for one or more numbers.

So:

abcdef abcdef abcdef abcdef 

could mean:

abcdef abcdef abcdef abcdef 
abc defabc defab cdef a bcdef
abcdef,  abcdef abcdef . Abcdef 
34568 34568 34568 34568
3 3 3 3
million million million million
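
The second reading above can be checked mechanically: both lines contain the same character stream and differ only in where the spaces fall. A minimal sketch in Python (the function names are made up for illustration) that compares two readings and counts immediate word repetitions in each:

```python
def adjacent_repeats(text):
    """Count positions where a word is immediately followed by itself."""
    words = text.split()
    return sum(1 for a, b in zip(words, words[1:]) if a == b)


def same_stream(text_a, text_b):
    """True if both texts hold the same characters once spaces are dropped."""
    return "".join(text_a.split()) == "".join(text_b.split())


reading_1 = "abcdef abcdef abcdef abcdef"
reading_2 = "abc defabc defab cdef a bcdef"

print(same_stream(reading_1, reading_2))   # same letters, different spacing?
print(adjacent_repeats(reading_1))         # repetitions with the first spacing
print(adjacent_repeats(reading_2))         # repetitions with the second spacing
```

If the word breaks are uncertain, the repetition count depends entirely on which spacing is taken as the true one.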


RE: Sequential word repetitions in the VMS - Davidsch - 01-03-2016

However, this challenge


Quote:ReneZ:
I wonder whether it is possible to create a dictionary, by which each Voynich word can be matched to one plain text word in some language in such a way that the result is a meaningful text in that language. I.e.: can the Voynich MS be translated word for word? 

could be possible if we could limit ourselves to certain words.

For example the first 3 or 4 words on every page?


RE: Sequential word repetitions in the VMS - crezac - 13-03-2016

(20-08-2015, 08:28 AM)david Wrote:
Quote:Is that too many as compared with natural languages?
I have no idea. But those four words you quoted are some of the most famous Voynich words out there.
I also don't know how this compares to other handcopied manuscripts of the era. It could be a way into resolving the "original or a copy?" question.

Quote:Is not the repetition issue too exaggerated?

I think the issue is not direct repetition, but rather many similar words repeated, as in Timms pairs and [ah-hem] Jackson sequences, which are not features of written languages.


When you worry about repetition you assume that the word as written has the same meaning every time.  There's really no reason to assume this.  Even in English a word like do has at least 8 meanings, more if you consider homophones.  Doo doo and other informal usages increase the count.  Run has around 18 definitions in most dictionaries; some of them nouns, some of them verbs, but most of them modern usages.  This linguistic flexibility could account for repetition in VMS if it were written by someone who enjoyed playing with language.

Side note:  it's this flexibility that also makes word frequency analysis of questionable use in comparative language analysis, or even word distribution within a document written in a single language.  Knowing how many times run is used in a document is interesting, but it doesn't aid understanding in a big way when you don't know which definitions of run are being used most.  You're really doing a character distribution on sets of characters rather than any kind of word analysis.  Unless, of course, the word is guaranteed to be unique by some feature of the language.

(08-02-2016, 05:23 PM)Torsten Wrote:
@Don
 
Typical for the VMS are strong rules about character combinations and weak rules for the word order. Both features are the result of the copying process. If the scribe was adding a new element to the text generation process one of two things could happen: A.) He could add the new element into his set of copying rules. In this case it was frequently used and we would see it as a rule. Or B.) he could forget about the new element. In this case the new element is rarely used and it would look like an exception to us.
 
...
As last glyph 'a' occurs 63 times. Is the existence of 'cheda' therefore an error or is it a valid word?
The bigram 'ya' occurs 22 times. Is the existence of 'dyaiin' therefore an error or is it a valid word?
 
Whatever the scribe does, for us it always looks like a rule or like an error. But they are the same. There are only too many possibilities for changing a word. It is impossible to use all of them frequently.
 
@Diane: That something can happen within language doesn't mean that it is typical for language.

I don't think that anyone is suggesting that there are any particular rules Voynichese has to follow based upon the written form (at least not yet).  But there are feature sets that are typical for all languages, things like having nouns and using phonemes.  There are probably more possibilities of what you can do in a language than there are languages though if anyone cared to make a comprehensive list.

The trick here is to figure out what's going on in VMS.  Given how the manuscript was produced I think it's safe to assume some errors crept in.  Assuming everything that you don't want to explain is an error though strikes me as lazy.  Work from the assumption that there aren't any errors and the errors will be obvious eventually.  Throw out anything you don't like and you may never figure out what's happening.  I'll grant that's a possibility anyway, but I'm arguing approach not output.

English has "strong rules about character combinations".  We call them spelling rules and learn them as a part of learning to write.  Generally speaking if you can spell something properly you don't misspell it in written English.  Most English spelling errors are the result of trying to figure out which spelling rules are used to generate an unfamiliar word.  It's easier to memorize how a word is spelled than it is to regenerate it; this is why when we are learning spelling rules we memorize many of the really common words we need to write.  Spelling has become more standardized but there are still words that require us to memorize their spellings because the rules don't help.  The nice thing though is you can usually tell from context if a word has been spelled or used incorrectly.

I wouldn't say English has "weak rules for the word order" but I might mean the same thing if I say English grammar allows for significant variation in word order which may either modify the semantic content or leave it unchanged depending on both word choice and word order.  The reason for these choices is that written English's primary function is to record semantic content.  If a word is written down it is more important to know what it means than how to pronounce it.  I learned that young, to my chagrin, because I mispronounced chagrin when reading to myself and was embarrassed by how badly I pronounced it when trying to use it in speech.  In speech, it matters.  Mayan emphasizes phonemes.  If you can write Mayan you can write almost anything in it -- even if you don't understand the language you are writing.  If you can read Mayan you don't mispronounce Mayan.

But it might be a mistake to think that strong rules about character combinations are always spelling rules.  It's certainly a mistake to think that the scribe can write anything he likes and produce a meaningful text.  It isn't even clear whether VMS gives priority to semantic contents, pronunciation or something else.  Additional information like tonality, vocal emphasis, negation or whatever else the scribe found important might be encoded by something more subtle than word repetition (the words you perceive as errors perhaps).


RE: Sequential word repetitions in the VMS - -JKP- - 15-03-2016

(01-03-2016, 02:34 PM)Davidsch Wrote:
however, this challenge


Quote:ReneZ:
I wonder whether it is possible to create a dictionary, by which each Voynich word can be matched to one plain text word in some language in such a way that the result is a meaningful text in that language. I.e.: can the Voynich MS be translated word for word? 

could be possible  if we could limit ourselves to certain words.

For example the first 3 or 4 words on every page?

The first 2 or 3 words on every plant page are different from the rest of the text, so you either need a bigger sample of words or you need to choose some of the midtext, if you want something characteristic of the main text as a whole.


RE: Sequential word repetitions in the VMS - MarcoP - 04-09-2017

(19-08-2015, 11:15 PM)Anton Wrote:
I counted 114 unique words that can be repeated. It is only 1,7% of the total of 6818 unique words. Is that too many as compared with natural languages?

Is not the repetition issue too exaggerated?

I go back to the first page of the thread and Anton's original comments. I don't think the repetition issue is exaggerated. On the contrary, I think it is an important feature of Voynichese. It might not be a unique feature, but still it very likely is a meaningful feature.


I have checked Virgil's Aeneid, La Divina Commedia and Matthioli.
  • I have considered an excerpt of the Aeneid just slightly longer than the VMS (37K words). It contains 7 exact repetitions (6 unique repeating words) and 12236 different words, so 0.05% of the different words repeat.
  • La Divina Commedia has almost twice as many words as the VMS. It contains 33 exact repetitions (31 unique repeating words) and 9871 different words, so 0.3% repeat.
  • The excerpt from Matthioli I used is more than 100K words long. It contains 9 exact repetitions (8 unique repeating words) and 21910 different words, so 0.04% repeat.
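
Counts of this kind can be produced for any plain text with a few lines of code. A minimal sketch in Python, under two assumptions that the post does not state: words are lowercased and split on non-letter characters, and a run of three identical words counts as two repetitions. The function names are illustrative only.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def repetition_stats(words):
    """Summarize exact consecutive repetitions in a list of words."""
    # Each adjacent pair of identical words counts as one repetition.
    repeats = [a for a, b in zip(words, words[1:]) if a == b]
    distinct = set(words)
    repeating = set(repeats)
    percent = 100.0 * len(repeating) / len(distinct) if distinct else 0.0
    return {
        "exact_repetitions": len(repeats),
        "unique_repeating_words": len(repeating),
        "distinct_words": len(distinct),
        "percent_repeating": percent,
    }


sample = "Ricorditi, ricorditi! E se io"
print(repetition_stats(tokenize(sample)))
```

The percentage here is the share of distinct words that repeat at least once, which matches the way the figures above are derived (for example 6 out of 12236).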

I think this kind of phenomenon largely depends on the individual style of writing (so it might not strictly depend on the "language" and could vary greatly between different texts written in the same language). It's interesting to see that Dante used repetitions to stress specific meanings. The most frequent usage seems to me to be intensifying an imperative.

 Volsersi verso me le buone scorte;
e Virgilio mi disse: «Figliuol mio,
qui può esser tormento, ma non morte.
 Ricorditi, ricorditi! E se io
sovresso Gerion ti guidai salvo,
che farò ora presso più a Dio?» (Purg. 27.19-24)

  My gentle escorts turned to me,
and Virgil said: “My son, though there may be
suffering here, there is no death. Remember,
remember! If I guided you to safety
even upon the back of Geryon,
then now, closer to God, what shall I do?”

I think the high number of repetitions in Dante doesn't depend on the Italian language, but more on the poet's style. But there could be languages in which some similar pattern is common and not "poetic".


The number of unique repeating words in the VMS is more than 5 times greater than the highest figure observed in these comparisons. But the difference in actual (non-unique) repetitions is much greater: close to 10 times the number of repetitions in longer texts.

I also want to share some data about non-exact repetitions. This is a histogram of 'quasi-repetitions' of the following types:

pW.W
W.pW
Ws.W
W.Ws

Where
W is a generic word at least 4 EVA characters long
p is a prefix of 1 or 2 characters
s is a suffix of 1 or 2 characters
. is a space separating two words


Some examples.
q- pre:
<f116r.17,+P0>dain.chey.qokeey.okeey.lain.okeey.qol.chedy<$>

q- post:
<f33r.2,+P0>ytchedy.qokar.cheky.okaldy.qokaldy.otor.oldar.qotar.otar.otardam

o- pre:
<f84v.39,+P0>lshedy.qol.aiin.okey.olchey.lchey.olshedy.shckhy.soly

o- post:
<f86v6.36,+P0>dairal.daiin.qokar.choltal.cthdy.qokeey.lkaiin.olkaiin.araiin
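
The four patterns defined above can be searched for automatically. A minimal sketch in Python, assuming transcription lines in the format of the examples (a tag in angle brackets followed by words separated by dots). The function names and the tuple layout of the results are made up for illustration and are not the tool used for the histogram.

```python
import re


def parse_line(line):
    """Drop <...> tags from a transcription line and split it on '.'."""
    text = re.sub(r"<[^>]*>", "", line).strip().rstrip("-")
    return [w for w in text.split(".") if w]


def quasi_repetitions(words, min_len=4, max_affix=2):
    """Find adjacent pairs matching pW.W, Ws.W, W.pW or W.Ws.

    W must be at least min_len characters long and the prefix p or
    suffix s must be 1 to max_affix characters long.
    """
    hits = []
    for a, b in zip(words, words[1:]):
        extra = len(a) - len(b)
        if 1 <= extra <= max_affix and len(b) >= min_len:
            # The first word is the longer one: pW.W or Ws.W
            if a.endswith(b):
                hits.append(("pW.W", a[:extra], a, b))
            if a.startswith(b):
                hits.append(("Ws.W", a[len(b):], a, b))
        elif 1 <= -extra <= max_affix and len(a) >= min_len:
            # The second word is the longer one: W.pW or W.Ws
            if b.endswith(a):
                hits.append(("W.pW", b[:-extra], a, b))
            if b.startswith(a):
                hits.append(("W.Ws", b[len(a):], a, b))
    return hits


line = "<f116r.17,+P0>dain.chey.qokeey.okeey.lain.okeey.qol.chedy<$>"
for hit in quasi_repetitions(parse_line(line)):
    print(hit)
```

Each hit gives the pattern, the affix, and the two words, so a tally of the first two fields yields the kind of per-affix counts discussed here.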

Previously, I mentioned that qo- words are those that account for most exact repetitions (of the W.W pattern).

This analysis of prefix and suffix quasi-repetitions (of course there are quasi-repetitions that don't match these patterns) confirms the correlation with q-.

Other notes:
  • q- and o- contribute to quasi repetitions both individually and in the qo- combination.
  • suffixes seem to play a minor role in the phenomenon: the top six variants are all prefixes.
  • q- has no preference between the pre- and post- variants (qW.W and W.qW have very similar frequencies)


These are some triple quasi-repetitions that exemplify the mobility of the q- prefix.

<f31r.10,+P0>tolshso.okedy.okedy.qokedy.qokeedy.dar.shedshey-
<f84v.14,+P0>dshey.olkeey.dol.ol.otedy.okedy.okedy.qokedy.dal.dar.ol.chedy.sain-

<f75r.45,+P0>sshedy.shckhy.qokey.okedy.sarol.oty.otedy.qotedy.otedy.okaiin-

<f99v.38,+P0>ol.cheey.qokeol.okeol.okeol.shokol.ykey-
<f108v.8,+P0>ysheedy.okeedy.oteedy.qokeedy.okeedy.okeedy.chedal.okar.qoteedar.oty-

I find all these repetitions and quasi-repetitions extremely fascinating. Both the correlation with q- and the high frequency with which these repetitions occur make clear that they are not random or coincidental.

One should also consider that there are 27 words that occur fewer than 30 times and still present exact consecutive occurrences.

The probability of any single one of these cases happening by coincidence is less than 1 in a thousand. Given the number of distinct words in the ms, one could expect a few such cases, not 27.

Cases like qopchedy (having a total of 32 occurrences and 2 exact repetitions) are also noteworthy.
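
For a rough sense of scale, one simple null model is to shuffle all the words of the text at random. This is an assumption made here for illustration and not necessarily the calculation behind the figures in this post. Under that model, a word with n occurrences among N words is expected to sit next to itself n(n-1)/N times. The total of 37,000 words used below is an assumed round figure for the length of the VMS.

```python
def expected_adjacent_repeats(n, total):
    """Expected number of adjacent identical pairs for a word with n
    occurrences in a randomly shuffled text of `total` words.

    Each of the (total - 1) adjacent positions holds the word twice with
    probability n(n-1) / (total(total-1)), which sums to n(n-1) / total.
    """
    return n * (n - 1) / total


VMS_WORDS = 37_000  # assumed round figure, not an exact count

# qopchedy: 32 occurrences according to the post
print(expected_adjacent_repeats(32, VMS_WORDS))
```

Under these assumptions the expected count for a word with 32 occurrences is a small fraction of one, against the 2 exact repetitions observed for qopchedy.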


RE: Sequential word repetitions in the VMS - Koen G - 04-09-2017

Wonderful work again, Marco. 
As you indicate, it seems clear that in literary language (in the languages we are familiar with) we have to count on poetic license to get this many duplications.

Now I haven't read through this entire thread which predates my activity as a Voynich researcher, but this phenomenon reminds me of reduplication. This is a grammatical phenomenon in many languages. For example, to form a plural.

Note that in a pre-modern setting, there were few spelling rules, so in theory it doesn't matter whether these words are written as one or two words. For example if [pin] means chair and [pinpin] means chairs, then the spelling of the plural could be "pinpin" or "pin pin" depending on scribal preferences.

The wiki has a huge amount of examples from many language families.
I'm not saying that it must be one of the languages listed in the wiki of course. I think there is a fair chance that Voynichese is a pidgin, which will almost certainly be undocumented (forgotten to the ages) but is likely to adopt a process like reduplication. In Afrikaans, for example, reduplication is well documented.

It may be interesting to see if we can check a text in one of the listed languages just to get an idea. Too bad nobody on the forum can understand these.

Alternatively of course, the text type (scientific/practical/instructional?) might cause the reduplication in Voynichese. If the contents describe various procedures, you can get reduplication which would lack elegance in a literary text. But I can't think of a good concrete example at the moment. Something like directions for example: "to reach the church, you go left, left and then right". But I'd think that the first option (linguistic process) might be more likely.


RE: Sequential word repetitions in the VMS - ReneZ - 04-09-2017

(04-09-2017, 05:41 PM)Koen Gh. Wrote:
The wiki has a huge amount of examples from many language families.
I'm not saying that it must be one of the languages listed in the wiki of course.
[...]
It may be interesting to see if we can check a text in one of the listed languages just to get an idea. Too bad nobody on the forum can understand these.

Thai uses reduplication for two cases, one to make a particular kind of plural, and the other resulting from the fact that some classifier words (something European languages tend not to use) are the same as the nouns they classify.

Conveniently, Thai usually does not repeat the word in writing, but uses a special symbol (ๆ) which just means: previous word repeated.

- - -

Edit (addition): however, very inconveniently, Thai does not use spaces to separate words. They are just strung together. So, while it would be very easy to count the reduplications in a text, it is not easy to automatically count the different words...
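
Counting the repetition symbol itself is indeed a one-liner. A minimal sketch in Python, assuming the text is available as a Unicode string; the symbol is the code point U+0E46, and the sample string is just an illustration:

```python
MAI_YAMOK = "\u0e46"  # the Thai repetition symbol shown above


def count_repetition_marks(text):
    """Count how many times the repetition symbol occurs in the text."""
    return text.count(MAI_YAMOK)


sample = "เด็กๆ เล่นเร็วๆ"
print(count_repetition_marks(sample))
```

This only gives the numerator. Relating it to the number of different words would still need a word segmenter for Thai, which is the hard part mentioned above.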


RE: Sequential word repetitions in the VMS - MarcoP - 04-09-2017

(04-09-2017, 05:41 PM)Koen Gh. Wrote:
Wonderful work again, Marco.
As you indicate, it seems clear that in literary language (in the languages we are familiar with) we have to count on poetic license to get this many duplications

Thank you, Koen!
I am glad you are interested in the subject!
I am not sure it's possible to find this many duplications in a modern European text. In particular, matches for the quasi-duplication seem to be hard to find. But I am ignorant and I may be wrong.

(04-09-2017, 05:41 PM)Koen Gh. Wrote:
Note that in a pre-modern setting, there were few spelling rules, so in theory it doesn't matter whether these words are written as one or two words. For example if [pin] means chair and [pinpin] means chairs, then the spelling of the plural could be "pinpin" or "pin pin" depending on scribal preferences.

The wiki has a huge amount of examples from many language families.


Yes, I read the wiki page following Emma's mention of Reduplication.
It's quite interesting that Reduplication sometimes involves a prefix that could work vaguely like EVA:q-
See for instance the paper by Annie Montaut about Reduplication in Hindi (paragraph 2.1 in particular).

shadi-vadi / atma-vatma seem interesting examples.

(04-09-2017, 05:41 PM)Koen Gh. Wrote:
It may be interesting to see if we can check a text in one of the listed languages just to get an idea. Too bad nobody on the forum can understand these.

Understanding the text might not be strictly necessary to compare statistics.

(04-09-2017, 05:41 PM)Koen Gh. Wrote:
Alternatively of course, the text type (scientific/practical/instructional?) might cause the reduplication in Voynichese. If the contents describe various procedures, you can get reduplication which would lack elegance in a literary text. But I can't think of a good concrete example at the moment. Something like directions for example: "to reach the church, you go left, left and then right". But I'd think that the first option (linguistic process) might be more likely.

Everything is possible, but without actual examples it is mostly speculation. Again, I find quasi-duplication difficult to explain in the languages I am familiar with, but the Hindi paper suggests that similar things happen in natural languages. Of course, I have no idea if the phenomenon in Hindi might be as extensive as it appears to be in Voynichese.


RE: Sequential word repetitions in the VMS - Koen G - 05-09-2017

(04-09-2017, 06:17 PM)ReneZ Wrote:
Edit (addition): however, very inconveniently, Thai does not use spaces to separate words. They are just strung together. So, while it would be very easy to count the reduplications in a text, it is not easy to automatically count the different words...

Yeah, I was looking around a bit to see if I could find a text in one of the relevant languages but bumped into the same problem (Hindi for example). 
This does not exclude such languages from Voynichese candidates since we'd be dealing with a transcription effort, so changes in spelling custom can occur. It just makes it hard to find a suitable text for testing.