The Voynich Ninja
The location of <aiin> and <ain> groups - Printable Version




RE: The location of <aiin> and <ain> groups - nickpelling - 12-02-2017

(12-02-2017, 01:04 AM)Emma May Smith Wrote:
(12-02-2017, 12:21 AM)nickpelling Wrote: If this is correct for all lines (not just paragraph-initial and page-initial lines), then it strongly suggests that (a) it is not the first word in a line that gets shortened into some second word by having some putative prefix removed and/or by autocopying, but instead (b) that the first word is typically longer than all the other words because an entirely separate process is going on there, one that prepends an extra letter to the first word of each line.

It doesn't have to be every line, though, does it? Even every other line having an added character would up the average.

For sure: or it could just as well be some pages rather than others, as opposed to some lines rather than others. But the average 1st / 2nd / 3rd / etc word length on paragraph-initial lines as opposed to the average 1st / 2nd / 3rd / etc word length on non-paragraph-initial lines is a fairly straightforward and theory-neutral statistic to work out, if you discard label and zodiac pages etc.
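
The statistic described above could be computed along these lines. This is only a minimal sketch: it assumes the transcription has already been reduced to paragraphs, each a list of text lines with space-separated words, and that label and zodiac pages were discarded beforehand. The function names and the toy input are made up for illustration, and word length is counted in transcription characters, which is itself an assumption.

```python
from collections import defaultdict

def mean_length_by_position(lines, max_pos=5):
    """Return {position: mean word length} for the first max_pos words of each line."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        for pos, word in enumerate(line.split()[:max_pos], start=1):
            totals[pos] += len(word)
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(counts)}

def split_by_paragraph_start(paragraphs):
    """Split paragraphs (lists of lines) into paragraph-initial lines and all other lines."""
    initial = [p[0] for p in paragraphs if p]
    other = [line for p in paragraphs for line in p[1:]]
    return initial, other

# Toy input, NOT real transcription data.
paragraphs = [
    ["pchedy daiin ol", "sain ar aiin"],
    ["tshol qokedy dal", "dain ol", "ykeey or aiin"],
]
initial, other = split_by_paragraph_start(paragraphs)
print("paragraph-initial lines:", mean_length_by_position(initial))
print("other lines:            ", mean_length_by_position(other))
```

Comparing the two dictionaries position by position shows whether a longer first word is confined to paragraph-initial lines or holds for the other lines as well.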


RE: The location of <aiin> and <ain> groups - KnoxMix - 12-02-2017

The average length of line-initial words, excluding paragraph-initial words, is shorter than expected if the lines were wrapped. If they were not wrapped, it is suspiciously coincidental that second words are short on average.
[link]
See the bottom of this page for average word lengths on unwrapped lines.
[link]


RE: The location of <aiin> and <ain> groups - stellar - 12-02-2017

(12-02-2017, 02:38 AM)KnoxMix Wrote: The average length of line-initial words, excluding paragraph-initial words, is shorter than expected if the lines were wrapped. If they were not wrapped, it is suspiciously coincidental that second words are short on average.
[link]
See the bottom of this page for average word lengths on unwrapped lines.
[link]

Do you think Nostradamus is the closest match to the VMS?

[Image: Nostradamus_NOT_WRAPPED(English).JPG]


RE: The location of <aiin> and <ain> groups - Torsten - 12-02-2017

(12-02-2017, 12:21 AM)nickpelling Wrote:
(11-02-2017, 12:34 PM)Torsten Wrote: These observations can be explained as an unintended side effect of the autocopying method. The source for the first word in each line could only be found within the previous lines. Since the first and the last word in each line are easy to spot, the most obvious way is to pick them as a source for the generation of a word at the beginning or at the end of a line. For the second word it is also possible to select the first word as a source. Since the first word in a line usually has a prefix, the simplest change is to remove this prefix.

But isn't there a completely different statistical result (I vaguely recall Mark Perakh mentioning it some years ago, but I suspect that even by then it was a commonplace) that says that the first word of each line is on average slightly longer than all the other words in the line, not just the second word?

The proposal that this kind of thing holds true for the first word of a page or paragraph is now well-established in Voynich analysis, but less so for non-paragraph-initial and non-page-initial words.

If this is correct for all lines (not just paragraph-initial and page-initial lines), then it strongly suggests that (a) it is not the first word in a line that gets shortened into some second word by having some putative prefix removed and/or by autocopying, but instead (b) that the first word is typically longer than all the other words because an entirely separate process is going on there, one that prepends an extra letter to the first word of each line.

This is the same statistical result. Statistically the first word of a line is longer than average. The second word is shorter than average (see [link]).

There is no contradiction. The second word in a line is shorter on average, since the first word is longer on average.


RE: The location of <aiin> and <ain> groups - MarcoP - 12-02-2017

The Occitan poem "Le Breviari d'Amor" (discussed by Stephen Bax [link]) features 274 occurrences of words beginning with om-.
Most of them are just “om” (“a man” or “one”) or the plural “oms”. Another frequent om- word is “omnipoten” (36 occurrences).

None of the 274 occurrences of om- words is line initial.

These numbers are based on [link].
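
A count of this kind could be reproduced roughly as follows. This is a sketch under assumptions: the text is available as a list of lines, words are separated by whitespace, and punctuation has already been stripped. The function name and the sample lines are invented for illustration and are not taken from the poem.

```python
def prefix_positions(lines, prefix):
    """Count words starting with prefix; return (total, line_initial)."""
    total = 0
    line_initial = 0
    for line in lines:
        words = line.lower().split()
        for index, word in enumerate(words):
            if word.startswith(prefix):
                total += 1
                if index == 0:
                    line_initial += 1
    return total, line_initial

# Invented sample lines, NOT quotations from the poem.
sample = ["e om ditz", "que totz oms es", "per om"]
total, line_initial = prefix_positions(sample, "om")
print(f"{total} om- words, {line_initial} of them line-initial")
```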

From these simple statistics, one cannot reliably infer if something is or isn't some written form of a natural language.

In my opinion, a major problem in studying line effects is that they might not be language-dependent but largely scribe-dependent, i.e. more paleographic than linguistic.
To use Torsten's expression, in several cases, “text responds to the page it is written on” because the scribe produces the written text on the basis of the page he is writing on. But precise transcriptions of manuscripts, at a detail level comparable with that of the Voynich EVA transcription, are not easy to find, so it's difficult to make comparisons at the paleographic level.

For instance, these are 3 phenomena that are frequently observed in manuscripts. They certainly affect word statistics, sometimes they could cause “line effects” and they are typically lost in transcription:
  • hyphenation – words are split across two lines, with or without a mark. This has been recently discussed by Pelling with reference to EVA:m. Obviously, hyphenation can produce “words” that only appear at the end or at the start of lines.
  • abbreviation – scribes are not consistent in their use of abbreviations. One can often notice a tendency to abbreviate more when coming nearer to the right margin of the page. This results in different word statistics when considering word positions on a line. Moreover, the same word can appear in unabbreviated and variously abbreviated forms, appearing as different words to someone who cannot read the text.
  • spacing – I have previously discussed this aspect [link]. The scribe can arbitrarily and inconsistently omit the spaces between words. This also produces pseudo-words which are actually the concatenation of two or more actual words.
Of course, the three can combine: two abbreviated words are joined and then split across two lines in the middle of one of the two original words.
The first of the attached two lines (from Add MS 17738 f23v) ends with “ades”; the second starts with “cendetib9.” If you are dealing with a language you don't know, you will not match the two words with “a descendentibus”, ending up with two new entries in your list of “hapax legomena.”

I am sure there are other phenomena typical of manuscript texts; I am not a paleographer. The unavailability of “line-by-line”, “character-by-character” transcriptions makes it difficult to compute statistics on what happens in other manuscripts. So we are left with unsatisfactory comparisons between the EVA transcription and printed text. We should be aware of the limitations of the data we are considering, taking the results of the comparison with printed text as provisional and not 100% reliable.

About the “Autocopying hypothesis,” I see it as a special case of the “meaningless gibberish hypothesis”. Personally, being fond of medieval parallels, I find this hypothesis both uninteresting and anachronistic, but I think it is “not impossible” from a rational point of view. In my opinion, the only way to dismiss this idea is the convincing production of a meaningful reading of the content of the manuscript. Until we can read the manuscript, the “meaningless gibberish hypothesis” cannot be entirely dismissed.


RE: The location of <aiin> and <ain> groups - Torsten - 12-02-2017

(11-02-2017, 05:54 PM)nickpelling Wrote: It's a good observation to be starting from. The implication is that at least some of the single characters before aiin at the start of a line are in some way nulls.

For me, the interesting question is whether the proportion of saiin to (non-s)aiin at the line start is the same as the proportion of aiin to (single-letter)aiin. If it is, it would give support to the suggestion that line-initial s- is a null.

I don't know if I understand you correctly. But maybe you mean a calculation like this:

line initial: saiin ( 59) / daiin (158) = 0.37
in line     : aiin  (469) / daiin (705) = 0.67

total       : saiin (144) / daiin (863) = 0.17
total       : aiin  (469) / daiin (863) = 0.54

total       line initial
aiin  (469) aiin  (  0)
ain   ( 89) ain   (  0)

total       line initial
saiin (144) saiin ( 59)
sain  ( 68) sain  ( 36)

total       line initial
daiin (863) daiin (158)
dain  (211) dain  ( 49)
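
The ratios can be rechecked from the counts in the tables. The sketch below only restates that arithmetic: the counts are copied from the tables, while the dictionary layout and the `ratio` helper are my own naming. The in-line counts are derived as total minus line-initial, which is an assumption about how the in-line daiin figure (705) was obtained.

```python
# Counts copied from the tables in this post.
total = {"aiin": 469, "ain": 89, "saiin": 144, "sain": 68, "daiin": 863, "dain": 211}
line_initial = {"aiin": 0, "ain": 0, "saiin": 59, "sain": 36, "daiin": 158, "dain": 49}

# Assumption: "in line" means every occurrence that is not line-initial.
in_line = {word: total[word] - line_initial[word] for word in total}

def ratio(numerator, denominator):
    """Plain quotient, kept as a helper so each comparison reads the same way."""
    return numerator / denominator

print(f"line initial: saiin/daiin = {ratio(line_initial['saiin'], line_initial['daiin']):.2f}")
print(f"in line     : aiin/daiin  = {ratio(in_line['aiin'], in_line['daiin']):.2f}")
print(f"total       : saiin/daiin = {ratio(total['saiin'], total['daiin']):.2f}")
print(f"total       : aiin/daiin  = {ratio(total['aiin'], total['daiin']):.2f}")

# Share of each word's occurrences that are line-initial.
for word in total:
    print(f"{word:6s} line-initial share = {ratio(line_initial[word], total[word]):.2f}")
```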


RE: The location of <aiin> and <ain> groups - Anton - 12-02-2017

Marco:

A poem is a special case where the text is more or less structured by lines. Here in the VMS we can't assume a priori that the underlying plain text is structured by lines; there is no evidence to state that. If there were a one-to-one correspondence between lines and sentences, then, of course, the case would be trivial. But one cannot say whether this is or is not the case.

But the explanation might be quite simple if there are no vords beginning with aiin and ain, except for these two vords themselves. Unfortunately, I don't know if they are there or they are not, and nobody could advise.


RE: The location of <aiin> and <ain> groups - Torsten - 12-02-2017

(12-02-2017, 01:58 PM)MarcoP Wrote: About the “Autocopying hypothesis,” I see it as a special case of the “meaningless gibberish hypothesis”. Personally, being fond of medieval parallels, I find this hypothesis both uninteresting and anachronistic, but I think it is “not impossible” from a rational point of view. In my opinion, the only way to dismiss this idea is the convincing production of a meaningful reading of the content of the manuscript. Until we can read the manuscript, the “meaningless gibberish hypothesis” cannot be entirely dismissed.

The autocopy hypothesis does not require the VMS to be meaningless. Even under the autocopy hypothesis it is possible that the VMS contains meaning. For instance, with the [link] it would be possible to use autocopied words to transport information. For such a cipher it is thinkable that a word could stand for a plaintext letter. It is, for instance, conceivable that it was only necessary to count the number of e or i strokes to encode a VMS word.

Let us assume that the VMS contains meaning. In this case we don't know the language and the script used in the VMS. Therefore we can't be sure whether a VMS word stands for a word, a syllable or a letter. Let us therefore check what happens if we assume that a VMS word stands for a plaintext word. In this case we should find repeated phrases like "For instance" or "In this case" within the VMS. But they are missing. Since phrases are typical of language, this is a problem (see [link]).
A second problem is that a dictionary of VMS words would contain 8026 similar words like qokeedy, qokedy, qokeey, qokey, qokeody, okeodyokeokeokeody etc. Therefore it is not possible to detect a scribal error or a misidentified letter from its context. Moreover, since all words are more or less similar, we don't know whether similar words have similar meanings or not.
Another problem is that the VMS contains monotonous sequences such as:
<f108v.P.39>     qokeedy qokeedy qokeedy qotey qokeey qokeey otedy
If similar words share the same meaning, such sequences would only repeat the same information multiple times.

If we now assume that a VMS word stands for a syllable or letter, not much is gained. In this case we would expect that a plaintext word used multiple times would lead to repeated word sequences in the VMS. But as we know, repeated sequences are missing. We could solve this problem if we assume that a plaintext word which occurred multiple times was encoded differently each time.
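
The check for repeated word sequences can be sketched as a simple n-gram count. This is only an illustration of the idea: the function name and the default parameters are my own choices, and sequences are counted within a line only, so a phrase that continues across a line break would be missed.

```python
from collections import Counter

def repeated_ngrams(lines, n=2, min_count=2):
    """Return word n-grams that occur at least min_count times within the given lines."""
    counts = Counter()
    for line in lines:
        words = line.split()
        for start in range(len(words) - n + 1):
            counts[tuple(words[start:start + n])] += 1
    return {gram: count for gram, count in counts.items() if count >= min_count}

# The single line quoted above from <f108v.P.39>.
line = "qokeedy qokeedy qokeedy qotey qokeey qokeey otedy"
print(repeated_ngrams([line], n=2))
print(repeated_ngrams([line], n=3))
```

On this line the only bigram found twice is a word repeated next to itself, and no trigram repeats. That is a different thing from a recurring phrase made of distinct words, which is what the argument above says is missing.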

Johannes Trithemius described such a method in his book Polygraphiae in 1508 (see [link]). Trithemius's method uses a code table to assign multiple words to each letter (see [link]). The use of such a method would explain the occurrence of similarly spelled words. On the other hand, 8026 is a large number of different words for such a code table. Even if we assume that a VMS word stands for a plaintext syllable, we would get approximately 100 different encoding variants for each syllable. Searching for the right cipher word within a code book of at least 8026 code words is far from easy.
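
To make the idea of a code table with several words per letter concrete, here is a toy sketch. It is not Trithemius's actual table: the three-letter alphabet, the code words and the function names are all invented placeholders, chosen only to show how the same plaintext can be encoded differently each time while still decoding uniquely.

```python
import random

# Hypothetical toy table: every plaintext letter has several code words.
CODE_TABLE = {
    "a": ["qokeedy", "qokedy", "qokeey"],
    "b": ["chedy", "shedy", "chey"],
    "c": ["daiin", "dain", "saiin"],
}
# Reverse lookup; works only because no code word is shared between letters.
DECODE = {word: letter for letter, words in CODE_TABLE.items() for word in words}

def encode(plaintext, rng=random):
    """Pick one of the available code words at random for each plaintext letter."""
    return [rng.choice(CODE_TABLE[letter]) for letter in plaintext]

def decode(words):
    """Map every code word back to its plaintext letter."""
    return "".join(DECODE[word] for word in words)

first = encode("abcabc")
second = encode("abcabc")
print(first)
print(second)
print(decode(first), decode(second))

# Rough size check: 8026 word types at about 100 variants per unit
# leaves room for only about 80 distinct plaintext units.
print(8026 // 100)
```

The decoder needs the whole table in reverse, so the table size grows with the number of variants. That is the practical burden the post points to for a code book of 8026 entries.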

Another conceivable hypothesis is that the paragraph- or line-initial letters are used as markers for a change in the encoding procedure. In such a case, one and the same word would stand for a different meaning if the paragraph or line were marked by a different initial letter. It is unreasonable to assume that such an ambiguous encoding method can be used without making any errors. Moreover, it seems that we cannot expect such a complex method in the 15th century.

If we assume that the text of the VMS contains meaning, we end up with a complex encoding method. Therefore we should expect numerous encoding errors. Even under the assumption that an already encoded text was copied from another source, numerous copying errors would be expected. But places where a letter or word was deleted are missing from the VMS. Moreover, the fact that the ends of the text lines nearly always fit into the available space indicates that the text was generated during writing. Therefore the scribe was probably using a rather simple method.

Do you really think that a method in which [link] was used to add meaning to autocopied words is less believable than the method described by Trithemius, or an unknown encoding method in which one and the same word would stand for a different meaning each time?


RE: The location of <aiin> and <ain> groups - Oocephalus - 12-02-2017

Anton: yes, there are a few vords beginning with ain or aiin. According to Takahashi's transcription, they are (with number of occurrences): 
aiiny (4)
aiinog (1)
aiinal (1)
aiinod (1)
ainy (2)
ainarals (1)
ainam (1)
aindar (1)
ainaly (1)
Most of these occur in the circular diagrams or line-finally.
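
A listing like this could be produced with a short script. The sketch below assumes the transcription is available as a list of lines of space-separated vords. The function name and the sample lines are invented for illustration and will not reproduce the counts above.

```python
from collections import Counter

def longer_vords(lines, stems=("aiin", "ain")):
    """Count vords that begin with one of the stems but are longer than the stem itself.

    Returns (counts, line_final): overall counts and counts of line-final occurrences.
    """
    counts = Counter()
    line_final = Counter()
    for line in lines:
        words = line.split()
        for index, word in enumerate(words):
            if word.startswith(stems) and word not in stems:
                counts[word] += 1
                if index == len(words) - 1:
                    line_final[word] += 1
    return counts, line_final

# Invented sample lines, NOT real transcription data.
sample = ["daiin ol aiiny", "ainy chedy aiin", "qokeedy ain aiiny"]
counts, line_final = longer_vords(sample)
for vord, count in counts.most_common():
    print(f"{vord} ({count}), line-final: {line_final[vord]}")
```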


RE: The location of <aiin> and <ain> groups - KnoxMix - 12-02-2017

@stellar message #13

My answer would have no significance unless you know why I give it. If you compare the graphs you can make your own determination.