The Voynich Ninja

Full Version: From "decryption" to "translation"
For a rough cursory check, without any postprocessing, here are the twenty most frequent Gvords, together with their supposed "forms operated upon".

One can see that in every case all three prefixed forms occur: o-, qo- and y-.

o-Gvords generally occur significantly more often than their respective Gvords, but that may be partly attributable to their use as labels. Unfortunately, I can see no quick way to exclude labels from these stats, leaving calculations over narrative parts only. qo-Gvords, however, are very rare in labels, yet they too generally occur much more often than Gvords. By contrast, y-Gvords occur less frequently than Gvords.

It is worth noting that the total of occurrences across this table (that is, for the supposed twenty nouns) is 5440, which would amount to about 14% of the whole corpus (according to voynich.nu, which in turn refers to Reddy & Knight).

[attachment=2729]
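
For anyone who wants to redo this kind of count, here is a minimal sketch. It assumes the vords are already available as a flat list of EVA strings, that "Gvord" means a vord starting with one of the gallows t, k, p, f, and that the ranking is by frequency of the bare Gvord. The function names and the toy input are made up for illustration; this is not necessarily how the table above was produced.

```python
from collections import Counter

GALLOWS = set("tkpf")        # assumption: EVA gallows glyphs
PREFIXES = ("o", "qo", "y")  # the three prefixed forms discussed above


def gvord_table(tokens, top_n=20):
    """Count each of the top_n most frequent Gvords and its prefixed forms."""
    counts = Counter(tokens)
    gvords = [w for w, _ in counts.most_common() if w and w[0] in GALLOWS]
    rows = []
    for word in gvords[:top_n]:
        row = {"gvord": word, "bare": counts[word]}
        for prefix in PREFIXES:
            row[prefix + "-"] = counts[prefix + word]
        rows.append(row)
    return rows


def coverage(rows, total_tokens):
    """Share of all tokens covered by the bare and prefixed forms in rows."""
    covered = sum(v for row in rows for k, v in row.items() if k != "gvord")
    return covered / total_tokens


# Toy input only; a real run would read the vords from a transcription file.
tokens = "kedy kedy okedy qokedy qokedy qokedy ykedy daiin tol otol".split()
rows = gvord_table(tokens)
for row in rows:
    print(row)
print(round(coverage(rows, len(tokens)), 2))
```

Labels are not filtered out here, so the o- column would still be inflated by label text unless the input list is restricted to running text first.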
(19-03-2019, 02:07 PM)Anton Wrote: ...

So Gvords are very, very popular to start a sentence, for whatever reason.

Next we check whether paragraph-initial vords exhibit a high degree of uniqueness. It turns out that (in the said folio range) 51.9% of all paragraph-initial vords are unique. We must be careful about possible inflexions, which may make these vords "not-that-unique", but generally this looks like a high figure. It would be less common, I think, to use unique verbs or unique words of other parts of speech to begin a sentence, so it is reasonable to suppose that paragraph-initial vords are, in their majority, nouns. Since they are mostly Gvords, that makes the gallows a noun-marker; whether explicit (like an article) or implicit (like a reference to a particular nomenclator) does not matter.
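
A figure like this can be rechecked with a few lines. The sketch below assumes that "unique" means the vord occurs exactly once in the text being examined, that each paragraph is a list of non-empty EVA strings, and that the gallows are t, k, p, f. The function name and the toy paragraphs are hypothetical.

```python
from collections import Counter

GALLOWS = "tkpf"  # assumption: EVA gallows glyphs


def initial_stats(paragraphs):
    """Return (gallows-initial share, unique share) of paragraph-initial vords."""
    counts = Counter(w for p in paragraphs for w in p)
    initials = [p[0] for p in paragraphs if p]
    if not initials:
        return 0.0, 0.0
    gallows_share = sum(w[0] in GALLOWS for w in initials) / len(initials)
    # "Unique" here means: occurs exactly once in the whole input.
    unique_share = sum(counts[w] == 1 for w in initials) / len(initials)
    return gallows_share, unique_share


# Toy paragraphs, not real transcription data.
paragraphs = [
    ["pchedy", "daiin", "ol"],
    ["tshol", "daiin"],
    ["daiin", "ol", "chedy"],
    ["kchol", "tshol"],
]
print(initial_stats(paragraphs))
```

Whether uniqueness should be measured against the folio range or the whole manuscript changes the number, so that choice has to be stated with any result.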

Now we move on to labels, the discussion of which in this thread stimulated this idea. We discussed that labels - at least some of them - do not look like plain designators. Instead they look like referrers. But very many labels start with "o", immediately followed by a gallows. Considering that a Gvord is a noun and the label is a referrer, this makes the "o" prefix a referral operator, something like "to", "related to" or "for", "intended for", "appropriate for". With this approach, the notorious "otol" is no longer "otol", but instead it is "o tol" - that is, "related to tol". Notice that I don't touch the question of whether "o" is "to" or "for" in English, or "zu" or "für" in German; I simply suggest a relational operator.
..

Hi Anton,
what you are doing here seems promising to me. This is something that can really produce new insights!

I would like to point out Emma May Smith's [link].
Grove words are paragraph-initial Gvords. These words appear to be the result of adding an initial gallows to an ordinary word. Similar observations were made by [link] (and John Grove himself, I think).

In particular Emma observes that:
  • "the structure of some [Grove] words, such as <fchoctheody, kchdaldy, pcholky, tchokedy> suggest that the characters immediately after the gallows are part of a ‘Fore’ section (as discussed in [link]) as so should not have anything in front of them. The same goes for a small number of Grove Words where the second character is <y>, which usually does not come in the middle of a word."
  • more Grove words match ordinary words by removing the initial gallows than by adding the o- prefix [i.e. a Grove word gX is more likely related to the word X than to ogX].

Another point that could support Emma's analysis (I am not sure it is mentioned in her post) is that Grove words are on average longer than ordinary Gvords. I think others have observed this, but I cannot find a reference at the moment. I suggest independently checking the correctness of this statement.
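
Both claims (strip-the-gallows matches versus o-prefix matches, and the length difference) are easy to check independently. A minimal sketch, assuming paragraphs are lists of non-empty EVA strings, that a Grove word is simply a gallows-initial vord in first position, and that length is counted in EVA characters rather than glyphs. All names and the toy data are illustrative.

```python
GALLOWS = "tkpf"  # assumption: EVA gallows glyphs


def mean_length(words):
    """Average length in transcription characters (0.0 for an empty list)."""
    return sum(len(w) for w in words) / len(words) if words else 0.0


def grove_stats(paragraphs):
    """Compare paragraph-initial Gvords (Grove words) with the other vords."""
    grove = [p[0] for p in paragraphs if p and p[0][0] in GALLOWS]
    body = [w for p in paragraphs for w in p[1:]]
    vocab = set(body)
    return {
        # Grove word gX where X itself occurs as an ordinary vord
        "strip_matches": sum(w[1:] in vocab for w in grove),
        # Grove word gX where ogX occurs as an ordinary vord
        "o_matches": sum("o" + w in vocab for w in grove),
        "grove_mean_len": mean_length(grove),
        "other_gvord_mean_len": mean_length([w for w in body if w[0] in GALLOWS]),
    }


# Toy paragraphs, not real transcription data.
paragraphs = [
    ["pchedy", "chedy", "okeey"],
    ["tshol", "shol", "keey"],
    ["kaiin", "daiin", "okaiin"],
]
print(grove_stats(paragraphs))
```

Counting characters of the transcription overstates length for glyphs written with two EVA letters (ch, sh), so a glyph-aware tokenizer would be the safer basis for the length comparison.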
(20-03-2019, 09:05 AM)MarcoP Wrote: ...

Grove words are paragraph-initial Gvords. These words appear to be the result of adding an initial gallows to an ordinary word. Similar observations were made by [link] (and John Grove himself, I think).

...

That's how I see them... in most cases, they are ordinary vords with a gallows prepended (that's why my mind keeps wanting to interpret them as pilcrows, but I try not to assume that they are).

Even those that appear to be unique vords after the gallows has been removed are usually common vords if you break them in two.
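
The "break them in two" observation could be tested along these lines. This is only a sketch: it assumes a frequency table of ordinary vords is available, and it treats a vord as "common" when it occurs at least min_count times, a threshold picked arbitrarily here. The function name and the toy counts are made up.

```python
from collections import Counter


def common_splits(word, counts, min_count=2):
    """Return every (left, right) split of word where both halves are common."""
    splits = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if counts.get(left, 0) >= min_count and counts.get(right, 0) >= min_count:
            splits.append((left, right))
    return splits


# Toy frequency table, not real transcription data.
counts = Counter({"chol": 5, "daiin": 9, "ol": 7, "ch": 1})
grove_word = "pcholdaiin"
remainder = grove_word[1:]  # drop the initial gallows
print(common_splits(remainder, counts))
```

Short vords such as ol or ar are so frequent that many strings will split by chance, so the result needs to be compared against the split rate for ordinary long vords.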
(19-03-2019, 09:56 PM)Anton Wrote: Unfortunately, I can see no quick way to exclude labels from these stats, leaving calculations over narrative parts only.

This is actually very easy to do with any of the transcription files at [link],
in combination with the IVTT tool [link].

The not-so-easy part of this is that you would have to install the tool (download and compile),
and read and understand the manual.

It would be very easy for me to send you (one-off) two text files, one with all the regular text and one with all the labels.
Just let me know.
Quote:in most cases, they are ordinary vords with a gallows prepended (that's why my mind keeps wanting to interpret them as pilcrows, but I try not to assume that they are).

Even those that appear to be unique vords after the gallows has been removed are usually common vords if you break them in two.

Yes, and I'm afraid that moving in this direction we may come to what those famous WW2 cryptographers thought it to be - a primitive form of synthetic language (did they say "very primitive"?)


Let's suppose that gallows are some modifiers. That they are potential modifiers follows from the fact that gallows coverage is there. Although coverage of p and f can be theoretically written off to embellishment (although certain occurrences speak against that), coverage of t is just plainly there and cannot be written away. But let's go further and suppose that all gallows are some modifiers. If all Gvords are nouns (which, by the way, does not yet preclude other types of vords from being nouns), then there are two possibilities. Either gallows make nouns from nouns, or they make nouns from some other parts of speech.

In the first case, the most evident assumption is that gallows make the plural from the singular, like articles do in some languages. However, while it is perfectly fine to start most of your sentences with nouns, starting all of them with plural nouns would be quite a corner case, I'm afraid. The opposite supposition, namely that gallows make the singular from the plural, looks excessive. However inefficient the script is (in terms of information per character), I don't think it likely that a script inventor would take the plural as the basis instead of the singular. I don't know if there are natural languages which use the plural as the base noun form and thus could suggest such an idea.

Making nouns from other parts of speech, such as verbs, would mean a strictly limited range of those. Very many nouns just cannot be made from other parts of speech. Upon consideration, though, many other parts of speech can be made from nouns. For example, "house" or "hand" are both nouns and verbs; "silver" and "gold" are both nouns and adjectives. But the noun is the base element in all these cases, while the verbs and adjectives are derivatives. Again, maybe some natural language has it upside down, which would suggest this inverted logic for this invented script?

Actually, referring again to my parallel with extraterrestrials, the case with Voynich should be simpler, because the inventor of the script was human, who was thinking in a natural language, and thus logic and grammar of a natural language would suggest, if not dictate, rules of the Voynichese script.

Quote:This is actually very easy to do with any of the transcription files at [link],
in combination with the IVTT tool [link].

The not-so-easy part of this is that you would have to install the tool (download and compile),
and read and understand the manual.

It would be very easy for me to send you (one-off) two text files, one with all the regular text and one with all the labels.
Just let me know.

Thanks Rene, well, yes, I just should get comfortable with the IVTT tool. There should be no problem for me to compile it, but yes, I'll need to dig through the documentation. I guess it could help to automate many extractions. Let me try, and if I fail, I'll ask you for the files. :)
(20-03-2019, 02:29 PM)Anton Wrote:
Quote:in most cases, they are ordinary vords with a gallows prepended (that's why my mind keeps wanting to interpret them as pilcrows, but I try not to assume that they are).

Even those that appear to be unique vords after the gallows has been removed are usually common vords if you break them in two.

Yes, and I'm afraid that moving in this direction we may come to what those famous WW2 cryptographers thought it to be - a primitive form of synthetic language (did they say "very primitive"?)

Yes, but you have to be careful about moving *too* far in this direction. I think Rene's first exercise in the "Voynich text generation" thread was very instructive in this regard, even though none of us have been able to decipher it yet. The point is, in attempting to compose his text according to the kinds of typical Voynich patterns that you are discussing here, he actually reduced the conditional entropy of his resulting text far *too* much, down to only about 1.2, whereas the Voynich MS itself in Cuva transcription actually has a conditional entropy of about 2.1. (Typical Latin/Italian/German are slightly over 3, and several examples of natural languages are in the 2.4 to 2.6 range.)
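
For readers who want to reproduce numbers of this kind: the usual definition of conditional (second-order) character entropy is the entropy of character pairs minus the entropy of the first character of each pair. The sketch below assumes that definition and bits as the unit (log base 2); whether spaces are kept in the text, and which transcription alphabet is used, both shift the result, so those choices are left to the caller. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(text):
    """H(next char | current char) = H(pairs) - H(first char of each pair)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)


# Fully predictable alternation: the next character is always determined.
print(conditional_entropy("abababab"))
# Each character is followed by 'a' or 'b' equally often: one bit of choice.
print(conditional_entropy("aabba"))
```

On short samples the estimate is biased downwards, so comparisons between texts are only meaningful at similar lengths.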
(20-03-2019, 06:48 PM)geoffreycaveney Wrote: ...
I think Rene's first exercise in the "Voynich text generation" thread was very instructive in this regard, even though none of us have been able to decipher it yet. The point is, in attempting to compose his text according to the kinds of typical Voynich patterns that you are discussing here, he actually reduced the conditional entropy of his resulting text far *too* much, down to only about 1.2, whereas the Voynich MS itself in Cuva transcription actually has a conditional entropy of about 2.1. (Typical Latin/Italian/German are slightly over 3, and several examples of natural languages are in the 2.4 to 2.6 range.)

I haven't even had time to LOOK at it yet, other than a first glance that told me it wasn't legal Voynichese but it was close.
It wasn't really meant as a riddle. Also, the second example is 'closer' in a way, since the inverse of the operation also allows compression of real Voynichese.

However, this is *not* how the real MS text was generated. At best it could give a hint into the direction.

The reduction in entropy with respect to the Italian text is the result of a verbose substitution.
The word patterns arise by chopping up the original text into syllables (sort of), in combination with the verbose substitution. A syllable in this case is defined as:
- a consonant cluster followed by a vowel cluster
I chose Italian because very many words end in a vowel, so the above definition of syllables largely (but not completely) keeps word boundaries intact.
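
The syllable definition above can be illustrated as follows. This is a sketch under several assumptions: spaces are dropped before chopping, only unaccented a-z letters are kept (accented Italian vowels would need extra handling), a trailing consonant run is emitted as its own chunk, and the substitution table is entirely invented. It is not the procedure actually used for the generated text.

```python
import re

VOWELS = "aeiou"
# A consonant cluster (possibly empty) followed by a vowel cluster,
# or a leftover consonant run at the very end of the text.
SYLLABLE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+|[^{VOWELS}]+$")


def syllables(text):
    """Chop running text into consonant-cluster + vowel-cluster chunks."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    return SYLLABLE.findall(letters)


def verbose(chunk, table):
    """Replace each plain letter by a (possibly longer) cipher string."""
    return "".join(table.get(ch, ch) for ch in chunk)


print(syllables("la casa bella"))  # word ends coincide with chunk ends
print(syllables("il sole"))        # 'il' ends in a consonant, boundary is lost

# Hypothetical two-letter table, for illustration only.
toy_table = {"l": "ol", "a": "ai"}
print(" ".join(verbose(s, toy_table) for s in syllables("la")))
```

The second example shows why vowel-final words matter: a word ending in a consonant donates that consonant to the next chunk.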

The conditional entropy of the compressed Voynich text goes up to a staggering 3.1.
The problem is with the single character entropy, which is 'only' 3.7.
It is not just the value that is still too low. It is the frequency distribution as a whole that is wrong, and this is not easy to fix.
(20-03-2019, 02:29 PM)Anton Wrote: Let's suppose that gallows are some modifiers. That they are potential modifiers follows from the fact that gallows coverage is there. Although coverage of p and f can be theoretically written off to embellishment (although certain occurrences speak against that), coverage of t is just plainly there and cannot be written away. But let's go further and suppose that all gallows are some modifiers. If all Gvords are nouns (which, by the way, does not yet preclude other types of vords from being nouns), then there are two possibilities. Either gallows make nouns from nouns, or they make nouns from some other parts of speech.

In my opinion, the evidence discussed by Stolfi, Emma (and certainly others) suggests that Grove words are different from other gVords. In particular, the gallows in (most) Grove words appear to be attached as a prefix to an ordinary word. In this case, the gallows could be modifiers and some candidate meanings could be:
  • an analogue of uppercase markers
  • something like the Latin [link], which in some texts appears at the beginning of most chapters
  • pilcrows (not exactly modifiers, but paragraph markers)
  • ....

I think that pilcrows are unlikely, since the gallows appear to be homogeneous with the text: the function of pilcrows (often written in red ink) was to highlight paragraph starts, and in the VMS paragraphs are clearly separated by the indentation of the last line. The gallows do not seem to me to really stand out.

In other gVords (i.e. most gVords), the gallows might very well not be modifiers, but be part of the root of the word (which can take the optional o- / qo- modifiers).

An interesting task could be the definition of a procedure separating Grove words from other gVords: I mean something based solely on word morphology, ignoring the position of a word in a paragraph. I wonder how accurate such a procedure would be...
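
One very crude candidate for such a procedure, offered only as a starting point: call a gallows-initial vord "Grove-like" if the rest of it, after the first glyph, occurs as a vord elsewhere. Position is then used only to score the rule, not to make the decision. The sketch assumes EVA gallows t, k, p, f and paragraphs given as lists of strings; the function names and toy data are hypothetical.

```python
GALLOWS = "tkpf"  # assumption: EVA gallows glyphs


def looks_like_grove(word, vocab):
    """Morphology-only rule: gallows + a string that exists as a vord."""
    return len(word) > 1 and word[0] in GALLOWS and word[1:] in vocab


def evaluate(paragraphs):
    """Score the rule against paragraph position; return (precision, recall)."""
    vocab = {w for p in paragraphs for w in p[1:]}
    tp = fp = fn = 0
    for p in paragraphs:
        for i, w in enumerate(p):
            if not w or w[0] not in GALLOWS:
                continue
            predicted, actual = looks_like_grove(w, vocab), i == 0
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy paragraphs, not real transcription data.
paragraphs = [
    ["pchedy", "chedy", "tol", "ol"],
    ["tshol", "shol", "keey"],
    ["kaiin", "daiin"],
]
print(evaluate(paragraphs))
```

The toy data already shows the main weakness: an ordinary vord like tol is flagged because ol is also a vord, so the rule would need a length or frequency condition on the remainder to be of any use.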
Pilcrows and capitula were frequently in line with the main text, not just at the beginnings of paragraphs.


Also, a pilcrow sometimes looks like a letter (especially in the Middle Ages). The Capitula symbol was a "C". Sometimes it had a vertical stroke through it, but not always. Sometimes the only way you could distinguish it from the letter C was that it was in a different color. In other words, the C shape could be both a pilcrow and a letter. Some pilcrows look like t without the loops.

I've posted two blogs with examples:

[link]
[link]


In the second blog, there is a pic that shows how pilcrows sometimes stretch over more than one letter, just as some of the VMS gallows glyphs sometimes stretch over more than one glyph.


Here's an example of pilcrows that are in line with the main text:

[Image: PilcrowsInline.png]