The Voynich Ninja

Statement

The predictability of glyph placement within label vords is in concordance with that of vords in the main corpus

Explanation

Some vords appear as "labels", single or double vords apparently identifying images within the manuscript. These labels have the same grammar as those vords in the main body of the corpus.

The text of the manuscript is divided up into clearly defined word-like glyph groups (dubbed vords on this forum). These glyph groups have a non-trivial internal structure which is manifest in the severe restrictions imposed upon the positioning of glyphs within the word groups.
In other words, Voynichese has a very strict phonotactic structure: morphemes appear in predefined places within vords, and only there.

A morpheme is the smallest grammatical unit in a language.
Morphemes in the corpus are easily identifiable. Voynichese glyph combinations are highly position-aware within vords: glyph groups follow non-trivial rules of internal positioning. We can identify, and have identified, a long list of suffixes and prefixes within Voynichese. We know that certain glyphs appear only as suffixes, that certain glyphs appear only as prefixes, and that other glyphs are free-form. We have also identified (via the CLS theorem) that glyphs appear in a certain pattern.
We assume these are bound morphemes because they obey certain rules of positioning. (We can make no assumptions about vords that do not include such bound morphemes, as we are unable to identify a meaning for such unbound morphemes, but such vords are relatively few in number.)
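To make the "prefix only / suffix only / free form" idea concrete, here is a minimal Python sketch of how one might tabulate glyph positions. It is only an illustration under stated assumptions: the function names are made up for this example, each transcription character is treated as one glyph (which ignores multi-character glyphs), and the sample is a handful of vords quoted elsewhere in this thread, not a real frequency study.

```python
from collections import Counter, defaultdict

def positional_profile(vords):
    """Count how often each glyph occurs in initial, medial, and final position."""
    profile = defaultdict(Counter)
    for vord in vords:
        last = len(vord) - 1
        for i, glyph in enumerate(vord):
            if i == 0:
                slot = "initial"   # a one-glyph vord is counted as initial
            elif i == last:
                slot = "final"
            else:
                slot = "medial"
            profile[glyph][slot] += 1
    return profile

def classify(counts):
    """Label a glyph by the set of positions it was seen in."""
    slots = {slot for slot, n in counts.items() if n > 0}
    if slots == {"initial"}:
        return "prefix-only"
    if slots == {"final"}:
        return "suffix-only"
    return "free"

# Toy sample only: vords mentioned in this thread, one character per glyph (assumption).
sample = ["chey", "dchey", "olarar", "polarar", "cholky", "chodchy", "polchedy"]
profile = positional_profile(sample)
for glyph in sorted(profile):
    print(glyph, dict(profile[glyph]), classify(profile[glyph]))
```

On a sample this small the labels mean very little; the same tabulation run separately on label vords and on main-text vords is the kind of comparison the statement is about.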

An analysis of the labels (see links below) shows that the corpus of labels has a notable level of concordance with the morpheme placement of vords in the main corpus.

Further reading

[link] MarcoP on the Voynich.Ninja.

Quote:Summary: Marco found that almost 70% of all labels matched words in the main corpus. The rest were unique.
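A figure like this can be obtained with a simple set comparison. The sketch below is not Marco's actual procedure (the summary does not describe it); it assumes the comparison is by distinct label vords, and the function name and the toy lists are hypothetical.

```python
def share_found_in_corpus(label_vords, corpus_vords):
    """Fraction of distinct label vords that also occur in the main text."""
    labels = set(label_vords)
    corpus = set(corpus_vords)
    if not labels:
        return 0.0
    return len(labels & corpus) / len(labels)

# Toy data only, not taken from any transcription file.
toy_labels = ["chey", "da", "sha", "oykeey"]
toy_corpus = ["chey", "da", "dairal", "cholky", "sha", "chey"]
print(share_found_in_corpus(toy_labels, toy_corpus))
```

Counting by tokens instead of distinct vords would give a different percentage, so it matters which of the two a study reports.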

VMS language DNA variations. Davidsch

Quote:My research shows visually that the labels, as defined, follow the same rules for the letters in the remainder of the text that are not labels, with some exceptions:

'a' occurs proportionally more in the "label text"

the 'q' (only posA) occurs much less often in the "label text"

the 'h' occurs much less often in the "label text"

the 't' on posB is higher in the "label text"

You can check by [link] "CAB NST" & "CAB labels only".

[link]. Prof. Stolfi

Stolfi notes [link] when attempting to create a "grammar" for Voynichese that (italics mine):

Quote:It should be noted that normal words [in his attempt to create a grammar] account for over 88% of all label tokens, and over 96.5% of all the tokens (word instances) in the text. The exceptions (less than 4 every 100 text words) can be ascribed to several causes, including physical "noise" and transcription errors. (Different people transcribing the same page often disagree on their reading, with roughly that same frequency.) Indeed, most "abnormal" words are still quite similar to normal words, as discussed in a [link].
[..]
The words that do not fit into our paradigm [..] These words comprise 1295 tokens (3.7%) in the main text, and 127 tokens (12.4%) in the labels. The vast majority are rare words that occur only once in the whole manuscript.

The [link] by Brian Cham and David Jackson describes how Voynich glyphs can be divided into three categories that interact with one another in a pre-defined manner.

Notes

Statement changed from "The morpheme construction of labels is in concordance with that of the main corpus"
Added Davidsch to further reading

Would not "orthography" or "morphology" be a better term than "grammar"?

POSCIT PETITAE
page 151
poscit = I beg, I demand, I request, I desire. = implor, solicit, solicit, eu doresc
= Latin-Verb-third-person singular present active indicative of poscō
poscō or pōscō (present infinitive poscere or pōscere, perfect active poposcī or popōscī); third conjugation, no passive
petitae = sought = căutat
= Latin-Participle-nominative feminine plural of petītus
petītus m (feminine petīta, neuter petītum); first/second declension

Another poll I can't vote on. I find the word "grammar" a little too specific.

As far as I can tell so far, the labels are different in some ways and the same in others.

It's like a Venn diagram with about 60 to 70% of overlap if you assess several factors together (glyph combinations, distribution, length, level of repetition, and relationship to other sections in the manuscript).

(04-10-2016, 09:19 PM) david Wrote: A morpheme is the smallest grammatical unit in a language.
Morphemes in the corpus are easily identifiable. Voynichese glyph combinations are very positional aware within vords – glyph groups are non-trivial in their internal positioning. We can identify, and have identified, a long list of suffixes and prefixes within Voynichese. We know that certain glyphs only appear as suffixes; we know that certain glyphs only appear as prefixes; and we know that other glyphs are free form. We have also identified (via the CLS theorem) that glyphs appear in a certain pattern.
We assume these are bound morphemes because they obey certain rules of positioning. (We can make no assumptions about words that do not include such bound morphemes as we are unable to identify a meaning for such unbound morphemes, but such vords are relatively few in nature).

I don't think it's so obvious what's a morpheme and what isn't. For instance, in English, "faster" can be broken into "fast" and "er", "singer" can be broken into "sing" and "er", but "lumber" can't be broken into "lumb" and "er" - it's just one morpheme, "lumber". That words in the VMS can be divided into subunits that recur in many different words does not necessarily mean that these subunits constitute affixes or morphemes in a grammatical sense (although I suspect that they do in many cases).

Also, if there's no problem with terms like morpheme, grammar, prefix, suffix, phonotactic structure, etc. - then is it really necessary to speak of "vords" instead of simply words?

This is a hard one, David. It seems to me like both studies indicate that labels are not very different from the main text, but still a bit different.

Basically we get 30% unique vocabulary and more than three times the amount of grammar that doesn't match.
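The "more than three times" figure follows from the percentages in the Stolfi quote in the first post. A quick check in Python, using only those quoted numbers (the variable names are mine, and the implied totals are back-calculated from rounded percentages, so they are approximate):

```python
# Figures quoted from Stolfi: abnormal tokens and their share of each text type.
main_abnormal, main_share = 1295, 0.037     # main text
label_abnormal, label_share = 127, 0.124    # labels

# How much more often a label token falls outside the paradigm.
ratio = label_share / main_share

# Approximate total token counts implied by the rounded percentages.
implied_main_total = main_abnormal / main_share
implied_label_total = label_abnormal / label_share

print(f"ratio of abnormal shares: {ratio:.2f}")
print(f"implied main-text tokens: {implied_main_total:.0f}")
print(f"implied label tokens: {implied_label_total:.0f}")
```

The ratio comes out at roughly 3.35. Note that the label sample is only about a thousand tokens, so its percentage is less stable than the main-text one.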

Sam G Wrote:I don't think it's so obvious what's a morpheme and what isn't. For instance, in English, "faster" can be broken into "fast" and "er", "singer" can be broken into "sing" and "er", but "lumber" can't be broken into "lumb" and "er" - it's just one morpheme, "lumber". That words in the VMS can be divided into subunits that recur in many different words does not necessarily mean that these subunits constitute affixes or morphemes in a grammatical sense (although I suspect that they do in many cases).

I agree with Sam G - this was exactly my thought also. As David pointed out, "morpheme" means "unit that bears meaning" - but since we don't know that these bigrams / ngrams bear grammatical meaning (there are alternatives as Sam pointed out), we could easily be wrong saying that they are morphemes.

I agree with Sam, that's a good point.

A term is required here which does not have linguistic flavour.

Quote:For instance, in English, "faster" can be broken into "fast" and "er", "singer" can be broken into "sing" and "er", but "lumber" can't be broken into "lumb" and "er" - it's just one morpheme, "lumber".

This is true if you have a knowledge of English. However, if you don't understand the meaning of the words but are simply looking for patterns to clarify the rules of the language (imagine an alien trying to decipher English, or JKP trying to understand Voynichese :) ), then the -er suffix becomes a morpheme that fits in nicely with your list of rules (in the same way that we say 4o- tends to be a prefix in Voynichese). That is because it's a common suffix that appears all over the place under clearly definable rules (it is the regular formative of agent nouns, or a designator of nouns from occupation or characteristic).

But to take another example, a bigram such as "ez" isn't a morpheme for our alien because it's usually an n-gram pattern such as in sneeze, breeze, jeeze, tweeze, freeze, etc (unless he postulates that the 4gram "eeze" is a morpheme suffix!).
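A rough sketch of what the alien's purely distributional approach looks like, using the English words from this discussion. The function name is hypothetical and the word list is just the examples above, so the counts only illustrate the point:

```python
from collections import Counter

def final_ngrams(words, n):
    """Count the last n letters of every word that is at least n letters long."""
    return Counter(word[-n:] for word in words if len(word) >= n)

words = ["faster", "singer", "lumber", "sneeze", "breeze", "tweeze", "freeze"]

print(final_ngrams(words, 2).most_common())  # word-final bigrams
print(final_ngrams(words, 4).most_common())  # word-final 4-grams
```

Counting alone puts "lumber" in the same -er bucket as "faster" and "singer", and it makes "eeze" look like a productive ending. That is Sam's objection in miniature: frequency and position can suggest candidate affixes, but they cannot confirm that a recurring ending carries meaning.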

Quote:I agree with Sam, that's a good point.

A term is required here which does not have linguistic flavour.

I couldn't think of one. Any suggestions? "n-gram" and its derivatives could be an alternative, but they fail to convey the idea of letter groups that potentially carry meaning.

Quote: Basically we get 30% unique vocabulary and more than three times the amount of grammar that doesn't match.

I'm going to copy and paste the comments of Prof Stolfi here:

Quote:Abnormal words
The words that do not fit into our paradigm are collected in the grammar under the symbol [link]. These words comprise 1295 tokens (3.7%) in the main text, and 127 tokens (12.4%) in the labels. The vast majority are rare words that occur only once in the whole manuscript. They were manually sorted into a few major classes, according to their main "defect" as we perceived it:
[link]: words that do not have a properly nested layer structure, and seem to be two more normal words joined together (716 tokens, 55% of the abnormal words). These can be subdivided into:

[link]: words with two or more gallows (208 tokens). The most common is oteotey (3 occurrences).

[link]: words with crust letters surrounded by core or mantle letters (278 tokens). The most common are chodchy and cholky (4 occurrences each)

[link]: words which contain the A.IN groups in non-final position (206 tokens). The most common are daiidy and dairal (5 occurrences each).

[link]: abnormal words which contain the y letter in non-final, non-initial position; or the letter q in non-initial position (24 tokens). The most common is oykeey (2 occurrences).

[link]: this class was defined by John Grove, who noticed that the rare words often found at the beginning of lines, such as polchedy, could be interpreted as normal words prefixed with a spurious gallows letter. Of the abnormal tokens in the text, 213 (16%) fit this description.

[link]: the remaining 366 abnormal tokens (28%) are not easily interpreted as joined words or Grove's gallows-prefixed words. We have sorted them into:
[link]: words that have one of the letters m or g not preceded by a circle (57 tokens). Apart from the letter m by itself (13 occurrences), the most common is dm (4 occurrences).

[link]: words that contain letter i in any context other than an IN group (68 tokens). The most common is dairin (2 occurrences).

[link]: abnormal words that contain isolated e after an s (28 tokens). The most common is shese (3 tokens).

[link]: abnormal words that did not seem to fit in any of the above categories (213 tokens). Apart from isolated letters like v (7 tokens) and c (4 tokens) --- mainly in the circular text on page [link] --- the most common are da (6 tokens), ackhy, sa, and sha (3 tokens each). Note that the latter are probably the result of misreading y as a in otherwise normal (and common) words.

It is quite possible that, when the VMS is deciphered, we will discover that some of these abnormal words are in fact quite "normal". Indeed, although most "abnormal" words occur only once, some classes of abnormal words may be sufficiently frequent and well defined to deserve recognition in the grammar. One such candidate, for example, is [link], the set of words that have A.IN groups in non-final position.
Conversely, the grammar is probably too permissive in many points, so that many words that it classifies as normal are in fact errors or non-word constructs. See the section about circle letters, for example. For instance, there must be many apparently "normal" tokens which are in fact "Grove words". These could result from prepending a spurious gallows letter to a crust-only normal word (e.g. p + olarar = polarar), or prepending a spurious non-gallows letter to a suitable normal word (e.g. d + chey = dchey). Indeed, it is quite possible that most of the normal-looking line-initial words are in fact such "crypto-Grove" words.
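For anyone who wants to sanity-check the breakdown quoted above, the class counts do add up. A small Python check follows; the dictionary keys are my own descriptive labels (the original class names were links and are not reproduced here), while the numbers are exactly those in the quote:

```python
total_abnormal = 1295  # abnormal tokens in the main text, per the quote

# Top-level classes and their token counts.
top_level = {"joined_words": 716, "grove_words": 213, "other": 366}

# Subclasses of the joined-word class.
joined_sub = {
    "two_or_more_gallows": 208,
    "crust_inside_core_or_mantle": 278,
    "ain_group_non_final": 206,
    "misplaced_y_or_q": 24,
}

# Subclasses of the remaining ("other") class.
other_sub = {
    "m_or_g_without_circle": 57,
    "i_outside_in_group": 68,
    "isolated_e_after_s": 28,
    "unclassified": 213,
}

assert sum(top_level.values()) == total_abnormal
assert sum(joined_sub.values()) == top_level["joined_words"]
assert sum(other_sub.values()) == top_level["other"]

# Rounded percentage of all abnormal tokens in each top-level class.
shares = {name: round(100 * n / total_abnormal) for name, n in top_level.items()}
print(shares)
```

The rounded shares match the 55%, 16% and 28% given in the quote, so the three top-level classes account for all 1295 abnormal main-text tokens.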

david

Anton

sidanno

-JKP-

ThomasCoon

Sam G

Koen G

ThomasCoon

Anton

david