david > 04-10-2016, 09:19 PM
Quote:Summary: Marco found that almost 70% of all labels matched words in the main corpus. The rest were unique.
Quote:My research shows visually that the labels, as defined, follow the same rules for the letters in the remainder of the text that are not labels, with some exceptions:
- 'a' occurs proportionally more in the "label text"
- the 'q' (only posA) occurs much less in the "label text"
- the 'h' occurs much less in the "label text"
- the 't' on posB is higher in the "label text"
You can check by [link] "CAB NST" & "CAB labels only".
Stolfi notes [link], when attempting to create a "grammar" for Voynichese, that (italics mine):
Quote:It should be noted that normal words [in his attempt to create a grammar] account for over 88% of all label tokens, and over 96.5% of all the tokens (word instances) in the text. The exceptions (fewer than 4 in every 100 text words) can be ascribed to several causes, including physical "noise" and transcription errors. (Different people transcribing the same page often disagree on their reading, with roughly that same frequency.) Indeed, most "abnormal" words are still quite similar to normal words, as discussed in a [link].
[..]
The words that do not fit into our paradigm [..] These words comprise 1295 tokens (3.7%) in the main text, and 127 tokens (12.4%) in the labels. The vast majority are rare words that occur only once in the whole manuscript.
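As a rough sanity check on the figures quoted above, the token counts and their percentages imply approximate corpus sizes. The sketch below only back-computes totals from the rounded percentages, so the results are estimates and not numbers stated by Stolfi; the helper name `implied_total` is made up for this illustration.

```python
def implied_total(abnormal_tokens: int, abnormal_percent: float) -> float:
    """Back-compute a total token count from a count and its (rounded) percentage share."""
    return abnormal_tokens / (abnormal_percent / 100.0)

# Figures quoted in the thread: 1295 tokens = 3.7% of main text, 127 tokens = 12.4% of labels.
main_total = implied_total(1295, 3.7)
label_total = implied_total(127, 12.4)

print(f"implied main-text tokens: about {main_total:,.0f}")
print(f"implied label tokens:     about {label_total:,.0f}")
```

Because the percentages are rounded to one decimal place, the true totals could differ from these estimates by a percent or so.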
Anton > 04-10-2016, 10:21 PM
sidanno > 04-10-2016, 10:52 PM
-JKP- > 04-10-2016, 11:16 PM
ThomasCoon > 05-10-2016, 02:28 PM
Sam G > 05-10-2016, 03:13 PM
(04-10-2016, 09:19 PM)david Wrote: A morpheme is the smallest grammatical unit in a language.
Morphemes in the corpus are easily identifiable. Voynichese glyph combinations are very positionally aware within vords: glyph groups are non-trivial in their internal positioning. We can identify, and have identified, a long list of suffixes and prefixes within Voynichese. We know that certain glyphs only appear as suffixes; we know that certain glyphs only appear as prefixes; and we know that other glyphs are free-form. We have also identified (via the CLS theorem) that glyphs appear in a certain pattern.
We assume these are bound morphemes because they obey certain rules of positioning. (We can make no assumptions about words that do not include such bound morphemes, as we are unable to identify a meaning for such unbound morphemes, but such vords are relatively few.)
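One way to make the "positionally aware" claim checkable is to tally where each glyph occurs inside vords. The sketch below is only an illustration: it assumes an EVA-style transcription and treats every character as one glyph (which ignores multi-character glyphs such as ch or sh), and the sample is a handful of vords quoted elsewhere in this thread, not a real corpus. The function name `position_profile` is hypothetical.

```python
from collections import Counter, defaultdict

def position_profile(vords):
    """For each glyph, count how often it occurs in initial, medial and final position."""
    profile = defaultdict(Counter)
    for vord in vords:
        last = len(vord) - 1
        for i, glyph in enumerate(vord):
            if i == 0:
                slot = "initial"
            elif i == last:
                slot = "final"
            else:
                slot = "medial"
            profile[glyph][slot] += 1
    return profile

# Toy sample: vords that appear in this thread (assumption: one character = one glyph).
sample = ["oteotey", "chodchy", "cholky", "daiidy", "dairal", "polchedy"]
profile = position_profile(sample)
for glyph in sorted(profile):
    print(glyph, dict(profile[glyph]))
```

A glyph whose counts fall almost entirely in one slot over a full transcription would be a candidate prefix or suffix in the sense described above; the tally alone says nothing about whether it carries meaning.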
Koen G > 05-10-2016, 03:21 PM
ThomasCoon > 05-10-2016, 05:03 PM
Sam G Wrote:I don't think it's so obvious what's a morpheme and what isn't. For instance, in English, "faster" can be broken into "fast" and "er", "singer" can be broken into "sing" and "er", but "lumber" can't be broken into "lumb" and "er" - it's just one morpheme, "lumber". That words in the VMS can be divided into subunits that recur in many different words does not necessarily mean that these subunits constitute affixes or morphemes in a grammatical sense (although I suspect that they do in many cases).
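Sam G's point can be illustrated with a deliberately naive segmenter. The sketch below strips a suffix whenever a word happens to end with it; the function name and the suffix list are made up for the illustration. It splits "lumber" just as readily as "faster" and "singer", which is why recurring subunits on their own do not establish morpheme status.

```python
def naive_split(word, suffixes=("er",)):
    """Split off the first matching suffix by string matching alone, with no grammatical knowledge."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""

for w in ["faster", "singer", "lumber"]:
    print(w, "->", naive_split(w))
```

Telling "lumb" + "er" apart from "fast" + "er" takes outside knowledge (that "fast" and "sing" are words and "lumb" is not), which is the kind of knowledge we lack for the VMS.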
Anton > 05-10-2016, 08:39 PM
david > 06-10-2016, 08:59 AM
Quote:For instance, in English, "faster" can be broken into "fast" and "er", "singer" can be broken into "sing" and "er", but "lumber" can't be broken into "lumb" and "er" - it's just one morpheme, "lumber".
Quote:I agree with Sam, that's a good point.
A term is required here which does not have linguistic flavour.
Quote: Basically we get 30% unique vocabulary and more than three times the amount of grammar that doesn't match.
I'm going to copy and paste the comments of Prof Stolfi here:
Quote:Abnormal words
The words that do not fit into our paradigm are collected in the grammar under the symbol [link]. These words comprise 1295 tokens (3.7%) in the main text, and 127 tokens (12.4%) in the labels. The vast majority are rare words that occur only once in the whole manuscript. They were manually sorted into a few major classes, according to their main "defect" as we perceived it:
- [link]: words that do not have a properly nested layer structure, and seem to be two or more normal words joined together (716 tokens, 55% of the abnormal words). These can be subdivided into:
  - [link]: words with two or more gallows (208 tokens). The most common is oteotey (3 occurrences).
  - [link]: words with crust letters surrounded by core or mantle letters (278 tokens). The most common are chodchy and cholky (4 occurrences each).
  - [link]: words which contain the A.IN groups in non-final position (206 tokens). The most common are daiidy and dairal (5 occurrences each).
  - [link]: abnormal words which contain the y letter in non-final, non-initial position; or the letter q in non-initial position (24 tokens). The most common is oykeey (2 occurrences).
- [link]: this class was defined by John Grove, who noticed that the rare words often found at the beginning of lines, such as polchedy, could be interpreted as normal words prefixed with a spurious gallows letter. Of the abnormal tokens in the text, 213 (16%) fit this description.
- [link]: the remaining 366 abnormal tokens (28%) are not easily interpreted as joined words or Grove's gallows-prefixed words. We have sorted them into:
  - [link]: words that have one of the letters m or g not preceded by a circle (57 tokens). Apart from the letter m by itself (13 occurrences), the most common is dm (4 occurrences).
  - [link]: words that contain letter i in any context other than an IN group (68 tokens). The most common is dairin (2 occurrences).
  - [link]: abnormal words that contain isolated e after an s (28 tokens). The most common is shese (3 tokens).
  - [link]: abnormal words that did not seem to fit in any of the above categories (213 tokens). Apart from isolated letters like v (7 tokens) and c (4 tokens), mainly in the circular text on page [link], the most common are da (6 tokens), ackhy, sa, and sha (3 tokens each). Note that the latter are probably the result of misreading y as a in otherwise normal (and common) words.
It is quite possible that, when the VMS is deciphered, we will discover that some of these abnormal words are in fact quite "normal". Indeed, although most "abnormal" words occur only once, some classes of abnormal words may be sufficiently frequent and well defined to deserve recognition in the grammar. One such candidate, for example, is [link], the set of words that have A.IN groups in non-final position.
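The class counts quoted above can be checked for internal consistency with a few lines of arithmetic. In the sketch below the dictionary keys are my own descriptive shorthand, not Stolfi's class names (those are hidden behind the links); only the numbers come from the quoted text.

```python
abnormal_total = 1295  # abnormal tokens in the main text, as quoted

# Sub-classes of the "joined words" class (shorthand labels are assumptions).
joined = {"two or more gallows": 208, "crust inside core/mantle": 278,
          "A.IN non-final": 206, "misplaced y or q": 24}
grove = 213  # Grove's gallows-prefixed words
# Sub-classes of the remaining abnormal words (shorthand labels are assumptions).
other = {"m or g without circle": 57, "i outside IN group": 68,
         "isolated e after s": 28, "unclassified": 213}

joined_total = sum(joined.values())
other_total = sum(other.values())
grand_total = joined_total + grove + other_total

for name, count in [("joined", joined_total), ("Grove", grove), ("other", other_total)]:
    print(f"{name}: {count} tokens, {100 * count / abnormal_total:.0f}% of abnormal words")
print("sum of classes:", grand_total)
```

The sub-class counts add up to 716 and 366, the three classes add up to 1295, and the shares round to the quoted 55%, 16% and 28%.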
Conversely, the grammar is probably too permissive in many points, so that many words that it classifies as normal are in fact errors or non-word constructs. See the section about circle letters, for example. For instance, there must be many apparently "normal" tokens which are in fact "Grove words". These could result from prepending a spurious gallows letter to a crust-only normal word (e.g. p + olarar = polarar), or prepending a spurious non-gallows letter to a suitable normal word (e.g. d + chey = dchey). Indeed, it is quite possible that most of the normal-looking line-initial words are in fact such "crypto-Grove" words.
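The "crypto-Grove" idea suggests a simple screening heuristic: drop the first letter of a line-initial word and see whether the remainder is itself a known normal word. The sketch below is not Stolfi's procedure, just one way to picture it. It assumes the EVA gallows letters are t, k, p and f, and it uses a toy lexicon built from the two examples in the quote; `grove_candidate` is a hypothetical name.

```python
GALLOWS = set("tkpf")  # assumption: the EVA gallows letters

def grove_candidate(word, normal_words):
    """Return (prefix, remainder, kind) if dropping the first letter leaves a known normal word, else None."""
    if len(word) < 2:
        return None
    prefix, rest = word[0], word[1:]
    if rest in normal_words:
        kind = "gallows" if prefix in GALLOWS else "non-gallows"
        return prefix, rest, kind
    return None

# Toy lexicon taken from the two examples quoted above; a real test would need a full word list.
normal_words = {"olarar", "chey"}
for w in ["polarar", "dchey", "dairal"]:
    print(w, grove_candidate(w, normal_words))
```

A hit only marks a word as a candidate. Many ordinary words will also pass such a test by chance, so line position and frequency would have to be weighed as well.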