The Voynich Ninja

Full Version: labels as words
(23-08-2017, 09:40 PM)Koen Gh. Wrote: Or does it just mean that initial o favors nouns? Or even 'names'?

For a long time I've been wondering if some of the "o" shapes are modifiers or markers. If they were, and IF the rest of the token is a word (in the linguistic sense) then, taking spaces as literal, the VMS vords become even shorter.


If you look at a vord like okor, which is both a label and a vord, with initial "o", one sees some interesting characteristics.
  • It occupies the "south" section of the shape that resembles the T-O map on the rosettes folio.
  • It occurs twice in rosette 1, and at the beginning of the rosette 5 ring.
  • It is in the spokes of a "star" folio (68r2).
  • It is on 76v as the second vord.
  • It is in almost a dozen of the big-plant folios, usually in the middle of the line.
  • It is in most of the small-plant folios.
  • It is on an early page of the starred-text pages, and on the second-to-last.
  • It keeps company with a wide variety of vords (unlike some vords that are usually preceded or followed by specific kinds of vords).
  • It occurs frequently with prefixes and suffixes (such that the "o" is no longer the initial character). Most of the "prefixes" are short, most of the "suffixes" are 2, 3, or 4 characters.
It does not appear in the zodiac-symbol pages, or the second chunk of cosmo pages.


So this particular vord stands out because
  • it is both a label and a main-text vord.
  • It is short, which has implications for whether the "o" is used as glyph, letter, modifier, or marker.
  • It occupies a couple of prominent slots on the rosettes page.
  • It skips over the zodiacs and appears sparingly in sections other than the plant pages, both as label and vord.
  • It appears a few times in big-plant pages and frequently in small-plant pages, but never more than once per folio.
  • The "o" is retained when "prefixes" are added (I'm reluctant to use the word "prefix", but it's convenient, and I mean it as a positional prefix and not necessarily a linguistic prefix).
  • If you put q in front of it, its behavior changes. In addition to a different set of big-plant pages (three times on 93r), it is on quite a few pool pages, and more of the starred-text pages. As with okor, it skips over the zodiac pages, but it also skips the rosette pages. So qokor is similar to okor in some ways (both lean toward big- and small-plant pages) and different in others (one appears on the rosettes folio, the other on the pool folios).
  • If you remove the initial "o", kor appears mostly on plant pages, f1r, and as the first vord of f58r. It only makes brief appearances elsewhere.
So, on the surface, at least with this specific vord, I'm less inclined to think that the "o" is a modifier/marker (possible, but it behaves less so here than in other vords) and suspect it may be composed of two chunks, ok and or (as mentioned in previous blogs and described in more detail in [link unavailable]*). That doesn't necessarily mean they are biglyphs, but they fit at least some of the patterns of biglyphs OR of syllabic languages that include two-character syllables. This pattern is very prevalent in labels.


But then how does one explain kor which can function on its own and appears approximately the same number of times as okor or qokor? If there are biglyphs, it might mean that ok and k represent two different kinds of units (one a biglyph, the other a monoglyph?). This is not uncommon in medieval substitution codes (one-to-many and many-to-one relationships are found together in many of the ciphers documented by Tranchedino) and MIGHT explain a positionally rigid cipher. If you use the same glyphs as both mono- and biglyphs, then you need a space or null or marker OR positional alert in order to distinguish one from the other.




     *By the way, I haven't read Stolfi, Friedman or Tiltman's take on this, so I don't know if there's any overlap between their ideas and mine or if their writings have any relevance to this thread.
Thank you all for your comments!
I am glad you find the graphs useful!

(22-08-2017, 09:08 PM)Koen Gh. Wrote: Now these are interesting and clear statistics, Marco.
Of course, since these offer a direct way to compare Voynichese to other languages, I wonder what the stats would look like for a language without common endings in nouns (-us, -a, -um). Do you have an automatic way to count the initial and final letters in a text or does this have to be done manually?

The number crunching is done by an automatic script, but some manual work is needed to prepare the data and draw the graphs. If a good transcription is available, it's not much work.

A language that has no specific suffixes should generate a flat histogram for the main text. If the labels are the usual Greek-Latin names, the label suffix histogram will have the spikes we have already seen for Latin and Italian: the two histograms should be quite different. If the labels are not borrowed, the two histograms could be identical, but of course the best thing would be to see what happens with an actual case. If you can suggest a transcribed text that you think relevant, I will do my best to produce the graphs.
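For readers who want to try this on their own texts, here is a minimal sketch of this kind of counting in Python. It is not the script used for the graphs in this thread: the function name `affix_histogram` is hypothetical and the two word lists are made-up toy samples, not a real transcription.

```python
from collections import Counter


def affix_histogram(words, n=1, where="start"):
    """Return {affix: percentage} for the first or last n characters of each word.

    Words shorter than n characters are skipped.
    """
    if where not in ("start", "end"):
        raise ValueError("where must be 'start' or 'end'")
    counts = Counter(
        (w[:n] if where == "start" else w[-n:])
        for w in words
        if len(w) >= n
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {affix: 100.0 * c / total for affix, c in counts.items()}


# Toy data for illustration only (assumed, not taken from a real transcription).
main_text = ["qokedy", "chedy", "okor", "qokain", "daiin", "shedy"]
labels = ["okor", "otaly", "okary", "otedy"]

# Initial-letter histogram of the main text and two-letter ending histogram of the labels.
print(affix_histogram(main_text, 1, "start"))
print(affix_histogram(labels, 2, "end"))
```

Running the same function on the main text and on the labels gives the two histograms to be compared; preparing clean word lists from a transcription is the manual part.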

(22-08-2017, 09:15 PM)Emma May Smith Wrote: There is almost certainly going to be a larger element of borrowing among plant names than the text as a whole. (It should be possible to isolate structurally different words as the most likely loanwords.)

However, the lack of [q] in plant names is also seen with star names and zodiac labels. Were all three to have the same source language, that would be fine. Yet it is less likely, if they come from two or more different sources, that they would lack the same characteristics.

I agree, Emma!
There is hope that plant names are partly borrowed (and hence potentially easier to recognize). Yet the single-character graphs do not provide information supporting this idea. It is quite possible that if and when names were borrowed, they were “Voynichized”, as happened in Greek->Latin->Italian:

kyparissos cupressus cipresso
narkissos narcissus narciso
huakinthos jacintus giacinto

But it's also possible that the level of detail of these last histograms is not sufficient.

(23-08-2017, 09:40 PM)Koen Gh. Wrote: Or does it just mean that initial o favors nouns? Or even 'names'?


As Emma observed (link unavailable), commenting on this graph:

"[ot, ok]: the increase in these should be a result of the lower [qo]."
[Image: attachment.php?aid=1591]

There's plenty of evidence that suggests a relationship between qo- and o-. Possibly the most obvious is that, if you remove the starting q from a word that occurs at least twice, 90% of the time you get an o- word that actually appears in the ms.
So, if o- favors nouns, q- likely does too.
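The q-stripping check can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `q_strip_match_rate` is hypothetical, the token list is invented, and the rate is computed over distinct word types (the post does not say whether the 90% refers to types or tokens).

```python
from collections import Counter


def q_strip_match_rate(tokens, min_count=2):
    """Fraction of q-initial word types (seen at least min_count times)
    whose q-less form also occurs somewhere in the token list."""
    counts = Counter(tokens)
    vocab = set(counts)
    candidates = [
        w for w, c in counts.items()
        if w.startswith("q") and len(w) > 1 and c >= min_count
    ]
    if not candidates:
        return 0.0
    hits = sum(1 for w in candidates if w[1:] in vocab)
    return hits / len(candidates)


# Toy tokens for illustration only (assumed, not a real transcription).
tokens = ["qokor", "qokor", "okor", "qokedy", "qokedy", "okedy",
          "qotal", "qotal", "daiin"]

# qokor -> okor and qokedy -> okedy are attested here, qotal -> otal is not.
print(q_strip_match_rate(tokens))
```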

Emma's observation suggests that the increase in o- might be a consequence of the disappearance of q-. For all I know, this might still be related to nouns, but we currently don't have much evidence to support this idea. Emma is currently researching [link unavailable]: we can hope to understand more of this subject in the near future.

As I wrote above, the single-letter diagrams don't provide much detail with such a small alphabet. But the two-letter diagrams highlighted some differences which might prove interesting (e.g. the higher -ry and -ly frequencies in labels).
Right! I missed Emma's comment but that's certainly the best explanation.
So that means that one of the most pertinent questions remains: what is q? Your stats strongly suggest that compared to labels, it is "added" to words in fluent text. 

In her blog post Emma argues for a sound value for q, in which case the only possibility I see is that it's a sandhi effect. This makes me wonder: can sandhi evoke sounds which lie outside of the language's phoneme inventory? Maybe something like a glottal stop?
I have run a new round of histogram stats, trying to find a subset of Voynichese that behaves similarly to Labelese. I only found partial matches, but there are still data that I found interesting. I don't have a simple solution to discuss, only several minor points that are hard to explain clearly, but I'll try.


The main features of Labelese when compared with average Voynichese are:
  • the frequency of prefix o- is doubled
  • the frequency of prefix q- is about one third of the normal


Some time ago we discussed [link unavailable]. Examining the graphs for o- and q-, it is clear that no word ending provides a context that doubles o- (even if -r, -s and -t provide a reduction of q- occurrences close to that observed in labels).
[attachment=1622]


I then turned to subsets identified not by the final character of the previous word, but by the position of the word inside the text. The results I consider interesting are illustrated in the following complex histogram. I will discuss each subset individually, comparing prefixes with Labelese (the Orange bars).
I have used Zandbergen's IVTT ZL transcription instead of Takahashi's, so there are a few very minor differences from the graphs I previously posted.

[attachment=1623]
  • [Green] All text. These are the basic numbers we have already discussed. I have here only considered paragraph text, but the exclusion of labels doesn't have a considerable impact. The general statistics mainly differ from Labelese in having a much higher qo- frequency and much lower ot- and ok- frequencies.
  • [Dark Blue] First word of the first line of a paragraph (i.e. first word of a paragraph). This is the only subset that gets close to the low qo- frequency of Labelese: labels have 2.5% qo- prefixes, the first words of paragraphs 3.9%, the whole text 10.3%. But o- occurs quite rarely at the beginning of paragraphs, while there are several occurrences of otherwise rare prefixes like po-, pc-, to-. This is a major difference from Labelese. The other major difference is that ch- and sh- are almost completely absent, while in Labelese they are well represented (even if much rarer than in normal text).
  • [Pink] Last word of the first line of a paragraph. This subset seems uninteresting at first sight, but it turns out it has the best overall correlation with Labelese (by a very small margin). I guess its strength is the close match for op-. This prefix is frequent in the first line of a paragraph, but as the first word of the paragraph p- seems to be preferred. This subset is also the one which gets closest to the Labelese ch- stats.
  • [Light Blue] First word of the last line of a paragraph. These stats forced me to include several rare prefixes in the graph, because this subset has spikes in very uncommon prefixes (dc- yc-). As Emma pointed out above, there are cases in which Labelese favors prefixes which are typical of line starts (e.g. yt- yk- so-), but other line-start prefixes are uncommon in Labelese.
  • [Brown] Last word of the last line of a paragraph (i.e. last word of a paragraph). This is the subset of Voynichese that gets closest to the high ok- rate of Labelese. Yet its most striking feature is the huge spike for ch-. About 25% of the paragraphs end with a ch- word: look at paragraph endings [link unavailable]. Labels have a slight dislike for ch- (less than 10%), so this is a huge difference.
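The positional subsets above can be extracted and compared roughly as in the following Python sketch. Everything here is an assumption made for illustration: the helper names, the toy paragraphs and labels, the short prefix list, and the use of Pearson correlation (the post does not say which correlation measure was used).

```python
from collections import Counter
from math import sqrt


def positional_subsets(paragraphs):
    """paragraphs: list of paragraphs, each a list of lines, each line a list of words.

    For a one-line paragraph the first and last line coincide.
    """
    subsets = {
        "first_of_par": [],
        "last_of_first_line": [],
        "first_of_last_line": [],
        "last_of_par": [],
    }
    for par in paragraphs:
        lines = [ln for ln in par if ln]  # ignore empty lines
        if not lines:
            continue
        subsets["first_of_par"].append(lines[0][0])
        subsets["last_of_first_line"].append(lines[0][-1])
        subsets["first_of_last_line"].append(lines[-1][0])
        subsets["last_of_par"].append(lines[-1][-1])
    return subsets


def prefix_freqs(words, prefixes):
    """Fraction of words starting with each listed prefix (longest prefix wins)."""
    ordered = sorted(prefixes, key=len, reverse=True)
    counts = Counter()
    for w in words:
        for p in ordered:
            if w.startswith(p):
                counts[p] += 1
                break
    n = len(words)
    return [counts[p] / n if n else 0.0 for p in prefixes]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Toy data for illustration only (assumed, not a real transcription).
paragraphs = [
    [["pchedy", "okor", "otaly"], ["daiin", "chedy", "qokain"], ["ykedy", "shol", "chol"]],
    [["tolor", "qokedy", "opary"], ["dchey", "ol", "okam"]],
]
labels = ["okor", "otaly", "opary", "otedy"]
prefixes = ["qo", "ok", "ot", "op", "ch"]

subs = positional_subsets(paragraphs)
label_profile = prefix_freqs(labels, prefixes)
for name, words in subs.items():
    print(name, pearson(prefix_freqs(words, prefixes), label_profile))
```

With a real transcription, each subset would contain hundreds of words and the prefix list would cover all the bars in the histogram.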

The suffix histogram is much more uniform. The clearest spikes correspond to the well-known line ending -m. The last word of the first line of a paragraph [Pink] has a large -am spike, but it compares well with Labelese because -ly and -ry are common suffixes both at line end and in Labelese. -in is also a good match between last-paragraph-words and Labelese. These good matches compensate for the difference due to the -am spike when computing correlation.
[attachment=1624]


Conclusions
  • The most extensive Labelese peculiarity is the high frequency of the o- prefix (in particular, ot- and ok-). This feature isn't paralleled by any language subset I examined (neither subsets following a specific ending, see radar chart, nor text position subsets, see histograms).
  • The prefix anomaly qo- is approximated by the subset lacking a left context (first words of paragraphs). A similar distribution also occurs after some specific endings (-r, -s, -t).
  • The suffix anomalies of Labelese -ly and -ry are approximated by subsets lacking a right context (line endings and paragraph endings). The same subsets also correlate with the da- prefix stats.
  • As observed by Emma, some common Labelese prefixes are also frequent as line starts (yt-, yk-). But there are frequent line-start prefixes that aren't common in labels (e.g. dc-, yc-).
  • The relatively high frequency of op- in labels is only matched by first-paragraph-line statistics, but not by the stats for the very first words of paragraphs (since they dislike o- and tend to start directly with p- and other gallows).  
Here is some additional information about qo- (not really new in itself, but possibly new in this form).

This is a histogram of prefixes (occurrences of words are counted multiple times):
the green bar is the percentage of words starting with that prefix
the red bar is the percentage of exactly repeating words starting with that prefix

About 30% of the exactly repeating words start with qo-  (e.g. qokedy.qokedy, qokain.qokain)
About 15% of all words start with qo-

The stats on unique words and repeating pairs are similar:
20% of unique repeating pairs start with qo-
9% of unique words start with qo-
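The four percentages above can be reproduced in outline with a sketch like the following. It rests on assumptions of mine: the function name `repeat_prefix_stats` is hypothetical, the lines are toy data, repeats are only detected between adjacent words on the same line, and a triple like a.a.a counts as two repeats.

```python
def repeat_prefix_stats(lines, prefix="qo"):
    """lines: list of lines, each a list of words.

    Returns the share of words starting with `prefix` among all tokens,
    among exactly repeating adjacent words, and among the unique forms of each.
    """
    all_words = [w for line in lines for w in line]
    repeats = []
    for line in lines:
        # compare each word with its right-hand neighbour on the same line
        for a, b in zip(line, line[1:]):
            if a == b:
                repeats.append(a)

    def share(words):
        words = list(words)
        if not words:
            return 0.0
        return sum(w.startswith(prefix) for w in words) / len(words)

    return {
        "all_tokens": share(all_words),
        "repeat_tokens": share(repeats),
        "unique_words": share(set(all_words)),
        "unique_repeats": share(set(repeats)),
    }


# Toy lines for illustration only (assumed, not a real transcription).
lines = [
    ["qokedy", "qokedy", "daiin", "chol", "chol"],
    ["okor", "qokain", "qokain", "shedy", "ol"],
]

stats = repeat_prefix_stats(lines)
for name, value in stats.items():
    print(f"{name}: {100 * value:.1f}%")
```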

To summarize, these are some peculiarities of qo- (and the corresponding o- behavior):
  1. very rare in labels (while o- is common)
  2. almost absent in first words of paragraphs (o- is also rare)
  3. common in exactly repeating word pairs (o- is neutral)
This is not directly related to labels, but it's possible that a common reason explains why qo- is both rare in labels and frequently repeated.
1 and 2 could be explained by q- needing a "left context" (i.e. an immediately preceding word).
3 evidently must have some other explanation.