The Voynich Ninja

Full Version: Identifying function words
All this linguistic talk has made me think of a thought experiment.

For this thread, I am going to postulate the following:
  1. Voynichese can be pronounced
  2. What is more, Voynichese is a created script that was first spoken then written
  3. This is an Indo-European daughter language (an artificial constraint I include simply to reduce the possibilities. If we don't get anywhere, we change this point and start again)
If this is true, then can we take Bax's method and apply it from a different angle - instead of trying to identify nouns, we try to identify function words?


Quote:Function words (also called functors) are words that have little lexical meaning or have ambiguous meaning, and they express grammatical relationships with other words within a sentence, or specify the attitude or mood of the speaker. They signal the structural relationships that words have to one another and are the glue that holds sentences together. Thus, they serve as important elements to the structures of sentences. // Wikipedia


Now, function words have an almost universal tendency to be short. English has the "three letter rule", in which function words generally have fewer than three letters (I, am, is, of) and content words have three or more. The Cervantes Institute notes (https://cvc.cervantes.es/ensenanza/biblioteca_ele/aepe/pdf/boletin_34-35_18_86/boletin_34-35_18_86_24.pdf) that something similar happens in Spanish (a, con, para, de, por, although exceptions such as parecer and luego exist). The same effect exists in many different languages.
Why? Without delving into the theory, language erosion. Function words are very common and people have the tendency to shorten them over time, to expend less effort.

In "The naïve language expert", Drs Claudia Männel & Jutta L Mueller note that many European languages are functor initial - the functor comes first and the rest of the sentence extends it (I am going to Rome) (die Amaryllis blüht auf [the flower bursts into bloom]), etc.

To cut a long conversation short: can we identify functors, short words that appear to be giving function to the following words?

Let us take a lexical page, i.e. f104r, from the Voynich extractor, transcription H (T.T.).

I identify all words shorter than three glyphs in length. I discard any words with minims in - so aiin is arbitrarily discarded.
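
For anyone who wants to reproduce the filter, here is a minimal Python sketch. It is not the exact procedure used above, and every detail is an assumption: the hypothetical `glyph_count` helper treats EVA ch and sh as single glyphs, minims are taken to be EVA i, the `!` filler is stripped, and the cutoff is a parameter (3 inclusive here, only because words like chol and char end up in the extracted list).

```python
from collections import Counter

def glyph_count(word: str) -> int:
    """Count glyphs, treating EVA 'ch' and 'sh' as one glyph each (assumption)."""
    return len(word.replace("ch", "C").replace("sh", "S"))

def short_words(text: str, max_glyphs: int = 3, minims: str = "i") -> Counter:
    """Tally words of at most max_glyphs glyphs that contain no minim character."""
    words = [w.replace("!", "") for w in text.split()]  # '!' assumed to be filler
    return Counter(
        w for w in words
        if w and glyph_count(w) <= max_glyphs and not any(m in w for m in minims)
    )

# One line of f104r as quoted later in this post.
line = "olcheear chedar or aror!sheey olkeechy or char cheeol sor or aiin ot!am"
for word, n in short_words(line).most_common():
    print(word, n)
```

On that single line the tally is or (3), char (1), sor (1); aiin is dropped by the minim rule, as described above.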

And here are all the extracted functors, ranked by frequency of appearance:
The most popular are all variants of one another - chol / chey / char  ar / ol  al / or / os / ar / sar
So what are these words? Now we move into the world of fantasy.

The most common words in European languages tend to be short indication words. Here's the list for English:
6.18% the
4.23% is, was, be, are, ’s (= is), were, been, being, ‘re, ‘m, am
2.94% of
2.68% and
2.46% a, an
1.80% in, inside (preposition)
1.62% to (infinitive verb marker)
1.37% have, has, have, ‘ve, ’s (= has), had, having, ‘d (= had)
1.27% he, him, his
1.25% it, its
1.17% I, me, my
0.91% to (preposition)
0.86% they, them, their

And other European languages that I've quickly looked up are basically the same (although Romance languages have pronouns and prepositions up at the top as well).
Now, f104r contains 438 words. The most common word, chol, appears 6 times (1.37%), which is well below the top English frequencies listed above. But we don't know what this page is about. A dry medical text will not contain many such indicators, and it seems obvious that the text does not run in a "we take it and we dry it and we pound it and we stick it" format.
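
The comparison can be checked with a few lines of Python. This is only the arithmetic on the figures quoted in this post (438 words, 6 standalone occurrences of chol, 6.18% for English "the"); the helper name `share_percent` is made up for the example.

```python
def share_percent(count: int, total: int) -> float:
    """Relative frequency of a word as a percentage of all words."""
    return 100 * count / total

page_words = 438   # word count for f104r quoted above
chol_alone = 6     # standalone occurrences of chol quoted above
the_share = 6.18   # percentage for "the" in the English list above

chol_share = share_percent(chol_alone, page_words)
print(f"chol: {chol_share:.2f}% of the page")
print(f"'the' is about {the_share / chol_share:.1f} times more frequent")
```

So the most frequent short word on this page is roughly four to five times rarer than English "the", if the two counts are comparable at all.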
The words tend to be clustered in groups upon the page. If I were to guess, it is almost as if a word crops up and is repeated several times within the same topic. Or appears three times in the same line and is a suffix for other words in the same line:

olcheear chedar or aror!sheey olkeechy or char cheeol sor or aiin ot!am

Let's think of a different angle of attack. Can we fit these proposed functors as suffixes? In other words, are they functor final - acting upon a stem? Could the stem be a verb with the functor acting upon it?
Well:
chol, which appears six times by itself, merges with 8 other words:
cholfor okechol  chol!cham  pchol  cholxy  qokchol  pcholor cholkar

chey appears four times by itself and a further five times as a larger word.
dam 2 / 5.  char 4 / 7. sar 2/ 2. air 2 / 11.
The two glyph words are really common as part of larger vords, but this can partially be discarded simply because we know they are common.
ar appears 57 times; or 65; etc.
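
A small sketch of how the standalone and embedded counts could be tallied. The `sample` list is a stand-in built from the chol figures quoted above, not the real page, and the function name is hypothetical. It uses plain substring matching, which is exactly why the two-glyph words score so highly: short strings turn up inside longer words by chance.

```python
def standalone_and_embedded(words, target):
    """Return (standalone count, list of longer words that contain target)."""
    cleaned = [w.replace("!", "") for w in words]  # '!' assumed to be filler
    alone = sum(1 for w in cleaned if w == target)
    embedded = [w for w in cleaned if target in w and w != target]
    return alone, embedded

# Stand-in data: six bare chol plus the eight longer words listed above.
sample = ["chol"] * 6 + (
    "cholfor okechol chol!cham pchol cholxy qokchol pcholor cholkar".split()
)

alone, embedded = standalone_and_embedded(sample, "chol")
print(alone, len(embedded))
```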

Sadly I have to cut this experiment short here for time reasons - I'll post it here to see if anyone has any feedback. Always a dangerous thing to do!
I remember in university working on a project with Jana, who spoke Czech, and Gabija, who spoke Lithuanian. One of my tasks was to go through the written report (in English) and check that the definite and indefinite articles (the, a, an) were in the right places, because neither of their languages had them. Even though they both knew English, articles were basically something pretty alien to their way of thinking about language, despite being bleedingly obvious to me. Both Czech and Lithuanian are, of course, Indo-European.

So while function words have the potential to be easily identified, we can't be sure which of them exist. Indeed, from the list you provide, none are truly universal as separate words.
Just to set the record straight, it's not Bax's method. He says so himself, and explains where he got it in his video.

I suppose one could call it Bax's method in the sense that he applied a subset of that original method (and yes, he did focus on nouns).


David, I can't read your whole post right now, but the part I did read looks interesting. I'll get back to it this evening when I'm done with work.
Yes, many Indo-European languages do not have articles - Russian of course, and notably Sanskrit.
Articles are actually an innovation of Germanic and Latin languages - neither ancestor had them.
They evolved from the numeral one and the demonstrative, this is still very clear in German for example.

In languages like Russian, which have not undergone this change, we might still expect a number of demonstratives which behave similarly to articles.
Yes, well, I wasn't just looking for the definite article of course, but generic function words that might pop out. Emma, I didn't realize Czech doesn't have them either.

Quote:Just to set the record straight, it's not Bax's method.

Perfectly correct - I was thinking of other things when I typed it and didn't word it correctly. Anyway, I haven't gotten to actually trying out the methodology yet....

Anyhoo - I'm bored of my original thrust now (surprise, surprise). Instead, I've delved down an intellectual side alley. Will a shortcut or a dead end await me? Or a metaphorical mugging in a back alley?

Let us constrain our thoughts to this one page, taken in isolation from the corpus (f104r), and examine how these words function solely on this page, without regard to the corpus.

We take the most common word, chol <chol>:

[attachment=1715]

It appears 14 times on this page:
So we have prefixes oke, p, qok and suffixes for, cham, xy, or, kar.
On this one page we have the following correspondence:
prefixes: oke, p, qok
stem: chol
suffixes: kar, cham, xy, or
If we try combinations of prefix, functor and suffix, the only word that is viable is pcholor, which is unique to this page, and whose deconstruction led to the original fix inclusion. So that could be an error on someone's part.
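
The combination check is easy to sketch in Python. The prefix and suffix lists and the set of attested words are copied from this post; the function name is made up, and stripping `!` as filler is an assumption.

```python
from itertools import product

prefixes = ["oke", "p", "qok"]
suffixes = ["for", "cham", "xy", "or", "kar"]

# The eight longer chol words quoted earlier, with '!' treated as filler.
attested = {
    w.replace("!", "")
    for w in "cholfor okechol chol!cham pchol cholxy qokchol pcholor cholkar".split()
}

def attested_combinations(stem, prefixes, suffixes, attested):
    """List every prefix + stem + suffix string that occurs in the attested set."""
    return [
        p + stem + s
        for p, s in product(prefixes, suffixes)
        if p + stem + s in attested
    ]

print(attested_combinations("chol", prefixes, suffixes, attested))
```

Of the fifteen possible combinations, only pcholor is attested in this list.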

The second most common word is chey.
Chey appears as opchey (twice), olkeechey, lkechey, okolchey.
Giving us only prefixes op, olkee (unique to this word, although if we assume a space between ol and keechey, the latter appears several times in the corpus), lke, okol (although to my mind this could easily be two vords again). So let's assume the spaces should be there: Chey is now prefixed with:
op, kee, lke

The third most common word is char.
Char appears as charaiin *, opchar and qopchar.
Giving us prefixes char, op and qop.

I skip over the bi- and uni-grams as they are too common.

sar only appears on this page by itself.

dam appears as chedam, qodam. Giving us prefixes che and qo.

Enough already, you cry! We know qo is a popular prefix.

And anyway, all this was worked out long ago by Stolfi in his core theory.

But I mention one thing which I'll refer to again in a minute: these are almost all prefixes.

So let's go back to my original idea of word identification. What sort of a word can appear by itself or as a modifier? Well, function words, but generally depending upon the construction of the language they are either single or fusional.

But - nouns can occupy that function. They can appear by themselves (flower) or be modified (yellow flower) and even categorised (big yellow meadow flower). And languages have different ways of expressing and joining these classes of words - either in a sentence, or fusing them together or smashing them into one whopping big agglutinative word.

Voynichese is unlikely to be agglutinative, as it doesn't have the word length necessary. But it could very well be the kind of language that uses a simple morpheme to denote semantic features.

So far, I've proposed nothing new. So let's propose another thought experiment and see where it takes us:

Are these words nouns (or more correctly, content words) which are being modified mainly by prefixes?

Which, if any, languages work in this way, concentrating on prefixes with the occasional suffix for certain cases?
(And I'm going to add a corollary here - I'm not expecting this to lead to any one-to-one "decoding" of the script.)
Some more information about chol and chey

chol is much more frequent in Currier-A than B (261 vs 99)
chey is less frequent in A than B (78 vs 249)

chol is the most frequently reduplicating Currier-A word. Even though it is only half as frequent as daiin (261 vs 500), it reduplicates more frequently (19 vs 12).
There also are two occurrences of chol.chol.chol.
<f8v.P.8;H>        okchol.ksh-chol.chol.chol.cthaiin.dain-
<f47r.P.7;H>       schesy.kchor.cthaiin.chol.chol.chol.chor.ckhhy-

chey only reduplicates once, and that occurrence is immediately followed by chol
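
For anyone who wants to run this kind of check themselves, here is a minimal Python sketch that finds runs of identical consecutive words. It assumes that `.` separates words and that `-` can also be treated as a word break; the function name is made up, and it is not the script behind the counts above (a triple is reported here as one run of length 3, which may not match how those reduplications were counted).

```python
import re
from itertools import groupby

def repeat_runs(line: str, min_run: int = 2):
    """Return (word, run length) for each run of identical consecutive words."""
    words = [w for w in re.split(r"[.\-\s]+", line) if w]
    runs = ((word, len(list(group))) for word, group in groupby(words))
    return [(word, n) for word, n in runs if n >= min_run]

# The two transcription lines quoted above.
lines = {
    "f8v.P.8": "okchol.ksh-chol.chol.chol.cthaiin.dain-",
    "f47r.P.7": "schesy.kchor.cthaiin.chol.chol.chol.chor.ckhhy-",
}

for locus, text in lines.items():
    print(locus, repeat_runs(text))
```

Both lines give a single run: chol repeated three times.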
Marco, have you ever tried to see the affixes of chol and chey? It would be interesting to see if they form larger words in the same fashion. Thought I'd ask you before running any tests myself.
What I would suggest offhand is to look for the word "and". It is a universal word (unlike an article) and it is frequent (sometimes, and more so in the past, it is used in place of words such as "then", "so" etc.). And it should be ubiquitous - namely, it should be found even in highly condensed, note-like texts which may lack articles or prepositions.

How can it be located? I can think of searching for it between vords relating to homogeneous objects, such as the "Voynich stars". One could just map all the f68r1 and f68r2 objects and see which vords join them (if any).