The Voynich Ninja
The Constructed Language Hypothesis - Printable Version




RE: The Constructed Language Hypothesis - ReneZ - 02-04-2024

(02-04-2024, 10:50 AM)Mark Knowles Wrote: Do you count the presence of null text within your idea of a meaningful solution?

Null text is strictly a cipher thing.

In the context of a constructed language, there is no place for nulls.
Or at least, it would not make any particular sense.

Now, in the frame of my question about a word-by-word substitution:
*if* a word-by-word translation can be ruled out on some basis (that we do not yet have),
then, the text could still be meaningful, if:
- there is a verbose substitution
- the person(s) doing the text conversion had some degree of freedom (which would really explain quite a lot!)

A verbose substitution and some degree of freedom would both arise if the person(s) doing the text conversion was using nulls.


RE: The Constructed Language Hypothesis - MarcoP - 02-04-2024

For a few years now I have been very fond of the constructed language idea, but I must say that this preference is more irrational attraction than well-grounded reasoning. I spent some time studying Dee and Kelley's Enochian, but it certainly deserves more attention. I think it's worth remembering that both Hildegard's Lingua Ignota and Enochian came with their own custom alphabets (though I think the corpus of Enochian was written with the Latin alphabet and we don't really have a corpus of Lingua Ignota). I know absolutely nothing of the later artificial languages mentioned by Patrick.

(02-04-2024, 12:56 AM)ReneZ Wrote: ...my favourite 'key question' about the Voynich MS may provide some insight.
That is: in case we could answer this key question.

This is:
Is it possible to do a word-by-word substitution of the Voynich MS and come up with a meaningful text?

I see more reasons why the answer should be "no"  rather than "yes".

If it is "no", then we cannot create a dictionary of Voynichese to some known language.
A constructed language is most easily conceived in the form of a dictionary.

If it is "no", then also all types of ciphers are excluded, even the more complicated diplomatic ones.

Essentially all past proposed meaningful solutions assume that some form of dictionary should exist.

I think this is a great line of investigation, though I prefer to look at the question as the search for grammar, rather than meaning. Grammar has the advantage of being less subjective and easier to express formally.

The idea that there is a one-to-one word-by-word substitution that could result in grammatical text is not easy to defend. Two obvious problems are line effects and the different "dialects".

As we [link] several times, natural languages have a limited set of "function words" which typically include the top ten most frequent words. These words are very frequent in all texts written in any specific language.
But [link] are different: there are no words that we can expect to encode e.g. "and" or "of" in both HerbalA and Quire13. Similarly, there are line effects: why should some words have a preference to occur at the start of a line?
[link], this suggests some kind of transformation:

Emma Wrote:In the Voynich text, words which might have an ‘ideal’ spelling in isolation are altered in relation to their neighbouring words and their place in a line. This results in a word having multiple different spellings, but which are regular and predictable in a known environment.

My examples, just to illustrate how I understand the process: the word that occurs as Shedy "inside" a line can be transformed into yShedy at the start of a line; or p-words in the first line of a paragraph (e.g. opchedy) correspond to t-words in the other lines (otchedy).

These supposed transformations are of course a huge problem and imply that there is no one-to-one word-to-word mapping. There could be a one-to-many mapping (where each plain-text word appears as different Voynich words on the basis of paragraph position, line position, dialect). Focussing on grammar has the advantage that one can search for structure, independently from any assumption about the underlying natural or artificial language. This would be similar to doing at word level what Stolfi did at character level. I made some attempts at finding rules describing how Voynich words follow each other, but I couldn't find anything solid.

The only grammatical features that can be easily detected do not appear to be language-like: line effects, consecutive repetition of the same word, similar words appearing next to each other. [link] proposed a cipher system (mod2) that results in the last feature, but of course the frequency of the consecutive repetition of identical words is a big stumbling block for word-to-word mapping with a natural language.
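
To make the consecutive-repetition point concrete, here is a minimal sketch of how one could compare the number of immediate word repetitions in a text against a random shuffle of the same words. The sample tokens are invented for illustration (they are not a real transliteration line), and the function names are my own; a real test would read a full transliteration file.

```python
import random

def adjacent_repeats(words):
    """Count positions where a word is immediately followed by itself."""
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

def shuffled_baseline(words, trials=200, seed=0):
    """Mean adjacent-repeat count over random shuffles of the same words."""
    rng = random.Random(seed)
    pool = list(words)
    total = 0
    for _ in range(trials):
        rng.shuffle(pool)
        total += adjacent_repeats(pool)
    return total / trials

# Invented sample, for illustration only.
sample = "daiin daiin chedy qokedy qokedy qokedy shedy ol chedy daiin".split()
observed = adjacent_repeats(sample)
baseline = shuffled_baseline(sample)
print(observed, baseline)
```

If the observed count is well above the shuffled baseline, the repetitions are not just a side effect of a few very frequent words. The same comparison run on a natural-language text of similar length would give the reference point.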


RE: The Constructed Language Hypothesis - Torsten - 02-04-2024

(02-04-2024, 12:56 AM)ReneZ Wrote: Is it possible to do a word-by-word substitution of the Voynich MS and come up with a meaningful text?

I see more reasons why the answer should be "no"  rather than "yes".

Indeed, there are numerous compelling arguments against a word-by-word substitution.

The strongest "pro" argument is that "the pages that are nearest neighbors in topic modeling tend to be adjacent to one another in the manuscript" [Bowern & Lindemann 2021]. However this observation only means that the text is structured in some way, it does not imply that Voynich words are indeed words or that the text possesses a linguistic structure. 

Moreover, words used on the same pages together "share similar morphological patterns, either in their prefixes or suffixes." [Montemurro et al 2014]. A well-known example of this observation is the prevalence of final EVA-edy in Currier B but its rarity in Currier A. Less recognized is the possibility of describing similar patterns throughout the Voynich text. For instance, EVA-qo rarely occurs as part of labels; in Herbal in Currier A [link] is less frequently used than in the Pharma section in Currier A; [link] only frequently occurs on some Herbal A-pages, and so on. "The closer two words are (with respect to their edit distance), the more likely these words also can be found written in close vicinity (i.e. on the same page)" (Timm & Schinner 2019, p. 6). It seems as if the distribution of words is caused by the distribution of glyph combinations, rather than the other way around.
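
The edit-distance observation can be checked with a small sketch like the following. It computes the Levenshtein distance between every pair of word types and reports, per distance, the fraction of pairs that occur together on at least one page. This is only my own simplified reading of the idea, not the procedure used by Timm & Schinner; the page contents below are invented, and EVA characters are treated as single glyphs, which is itself an assumption.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def share_rate_by_distance(pages):
    """pages: dict page_id -> list of words.
    Returns {distance: fraction of word-type pairs sharing some page}."""
    vocab = sorted({w for ws in pages.values() for w in ws})
    page_sets = [set(ws) for ws in pages.values()]
    counts = {}
    for a, b in combinations(vocab, 2):
        d = edit_distance(a, b)
        together = any(a in s and b in s for s in page_sets)
        hit, total = counts.get(d, (0, 0))
        counts[d] = (hit + together, total + 1)
    return {d: hit / total for d, (hit, total) in sorted(counts.items())}

# Invented toy pages, for illustration only.
toy_pages = {"p1": ["chedy", "shedy"], "p2": ["daiin", "dain"]}
rates = share_rate_by_distance(toy_pages)
print(rates)
```

A falling share rate with growing distance would be in line with the quoted claim. The all-pairs loop is quadratic in vocabulary size, so for a full transliteration one would probably restrict it to the more frequent word types.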

Upon closer examination, even this "affirmative" argument essentially transforms into a "negative" argument.

There are also numerous counterarguments:

Tiltman wrote in 1976 "My analysis, I believe, shows that the text cannot be the result of substituting single symbols for letters in the natural order. Languages simply do not behave in this way. ...  And yet I am not aware of any long repetitions of more than 2 or 3 words in succession, as might be expected for instance in the text under the botanical drawings" (Tiltman 1976).

D'Imperio wrote in 1978: "Also the strange lack of parallel context surrounding different occurrences of the 'same' word as shown by word indexes. In the words of several researchers 'the text just doesn't act like natural language'" (D'Imperio 1978, p. 30).

Timm & Schinner 2019: "In natural languages there will be frequent words distributed equally over the entire text, the so-called function words (like conjunctions, articles etc.). They do not appear contextual, but rather serve to implement grammatical structures, and they normally do not have co-occurring similar words of comparable frequency. In the VMS frequently used tokens differ from page to page. With the exception of repetitive prayers or poems, words in natural languages are chosen because of their meaning, and not of their similarity with previously written words" [Timm & Schinner, 2019, p. 6].

Currier wrote in 1976: "The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally." ([link]). Because of Currier's observation even Claire Bowern concludes: "All of these observations lead to generalizations that seem typographical rather than linguistic in nature. ... A comprehensive linguistic analysis needs to take seriously the possibility that, for example, paiin, saiin, aiin, and am are all positional variants of the same word." [Bowern & Lindemann 2021].
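
Currier's line-position effect is easy to probe with a few lines of code. The sketch below is one simple reading of it: it tallies the first character of line-initial words separately from the first character of all other words. The input lines are invented, the function names are mine, and counting EVA characters one by one is an assumption (EVA ch and sh, for example, are usually considered single glyphs).

```python
from collections import Counter

def positional_initial_counts(lines):
    """lines: list of lists of words.
    Returns (line_initial, internal) Counters of word-initial characters."""
    initial, internal = Counter(), Counter()
    for words in lines:
        for idx, word in enumerate(words):
            if not word:
                continue  # skip empty tokens left over from splitting
            target = initial if idx == 0 else internal
            target[word[0]] += 1
    return initial, internal

def relative(counter):
    """Convert raw counts to relative frequencies."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}

# Invented toy lines, for illustration only.
toy_lines = [["yshedy", "chedy", "daiin"], ["ychedy", "shedy", "dal"]]
line_initial, internal = positional_initial_counts(toy_lines)
print(relative(line_initial), relative(internal))
```

The same pattern applies to line endings by looking at `word[-1]` of the last word in each line.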

Stolfi: "In fact, the word length distribution matches almost perfectly a binomial distribution ... This coincidence suggests that the length of a word chosen at random ..." ([link])
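
For anyone who wants to try the binomial comparison themselves, here is a rough sketch. It assumes that word length is counted in transliteration characters, that lengths start at 1 (so the binomial is shifted by one), and that the number of trials n is chosen by hand; none of these choices is taken from Stolfi's text, and the quote does not say whether tokens or word types were counted.

```python
from collections import Counter
from math import comb

def length_distribution(words):
    """Relative frequency of each word length."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

def shifted_binomial(n, p):
    """P(length = k + 1) for k ~ Binomial(n, p)."""
    return {k + 1: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def fit_p(words, n):
    """Method-of-moments estimate of p from the mean word length."""
    mean_len = sum(len(w) for w in words) / len(words)
    return (mean_len - 1) / n

def total_variation(observed, model):
    """Total variation distance between two discrete distributions."""
    keys = set(observed) | set(model)
    return 0.5 * sum(abs(observed.get(k, 0) - model.get(k, 0)) for k in keys)

# Invented toy word list, for illustration only.
toy_words = ["ol", "dal", "chedy", "daiin", "qokedy", "shedy", "or", "qokeedy"]
n = 9  # assumed maximum length minus one
model = shifted_binomial(n, fit_p(toy_words, n))
print(total_variation(length_distribution(toy_words), model))
```

A total variation distance close to 0 means a good fit. On a toy list this small the number means nothing, but on a full transliteration it gives a single figure that can be compared with the same measure on natural-language texts.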


RE: The Constructed Language Hypothesis - cvetkakocj@rogers.com - 03-04-2024

(02-04-2024, 10:07 PM)Torsten Wrote: A comprehensive linguistic analysis needs to take seriously the possibility that, for example, paiin, saiin, aiin, and am are all positional variants of the same word." [Bowern & Lindemann 2021].
The words paiin, saiin, aiin and am are not the same word; only the suffix is the same. In Slovenian, this suffix marks the first person singular present tense. The words aiin and am can both be read as am or as aiw. Aiw is a suffix for adjectives. This means that the minims have to be read as Dr. Bax suggested. In some words, this can be done based on the short space between the minims, but the best reading comes from the context.
To turn the EVA transliteration letter p into a transcription letter, it has to be changed to SV. Slovenian at the time did not have the letter F; however, the sound 'sv', like German SW or Latin SUE, does sound a bit like SF. Reading these words with the Slovenian alphabet, we get the words SVAM (to all), SAM (I AM, and also ALONE, BY ONESELF), AM (I am).

The reason most VM researchers are unable to find function words is the different spelling rules that applied in 15th-century Slovenian and other Slavic languages.

1. Because the Glagolitic writing had several letters for the half-sounds, those were dropped when the words were written with Latin letters, since the Latin alphabet had no equivalents to the semi-vowel letters. Because of that, the VM writing looks like a semi-abjad, as Dr. Bax had pointed out. By the 16th century, the missing semi-vowels were replaced with full vowels. Since Slovenian does not use diacritic markers, knowledge of the language is required to know where in the VM words the vowel would be missing. There are also some Slovenian dictionaries where the short stressed vowels are indicated. The insertion of different vowels for the missing semi-vowel resulted in many different spellings of the same words, like DY - DAY (DAJ, DEJ). The letter Y was also replaced in the 16th century according to a general rule that where a vowel is needed, Y became I, and where a consonant is needed, Y became J.

2. The second rule deals with so called function words as part of the so-called word blocks. They were described by Crnković as short unstressed words attached to the main word. They often include prepositions, such as S, K, V, H, Z, and conjunctions such as CH(E)  (if), Y (and), pronouns, and even short verbal forms. In phonetic speech, such words are pronounced as one word. In the VM, sometimes, such function words can be written together with the next word or separately. They can also occur at the end of the word.

3. Heavy use of prefixes, particularly PO (EVA QO), which indicates a finished action and can be attached to verbs, nouns or adjectives. The prefix O- had a similar meaning. A frequently used Slovenian prefix is the word DOL.

4. Inflectional suffixes - Slovenian nouns, adjectives and pronouns inflect for three numbers, three persons, three genders and six cases; verbs conjugate for three numbers, three persons and three genders. The VM vocabulary clearly reflects this. From the frequency of the verbs DAM (EVA daiin), DAL (EVA dal) and DY (EVA dy), it can be assumed that most of the writing is done in the first person singular present tense indicative mood (DAM - I give), or in the third person singular masculine, present or future tense (DAL - gave, will give, would give), and DY (second person singular imperative or conditional mood). Because of the less frequently used inflectional forms of the words, there are many words in the VM that differ by only one or two letters.

5. There are also rules for the sound changes, which can create similar words with the same meaning.

6. The rules for the use of Latin letters for Slovenian sounds were not consistently followed, which creates additional problems.

7. Even if the transliteration alphabet were perfect, there would still be problems with ambiguous words (sloppily written, ink faded, etc.)

Considering all this, I was able to compile over 2000 different VM words that can easily be translated into Slovenian by using one-to-one letter substitution, which closely matches Slovenian spelling. For the individual words, I am counting the words that are spelled differently, although many may belong to the same word family. The vocabulary and grammar are consistent with the Slovenian language.
Because of the complicated morphology, and the complexity of Slovenian language, the VM is resistant to computer analysis.

I believe that eventually it will be possible to expand the 'dictionary'. 

I am not trying to push Slovenian theory, but rather show you that there are other linguistic possibilities that can have more merit than Latin.


RE: The Constructed Language Hypothesis - Koen G - 03-04-2024

Cvetka, your post is experienced as off topic here, please do not insert your Slovenian theory into other threads. You are welcome to discuss it in its own thread.


RE: The Constructed Language Hypothesis - nablator - 03-04-2024

(02-04-2024, 12:43 PM)ReneZ Wrote: Now, in the frame of my question about a word-by-word substitution:
*if* a word-by-word translation can be ruled out on some basis (that we do not yet have)

But we do, in the case of a one-to-one substitution. [link], which I replicated, show abnormally low values for a European natural language: vord pairs do not "work" together at any distance except distance 1 where there is a small but measurable effect. It is possible to write such a (short, like Cato's Distichs) text when the author is careful to never reuse the same pair of words at the same distance (>1) at a frequency superior to what would be found in a random shuffle, but it is extremely unlikely to have a long meaningful text like this.
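
To illustrate the kind of comparison I mean (this is a simplified stand-in, not the metric from the linked results), the sketch below counts how many word-pair occurrences at a given distance belong to a pair that appears more than once at that distance, and subtracts the average of the same count over random shuffles. The function names and the toy sequence are invented for the example.

```python
import random
from collections import Counter

def repeated_pair_tokens(words, distance):
    """Number of ordered (word, word-at-distance) occurrences whose pair
    shows up more than once at that distance. Requires distance >= 1."""
    pairs = Counter(zip(words, words[distance:]))
    return sum(c for c in pairs.values() if c > 1)

def shuffle_excess(words, distance, trials=200, seed=0):
    """Observed repeated-pair count minus the mean over random shuffles."""
    rng = random.Random(seed)
    observed = repeated_pair_tokens(words, distance)
    pool = list(words)
    total = 0
    for _ in range(trials):
        rng.shuffle(pool)
        total += repeated_pair_tokens(pool, distance)
    return observed - total / trials

# Invented toy sequence with strong structure at every distance.
toy = "a b c a b c a b c".split()
for d in (1, 2, 3):
    print(d, repeated_pair_tokens(toy, d), round(shuffle_excess(toy, d), 2))
```

An excess near zero at every distance above 1 means the word order carries no more pair structure than a shuffle, which is the pattern described above. In a meaningful natural-language text one would expect a positive excess at several distances.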

In the article by Andrew Caruana, Colin Layfield and John Abela [link], presented at the Malta Voynich conference 2022, they used a similar metric on word pairs, but less sensitive because of an arbitrary threshold for deciding which pairs are 'skewed'.


RE: The Constructed Language Hypothesis - merrimacga - 03-04-2024

In all likelihood, the solution will be some combination of natural language, constructed language and/or cipher text. It is also possible, if not probable, that a macaronic/hybrid language, code-switching, plurilingualism or multilingualism is also involved. However, the greater the complexity of the language and script used in the VM, the less likely the author(s)/scribe(s) would have been fluent enough in it to be able to write it fluidly.

Can we all agree that the VM is written fluidly by more than one hand? If so, then we can also assume a high degree of fluency by each of those hands in a commonly used and mutually understood, if not completely identical, script and language.

So far as I have seen, no one has been able to fully equate the VM glyphs with any particular, known writing system, alphabet (including runic and Arabic), syllabary or logography/lexigraphy or combination thereof. Partially, yes, but fully, no. It is reasonable to start with the premise that Voynichese is not the product of one or more constructed scripts and one or more constructed languages. Otherwise, the VM's fluidity of multiple hands is at least improbable, if not impossible. Could it be the product of some yet undiscovered, or at least not yet considered, writing system but one using one or more known natural languages or a constructed one? Perhaps, though most known writing systems are not completely unique. Distinct, yes, but unique, no, as resemblances to each can be found in other known writing systems. Technically, one could argue that a cipher or code is also a constructed writing system, though this is not generally considered the case, however, the same questions would apply, perhaps more so since multiple persons are even less likely to be so practiced a hand as to write cipher or code fluidly.

Simplicity and fluidity and probability and the capability to research all suggest that we should first look at either a contemporaneous constructed script (or code or cipher) used with one or more contemporaneous natural languages OR a previously existing but currently discontinued, and perhaps undiscovered, contemporaneous natural script used with one or more contemporaneous natural languages. And when considering natural languages, we must consider all possible rural dialects and other variations. I would not expect us to exhaust these possibilities anytime soon as there are currently almost 300 known writing systems worldwide, including historical ones, at least 80 known constructed scripts (not counting ciphers and codes), and more than 7,000 currently spoken languages amongst a low estimate of about 31,000 total languages historically. Any of these that didn't exist at the time the VM is assumed to have been written, based on the radiocarbon dating, may be eliminated for the purposes of narrowing research but this only gets us so far. Once we exhaust all the possibilities of the one line of research, then we exhaust all the possibilities of the other before we move on to the more complicated, less likely to be fluent and fluid possibilities, including that of a constructed language.

If we could accurately and positively date and geographically pinpoint the VM and if we could accurately profile the VM's author(s) if not also its scribe(s), we could narrow this search significantly. Until then, we're looking for one very tiny needle in one very large haystack.


RE: The Constructed Language Hypothesis - nablator - 03-04-2024

(03-04-2024, 05:11 PM)merrimacga Wrote: Can we all agree that the VM is written fluidly by more than one hand?
Certainly not. :)

I wonder why anyone would describe the text as "fluidly written". If fluidly means written "[link]" then the VM's writing is the opposite, a mass of glaring irregularities: inconsistent spacing between words, so much that it is often impossible to decide where words are, with small unexpected gaps every few glyphs just because it is not weird enough (look at the first "word" fachys, it has 3 half-spaces), a wobbly baseline incomprehensibly changing direction often several times per word, often to avoid a gallows glyph on the next line, frequent unidentifiable glyphs on a continuum between r-s, o-a-y, etc.


RE: The Constructed Language Hypothesis - Hermes777 - 03-04-2024

The starting point for Leibniz was Ramon Llull's Ars Combinatoria. He saw the seeds of an artificial language in Llull, and this provided impetus for later growing interest in artificial languages. This would be a good starting point for an adventurous mind in the early 1400s too. There is certainly an abundance of evidence in Voynichese suggesting combinatorics. (The other possible source would be the Prophetic Kabbalah of Abulafia (also from Spain) - a system of combinatorics and an incipient 'language' (vocabulary) from the 72 Letter theonym.) 

Modern conlangs, Esperanto, Tolkien's tongues etc, are properly linguistic, credible imitations of natural languages. Voynichese seems far more mechanical than organic. If the question is how our author might have approached creating a constructed language, Llull's systems and similar suggest themselves in the context: there is a Llullian revival in the first half of the 1400s.


RE: The Constructed Language Hypothesis - Hermes777 - 03-04-2024

One of the reasons a Constructed Language Hypothesis has not had more air-time in Voynichese research is that, like hoax scenarios, it tends to be an argument of last resort. Researchers turn to a constructed language scenario in despair, when everything else fails. It is where we arrive by a process of elimination. And then it invites crazy speculation, so it is best avoided if possible.

But since this is where many seasoned researchers have ended up, it deserves more consideration in its own right.

Here is an idea - bordering on crazy speculation - I have explored for Voynichese as a constructed language (of sorts). I think it has more merit than meets the eye.

The model for it is the SATOR magic square, the mystique of which was certainly current in our historical window.

I take it that this square demonstrates that - if you put the right letters in the right stations - the movements of the Sun spell Latin words and make a Latin sentence.

Admittedly, not quite grammatical Latin, and you can read the 'sentence' various ways, and the square requires a mysterious neologism to work, but the 'magic' of it is that it shows, in principle, that there is a celestial text and that Latin has celestial roots and so is an adequate tool for capturing the celestial text.

Voynichese, then, is an extension of this idea.

If you ascribe a creative scheme of Latin letters, bigrams and abbreviations to the right stations (see f57v) - and, like the SATOR square, you have a bedrock of consonant/vowel alternation - then the movements of the heavens can be shown to spell out a text.

The text is not nearly grammatical, but it is impressive enough that it makes lots of Latin words, along with Latinish neologisms.

I propose that the way this is done is - like the SATOR square - to create keywords (palindromes) that are templates containing all the necessary letters and so are pregnant with possibilities.

Additional to your creative ensemble of Latin letters, you set up a framework of four (eight) markers for the quarters of the year, directions etc. (Like the four corners of the square.) In Voynichese, the system of gallows glyphs.

Again: the text is generated by the movements of the heavens, and - magic! - it is full of Latin words or forms that suggest or could be construed as Latin words depending on the abbreviations etc. (which are flexible and easily manipulated into words by design.)

As we know, Voynich words are very suggestive. They are short and tend to have CV alternation. People from Latinate languages see meaningful words everywhere! This ambiguous property could be by design.

An extra level, allowing far more creative results (not in the SATOR model), would be if the system was alphanumeric.

We don't find the SATOR square in the manuscript, but then the magic square was regularly conceived as a wheel (ROTAS) and we find various letter wheels in the Voynich.

It would be a bold experiment, but perhaps not altogether successful? Perhaps what we see in Currier A and B, and scribal variants, are attempts to "tune in" the system - to make the correspondences create more and better Latin?

Perhaps, in any case, it is an adventure in the Humanist fetishization of ancient Latin and rests on the conviction - implicit in the SATOR square, which offers a model for the construction - that Latin, like Greek, Hebrew and Arabic, is a sacred or primordial language that came from On High. Voynichese is an experimental constructed language meeting that need? The constructed language might be styled as "original celestial Latin."