The Voynich Ninja
Cipher or unknown language - historical perspective



RE: Cipher or unknown language - historical perspective - Jorge_Stolfi - 30-08-2025

(29-08-2025, 11:34 PM)Koen G Wrote: If you gave me someone's historical attempt at writing Catalan in a modified alphabet, I could tell you pretty quickly that we're looking at a member of the Romance language family, then take it from there.

Yeah, but Albanian? Or even Armenian?

Quote:Were the three Albanian[? Caucasian?] alphabets really created from scratch? What do they look like?

Armenian is currently spoken mostly in Armenia, a landlocked country east of Turkey and south of (Caucasian) Georgia; but at one point in the late Middle Ages the Armenian kingdom of Cilicia extended to the Mediterranean and interacted with the Crusader States of the Levant.

The language is Indo-European but not Romance.  It is an isolated branch of Indo-European, with no established connections to other branches.  Its grammar deviated a lot from the prototypical IE one.  It completely lost the IE grammatical gender, even in pronouns.  Nouns are inflected for number (singular or plural) and according to seven grammatical cases, which distinguish between animate and inanimate nouns.  Only ~1500 words of the lexicon are derived directly from Indo-European, the rest being loans from other languages.  Verbs are inflected for person, number, and several tenses and moods, including some that do not exist in other IE languages, like "optative" and "necessitative".  Negation is formed by prefixing a letter (something like "ch") to the verb.

The Armenian script was supposedly designed by a Christian monk in the 5th century.  While the concept was taken from other European alphabets, the letter shapes are original.  Some letter pairs are distinguished only by small serif-like strokes.  The written language has some long consonant-only clusters, but in the spoken language they are broken up into smaller clusters by schwas.

(Balkan) Albanian has long been spoken in (Balkan) Albania, just north of Greece, on the Adriatic coast.  It too is Indo-European but not Romance; like Armenian, it forms an isolated branch of Indo-European, with some [...].  Nouns have three grammatical genders and four syntactical cases.  There is a definite article (only singular?), but it is a vowel suffix that depends on gender and final consonant.  The language lost many IE grammatical features, such as verb inflections.  Little of the original IE lexicon survived; maybe 50% of Albanian words are loans from Latin.  Since the earliest surviving document (from 1462) it has generally been written in the Latin alphabet, with a few extra letters.  However, some communities in the past have used modified Greek and Cyrillic alphabets.

(Balkan) Albanians were mostly Christian (first Byzantine, later partly Roman, partly Eastern) until the Ottoman conquest of the country in the 1400s.  Refugees from that conquest settled in Southern Italy, starting ~1450, and their descendants are a population of ~100-200 thousand Albanian speakers (Arbëreshë).

Caucasian Albanian was the language of the ancient kingdom of Caucasian Albania, which existed between 100 and 700 CE in the eastern Caucasus, in present-day Azerbaijan.  There is no relation to the Balkan country or language; the shared name is just an unfortunate coincidence.  ("Alwan" is a more accurate version of the name, but it is still not widely used.)  The population was Christianized in the 4th century, but most of it converted to Islam in the 8th century.  In the 11th century the region was conquered by Turkic peoples, which is why the modern population of Azerbaijan speaks an unrelated language very similar to Turkish.

The Caucasian Albanian language is known almost exclusively from a ~180-page palimpsest from ~500 CE found in a monastery at Mount Sinai, in the Sinai Peninsula.  The date of its extinction is unknown, but it is believed to be the ancestor of the nearly extinct Udi language, still spoken in a couple of villages in Azerbaijan.

Caucasian Albanian was not an Indo-European language, but a member of the Northeast Caucasian family, which has only a few members anyway.  The language is agglutinative, forming long words by adding multiple suffixes to a stem, or inserting infixes into the stem.  It has 11 grammatical cases and grammatical number, but no gender. There seem to be only two stem classes, nouns and verbs; adjectives and adverbs are derived from them by suffixes.  It has many verbal tenses and moods, including uncommon ones like "prohibitive" and "adhortative".  

I was surprised to see that this obscure language in the far boonies has, among half a dozen "indeterminate pronouns", the word "fulano" = "a certain", exactly like the Portuguese and Spanish word meaning "indeterminate person".  But the latter comes from Arabic, so presumably it is not a coincidence: Udi must have borrowed its "fulano" from Arabic too.

The Caucasian Albanian alphabet has 57 letters.  It has been said to be similar to the Coptic one, and it was allegedly designed by the same monk who designed the Armenian alphabet.

(Caucasian) Georgian or Kartvelian is spoken in the country of (Caucasian) Georgia.  You may be surprised to know that it has no relation to the American state of Georgia; the name is another unfortunate coincidence.  Between ~300 BCE and ~500 CE part of the region was the Kingdom of Iberia, which (you guessed it) has nothing to do with the Iberian Peninsula or the Iberian peoples, which in turn have nothing to do with the Kingdom of the Iberians, which existed in the territory of the former Kingdom of Iberia between the 9th and 10th centuries.

(Caucasian) Georgian is not an IE language either, and it is not related to Caucasian Albanian, even though the countries were essentially neighbors.  Caucasian Georgia has been Christian since the 4th century, with its own Eastern-rite Church.  I mention it here only because its script is the third "Caucasian" alphabet designed by that same monk.

All the best, --jorge

PS. Since the old Kingdom of Albania interacted with the Crusades, and Azerbaijan was dominated by the Mongols in the 12th to 13th centuries, the theory that the language of the VMS is a (Caucasian) Albanian-Mongol jargon used by the Knights Templar cannot be entirely dismissed a priori.


RE: Cipher or unknown language - historical perspective - Antonio García Jiménez - 30-08-2025

Since I am a son of the Iberian Peninsula, which Stolfi mentioned, I am going to venture a hypothesis as to why the kingdom in the Caucasus was also called Iberia.

The Greeks called the planet Venus Hesperus when it was seen at sunset; hence the name Hesperia, which became Iberia, the land in the West.  The Iberian Peninsula is at the same latitude as the Caucasus, where the planet Venus was visible in the morning.  The Greeks must at some point have known that it was the same celestial object, and so they also called that part of the Earth Iberia.


RE: Cipher or unknown language - historical perspective - Koen G - 30-08-2025

(30-08-2025, 04:02 AM)Jorge_Stolfi Wrote:
(29-08-2025, 11:34 PM)Koen G Wrote: If you gave me someone's historical attempt at writing Catalan in a modified alphabet, I could tell you pretty quickly that we're looking at a member of the Romance language family, then take it from there.

Yeah, but Albanian? Or even Armenian?

That's not my point - I use Romance languages as an example because I have experience with them. And we have other people in the world experienced with other language families. So say we have in front of us a medieval attempt to build an alphabet for a language. My suspicion is that a person well-acquainted with the relevant culture could still recognize the context in which this alphabet emerged. At the same time, they'd probably already know the language or language family. And only then, specific dialects may become relevant.

For any larger text written in and around Europe between, let's say, 1200 and 1500, I think this preliminary identification would take place in a matter of hours, not a century (and counting) as in the case of the VM. Of course, after that, scholars could spend an eternity studying the text. But we would know very quickly the broad strokes of what we're dealing with. Not so for the VM.


RE: Cipher or unknown language - historical perspective - Jorge_Stolfi - 30-08-2025

(30-08-2025, 09:52 AM)Koen G Wrote: So say we have in front of us a medieval attempt to build an alphabet for a language. My suspicion is that a person well-acquainted with the relevant culture could still recognize the context in which this alphabet emerged.

Well, we know that the VMS alphabet was systematically constructed by combining a small set of simple strokes in almost every possible way. Thus the final letter shapes probably have no connection whatsoever to the cultural context of the Author, of the intended readers, of the language, or of the place where it was conceived. 

Quote:At the same time, [a person well-acquainted with the relevant culture would] probably already know the language or language family. [...] For any larger text written in and around Europe between let's say 1200-1500, I think this preliminary identification would take place in a matter of hours, not a century (and counting) like in the case of the VM.

I don't quite agree.  Starting with the "in and around Europe" bit.  However, for argument's sake, let's assume that the native range of Voynichese and the place where the script was created were indeed "in and around Europe between 1200-1500".

For the language to be identified, we would first need someone who can read that language as it was in that time frame.  There are plenty of such people for languages that are still "alive", even peripheral ones like Armenian, Balkan Albanian, and Georgian.  But for languages that are extinct (like Caucasian Albanian, Bjarmian, Bulgar, Cuman, Crimean Gothic, Jassic, Khazar, Merya, Norn, Novgorodian, Pantesco, Polabian, Sabir, Sudovian, Ubikh, Yola, ..., and the Basque–Icelandic pidgin, yes, there was such a thing!) or nearly so (like Bats, Budukh, Cimbrian, Cypriot, Karaim, Kryz, Laz, Limousin, Livonian, Manx, Mocheno, Sami, Svan, Tsakonian, Udi, ...), the people with the necessary knowledge will be only a handful of linguists and language buffs, at best.

Then, that person would have to believe that Voynichese could be the language in question -- enough for him to be willing to make a serious effort at checking that possibility.  This is a big obstacle, because the "expert consensus", based on material and artistic evidence, is that Voynichese must be a West European language, Romance or South Germanic.  Knowing that, why would an expert on Udi or Khazar spend any time trying to see whether Voynichese is an exotic language well outside that range?

Even if the guess "maybe Sudovian" is correct, checking it will probably require a lot of work.  Voynichese is not just an original alphabet; it must be an original spelling system too.  The Alwan or Ubikh spelling that the expert knows may differ from the Voynichese one in many ways: the use of digraphs vs. single letters, the splitting and joining of affixes and auxiliary verbs, the handling of sandhi, allophones, tones, and epenthetic vowels, the use of abbreviations, etc.

And it may be in a dialect with phonology and morphology different from those of the dialect he knows.  

How would he tackle the problem, considering that we still cannot guess the meaning of a single word in the whole book?

And then the Author may have had only an imperfect knowledge of the language, and made many errors of spelling and grammar.

And then Voynichese may be encrypted.  Even a simple trick, like writing the letters of each word backwards or inserting random nulls, would almost certainly prevent the identification of the language.

Just consider our difficulty in identifying the languages of the month names and of the "michton" marginalia.  Yet they are surely Western European, in the plain, written in the Latin alphabet, presumably with spelling typical for the language at the relevant time and place...

All the best, --jorge


RE: Cipher or unknown language - historical perspective - tavie - 30-08-2025

(30-08-2025, 01:52 PM)Jorge_Stolfi Wrote: Well, we know that the VMS alphabet was systematically constructed by combining a small set of simple strokes in almost every possible way.

We don't know that.  Characters similar to most of the Voynichese glyphs can be found in various manuscripts, often as abbreviations, as JKP has shown before.


RE: Cipher or unknown language - historical perspective - Jorge_Stolfi - 30-08-2025

(30-08-2025, 02:35 PM)tavie Wrote: Characters similar to most of the Voynichese glyphs can be found in various manuscripts, often as abbreviations as JKP has shown before.

But those other manuscripts also use many glyphs that are not of that form, because they are letters of the Latin or other traditional alphabets. The Voynichese script is made entirely of those systematic 2-stroke glyphs, plus the 3-stroke Sh and very, very few other exceptions, like v or x.


RE: Cipher or unknown language - historical perspective - Stefan Wirtz_2 - 09-09-2025

(30-08-2025, 04:15 PM)Jorge_Stolfi Wrote: [..]The Voynichese script is entirely made of those systematic 2-stroke glyphs, plus the 3-stroke Sh and very very few other exceptions like v or x .

How do you get to 2 strokes? I see nearly all VMS characters made with exactly one stroke (which is very unusual among alphabets), apart from the exceptions you named, like the combined characters and the x.


RE: Cipher or unknown language - historical perspective - oshfdk - 09-09-2025

(09-09-2025, 12:02 PM)Stefan Wirtz_2 Wrote:
(30-08-2025, 04:15 PM)Jorge_Stolfi Wrote: [..]The Voynichese script is entirely made of those systematic 2-stroke glyphs, plus the 3-stroke Sh and very very few other exceptions like v or x .

How do you get to 2 strokes? I see nearly all VMS characters made with exactly one stroke (which is very unusual among alphabets), apart from the exceptions you named, like the combined characters and the x.

I think the only single-stroke characters are e and i. The rest appear to be executed in at least two strokes, as is apparent from the way the strokes occasionally don't touch at their meeting point. I've collected a few samples of misaligned strokes from [...] below.


RE: Cipher or unknown language - historical perspective - Jorge_Stolfi - 09-09-2025

(09-09-2025, 12:33 PM)oshfdk Wrote: I think the only single stroke characters are e and i. The rest appear to be executed at least in two strokes, apparent from the way strokes occasionally don't touch at their meeting point. I've collected a few samples of misaligned strokes from [...] below.

Agreed.  And I think that Ch is two strokes, namely an e first, and then the h in one stroke.

My theory is that these glyphs were designed to minimize the strokes in the general NW direction.  For a right-handed Scribe (which seems to be the case for the VMS), writing is smoother if the quill is held tilted towards the SE.  Then pressing the quill down lightly during each main stroke causes its two tines to spread apart, pumping ink down to the nib as needed.  But pushing the pen in the NW direction, or even sideways, is risky, because it may snag on rough spots of the parchment.  Strokes in those directions had better be light and short (like the plumes of r, s, and n, and the loops of d, m, l, gallows, and ligatures), using whatever ink still remains in the nib.  Thus it is better to write an o as two strokes, both in the general SE direction, than as a single stroke.  Likewise, a t or k is better written by drawing the left leg down in one stroke, then the loop(s) and the right leg down in another stroke.

All the best, --jorge


RE: Cipher or unknown language - historical perspective - Stefan Wirtz_2 - 09-09-2025

You all seem to count direction-changes and stops as different strokes.
I define a "stroke" as what lies between the pen stopping and lifting off and the pen being set down and starting again; for example, adding the crossbar "-" to a "t" or "f" is a separate stroke.

The last series of close-up images shows more jumps, scratches, and maybe lost ink particles on the pages' material, but these are exceptions, and I would see all of 8, 9, a, k, b, f, t, s, r, l, etc. as natural one-strokers, designed for just that reason.