scripts with character variants and ambiguities


scripts with character variants and ambiguities - MarcoP - 13-02-2018

I guess that one of the problems with statistical analyses of the VMS is that, when comparing with other sources, one typically only has modern texts available.
My impression is that some of the strange features of Voynichese might be caused by the script, rather than by the language.
For instance, there are medieval European scripts in which the same character is written differently on the basis of the nearby characters. I expect this could result in lower entropy (but it's clear to me that this phenomenon should be very extensive to result in second-order entropy comparable with the VMS).
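For anyone who wants to check this numerically, here is a minimal Python sketch of second-order (conditional) entropy, i.e. the average uncertainty about a character given the one before it. The function names are my own, and estimating from raw bigram counts is an assumption; published VMS figures may rely on a different estimator or transliteration, so treat the numbers as comparable only between texts run through the same code.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def conditional_entropy(text):
    """H(next char | previous char) = H(bigrams) - H(first chars of bigrams)."""
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(bigrams) - entropy(firsts)


if __name__ == "__main__":
    # A perfectly alternating text: the next character is fully predictable.
    print(conditional_entropy("abababab"))
    # After 'a' either 'a' or 'b' may follow, so some uncertainty remains.
    print(conditional_entropy("aab" * 50))
```

A script in which glyph shapes depend strongly on their neighbours should push this value down, which is the effect suggested above.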

This is an example of a script in which 'r' has three different shapes:
* similar to uppercase R (but smaller) at the beginning of words [red]
* similar to '2' or 'z' when midword and immediately following a "round" character ('o', 'p', 'd') [green]
* 'r' in other cases [blue]

Obviously, to a hypothetical transcriber with no knowledge of the Latin alphabet or the languages written in it, these three shapes would look like different characters, and each would be transcribed as such. He would have to deal with one character that only occurs at the beginning of words and another that only occurs after a restricted left context.
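The three-shape rule can be simulated with a short Python sketch. Everything concrete in it is an assumption for illustration: the stand-in symbols 'R', '2' and 'r', the "round" set limited to the three letters named above, and "midword" read literally as neither first nor last in the word.

```python
ROUND = set("opd")  # assumed: only the "round" letters named in the post


def transcribe_r(word):
    """Rewrite each 'r' as the shape a naive transcriber would record."""
    out = []
    for i, ch in enumerate(word):
        if ch != "r":
            out.append(ch)
        elif i == 0:
            out.append("R")  # word-initial shape [red]
        elif i < len(word) - 1 and word[i - 1] in ROUND:
            out.append("2")  # midword, after a round letter [green]
        else:
            out.append("r")  # every other case [blue]
    return "".join(out)


if __name__ == "__main__":
    for w in ["reprandre", "corps", "sur", "prest"]:
        print(w, "->", transcribe_r(w))
```

Running a whole text through such a function before computing statistics gives a rough idea of how much a purely graphical rule changes the apparent character set.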

UPenn - Virgil - [Le livre des Eneydes] - France, late XV Century
Other interesting features in this manuscript are that 'v' only occurs word-initial (in all other cases, the same character as 'u' is used) and 's' has two different shapes (this is actually quite frequent), one only occurs at the end of words, the other elsewhere.
The example of 'v' is a simple case of ambiguity: a single symbol sometimes used for unrelated sounds. This same manuscript typically omits the dot on 'i', with the result that 'm' and 'ni'/'in' are often indistinguishable. Of course, something similar might be happening with VMS EVA:i and EVA:e sequences (Stephen Bax has also written about this).
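To get a feel for how quickly undotted minims become ambiguous, here is a small Python sketch that lists every letter sequence a run of identical strokes could stand for. The stroke counts (i = 1, n and u = 2, m = 3) are a simplifying assumption, and the function name is hypothetical.

```python
MINIMS = {"i": 1, "n": 2, "u": 2, "m": 3}  # assumed strokes per letter


def minim_readings(n):
    """Return every letter sequence that uses exactly n minim strokes."""
    if n == 0:
        return [""]
    readings = []
    for letter, strokes in MINIMS.items():
        if strokes <= n:
            readings += [letter + rest for rest in minim_readings(n - strokes)]
    return readings


if __name__ == "__main__":
    print(minim_readings(3))  # includes 'm', 'ni' and 'in'
    print(len(minim_readings(6)))  # the number of readings grows quickly
```

A reader who knows the language prunes most of these by context; a transcriber who does not has no way to choose.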

RE: scripts with character variants and ambiguities - -JKP- - 14-02-2018

There are quite a few initial/medial/final shapes in languages that use the Latin alphabet and its abbreviations, and many of the abbreviations resemble letters.

A few examples of initial/medial/final-character ambiguity:

In many texts, there is a medial and a final "s". Often the Greek sigma is used for final "s", but some scribes use the eszett shape (ß) for the final "s" and do not always mean it to be read as a double-ess, as became the convention later. Some even used a different "s" for the beginning of words or for capital-S (similar to our modern "s"), but this is less common than long-ess or straight-ess in initial and medial positions.
As mentioned by Marco, V and U are used interchangeably, but sometimes the "v" is used for initial position or for a capital (for both v and u).
Also as mentioned by Marco, "r" was variable. Usually two forms were used, but sometimes three (especially if the manuscript has capitals, which many didn't).
The letter "d" has two forms in many manuscripts and there isn't too much rhyme or reason in how they are used (many scribes simply alternate somewhat randomly). One is a simple "d" without a loop (like the first one in "oladabas"), and the other is a figure-8 shape (similar to the last glyph in "oladabas", except it's usually slightly asymmetric to distinguish it from the number 8).
The n and m sometimes have final forms in which the scribe adds a tail. One might think it would be obvious that they are "n" and "m" but in many texts, when a tail is added to "n" at the end, it often looks like "y" or like the y-form of the letter thorn (the one in "hear ye" which was not pronounced like "y" but like "th" as in "thee"), so it is quite easy to mistake n-tail for a y or a thorn.
In the final position, the letter "i" is often written with a long descender tail (and sometimes misinterpreted as "j" by modern viewers). When "i" is written after "i" at the end and has a long tail, some people misread it as a ÿ (y with umlaut) and it did eventually evolve into "y" but in the earlier texts, it stood for ij (which was "i" followed by another "i" with a descender).
The letter "m" in the initial position was often written so it looks like "an" (witness the discussion about michiton/anchiton). Usually it is known by context whether it is "m" or "an" but USUALLY (not always, but usually) the scribe will add a tail to show that it is the letter "m" rather than "an". If the tail is short (no descender), it is USUALLY "an". This holds up about 90% of the time, but there were a few scribes who didn't lengthen the tail very much, making it hard to tell the difference.

A few examples of common scribal abbreviations that resemble letters:

The most common one is "9" (at the ends and sometimes beginnings of words). I've noticed many people read it as "g", but it's not a "g"; it's an abbreviation. The "g" is usually drawn a little differently (although not always). Some scribes superscripted the abbreviation to distinguish it from "g" but many did not. It stands for "con-" or "com-" at the beginnings of words and usually "-us" or "-um" at the ends of words, and occasionally for other things.
The letter that looks like "z" at the ends of words is an abbreviation for "-us", "-em", "m", and occasionally "-rum" (by scribes who don't use the more flourished "rum" character that looks like a big 4).
The abbreviation for "-ur" and "-tur" can look like a "2" or like EVA-r, depending on the angle it was written. The same character was sometimes used for "et" (and). Sometimes it was superscripted but most of the time it was written inline. It could stand alone as a word, or be used within words.
The shape that looks like a big open-4 at the ends of words almost always stands for "-rum".

The abbreviation for -ris/-tis/-cis is often mistaken for a "j" by modern viewers but it is an abbreviation that combines the r/t/c with the abbreviation for "-is" (which is a stick with a loop). It looks like EVA-m and most of the time is used at the ends of words. By itself, the abbreviation for "-is" can be used in many positions in the word but that is less common than using it for -ris/-tis/-cis.

Other forms of ambiguity:

Another thing worth noting is that long-s and f are often confused by modern viewers and probably were by viewers from other medieval cultures. The older forms of long-s or straight-s sometimes had a little tick in the middle similar to the stem on the f and if the tick crossed the stem, it looks like "f". You have to be able to read the language to know whether it's "s" or "f" in cases where scribes drew them almost the same.
Modern viewers (and possibly those from other cultures) often ignore the little dots and ticks above the letters in Latin texts, but you can't. These are various forms of apostrophes and the text doesn't make any sense unless they are expanded properly.
Minims, of course, are easy to misinterpret. Even experts have problems working out minims in some of the early medieval texts.

-----
In terms of statistics (which is an interesting topic), misinterpreting the shapes as different letters would probably create a few positional constraints (like a shape that only occurs in the initial or final position). But the medial forms would probably still move around more than they do in the VMS, and they would probably still not be "tied" to companion glyphs as much as they are in the VMS. Sample text would have to be analyzed to determine to what extent the positional patterns would be similar to VMS text.
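One way to analyze such sample text is sketched below in Python: for each glyph it measures how its occurrences split between word positions and which glyphs precede it. The function names and the three-word toy sample are assumptions for illustration ('R' and '2' stand in for variant shapes of "r"); a real test would need a full transcription made without knowledge of the alphabet.

```python
from collections import Counter, defaultdict


def positional_profile(words):
    """For each glyph, the share of its occurrences in each word position."""
    counts = defaultdict(Counter)
    for word in words:
        for i, ch in enumerate(word):
            if len(word) == 1:
                pos = "single"
            elif i == 0:
                pos = "initial"
            elif i == len(word) - 1:
                pos = "final"
            else:
                pos = "medial"
            counts[ch][pos] += 1
    return {
        ch: {pos: n / sum(c.values()) for pos, n in c.items()}
        for ch, c in counts.items()
    }


def left_contexts(words):
    """For each glyph, the set of glyphs seen immediately before it."""
    contexts = defaultdict(set)
    for word in words:
        for prev, ch in zip(word, word[1:]):
            contexts[ch].add(prev)
    return dict(contexts)


if __name__ == "__main__":
    sample = ["Rep2and2e", "co2ps", "sur"]  # toy sample, not real data
    print(positional_profile(sample)["R"])
    print(left_contexts(sample)["2"])
```

A glyph whose profile is concentrated in one position, or whose left-context set is very small, is a candidate for a positional variant rather than an independent letter. Comparing these profiles between a medieval transcription and the VMS would address the question of how similar the patterns really are.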

As for the perceived character set, if a scribe were from a different culture and didn't recognize the difference between initial/medial/final forms, their perception of the size of the basic alphabet might increase rather than decrease (they might perceive more than 30 shapes as basic characters, and if they were from an African or Asian culture this would not seem unusual), especially if some of the abbreviations that resemble letters were also misperceived as letters.

RE: scripts with character variants and ambiguities - Paris - 14-02-2018

Quant tout fut prest sur lerbe se poserent
Ou leurs corps laz et tristes reposerent
De divers vivres et de doulce liqueur
Comancerent a reprandre vigueur
Quant ilz eurent leur aspre fain chassee
P?? viande quil avoient pourchassee
Et que de table furent trestous levez
Eulx qui estoient lassez et agravez
Comancerent lors par parolle maincte
De leurs consors faire regret et plaincte
Et eulx piteux entre espoir et grand doubte

RE: scripts with character variants and ambiguities - nablator - 14-02-2018

(14-02-2018, 10:45 AM)Paris Wrote: P?? viande quil avoient pourchassee

Par.

Very readable and understandable even without training except for the iv/ui and ni/in/m ambiguities.

A full transcript and critical study is available as a PDF (in French).

A script a century older, in Nicole Oresme's Le livre du ciel et du monde (BNF Fr. 565), is more challenging to read.

The first paragraph of the famous illustrated page:

Quote: Au nom de dieu ci commence le livre d'aristote appelle du ciel et du monde lequel du commandement de tres souverain et t[re]s excellent prince Charle quint de cest nom, par la grace de dieu Roy de france, desirant & amant toutes noblez sciences, je nichole oresme, doyen de l'eglise de rouen, propose translater et exposer en francois.

- Some 'r's are difficult to make out, e.g. in Roy (with s-like prefix because it's a capital R I suppose) and france (ligature looks like 'fi'),
- 'a' sometimes looks like 'ri', e.g. souverain looks like souuerrim,
- Latin abbreviations are used, such as 'prop' written 'pp' with a tail, 'par' written 'p' with a short horizontal bar underneath, etc.

On the same page, a curiously open 'o' similar to Voynichese V101-A that can sometimes be confused with 'a':

[Image: selon_10.png]

"selon soy"