The Voynich Ninja

Full Version: [split] (lack of) scribal mistakes / corrections
Pages: 1 2 3 4 5 6 7 8 9
(26-03-2026, 04:38 PM)LisaFaginDavis Wrote: That's why I put "mistake" in quotes - no way to know what was intended until we can read it. There are lots of examples in Scribe 1's work where they write a glyph that is unique or unusual but that MIGHT have been intended to be something more typical. Examples can be found on 7r line 1 [looks like qko], 18r line 1 [looks like cf], the 3-shaped character on 10r, at the beginning of the last and second-to-last line, and more.

Unusual or rare forms don't necessarily indicate mistakes. For instance the word <chedy> occurs only once in Currier A but is the most frequent word in Currier B. A single occurrence of a form that looks unusual in one section may simply reflect a stage in the text's evolution rather than a scribal mistake.

Regarding folio 10r, I discussed the two unusual "3"-glyphs back in 2014 (p. 33 in [link]). The lines f10r.P.8–12 start with qo, oy, oq, 3o, and 3o — a cluster of unusual word-initial combinations in adjacent lines. This looks more like local experimentation during writing than scribal mistakes.
(26-03-2026, 07:38 PM)Torsten Wrote: Unusual or rare forms don't necessarily indicate mistakes. For instance the word <chedy> occurs only once in Currier A but is the most frequent word in Currier B. A single occurrence of a form that looks unusual in one section may simply reflect a stage in the text's evolution rather than a scribal mistake.

Isn't that a totally different point though? Spelling mistakes aren't necessarily one-way, and they should tend to differ by only one or two letters. Like Jorge's example: strings like air may be misspelt as ain, but that works in reverse too.

If we're talking about someone transcribing/retracing from a difficult-to-read text that reads "ai?", a guessing scribe will end up writing both.

As for the idea that mistakes are widespread, there are some experiments that could be done. For example, if you hypothesise that the following are common mistakes:

a <-> o 
r <-> n

It could be the case that the % of mistakes between a -> o is quite consistent throughout the text and across words. If that is so, there should be an "a variant" and a "o variant" of various words across the manuscript. Comparing the ratio between various a/o variants may show a pattern. Perhaps common words containing "a" have an "o" variant with ~20% the frequency on average, who knows.

However, if there are extremely common words containing "a" which have 0% "o"-variant frequency, it points against those letters being mistaken for each other imo.
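The a/o variant test described above is easy to prototype. Here is a minimal sketch in Python; the token list is invented for illustration (a real run would use an EVA transliteration file), and the variant generator simply swaps any subset of a's for o's:

```python
# Sketch of the a <-> o variant test. `tokens` is a stand-in for a real
# transliteration word list; the words below are invented for illustration.
from collections import Counter
from itertools import product

tokens = ["daiin", "doiin", "daiin", "chedy", "chady", "daiin",
          "okar", "okor", "okar", "okar", "daiin", "doiin"]
counts = Counter(tokens)

def variants(word, a="a", b="o"):
    """Yield every way of swapping some of the a's in `word` for b's."""
    slots = [i for i, ch in enumerate(word) if ch == a]
    for choice in product([a, b], repeat=len(slots)):
        w = list(word)
        for i, ch in zip(slots, choice):
            w[i] = ch
        yield "".join(w)

# For each attested word containing 'a', compare its count with the
# combined count of its attested 'o'-variants.
for word in sorted(w for w in counts if "a" in w):
    var_count = sum(counts[v] for v in variants(word) if v != word)
    ratio = var_count / counts[word]
    print(f"{word}: {counts[word]} vs o-variants {var_count} (ratio {ratio:.2f})")
```

If the a -> o confusion rate were roughly constant, the printed ratios should cluster around a common value; wide scatter, or very common words whose ratio is exactly 0, would argue against the confusion hypothesis.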
(26-03-2026, 09:29 PM)eggyk Wrote: If we're talking about someone transcribing/ retracing from a difficult to read text that reads "ai?", a guessing scribe will end up writing both.

That is indeed a possibility; it could explain the apparent high frequency of "rhyming" word pairs like otedy/otody, lkeedy/lkeody, okar/otar. (However, I have not actually counted such pairs -- maybe their "high frequency" is just a mistaken impression.)

Quote:It could be the case that the % of mistakes between a -> o is quite consistent throughout the text and across words. If that is so, there should be an "a variant" and a "o variant" of various words across the manuscript. Comparing the ratio between various a/o variants may show a pattern. Perhaps common words containing "a" have an "o" variant with ~20% the frequency on average, who knows.

That is a good idea.  We should test it.  But keep in mind that the Author's handwriting may have been very uneven, being more sloppy on certain pages or sections than on others.

I cannot push aside the idea that the Author was old and poor and with failing eyesight when he decided to put his notes to vellum.  And that the Scribe may have been a bright 13-year-old nephew who agreed to do his uncle this favor for a pittance.  And thus the old man had to accept all the errors that the boy made.  And probably not even the Author himself could read his own handwriting of 30 years earlier...

All the best, --stolfi
(27-03-2026, 08:06 AM)Jorge_Stolfi Wrote: I cannot push aside the idea that the Author was old and poor and with failing eyesight when he decided to put his notes to vellum.  And that the Scribe may have been a bright 13-year-old nephew who agreed to do his uncle this favor for a pittance.  And thus the old man had to accept all the errors that the boy made.  And probably not even the Author himself could read his own handwriting of 30 years earlier...

All the best, --stolfi

This argumentation is problematic for several reasons. First, it is circular reasoning to assume that we can detect exceptions for rules we still want to discover. Secondly, by using this hypothesis we would start the analysis with the presumption that we can't trust our observations.

Last but not least, the hypothesis doesn't fit well-known facts:

1) The VMS doesn't contain any corrections in the form of deleted glyphs. If there are misspelled glyphs, the scribe didn't care to scrape them out. This clearly contradicts the idea that we can determine errors. In my eyes, the idea that the text was copied from a draft and that the scribe didn't understand what he was writing illustrates how a problematic starting hypothesis leads to even more problematic conclusions.

2) Words with high mutual similarity are typical for the VMS. For each common word there is at least another one differing from it by only a single quill stroke. For example, in addition to the word <daiin> also the words <dain> and <daiiin> are present in the text. The existence of words with high mutual similarity to other words is quite normal for the VMS. To explain them as errors is therefore more than problematic.

3) The shift from Currier A to Currier B demonstrates that the VMS has an evolving vocabulary with no stable baseline. There's no "correct" form of the text against which errors can be measured — because the text is a process, not a product. You can't have errors in a system that has no fixed target.

4) The text perfectly fits into the available space. This is even the case for holes within the parchment, or where the drawing of a plant separates a line into multiple parts. This indicates that the text layout responds to the layout of the page. Either the text layout was made during writing, or the layout, holes included, was copied along with the text. Therefore, this observation indicates that the text was adapted during writing and also that the scribe was the author.
An alternative idea would be that the VMS is a facsimile of the original manuscript. But this would mean that besides the text layout, the "errors" were copied.
(27-03-2026, 10:25 AM)Torsten Wrote: This argumentation is problematic for several reasons. First, it is circular reasoning to assume that we can detect exceptions for rules we still want to discover. Secondly, by using this hypothesis we would start the analysis with the presumption that we can't trust our observations.

There can be markers that potentially indicate systematic mistakes, even if the rules aren't fully known. For example, one could check the relative frequencies of variations (based on the assumed mistranscription of a character, not on missing or additional characters). If there is a consistent pattern across many words, it could indicate a systematic confusion.

(27-03-2026, 10:25 AM)Torsten Wrote: 1) The VMS doesn't contain any corrections in form of deleted glyphs. If there are misspelled glyphs the scribe didn't care to scrape them out. This clearly contradicts the idea that we can determine errors. In my eyes the idea that the text was copied from a draft and that the scribe didn't understand what he was writing illustrates, how a problematic starting hypothesis leads to even more problematic conclusions.

It doesn't contradict it at all. In a scenario where a scribe does not know they are making mistakes, or what mistakes look like, why would they cross out what they have done? 

(27-03-2026, 10:25 AM)Torsten Wrote: 2) Words with high mutual similarity are typical for the VMS. For each common word there is at least another one differing from it by only a single quill stroke. For example, in addition to the word <daiin> also the words <dain> and <daiiin> are present in the text. The existence of words with high mutual similarity to other words is quite normal for the VMS. To explain them as errors is therefore more than problematic.

Why is it problematic? Perhaps the mutual similarity is the result of mistakes. We don't know. 

(27-03-2026, 10:25 AM)Torsten Wrote: 3) The shift from Currier A to Currier B demonstrates that the VMS has an evolving vocabulary with no stable baseline. There's no "correct" form of the text against which errors can be measured — because the text is a process, not a product. You can't have errors in a system that has no fixed target.

Again, you are conflating gradual vocabulary change with spelling mistakes, which may have different markers that can be detected.

One idea would be to consciously mark the most common words as "correct", and to search for potential variations that could represent mistakes. That is a good fixed target: common words/letter clusters that do not seem to gradually change throughout the manuscript. 

Of course we don't know if the form is correct. But we can guess, and conduct experiments following from those guesses. Are you not doing the same by assuming that vocabulary and writing style are gradually evolving? You only know it has "changed" because it is different from your previous baseline assumptions about the text.
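This "fixed target" idea can itself be sketched in a few lines: take the most frequent words as the presumed-correct forms, then collect every rarer word within one edit (substitution, insertion, or deletion) of them. The word counts below are invented for illustration only:

```python
# Sketch: treat the most frequent words as presumed-correct forms and
# collect rarer words one edit away from them. Counts are invented.
from collections import Counter

tokens = (["daiin"] * 50 + ["dain"] * 8 + ["daiiin"] * 3 +
          ["chedy"] * 40 + ["chedo"] * 5 + ["okar"] * 30)
counts = Counter(tokens)

def edit1(a, b):
    """True if b is one substitution, insertion, or deletion away from a."""
    if a == b:
        return False
    la, lb = len(a), len(b)
    if abs(la - lb) > 1:
        return False
    if la == lb:                      # one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if la > lb:                       # make a the shorter word
        a, b = b, a
    return any(a == b[:i] + b[i + 1:] for i in range(len(b)))

correct = {w for w, c in counts.most_common(3)}   # presumed-correct forms
for target in sorted(correct):
    near = {w: counts[w] for w in counts if w not in correct and edit1(target, w)}
    print(target, "->", near)
```

The interesting question is then whether the frequency ratio between each presumed-correct form and its one-edit neighbours is roughly stable across the manuscript, as a constant mistake rate would predict.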

(27-03-2026, 10:25 AM)Torsten Wrote: 4) The text perfectly fits into the available space. This is even the case for holes within the parchment or if a drawing of a plant separates a line into multiple parts. This indicates that the text layout responds to the layout of the page. Either, the text layout was made during writing or in addition to the holes, the text layout was copied. Therefore, this observation indicates that the text was adapted during writing and also that the scribe was the author.
An alternative idea would be that the VMS is a facsimile of the original manuscript. But this would mean that besides the text layout, the "errors" were copied.

Or it means that the original text was retraced, so the positioning of everything would remain the same.
(27-03-2026, 10:25 AM)Torsten Wrote:
(27-03-2026, 08:06 AM)Jorge_Stolfi Wrote: I cannot push aside the idea that the Author was old and poor and with failing eyesight when he decided to put his notes to vellum.  And that the Scribe may have been a bright 13-year-old nephew who agreed to do his uncle this favor for a pittance.  And thus the old man had to accept all the errors that the boy made.  And probably not even the Author himself could read his own handwriting of 30 years earlier...

This argumentation is problematic for several reasons.

Of course the circumstances of the creation of the VMS are still conjectural.  Even after we "decipher" the VMS itself, we may be able to figure out the origin of the abstract text, but we may not find in it any information about the creation of the physical object.

Quote:First, it is circular reasoning to assume that we can detect exceptions for rules we still want to discover.

I have some evidence that is independent of such assumptions, but most people here do not recognize it, so let's leave it at that for now.

Quote:the hypothesis doesn't fit with well known facts: 1) The VMS doesn't contain any corrections in form of deleted glyphs. If there are misspelled glyphs the scribe didn't care to scrape them out.

First, the ink of most of the VMS text is not iron-gall ink.  This is totally evident from the way it looks under infrared light, and is confirmed by the way it got erased by spills, e.g. on pages [link] and f116v.

The ink must have been like a watery watercolor paint: a suspension of a brown solid pigment in water with a bit of gum arabic or other water-soluble binding glue.  Like the blue and red writing on [link].  (Iron-gall ink is the Ford Model T of inks: "you can have it in any color you want, as long as it is black".)

Iron-gall ink is notoriously hard to erase from vellum, and that is in fact why it was almost always used to write on it.  The tannin from the oak galls binds chemically to the proteins of the vellum, and then, as the iron oxidizes in contact with the air, the two components turn the whole ink into an insoluble polymer.  That is why even freshly applied oak-gall ink can be erased only by scraping away the vellum, down to the bottom of its tiny pits.

Watercolor paint, in contrast, does not bind to vellum. It binds well to paper, by entering the gaps between the paper fibers; so that, even if the binding glue gets washed away, the pigment particles remain trapped there.  Vellum, however, does not have fibers with open spaces like paper, so the paint will just sit on the surface until it dries; and then the pigment will be held in place only by the dried glue.  Moreover, vellum is treated in manufacture to be slightly hydrophobic, so that the ink (of any type) will not be pulled away from the pen strokes by surface tension.

As a result, watercolor paint, whether applied with a brush or a pen, can be completely washed away from vellum by rubbing with a wet q-tip, without scraping.  As we can see on f116v.  

Thus there may have been many instances where the Scribe erased a word and wrote another in its place. We will never see those corrections.

Second, the BEEP BEEP BEEP, BEEP BEEP, so that any corrections by the original Scribe now are very hard to see.

Third, there are many instances of apparent "back-tracing" -- isolated glyphs or words that are darker than both the preceding and following ones, which can best be explained by the Scribe himself going back to a previously written word, with a freshly recharged quill, and retracing it.  In some cases the original can be seen sticking out from under the retraces (like that famous daiin on f1r).  Sometimes the reason may have been that the original came out too faint or a bit crooked.  But in some places it seems that the intent was to correct the original glyph.  I owe you examples of the latter.

Quote:In my eyes the idea that the text was copied from a draft and that the scribe didn't understand what he was writing illustrates, how a problematic starting hypothesis leads to even more problematic conclusions.

The "Brain to Vellum" theory is viable only under the assumption that the text is meaningless gibberish that was generated by some complicated algorithm, and yet the Author-Scribe did not care about occasional mistakes in the application of the algorithm.  But there are many arguments against that theory, which we can discuss in the appropriate thread.  Like the apparent necessity to write a determinate amount of text on a line, parag, or page -- a need which would not exist if the text was gibberish.  Or the "big parags" on [link] and f111r.

If the text has any meaning, it would have been insanity to write it directly to vellum.  Not just because of the occasional "quillos" (typos, but with a quill), but because of major corrections like "insert this parag here", "switch the order of these two sentences", "break a parag here", which would be impossible to do on the clean copy itself.

I wonder whether there are any examples of surviving vellum manuscripts that were clearly written brain-to-vellum?  AFAIK most extant vellum manuscripts were written by professional scribes, who would have copied from either an original draft by the Author or from another professionally written manuscript.

Quote:Words with high mutual similarity are typical for the VMS. For each common word there is at least another one differing from it by only a single quill stroke. For example, in addition to the word <daiin> also the words <dain> and <daiiin> are present in the text. The existence of words with high mutual similarity to other words is quite normal for the VMS. To explain them as errors is therefore more than problematic.
 

I showed you recently an example of a meaningful text -- in a natural language, in the clear, with no errors -- with precisely that feature.  It is an expected consequence of the [link]. Possibly even Bavarian could have that feature.  Or Latin, if written syllable by syllable.  And any text in any language would have that feature if encoded in a codebook cipher.

Quote:The shift from Currier A to Currier B demonstrates that the VMS has an evolving vocabulary with no stable baseline. There's no "correct" form of the text against which errors can be measured — because the text is a process, not a product. You can't have errors in a system that has no fixed target.

You seem to imply that the transition from Currier A to Currier B is the result of the expected drift as your "self-copying" method feeds upon its own output and gradually mutates it.  But the transition is abrupt, not gradual.  It affected most words, but not in a random way.  Some words, notably daiin, remained equally common across the change.  Within each language, there seems to be no evidence of that alleged drift.

Quote: 4) The text perfectly fits into the available space. This is even the case for holes within the parchment or if a drawing of a plant separates a line into multiple parts.

That is definitely not true.  There are hundreds of examples where the handwriting was clearly compressed before an intruding drawing or the end of a line.  And the m glyph, which is probably an abbreviation for some other ending, is more common in those places.

Quote:This indicates that the text layout responds to the layout of the page. ... this observation indicates that the text was adapted during writing

Of course the layout of the text responds to such accidents as page margins, figures, holes, creases, and bad vellum spots.  That is what a Scribe would do when copying meaningful running text (like a parag) from a draft.  Even if he did not understand an iota of the text (which is strongly suggested by the layout of f34r).

Quote:An alternative idea would be that the VMS is a facsimile of the original manuscript. But this would mean that besides the text layout, the "errors" were copied.

Even if the original was a clean copy, the Scribe who copied it may have confused similar glyphs.  Especially if he did not know the language.  See the bottom example on [link].

All the best, --stolfi
(27-03-2026, 03:15 PM)Jorge_Stolfi Wrote: ...

The transition is not abrupt. Cosine similarity analysis between folios shows the transition from Currier A to B "descends smoothly, almost linearly, with increasing rank" with "the complete absence of any sudden change in the slope" ([link], p. 307). The intermediate forms <chol> → <cheol> → <cheo> → <chey> → <chedy> document a gradual transformation through minimal edits ([link]). Words typical for Currier A exist in Currier B, but not the other way round. This is continuous evolution, not an abrupt switch. The folio vector analysis in [link] (2018) independently supports this: their Voynich folio vectors "cluster into a single large group" with no definitive A/B separation.

The 20% error rate is not a finding. It is a measure of how far the actual text deviates from your expectation of what it should look like. But consider what "correcting" those errors would produce: substituting rare word forms with more common ones would increase overall text repetition, pushing the text further toward the excessive redundancy that already distinguishes Voynichese from natural language. The corrections would make the text less language-like, not more. This is self-defeating: the hypothesis that errors explain unusual forms leads to a "corrected" text that fits the language hypothesis worse than the original.
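The self-defeating effect described above can be made concrete: if every rare word that has a frequent neighbour one substitution away is "corrected" to that neighbour, the type/token ratio drops, i.e. the text becomes even more repetitive. A toy sketch with invented counts:

```python
# Toy demonstration: "correcting" rare variants toward frequent neighbours
# increases repetition (lower type/token ratio). Data is invented.
from collections import Counter

tokens = (["daiin"] * 20 + ["dain"] * 5 + ["doiin"] * 3 +
          ["chedy"] * 15 + ["chedo"] * 4 + ["okar"] * 10 + ["okor"] * 2)
counts = Counter(tokens)

def sub1(a, b):
    """One-substitution neighbours of equal length."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

common = {w for w, c in counts.items() if c >= 10}

def correct(word):
    # Replace a rare word by a common one-substitution neighbour, if any
    # exists; words like "dain" (a deletion, not a substitution) survive.
    for target in common:
        if sub1(word, target):
            return target
    return word

corrected = [correct(w) for w in tokens]
ttr_before = len(set(tokens)) / len(tokens)
ttr_after = len(set(corrected)) / len(corrected)
print(f"type/token before: {ttr_before:.3f}, after: {ttr_after:.3f}")
```

The "after" ratio is necessarily lower whenever any merge happens, which is the point at issue: the corrections push the text further toward redundancy.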
(27-03-2026, 08:34 PM)Torsten Wrote: Cosine similarity analysis between folios shows the transition from Currier A to B "descends smoothly, almost linearly, with increasing rank" with "the complete absence of any sudden change in the slope" ([link], p. 307).


That argument is flawed.  For one thing, the most common word, daiin, remains common in both A and B (~9% and ~4%):
[attachment=14932]
The word count vector of each page is likely to be dominated by the "AB-words" -- daiin and a few other words that are common in both languages.  Whereas the big difference between A and B is a larger set of words -- the "A-words" and "B-words" -- that occur in only one language, but more rarely.  So each vector is a few largish numbers, always in the same places, a few dozen 1s, and a bunch of zeros.

When you compute the cosine of two such vectors, they are implicitly normalized to have norm 1.  That makes the vectors nearly parallel to the general direction of the AB-words.  The sampling noise in the AB-word counts will swamp the differences due to the presence of A-words or B-words.  So it is no wonder that the distribution of all page-page cosines looks like that of random vectors.
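The normalization point is easy to verify numerically. In the sketch below, two hypothetical pages share only two high-frequency "AB-words" and disagree on every rare word, yet their cosine similarity is still close to 1:

```python
# Two pages sharing only high-count "AB-words" but with fully disjoint
# rare vocabularies still come out nearly parallel. Counts are invented.
import math

def cosine(u, v):
    words = set(u) | set(v)
    dot = sum(u.get(w, 0) * v.get(w, 0) for w in words)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv)

# Both pages: 30 daiin, 20 ol; then 10 distinct rare words each, disjoint.
page_a = {"daiin": 30, "ol": 20, **{f"a_word{i}": 1 for i in range(10)}}
page_b = {"daiin": 30, "ol": 20, **{f"b_word{i}": 1 for i in range(10)}}
print(f"cosine: {cosine(page_a, page_b):.3f}")   # prints: cosine: 0.992
```

So a smooth distribution of page-page cosines says little about whether the rare (A/B-diagnostic) vocabulary changes gradually or abruptly.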

To check your claim, you should choose two sets of presumed A-words and B-words, and map each page to two numbers a,b which are the relative frequencies in the page of words from each set, and plot each page as a color dot.  Being combined counts, those numbers will be less affected by sampling noise.  

Then, if your claim is true, you should see a single cloud, as A-words get replaced gradually by B-words.  But many previous tests have shown the opposite: A-pages and B-pages form two clusters with few if any dots in between.
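The proposed test itself is only a few lines. The diagnostic word sets and page contents below are placeholders; with a real transliteration, each page becomes one (a, b) dot, and the question is whether the dots form one cloud or two clusters:

```python
# Sketch of the (a, b) test: relative frequency of presumed A-words and
# B-words per page. Word sets and page contents are hypothetical.
A_WORDS = {"chol", "chor", "shol"}        # hypothetical A-diagnostic set
B_WORDS = {"chedy", "shedy", "qokeedy"}   # hypothetical B-diagnostic set

pages = {
    "f1r":  ["chol", "chor", "daiin", "shol", "chol"],
    "f26r": ["chedy", "qokeedy", "shedy", "daiin", "chedy"],
}

def ab_point(tokens):
    """Map a page's token list to its (a, b) relative-frequency pair."""
    n = len(tokens)
    a = sum(t in A_WORDS for t in tokens) / n
    b = sum(t in B_WORDS for t in tokens) / n
    return a, b

for page, tokens in pages.items():
    a, b = ab_point(tokens)
    print(f"{page}: a={a:.2f} b={b:.2f}")
```

Plotting these pairs for all pages (e.g. with matplotlib) would directly show whether the A-to-B transition is a continuum or two separated clusters.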

All the best, --stolfi
(29-03-2026, 07:01 PM)Jorge_Stolfi Wrote: ...

Table 2 in Timm & Schinner (2020, p. 7) shows the raw frequency counts for seven diagnostic words across sections ordered by similarity. The transition is visible without any statistical processing: ⟨chedy⟩ increases from 1 to 210, ⟨chol⟩ decreases from 228 to 14, and the Astro and Cosmo sections sit squarely between Currier A and B on every word. These are not cosine similarities — they are raw counts. The gradient is in the data itself.
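One way to visualize the claimed gradient is to collapse each section to the share of <chedy> among <chedy>+<chol>. Only the endpoint counts (1 vs 228 and 210 vs 14) are quoted from the text above; the middle rows and section labels are invented placeholders, not the published figures:

```python
# Collapse each section to chedy/(chedy+chol). Endpoint counts are the
# ones quoted above; middle rows and labels are illustrative stand-ins.
sections = {                    # (chedy, chol) raw counts
    "A endpoint": (1, 228),
    "middle 1":   (40, 80),     # invented
    "middle 2":   (90, 40),     # invented
    "B endpoint": (210, 14),
}
for name, (chedy, chol) in sections.items():
    share = chedy / (chedy + chol)
    print(f"{name:10s} chedy-share = {share:.2f}")
```

A monotone increase of this share along the similarity ordering is exactly what "the gradient is in the data itself" means; a step function would instead support the abrupt-switch reading.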

[attachment=14940]