The Voynich Ninja

Full Version: "The Currier languages revisited" revisited
Pages: 1 2 3 4 5 6 7
(20-12-2025, 03:49 PM)Rafal Wrote:
Quote:I mean, suppose folio A and folio B are correlated under metric C (or whatever the proper lingo is). So what?

It means that they are similar on some dimension.

Thank you for the explanation, but the problem for me is not in the math part of it; I could probably even code PCA from scratch if needed. I think I once implemented SVD for an embedded platform in pure C, but it was a long time ago.

The question is about the further implications of the similarity. Similarity can emerge in many possible ways; even purely random sequences can show some similarity by chance. It's good that some of the studies use some form of shuffling as a control, but this only proves that the similarity is not spurious (unless the similarity metric was fine-tuned in the first place). It doesn't show whether it's intentional or a byproduct of some other factor. If the Voynich Manuscript is a plaintext in some language, then it should be possible to identify topic or language similarity, but if it's not, then I see no way to meaningfully interpret the results of PCA at all.
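
To make the point about shuffling controls concrete, here is a minimal sketch of one common form of such a control, a label-permutation test. It is an illustration only and not the procedure used in any of the studies discussed; the function name, the "A"/"B" labels and the per-page values are hypothetical. It only says whether a gap between two groups of pages is larger than chance would give, and nothing about why the gap exists.

```python
import random

def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation test on the gap between group means.

    values: one number per page (e.g. some per-page statistic).
    labels: "A" or "B" for each page (hypothetical grouping).
    """
    def mean_gap(lbls):
        a = [v for v, l in zip(values, lbls) if l == "A"]
        b = [v for v, l in zip(values, lbls) if l == "B"]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_gap(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the group sizes but breaks any real link
        # between a page's value and its label.
        rng.shuffle(shuffled)
        if mean_gap(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value here says the grouping is unlikely to be a chance artifact of this statistic; it cannot distinguish between scribe, topic, spelling or any other cause.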
I find the subject of an A/B switch vs a progressive drift very interesting. It would be great if this problem could get a clear answer.
I hadn't previously realized that Scribe 4 apparently played a major role in the A/B transition or switch.

As always, it is likely I made errors, but I tried computing the frequency of the bigram ‘ed’ in each zodiac page. It’s lucky that the order of these pages is known. I processed Circle and Label text both separately and together. As a reference, the average frequency of 'ed' in Currier B is ~4%. 


[attachment=13072]
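
For anyone who wants to repeat the count, a minimal sketch of a per-page bigram frequency. The assumptions are mine and may differ from what was done for the chart: the text is already split into words (for example on the "." separator of an EVA transliteration), and bigrams are counted inside words only, never across word breaks. The function name is hypothetical.

```python
from collections import Counter

def bigram_frequency(words, bigram="ed"):
    """Share of all within-word bigrams on a page that equal `bigram`."""
    counts = Counter()
    for w in words:
        # Slide a two-character window over each word.
        for i in range(len(w) - 1):
            counts[w[i:i + 2]] += 1
    total = sum(counts.values())
    return counts[bigram] / total if total else 0.0

# Toy page of three words; multiply by 100 for a percentage.
page = "chedy.qokedy.daiin".split(".")
freq = bigram_frequency(page)
```

Counting bigrams across word breaks, or including labels, would change the denominator, so results from different setups are only roughly comparable.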
This is consistent with my own results.

I wouldn't call this either A or B language.
Rather than using % of 'ed' bigrams, I did the analysis on [link] using % of words including 'ed'.

The foremost distinction is the following: on Herbal A and Pharma pages this is <1%.
On Herbal B, Stars and Bio pages it ranges from 16% to 28%, leaving a big gap.

The remaining pages (astro/cosmo/zodiac) are in this gap, with the zodiac around 6%.
I have called this C language.
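
A minimal sketch of the word-based measure, for anyone wanting to reproduce it. The cut-offs in `rough_class` are simply the ranges quoted above (<1% and 16% or more) used as illustrative thresholds; they are not a proposed classification rule, and both function names are hypothetical.

```python
def pct_words_with(words, pattern="ed"):
    """Percentage of words on a page that contain `pattern`."""
    if not words:
        return 0.0
    return 100.0 * sum(pattern in w for w in words) / len(words)

def rough_class(pct):
    """Bucket a page by the ranges quoted above (illustrative only)."""
    if pct < 1.0:
        return "A-like"
    if pct >= 16.0:
        return "B-like"
    return "in the gap (C)"
```

A page with very few words will give a noisy percentage, so short pages probably should not be bucketed this way.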
Do we see the same sort of results with chd and lk?
It looks pretty similar (to Marco's chart) using search on the voynichese site, but I don't have anything ready-made to test. I would appreciate it if someone who does could check. :)
(20-12-2025, 09:03 AM)kckluge Wrote:
(20-12-2025, 08:35 AM)MarcoP Wrote: If I understand correctly, an implication is that Currier A (as defined by the left-side cluster in the plots) is coincident with Scribe 1 plus Scribe 4. So the A/B split can be entirely explained by the different scribes (but for those three herbal pages)?
With the caveat that Scribe 4 also does one side of the big Rose foldout and those pages all cluster on the B language side with the text on Scribe 2's nine-rosette diagram, yes.

To which Marco replied:

Quote:Ah, thank you. So Scribe 4 wrote both A and B pages, and there is no simple explanation for the two clusters. That's more interesting and more like the Voynich (nothing is ever simple).

Looking at Lisa Fagin Davis's "How Many Glyphs and How Many Scribes?" paper, I see I need to make a correction and a further caveat -- she says, "Quire 14 is the famed 'Rose' foldout, with six panels on the obverse written by Scribe 2 and the nine-segment Rose on the other side apparently (but not definitely) written by Scribe 4."

So a) I had Scribes 2 & 4 flipped with regard to which side of the foldout sheet each did, and b) it could still be the case that Scribe 4 only works in a variant of the A language (subject to the debate about the status of Zodiac labelese) since Lisa leaves some wiggle room regarding whether or not Scribe 4 did the rose diagram.

If Scribe 4 *did* do the rose side of the big foldout, that is interesting in and of itself because it implies

1) that he/she knew there were distinct A & B languages/dialects, and 

2) that he/she made the deliberate decision to write the rose diagram non-label text in a B language dialect, to go with Scribe 2's use of a B language dialect for the non-label text on the other side of the sheet, rather than in the A language dialect in which Scribe 4 wrote the non-label text of the Quire 9-12 circular diagram/zodiac pages.
(20-12-2025, 01:11 PM)Bernd Wrote: Potentially dumb question, but what would PCA look like in 3 dimensions?
Do we get any additional resolution of clusters compared to the usual 2D projection?

It's not a dumb question at all, it's just been a while since I last used GnuPlot's "splot" command. If anyone else wants to take a shot, I've attached data files for the 1st 3 PCA features with labels (53.7% of total covariance) and without labels (56.7% of total covariance) included. f116v is left out because the only text line in the transcription is

<f116v.1,@Lx>    oror.sheey <!valsch vbren so nim gaf mich o>

and that results in the data point having a massively outlying value for the 2nd (and it looks like 3rd) coordinate:

8 f116v__STR_3  -0.0283306258  +0.1906012118  +0.1632490594

Here's a side view of the point cloud for the no labels data:

[attachment=13077]
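
For anyone who would rather recompute the first three components than plot the attached files, here is a minimal from-scratch sketch. It uses power iteration with deflation on the covariance matrix, which is a swapped-in technique and not necessarily how the attached data was produced; the function name and parameters are hypothetical, and the input is assumed to be one row of numeric features per folio.

```python
import math
import random

def pca_top_k(rows, k=3, iters=200, seed=0):
    """Return (components, variance_ratios, scores) for the top-k PCs."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_var = sum(cov[j][j] for j in range(d))
    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]  # start from a random unit vector
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in any direction
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured along the unit vector v.
        cv = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = sum(v[a] * cv[a] for a in range(d))
        components.append(v)
        ratios.append(lam / total_var if total_var else 0.0)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    # Coordinates of each row along the components (the 3D point cloud).
    scores = [[sum(x[j] * c[j] for j in range(d)) for c in components]
              for x in centered]
    return components, ratios, scores
```

Summing the first three entries of `variance_ratios` gives the share of total variance covered by a 3D plot. For a real feature matrix with hundreds of columns a linear algebra library would be far more practical than this pure-Python loop.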
(20-12-2025, 12:20 PM)ReneZ Wrote: Scribe 4 wrote all the pages with circular diagrams (except IIRC f57v).

Unless I'm misreading Lisa's paper, Scribe 2 did f85r2 and f86v4.
(21-12-2025, 03:16 AM)kckluge Wrote: Unless I'm misreading Lisa's paper, Scribe 2 did f85r2 and f86v4.

Correct!
Is the plot correct?

[attachment=13078]
Sorry if this has been asked before:

Can we check whether the big difference between A and B is merely a change in the spelling system?

Imagine that someone is writing in German and at some point realizes that it is silly to write "sch" and "ck", so he starts writing "sh" and "k" instead.  Since the letters {s,c,h,k} occur in other combinations, this spelling change will affect the digraph frequencies in a complicated way.  
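
A toy illustration of that German example, as a minimal sketch. The sample words and the helper names are mine and purely illustrative. Note that "sicher" keeps its "ch", so the count of "ch" drops but does not vanish, while "ic" and "ac" also change even though the rule never mentions them.

```python
from collections import Counter

def digraph_counts(text):
    """Count within-word digraphs in a space-separated text."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += 1
    return counts

def respell(text):
    """Apply the hypothetical spelling reform: sch -> sh, ck -> k."""
    return text.replace("sch", "sh").replace("ck", "k")

old = "schule schnell backen dick sicher"
new = respell(old)
before, after = digraph_counts(old), digraph_counts(new)

# Every digraph whose count changed, as (before, after) pairs.
changed = {d: (before[d], after[d])
           for d in set(before) | set(after)
           if before[d] != after[d]}
```

Two simple rules change eight different digraph counts in this tiny sample, which is the sort of knock-on effect that could make a spelling change look like a deeper difference.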

If the VMS Author decided to change the spelling system, it must be because the new one is more efficient (as in the hypothetical German example above).  How do the token lengths of A and B text compare?  

The replacement of "od" by "ed", observed by Jacques Guy, would not save glyphs but would save pen strokes.  So we can ask the same question above, but counting strokes instead of glyphs:
  • i e n(?) = 1
  • q o a y Ch Ih d m r s l k t p f = 2
  • Sh = 3
  • CTh CKh etc = 4
  • CTHh CKHh etc = 5
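
A minimal sketch of how the stroke table above could be applied to a word. Assumptions: the input uses the same capitalised glyph notation as the list, glyphs are matched longest-first, and only the glyphs named explicitly in the list are included (the "etc" entries are not expanded, so unlisted glyphs raise an error). The names are hypothetical.

```python
# Stroke counts copied from the list above; "etc" entries not expanded.
STROKES = {}
for g in "i e n".split():
    STROKES[g] = 1
for g in "q o a y Ch Ih d m r s l k t p f".split():
    STROKES[g] = 2
STROKES["Sh"] = 3
for g in ("CTh", "CKh"):
    STROKES[g] = 4
for g in ("CTHh", "CKHh"):
    STROKES[g] = 5

# Try longer glyphs first so "CTHh" is not read as shorter pieces.
GLYPHS = sorted(STROKES, key=len, reverse=True)

def tokenize(word):
    """Split a word into glyphs by greedy longest match."""
    glyphs, i = [], 0
    while i < len(word):
        for g in GLYPHS:
            if word.startswith(g, i):
                glyphs.append(g)
                i += len(g)
                break
        else:
            raise ValueError(f"unknown glyph at {word[i:]!r}")
    return glyphs

def stroke_count(word):
    """Total pen strokes for a word under the table above."""
    return sum(STROKES[g] for g in tokenize(word))
```

With this table, `stroke_count("Chody")` is 8 and `stroke_count("Chedy")` is 7: the same number of glyphs, one stroke fewer. Averaging `len(tokenize(w))` and `stroke_count(w)` over the tokens of each language would give the two comparisons asked for.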

(Better to focus just on normal text first, that is, paragraph and circular text. Including labels etc. and pages with little text only confuses things; we can look at them later once we figure things out for normal text.)

Has anyone tried to check whether there is a systematic (or semi-systematic) mapping of the A lexicon to the B lexicon that makes one language more similar to the other one?  Like, say, "replace od by ed in every word, and suppress the platform of CTh etc if it is followed by y or aiin"? (This is just an example of what a "systematic mapping" would be like; I am not proposing this specific rule.) 
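
One way such a check could be set up, as a minimal sketch: apply a candidate list of rewrite rules to the A lexicon and see whether its overlap with the B lexicon goes up. The Jaccard overlap of word types is my choice of measure, the function names are hypothetical, and the word lists in the usage note are toy data, not real lexicons.

```python
def jaccard(a, b):
    """Overlap of two sets of word types: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def apply_rules(word, rules):
    """Apply (old, new) substring replacements in the order given."""
    for old, new in rules:
        word = word.replace(old, new)
    return word

def mapping_gain(lex_a, lex_b, rules):
    """Return lexicon overlap (before, after) mapping A with `rules`."""
    before = jaccard(lex_a, lex_b)
    after = jaccard((apply_rules(w, rules) for w in lex_a), lex_b)
    return before, after
```

For example, `mapping_gain(["chody", "qokody", "daiin"], ["chedy", "qokedy", "daiin"], [("od", "ed")])` gives an overlap of 0.2 before and 1.0 after. This ignores token frequencies; weighting by frequency, and comparing against the gain from random rules, would be needed before reading anything into a real result.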

Beware that a mere change of subject can drastically change the frequencies of words; and hence the frequencies of digraphs, because these are determined largely by the most frequent words.  For example, the word "herb" should be fairly common in a herbal (like the "alchemists herbals") but absent in an astrological text.  Then the digraph "rb" should be much more common in the former than in the latter.

All the best, --stolfi