The Voynich Ninja

"The Currier languages revisited" revisited
I have quite a lot of data on this, which I have started to summarise, but this is still far from ready to publish. I can say with confidence that it is not a matter of spelling change. I am also reasonably sure that it can at some point be explained with a few 'rules'. But I haven't found them yet.

What it looks like to me is that all the frequent A words continue to be used in B, but a whole set of new words appears as well. Of course, this shifts the relative frequencies of those frequent A words.
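The frequency shift described above is plain arithmetic. Here is a minimal sketch with invented toy counts (not real Voynich data; the word names are placeholders). If the old words keep their absolute counts and a block of new word types is added, each old word's relative frequency drops, while the ratios among the old words stay the same:

```python
from collections import Counter

def relative_freqs(counts: Counter) -> dict:
    """Relative frequency of each word type in a table of token counts."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Toy counts, invented for illustration (not real Voynich data).
a_counts = Counter({"old1": 60, "old2": 40})
# "B": the old words keep their absolute counts, plus a block of new word types.
b_counts = a_counts + Counter({"new1": 50, "new2": 50})

fa, fb = relative_freqs(a_counts), relative_freqs(b_counts)
print(fa["old1"], fb["old1"])                             # 0.6 0.3
print(fa["old1"] / fa["old2"], fb["old1"] / fb["old2"])   # ratio is about 1.5 in both
```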
Is it reasonably safe to say that B is more advanced, and that it was likely invented after A?

Thank you for the 3D plots. I do think they show a more distinct separation of topics, which was not visible in the 2D projection. One neat way of visualizing 3D spaces is to combine images of the cube at different rotation angles into an animated .gif.
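A minimal sketch of the frame geometry for such a .gif, assuming the points are given as an (n, 3) array of PC1/PC2/PC3 coordinates. The function name `rotation_frames` is hypothetical. Each frame is an orthographic view of the cloud after rotating it about the PC3 axis, with PC3 kept as the vertical screen coordinate:

```python
import numpy as np

def rotation_frames(points: np.ndarray, n_frames: int = 36) -> list:
    """Orthographic 2D views of a 3D cloud rotating about its third axis.

    points: (n, 3) array of (PC1, PC2, PC3) coordinates.
    Returns one (n, 2) array per frame: (rotated horizontal coordinate, PC3).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    frames = []
    for k in range(n_frames):
        theta = 2 * np.pi * k / n_frames
        # Rotate in the PC1-PC2 plane, then look along the rotated PC2 direction,
        # so the horizontal screen coordinate is the rotated PC1 value.
        x_rot = np.cos(theta) * x - np.sin(theta) * y
        frames.append(np.column_stack([x_rot, z]))
    return frames
```

Each frame can then be drawn as an ordinary 2D scatter plot and saved as an image. Pillow can stitch a list of images into an animated GIF with `first.save("rotation.gif", save_all=True, append_images=rest, duration=100, loop=0)`.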

One more question, if you have time: what happens when you label scribal hands instead of topics? Maybe it's possible to show both in each data point (symbol for topic, color for hand), or to use dots with an outer ring so that each data point carries two attributes. Unfortunately I don't have time to play with the data at the moment...
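One simple way to put two attributes on each point, sketched under assumptions: each page is a hypothetical `(x, y, topic, hand)` record, and the marker and colour tables below are invented examples. Pages are grouped by (topic, hand) so each group becomes one scatter series with a topic-specific marker and a hand-specific colour:

```python
from collections import defaultdict

# Hypothetical style tables: marker shape encodes topic, colour encodes hand.
TOPIC_MARKERS = {"Herbal": "o", "Bio": "s", "Stars": "^", "Pharma": "D"}
HAND_COLORS = {1: "tab:blue", 2: "tab:orange", 3: "tab:green",
               4: "tab:red", 5: "tab:purple"}

def group_for_plotting(pages):
    """Group (x, y, topic, hand) records into one series per (topic, hand) pair."""
    groups = defaultdict(lambda: {"x": [], "y": []})
    for x, y, topic, hand in pages:
        groups[(topic, hand)]["x"].append(x)
        groups[(topic, hand)]["y"].append(y)
    series = []
    for (topic, hand), g in sorted(groups.items()):
        series.append({
            "x": g["x"], "y": g["y"],
            "marker": TOPIC_MARKERS[topic],
            "color": HAND_COLORS[hand],
            "label": f"{topic} / Hand {hand}",
        })
    return series
```

With matplotlib, each series could be drawn with `ax.scatter(s["x"], s["y"], marker=s["marker"], color=s["color"], label=s["label"])`, so the legend lists every topic/hand combination.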
(21-12-2025, 02:44 AM) kckluge Wrote: files for the 1st 3 PCA features

Thanks for posting the coordinates.

A shortcoming of static perspective plots is that they just show a different, oblique 2D projection. Each point could lie anywhere along the line of sight, so the apparent 3D structure is illusory. One low-tech way to hint at the true positions is to use lollipop markers that explicitly project each point down to the PC1-PC2 floor. For the no-labels cloud:
[attachment=13080]
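The lollipop idea above amounts to a drop line from each point to its foot on the PC1-PC2 plane. A minimal sketch, assuming an (n, 3) array of PC coordinates and placing the floor at the lowest PC3 value by default (the function name is hypothetical):

```python
from typing import Optional

import numpy as np

def lollipop_segments(points: np.ndarray, floor: Optional[float] = None) -> np.ndarray:
    """Vertical drop lines from each 3D point to the PC1-PC2 'floor'.

    points: (n, 3) array of (PC1, PC2, PC3). If floor is None, it is placed
    at the minimum PC3 value so every stick points downwards.
    Returns an (n, 2, 3) array: for each point, the segment [top, foot].
    """
    if floor is None:
        floor = points[:, 2].min()
    feet = points.copy()
    feet[:, 2] = floor          # same PC1/PC2, PC3 pushed down to the floor
    return np.stack([points, feet], axis=1)
```

In matplotlib these segments can be drawn with `mpl_toolkits.mplot3d.art3d.Line3DCollection(segs)` added via `ax.add_collection3d(...)`, with the points themselves plotted on top as the "candy".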
Following @Bernd's suggestion, I tried an easy online [link]:
[attachment=13082]
(21-12-2025, 04:03 PM) Bernd Wrote: what happens when you label scribal hands instead of topics?

I once created something similar using Stylo (2D):
[link]
(21-12-2025, 04:03 PM) Bernd Wrote: Is it reasonably safe to say that B is more advanced, and that it was likely invented after A?

*Sigh* It's treacherous to interpret geometry in some feature space in terms of history/process, but if space aliens put a ray gun to my head and forced me to bet, that's how I'd go. Not so much Herbal B, but definitely Starred paragraphs B, and especially Bio B. If I had to guess: whatever the underlying "system" is, the early Herbal A bifolia are the original form. I'd have to look closely at Rene's dialect assignments to be sure, but I'd potentially go so far as to suggest that the initial three pure Herbal A/Scribe 1 quires may be the oldest layer, and got bound together because they had always been together (even if not nested). The reason everything isn't in Herbal A is that there is something wrong with it: some weakness in the "key" used that offers a chink into breaking the "system". Scribe 1's effort to fix this is the dialect used in the Pharma pages and the Herbal A pages close to them, both in bigram frequency space and in the binding. Scribe 4's effort is the non-label dialect used in the Astro/Cosmo & Zodiac quires. Scribes 2 & 3 go in a different direction, with Herbal B being the initial cut that gets refined (whether sequentially or in parallel) into the Starred paragraph and Bio page dialects.

Quote: One more question, if you have time: what happens when you label scribal hands instead of topics? Maybe it's possible to show both in each data point (symbol for topic, color for hand), or to use dots with an outer ring so that each data point carries two attributes. Unfortunately I don't have time to play with the data at the moment...

With the exception of the "rose" diagram, the Herbal A/Pharma/Astro/Zodiac PCA plot from earlier in the thread essentially does that for Scribes 1 & 4. As for Scribes 2 & 3, I agree that comparing Herbal B/Scribe 2 vs. Bio/Scribe 2 vs. Herbal B/Scribe 3 vs. Starred para/Scribe 3 (and maybe Rose/Scribe 2 as well) might be illuminating, but I'm not sure how quickly I'll get around to it. I spent *way* too much time thinking about the Voynich last week. If only to get closure and make progress on actually finishing the dang thing, I'll probably work on getting Part 1 of "Deciphering Murder", my Voynich Mss./Midsomer Murders mashup fanfic, ready for posting in the "Voynich Talk" subforum over the holidays. (If I had had any idea it was going to turn into a novella-length work, I never would have started... hopefully people will enjoy it.)
Hello all,
this thread is very interesting. I would like to share a couple of plots from some automatic topic-detection work I did last summer. The "topics" detected are, in my view, more like stylistic clusters than semantic topics, but I think they can still provide a useful summary. That's why I'm posting them here, as a summary of the language differences.

The first plot shows folios labelled as Currier A (left) and Currier B (right). Within each side, the X-axis lists folios in ascending numeric order. Each colour represents a different topic cluster. In Currier A we mainly see two dominant styles (brown and green), with three folios that stand out as unusual for A (if I am not mistaken: f56, f58, and f65r), where orange, red, and violet topics appear instead.

[attachment=13087]
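The method behind these plots isn't described, but the bookkeeping for a per-folio stacked bar chart like this can be sketched. The sketch assumes each folio has been split into text units (hypothetically lines or paragraphs), each assigned one topic label, and that folio names look like `f58r` or `f85v1`. All names here are hypothetical:

```python
import re
from collections import Counter, defaultdict

def folio_key(folio: str):
    """Sort key for names like 'f58r' or 'f85v1': numeric part first, then suffix."""
    m = re.match(r"f(\d+)(.*)", folio)
    return (int(m.group(1)), m.group(2)) if m else (float("inf"), folio)

def topic_shares(assignments):
    """Per-folio topic proportions, grouped by Currier language.

    assignments: iterable of (folio, language, topic), one per text unit.
    Returns {language: [(folio, {topic: share}), ...]} with folios in
    ascending numeric order within each language.
    """
    counts = defaultdict(Counter)            # (language, folio) -> topic counts
    for folio, language, topic in assignments:
        counts[(language, folio)][topic] += 1
    result = defaultdict(list)
    for (language, folio), c in sorted(counts.items(),
                                       key=lambda kv: (kv[0][0], folio_key(kv[0][1]))):
        total = sum(c.values())
        result[language].append((folio, {t: n / total for t, n in c.items()}))
    return dict(result)
```

Each list of shares can then be drawn as stacked bars, one bar per folio.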

When we divide the same data by writing hand, we see that Hand 1 appears to have a very distinct stylistic profile, and Hand 4 also stands apart, while Hands 2, 3, and 5 show stylistic distributions that are more similar to each other.

[attachment=13086]
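The post doesn't say how the per-hand comparison was quantified. One simple way to put a number on "Hand 1 stands apart" is sketched here, using a technique that may differ from the one used for the plot: pool topic labels by hand, normalise them to a distribution, and compare hands pairwise with the total variation distance (0 means identical profiles, 1 means no topics in common). The input format of `(hand, topic)` pairs is an assumption:

```python
from collections import Counter, defaultdict
from itertools import combinations

def hand_profiles(assignments):
    """Pool topic labels by hand. assignments: iterable of (hand, topic)."""
    counts = defaultdict(Counter)
    for hand, topic in assignments:
        counts[hand][topic] += 1
    profiles = {}
    for hand, c in counts.items():
        total = sum(c.values())
        profiles[hand] = {topic: n / total for topic, n in c.items()}
    return profiles

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)

def pairwise_distances(profiles):
    """Distance for every unordered pair of hands."""
    return {(a, b): total_variation(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}
```

A hand that is far from all the others in this table would correspond to a visibly distinct colour profile in the plot.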

I believe these results are consistent with the patterns discussed in this thread, and they may offer an additional point of view about them.
(21-12-2025, 11:53 PM) quimqu Wrote: In Currier A we mainly see two dominant styles (brown and green), with three folios that stand out as unusual for A (if I am not mistaken: f56, f58, and f65r), where orange, red, and violet topics appear instead.

f58 & f65 are the halves of a Scribe 3 bifolio that Currier (D'Imperio?) initially classified as A language back in the day. My best guess is that it has anomalously low frequencies of the main bigrams used to classify A vs. B in the early days, while otherwise looking like the B pages. If not, you have a unique example of someone other than Scribe 1 writing in Herbal A. I've been meaning to dig up D'Imperio's early cluster analysis paper to see whether she flags those pages as sticking out.
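To illustrate how a marker-bigram classifier could mislabel a page like this, here is a sketch of a score based on a few diagnostic bigrams. The marker sets are parameters supplied by the caller; this does not reproduce Currier's actual criteria, and the words are assumed to be in some transliteration such as EVA. A page that looks like B overall but happens to have few of the chosen B markers would score near zero or negative, and would be filed under A:

```python
from collections import Counter

def bigram_counts(words):
    """Count character bigrams within each word (no cross-word bigrams)."""
    c = Counter()
    for w in words:
        for i in range(len(w) - 1):
            c[w[i:i + 2]] += 1
    return c

def ab_score(words, a_markers, b_markers):
    """Rate of B-marker bigrams minus rate of A-marker bigrams, per 1000 bigrams.

    Positive leans B, negative leans A. The marker sets are hypothetical
    inputs chosen by the caller.
    """
    c = bigram_counts(words)
    total = sum(c.values())
    if total == 0:
        return 0.0
    a_rate = sum(c[b] for b in a_markers) / total
    b_rate = sum(c[b] for b in b_markers) / total
    return 1000 * (b_rate - a_rate)
```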
(20-12-2025, 08:35 AM) MarcoP Wrote: I would be curious to see a version of the plot colored by Scribe/Hand.

As always, it’s possible I made errors. I used the [link] shared by Karl here to re-create the plot in [link], colored by hand and with labels added.

It seems the two clusters match the hands very well.
Scribe 4 has pages in both clusters. The two pages that end up in the B cluster on the right are, I think, the Rosettes page (f85v1; I could never quite work out the page numbering of the large foldout) and the last zodiac page, Sagittarius (f73v).
Scribe 3 has a single “A” page: f58r. The position of f58v, just across the gap, is also interesting. See also Karl's comment #37 just above.
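Checking which pages of each hand fall on the "wrong" side of the gap can be automated. This is a sketch assuming the two clusters are separated along PC1 alone. It uses a simple largest-gap split, which is not necessarily how the clusters in the plot were defined, and the `(page, hand, pc1)` input format is hypothetical:

```python
from collections import Counter

def split_at_largest_gap(coords):
    """Threshold halfway across the widest gap between consecutive sorted values."""
    xs = sorted(coords)
    if len(xs) < 2:
        raise ValueError("need at least two points")
    _, i = max((xs[i + 1] - xs[i], i) for i in range(len(xs) - 1))
    return (xs[i] + xs[i + 1]) / 2

def crosstab_hands(pages):
    """pages: list of (page, hand, pc1).

    Returns (threshold, Counter of (hand, side), pages lying on the
    minority side for their hand).
    """
    threshold = split_at_largest_gap([pc1 for _, _, pc1 in pages])
    side = {page: ("right" if pc1 > threshold else "left") for page, _, pc1 in pages}
    table = Counter((hand, side[page]) for page, hand, _ in pages)
    outliers = []
    for page, hand, _ in pages:
        other = "left" if side[page] == "right" else "right"
        if table[(hand, side[page])] < table[(hand, other)]:
            outliers.append((page, hand, side[page]))
    return threshold, table, outliers
```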

[link], the zodiac pages show a drift from “close to A” to “close to B” (all within the range of Rene’s C/Cosmo intermediate language). [link] shows that the drift takes place in both the circle text and the label text.


[attachment=13092]
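One way to quantify a drift from "close to A" to "close to B" is sketched here, assuming each page is represented by a feature vector (e.g. digraph frequencies). Project each page onto the line joining the A and B centroids: 0 means at the A centroid, 1 at the B centroid, and values outside [0, 1] overshoot. The function name is hypothetical, and this is not necessarily how the linked analysis measured the drift:

```python
import numpy as np

def a_to_b_position(page_vecs, a_vecs, b_vecs):
    """Position of each page along the A-centroid -> B-centroid axis."""
    ca = np.mean(a_vecs, axis=0)             # centroid of the A pages
    cb = np.mean(b_vecs, axis=0)             # centroid of the B pages
    d = cb - ca
    return (np.asarray(page_vecs) - ca) @ d / (d @ d)
```

Applied to the zodiac pages in manuscript order, a drift would show up as a roughly increasing sequence of positions.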
(20-12-2025, 04:10 PM) oshfdk Wrote: then I see no way to meaningfully interpret the results of PCA at all.

First, there is nothing magic about PCA. The goal is to find a projection of the N-dimensional raw data space (the frequencies of N chosen digraphs or words on each page) onto a 2- or 3-dimensional space, so that we can visualize the clustering of pages by section, language, or whatever. PCA gives the 2 or 3 axes in N-space along which the cloud of points has maximum spread, but those are not necessarily the best axes for the purpose.
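The PCA step described above can be sketched in a few lines, assuming a (pages × N) matrix of feature frequencies. This uses the standard SVD route; the function name is hypothetical. Centre each feature, take the top right-singular vectors as the axes of maximum spread, and project the pages onto them:

```python
import numpy as np

def pca_project(freqs: np.ndarray, k: int = 2):
    """Project pages onto the top-k principal axes.

    freqs: (n_pages, N) matrix, e.g. relative frequencies of N chosen digraphs.
    Returns (scores, axes, explained_variance_ratio).
    """
    X = freqs - freqs.mean(axis=0)           # centre each feature
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    axes = Vt[:k]                            # directions of maximum spread in N-space
    scores = X @ axes.T                      # page coordinates in the k-dim view
    var = S ** 2
    return scores, axes, var[:k] / var.sum()
```

Nothing in this computation knows about sections or languages: the axes depend only on overall variance, which is why they need not be the best axes for separating particular groups.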

Back in the mailing-list days I did such an [link] using the frequencies of 50 chosen words, or the frequencies of "elements" like a, y, iin, Ch, Che, etc. I picked the projection axes specifically to maximize the distance between the clouds of certain sections. The details are here: [link]. The results are broadly consistent with the posts above, namely:
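Choosing projection axes to separate chosen sections is a supervised alternative to PCA. A standard technique for two groups is Fisher's linear discriminant; it is shown here as an illustration, and is not necessarily the procedure used in the old analysis. It finds the direction that maximises the distance between the projected group means relative to the spread within each group. The sketch assumes each group is a (pages × features) array; the small `ridge` term is an added assumption to keep the matrix invertible when there are more features than pages:

```python
import numpy as np

def fisher_axis(group1: np.ndarray, group2: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Two-group Fisher discriminant direction (unit vector)."""
    m1, m2 = group1.mean(axis=0), group2.mean(axis=0)
    c1, c2 = group1 - m1, group2 - m2
    # Pooled within-group scatter matrix (N x N for N features).
    sw = c1.T @ c1 + c2.T @ c2
    # Ridge term keeps sw invertible when features outnumber pages.
    sw = sw + ridge * np.eye(sw.shape[0])
    w = np.linalg.solve(sw, m1 - m2)
    return w / np.linalg.norm(w)
```

For more than two sections at once, multi-class LDA (for example scikit-learn's `LinearDiscriminantAnalysis`) generalises the same idea.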

* Each section (counting Herbal-A and Herbal-B as separate sections) produces a relatively compact cluster that is visibly distinct from other clusters. Thus the A/B split is merely the most dramatic difference, but similar distinctions exist between sections within each language.

* It is not surprising that word frequencies differ between sections. This holds even for the most common words, which may or may not be "function" words like "much", "is", "find", "good", not to mention "content" words like "herb", "star", "blood", etc.

* If word frequencies change, digraph frequencies will change too, since they are determined by the digraphs that appear in the most common words. As I mentioned before, "rb" is probably much more frequent in an herbal text (Latin or English) than in a text about astrology (unless the latter talks a lot about "orbits"...).

* What is surprising is that (IIUC) Herbal-A differs from Herbal-B noticeably more than either differs from Bio or Stars. Thus, besides the difference of topic, we indeed have a difference of language or spelling (or encryption).  Maybe the texts of Herbal-A and Herbal-B were taken from sources in two different dialects, or two very similar languages.  Like Northwest Lower West Bavarian and Southwest Lower West Bavarian...

* The chances of getting useful insights from these analyses will improve if one uses less data, so as to reduce the number of factors that affect them: for example, comparing only Herbal-A and Herbal-B, thus hopefully eliminating the "topic" factor. Then maybe one can figure out whether the difference is a change of spelling or something else. Once one gets some insight into Herbal-A vs. Herbal-B, one can then consider what is happening in other sections.

* And the chances of obtaining useful insights will improve a lot if one uses the good old "scientific method": make a hypothesis, then devise the simplest and most effective way to test it, and do that.
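As an illustration of the "hypothesis, then simplest test" approach applied to Herbal-A vs. Herbal-B only, here is a sketch of a two-sided permutation test. The hypothesis would be something like "digraph X has a different per-page rate in Herbal-A than in Herbal-B", and the inputs are assumed to be per-page rates. If many digraphs are tested this way, the p-values need a multiple-testing correction:

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean per-page rate.

    group_a, group_b: per-page values (e.g. relative frequency of one
    digraph on each Herbal-A and Herbal-B page).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # relabel pages at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            hits += 1
    # +1 in numerator and denominator so that p is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value says only that the two page groups differ more than random relabelling would produce; it does not by itself say whether the cause is spelling, topic, or something else.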

All the best, --stolfi
(22-12-2025, 07:49 PM) Jorge_Stolfi Wrote: * And the chances of obtaining useful insights will improve a lot if one uses the good old "scientific method": make a hypothesis, then devise the simplest and most effective way to test it, and do that.

I think this is the missing piece for me in all this discussion about PCA, Currier languages, etc. Without a hypothesis and clearly stated assumptions, PCA cannot be an argument for or against anything. PCA results are just a bunch of numbers with zero explanatory power by themselves. What is more, I think a popular approach to the manuscript, where someone just runs a bunch of statistical tests and then tries to find an explanation for the results, is akin to divination. Graphs and charts replace the crystal ball and tarot cards, but it is essentially the same attempt to find pieces of a coherent story in random data and then build a plausible-looking narrative on them.