The Voynich Ninja

I've thought about one of these problems (that of the line patterns) and think it could be explained by the transition to literacy. If the spoken language showed sound changes conditioned by neighbouring words, these may have been recorded in writing by altering the spelling of those words. But the writer might have treated the line break as a "true" break, so the sound changes weren't written there. It's a bit complicated to explain, and only a suggestion. I wrote about it a bit on my website a few years ago.

(12-05-2020, 01:13 PM)RenegadeHealer Wrote:
(12-05-2020, 11:49 AM)elieD Wrote: I like the idea that the manuscript is two people transcribing what another was saying, without them necessarily knowing the language of this third person

I like the idea, but I'm not sure I like its implications. Transcription of a language one doesn't understand is almost always fairly lossy. I once briefly worked for a hospital that hired a second-rate medical transcription service. This service was based in a foreign country, where English is spoken widely but not natively. The company clearly did not vet its scribes for fluent English comprehension. Some of the mistakes I would find in the transcriptions of my notes were subtle and understandable, but made a crucial difference in the meaning. The issue was usually the scribe being unable to reliably distinguish between very similar sounds, that his/her native language didn't distinguish. "He had a pain" became "He had a pen", in one memorable example. Usually these problems could be sorted out and corrected by a native speaker who knew the context well. But when we're dealing with the VMs, we have very little context, and no indication of whether there are any native speakers of the original language behind it. If Voynichese really is a case of mis-transcription of a poorly understood language, it's not at all clear to me that the original text is recoverable.

" it's not at all clear to me that the original text is recoverable." => And this could exactly be why hundreds of years later there's still people talking about it
It's not at all satisfying to think that it could be impossible... but it could be.
Though I think that with large advances in Artificial Intelligence we will be able to come up with some translations.
At this point it's very useful to ask ourselves not just what the VMS means, but also why no one has succeeded in translating it.

I would like to know what the practice was in the Middle Ages when people met someone speaking an unknown language. Let's say a guy speaking Old Turkish ends up in front of two French monks in France. Could they have thought: "This guy is speaking an unknown language, so we will write down what we hear and invent a new alphabet"?

TLDR: It seems that dialect differences are not entirely explained by different scribes: different sections (as defined by illustrations and layout) seem to also play an important role.

While we wait for the announced paper by Prof. Claire Bowern, I created a few plots along the lines of what Rene did at the end of [link], as a quick way to look into the relationship between dialects and scribes. As always, I may have made errors: be careful.

I used the Zandbergen-Landini EVA transcription (ZL_ivtff_1r.txt) and converted it into CUVA. I ignored dubious spaces and all non-paragraph text.
Each dot in the graphs represents a whole folio (recto and verso together). Each graph is shown twice: on the left, labels represent scribes (according to Lisa's analysis); on the right, labels correspond to folio numbers.

Dot shapes and colours are exclusively based on Table I in [link] (not on the classical Currier A/B classification, nor on image-subjects from the ZL file). I tried to use colours similar to those used by Rene in the page I linked above.
Square dots correspond to Hand 1.
Red is for Botanical pages by Scribe 1 (which should likely be the same as the Currier A Herbal pages); yellow is for Pharma/Recipes.
Green diamonds: Astro / Cosmo / Zodiac / Rosettes pages (several pages were excluded because they don't contain paragraph text).
Light-blue circles: Botanical pages by scribes other than Scribe 1. The fact that [link] is assigned to Scribe 1 has no effect here, because the verso contains no paragraph text, so only [link] was processed.
Dark-blue circles: Stars / Q20.
Purple circles: Balneological / Biological / Q13.
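As a rough illustration of the kind of measure plotted in these graphs, here is a minimal Python sketch of a per-folio n-gram percentage. It is not the script used for the plots. It assumes the text has already been reduced to a plain list of EVA words (parsing the IVTFF file and converting to CUVA are left out), and it assumes the percentage is taken over all same-length n-grams inside words, which is only one plausible choice of denominator. The function name and the sample words are hypothetical.

```python
def ngram_percentage(words, pattern):
    """Percentage of n-grams (n = len(pattern)) in `words` equal to `pattern`.

    N-grams are counted inside words only, never across word boundaries.
    ASSUMPTION: the denominator is the number of same-length n-grams;
    the posts in this thread do not state which denominator was used.
    """
    n = len(pattern)
    total = 0
    hits = 0
    for word in words:
        # Words shorter than the pattern contribute no n-grams.
        for i in range(len(word) - n + 1):
            total += 1
            if word[i:i + n] == pattern:
                hits += 1
    return 100.0 * hits / total if total else 0.0


# Hypothetical input: a handful of EVA words, not the text of any real folio.
sample_words = ["chol", "daiin", "chedy", "qokedy", "ol"]

for pattern in ("ed", "ol", "dy", "qo"):
    print(pattern, round(ngram_percentage(sample_words, pattern), 2))
```

Running the same function once per folio and per pattern would give one (X, Y) pair per dot.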

This first graph shows the % of CUVA: OL / EVA: ol on the X axis and CUVA: ED / EVA: ed on the Y axis.
[attachment=4377]
The Y axis nicely separates Scribe 1 from the rest, and Currier A from B (think of a threshold at ED%=1).
The exceptions are:

The Astro/Cosmo pages by Scribe 4, which appear to be Currier A.
More interestingly: f58, by Scribe 3. This is a page we recently discussed in [link] because its words appear to be more varied than any other page. Here it is marked as a Botanical page, but actually the only illustrations are marginal stars like those in Q20 (also by Scribe 3). In the ZL file, both [link] and [link] are classified as Language A. But [link] found that they should be classified as B. Can we say that Scribe 3 wrote a Currier A page?
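The "threshold at ED%=1" idea for the first graph can be written down as a one-line rule. This is only a sketch: the function name, the treatment of values exactly on the threshold, and the example values are assumptions for illustration, not measurements from the manuscript.

```python
ED_THRESHOLD = 1.0  # percent; the "ED%=1" line suggested for the first graph


def currier_guess(ed_percent, threshold=ED_THRESHOLD):
    """Guess the Currier language of a folio from its ED% value alone.

    ASSUMPTION: values at or above the threshold count as B.
    """
    return "B" if ed_percent >= threshold else "A"


# Hypothetical ED% values, for illustration only.
example_values = {"low_ed_folio": 0.2, "high_ed_folio": 3.5}

for name, ed in example_values.items():
    print(name, currier_guess(ed))
```

A folio such as f58 would be an "exception" in this sense when the guess from ED% disagrees with what its scribe or section would lead one to expect.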

I think that the separation between pages by Scribe 2 in the Botanical and Bio sections is significant: the variation in OL% apparently depends on the section (i.e. illustration subjects and presumably text content, if the text is meaningful) rather than the scribe.

About the Currier B cloud at the top of the graph: the purple Bio pages (Scribe 2) are separated from the rest (mostly because of higher X, OL%, values). HerbalB (by Scribes 2, 3, and 5) and Stars (by Scribe 3) mostly overlap.

Second graph. X: CUVA: SO / EVA: cho. Y: CUVA: DY / EVA: dy.
[attachment=4376]
This confirms most of what can be seen in the first graph, but less clearly. It is still possible to separate Currier A and B. f58 and Scribe4's pages still fall within the "A" cloud. But here the separation of the Bio/Balneo pages from the rest of Currier B is lost: all B pages are mixed together, with low SO% and variable values for DY%.

Third graph. X: CUVA: OR / EVA: or. Y: CUVA: HO / EVA: qo.
[attachment=4375]
Here the separation between A and B is lost and the pages form a single V-shaped cloud, with the two arms corresponding to A and B and many pages from several sections clustering at the bottom-left. f58 is at the centre of the graph: if this does not confirm that it should be regarded as A, it does not suggest that it is B either.
Possibly the most interesting thing here is that (as in the first graph) the pages by Scribe 2 are split into two groups (Botanical and Balneo/Bio) separated by the Stars pages (Scribe 3).
Balneo/Bio have frequent occurrences of 'qo' and 'ol', while Botanical pages by the same Scribe have low values for 'qo' and mid-range values for 'ol'. Apparently, we cannot just say Scribe 2 has a preference for writing 'qo' and 'ol'.

This is very interesting Marco, and extremely informative. A lot of information to digest.

First of all, I regret that I have not yet updated the IVTFF transliteration files with the information on Lisa's hands. I have everything prepared, but have just not done it yet.
This would have saved a little bit of work for this exercise.

I appreciate that you have largely used the same colours as I did, because that makes it easy (at least for me) to interpret the figures. I too am intrigued by folio 58 (r and v). Following the criterion of the presence of the bigram ed (ED in CUVA), this folio is clearly Currier language A.
From the various plots, its 'language' is closest to that of the Pharma pages.
A good test for this would be to check the occurrence of eo on one of the two scales.

On the biological (Q13) and stars (Q20) pages it is interesting to see how the two folios of the same bifolio are related to each other. For example, in your figure 1, the two 'detached' pink circles 79 and 80 are together on one bifolio.

The biggest 'problem' in this type of analysis is that there are too many different graphs that one could make.

Herbal B is illustrative of the whole issue: the subject should be the same as Herbal A but the writer is the same as Q13. The Herbal B specifically differs from Herbal A in percentage of [ol], which cannot be a subject difference, yet Q13 tracks Herbal A in the percentage of [ol] so it cannot be a scribal difference.

It's almost as though Scribe 1 wrote Herbal A while Scribe 2 wrote Q13, both using the same spelling/dialect/system, and later Scribe 2 wrote Herbal B using a different system. Yet it can't be that simple as Scribe 1 must have changed their system to write Pharma as that showed differences from Herbal A (which aren't demonstrated here).

It's almost as if a scribe, when they sat down to write a section, just made changes to the system. Neither subject nor dialect accounts for the difference.

Following Rene's suggestion, I added CUVA:EO EVA:eo (but not including EVA:eeo which is represented as CUVA:UO). Here it is displayed together with CUVA:SO / EVA:cho.

[attachment=4380]

The new measure separates the Pharma/Recipes section from (most of) Herbal A, showing the variability in Scribe 1's output mentioned by Emma and not detectable from the previous graphs. The bizarre f58 is here shown not to be similar to Pharma/Recipes after all: it appears closer to the main group of Currier B. But from the first graph in the previous post, as also observed by Rene, we know that f58 greatly differs from Currier B on the basis of EVA:ed.

I have also tried using Principal Component Analysis to map the seven dimensions (CUVA:ED OL DY SO HO OR EO) to only two dimensions that can be easily plotted. All the measures are homogeneous (bigram percentages), so I did not apply any normalization. Since this is the first time I have experimented with PCA, I am particularly doubtful of the correctness or meaningfulness of what I did. At least the graph seems comparable with the information provided by all the others.

[attachment=4381]

Principal Component 1 (Y axis) nicely separates Currier A from B. If what I have done is correct, this graph does not seem to show a true continuity in the dialects: there are noticeable gaps.
PC2 shows different things for the two areas:
Currier A at the bottom is split into three separate clusters: most Herbal A pages on the left; f58, f51 (the red square whose label is hidden) and Astro/Cosmo pages at the centre; Pharma/Recipes and some of the HerbalA pages that are bound near the Pharma section on the right. Interestingly, the small group at the centre was written by three different scribes (1, 3 and 4).
Currier B (top half of the diagram) also shows a good deal of variability along PC2, but here it seems impossible to discern groups: HerbalB, Q20 and Bio/Balneo all overlap. Also in this case, the variability does not seem to clearly correspond to scribes or sections.
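For readers who want to try the same kind of projection, here is a minimal PCA sketch in Python with NumPy. It is not the code used for the plot above: it only assumes a matrix with one row per folio and seven columns of percentages, centres the columns (which PCA requires) and applies no further scaling, in line with the "no normalization" choice. The function name and the random input matrix are hypothetical stand-ins for real data.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto the first two principal components.

    Columns are mean-centred but not rescaled, so features with larger
    variance weigh more in the result.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:2].T
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:2]


# Hypothetical data: 10 "folios" x 7 "bigram percentages" (random numbers).
rng = np.random.default_rng(0)
X = rng.random((10, 7)) * 5.0

scores, explained = pca_2d(X)
print(scores.shape)   # one (PC1, PC2) pair per row of X
print(explained)      # share of total variance captured by PC1 and PC2
```

Printing the explained-variance shares is a quick check on how much of the seven-dimensional spread a two-dimensional plot can actually show.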

Has it been remarked before that the Herbal A near the Pharma in the manuscript are similar to Pharma in textual characteristics? Their being bound apart from the other Herbal A folios is maybe not a mistake? But then f93 seems to be quite normal.

Were the final Herbal A folios written at a different time from the main body, and at the same time as the Pharma? Did they figure out something was missing by the time they got to that section?

This is really excellent stuff!

From the close correspondence with my earlier results, I doubt that there is any major error in what you have done.

The added dimension of Lisa's hands brings significant additional information.

One problem in the interpretation of these figures is that pages with relatively little text will have larger error bars, and we can't see that. The precise location of these pages is not as accurate as that of pages with a lot of text.
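To give a feeling for the size of these error bars, here is a small sketch using the binomial standard error of a proportion, sqrt(p(1-p)/n). This is a simplification: it treats every n-gram as an independent draw, which is not really true for overlapping bigrams, so the numbers are only indicative. The function name and the counts are hypothetical.

```python
import math


def proportion_std_error(hits, total):
    """Binomial standard error of hits/total, in percentage points.

    ASSUMPTION: n-gram occurrences are treated as independent trials.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    return 100.0 * math.sqrt(p * (1.0 - p) / total)


# Same observed rate (2%), hypothetical counts for a short and a long page.
short_page = proportion_std_error(4, 200)
long_page = proportion_std_error(40, 2000)

print(round(short_page, 2), round(long_page, 2))
```

With ten times more text the error bar shrinks by a factor of sqrt(10), so a sparsely written herbal page can move around a plot much more than a dense Q13 or Q20 page without meaning anything.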

With respect to the scribal hands, in my opinion (and based on the PCA plots) scribe 5 has his own little area on the right-hand edge of the language-B pages, while scribes 3 and 4 can hardly be distinguished from each other, but are concentrated below the scribe-2 language-B pages.
I consider this a weak confirmation both of Lisa's hand identification and the suggestion that different scribes generate different text properties. However, the second point is a correlation-type correspondence which may not be causal, but can be the consequence of another external cause/reason.

What this also shows (again) is that the herbal-A pages near the end of the MS, and near the pharma pages, are somewhat different from the other herbal pages and closer to the pharma pages in terms of text properties. That shows that these pages have not become dislocated by pure chance. At least some of them...

Thank you, Emma and Rene!
Your questions and observations make me even more eager to read Bowern's paper: I am sure it will add more ideas for further research.

The difference between [link] and [link] (same bifolio) pointed out by Emma looks like another interesting anomaly. Herbal pages do not have that much text and, as Rene says, the results are not reliable. But voynichese.com shows that f.93v and f.96v look dramatically different, for instance with respect to 'or' and cho/sho. On the other hand, the two "r" sides seem to be comparable.

It could be that these fluctuations are random noise and the position of f93 is accidental: the fact that f87, 90 and 97 are close to the Pharma group seems less likely to be accidental.
Unreliable data due to short text could be the reason why Botanical pages (both A and B) appear to be more spread out than pages from more text-intensive sections (e.g. Balneo/Q13 and Stars/Q20). One could consider experimenting with bigram counts for whole bifolios, instead of pages. Another possibility is to follow in the steps of Julian Bunn and process character statistics, instead of bigrams. Both options would double the number of samples from each page. I might play around with these options in the future.
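The bifolio idea amounts to adding up the raw counts of the two folios before computing any percentage. A minimal sketch, assuming per-folio counts are already available as `Counter` objects: the pairings f93/f96 and f79/f80 are the ones mentioned in this thread, while the counts themselves and the function name are made up for illustration.

```python
from collections import Counter


def pool_counts(per_folio_counts, bifolios):
    """Sum raw n-gram counts over the folios that make up each bifolio."""
    pooled = {}
    for name, folios in bifolios.items():
        total = Counter()
        for folio in folios:
            # Folios with no recorded counts simply contribute nothing.
            total.update(per_folio_counts.get(folio, Counter()))
        pooled[name] = total
    return pooled


# Hypothetical counts, NOT taken from the transliteration.
per_folio_counts = {
    "f93": Counter({"ol": 5, "or": 9}),
    "f96": Counter({"ol": 7, "or": 2}),
    "f79": Counter({"ol": 40, "ed": 55}),
    "f80": Counter({"ol": 21, "ed": 60}),
}
bifolios = {"f93+f96": ["f93", "f96"], "f79+f80": ["f79", "f80"]}

pooled = pool_counts(per_folio_counts, bifolios)
print(pooled["f93+f96"])
```

Pooling raw counts rather than averaging percentages keeps a nearly empty page from weighing as much as a full one.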

Coming back to the PCA graph, in the Balneo/Q13 section, f80 seems to stand out: this appears to be due to [link] being half as frequent as in the other pages of the section (f79, in the same bifolio, also has a lower rate). In this case, the quantity of text is considerable.

I didn't notice that pages by Scribe 5 concentrate on the right side of the B-cloud in the PCA plot: thanks to Rene for pointing that out! Unfortunately, I cannot understand why it is so: PCA results are not easy to interpret. The individual values for the pages are rather different, for instance f41 appears to be closer to the Balneo section than to the other pages by Scribe 5. Could their similar position on the right border of the cloud suggest that their features share some prominent geometrical property (e.g. constant ratios across some pairs of features)?

I agree that both Scribes and illustrations appear to be correlated with changes in dialects. It will be interesting to see where the new path opened by Lisa leads...

(19-05-2020, 08:48 PM)MarcoP Wrote: Following Rene's suggestion, I added CUVA:EO EVA:eo (but not including EVA:eeo which is represented as CUVA:UO). Here it is displayed together with CUVA:SO / EVA:cho.

The new measure separates the Pharma/Recipes section from (most of) Herbal A, showing the variability in Scribe 1's output mentioned by Emma and not detectable from the previous graphs. The bizarre f58 is here shown not to be similar to Pharma/Recipes after all: it appears closer to the main group of Currier B. But from the first graph in the previous post, as also observed by Rene, we know that f58 greatly differs from Currier B on the basis of EVA:ed.

EVA-[-eo-] and EVA-[cho] are not independent of each other. EVA-[-eo-] occurs in tokens like [cheol] and the sequence EVA-[cho-] in tokens like [chol]. This is also visible in the graph, since the top-EO% folios as well as the top-SO% folios both belong to Currier-A. As a result it looks as if there is a gap between [cho]-folios and [cheo]-folios, whereas this is in fact only the difference between tokens like [chol] and [cheol].

This overlap between [-eo-] and [cho-] also affects the outcome for bifolio 93/96. On folio 93v we can find tokens like [chody], [cthody], [chol], and [cthol] (see network graph for [link]), whereas folio 96r uses tokens like [chol], [cheol], and [sheol] (see network graph for [link]).
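The point can be checked mechanically on the example tokens just listed. The sketch below only counts how many of those token types contain each sequence. It uses the seven example tokens as given, not the full text of the two pages, so it illustrates the argument rather than measuring anything. The function name is hypothetical.

```python
def count_containing(tokens, pattern):
    """Number of tokens that contain `pattern` as a substring."""
    return sum(1 for token in tokens if pattern in token)


# Example token types quoted above for the two pages (types, not frequencies).
tokens_93v = ["chody", "cthody", "chol", "cthol"]
tokens_96r = ["chol", "cheol", "sheol"]

for label, tokens in (("93v", tokens_93v), ("96r", tokens_96r)):
    print(label,
          "cho:", count_containing(tokens, "cho"),
          "eo:", count_containing(tokens, "eo"))
# 93v cho: 2 eo: 0
# 96r cho: 1 eo: 2
```

Inserting an [e] turns [chol] into [cheol], which removes a [cho] hit and adds an [eo] hit at the same time, so the two measures move in opposite directions by construction.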

Emma May Smith

elieD

MarcoP

ReneZ

Emma May Smith

MarcoP

Emma May Smith

ReneZ

MarcoP

Torsten