@dgs346 Thank you for taking the time to go into this more. I think I had your process slightly backwards in my head so your diagram and explanation really helped clear it up for me.
While the character mapping is interesting, I am most intrigued by how you are using your transliterations. I especially like the idea of finding ways to test their fitness. Your way of making subtle but significant changes, then tracking each change and testing against it, seems at its core a pretty damn solid way of doing that. I hope that in addition to the character mapping results you continue to share your thoughts on this part of the process as well.
(04-08-2024, 03:12 PM)A.Wilmarth Wrote: I hope that in addition to the character mapping results you continue to share your thoughts on this part of the process as well.
Yes, I will continue to share my results on GoodReads/Amazon, with links here and on voynich.net.
Another member of this forum has expressed interest in helping me apply programming skills and computing power to this process. This would accelerate the number of permutations of (transliteration) x (target language) that I am able to test.
In the meantime, I am selectively exploring additional transliterations for manual testing. For example, my v221 through v226 transliterations assume that the glyphs {o}, or {9}, or both, are position-dependent. The assumption is, for example, that an initial {o} does not have the same function (or meaning) as an {o} in the final, medial or isolated positions. That is, {o} represents at least two distinct glyphs that are visually indistinguishable. To make them statistically distinguishable, I replace initial {o} with {ó}, with all other {o} remaining as {o}.
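The position-dependent substitution described above can be sketched in a few lines of Python. This is only an illustration, not the actual v221 through v226 procedure: it assumes the transliteration is plain text with whitespace-separated "words" and one character per glyph, and the function name `mark_initial_glyph` is hypothetical.

```python
def mark_initial_glyph(text: str, glyph: str = "o", marked: str = "ó") -> str:
    """Replace `glyph` with `marked` only where it begins a multi-glyph word.

    Assumptions (not from the original post): words are separated by
    whitespace and each glyph is a single character. An isolated glyph
    (a one-character word) is left unchanged, since the isolated position
    is treated like the medial and final positions.
    """
    out_lines = []
    for line in text.splitlines():
        words = []
        for word in line.split():
            if len(word) > 1 and word.startswith(glyph):
                word = marked + word[len(glyph):]
            words.append(word)
        out_lines.append(" ".join(words))
    return "\n".join(out_lines)


if __name__ == "__main__":
    # Only the word-initial o in "oe" and "oho" is marked.
    print(mark_initial_glyph("oe 8am o 1oe oho"))
```

The same function could be reused for {9} by passing a different `glyph` and `marked` pair, so that each position-dependent assumption yields its own transliteration variant.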
Likewise, I am selectively exploring additional medieval languages for which I can find sufficiently large corpora. For example, a colleague shared with me Uusi Testamentti (New Testament), the first written document in Finnish, dating from 1548. I am not sure that it retains the original spelling, in which for example the title was Se Wsi Testamentti.
Here's an updated summary of our trial mappings of the Voynich "words" {8am}, {1oe} and {2c9}.
[attachment=9022]
Trial mappings of {8am}, {1oe} and {2c9} to text strings in medieval Arabic, English and Italian. Author's analysis.
Further thoughts on the {8am} strategy for mapping the Voynich manuscript: now including mappings of selected "words" to medieval Galician as represented by Crónica Troiana.
To count the resulting Galician words, I used Corpus Xelmirez, a corpus of medieval Galician developed by Dr Xavier Varela Barreiro of the Instituto da Lingua Galega in Santiago de Compostela. The website of Corpus Xelmirez does not indicate how many words are in the corpus; it's probably several million. I recently visited the Instituto da Lingua Galega and will resume correspondence with Dr Varela Barreiro.
[attachment=9250]
Selected mappings of the "words" {8am}, {1oe} and {2c9} to words in selected medieval languages. Author's analysis.
There is a prior manuscript edition of Crónica Troiana, dated 1393, written by Fernán Martis as a translation from the French Roman de Troie by Benoît de Sainte-Maure. I have a scanned pdf of the Martis manuscript; it has OCR text but the OCR is not usable. From a comparison of the first pages of the 1393 and 1490 editions, it's clear that the printed edition expanded the earlier abbreviations and concatenations, for example:
- "oconto" in 1393 became "o conto" ("the story") in 1490
- "q̃ndolles" in 1393 became "quando lles" ("when to them") in 1490.
[attachment=9251]
Crónica Troiana: the first three lines of the text (folio 9r), in the 1393 manuscript and the 1490 printed edition.
When time permits, I plan to reconstruct a digital text of the Martis manuscript by identifying the most common abbreviations and concatenations, and reverse-engineering them from the 1490 text.
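As a rough sketch of how that reverse-engineering might look in Python: the rule tables below are hypothetical and seeded only with the two examples quoted above ("o conto" to "oconto", "quando lles" to "q̃ndolles"). A real rule set would have to be compiled from the manuscript itself, and treating every "o" or "lles" as joined is a deliberate oversimplification.

```python
# Hypothetical rules, taken only from the two examples in this post.
ABBREVIATIONS = {"quando": "q\u0303ndo"}  # expanded form -> abbreviated form
JOIN_TO_NEXT = {"o"}         # words written together with the following word
JOIN_TO_PREVIOUS = {"lles"}  # words written together with the preceding word


def to_manuscript_form(text: str) -> str:
    """Apply the abbreviation and concatenation rules to printed-edition text."""
    out = []
    glue_next = False
    for word in text.split():
        form = ABBREVIATIONS.get(word, word)
        if out and (glue_next or word in JOIN_TO_PREVIOUS):
            out[-1] += form  # concatenate with the previous token
        else:
            out.append(form)
        glue_next = word in JOIN_TO_NEXT
    return " ".join(out)


if __name__ == "__main__":
    print(to_manuscript_form("o conto"))
    print(to_manuscript_form("quando lles"))
```

Keeping the rules in plain dictionaries and sets means each candidate abbreviation can be added or removed and the output compared against the scanned manuscript line by line.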
Given that several samples of the Voynich vellum were carbon-dated to the first half of the fifteenth century, it seems to me that at least two hypotheses are permissible with regard to the nature of the source documents:
- that the text was written in the second half of the fifteenth century, in which case printed documents would have been available in many cities in Europe (and I conjecture that the Voynich producer might have preferred to use printed documents, for their low cost, uniformity and ease of distribution)
- that the text was written in the first half of the fifteenth century, in which case the Voynich producer would have had only manuscripts as source documents.
Hence, in the context of my evaluation of Galician as a possible precursor language: here's a further examination of the differences between the 1393 manuscript and the 1490 printed edition of Crónica Troiana.
[attachment=9255]
The first six lines of the Galician epic story Crónica Troiana, as they appeared in the 1393 manuscript by Fernán Martis and in the first printed edition of 1490. Image credits: Fernán Martis, additional graphics by author.
Here's a revised set of provisional mappings of the common Voynich "words" {8am}, {1oe} and {2c9} to selected medieval languages. In the case of medieval English, this revision is based on the full text of the Auchinleck manuscript (produced in the 1330s), which has 348,998 words.
[attachment=9274]
Selected mappings of the Voynich "words" {8am}, {1oe} and {2c9} to words in medieval Arabic, English, Galician and Italian. Author's analysis.
Some further thoughts on doubled letters in medieval European languages, and why doubled glyphs are rare in the Voynich manuscript.
[attachment=9303]
The frequencies of the most common doubled letters in selected medieval European languages. Frequencies are calculated as counts of bigrams, divided by total counts of letters in the respective corpora. Author’s analysis.
Correction to table in previous post:
[attachment=9306]
The frequencies of the most common doubled letters in selected medieval European languages. Frequencies are calculated as counts of bigrams, divided by total counts of letters in the respective corpora. Author’s analysis.
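For anyone wanting to reproduce this kind of figure on their own corpus, here is a minimal Python sketch of the calculation as described in the caption (count of each doubled-letter bigram divided by the total count of letters). The function name is hypothetical, and two details are my assumptions rather than anything stated in the table: text is lowercased first, and overlapping pairs are counted, so "lll" contributes two "ll".

```python
from collections import Counter


def doubled_letter_frequencies(text: str) -> dict[str, float]:
    """Return {doubled bigram: count / total letters} for a corpus string."""
    text = text.lower()
    total_letters = sum(1 for ch in text if ch.isalpha())
    if total_letters == 0:
        return {}
    # Adjacent identical letters; a space between letters breaks adjacency,
    # so pairs are only counted within words.
    doubles = Counter(
        a + b for a, b in zip(text, text[1:]) if a == b and a.isalpha()
    )
    return {pair: n / total_letters for pair, n in doubles.items()}


if __name__ == "__main__":
    freqs = doubled_letter_frequencies("all the bees fell")
    for pair, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(pair, round(freq, 4))
```

Running the same function over a Voynich transliteration and over each language corpus gives directly comparable numbers, as long as each glyph is represented by a single character.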