The Voynich Ninja

Full Version: Lingua Volgare Shorthand
(03-03-2025, 02:02 AM)tavie Wrote: Two Italian solutions added to [link] in 24 hours! But very different from each other, so thank you for that.

Your two main docs are much better papers than what we are usually presented with.  

But weirdly, that makes me feel I expect more from your solution. There seems to be a disconnect between the level of confidence ("the key we provide here finally allows for an identification of the language of MS 408") and the amount actually translated, which seems to be only three sentences if I've read correctly. This is by no means unusual for a Voynich solution, but you do seem aware of things like confirmation bias and insufficient evidence, so it feels jarring.

Three areas that came to my mind as potential issues:

Lack of translated material
You say the main problem is that none of your predecessors have gone beyond individual words/lines and deciphered longer passages of text. It is certainly a key problem, if not the main problem. But I've not been able to tell why your work is different. Three sentences isn't enough to distinguish a solution from all the others.

You say it hasn't been shown before that a language proposition is effectively transferable to all parts of the manuscript. Am I misunderstanding this? It seems impossible to tell this from only three sentences. Or perhaps you have more than three, since you mention others are in the Supplementary Info, but the Excel file seems to contain only isolated words?

More sentences are also needed to grapple with another key problem seen in all solutions: having too many degrees of freedom, which Marco mentioned. If you base the solution on abbreviation, isn't this problem exacerbated? If the text is meaningful, I'm personally very partial to abbreviation being involved (bearing in mind its other problems), but it does give your system a lot of flexibility. Identifying both context and more consistent trends of abbreviation would assist with this problem, but you can't do that with only three translated sentences.

I think this may also be a problem for your explanation of Claire's Criterion 3 (grammatical material). Three sentences, especially ones unrelated to each other since they are drawn from separate quires, don't seem enough to establish proof of grammar. You seem to rely a lot on isolated words, but I don't think they can demonstrate grammar without being read in a sentence. This is an issue we've seen in at least one other solution. For example, I could say that the qokaiin, qokain, qokedy, qokeey, qokeedy, etc. cluster represents different inflections of the Latin word puella. Even if I show a rough correspondence between the glyphs and the puella inflections, it's not proof of syntax until I can show that the genitive plural appears in a sentence where the genitive plural makes sense, and repeat for the other inflections.

Claire Bowern's views (separate paper on her criteria)
I appreciate that there is effort to prove the solution via what I think would be objectively recognized valid criteria developed by Claire Bowern. 

But I'm concerned that - with this as the basis - for Claire's Criteria 1, 2, and 5, you write "Passed. Confirmed by C. Bowern in February 2025"

...and yet for Claire's Criteria 3, 4, 6, and 7 there is nothing of Claire's thoughts.  Maybe she didn't have time to evaluate these, since they are the hardest to meet?  But if she expressed any degree of doubt, given how this paper is set up, you ought to highlight that. 

And I would expect at least some initial doubt from Claire Bowern on Criterion 6 (explain why Voynichese performs so differently from other writing systems on text metrics like entropy) in regard to the idea that the abbreviation/shorthand involved is a key factor behind the entropy values. Some of her comparative work on various texts indicated that abbreviations raised conditional entropy rather than lowering it.

She did caveat that this was based on the commonly known abbreviations, so is it your case that the ones employed here are substantially different from contemporary ones? Again, I'm not at all anti-abbreviation, but quite a few people better informed than me have expressed doubt that it is even largely present, let alone the main explanation for the failure of decipherment so far.

Line Patterns and other Voynichese behaviour
Lastly, my guess is that Claire was also thinking of line patterns when she set out Criterion 6.  Entropy was just an example.  I may have missed it somewhere but I've not seen explanations for any of the below:
  • paragraph/top line behaviour: why EVA p and f appear almost exclusively on the top lines of paragraphs. That seems to be "s" under your system.
  • line-start behaviour: why some glyphs are disproportionately more common or rare at line start. Some, like EVA ch, show consistency in their aversion to line start across the top three scribes; some, like EVA q, show aversion in Scribe 3's Stars and attraction in Scribe 1's Herbal A. There are also the vertical impact patterns I've been working on that require explanation.
  • line end behaviour:  why some glyphs are disproportionately more common/rare at line end, especially the final glyph of a word but also the initial and middle glyphs.  And Patrick Feaster's work indicates it might only peak at line end - there are trends appearing across the line.
  • first-last combinations: the existence of glyphs appearing disproportionately across a word break, e.g. words ending with y tend to be followed - disproportionately, and spectacularly so - by words starting with q, at least in the Balneological and Stars sections. Emma and Marco have done work on these kinds of combinations.
  • the excessive alliteration in Voynichese.  You seem to be saying that "qo" means that/which.  I did a visual (so very rough) count years ago of alliterative words.  In the Balneological section alone, there were over 175 consecutive q-word pairs; 44 consecutive q-word triplets; 11 consecutive q-word quadruplets; and 3 instances of five consecutive q words. This is by no means the only glyph that likes to alliterate:  initial o is just as bad if not worse, as is initial ch, and initial sh is no slouch either.  It's definitely not a matter of a few isolated lists. 
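
A rough count like the one above can be reproduced mechanically once a transcription is tokenized. A minimal sketch in Python (the sample line uses made-up EVA-like tokens, not actual Voynich text):

```python
from itertools import groupby

def alliteration_runs(words, prefix):
    """Tally runs of >= 2 consecutive words starting with `prefix`."""
    runs = {}
    for starts, group in groupby(words, key=lambda w: w.startswith(prefix)):
        n = len(list(group))
        if starts and n >= 2:
            runs[n] = runs.get(n, 0) + 1
    return runs  # {run_length: number_of_runs}

# Made-up EVA-like tokens, for illustration only:
line = "qokain qokedy qokeey daiin qokal shedy qotedy qokaiin qokain ol".split()
print(alliteration_runs(line, "q"))  # prints {3: 2}
```

Run over a full section, the same function gives the pair/triplet/quadruplet tallies, and works unchanged for initial o, ch, or sh.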

That's just some off the top of my head.  Maybe your answer for some/all of these is abbreviation, but there are real challenges to grapple with if that's the case, and these should be set out and refuted.

tl;dr:  please could we have more sentences, and an explanation for line patterns?

Dear Tavie,
thank you for this - we came here exactly for this type of well-founded scepticism. We do not claim to have an answer for every single point (especially not instantaneously and without further diving into the manuscript itself). We also do not think that our solution is perfect as is - in our opinion it is a starting point that needs revisions and improvements. The expectation of being able to simply read the cipher once the encoding is reversed is also not necessarily realistic, as very few people fluently read medieval shorthand manuscripts in the first place. It will take time to substantiate or refute this properly. We generally prefer to express our uncertainty clearly where it is due, but we have also been criticized for that. It seems tricky to strike a good balance here with the different preferences of different scholars. That being said, let me briefly address the points raised:

1. Lack of translated material
We are working on a longer stretch of text. Given how much work is involved in substantiating individual shorthand abbreviations, finding comparisons in other manuscripts, etc., it will take time to provide this. This is one reason we have already shared our basis with a larger community: we also have to address the reproducibility aspect and - for obvious reasons - we cannot do this ourselves. We believe, however, that consistency is a very important aspect of any solution. The vocabulary list features words from nearly every page (without repetition), so what we have is a relatively large, consistent vocabulary that finds a reflection in authors roughly contemporary with the Voynich. This is an argument in itself.

Flexibility: Yes, shorthands increase flexibility. This makes them harder to read correctly, not easier. While this ambiguity is a hindrance to a straightforward reading of the manuscript, it is not necessarily an argument against a solution. It's not like anything can be read into a shorthand. Shorthands are designed to highlight the recognizable parts of words, as briefly explained in the Seven Criteria doc.

2. Criteria by C. Bowern
We learned about her preference for these criteria after finishing the work on the main paper. That's why we addressed the additional criteria in the separate document. The ones marked as "passed" have been assessed by her and considered sufficient. The others need further elaboration. Our solution at least partially addresses the low entropy as a function of oversplitting glyphs. We believe that this in itself is an important finding that should be taken into consideration and factored into future statistical studies. We find this to be a major issue for general assumptions derived from statistical analyses, which is why we have addressed the oversplitting problem in much detail. We also need to be clear that there is a difference between common abbreviations and a shorthand.

3. Line patterns and other Voynich behaviour
We believe these patterns can be addressed as we read more of the manuscript, but this does require extensive work. The patterns are duly noted, though they also need to be adjusted in some cases based on oversplit glyphs. Our working hypothesis is that at least the herbal section provides horticultural instructions in a broad sense, based on the verbs (e.g. piotare, potare, terrare, etc.) that we read. Some of the line patterns are likely due to these step-by-step instructions, where a line can, but does not necessarily have to, be a unit.
Quote: Our solution at least partially addresses the low entropy as a function of oversplitting glyphs. We believe that this in itself is an important finding that should be taken into consideration and factored into future statistical studies.

I am sorry but that is not correct, and it is consequently not an important finding. This has been studied numerically and the results are clear. Any proposed alternative opinion should also be backed up by quantitative information.

The truth is that an alphabet like Eva, which indeed is well known to split up things that are probably units, does indeed further lower conditional entropy, but in all known alphabets this entropy is anomalously low.

Sorry to be brief but I cannot do better on my mobile phone.
(03-03-2025, 12:19 AM)MarcoP Wrote:
(02-03-2025, 10:36 PM)ginocaspari Wrote: D(P/F)IE (P/F)OTMO OL OLC(P/F)UE (P/F)AR S(P/F)ARARE

E CVE (P/F)AR OR O(P/F)E T(P/F)O LDOTE ODOTN (P/F)OTE

TA? D(P/F)E D(P/F)OD(P/F)E (PI/FI)E TD(PI/FI)VE C(P/F)E DOC(P/F)OTE TAL

TAL (P/F)ODUO TM TARE IO (P/F)UE (P/F)ODOTE

This looks more like proto-Romance than any form of Italian.
The three short examples of "grammar" discussed actually do not conform to the grammar of a Romance language: e.g. the definite article "o" is invariant like English "the", rather than inflected by gender and number.

The system also features Gibbs-like flexibility. E.g. the first word of the first sample kShody can be made to match a number of Italian words and expressions. Some examples:

DP/FOTE

de piote "out from its clod of soil"

deposte - demoted
disposte - arranged
disponete - you arrange
dio puote - god can
dio potea - god could
dio pote - god prunes
da ponte - from [the] bridge
da fonte - from [the] source
di porte - of doors
do forte - I give strongly
dà forte - s/he gives strongly
di potere - of power
di fiorite - of flowered
deiforme te - godlike you
deformate - deformed
deportate - deported

Dear Marco,
thank you very much for your input. While a shorthand naturally increases flexibility, it is by far not as arbitrary as you show here. We read the bench glyph with plume as Pi/Fi. This excludes the following of your suggestions:

deposte, disposte, disponete, dio puote, dio potea, dio pote, da ponte, da fonte, di porte, do forte, dà forte, di potere, deiforme te, deformate, and deportate. The only one that might be considered is "di fiorite", but since we have identified the word fior / fiore, it would much more likely be abbreviated as D(I/E)FIOR(I)TE than the rather awkward abbreviation D(i)FIO(ri)TE that does not capture the root FIOR.

Also note that while extreme contractions can occur and are well-known in medieval manuscripts (like a single T for T(erminus)), these have to be very frequent, commonly used words. What also often happens is that we find a word in its full form at the beginning of a text that is then later on abbreviated. We believe we have such a case for the frequent word fol, which occurs at the beginning of the manuscript as folie.

While it might seem to be possible to just randomly add letters in between the glyphs, that is absolutely not what a shorthand solution looks like.
(03-03-2025, 09:55 AM)ReneZ Wrote: The truth is that an alphabet like Eva, which indeed is well known to split up things that are probably units, does indeed further lower conditional entropy, but in all known alphabets this entropy is anomalously low.

I agree. The most obvious EVA splits are the benches. Second perhaps [i]-clusters. But unsplitting those still won't get your h2 up to 2.5. (Conditional character entropy of around 3 would be at the lower end for regular texts).

The notion that EVA probably splits too much is not new, and it wasn't new when I wrote this post.

There, I took merging common glyph pairs to the extreme, and the resulting entropy is still not satisfactory. Not to mention the reduction to word length etc.
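
For concreteness, that merge-and-remeasure procedure fits in a few lines of Python. This is only a sketch: the digraph list and toy text below are illustrative, not a real transcription or the full merge set from the linked post.

```python
import math
from collections import Counter

def h1_h2(text):
    """Unigram entropy h1 and conditional character entropy h2, in bits.
    h2 is estimated via the chain rule: H(pair) - H(single)."""
    def H(counts):
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())
    h1 = H(Counter(text))
    return h1, H(Counter(zip(text, text[1:]))) - h1

def merge_pairs(text, digraphs):
    """Rewrite each digraph (e.g. EVA 'ch', 'sh') as a single private symbol."""
    for i, d in enumerate(digraphs):
        text = text.replace(d, chr(0x2460 + i))
    return text

raw = "chedy.shedy.qokeedy.chey.shey.qokedy.daiin." * 40  # toy text only
print(h1_h2(raw))                                         # oversplit, EVA-style
print(h1_h2(merge_pairs(raw, ["ch", "sh", "ee", "ai"])))  # after merging
```

Merging removes the almost perfectly predictable transitions inside each digraph (c is nearly always followed by h), so h2 rises; the point above is that even aggressive merging leaves it well below the roughly 3 bits typical of regular texts.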
(03-03-2025, 09:55 AM)ReneZ Wrote: I am sorry but that is not correct, and it is consequently not an important finding. This has been studied numerically and the results are clear.

Dear René,
thank you for your - albeit brief - input. We have dedicated an entire chapter (6) to the topic of entropy. We acknowledge that oversplitting does not explain the entire phenomenon of low entropy, but it is a substantial part, and a recalculation is in order. The oversplitting aspect affects a majority of the most frequent bigrams, which is fascinating because we did not design our solution to specifically look for oversplit glyphs, nor did we have the goal of increasing entropy. The lowered entropy connected with the ends of words is likely strongly affected by the shorthand, which disproportionately affects word endings. We are very much in favor of quantifying this across the entirety of the Voynich; this does, however, require a renewed transcription. While exact values need a lot of transcribed text, it is obvious which bigrams are affected by our solution and whether they lower or raise entropy. The direction of the result is thus already clear.

TLDR: We believe the low entropy is due to two phenomena: 1. oversplitting (which accounts for the majority) and 2. shorthand (which mostly accounts for the low entropy at word endings).
Hi! Thank you for sharing your work.

(03-03-2025, 10:27 AM)ginocaspari Wrote: We read the bench glyph with plume as Pi/Fi.

Does this consistently apply to the whole MS? For example, is it possible to identify a plausible reading for the following highlighted parts?

[attachment=10095]
(03-03-2025, 10:46 AM)Koen G Wrote: The notion that EVA probably splits too much is not new, and it wasn't new when I wrote this post. There, I took merging common glyph pairs to the extreme, and the resulting entropy is still not satisfactory. Not to mention the reduction to word length etc.

Dear Koen Gheuens,
yes - we are arguing very much in your favour. The fact that something is not new does not mean it is incorrect. As elaborated above, we hypothesize the "missing entropy" to be due to the shorthand, especially where it concerns word endings. Please take a look at chapter (6), where we elaborate on the individual bigrams; you will see in the table that a vast majority of them are affected. Possibly even more than you mention in your post, as we believe the upward stroke at the beginning of MO, NO, N, and M is frequently misread as an individual A. We do not expect a shorthand to display the same characteristics as the base language in terms of h2, and a reduced word length is absolutely not a problem for a shorthand. In fact, that's the goal.

It is also worth noting that Claire Bowern calculates an increase of entropy from the inclusion of abbreviations in a normal text - I mention this because it will likely follow as the next counter-argument. The key point here is "inclusion in a normal text". We assume that we are dealing with a shorthand that reduces most words. We do not think we face a normal text with occasional common abbreviations.
(03-03-2025, 10:27 AM)ginocaspari Wrote: While a shorthand naturally increases flexibility, it is by far not as arbitrary as you show here. We read the bench glyph with plume as Pi/Fi.

Thank you, I had missed that. This sounds quite dysfunctional. pi/fi are not shortened, but replaced with a sequence which requires a similar number of strokes, only making the text more ambiguous (by dumping the p/f distinction). Not an abbreviation, but pure information loss.

Abbreviated medieval scripts tend to do the opposite: reducing the number of strokes while preserving most of the phonetic content.

EDIT: I am probably missing something again. How is f-without-i encoded? E.g. how are these words written?

felce, fusto, foglia, falce

EDIT2: no problem, I see that ch can stand for both p and f. More information loss....
A couple quick points:

* As someone associated with the Institute of Archaeological Sciences in Bern, you almost certainly know people on the linguistics faculty, or at the very least know people who know people on the linguistics faculty. I would strongly advise identifying one or more expert linguists and running this by them.

* The most common words in any fairly large sample of text tend to be function words -- applying your mapping, is that the case here?

* I share the skepticism of some of the other posters that what you're suggesting will quantitatively shift the entropies enough to make the text look consistent with a natural language, but am open to persuasion by demonstration. This can be approached from either (ideally, both) of two directions: 

1) Map (say) the first three quires from EVA to your proposed Latin alphabet equivalents and compare the h1 and h2 values to those of the types of texts you mined for vocabulary matches. I understand that there are some ambiguities in doing this -- should EVA 'ar' be read as "N" or "AR", for example -- but you have to start somewhere.

2) Take some of the contemporary texts you mined for vocabulary and convert them to EVA using your scheme (or better yet, to the Currier transcription alphabet, which also groups some of the oversplit EVA sequences to single characters) -- does doing that lower h1 & h2 to levels consistent with the Voynich text? There may be some difficulties automating this perfectly, but again you have to start somewhere.
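
Direction 2) can be prototyped crudely with a substitution pass plus the standard entropy estimate. A Python sketch - the mapping below is invented purely for illustration and is not the scheme actually proposed:

```python
import math
from collections import Counter

def cond_entropy(text):
    """Conditional character entropy h2 = H(pair) - H(char), in bits."""
    def H(counts):
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())
    return H(Counter(zip(text, text[1:]))) - H(Counter(text))

# Invented many-to-one / one-to-many mapping, for illustration only:
MAPPING = {"p": "ch", "f": "ch", "m": "aiin", "n": "ain", "e": "", "u": "o"}

def encode(text):
    """Apply the toy scheme: collapse some letters, expand others."""
    return "".join(MAPPING.get(c, c) for c in text.lower() if c.isalpha())

# Toy plaintext reusing verbs discussed in this thread:
plain = "le piante fiorite vengono potate e terrate nel mese di marzo " * 20
plain_letters = "".join(c for c in plain if c.isalpha())
print(cond_entropy(plain_letters), cond_entropy(encode(plain)))
```

One can then see whether h2 actually drops under such a scheme, by how much, and what happens to h1 and word length along the way - which is exactly the quantitative check being asked for.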
Dear all,

thank you for the engaged comments and suggestions. So far we have received a lot of feedback regarding statistics but very little in terms of consideration of lingua volgare shorthand manuscripts. This is really what we are looking for at this point in time and why we have reached out. If you have suggestions for names of renowned lingua volgare experts who would be interested in the task, we would very much appreciate a comment or PM.

There are various ways to approach the Voynich, and ultimately the results these methods produce need to converge on a single solution. But it is a process, and the expectation of a miraculous reveal that suddenly makes this manuscript legible is likely misguided. This is in no way a disregard for the important statistical work created by many members of this forum. Understanding what a shorthand is and how it functions is nonetheless a requirement for even being open to such a solution displaying different statistical properties.

We will continue to work with the people we have already reached out to and update on any progress.