quimqu > 18-02-2026, 05:29 PM
quimqu > 18-02-2026, 08:59 PM
Fontanellean > 19-02-2026, 12:04 AM
quimqu > 19-02-2026, 07:24 AM
(19-02-2026, 12:04 AM)Fontanellean Wrote: Do I read the graph correctly that a Latin Alchemical-Herbal text is uniquely situated among the Voynich samples when it comes to mutual information? Is that significant? It sounds like you expect natural texts to surround the Voynich curves if enough are sampled.
quimqu > 19-02-2026, 10:35 AM
Rafal > 19-02-2026, 11:29 AM
Quote:From the natural languages analyzed in this work, only Ambrosius Mediolanensis In Psalmum David CXVIII Expositio has similar behaviour

quimqu > 19-02-2026, 11:41 AM
(19-02-2026, 11:29 AM)Rafal Wrote: Quote:From the natural languages analyzed in this work, only Ambrosius Mediolanensis In Psalmum David CXVIII Expositio has similar behaviour
I guess that's the real pain point.
If no real text had this behaviour, we could take it as describing some distinction between natural and constructed text.
But since at least one 100% natural text shows this behaviour too, is it really indicative of anything?
nablator > 19-02-2026, 11:46 AM
(19-02-2026, 10:35 AM)quimqu Wrote: To my mind (and this is fully my own theory), we could say that the area between the full-text curve and the shuffled-text curve is the effect that word ordering has on the MI (i.e. how natural language links words and characters over long ranges).
I am trying to understand the asymptotic behavior: your MI(d) curves seem to converge to a different value in word-shuffled texts than in "raw" (original) texts, right? If so, it seems very counterintuitive: there should be no predictability at all at large distances. For the "raw" texts, some words are obviously preferentially followed by other words, but that tells you nothing about the words on the next page or ten pages later. For the word-shuffled texts, it is often characters of the same word that are paired at low distances, so there can be some predictability; it diminishes with distance and goes down to zero as soon as the distance exceeds the length of the longest word.
The explanation must be in the way MI(d) is calculated or displayed. Pardon the very basic question: how is it defined? Is it this formula?
Quote:The mutual information of two jointly discrete random variables X and Y is calculated as a double sum:
I(X;Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} P_{(X,Y)}(x,y) \log\left( \frac{P_{(X,Y)}(x,y)}{P_X(x)\, P_Y(y)} \right)
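For concreteness, the double sum above, applied to character pairs at a fixed distance d, could be estimated roughly as follows. This is a minimal sketch with plug-in (maximum-likelihood) probabilities; the function and variable names are illustrative and not taken from quimqu's code.
Code:
from collections import Counter
from math import log2

def mi_at_distance(text: str, d: int) -> float:
    """Plug-in (maximum-likelihood) estimate, in bits, of the mutual information
    between characters that sit exactly d positions apart in `text`."""
    pairs = [(text[i], text[i + d]) for i in range(len(text) - d)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                    # joint counts of (x, y) at distance d
    left = Counter(x for x, _ in pairs)       # marginal counts of the first character
    right = Counter(y for _, y in pairs)      # marginal counts of the second character
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi

# toy usage: a periodic text keeps some structure even at larger distances
sample = "the quick brown fox jumps over the lazy dog " * 200
print([round(mi_at_distance(sample, d), 3) for d in (1, 2, 5, 20)])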
quimqu > 19-02-2026, 12:17 PM
(19-02-2026, 11:46 AM)nablator Wrote: (19-02-2026, 10:35 AM)quimqu Wrote: To my mind (and this is fully my own theory), we could say that the area between the full-text curve and the shuffled-text curve is the effect that word ordering has on the MI (i.e. how natural language links words and characters over long ranges).
I am trying to understand the asymptotic behavior: your MI(d) curves seem to converge to a different value in word-shuffled texts than in "raw" (original) texts, right? If so, it seems very counterintuitive: there should be no predictability at all at large distances. For the "raw" texts, some words are obviously preferentially followed by other words, but that tells you nothing about the words on the next page or ten pages later. For the word-shuffled texts, it is often characters of the same word that are paired at low distances, so there can be some predictability; it diminishes with distance and goes down to zero as soon as the distance exceeds the length of the longest word.
The explanation must be in the way MI(d) is calculated or displayed. Pardon the very basic question: how is it defined? Is it this formula?
Quote:The mutual information of two jointly discrete random variables X and Y is calculated as a double sum:
I(X;Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} P_{(X,Y)}(x,y) \log\left( \frac{P_{(X,Y)}(x,y)}{P_X(x)\, P_Y(y)} \right)
It is this formula. It is essentially the mutual information between characters separated by exactly d positions, averaged over all such pairs in the corpus.
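The comparison quimqu describes (the original text versus the same text with its word order shuffled, and the gap between the two MI(d) curves) could be sketched roughly as below. This is only an illustration under assumptions of my own: corpus.txt is a placeholder filename, the plug-in estimator and the log base are not stated in the thread, and this is not quimqu's actual implementation.
Code:
import random
from collections import Counter
from math import log2

def mi_at_distance(chars: str, d: int) -> float:
    """Plug-in estimate (bits) of the MI between characters exactly d apart."""
    pairs = [(chars[i], chars[i + d]) for i in range(len(chars) - d)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((left[x] / n) * (right[y] / n)))
               for (x, y), c in joint.items())

def shuffle_words(text: str, seed: int = 0) -> str:
    """Keep every word intact but destroy the word order."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

text = open("corpus.txt", encoding="utf-8").read()   # placeholder: any plain-text sample
distances = range(1, 31)
orig = [mi_at_distance(text, d) for d in distances]
shuf = [mi_at_distance(shuffle_words(text), d) for d in distances]

# quimqu's reading: the gap between the two curves is what word order contributes to MI(d).
# nablator's expectation: beyond the longest word, the shuffled curve should be close to 0.
gap = sum(o - s for o, s in zip(orig, shuf))
print("area between the curves (summed over d = 1..30):", round(gap, 3))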
nablator > 19-02-2026, 12:28 PM
(19-02-2026, 12:17 PM)quimqu Wrote: It is this formula. It is essentially the mutual information between characters separated by exactly d positions, averaged over all such pairs in the corpus.
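Written out in the notation of the quoted definition, quimqu's description of MI(d) corresponds to something like the following (the exact normalization and the log base are my assumptions; the thread does not state them):
MI(d) = \sum_{a \in \Sigma} \sum_{b \in \Sigma} p_d(a,b) \,\log \frac{p_d(a,b)}{p(a)\, p(b)}
where \Sigma is the character alphabet, p_d(a,b) is the empirical probability that a character a is followed d positions later by a character b, and p(a), p(b) are the corresponding marginals, all estimated over every such pair in the corpus.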