Jorge_Stolfi > 22-05-2026, 06:36 AM
(22-05-2026, 03:54 AM) kckluge Wrote: (21-05-2026, 02:28 PM) Jorge_Stolfi Wrote: Can you be more specific?
Sure. See Schinner, Andreas (2007) 'The Voynich Manuscript: Evidence of the Hoax Hypothesis', Cryptologia, 31:2, 95-107 (specifically pp. 100-103). Key passage: "The Levenshtein distance of two character strings is an integer ranging from 0 (exact match) to the maximum of the two string lengths (no similarity), denoting the number of elementary edit operations necessary to make both strings equal. Mapping this number to the interval [0,100] yields a 'percentage of dissimilarity' for two tokens. In Figure 3, the similar token repetition distance distribution Pn for the VMS compared with normal texts is presented. Here n denotes the number of other tokens between two similar ones, i.e., n = 0 corresponds to the situation of two alike tokens in immediate vicinity. Two words are considered 'similar' if their dissimilarity as defined above is less or equal to 30%; it turns out that the precise value (+/-10%) of this threshold changes Pn only quantitatively, not qualitatively."
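The quoted definition can be sketched directly. This is a minimal Python version, assuming the dissimilarity is the edit distance divided by the longer token's length, and that Pn counts every pair of similar tokens separated by exactly n other tokens, up to a hypothetical cap max_gap. Schinner's exact counting and normalization may differ; the function names are illustrative.

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or free match
        prev = cur
    return prev[-1]

def dissimilarity(a: str, b: str) -> float:
    """Edit distance mapped onto [0, 100] by dividing by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 100.0 * levenshtein(a, b) / longest

def repetition_distance_distribution(tokens, threshold=30.0, max_gap=10):
    """P[n] = fraction of similar pairs separated by exactly n other tokens.

    ASSUMPTION: every pair (i, j) with j - i - 1 <= max_gap is counted and the
    counts are normalized to sum to 1; this is a reading of the quote, not
    Schinner's code.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(len(tokens), i + max_gap + 2)):
            if dissimilarity(tok, tokens[j]) <= threshold:
                counts[j - i - 1] += 1
    total = sum(counts.values())
    return [counts[n] / total if total else 0.0 for n in range(max_gap + 1)]
```

With the default threshold of 30, a pair such as daiin/dain (one deletion out of five letters, dissimilarity 20) counts as similar, while daiin/chedy does not.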
Quote:Understood, and the word count of the Roger Bacon excerpt was picked to match the sample of Voynich text I was comparing it to.
Quote:I think if you're positing rates that are unusually large compared to known rates of scribal errors in manuscript texts, then that's potentially a problem (and way back in this thread I suggested using rates of scribal errors in Greek mss. copied by an "illiterate" scribe as a reasonable proxy for the Voynich).
Quote:If you think the Voynich Mss. text is Chinese, don't start by translating Voynichese into Chinese. Start by showing how to translate Chinese into Voynichese. And yes, I understand that you don't actually think it's Chinese per se, and per my comment below I understand that you can't follow this path given the nature of your claim. From my point of view, that's a problem.
Quote:It's a strawman to suggest anyone doesn't understand that these are things that have some variance around mean values. I think you exaggerate the extent to which "those statistics are greatly affected by topic, style, nature of the text, cost of vellum, etc."
Quote:The fact that hill-climbing n-gram-stats decrypters work as well as they do would appear to be an existence proof that they are not.
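The argument rests on how such decrypters work: they score candidate keys with n-gram statistics learned from some other text in the assumed language, so they can only succeed if those statistics carry over from one text to another. Here is a minimal sketch for a simple substitution cipher. It assumes lowercase Latin letters, bigram scores with add-one smoothing, and plain swap-and-keep hill climbing. This is a generic textbook setup, not any particular tool mentioned in the thread, and all names are illustrative.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def train_bigrams(reference: str):
    """Log-probabilities of letter bigrams in a reference text (add-one smoothing)."""
    letters = [c for c in reference.lower() if c in ALPHABET]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values()) + len(ALPHABET) ** 2
    return {(a, b): math.log((counts[(a, b)] + 1) / total)
            for a in ALPHABET for b in ALPHABET}

def score(text: str, logp) -> float:
    """Sum of bigram log-probabilities; non-letters are dropped, so bigrams span spaces."""
    letters = [c for c in text if c in ALPHABET]
    return sum(logp[pair] for pair in zip(letters, letters[1:]))

def decrypt(ciphertext: str, key: str) -> str:
    """key[i] is the plaintext letter for ciphertext letter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(ALPHABET, key))

def hill_climb(ciphertext: str, logp, iterations=5000, seed=0):
    """Start from a random key; keep any letter swap that does not lower the score."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    best = score(decrypt(ciphertext, "".join(key)), logp)
    for _ in range(iterations):
        i, j = rng.sample(range(len(ALPHABET)), 2)
        key[i], key[j] = key[j], key[i]          # propose: swap two key letters
        s = score(decrypt(ciphertext, "".join(key)), logp)
        if s >= best:
            best = s                             # accept improvement or tie
        else:
            key[i], key[j] = key[j], key[i]      # reject: undo the swap
    return "".join(key), best
```

Real solvers usually add random restarts or simulated annealing and use longer n-grams. The point here is only that the scoring table comes from a different text than the one being solved.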
Quote:As for the (statistical) significance of the spacings of the cribs, there is a limit to how closely I've been following the thread after the initial "daiin" match, and I don't want to get into why I'm unconvinced of the significance of the "daiin" spacing.
nablator > 22-05-2026, 10:28 AM
(22-05-2026, 06:36 AM) Jorge_Stolfi Wrote: I cannot judge the main text; but the red text in that clip ("Here ends Bonaventura's Dialogue Between Soul and Reason") has five spelling errors in 7 words, which were corrected by another scribe -- an error rate of 71%. Obviously the monk who had been anointed as The Exclusive Keeper Of The Red Inkwell did not know Latin -- not even enough to tell that "Bonaventura" was a proper name; that, being a genitive, the ending should be "-æ", "-ae", or "-ę" (as corrected), not "e"; and that "raaonem" was not a valid word.
Jorge_Stolfi > 22-05-2026, 10:16 PM
(21-05-2026, 06:53 PM) nablator Wrote: (21-05-2026, 02:28 PM) Jorge_Stolfi Wrote: Well, you have that text above to play with. Do you see "babble-like sequences" in it? Anyway, this doesn't happen in the VMS [...]
Jorge_Stolfi > 22-05-2026, 11:24 PM
(22-05-2026, 10:28 AM) nablator Wrote: The missing loop of "b" is not an error, it was perfectly readable without it.
Jorge_Stolfi > 23-05-2026, 12:02 AM
(21-05-2026, 03:40 PM) eggyk Wrote: What I mean is a word that only appears once or twice in many recipes, but always in the same position when used within a recipe. For example, the word "grows" or the phrase "also known as". If you were to find a VMS match for a word like this, showing a positional match across recipes, that would be interesting.
Quote:With something like "daiin", it is so common throughout the text that it will undoubtedly coincidentally match with something.
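eggyk's test and the objection about "daiin" can both be made concrete. The Python sketch below computes where a word falls within each entry (as a fraction of the entry's length) and how likely a word of a given frequency is to land in a window of a few tokens purely by chance. The function names, the relative-position convention, and the independence model are illustrative assumptions, not anyone's actual tooling.

```python
from statistics import pstdev

def relative_positions(entries, word):
    """Relative position (0.0 = first token, 1.0 = last) of every occurrence
    of `word` in a list of tokenized entries."""
    positions = []
    for tokens in entries:
        last = max(len(tokens) - 1, 1)  # avoid division by zero for 1-token entries
        for k, tok in enumerate(tokens):
            if tok == word:
                positions.append(k / last)
    return positions

def chance_hit(freq, window):
    """Probability that at least one of `window` tokens is the word, assuming
    each token independently equals it with probability `freq` (crude model)."""
    return 1.0 - (1.0 - freq) ** window

# Toy data (hypothetical): a rare word at the same spot in every entry
# has zero positional spread.
entries = [["a", "b", "grows", "c", "d"], ["x", "y", "grows", "z", "w"]]
spread = pstdev(relative_positions(entries, "grows"))
```

For a sense of scale, chance_hit(0.05, 5) is about 0.23. A word that made up a hypothetical 5% of all tokens would fall within a five-token window roughly one time in four by coincidence alone, which is the concern raised about very common words.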
JoeyB > 25-05-2026, 06:02 AM
(22-05-2026, 04:22 AM) Jorge_Stolfi Wrote: (20-05-2026, 01:35 AM) JoeyB Wrote: FIRST: which SBJ or ZHB version is the right comparator. And SECOND, whether the source text could have existed in the 'right form' around 1400. And I definitely don't have anything useful to add there.
The answer will require a rather long post; it will come. But in brief: as @richforto noted, the version of the Shennong Bencaojing (SBJ) that became the Starred Parags section (SPS) of the VMS was not the "reconstructed" file that I had been using, but a version that was embedded in some later materia medica; most likely the Zhenghe Bencao (ZHB), which was composed around 1080 CE. In block-printed copies of the ZHB circulating in the 1300-1400 CE time frame, the SBJ text would have been clearly marked out by being printed in double-size font and white-on-black instead of black-on-white. Thus a doctor or scholar who had a copy of the ZHB could easily have read it aloud, excluding certain ~500 CE additions that had become as "sacred" as the SBJ itself.
Quote:BUT, the other issue seems to be more testable: if this is a positional-distance hypothesis, can the method we use to match also recover the rooster/f105v32-38 pair when we run the whole thing blind? Which files should I grab for that? E.g., compare all SPS paragraphs against all SBJ/ZHB entries without preselecting the rooster pair,
Yes, and that is exactly what I am doing now for every SPS entry, including the Rooster one. Except that I am testing only against the 243 "good" SPS parags -- excluding those which have two or more stars on the margin, which presumably are in fact two or more parags run together. So I expect that at most 2/3 of the SBJ entries will have an identifiable match.
However, the Rooster entry is a very easy case because it is the only one with eight instances of 主. Thus f105v.32 is the only parag with enough daiin to even begin to compete, even allowing for "quillos" like dain, laiin, or kain. So it easily comes out as the match for the Rooster entry. I will discuss this in detail later.
Apart from the Rooster one, there are a few SBJ entries with three 主, and a few with two. The vast majority has only one. On these, my programs will often find two or more parags that seem to have a daiin (or quillos thereof) at the right places. So I have to put those entries on hold for now.
On the other hand, I now have a couple more cribs that can be used in the comparison, notably 气 qì (some sort of "vital energy"; sort of like the Western "humors" but completely different). That is a rather common character in the SBJ, and seems to correspond to chedy (or sometimes slight variations thereof, like chdy or cheda) in the SPS. By including those cribs in the list of keywords to be matched I can often resolve those ambiguities, and make already identified matches more certain.
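The blind test JoeyB asks about can be framed as ranking every candidate parag for each SBJ entry by how well their crib counts agree. This is a minimal sketch, assuming a hypothetical crib table built from the correspondences named above (主 ~ daiin and its quillos dain, laiin, kain; 气 ~ chedy, chdy, cheda) and a simple count-mismatch score. It ignores where in the parag the words fall, which the matching described above also takes into account. So it is only the coarse count filter that makes an entry like the Rooster easy; names and scoring are illustrative, not Stolfi's programs.

```python
from collections import Counter

# Hypothetical crib table: Chinese keyword -> Voynichese words assumed to encode it.
CRIBS = {
    "主": {"daiin", "dain", "laiin", "kain"},
    "气": {"chedy", "chdy", "cheda"},
}

def sbj_profile(entry_text):
    """Count each crib character in an SBJ entry."""
    return Counter(ch for ch in entry_text if ch in CRIBS)

def sps_profile(words):
    """Count Voynichese words (including variants) that map to each crib."""
    profile = Counter()
    for w in words:
        for crib, variants in CRIBS.items():
            if w in variants:
                profile[crib] += 1
    return profile

def mismatch(a, b):
    """Total absolute difference in crib counts (0 = identical profiles)."""
    return sum(abs(a[c] - b[c]) for c in CRIBS)

def rank_parags(entry_text, parags):
    """Rank every parag (name -> word list) for one entry, best match first."""
    target = sbj_profile(entry_text)
    return sorted((mismatch(target, sps_profile(words)), name)
                  for name, words in parags.items())
```

Entries whose best score is tied between several parags correspond to the ambiguous cases described above. Adding another row to the crib table (as with 气) is what can break such ties.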
Quote:I started looking at and playing with files in the ic.unicamp....Notes/077 folder but before I go too far down the road, if the method is bad I don't want to keep going, and if there are files that are final versions or authoritative I'd want to use those.
Sorry, those programs and files were not meant for other people's use. They are still messy and buggy tools that I am using to match SBJ recipes to SPS parags, a task that is progressing, but much more slowly than I hoped, for reasons that I will detail later. Feel free to use those programs, but I cannot guarantee them.
All the best, --stolfi
Jorge_Stolfi > Yesterday, 06:13 AM
Quote: You did not copy the pinyin incorrectly [...]
The character 血 is a polyphone (多音字) that has two standard Mandarin readings: the literary reading (xuè) and the colloquial reading (xiě).
Because the Shanghan Lun (伤寒论) and the Jingui Yaolue (金匮要略) are classical medical texts, the choice between these two readings for their formulas and symptoms depends on how the words are structured in Traditional Chinese Medicine (TCM).
1. 肠澼脓血 (cháng pì nóng xuè)
- Why it is xuè: In this phrase (which refers to dysentery with pus and blood), 脓血 (nóngxuè) is a tightly bound, formal compound noun ("pus-blood").
- The Rule: In Mandarin, when 血 forms a stable, multi-syllable compound or technical/medical term (like 血液 xuèyè, 血管 xuèguǎn, or 充血 chōngxuè), it almost always takes the formal literary reading xuè.
2. 下血赤白 (xià xiè chì bái)
- Why it is xiè (or colloquially xiě): In this phrase (referring to the passing of red and white stool/blood), 下血 is a verb-object construction where 下 acts as a verb ("to discharge / pass down") and 血 stands alone as the direct object noun ("blood").
- The Rule: When 血 stands alone as an independent noun or functions as the object after a verb (like 吐血 tùxiě "to vomit blood" or 流血 liúxiě "to bleed"), native speakers naturally shift to the colloquial reading xiě.
MarcoP > Yesterday, 08:43 AM
nablator Wrote:Jorge_Stolfi Wrote:I cannot judge the main text; but the red text in that clip ("Here ends Bonaventura's Dialogue Between Soul and Reason") has five spelling errors in 7 words, which were corrected by another scribe -- an error rate of 71%. Obviously the monk who had been anointed as The Exclusive Keeper Of The Red Inkwell did not know Latin -- not even enough to tell that "Bonaventura" was a proper name; that, being a genitive, the ending should be "-æ", "-ae", or "-ę" (as corrected), not "e"; and that "raaonem" was not a valid word.
You posted this several times and it really isn't a good example of scribal errors. There is only one error: the "e" in "bone" should be an "a". The missing loop of "b" is not an error; it was perfectly readable without it. Most (like 99%) late medieval manuscripts did not use any ę/æ/œ whatsoever, so it was not a mistake to write "e" instead of "ę". It's not "raaonem" but "racionem", a medieval spelling that is extremely frequent in "ti"+vowel patterns. The "ci" ligature was written without a dot, like the first one in Expli"ci"t.
Jorge_Stolfi > Yesterday, 01:50 PM
(Yesterday, 08:43 AM) MarcoP Wrote: I investigated a little about the origins of the name "Bonaventura" and it's a nickname based on "bona ventura" (good luck). Therefore the genitive "boneventure" (or the equivalent "bonaeventurae") was clearly understandable to those familiar with Franciscan lore.
MarcoP > Yesterday, 03:42 PM
University of Oxford Wrote:The most obvious difference in appearance between Medieval Latin and Classical Latin is in how words were spelled. Although Classical spellings were generally retained for inherited vocabulary, changes in pronunciation which had happened over the centuries — many the same as those which had led to the divergence of the everyday Romance languages from Latin and from each other — influenced the corresponding spelling of the words. Thus we often find ci before a vowel where the Classical spelling would have been ti (e.g racio for ratio), and the diphthongs ae and oe which had come to be pronounced the same as the simple e sound are often written e. (We also find as a result examples where ae or oe are written where the expected spelling would be just e.) Other alternations in spelling arising from changes in pronunciation are the interchange of b and v, the insertion or deletion of h, the use of single consonants for double ones (and vice versa), and the substitution of y for i. Sometimes spellings were also influenced by the pronunciation of a word in the everyday local language related to or derived from the Latin word (or thought to have been so).
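When counting scribal errors, as in the Bonaventura rubric discussed above, one way to avoid counting these regular variants as mistakes is to normalize both spellings before comparing them. This is a crude Python sketch. It assumes only the alternations listed in the quote and applies them in a fixed, arbitrary order, and it will over-merge some genuinely different words (for example, by collapsing every b/v and dropping every h). The function names are illustrative.

```python
import re

def normalize_medieval_latin(word):
    """Map common medieval Latin spelling variants onto one canonical form."""
    w = word.lower()
    w = w.replace("æ", "ae").replace("œ", "oe").replace("ę", "e")  # ligatures, e caudata
    w = w.replace("ae", "e").replace("oe", "e")   # diphthongs pronounced as plain e
    w = re.sub(r"ti(?=[aeiou])", "ci", w)         # ratio / racio
    w = w.replace("y", "i")                       # y for i
    w = w.replace("h", "")                        # inserted or dropped h
    w = w.replace("v", "b")                       # b / v interchange
    w = re.sub(r"(.)\1+", r"\1", w)               # double letters -> single
    return w

def same_word(a, b):
    """True if two spellings coincide after normalization."""
    return normalize_medieval_latin(a) == normalize_medieval_latin(b)
```

Under these rules "racionem" matches "rationem", and "bonaeventurae" matches "boneventure". "Bonaventurae" and "boneventure" still differ, because the "e" written for "a" in "bone-" is not one of the listed alternations.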