The 'Chinese' Theory: For and Against - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Theories & Solutions (https://www.voynich.ninja/forum-58.html)
+--- Thread: The 'Chinese' Theory: For and Against (/thread-4746.html)

RE: It is not Chinese - nablator - 17-06-2025

(17-06-2025, 09:38 PM)oshfdk Wrote: So, the Chinese text was transcribed to pinyin and then space separation of pinyin groups was treated as word breaks?

Word breaks were guessed and spaces inserted (or removed) accordingly. Google Translate is not optimized for old text: the same characters can have different meanings in old and recent texts, and the word segmentation can differ as well. I can't ask Google Translate how it does the segmentation, so I asked DeepSeek. I don't know why it assumed the text is Classical Chinese; anyway, these are its explanations:

DeepSeek Wrote: Notes on Segmentation & Pronunciation

DeepSeek Wrote: In ancient Chinese texts, words are indeed written without spaces, which can make word identification challenging. However, there are several reliable methods and principles that scholars use to segment and interpret words correctly:

RE: It is not Chinese - Jorge_Stolfi - 17-06-2025

(17-06-2025, 06:28 PM)oshfdk Wrote: BTW, I'm not sure what method was used to determine the number of words for the Chinese manuscript.

I fed the Chinese text of the webpage you posted to Google Translate, and copy-pasted the pinyin that it provided (at the bottom of the left pane).

Two-word compounds are not explicitly marked in the Chinese script; each syllable is a separate character, and the spaces between characters are all the same. In pinyin transcriptions, some pairs of syllables are written as a single word. I don't know what the rule is and how rigid it is. Does anyone here know? Answered by the previous post...

Most entries in Chinese and Chinese-Xxx dictionaries are two-syllable compounds where the meanings of the individual syllables have only vague relation to that of the compound, like English "pineapple", "necktie", or "typewriter".
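The dependence of the word count on pinyin joining can be illustrated with a minimal sketch. It uses the title of the book under discussion; the joined pinyin shown is one conventional rendering and is an assumption for illustration, not necessarily what Google Translate outputs. The function names are hypothetical.

```python
# Title of the Shennong Bencao Jing as characters and as word-joined pinyin.
# The joining below is one conventional rendering (an assumption), not the
# output of any particular tool.
hanzi = "神农本草经"
pinyin_joined = "Shénnóng Běncǎo Jīng"


def count_characters(text: str) -> int:
    """Count non-space characters; in Chinese script each is one syllable."""
    return sum(1 for ch in text if not ch.isspace())


def count_words(pinyin: str) -> int:
    """Count whitespace-separated groups in a pinyin transcription."""
    return len(pinyin.split())


syllables = count_characters(hanzi)
words = count_words(pinyin_joined)
print(f"{syllables} syllables, {words} pinyin words")
```

The same text gives a different "word" count depending on how syllables are joined, so any word statistics taken from pinyin inherit the segmentation choices of whoever (or whatever) produced the pinyin.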
But there may be a gradation in that regard; I suppose that in English one could write either "back-scatter", "backscatter", or "back scatter" without upsetting the reader, and the same perhaps happens when one writes Chinese in pinyin (which I suppose is something that the Chinese themselves very rarely do). So perhaps the pair joining in pinyin is defined by some standard dictionary. But my guess is that Google Translate just gets that info from crunching a big pile of random pinyin texts, and that the joining is therefore random to some extent.

RE: It is not Chinese - Jorge_Stolfi - 17-06-2025

(17-06-2025, 06:28 PM)oshfdk Wrote: With some numbers this looks more interesting to me. However, I don't see this as 6 near coincidences. The total number of entries is the obvious optimization parameter. You wouldn't even consider comparing Voynich stars section with a manuscript of 30 entries or a tome of 3000 folios. If I understand it correctly, the similar number of entries was one of the things that attracted your attention to the Chinese MS. If VMS had 30 entries, there would be another Chinese (or Arabic or Hindi) piece of interest with 30 entries and maybe a different origin story.

Yes, and if you find such a manuscript, please let us know.

Indeed I came to this hunch when I was surf-searching for Chinese medical texts, and noticed that the SBJ had about the right number of entries and the right length of each entry. But, besides those and the other statistical similarities, what makes the SBJ a strong candidate is that it was widely known and available in the whole Chinese cultural sphere (including countries that have never been under Chinese control); and, if someone asked a doctor anywhere in that area "what is the most important medical book you have here", the answer would very likely have been the SBJ.

Quote: Also, you decided to remove section titles from bencao, if I understand it correctly.
Were they a later addition and not part of the original work?

All I know is that the SBJ that exists today is a version that was said to have been "expanded" around 1400 CE, and (IIUC) somewhat reorganized. IIUC the expansion included separating the "recipes" into mineral (Google's "Jade"), vegetable, animal, etc. Which is what those subsection titles seem to be.

But indeed maybe there were section titles in the original SBJ too. Descriptions of the original only mention the division into 120/120/125 entries. So perhaps 3 of those 4 "titles" in the SPS are the headers of those original sections. But the first two are on the same page (f105r) only a dozen recipes apart; and there seems to be no title at the start of the SPS (page f103r)....

RE: It is not Chinese - Jorge_Stolfi - 18-06-2025

(17-06-2025, 11:32 PM)Jorge_Stolfi Wrote: But indeed maybe there were section titles in the original SBJ [Shennong Bencao Jing] too. Descriptions of the original only mention the division into 120/120/125 entries. So perhaps 3 of those 4 "titles" in the SPS are the headers of those original sections. But the first two are on the same page (f105r) only a dozen recipes apart; and there seems to be no title at the start of the SPS (page f103r)....

Here are the lines of the Starred Parags section (SPS; from page f103r to page f116r line 30) that I am currently considering to be "titles":

<f105r.T1.9a>
<f105r.T2.36>
<f108v.T1.52>
<f114r.T1.34>

The middle two, <f105r.T2.36> and <f108v.T1.52>, which are centered on their respective lines, seem to be correct: if there are titles in that section, those two must be among them.

The creators of the interlinear file were uncertain about the last one, <f114r.T1.34>. Three possible interpretations are
An argument for (2.) is that line 34 ends with -m, a glyph that is common at the end of paragraphs, while the previous line (33) has full width and does not end in -m.

A possible explanation for the right-justification in options (1.) and (2.) is that the Scribe copied line 33, then skipped line 34 by mistake, started to write line 35, noticed the omission, wrote line 34 at the right, then continued line 35, bending it down to avoid line 34.

For option (3.), a possible explanation is that the Scribe skipped several words at the "carriage return" from line 35 to line 36. Several lines later, when the omission was noticed, he/she went back and added the missing words in the nearest space available -- right-justified above line 35. But this conjecture does not explain why line 35 is tilted and bent as it is.

So now I am more inclined to believe in (2.) above. I will fix the file accordingly.

The first title <f105r.T1.9a> is even more dubious. Here it is again for convenience:

Here the previous paragraph ends with a half-line (9), so line 9a is unlikely to be part of it. Line 10 starts right below line 9, with only a bit of extra space, as expected for a parag break; but it is tilted with respect to line 9, as if to leave space for line 9a. That would suggest that line 10 was written after line 9a. However, the way the words of line 9a avoid the tall p gallows of line 10 says that line 9a was written after line 10 was completed.

Thus, in this case, explanation (3.) above seems more likely than (1.) and (2.), except that it does not explain the extra tilt of line 10. Anyway, I think I will accept this explanation and insert line 9a between lines 10 and 11.

Incidentally, note that the handwriting changes abruptly at the next parag break, between lines 12 and 13. Could this coincidence be related to that anomalous line? I have fun visualizing the Author checking on the work of the Scribe after he finished line 12, noticing the hack of line 9a, and firing him on the spot.
Then hiring another Scribe, training him on the script, and instructing him to write with a smaller "font" to save vellum.

However, I don't see much difference between the two handwritings, apart from "font" size, ink color, and stroke width. So I think it is more likely that the explanation for the discontinuity is more banal: simply a long pause in the work between lines 12 and 13.

RE: It is not Chinese - Pepper - 18-06-2025

(17-06-2025, 09:02 PM)Jorge_Stolfi Wrote: (17-06-2025, 02:27 PM)Pepper Wrote: (17-06-2025, 08:35 AM)oshfdk Wrote: There are generally two kinds of Voynich theories: the solution kind (providing some specific plaintext for specific parts of the MS, be it labels, lines, etc) and the origin story kind, of which your Chinese theory is an example.
I think the origin story is not at all convincing but that's largely irrelevant to whether the solution is correct or not, so it's a shame to get bogged down in arguments about it.

That all sounds like a convenient means to explain away the oddities of the text that don't fit the proposed translation, which is certainly a feature of nearly all Voynich theories!

(17-06-2025, 03:00 PM)oshfdk Wrote: (17-06-2025, 02:27 PM)Pepper Wrote: The solution part of the theory IS falsifiable. Jorge has even suggested a plaintext for the recipes section. Falsifying it won't be easy but also not impossible, if somebody is sufficiently motivated.

Having read more, I agree.

RE: It is not Chinese - Jorge_Stolfi - 18-06-2025

(17-06-2025, 03:00 PM)oshfdk Wrote: On the other hand, if we assume a perfect transcription of the known text from Classical Chinese, then yes it is falsifiable and as far as I'm concerned my experiment with computing longest repeated contexts a few posts ago did falsify it.

Agreed, you did falsify the Chinese theory with that assumption included.

All the best, --jorge

RE: It is not Chinese - Jorge_Stolfi - 18-06-2025

Before merging the two dubious "titles" of the Starred Parags section (SPS) into the adjacent parags, the situation could be described by the following diagram:

Code:
------ start of the SPS

And 69 + 51 = 120. :puzzled:

Between the two definite titles there are 106 parags, which is neither 120 nor 125 but not very far off. Assuming the average of ~14 parags per page, the 4 missing pages should have ~56 parags; in that case, between the second definite title and the last (dubious) title there would be 106 + 56 = 152 parags. Also neither 120 nor 125, but also not very far off.

Let's suppose for a moment that indeed the SPS is the old (pre-1400) version of the SBJ (in some language, with the pronunciation of 1400), and the latter indeed had three sections of 120, 120, 125 entries, with a title at the top of each section, and these titles got transcribed by the Author as <f105r.36>, <f108v.52>, and either <f114r.34> or a lost title in the missing bifolio.

Then we would need an explanation for how the titles of the SPS ended up in their current positions among the list of paragraphs. Two possibilities are that the bifolios of the quire got scrambled before the folios were numbered (something which we know happened in the Biology section), and/or the pages of the Author's draft got scrambled before the Scribe copied them to the vellum.

I won't try to propose a detailed scrambling scheme yet...

All the best, --jorge

RE: It is not Chinese - Jorge_Stolfi - 18-06-2025

If the Starred Parags section (SPS) is indeed a phonetic transcription of some version of the Shennong Bencao Jing (SBJ) in some language and with some old pronunciation, one of the smallest obstacles in exploiting that "Rosetta Stone" is the uncertainty about the paragraph breaks in the SPS.

To illustrate the problem, I picked one of the shortest parags in my SPS file: the one that starts on page f111v, line 14, which has only 12 words. Here is the relevant section of that page (image clipped from the Beinecke Library scans, contrast-enhanced):

The green triangles indicate the paragraph breaks as marked in my SPS file. As I posted before, my assumed criteria for identifying a parag break are
By these criteria, the only parag break in the image above that can be trusted is the one between lines 25 and 26, which satisfies criteria (1.) and (2.). All other breaks are dubious guesses based only on criteria (3.) and/or (4.).

And now I can already see some mistakes. There is an f in line 16; so, by criterion (1.), there should be a parag break between 15 and 16. That is probably a bug in the file, even though the other three criteria are not met.

The break between 13 and 14 was one of my additions. But now I would guess that there should be a break between 12 and 13, because of the wider spacing, and the break between 13 and 14 should be removed, because that am as the next-to-last word is not convincing evidence.

And there are 23.5 pages in that section...
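Two of the cues mentioned in this thread (a one-leg gallows p or f somewhere in a line, and an -m ending on the previous line) can be checked mechanically. The sketch below is only an illustration of that idea, not the full list of criteria used for the SPS file: cues such as line width or vertical spacing need the page image. It assumes each line is a transcription string with words separated by periods; the sample lines and function names are made up.

```python
from typing import List, Tuple


def words(line: str) -> List[str]:
    """Split a transcription line into words (period-separated, by assumption)."""
    return [w for w in line.split(".") if w]


def has_one_leg_gallows(line: str) -> bool:
    """True if any word contains p or f (this also counts cph/cfh clusters)."""
    return any("p" in w or "f" in w for w in words(line))


def ends_with_m(line: str) -> bool:
    """True if the last word of the line ends in m."""
    ws = words(line)
    return bool(ws) and ws[-1].endswith("m")


def candidate_breaks(lines: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (line index, cues) for lines that may start a paragraph."""
    found = []
    for i, line in enumerate(lines):
        cues = []
        if has_one_leg_gallows(line):
            cues.append("gallows-in-line")
        if i > 0 and ends_with_m(lines[i - 1]):
            cues.append("prev-ends-m")
        if cues:
            found.append((i, cues))
    return found


# Made-up lines for illustration only; not taken from the manuscript.
sample = [
    "pchedy.qokaiin.chey.daiin",
    "shedy.qokeedy.otaiin.dam",
    "okeey.chedy.qokain.shey",
    "fchedy.okaiin.daiin.chol",
]
for index, cues in candidate_breaks(sample):
    print(index, cues)
```

Lines flagged by only one cue are the doubtful cases discussed above; a flag is a prompt to look at the page image, not a decision.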

RE: It is not Chinese - oshfdk - 18-06-2025

(18-06-2025, 04:15 PM)Jorge_Stolfi Wrote: As I posted before, my assumed criteria for identifying a parag break are

I'm not sure how good these criteria are, except for 2). For example, below are the one-leg gallows from [link] that don't appear to be on the first lines of paragraphs. Most of them are in cPh/cFh clusters, but a few are not, and there is one with no suspicious ligatures at all, near the center of the page.

As far as I understand, p and f do frequently appear on the first lines of paragraphs, but I don't think they define the first lines of paragraphs.

RE: It is not Chinese - Jorge_Stolfi - 18-06-2025

(18-06-2025, 04:47 PM)oshfdk Wrote: As far as I understand, p and f do frequently appear on the first lines of paragraphs, but I don't think they define the first lines of paragraphs.

As I see it, they are not grammatical markers of paragraphs, but only ornate letters that (as in many other medieval manuscripts) are typically used by scribes to highlight the start of paragraphs, or other noteworthy lines or words. I would compare them to the ornate capitals that were used in other manuscripts at the start of paragraphs or sections. Or to the still current English custom of capitalizing every word in paper titles and newspaper headlines.

I see the ornate and "bridging" gallows in some Herbal pages as instances of the same thing: bits of fanciness added by the Scribe, under the influence of the general scribal customs.

Moreover, page [link] is clearly special in many ways. The Scribe may have made more liberal use of one-leg gallows there, because of its importance. Just as the frontispiece of books is usually much more ornate than regular pages.
Or maybe some of those ps and fs are highlighting proper names. Or maybe some of those lines with ps and fs are indeed top-of-parag lines. Note the -m ending on previous lines...

All the best, --jorge
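Whether p and f merely tend to appear on first lines or actually mark them could be checked with a simple tally, sketched below under stated assumptions. The two sets of line numbers are hypothetical placeholders, and `cue_agreement` is an illustrative name; real input would come from a transcription file plus independently marked paragraph breaks.

```python
from typing import Dict, Set


def cue_agreement(cue_lines: Set[int], first_lines: Set[int]) -> Dict[str, int]:
    """Compare lines carrying a cue with lines marked as paragraph starts."""
    return {
        "cue_on_first_line": len(cue_lines & first_lines),
        "cue_elsewhere": len(cue_lines - first_lines),
        "first_line_without_cue": len(first_lines - cue_lines),
    }


# Hypothetical line numbers for one page, for illustration only.
lines_with_p_or_f = {1, 7, 12, 16, 26}
marked_first_lines = {1, 7, 14, 26}

result = cue_agreement(lines_with_p_or_f, marked_first_lines)
print(result)
```

A large "cue_elsewhere" count would support the view that the gallows are decoration that favors first lines, while a count near zero would support treating them as reliable markers.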