nablator > 17-06-2025, 09:51 PM
(17-06-2025, 09:38 PM)oshfdk Wrote: So, the Chinese text was transcribed to pinyin and then space separation of pinyin groups was treated as word breaks?
DeepSeek Wrote: Notes on Segmentation & Pronunciation
• Word boundaries follow Classical Chinese grammar (e.g., 上藥 shàng yào = "high-grade herbs," treated as one compound).
• Tone marks are added for clarity (e.g., wéi 爲/为, yǐ 以).
• Fixed terms (e.g., 君臣佐使 jūn chén zuǒ shǐ = "sovereign, minister, assistant, envoy") are kept intact.
• Modern Pinyin is used for readability, though some archaic pronunciations may differ.
DeepSeek Wrote: In ancient Chinese texts, words are indeed written without spaces, which can make word identification challenging. However, there are several reliable methods and principles that scholars use to segment and interpret words correctly:
1. Grammar & Syntax Rules
• Classical Chinese has a relatively fixed grammatical structure. For example:
Subjects usually precede verbs (e.g., 王曰 "The king said").
Objects often follow verbs (e.g., 伐国 "attack a state").
Modifiers (adjectives, possessives) come before nouns (e.g., 大国 "great state," 吾子 "my lord").
• Function words (e.g., 之, 乎, 者, 也) often mark boundaries between phrases.
2. Morpheme-Based Nature
• Most words in Classical Chinese are monosyllabic (one character = one word), but there are also disyllabic compounds (e.g., 天下 "all under heaven," 君子 "gentleman").
• Context helps determine whether two characters form a single word or are separate.
3. Context & Meaning
• The broader passage often clarifies word boundaries. For example:
In 學而時習之, the segmentation is 學 / 而 / 時習 / 之 ("Learn and often practice it").
• Misreading can happen if context is ignored (e.g., 可以 is often 可 / 以 "can / use to" rather than the modern 可以 "can").
4. Fixed Phrases & Idioms
• Many set phrases recur in Classical Chinese (e.g., 不亦樂乎, 如之何), making them easier to identify.
5. Punctuation in Later Texts
• Many received editions (e.g., Song dynasty printed texts) include punctuation marks (句讀 judou) that guide segmentation.
6. Comparative & Philological Methods
• Comparing parallel passages in different texts.
• Analyzing rhyme schemes in poetry (e.g., Shijing) helps confirm word divisions.
Example Analysis
Take an unpunctuated sentence:
民可使由之不可使知之
Possible interpretations depend on segmentation:
1. 民可 / 使由之 / 不可 / 使知之 ("The people can be made to follow, but not to understand.")
2. 民可使 / 由之 / 不可使 / 知之 (alternative reading with different emphasis).
The correct choice depends on context and grammatical logic.
Jorge_Stolfi > 17-06-2025, 10:08 PM
(17-06-2025, 06:28 PM)oshfdk Wrote: BTW, I'm not sure what method was used to determine the number of words for the Chinese manuscript.
Jorge_Stolfi > 17-06-2025, 11:32 PM
(17-06-2025, 06:28 PM)oshfdk Wrote: With some numbers this looks more interesting to me. However, I don't see this as 6 near coincidences. The total number of entries is the obvious optimization parameter. You wouldn't even consider comparing Voynich stars section with a manuscript of 30 entries or a tome of 3000 folios. If I understand it correctly, the similar number of entries was one of the things that attracted your attention to the Chinese MS. If VMS had 30 entries, there would be another Chinese (or Arabic or Hindi) piece of interest with 30 entries and maybe a different origin story.
Quote:Also, you decided to remove section titles from bencao, if I understand it correctly. Were they a later addition and not part of the original work?
Jorge_Stolfi > Yesterday, 07:55 AM
(17-06-2025, 11:32 PM)Jorge_Stolfi Wrote: But indeed maybe there were section titles in the original SBJ [Shennong Bencao Jing] too. Descriptions of the original only mention the division into 120/120/125 entries. So perhaps 3 of those 4 "titles" in the SPS are the headers of those original sections. But the first two are on the same page (f105r) only a dozen recipes apart; and there seems to be no title at the start of the SPS (page f103r)....
Pepper > Yesterday, 08:14 AM
(17-06-2025, 09:02 PM)Jorge_Stolfi Wrote: (17-06-2025, 02:27 PM)Pepper Wrote: (17-06-2025, 08:35 AM)oshfdk Wrote: There are generally two kinds of Voynich theories: the solution kind (providing some specific plaintext for specific parts of the MS, be it labels, lines, etc) and the origin story kind, of which your Chinese theory is an example.
I think the origin story is not at all convincing but that's largely irrelevant to whether the solution is correct or not, so it's a shame to get bogged down in arguments about it.
Abstractly that may be true, but in practice any attempt to decipher the VMS must make some assumption about its origin and how it was produced. That is necessary to limit the possibilities for the language and encoding, to estimate the fraction of errors, and to exclude spurious features from analysis.
In fact, most attempts at decipherment to date have made the same assumption about the origin: the manuscript was created in Europe, and the text and diagrams (not just the script) were original creations by the Author, and either they were a nonsensical hoax, or their meaning was perfectly known to the author. In the second case, every word and every detail of the drawings was intentional; and therefore could be a clue for the decipherment, or had to be explained by it.
And I believe that those attempts failed, and were doomed to fail, because that assumption is false. The "Chinese Theory", in contrast, provides an entirely different set of candidate languages and a very different type of "encryption"; and it implies that, while the text and diagrams had meaning, the Author himself had only a limited understanding of them. Thus he must have made many errors, and (in the Herbal especially) made up a lot of stuff that he had failed to record. And therefore it demands very different approaches to decipherment.
(17-06-2025, 03:00 PM)oshfdk Wrote: (17-06-2025, 02:27 PM)Pepper Wrote: The solution part of the theory IS falsifiable. Jorge has even suggested a plaintext for the recipes section. Falsifying it won't be easy but also not impossible, if somebody is sufficiently motivated.
I disagree. It may be theoretically falsifiable in the same way as a teapot orbiting the sun is theoretically falsifiable, but practically not. It has been suggested that the plaintext can be some older version of a known Chinese book possibly transcribed with mistakes from an unknown version of an Oriental language. How do you refute this? Other than providing a complete solution, which would falsify most competing theories.
On the other hand, if we assume a perfect transcription of the known text from Classical Chinese, then yes it is falsifiable and as far as I'm concerned my experiment with computing longest repeated contexts a few posts ago did falsify it.
Jorge_Stolfi > Yesterday, 08:45 AM
(17-06-2025, 03:00 PM)oshfdk Wrote: On the other hand, if we assume a perfect transcription of the known text from Classical Chinese, then yes it is falsifiable and as far as I'm concerned my experiment with computing longest repeated contexts a few posts ago did falsify it.
Agreed, you did falsify the Chinese theory with that assumption included.
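The "longest repeated contexts" experiment is not described in this part of the thread. One possible reading, sketched here purely as an assumption, is a search for the longest run of consecutive words that occurs more than once in a text. The function name `longest_repeated_run` and the sample words are illustrative only and do not come from the original experiment.

```python
def longest_repeated_run(words):
    """Return the longest run of consecutive words that occurs at least twice.

    Assumption: occurrences may overlap, and ties are resolved in favour of
    the run whose second occurrence comes first. Simple quadratic approach.
    """
    best = []
    n = 1
    while n < len(words):
        seen = set()
        found = None
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if gram in seen:
                found = gram
                break
            seen.add(gram)
        if found is None:
            # No run of length n repeats, so no longer run can repeat either.
            break
        best = list(found)
        n += 1
    return best


# Made-up sample in EVA-like spelling, not a real transcription.
sample = "daiin chedy qokaiin shedy daiin chedy qokaiin otedy".split()
print(longest_repeated_run(sample))
```

Running the same function on a word-segmented Chinese text and on a Voynich transcription would give two run lengths to compare, but the result depends heavily on how words are delimited in each text.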
Jorge_Stolfi > Yesterday, 09:52 AM
------ start of the SPS
|
| 61 parags
|
...... dubious title <f105r.9a> sairy.ore.daiindy.ytam
| 8 parags
...... title <f105r.36> otoiis.chedaiin.otair.otaly
|
| 104 parags
|
...... title <f108v.52> olchar.olchedy.lshy.otedy
****** missing 4 pages f109r-f110v
|
| 106 parags
|
...... dubious title <f114r.34> ytain.olkaiin.ykar.chdar.alkam
|
| 51 parags
|
------ end of the SPS
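For reference, the paragraph counts in the diagram can be totalled and set beside the 120/120/125 division quoted for the SBJ. This is only a tally of the numbers already given in the thread; the variable names are illustrative.

```python
# Paragraph counts between the candidate titles, copied from the diagram.
sps_parags = [61, 8, 104, 106, 51]
# Entry counts of the three SBJ sections, as quoted in the thread.
sbj_entries = [120, 120, 125]

sps_total = sum(sps_parags)
sbj_total = sum(sbj_entries)

print("SPS paragraphs counted:", sps_total)
print("SBJ entries:", sbj_total)
print("difference:", sbj_total - sps_total)
```

The SPS total covers only the surviving pages; whatever was on the four missing pages f109r-f110v is not included.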
Jorge_Stolfi > Yesterday, 04:15 PM
oshfdk > Yesterday, 04:47 PM
(Yesterday, 04:15 PM)Jorge_Stolfi Wrote: As I posted before, my assumed criteria for identifying a parag break are
- If the line has one or more one-leg gallows (p or f), it must be the first one of a parag.
- If the line ends well before the right margin, it must be the last one of a parag.
- If the line ends in -m or -g, it is likely to be the last one of a parag.
- If the spacing above the line is larger than average, it is likely to be the first of a parag.
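The four criteria above can be written down as a small rule set. The following is a minimal sketch, not the procedure actually used: the `Line` fields, the `short_frac` and `gap_factor` thresholds, and the test for "contains p or f" on the raw EVA string are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str          # EVA transcription of the line, words separated by "."
    width_frac: float  # line length as a fraction of the text width (hypothetical measurement)
    gap_above: float   # spacing above the line, in any consistent unit (hypothetical measurement)


def break_signals(lines, short_frac=0.8, gap_factor=1.2):
    """Return one (first_of_parag, last_of_parag) pair per line.

    Each value is "must", "likely" or "no", following the four criteria.
    short_frac and gap_factor are arbitrary illustrative thresholds.
    """
    if not lines:
        return []
    avg_gap = sum(l.gap_above for l in lines) / len(lines)
    signals = []
    for l in lines:
        # One-leg gallows (p or f) anywhere in the line: first line of a parag.
        if "p" in l.text or "f" in l.text:
            first = "must"
        # Larger than average spacing above: likely the first line.
        elif l.gap_above > gap_factor * avg_gap:
            first = "likely"
        else:
            first = "no"
        # Line ends well before the right margin: last line of a parag.
        if l.width_frac < short_frac:
            last = "must"
        # Line ends in -m or -g: likely the last line.
        elif l.text.endswith(("m", "g")):
            last = "likely"
        else:
            last = "no"
        signals.append((first, last))
    return signals


# Made-up lines in EVA-like spelling, not taken from the manuscript.
sample = [
    Line("pchedy.qokaiin.otaly", 1.0, 1.0),
    Line("okaiin.chedy.daiin", 1.0, 1.0),
    Line("otedy.qokam", 0.5, 1.0),
    Line("daiin.shedy.okar", 1.0, 2.0),
    Line("okeey.daiin.otam", 1.0, 1.0),
]
for line, sig in zip(sample, break_signals(sample)):
    print(line.text, sig)
```

Keeping "must" and "likely" separate leaves room to resolve conflicts between neighbouring lines, for example a "likely first" line that follows a line with no "last" signal.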
Jorge_Stolfi > Yesterday, 06:15 PM
(Yesterday, 04:47 PM)oshfdk Wrote: As far as I understand, p and f do frequently appear on the first lines of paragraphs, but I don't think they define the first lines of paragraphs.