The Voynich Ninja

Full Version: It is not Chinese
(18-06-2025, 04:15 PM)Jorge_Stolfi Wrote: As I posted before, my assumed criteria for identifying a parag break are
1. If the line has one or more one-leg gallows (p or f), it must be the first one of a parag.

Why discard the idea that, if one-leg gallows are used when there is more space, the rare presence of a one-leg gallows in a line not clearly demarcated as Top Row could simply mean the scribe has found enough space? 

Just looking at one-leg gallows as line initials, without even looking at one-leg gallows elsewhere in the line, we can see around five instances of two consecutive lines starting with p in the Balneological section and 1 vertical pair for the Stars section.  If we stuck to this rule, this would mean some paragraphs consisting only of one line each. 

(I would also make a similar argument for your final m point below:  there are at least 50 pairs of consecutive lines in Stars that end in final m).  

(18-06-2025, 04:15 PM)Jorge_Stolfi Wrote: 2.  If the line ends well before the right margin, it must be the last one of a parag.

Fair enough.  That seems to me the most solid way of identifying a paragraph break, more solid than simply the presence of ornate gallows.

(18-06-2025, 04:15 PM)Jorge_Stolfi Wrote: 3.  If the line ends in -m or -g, it's likely to be the last one of a paragraph.

I don't follow this.  If I am understanding this right, you are saying that the scribe hasn't faithfully replicated all the author's paragraph breaks, and so there are hidden paragraph breaks in the text:  lines that we currently think of as mid-paragraph but are actually paragraph ends.  And a big clue to identifying concealed paragraph ends is if it ends in final m.  Is that right?

But if paragraph ends - whether concealed or clearly demarcated - are likely to end in final m, then we would expect to see the clearly demarcated paragraph ends to be overwhelmingly final m. 

But we don't see that.  Line end words that appear as clearly demarcated paragraph ends are often different from the line end words above them in the paragraph. In the Stars section, final m is woefully underrepresented at these positions in comparison to the line ends above them.  This would imply the opposite to your rule.

(As a side note, the different behaviour of paragraph end words is an argument to me in support of abbreviation at line ends, although not proof).
(18-06-2025, 07:59 PM)tavie Wrote: Why discard the idea that, if one-leg gallows are used when there is more space, the rare presence of a one-leg gallows in a line not clearly demarcated as Top Row could simply mean the scribe has found enough space?

As I wrote, Rule 1 may not be 100% correct, but I think that inserting a parag break before every such line is the best option.  I believe that not breaking would be more often wrong than right.  But I do not see how we can check this hunch.

The Scribe's decision to use one-leg gallows is not simply "when there is space", because there are such gallows squeezed in text with normal line spacing.  That f on line 15 of page f111v, for example. 

Quote:If we stuck to this rule, this would mean some paragraphs consisting only of one line each.

There is nothing wrong with one-line paragraphs.  One should break a parag whenever there is a major shift in the focus of discourse or line of argument.  Publishers don't like short parags because they use more paper, and each additional page means 10,000 additional sheets of paper if they print 20,000 copies. So their editors will tell authors to merge short parags into more meaty ones. 

That applies to vellum too; but the result of that economic pressure usually was that the Scribe would  truncate or abbreviate the end of a paragraph (which may be the function of the -am ending) so as to avoid a last line with only a few words; and then he would cram parags together, with no extra space, using other devices to indicate the break.  Like an ornate letter, a ¶ sign, some one-leg gallows...

By the way, the rules I gave are intended for the Starred Parags section.  Maybe other rules would be more appropriate for other sections

Quote:I would also make a similar argument for your final m point below:  there are at least 50 pairs of consecutive lines in Stars that end in final m.

And the SBJ has many one-line entries, too...

Quote:If I am understanding this right, you are saying that the scribe hasn't faithfully replicated all the author's paragraph breaks, and so there are hidden paragraph breaks in the text:  lines that we currently think of as mid-paragraph but are actually paragraph ends.

Well, unfortunately that is a possibility that we cannot exclude.

It seems clear and logical that the Scribe ignored line breaks in the draft, and inserted line breaks by himself whenever he reached the right margin.  As he would do in his main job of putting to vellum the drafts of letters, contracts, etc. in Latin.

But then, if the last line of a parag in the draft ended at the right margin, the Scribe might not notice the parag break, and copy the next parag as a continuation of the previous one.

As for the am, it may be an abbreviation like "etc." -- which the Scribe was allowed to insert at the end of a parag in place of certain unimportant words in order to avoid a very short final line, but might also have been used by the Author in the draft to avoid repetition or redundant phrases.  

But the am may also be a word of the language that is common at the end of sentences, like the "-ta" or "-desu" of Japanese, the "ist" of German... Since the end of a parag is also the end of a sentence, that would explain why am is common in that position.

In summary, even a simple thing like identifying the paragraphs of the SPS (which, in the SPS=SBJ theory, would correspond to the entries of the SBJ) is a messy task, that at this stage we cannot hope to solve with 100% accuracy and confidence.  Sigh.
(18-06-2025, 04:15 PM)Jorge_Stolfi Wrote: As I posted before, my assumed criteria for identifying a parag break are
  1. If the line has one or more one-leg gallows (p or f), it must be the first one of a parag.
  2. If the line ends well before the right margin, it must be the last one of a parag.
  3. If the line ends in -m or -g, it's likely to be the last one of a parag.
  4. If the spacing above the line is larger than average, it is likely to be the first of a parag.

I forgot a rule specific to the SPS:

  5. There should be an approximately one-to-one correspondence between stars in the left margin and parags, and each star should be near the first line of the corresponding parag.

I say "approximately" because it seems that the star and parag counts on some pages are off by one or two units; and "near" because the stars and parags are often out of alignment.  As if the Scribe was not aware of the correspondence, and copied stars and parags independently of each other.
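
For anyone who wants to try these criteria on a transliteration, here is a minimal Python sketch of rules 1, 2 and 4, with rule 5 reduced to a simple count comparison. Rule 3 is left out because it is only a "likely" indicator. The `Line` record, its field names, the sample lines and the choice to treat the first line of a page as a parag start are all assumptions made for illustration, not part of the rules as posted.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str                     # EVA transliteration of the line (made-up samples below)
    ends_short: bool = False      # rule 2: line ends well before the right margin
    wide_gap_above: bool = False  # rule 4: larger-than-average spacing above the line


def has_one_leg_gallows(text: str) -> bool:
    # Assumption: any EVA 'p' or 'f' counts, including benched forms such as 'cph'.
    return any(ch in "pf" for ch in text)


def paragraph_starts(lines: list[Line]) -> list[int]:
    """Return the indices of lines that the rules mark as the first line of a parag."""
    starts = []
    for i, line in enumerate(lines):
        first_on_page = i == 0                       # assumption: a page opens a new parag
        rule1 = has_one_leg_gallows(line.text)       # one-leg gallows on this line
        rule2 = i > 0 and lines[i - 1].ends_short    # previous line stopped short
        rule4 = line.wide_gap_above                  # extra spacing above this line
        if first_on_page or rule1 or rule2 or rule4:
            starts.append(i)
    return starts


def star_mismatch(starts: list[int], n_stars: int) -> int:
    """Rule 5 as a sanity check: parag count minus the number of marginal stars."""
    return len(starts) - n_stars


# Made-up EVA-like lines, only to show the mechanics.
page = [
    Line("pchedy qokeey dal"),
    Line("shedy qokain chey", ends_short=True),
    Line("okeey shedy dain"),
    Line("qokal chedy lkaiin", wide_gap_above=True),
]
print(paragraph_starts(page))                    # [0, 2, 3]
print(star_mismatch(paragraph_starts(page), 3))  # 0
```

Treating every rule as a hard trigger is the simplest choice; a weighted score would be closer to the "likely" wording of rule 4, but it needs weights that nobody in this thread has proposed.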
(18-06-2025, 09:11 PM)Jorge_Stolfi Wrote: As for the am, it may be an abbreviation like "etc." -- which the Scribe was allowed to insert at the end of a parag in place of certain unimportant words in order to avoid a very short final line, but might also have been used by the Author in the draft to avoid repetition or redundant phrases.

But the am may also be a word of the language that is common at the end of sentences, like the "-ta" or "-desu" of Japanese, the "ist" of German... Since the end of a parag is also the end of a sentence, that would explain why am is common in that position.

But what I said in the post above was that I see the exact opposite:  final m seems underrepresented at the ends of the paragraphs that we can see clearly (i.e. those with clear indents).  It does not seem common at paragraph end, let alone more likely to occur at paragraph end.  It underperforms.  Quite badly.

This is what I see based on the Scribe 3 Stars section (all folios in Stars except 115r, where Lisa thought it might be two scribes).  Perhaps my stats are wrong since it's been a while since I gathered these particular ones, and they need double-checking.  But even allowing for errors, I don't see evidence for saying that paragraph ends are more likely to have final m.
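
The comparison I am describing can be sketched roughly as below. This is not the script I used for my own stats: the input format (a list of paragraphs, each a list of EVA lines with space-separated words) and the sample lines are made up for illustration only.

```python
def final_m_rates(paragraphs):
    """Compare how often the last word of a line ends in EVA 'm'.

    paragraphs: list of paragraphs, each a list of EVA line strings
    (assumed format: words separated by spaces).
    Returns (rate at paragraph-final lines, rate at all other line ends).
    """
    end_hits = end_total = other_hits = other_total = 0
    for parag in paragraphs:
        for i, line in enumerate(parag):
            words = line.split()
            if not words:
                continue  # skip empty lines
            is_m = words[-1].endswith("m")
            if i == len(parag) - 1:   # clearly demarcated paragraph end
                end_total += 1
                end_hits += is_m
            else:                     # any other line end in the paragraph
                other_total += 1
                other_hits += is_m

    def rate(hits, total):
        return hits / total if total else 0.0

    return rate(end_hits, end_total), rate(other_hits, other_total)


# Made-up EVA-like lines, NOT real counts from the Stars section.
sample = [
    ["qokeey shedy dam", "okain chedy dal", "shey qokal"],
    ["pchedy okeey daiin", "qokain sheol dam", "okal dy"],
]
print(final_m_rates(sample))  # (0.0, 0.5)
```

If paragraph ends really favoured final m, the first number should come out clearly higher than the second on the real transliteration.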
(18-06-2025, 10:13 PM)tavie Wrote: But what I said in the post above was that I see the exact opposite:  final m seems underrepresented at the ends of the paragraphs that we can see clearly (i.e. those with clear indents).  It does not seem common at paragraph end, let alone more likely to occur at paragraph end.  It underperforms.  Quite badly.

This would be expected, if the final m was used as "etc.". It would end paragraphs when there is no space left to finish the paragraph on the same line, but the remaining part is too short to put it into a line on its own. When there is plenty of space, as in the case with clear indents, there is no need for "etc.", the paragraph is just written out in full.
(18-06-2025, 10:43 PM)oshfdk Wrote: This would be expected, if the final m was used as "etc.". It would end paragraphs when there is no space left to finish the paragraph on the same line, but the remaining part is too short to put it into a line on its own. When there is plenty of space, as in the case with clear indents, there is no need for "etc.", the paragraph is just written out in full.


I was focusing on the second suggestion, that final am is common at paragraph end (which I don't think it is) because it may represent a word that is common at sentence end, and I hadn't thought through the first suggestion as you set it out here.  I see that could make sense if we believe there are likely many more paragraph ends than are implied by indentation, but there doesn't seem to be evidence for this, so I would still query the rule that final m is likely to indicate a paragraph end.
Only two points I would add to this: 
Indeed, Eva-m is preferably line-ending but I do not see it as preferably paragraph-ending. 
Furthermore, there are three cases in the stars section where there seems to be a single longer paragraph but still stars every few lines.
This is a bit subjective though.
(18-06-2025, 10:13 PM)tavie Wrote: But what I said in the post above was that I see the exact opposite:  final m seems underrepresented at the ends of the paragraphs that we can see clearly (i.e. those with clear indents).  It does not seem common at paragraph end, let alone more likely to occur at paragraph end.  It underperforms.  Quite badly.

What I meant was that m can be used as evidence of end-of-paragraph when conditions (1.) and (2.) are false: that is, when the next line has no p or f, and has full width.

But I am afraid that you are right.  This criterion is wrong.

Sigh.  I have to check again all breaks in the SPS.  Fortunately most of them (I think) are determined by criteria (1.) and/or (2.).  For the others, I will have to rely on (4.) (wider interline spacing) and/or (5.) (near a star) and/or (6.) (it is the last line on the page).
(14-06-2025, 12:45 PM)oshfdk Wrote: I think this: You are not allowed to view links. Register or Login to view. is the book that Prof. Stolfi referred to as a possible source of the Voynich MS.

Thanks for the deep dive into this interesting hypothesis. I am a bit late to the party, and certainly not an expert in Voynich, but I think I can offer some context on the different versions of BenCao (本草).

First of all, the version shared by oshfdk is from ctext (中國哲學書電子化計劃), an open-source project to digitize ancient texts, so the quality of the transcription is sometimes poor, and some texts are done by OCR of really bad quality. This particular version is also poorly formatted, with subsections and section titles in the wrong place. I believe it is from a late 18th-century reconstruction edited by You are not allowed to view links. Register or Login to view. and You are not allowed to view links. Register or Login to view.. There are 3 major "modern versions", compiled in 1799 (孫本), 1844 (顧本), and 1854 (森本). They differ in how they sort the entries into "Better quality" (上品), "Mediocre quality" (中品), and "Poor quality" (下品). The 1799 version splits them into 120/120/125 (total 365), the 1844 version into 142/113/103 (total 358), and the 1854 version into 125/114/118 (total 357). Modern Chinese medicine practitioners usually consider the 1844 version more proper, because its categorization follows the famous BenCao GangMu (You are not allowed to view links. Register or Login to view. published in 1596).
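
The totals quoted for the three reconstructions can be re-checked in a few lines of Python. This is only an arithmetic sanity check; the dictionary keys are just the edition labels used above.

```python
# Entry counts per quality grade (上品, 中品, 下品) as quoted for each reconstruction.
splits = {
    "孫本": (120, 120, 125),
    "顧本": (142, 113, 103),
    "森本": (125, 114, 118),
}

# Sum the three grades for each edition.
totals = {edition: sum(counts) for edition, counts in splits.items()}
print(totals)  # {'孫本': 365, '顧本': 358, '森本': 357}
```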

The format of ancient Chinese printed books is crucial, since they are often a collection of notes and references mixed in with the "original texts"; each version is basically a reference to a reference to a reference that, in theory, still contains the original texts. Here is an example of an entry from the 1799 printed version (孫本):
[attachment=10901]

The lines marked in red are the supposed original text, and the orange lines are additional notes from the current author, usually giving an alias or his opinion of the entry. The green lines are the sources from previous works (quotes and notes from particular authors/versions), and the blue lines are references to aliases and additional notes from these older sources. As you can see, the format matters. Since there are no breaks or punctuation, the paragraph alignments, heights, and even font sizes are crucial for separating the source material, the reference sources, and the notes from various authors/versions. This also means that digitizing and displaying these printed versions is a massive challenge, and automatic OCR is very hard; even breaking the text into phrases and paragraphs needs extensive work.

As for the index and subsections, here are the index pages listing each "Better quality" entry in the 1799 version:
[attachment=10904][attachment=10902][attachment=10903]
The subsections for the different types (minerals, plants, different kinds of animals, fruits, etc.) are added after the index listing (marked in yellow), together with the number of entries in each (marked in green; the word 種 is the classifier) and the corresponding number in older sources (marked in light blue; the word 舊 means old, and 同 means the same, i.e. there was no change in the number of types listed). Extracting the subsections also requires understanding the work itself, and inserting them correctly in the main text definitely takes some work.

For a more complete transcription, here is a version that mostly kept the format (the 1799 version).
You are not allowed to view links. Register or Login to view.

And here is a scan of a 20th-century reprint of the 1799 version
You are not allowed to view links. Register or Login to view.
(14-06-2025, 03:36 PM)Jorge_Stolfi Wrote: the first page of that text is an introduction.  The "recipes" proper start after  the 《中卷》, and line 1 after that is the section title.



This is not the "very original" SBJ, compiled around 200 CE, but an "expanded" version of it that was written around 1400 CE.  Unfortunately it seems that the Chinese version of the former is lost, and what comes up on searches is usually the latter.  Sources say that the former version had 365 remedies, while the "expanded" version has more, but the author himself says many are additions.  But perhaps the "very original" version survives in other languages, and must have been around in the 1400s.



Thanks, Prof. Stolfi, for sharing your knowledge. I feel I can contribute a bit about the sources and evolution of BenCao through history.


The transcription of that particular version in the link is incomplete and poorly done. I think the error of listing the introduction as volume 1 (上卷) and starting the recipes from volume 2 (中卷) arose because the 3 "modern versions" I listed in my previous reply divide the volumes (卷) differently. The 1799 version (孫本) puts Better/Mediocre/Poor into 3 volumes, with a "preface" (序) at the back (sometimes omitted), while the 1844 (顧本) and 1854 (森本) versions both use 4 volumes: the preface (along with their own prefaces and descriptions) goes in volume 1, and the remaining 3 volumes cover the B/M/P qualities. They also label the volumes differently: the 1799 and 1844 versions use Chinese numerals, and the 1854 version names the volumes after the B/M/P qualities. If you are not aware of these differences, and combine the volume names of the more "authoritative" 1844 version with content from the 1799 version, the transcription gets badly mixed up.



As for the origin, the "additional notes", and the expansion of the contents, here is an evolution map made by Prof. Dr. Okanishi Tameto (You are not allowed to view links. Register or Login to view.)

[attachment=10906]



The oldest "whole text" we know of comes from a source from around the 5th century called 神農本草經集注, or simply 本草經集註. It literally translates as "the collection and notes for BenCao", where BenCao means the foundation or fundamentals (本) of medicinal materials (in ancient Chinese script, 艸 literally just meant anything from nature/the forest/the wild). Many famous physicians and medicinal practitioners in history used the term for the work that tells where to get the materials to make medicines (it is something like the "dictionary" of traditional Chinese medicine), while they would also write other books on how to turn the materials into medicines and how to apply them in practice, along with the associated knowledge, or even drawings of the source materials and of the intermediate and finished products. The preface of this 5th-century source references the older sources the author used in the "collection". The author (supposed to be You are not allowed to view links. Register or Login to view.) specifically mentions that he was the one who reorganized and recompiled the older sources (he says there were sources with 595, 431, or 319 entries before him) into 365 entries to match the days in a year (he was an astrologer and knowledgeable in many classical arts, hence he deliberately chose this number). Although the original is definitely lost to history, it was referenced so many times in later BenCao works, which include its contents as references and footnotes, that we can pretty much reconstruct its contents as a whole.



In fact, since the 5th-century text was so detailed in referencing older sources, we can find within its footnotes parts of an older text written by "You are not allowed to view links. Register or Login to view." (sometimes called 吳普本草), and the references to 吳暜 were kept throughout the historical versions. We don't know much about 吳普, only that he is said to have been a disciple of a very famous physician, You are not allowed to view links. Register or Login to view., who is recorded in historical texts as living around the 2nd and 3rd centuries (his own work was supposedly lost because he lived in the Three Kingdoms period and had a "disagreement" with the powerful warlord CaoCao). Hence the "oldest" BenCao is often attributed to him in this period (through his disciple 吳普). The 5th-century collection had already "expanded" the supposed 2nd or 3rd-century work into 730 entries: the author added 365 new entries to the 365 entries he had recompiled from the old sources (and reorganized the subsections into more detailed types). Later "reconstructions" and recompiled versions often picked not just from the 365 ancient entries but also from the 365 "new entries" of the 5th-century work (and different variations picked differently and categorized differently as well).



Here are some digitized variations of the reconstructed 5th-century version

You are not allowed to view links. Register or Login to view.

You are not allowed to view links. Register or Login to view.

The earliest surviving copies of BenCao are of a work called 新修本草 (translated as "newly revised BenCao"), whose preface is signed as finished in 659 (顯慶四年 of the Tang Dynasty), and which includes 850 (or 844) entries. It was supposedly supported and funded by the Tang royal court. It also supposedly had drawings of the materials in 7 volumes, and of medicines in 25 volumes, associated with it, but none of these survived. Even of the main text, only some of the volumes survived, in the form of handwritten transcriptions.
 
Here is one version of this with volumes 3, 4, 5, 12, 14, 15, 17, 18, 19, 20, and a supplement (describing the originals of the transcription) from the National Diet Library in Japan (transcribed fully in Chinese, supposedly since the Tang Dynasty; only the supplement uses classical Japanese)
You are not allowed to view links. Register or Login to view.


And here is a different version from the Beijing University Library (with volumes 4,5,12,13,14,15,17,18,19,20)
You are not allowed to view links. Register or Login to view.
You are not allowed to view links. Register or Login to view.
You are not allowed to view links. Register or Login to view.

There is also another version in the National Palace Museum in Taiwan, but not open to the public.

These handwritten versions already use different alignments and fonts to distinguish original sources, references, and additional notes, and they use indexes with subsections. Later printed versions followed these conventions.

The oldest surviving "printing press" versions were Song Dynasty versions, printed in the Yuan Dynasty around the 13th to 14th century.

You are not allowed to view links. Register or Login to view.

And you can find scanned versions here
You are not allowed to view links. Register or Login to view.
You are not allowed to view links. Register or Login to view.
You are not allowed to view links. Register or Login to view.

Or reprinted versions with clearer texts 
You are not allowed to view links. Register or Login to view.

This is one of the "associated" volumes of drawings, called 衍義本草 You are not allowed to view links. Register or Login to view., for the main text You are not allowed to view links. Register or Login to view. (these are recompiled versions that follow older structures and include references to the older texts above, with many variations dated to the 10th, 11th, and 12th centuries). Printing technology was pretty mature at the time, and often multiple copies of the original survive to this day, with many reprinted versions following different lineages. The core text of the 5th-century version is mostly re-copied into them, but modified in some ways in the different variations. Here is a screenshot of a page from the scan of a 14th-century version.
[attachment=10907]
And here is the transcription of the main entry:
"枸杞,味苦,寒。根大寒,子微寒,無毒。主五內邪氣,熱中消渴,周痹,風濕,下胸脅氣,客熱頭痛,補內傷大勞噓吸,堅筋骨,強陰,利大小腸。久服堅筋骨,輕身不老,耐寒暑。一名杞根,一名地骨,一名枸忌,一名地輔,一名羊乳,一名卻暑,一名仙人杖,一名西王母杖。生常山平澤及諸丘陵阪岸。冬採根,春夏採葉,秋採莖、實,陰乾。"

It is the entry for the You are not allowed to view links. Register or Login to view.You are not allowed to view links. Register or Login to view. (枸杞). You can see that a reversed (negative) background is used to highlight the name of the entry, font sizes distinguish references/footnotes, and alignment and paragraphs separate different sources from different eras. The drawing has a title indicating where to find the material, using ancient regional/provincial names (here, You are not allowed to view links. Register or Login to view. was an ancient provincial name indicating where the plant is abundant, so in a way the title is also a type of map). Because of the nature of the texts, which constantly reference older sources, the majority of the text and footnotes consists of aliases of the material's name in different times or places (you can see the repeated phrase 一名, which is the word for alias in classical Chinese). Each "sentence" in an entry is usually extremely short in terms of the number of "words", usually just 2 or 3. Even the longest sentence, 生常山平澤及諸丘陵阪岸, breaks down to 生 (grow in) 常山 (a region name) 平澤 (flood plain) 及 (and) 諸 (various) 丘陵 (hills) 阪岸 (coast), only 7 "words".

Each main entry has a very structured format as well. It starts with the "index name" that can be cross-referenced from the index pages (just like a dictionary), followed by the main property of the material (溫, 寒, etc.), the secondary properties (of various parts), its uses, the symptoms it can be applied to, its effects and effectiveness, the aliases and footnotes, the type of environment where it is found (like valleys 山谷 or swamps/lakes 湖澤), and how to collect and process it (drying, etc.). Not every entry has all of these items, or in as much detail. If you read the entries out loud, they already sound like some type of poetry or rhyme, since many aliases are similar or differ by just one syllable, and the property descriptions start and end with exactly the same phrases.
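
To make the repetitive structure concrete, here is a small Python sketch that splits the 枸杞 entry transcribed above into clauses and pulls out the aliases introduced by 一名. It is only an illustration: the punctuation it splits on comes from the modern transcription (the printed page has none), and the function names are my own.

```python
import re

# The main entry for 枸杞 as transcribed above (punctuation added by the transcriber).
ENTRY = (
    "枸杞,味苦,寒。根大寒,子微寒,無毒。"
    "主五內邪氣,熱中消渴,周痹,風濕,下胸脅氣,客熱頭痛,補內傷大勞噓吸,堅筋骨,強陰,利大小腸。"
    "久服堅筋骨,輕身不老,耐寒暑。"
    "一名杞根,一名地骨,一名枸忌,一名地輔,一名羊乳,一名卻暑,一名仙人杖,一名西王母杖。"
    "生常山平澤及諸丘陵阪岸。冬採根,春夏採葉,秋採莖、實,陰乾。"
)

# ASCII and full-width commas, enumeration comma, and full stop.
SEPARATORS = r"[,，、。]"


def split_clauses(entry: str) -> list[str]:
    """Split an entry into its short clauses, dropping empty pieces."""
    return [clause for clause in re.split(SEPARATORS, entry) if clause]


def aliases(entry: str) -> list[str]:
    """Return the names introduced by 一名 ("also called")."""
    marker = "一名"
    return [c[len(marker):] for c in split_clauses(entry) if c.startswith(marker)]


clauses = split_clauses(ENTRY)
print(clauses[0])           # 枸杞  (the index name comes first)
print(len(aliases(ENTRY)))  # 8
print(aliases(ENTRY))
```

Eight of the clauses in this one entry are nothing but 一名 plus a name, which shows how much of the text is made of near-identical short phrases.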

Since these variations of BenCao date from pretty close to the 15th century, and would have existed in many printed copies, I think that if the Chinese Theory or a related Far East source is possible, they are the likely sources, or the likely targets for mimicking similar structures.