The Voynich Ninja

Full Version: It is not Chinese
(16-06-2025, 11:54 AM)oshfdk Wrote: There have been anecdotal reports that Voynichese is not very convenient to write, at least not with a quill. [...] The most obvious examples are the gallows. If these are just 4 characters, then using a simple v shape rotated in 4 cardinal directions would create a much simpler script

A quill pen must be pressed lightly onto the paper/vellum, so that the two prongs are slightly spread apart, in order to get the ink to flow properly.  This is best done when pulling the pen in the general direction of the handle.  For a right-handed Scribe, that means preferably south-west, south, or west.  

Pushing the pen forward (in the direction opposite to the handle) or sideways risks snagging it on irregularities of the paper/vellum, with undesired results.  To write strokes in those directions, the nib of the pen must be carefully shaped, and the pen must touch the paper much more lightly than in the "pull" strokes. As a result, strokes in those directions tend to be thinner and tapering, as the ink at the tip of the pen is quickly exhausted.

Thus rotating a character by 90 degrees is not a good idea, as strokes in the "good" directions will become strokes in the "bad" directions.  The main strokes of the Voynichese glyphs, like e and the two sides of the o, are traced in the SW, S, and N directions. The plumes of s, r, Sh, n, and part of the loops of d, mt, k etc seem to be traced mostly in the "bad" direction.

Quote: I think someone in the past proposed that the shapes of the glyphs reflect the phonology in the way similar to Korean [link], where "the letters for the five basic consonants reflect the shape of the speech organs used to pronounce them".

Indeed pairs of Voynichese glyphs with "similar" shapes like a and o, t and k, r and s seem to have similar next- and prev-glyph distributions, as if they were phonetically similar (both vowels, both consonants, both tone markers, a voiced/unvoiced pair, etc).
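
One way to put a number on "similar next-glyph distributions" is sketched below. It is only an illustration, not the method behind either set of tables in this thread: it assumes each EVA character counts as one glyph, marks the end of a word with "_", and compares two glyphs by the total variation distance between their next-glyph distributions (0 means identical, 1 means disjoint). The function names and the toy word list are hypothetical.

```python
from collections import Counter, defaultdict

def next_glyph_distributions(words):
    """Map each glyph to the relative frequencies of the glyph that follows it.

    Assumption: one EVA character = one glyph; "_" stands for end of word.
    """
    follow = defaultdict(Counter)
    for w in words:
        padded = w + "_"
        for a, b in zip(padded, padded[1:]):
            follow[a][b] += 1
    return {
        g: {k: v / sum(c.values()) for k, v in c.items()}
        for g, c in follow.items()
    }

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Toy word list (hypothetical, not taken from any transcription).
dists = next_glyph_distributions(["ar", "or", "al", "ol"])
print(total_variation(dists["a"], dists["o"]))  # same followers in this toy list
print(total_variation(dists["a"], dists["r"]))  # disjoint followers in this toy list
```

The same function applied to reversed words would give the prev-glyph distributions.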

By the way, that table that I posted may be incorrect, in that the glyph that is the combination of an e stroke with an e stroke is probably Ch, not ee.  Likewise the combination of i with e would be Ih, not ii. Indeed, in Ch the ligature and the second e can be drawn as a single stroke h; so Ch would be another 2-stroke (not 3-stroke)  glyph, and Sh a 3-stroke (not 4-stroke) one.
(16-06-2025, 11:47 PM)Jorge_Stolfi Wrote:
Quote: Wax tablets were also used in the Middle Ages. Here is an example from the Codex Manesse (Cod. Pal. germ. 848):
[link]
Nice! 

But is that really a wax tablet?  Considering the round tops, could that be a picture of Moses showing the Ten Commandments to the Hebrews?

Yes, referring to Wikipedia, it is a wax tablet in the hand of Gottfried von Straßburg.

[attachment=10829]
Wax tablet with semi-circular upper part.
(17-06-2025, 01:46 AM)Jorge_Stolfi Wrote: r and s seem to have similar next- and prev-glyph distributions

This is not true. If you examine my tables of character affinities from [link] you will see that r and s do not have the same distribution with other characters pre- and post- .  r almost never comes after .  s likes to do so.

Neither do a and o have similar affinities.
(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote: Moreover, probabilities are not objective values that one can measure with some instrument or compute from scratch (in spite of what [link] says on page 37).  The probability of a proposition is a measure of one's belief in it; and therefore it is inherently subjective, since it depends on what one knows, not only about the proposition specifically, but about life, the universe and everything.

I agree. And I'm not saying that your theory is impossible; under some circumstances your probabilities could be a better description of objective reality than mine. I think there is one practical caveat though, which depends on your plans with regard to the Chinese theory. There are generally two kinds of Voynich theories: the solution kind (providing some specific plaintext for specific parts of the MS, be it labels, lines, etc) and the origin story kind, of which your Chinese theory is an example. While the solution theories can be verified more or less objectively, by reproducing the process and computing various statistics to find how likely it is for the result to be spurious, the only more or less objective criterion I know for the origin theories is how persuasive they are. If you agree with this, then in a sense your probabilities are not really important, it's the collective probabilities of the readers that make the difference.

On the other hand, there are, as far as I understand, hundreds of Voynich MS theories happily enjoyed by their respective authors according to their subjective probabilities, without much public interest.

(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote: Yes, but the "lot of flexibility" is really [link]. For instance, the two lower strokes of 冬 can be [link].

Yes, there is a lot of flexibility. For example, in [link] cursive a lot of lines can be merged and simplified, but again according to some rules, as far as I know. Anyway, since we are talking about an imperfect copy, this doesn't really matter, but this also undermines the argument that:

(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote: BOTH [link] weirdos resemble upside-down Chinese characters more than characters of any other script that could be easily known in Europe at the time -- including Greek, Cyrillic, Armenian, Georgian, Runic, Hebrew, Arabic, Ethiopian, ...

As I already said, I don't think this is true. As far as I can tell, both left weirdos are much more similar to some glyphs of European scripts. I've [link] in my previous post. But maybe I'm missing something here. Could you provide an example of Chinese characters that look like these weirdos? Preferably not any modern variants of the past 100 years, since there have been quite a few Chinese/Japanese modern font styles developed under strong influence of the European typesetting and calligraphy.

(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote:
  • the word structure (quite unlike that of "European" languages, but just like that of the "Chinese" languages);

What do you mean exactly by "the word structure"? In all flavours of Chinese a word /correction/ syllable consists of an initial sound, a vowel and a final, as far as I know. In Voynichese the structures of words /correction/ or parts of words are much more complex than that, and you certainly are aware of this, because there are very well known "Stolfi's" models of decomposing Voynichese words, that do not conform to a simple clean prefix-infix-suffix model. I'd say if Voynichese was a phonetic representation of Chinese of any kind, this would be very obvious.

(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote:
  • the lack of identifiable articles, copula verbs, inflections

There is a lack of identifiable anything in the Voynich MS. Statistically it's not similar to any language, including Chinese.

Also, there are a lot of grammar particles in Chinese, which would have been just as easily identifiable as the articles in the European languages. 

(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote:
  • the relatively large number of duplicated words, like chor chor

I don't think this is a strong argument. Duplicated words are not a feature of Classical Chinese, as far as I know. In the manuscript that you mentioned as the possible source I've only found 43 instances of duplicated character tokens. There were also far fewer homophones in Classical Chinese, as far as I know, so there would not be that much more duplication in the phonetic representation. After removing punctuation, I've found 15 instances of repeated words in Opus Majus. So in this aspect there is no great difference between Chinese and Latin.
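
For anyone who wants to repeat this kind of count on another text, here is a minimal sketch. It is not the script used for the numbers above; it assumes the text is already split into words by whitespace or punctuation (so Chinese text would first need to be tokenized, e.g. one character per token), and the function name is hypothetical.

```python
import re

def count_adjacent_repeats(text):
    """Count positions where a word is immediately followed by the same word.

    Punctuation is dropped and case is ignored before comparing.
    """
    words = re.findall(r"\w+", text.lower())
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

# Toy example: "quam quam" and "non. non" both count once punctuation is removed.
print(count_adjacent_repeats("quam quam, sed non. non est"))
```

Whether a repeat across a sentence boundary should count is a judgment call; this sketch counts it, which matches "after removing punctuation".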

It's possible to argue that the actual text of the Voynich MS is not in Classical Chinese, but in some Chinese vernacular that the Reader uses, and that already passed through syllable simplification which resulted in many duplicated words.

(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote:
  • the division of the Zodiac into 12 sectors of exactly 30 units, which may originally have been 24 sectors of 15 units;
  • the Zodiac starting with Pisces;

These are the only pieces of evidence for the Chinese theory so far that I personally would call specific. Is there a long form explanation of why these point to the Orient? I may have missed it. I understand that the Lunar year starts in Jan-Feb, etc, but on the other hand, why was there any need at all to put European Zodiac signs onto the Chinese Zodiac?

Quote:
  • and the similarity of the Starred Parags section with the Shennong Bencao Jing, which is being discussed separately.

Was this mainly about the number of entries? We don't actually know the number of entries in the Voynich MS.
(17-06-2025, 01:46 AM)Jorge_Stolfi Wrote: Thus rotating a character by 90 degrees is not a good idea, as strokes in the "good" directions will become strokes in the "bad" directions.  The main strokes of the Voynichese glyphs, like e and the two sides of the o, are traced in the SW, S, and N directions. The plumes of s, r, Sh, n, and part of the loops of d, mt, k etc seem to be traced mostly in the "bad" direction.

I have limited experience with quills, but I didn't mean I have none. I've prepared a few quills from scratch myself and practiced writing with them, including the Voynichese glyphs. So, I'm familiar with the mechanics of writing, although definitely not at the professional scribe level.

I think writing v in the 4 directions is quite easy with the quill, certainly not harder than writing l (especially the pointy kind) and considerably easier than writing any of the gallows characters. Which to me means that whatever goal the inventor of the script pursued, it was not stenography. Or if it was stenography, then the result is a complete failure.
When I think of a wax tablet, I think more of a notepad.
Once the ink is on paper or parchment, simple corrections are no longer possible.
There was also Moses, who was pulled out of the Nile.
(17-06-2025, 08:35 AM)oshfdk Wrote: There are generally two kinds of Voynich theories: the solution kind (providing some specific plaintext for specific parts of the MS, be it labels, lines, etc) and the origin story kind, of which your Chinese theory is an example.

I think it's both. 

I think the origin story is not at all convincing but that's largely irrelevant to whether the solution is correct or not, so it's a shame to get bogged down in arguments about it. Once we can read the manuscript there will hopefully be internal clues as to the why and who of its creation, but at the moment no speculation is falsifiable.

The solution part of the theory IS falsifiable. Jorge has even suggested a plaintext for the recipes section. Falsifying it won't be easy but also not impossible, if somebody is sufficiently motivated.
(17-06-2025, 02:27 PM)Pepper Wrote: The solution part of the theory IS falsifiable. Jorge has even suggested a plaintext for the recipes section. Falsifying it won't be easy but also not impossible, if somebody is sufficiently motivated.

I disagree. It may be theoretically falsifiable in the same way as a teapot orbiting the sun is theoretically falsifiable, but practically not. It has been suggested that the plaintext can be some older version of a known Chinese book possibly transcribed with mistakes from an unknown version of an Oriental language. How do you refute this? Other than providing a complete solution, which would falsify most competing theories.

On the other hand, if we assume a perfect transcription of the known text from Classical Chinese, then yes it is falsifiable and as far as I'm concerned my experiment with computing longest repeated contexts a few posts ago did falsify it.
Here are some advances in the comparison between the Starred Parags (SPS) section and  the Shennong Bencao Jing (SBJ).  Recall that the files are:
  • [link] The Starred Paragraphs section (SPS) from Takeshi's transcription in the 1.6e6 interlinear file, from page [link] to line 30 of f116r. With one parag per line, in the EVA encoding, with all alignment fillers and comments removed, all weirdos and missing chars mapped to '*', one "=" at start and end of each line (= parag).
  • [link] The SBJ from the webpage posted by @oshfdk, minus the introduction 《上卷》 and section headers (see below), converted to pinyin by Google Translate, mapped to lowercase.
Both files are in UTF-8 encoding.  Again, if you just click on those links you will see gibberish, because the server at my Univ expects plain text files to be in ISO-Latin-1 and thus messes up the formatted HTML that it sends to your browser.  You will have to download the files and look at them with any text editor or viewer that understands UTF-8.

While analyzing the number of words per paragraph in the SBJ file ("bencao.pin") posted earlier, I noticed that there were several parags with only 3--5 Chinese words.  It turns out that those are subsection headers. Here they are.  The locus 1.X.YYY means that it is subsection X of section 1 《中卷》 starting at line YYY.  The notation 2.X.YYY is analogous but for section 2 《下卷》.

Code:
  1.1.001 玉石部上品  yùshí bù shàngpǐn        Top grade jade                               
  1.2.019 玉石部中品  yùshí bù zhōng pǐn       Jade department middle grade               
  1.3.033 玉石部下品  yùshí bùxià pǐn          jade subordinate product                   
  1.4.044 草部上品    cǎo bù shàngpǐn          Top grade grass                             
  1.5.102 草部中品    cǎo bù zhōng pǐn         Kusanabe middle grade                       
  1.6.162 草部下品    cǎo bùxià pǐn            The lowest grade of grass                   
  1.7.219 木部上品    mù bù shàngpǐn           Top grade wood                             
  1.8.234 木部中品    mù bù zhōng pǐn          Kibe middle grade                           
  1.9.253 木部下品    mù bùxià pǐn             Kibe inferior grade                         
  2.1.001 蟲獸部上品  chóng shòu bù shàngpǐn   Top quality insects and beasts             
  2.2.017 蟲獸部中品  chóng shòu bù zhōng pǐn  Insect and animal department medium quality 
  2.3.042 蟲獸部下品  chóng shòu bùxià pǐn     Insect Beast Subordinates                   
  2.4.069 果菜部上品  guǒcài bù shàngpǐn       Top quality fruits and vegetables department
  2.5.080 果菜部中品  guǒcài bù zhōng pǐn      Medium range of fruits and vegetables       
  2.6.087 果菜部下品  guǒcài bùxià pǐn         Fruit and vegetable products               
  2.7.091 米穀部上品  mǐgǔ bù shàngpǐn         Top grade rice cereals                     
  2.8.094 米穀部中品  mǐgǔ bù zhōng pǐn        Mid-grade rice                             
  2.9.098 米穀部下品  mǐgǔ bùxià pǐn           The inferior product of Rice Valley         
The pinyin readings and translations are from Google Translate. I left them  unedited for the lulz.

After commenting those header lines out, the shortest remaining entry seemed to be normal:
Code:
  2.3.044  鼯鼠 主墮胎,令易產。  wú shǔ zhǔ duòtāi, lìng yì chǎn
    Flying squirrel: causes abortion and makes childbirth easier.
(If that can be called "normal"...)

And then I noticed that the Starred Parags file ("starps.eva") too had a few anomalously short parags of ~4 Voynichese words.  Those were so-called "titles", short lines with anomalous justification:
Code:
  <f105r.T1.9a>    =sairy.ore.daiindy.ytam=
  <f105r.T2.36>    =otoiis.chedaiin.otair.otaly=
  <f108v.T1.52>    =olchar.olchedy.lshy.otedy=
  <f114r.T1.34>    =ytain.olkaiin.ykar.chdar.alkam=
The title <f114r.T1.34> is a right-justified line after a parag that ends with a full line. It had been assumed to be the last line of the previous parag that the Scribe skipped and then inserted in that non-standard position.  However, the first line of the next parag <f114r.P1.35> bends down to avoid that title.  Thus, if that conjecture is true, the Scribe must have realized the omission after writing the first 4 lines of <f114r.P1.35>.  I have now re-interpreted <f114r.T1.34> as a title.

It is possible that other section headers were not recognized as such and were joined with adjacent parags.

After commenting out the subsection titles on both files, I counted again the number of words and parags, and basic statistics (min, max, average, and standard deviation) of the number of words per paragraph (nwp):
Code:
    statistic  |  bencao |  starps
    -----------+---------+--------
    parags     |     354 |     330
    words      |   10874 |   10457
    min nwp    |       7 |      11
    max nwp    |      76 |      72
    avg nwp    |    30.8 |    31.7
    dev nwp    |     8.5 |    11.2
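
For readers who want to recompute a table like this from their own copies of the files, here is a minimal sketch. It is not the script that produced the numbers above. It assumes one parag per line, words separated by ".", an optional "=" at each end of the line, and header lines commented out with a leading "#" (the comment marker is a guess). It also uses the population standard deviation, which may or may not be the variant used in the table.

```python
from statistics import mean, pstdev

def nwp_stats(lines):
    """Summarize the number of words per paragraph (nwp) for one-parag-per-line text."""
    counts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and commented-out titles
            continue
        words = [w for w in line.strip("=").split(".") if w]
        counts.append(len(words))
    return {
        "parags": len(counts),
        "words": sum(counts),
        "min": min(counts),
        "max": max(counts),
        "avg": mean(counts),
        "dev": pstdev(counts),
    }

# Toy input (hypothetical lines, not from either file).
print(nwp_stats(["=a.b.c=", "# title line", "=d.e.f.g.h=", ""]))
```

With real files one would pass `open(path, encoding="utf-8")` as `lines`, since both files are UTF-8.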
 

Here is the histogram of the word counts (nwp):

[attachment=10830]
 
At first sight the histograms are different, but there are some intriguing similarities.  Note that both files have 23 entries with 27 words (the most common entry length in both files), six entries with 23 words, 8 entries with 37 words, 2 entries with 47 words, one entry with 53 words, one entry with 59 words, and one entry with 62 words.  In both files, there are anomalously few entries with 23, 37, and 43 words.

Considering the missing bifolio in the SPS quire, we have 6 surprising near coincidences: number of entries, and the mode, min, max, average, and deviation of the number of words per paragraph. (The total number of words is not an extra coincidence since it is the average nwp times the number of entries.)

Compared to the SBJ, the SPS has a somewhat broader nwp histogram, as implied by the standard deviation.  It has more entries with 10-20 words and 35-70 words, and fewer with 21-34 words.  In particular, the SBJ has a second mode: 23 parags of 34 words, whereas the SPS has only 11.
 
These discrepancies could be the result of some word spaces being incorrectly inserted or omitted in the SPS as it was digitized; somewhat at random, with almost the same probability.
 
Alternatively, some parag breaks in the SPS may be wrong, causing, for example, two consecutive parags that should have 22 and 32 words to become parags of 16 and 38 words; and two parags that should have 7 and 76 words to become parags with 13 and 70 words.
 
Both kinds of errors would have little effect on the average nwp, but would increase its standard deviation, as observed.
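
A toy check of the parag-break case, as a sketch only: moving a break shifts words from one parag to its neighbor, so the total and the number of parags stay fixed and the average cannot change, while the spread can. The list of "correct" lengths below is hypothetical (it reuses the 22/32 to 16/38 example), and `shift_break` is a made-up helper name.

```python
from statistics import mean, pstdev

def shift_break(counts, i, k):
    """Move k words from paragraph i to paragraph i+1 (a misplaced parag break)."""
    out = list(counts)
    out[i] -= k
    out[i + 1] += k
    return out

true_counts = [22, 32, 30, 31]                # hypothetical correct lengths
bad_counts = shift_break(true_counts, 0, 6)   # becomes [16, 38, 30, 31]

print(mean(true_counts), mean(bad_counts))      # averages are identical
print(pstdev(true_counts), pstdev(bad_counts))  # deviation grows in this example
```

Note that a single shift does not always widen the spread (7 and 76 becoming 13 and 70 narrows that pair); the increase is what one expects on average when the shifts are random.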
 
There is also the bonus coincidence of both files originally having subsection titles with ~4 words each, although the number of such titles is vastly different.  More on that later.
 
Now for the bad news. As @oshfdk observed, there are hundreds of multiword sequences  that occur many times in the SBJ.  In particular, there is a 10-word phrase  that occurs six times, on six consecutive lines:
Code:
    久食輕身不老,延年神仙。一名
    <s1.4.045> jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
    <s1.4.046> jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
    <s1.4.047> jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
    <s1.4.048> jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
    <s1.4.049> jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
    <s1.4.050> jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
    Eating it for a long time will make you light and immortal. It is also called
In contrast, the longest phrases that occur more than once in the SPS have only 3 words; and the most common occurs only three times:
Code:
    <f103r.P1.52> chedy.qokeey.qokeey
    <f108v.P1.44> chedy.qokeey.qokeey
    <f112v.P1.15> chedy.qokeey.qokeey
I will discuss the implications of this difference in another post.
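
For reference, a search for repeated multiword sequences can be sketched as follows. This is not the script used for either file; it is a brute-force illustration that assumes each parag is already a list of words and that phrases do not span parag boundaries. The function names and the toy parags are hypothetical.

```python
from collections import Counter

def repeated_ngrams(paragraphs, n):
    """Return the n-word sequences that occur more than once, with their counts."""
    counts = Counter()
    for words in paragraphs:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c > 1}

def longest_repeated(paragraphs, max_n=20):
    """Find the largest n <= max_n for which some n-word sequence repeats."""
    for n in range(max_n, 0, -1):
        rep = repeated_ngrams(paragraphs, n)
        if rep:
            return n, rep
    return 0, {}

# Toy parags in the "=word.word=" format (made-up lines, not real loci).
raw = [
    "=daiin.chedy.qokeey.qokeey.dal=",
    "=chedy.qokeey.qokeey.shedy=",
    "=okaiin.chedy.qokeey.qokeey=",
]
parags = [line.strip("=").split(".") for line in raw]
print(longest_repeated(parags))
```

For files of about 10000 words this brute-force scan is fast enough; a suffix array would be the usual choice for much larger texts.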
(17-06-2025, 01:46 AM)Jorge_Stolfi Wrote:
(16-06-2025, 11:54 AM)oshfdk Wrote: There have been anecdotal reports that Voynichese is not very convenient to write, at least not with a quill. [...] The most obvious examples are the gallows. If these are just 4 characters, then using a simple v shape rotated in 4 cardinal directions would create a much simpler script

...

Thus rotating a character by 90 degrees is not a good idea, as strokes in the "good" directions will become strokes in the "bad" directions.  The main strokes of the Voynichese glyphs, like e and the two sides of the o, are traced in the SW, S, and N directions. The plumes of s, r, Sh, n, and part of the loops of d, mt, k etc seem to be traced mostly in the "bad" direction.

Still it's interesting that the vowels a, e, o, u in Giovanni Fontana's cipher (ca. 1420) were rotations of the same simple shape. Example from [link]: cesto da uoue = egg basket