The Voynich Ninja

Full Version: sh_ and ch_ compose the same words
Many thanks @nablator,

my suspicion is largely confirmed. In the plot below, the dots have been colour-coded using the same convention as at the bottom of this page: [link]
- light blue / cyan for Herbal (B) pages
- pink / magenta for biological
- green for cosmological
- dark blue for stars / recipes.
The herbal-B pages near the end of the MS are in open light blue circles, but they seem to match the other herbal B.

[attachment=3710]

The correlations for the individual clouds are a bit different, but for the biological pages it is still distinctly non-zero.
(15-11-2019, 09:47 AM)ReneZ Wrote: Many thanks @nablator,

my suspicion is largely confirmed. In the plot below, the dots have been colour-coded using the same convention as at the bottom of this page: [link]
- light blue / cyan for Herbal (B) pages
- pink / magenta for biological
- green for cosmological
- dark blue for stars / recipes.
The herbal-B pages near the end of the MS are in open light blue circles, but they seem to match the other herbal B.

The correlations for the individual clouds are a bit different, but for the biological pages it is still distinctly non-zero.

Thank you Rene!
At first sight, it seems to me that the light-blue (Herbal B) and green (cosmo) sections also show correlation: they are penalized by the scale of the plot, but the shapes of their clouds seem to me to be similar to that of the biological pages.
The blue (stars) section looks to behave quite differently.

I will also add a chedy/shedy percentage plot based on Nablator's spreadsheet. I compare it with Torsten's absolute counts plot.



Torsten measured a correlation coefficient of 0.84.
By removing the pages at (0,0), the correlation drops to 0.74.
By also considering percentages instead of absolute counts, the correlation is 0.57.

As you wrote above, the correlation for random word order with percentages should be close to zero.
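The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page tuples are invented toy data (not taken from Nablator's spreadsheet), the percentages are assumed to be relative to the number of words on each page, and `pearson` is a hand-written helper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pages: (total words, chedy count, shedy count). Toy data only.
pages = [
    (150, 0, 0), (180, 0, 0),          # pages with neither word
    (200, 2, 1), (200, 4, 3),
    (400, 6, 4), (400, 10, 6),
]

# 1) absolute counts on all pages
r_abs = pearson([c for _, c, _ in pages], [s for _, _, s in pages])

# 2) absolute counts after dropping the pages at (0,0)
nonzero = [p for p in pages if p[1] or p[2]]
r_nonzero = pearson([c for _, c, _ in nonzero], [s for _, _, s in nonzero])

# 3) percentages of the page's word count (assumed definition), same pages as 2)
r_pct = pearson([100 * c / w for w, c, _ in nonzero],
                [100 * s / w for w, _, s in nonzero])

print(f"absolute: {r_abs:.3f}  no (0,0): {r_nonzero:.3f}  percent: {r_pct:.3f}")
```

On this toy data the coefficient shrinks at each step, which is the same direction as the 0.84, 0.74, 0.57 sequence; the actual values of course depend on the real counts.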

[attachment=3711]

I also think that the cumulative plot of ch- vs sh- counts at the end of [link] by Torsten is significant. Here we are back to statistics on the whole manuscript, not page-based. Here is a similar plot in which the word pairs are labelled.

[attachment=3714]

Here is the detail of the area closer to 0,0:
[attachment=3715]
Since Alemannic is mentioned in the VM, I have to assess this (ch and sh) in that light as well.
In this case it is as if I were comparing pears with potatoes.
For someone who does not understand the language (dialect), it is certainly difficult.
Example:
Written: Stern (star) / spoken: Schtern.
Stuhl (chair) = Schtuel
Chern = Kern (kernel), ch = k / Käse (cheese) = Chääs
Chasch = Kannst Du (can you), and it is a question
(Sch / Ch). Here the difference is "S". Small effect, big impact.


Since the VM mentions this in Alemanic, I have to examine it ( ch and sh ) as well.
In this case it is like comparing pears with potatoes.
For someone who does not understand the language ( dialect ), it is surely difficult.
Example:
Written, star / spoken, yesterdays.
Chair = stool
Chern = core, ch=k / cheese = cheese
Chasch = Can you / and is a question
( Sh / Ch ). Here the difference is "S". small effect, big effect.
If the electronic translator is already struggling, how can I explain the whole thing?
@Rene
Chasch Schach? :D
@nablator is of course right about what happens to the pages with (0,0) counts.

Interestingly, and as can be seen from the colour-coded plot, the average fraction of chedy / Shedy in the different B-language sections is not the same. This was already known of course, but it plays a role now.

When making the comparison for each 'sub-cloud' with respect to its own average, we would get the following plot. I put the two groups of herbal-B pages together.

[attachment=3716]

The clouds have now all been centred around the origin. The individual correlations per cloud are:

Herbal-B:      0.302
Biological:    0.614
Cosmo:        -0.297
Stars (Q20):   0.174
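As a minimal sketch of the per-cloud computation: each section is centred on its own average and the correlation is then taken within that section alone. The section names are those used in this thread, but the points are invented toy values and `centre` and `pearson` are hypothetical helpers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def centre(points):
    """Shift a cloud of (x, y) points so that its mean lies at the origin."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]

# Toy (chedy, shedy) values per page, grouped by section. Not real data.
sections = {
    "Herbal-B":   [(1.0, 0.5), (2.0, 1.5), (3.0, 1.0), (2.0, 2.0)],
    "Biological": [(4.0, 5.0), (6.0, 8.0), (5.0, 6.0), (7.0, 8.5)],
}

# Correlation of each cloud after centring it on its own average
per_section = {name: pearson(*zip(*centre(pts))) for name, pts in sections.items()}

for name, r in per_section.items():
    print(f"{name}: {r:.3f}")
```

Shifting a cloud does not change its own correlation coefficient, so the centring only matters for the combined plot, where it removes the offsets between the sections.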

It is to be noted that the four centre points are strongly correlated, even though in the Biological section overall there are more Shedy than chedy.

The value for Cosmo is not significant, because there are only six samples.

Taking into account [link] from Wikipedia, also the Herbal-B result based on 32 samples is only marginally significantly non-zero.
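One way to gauge how sample size affects these values is the Fisher z-transformation, sketched below. This is an assumed choice of test and not necessarily the method from the linked Wikipedia page; `fisher_p_value` is a hypothetical helper, and the normal approximation is rough for a sample as small as six. Only the two sample sizes stated in this thread are used.

```python
import math
from statistics import NormalDist

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation is zero.

    Uses the Fisher z-transformation: atanh(r) is treated as normally
    distributed with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Values quoted in the thread: Herbal-B (32 samples) and Cosmo (6 samples)
p_herbal = fisher_p_value(0.302, 32)
p_cosmo = fisher_p_value(-0.297, 6)

print(f"Herbal-B: p = {p_herbal:.3f}")
print(f"Cosmo:    p = {p_cosmo:.3f}")
```

Under this approximation the Herbal-B value comes out at about 0.09 (two-sided) and the Cosmo value at about 0.6, which is consistent with calling the first marginal and the second not significant.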
(15-11-2019, 12:31 AM)Torsten Wrote:
(14-11-2019, 03:39 PM)Davidsch Wrote: Let me ask you Torsten, in your view, could you answer:

from the medieval manuscript viewpoint, for a random hypothetical invented alphabet

If we would use the letter c similar to the letter k, but have a slight preference for one of them,
would you then more likely see if they are 

a) different
b) they behave similar

c) neither

I would expect that both letters are interchangeable and that in one context letter 'c' is used and in another context letter 'k'. It would for instance surprise me to see multiple instances of 'color' and 'colour' in the same paragraph and I would find it stunning to read 'color' and 'colour' along with 'kolor' and 'kolour'.


I thought you would say that. If we compare the VMS, given the likely experience and intelligence of its author, with a comparable medieval text, we find that such texts, at least in the occult genre, are very inconsistent, and many words are used inconsistently.

You will very frequently find that every possible variation is used in the same text: color, colour, kolor, collor, kolore, and so on.

We cannot draw conclusions based on (in)consistency at a small scale; we need to look at the big data.
Fortunately there are many pages and many examples. Decisions based on the paleographical elements in the VMS (the specific strokes) or on the spelling of words in specific instances will only be logical if we also look at all other instances.

For example, if we see ok-chor everywhere and find only 3 instances of ot-chor, we can simply assume those 3 were slips of the pen.
(ok-chor and ot-chor have almost an equal count, something like 18 and 20, which is very usual for these ch-sh words)
I understand this is difficult to work with, because exact numbers and edit distance must be applied based on subjective findings, and not on hard numbers,
but it is the only way to come to the ultimate code-set for the VMS.
I also get exactly the same correlation ( +0.5714 ) for the percentages of chedy and Shedy per page.

Here is the colour-coded plot for these percentages:

[attachment=3717]

The correlations for these groups are:

Herbal:    0.3275
Biolog.:   0.6301
Cosmol.:  -0.0642
Stars:     0.1469

These are largely the same values as found for the differences, according to the definition of @nablator.
Thank you, Rene! These findings were totally unexpected to me: wonderful! I think (but I may be totally wrong) that the overall correlation of occurrences in pages is due to two distinct causes:
  1. inter-section: section "clouds" roughly align along a line; I believe that in this way also the uncorrelated blobs of "stars" and "cosmo" can contribute to the overall correlation;
  2. intra-section: the contribution of the "bio" section that exhibits local page-level correlation.
What do you think could be the causes of the different behaviour of the various sections?
This idea probably makes no sense: could it be that intra-section correlation (point 2) only appears in the bio section because it contains enough chedy  / shedy per page?
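Point 1 can be illustrated with a small sketch: three clouds that each have exactly zero internal correlation still produce a high overall correlation when their centres lie along a line. The numbers are invented for the illustration and have nothing to do with the actual page counts; `pearson` is a hand-written helper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A symmetric cross of offsets: zero correlation inside each cloud
offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# Hypothetical section centres lying on a straight line
centres = [(2, 2), (6, 6), (10, 10)]

clouds = [[(cx + dx, cy + dy) for dx, dy in offsets] for cx, cy in centres]

# Intra-section: correlation inside each cloud
within = [pearson(*zip(*cloud)) for cloud in clouds]

# Inter-section: correlation of all points pooled together
pooled_points = [p for cloud in clouds for p in cloud]
pooled = pearson(*zip(*pooled_points))

print("within each cloud:", [round(r, 3) for r in within])
print("all points pooled:", round(pooled, 3))
```

In this construction the whole pooled correlation comes from the alignment of the centres, so uncorrelated blobs such as "stars" and "cosmo" could indeed contribute to an overall figure.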

Of course, it will be interesting to see how other pairs of frequent bench-words behave.
Yes, I agree it is necessary to make more comparisons in order to be able to find an explanation for all this.

The first thing that comes to mind is that this is an indication of arbitrariness. Like pulling words arbitrarily out of a hat. Or, as if the difference between ch and Sh is meaningless. This way of thinking causes other problems, e.g. in the area of entropy, and one would be pushed into the direction that Voynich words are not complete words, but verbose renditions of something smaller.

'Verbose' of course implies that there is a meaningless component in the text, but it is not all meaningless.

The difference between ch and Sh could also be 'trivial' rather than meaningless. What do I mean by that?
If the text is an encoding of some plain text, then the plain text was of course a handwritten text. The curl on top of the Sh could be a representation of a serif. Just one more idea....

What I didn't show is what happens if one splits the stars section (quire 20) into two parts as indicated on this page: [link]

The group of bifolios 103+116 , 107+112, 108+111 is more like the biological text than the other three, so I called the two 'dialects': Stars-Bio and Stars-B.
With that split, the correlations of the sub-clouds for both is almost exactly 0. (0.002 and 0.015 respectively).

Also, the ratio of chedy : Shedy is:
127 : 83 for Stars-Bio (which is almost exactly 3:2)
58 : 29 for Stars-B (which is exactly 2:1)

This is perhaps, or even probably, coincidental, but there are other, similar considerations for the general difference between B and A language, and the mysterious absence / abundance of words including ed.
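The two ratios quoted above are easy to check exactly. A small sketch, using only the four counts given in this post:

```python
from fractions import Fraction

# chedy : Shedy counts quoted above
stars_bio = Fraction(127, 83)
stars_b = Fraction(58, 29)

# Distance of each ratio from the simple ratio it is compared with
gap_bio = stars_bio - Fraction(3, 2)
gap_b = stars_b - 2

print(f"Stars-Bio: {float(stars_bio):.3f}  (differs from 3:2 by {float(gap_bio):.3f})")
print(f"Stars-B:   {float(stars_b):.3f}  (differs from 2:1 by {float(gap_b):.3f})")
```

The second ratio reduces to exactly 2, while the first is within about 0.03 of 1.5.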

Just some random thoughts...

--- Edit: two more thoughts ---

The effect of varying page length, which causes an artificial positive correlation, seems to be effectively eliminated by two different methods:
1) the one proposed by @nablator, by subtracting the expected values based on some average
2) by computing the percentages

While they are different, the resulting correlations are quite similar (certainly equivalent), and any remaining correlation is clearly significant.
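A minimal sketch of the two normalisations side by side. The exact definition used by @nablator is not reproduced in this post, so method 1 below is an assumption: the expected count is taken as the page's word count times the overall frequency of the word. The page data are invented, and `pearson` is a hand-written helper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pages: (total words, chedy count, shedy count). Toy data only.
pages = [(200, 2, 1), (200, 4, 3), (400, 6, 4), (400, 10, 6)]

total_words = sum(w for w, _, _ in pages)
rate_ch = sum(c for _, c, _ in pages) / total_words   # overall chedy frequency
rate_sh = sum(s for _, _, s in pages) / total_words   # overall shedy frequency

# Method 1 (assumed definition): observed minus expected count per page
resid_ch = [c - w * rate_ch for w, c, _ in pages]
resid_sh = [s - w * rate_sh for w, _, s in pages]
r_resid = pearson(resid_ch, resid_sh)

# Method 2: percentage of the page's words
pct_ch = [100 * c / w for w, c, _ in pages]
pct_sh = [100 * s / w for w, _, s in pages]
r_pct = pearson(pct_ch, pct_sh)

print(f"observed - expected: {r_resid:.3f}")
print(f"percentages:         {r_pct:.3f}")
```

With this definition the residual is the page length times the deviation of the page's frequency from the overall frequency, so the two methods differ only in how strongly long pages are weighted. That would be one reason for the coefficients to come out close but not identical.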

The word pairs with ch / Sh largely have a ratio of almost 2:1, but the most frequent pair chedy : Shedy is clearly an exception. This is mostly caused by the biological section, which again turns out to be 'different' in yet one more respect. It has always been seen as the most repetitive of all texts in the MS.