The Voynich Ninja

Full Version: sh_ and ch_ compose the same words
Thank you for clarifying, Nick.
Since this thread focuses on word-initial benches, I did not think of considering occurrences in other positions. Unless I messed up something, it seems the histogram for all words-with-benches is rather similar to that for word-initial benches only.

[attachment=3726]

bench-d is considerably more frequent than for word-initial benches only and I agree that this deviation looks significant. Overall, with respect to immediately following characters, the two benches behave similarly but not identically: we have seen that this is the case also with other behaviours (e.g. the preference of ch-words to occur at line end). The idea that the two benches are totally identical and arbitrarily interchangeable would explain the similarity but not the differences. As you say, these differences are "incompatible with the suggestion that ch and sh are essentially expressing the same thing". The similarity is very strong in some respects and must be explained. Some differences are large enough to be significant and must be explained as well.
(22-11-2019, 02:27 AM)nickpelling Wrote: Probably unsurprisingly, I too have an interest in the question of ch vs sh: but at the same time I have a deep mistrust of low-level interpretations that first require Voynichese 'words' to be literally words (i.e. that they must necessarily follow an explicitly language-like grammar) - this seems to me to be something we should be testing rather than assuming.

Indeed, "they're not words!" ([link]). 'Word' is just a term to describe tokens separated by spaces (see [link], p. 2).


(22-11-2019, 02:27 AM)nickpelling Wrote: What isn't so well known (I think) is that chd and shd are vastly more common in B pages:
  • chd B = chd A x 20.74
  • shd B = shd A x 23.14

Thanks for pointing out another example. To be precise, 'chd' and 'shd' are more common in Herbal B (see [link]).

The most frequently used 'word' type containing the sequence 'chd' is <chdy>. It will probably not surprise you that the two most frequently used 'word' types containing a 'ch'-glyph in Herbal B are <chedy> and <chdy>, and that the two most frequently used 'word' types containing a 'sh'-glyph in Herbal B are <shedy> and <shdy>. For <chedy>/<chdy> as well as for <shedy>/<shdy>, the computed correlation coefficient is positive. Therefore <chedy>/<chdy>/<shedy>/<shdy> also form a network of related 'words'.

Pearson's Correlation(chedy[501],shedy[426]): +0.84 (n=225,p=5E-60)
Pearson's Correlation(chdy[150], shdy[46])  : +0.44 (n=225,p=7E-12)
Pearson's Correlation(chedy[501],chdy[344]) : +0.47 (n=225,p=7E-14)
Pearson's Correlation(shedy[426],shdy[46])  : +0.41 (n=225,p=2E-10)
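For readers who want to reproduce this kind of figure, here is a minimal sketch of how a per-page Pearson correlation between two word types could be computed. It assumes that n=225 is the number of pages and that each data point is the count of a word type on one page; the post does not spell this out. The `toy_pages` data and the helper names `pearson` and `per_page_counts` are hypothetical and only illustrate the calculation, they are not real transcription data or the tool used for the numbers above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length count series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def per_page_counts(pages, word):
    """Count how often `word` occurs on each page (pages = list of token lists)."""
    return [page.count(word) for page in pages]


# Hypothetical toy pages, NOT real transcription data.
toy_pages = [
    ["chedy", "shedy", "chedy", "qokeedy"],
    ["chol", "chor", "daiin"],
    ["shedy", "chedy", "shedy", "chedy", "chdy"],
    ["daiin", "chol"],
]

r = pearson(
    per_page_counts(toy_pages, "chedy"),
    per_page_counts(toy_pages, "shedy"),
)
print(f"Pearson's Correlation(chedy, shedy): {r:+.2f}")
```

With a real transcription, `toy_pages` would be replaced by one token list per folio page, and the same two calls would be repeated for each pair of word types.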

                chdy  shdy  chedy  shedy word count
               ----- ----- ------ ------ ----------
Herbal     (A)     7     2      1      0      8,087
Pharma     (A)     1            1      1      2,529
Astro              1     2      4      0      2,136
Cosmo             17     4     24     17      2,691
Herbal     (B)    53    16     62     35      3,233
Stars      (B)    40    12    190    113     10,673
Biological (B)    23     7    210    247      6,911
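Because the sections differ a lot in size, the raw counts are easier to compare once they are normalised. The sketch below converts the counts from the table into occurrences per 1,000 words. The numbers are copied from the table; the blank 'shdy' cell for Pharma (A) is kept as `None` rather than guessed. The names `TABLE`, `WORDS` and `per_thousand` are only illustrative.

```python
# Counts copied from the table above; None marks the cell left blank in the source.
TABLE = {
    "Herbal (A)":     {"chdy": 7,  "shdy": 2,    "chedy": 1,   "shedy": 0,   "words": 8087},
    "Pharma (A)":     {"chdy": 1,  "shdy": None, "chedy": 1,   "shedy": 1,   "words": 2529},
    "Astro":          {"chdy": 1,  "shdy": 2,    "chedy": 4,   "shedy": 0,   "words": 2136},
    "Cosmo":          {"chdy": 17, "shdy": 4,    "chedy": 24,  "shedy": 17,  "words": 2691},
    "Herbal (B)":     {"chdy": 53, "shdy": 16,   "chedy": 62,  "shedy": 35,  "words": 3233},
    "Stars (B)":      {"chdy": 40, "shdy": 12,   "chedy": 190, "shedy": 113, "words": 10673},
    "Biological (B)": {"chdy": 23, "shdy": 7,    "chedy": 210, "shedy": 247, "words": 6911},
}
WORDS = ["chdy", "shdy", "chedy", "shedy"]


def per_thousand(count, total_words):
    """Occurrences per 1,000 words; a blank cell (None) stays None."""
    if count is None:
        return None
    return 1000.0 * count / total_words


rates = {
    section: {w: per_thousand(row[w], row["words"]) for w in WORDS}
    for section, row in TABLE.items()
}

for section, row in rates.items():
    cells = ["     -" if row[w] is None else f"{row[w]:6.2f}" for w in WORDS]
    print(f"{section:<15}" + " ".join(cells))
```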

This example further confirms the general principle: the chance that a 'sh'-word occurs on a page increases the more often related 'ch'- and 'sh'-words appear on that page.


(22-11-2019, 02:27 AM)nickpelling Wrote: PS: it's also not widely mentioned that shch and chsh are equally abhorred in both A and B (shch A = 6, chsh A = 4, shch B = 6, chsh B = 5), a strongly consistent behaviour

See my previous post [link] about this topic.
You discussed Voynichese words containing both ch  and sh, true: but there you also asserted that ch and sh are constrained by the shape of the characters next to them. I find this hard to understand, given that ch and sh have such a strong affinity for being followed by e and ee, yet have such a strong phobia for being followed by sh and ch.

I'm therefore not sure I can accept this kind of shape-driven argument when ch and ee have such similar shapes.
(ignoring other aspects such as following e, ee, qo etc. because they will lead to other discussions, such as what is a word/vord and what are exceptions etc.)

So I think it seems fair to conclude that it's currently impossible to reach a consensus based on what we see on sh_/ch_ and what we can count in the VMS.
I am skeptical about the possibilities of a consensus, and even about its usefulness, but the interest has certainly been raised, and so has the awareness of many details.

The parallel set of ch- and sh- words seems more consistent than people (certainly I) believed.

The interesting idea that Sh could imply some abbreviation of ch plus something else seems not to be confirmed.
If there was a single item of Voynich study I wish there was consensus on, it would be (what I think is) the blindingly obvious fact that we should do separate studies for Currier A and Currier B at the very least.

If people can't even agree this as a starting point for sensible analysis, I don't really think there's much point in looking for other points of consensus.

Every time I see a statistic formed from the whole Voynich corpus rather than from just A or B, I shake my head.
(22-11-2019, 09:28 PM)nickpelling Wrote: You discussed Voynichese words containing both ch and sh, true: but there you also asserted that ch and sh are constrained by the shape of the characters next to them.
I find this hard to understand, given that ch and sh have such a strong affinity for being followed by e and ee, yet have such a strong phobia for being followed by sh and ch.

I'm therefore not sure I can accept this kind of shape-driven argument when ch and ee have such similar shapes.
With the exception of 'i' and 'e', the same glyph is rarely repeated. So it comes as no surprise that 'words' like <chchy> or <shshy> are rare.
(23-11-2019, 03:48 PM)Davidsch Wrote: So, I think it seems fair, to conclude that it's currently impossible to reach a consensus based on what we see on sh_/ch_ and what we can count in the VMS.

Your statement surprises me. Where did you see any disagreement concerning what we see on 'ch'/'sh' or how we can count instances of 'ch'/'sh'?

For instance Currier wrote: 
"Herbal Section contains both Language 'A' and 'B'. The principal differences between the two 'languages' in this Section are:
(a) Final 'dy' is very high in Language 'B'; almost non-existent in Language 'A'.
(b) The symbol groups 'chol' and 'chor' are very high in 'A' and often occur repeated; low in 'B'
..." (Currier 1976). 
In other words, Currier concluded that there is a positive correlation between 'chol' and 'chor'. As far as I can see, everybody agrees with Currier's observations.


It is possible to add some more detailed observations. For instance, on pages that frequently use 'chol'/'chor', 'shol'/'shor' are also frequently used, and the four most frequently used word types starting with 'ch' or 'sh' in the Stars section are <chedy>, <chey>, <shedy>, and <shey> (see [link]). As far as I can see, nobody argues that these observations are wrong. On the contrary, René's statement was "the average fraction of chedy / Shedy in the different B-language sections is not the same. This was already known of course ..." (see [link]) and Nick's statement was "It's well known that chor (and shor to a large extent) are much more common in A pages than B pages" (see [link]).
What a complex set of rules you posit were necessary to generate a meaningless text! EVA ee can follow ch and sh (indeed, this is almost mandated in B pages), but the almost-visually-identical ch and sh cannot. And we haven't even considered here the issue of why eech and eesh should be so rare.

At the core of Voynichese's behaviour lie a number of sets of deep-rooted adjacency asymmetries, of which the ch/sh complex is just one. Whatever the actual reason for those asymmetries, the claim that they are simply due to some kind of shape harmony seems to me to be grossly inadequate.
(23-11-2019, 10:36 PM)nickpelling Wrote: EVA ee can follow ch and sh (indeed, this is almost mandated in B pages), but the almost-visually-identical ch and sh cannot.

Exactly, almost-visually-identical glyphs normally do not follow each other. There are, for instance, only two examples of two 'gallows' in a row: <otkchedy> on [link] and <chpkcheos> on [link]. There are also only two instances of <ddy> and <ddor> ([link]).


(23-11-2019, 10:36 PM)nickpelling Wrote: At the core of Voynichese's behaviour lie a number of sets of deep-rooted adjacency asymmetries, of which the ch/sh complex is just one.

Exactly, there are some systematic asymmetries, of which the 'ch'/'sh' complex is just one. For instance, only 49 out of 612 'sh'-word-types (8 %) are used more frequently than the corresponding 'ch'-word-type (see [link]).
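A count like "49 out of 612" could be obtained roughly as in the sketch below. It makes two assumptions that the post does not state: the "corresponding" 'ch'-word-type is found by replacing the first 'sh' in the word with 'ch', and a 'ch' counterpart that never occurs counts as frequency 0. The token list is a hypothetical toy example and the function name is made up for illustration.

```python
from collections import Counter


def sh_types_beating_ch(tokens):
    """Return (wins, total): how many 'sh' word types occur more often than
    their 'ch' counterpart, and how many 'sh' word types there are in total.

    Assumption: the counterpart is formed by replacing the first 'sh' with 'ch'.
    """
    counts = Counter(tokens)
    sh_types = [w for w in counts if "sh" in w]
    wins = sum(
        1 for w in sh_types
        if counts[w] > counts[w.replace("sh", "ch", 1)]  # missing type counts as 0
    )
    return wins, len(sh_types)


# Hypothetical toy tokens, NOT real transcription data.
toy_tokens = ["chedy"] * 3 + ["shedy"] * 2 + ["shol"] * 2 + ["chol"] + ["shey"]

wins, total = sh_types_beating_ch(toy_tokens)
print(f"{wins} out of {total} 'sh' word types beat their 'ch' counterpart")

# The share quoted in the post: 49 out of 612.
print(f"{100 * 49 / 612:.1f} %")
```

Whether counterparts with zero occurrences should be included is a design choice; excluding them would change both the numerator and the denominator.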


(23-11-2019, 10:36 PM)nickpelling Wrote: Whatever the actual reason for those asymmetries, the claim that they are simply due to some kind of shape harmony seems to me to be grossly inadequate.

The cause of these observations is another topic and will probably lead to other discussions. Therefore, I would suggest using this thread to discuss only the observations.