The Voynich Ninja

Full Version: sh_ and ch_ compose the same words
(10-11-2019, 02:12 PM)Davidsch Wrote: ...

The amount of votes in the poll, shows that interest is low,...

I have noticed that whenever a thread is purely about the text, that participation drops to only a handful of people compared to threads about the imagery.

I am very interested in the text.
Even if EVA-sh and EVA-ch look like the same thing, one has to make a huge step to prove that they are the same thing.
BTW, the same holds for Torsten's auto-copying theory: it's not clear how "the text can be written using an auto-copying method" implies "the text is written using an auto-copying method". This sort of implication is the most difficult and fundamental part of any hypothesis, and any method or idea that helps to deal with it is highly appreciated. This also answers the point about the low popularity of text-related threads...
We don't even know how to define ch words and sh "words". We don't know what constitutes a "word" in Voynichese or if there are any in the linguistic sense.

Just because two letters are adjacent doesn't mean they have the same function.

For example, consider the English words "morph" and "mophead" (mop-head). The "ph" in the first one is not interpreted or parsed the same as the "ph" in the second one, even though the letters m, o, p and h are in both words and are in the same order. Even though "ph" and "ph" look the same, they have different affiliations with the other letters and they are pronounced differently as well.
In the topic [link], posts #11 and #18, I pointed out the variety of positions of the apostrophe above the benches and the use of other symbols as superscripts. Unfortunately, the applications are broken. Here are the new application addresses: [link] [link] (I would be grateful to the administrators if they could insert these links in the original posts).
On the other hand, in the topic [link] I gave examples of the variety of symbols used as the legs of benches.
If we jointly analyze all these examples, it is impossible to exclude the possibility that each stroke is an independent symbol.
Therefore, my views cannot be described by the three proposed answers in this survey.
But, of course, there is a visual similarity (interchangeability) of the ch and sh benches, which can be explained by a variable vowel in the root of the word (in Russian, рост / раст, зор / зар ...) or the appearance of an umlaut (in German).
 
(10-11-2019, 07:05 PM)-JKP- Wrote:
(10-11-2019, 02:12 PM)Davidsch Wrote: ...

The amount of votes in the poll, shows that interest is low,...

I have noticed that whenever a thread is purely about the text, that participation drops to only a handful of people compared to threads about the imagery.

I am very interested in the text.

Davidsch and -JKP-, I just wanted to say I feel your pain on this. The text is most of what holds my interest in the VM. My likelihood of coming back to a Voynich researcher's blog increases proportionally to how much they focus on the text. René and Nick's blogs are just hands down the best resources for *general* VM info. But I've found you two gentlemen's blogs, plus Emma's and formerly Stephen Bax's, the most fun to read and think about. I'm a very verbal and auditory thinker, not so much visual. Plus I don't have the background in medieval European art, literature, or history to add much to discussions of the imagery. I have, however, been an amateur linguist and geographer my whole life, who really appreciates the art of writing, as in scripts and calligraphy.

What I'm discovering, and am humbled by, is that understanding the imagery is likely our only hope for understanding the text. What is standing in the way of decoding the VM is too little context. And if we find the fruits of -JKP-'s labors convincing, and accept that the VM's script is synthetic and likely idiosyncratic, that leaves the imagery and the physical characteristics of the book as our only leads for more context.
To check if there is a relation between words containing 'sh' and words containing 'ch' it is only necessary to count them. We all talk about the same manuscript no matter what we think about it. It should be no problem to agree on facts such as that the words <daiin> and <dain> occur seven and six times on page [link] and that the words <chol> and <shol> occur five and four times on page f1v.

for each shWord in shWords {
   shTypeCount++;
   shTokenSum += countToken(shWord);
   chWord = shWord.replaceAll('sh', 'ch');
   if (chWords.contains(chWord)) {
      chExistsTypeCounter++;
      // count the 'sh' tokens whose 'ch' counterpart exists
      chExistsTokenCounter += countToken(shWord);
      if (countToken(shWord) > countToken(chWord)) {
          shTypesMoreFrequentCounter++;
      }
   } else {
       chNotExistsTypeCounter++;
       shNotExistsTokenCounter += countToken(shWord);
   }
}
print(shTypeCount, shTokenSum, chExistsTypeCounter, chExistsTokenCounter, chNotExistsTypeCounter, shNotExistsTokenCounter, shTypesMoreFrequentCounter);

This is the result:
- there are 1211 'sh'-word-types with 4458 'sh'-word-tokens
- for 612 out of 1211 (50.5 %) 'sh'-word-types the corresponding 'ch'-word can be found
- for 3814 out of 4458 (85.8 %) 'sh'-word-tokens the corresponding 'ch'-word exists
- for all 'sh'-word-types with at least 4 tokens also the corresponding 'ch'-word can be found
- only 49 out of 612 'sh'-word-types (8 %) are used more frequently than the corresponding 'ch'-word-type

This means that the more frequent an 'sh'-word-type is, the more likely it is that the corresponding 'ch'-word-type exists. If there is a corresponding 'ch'-word, it typically occurs more frequently or at least equally frequently. There are 49 exceptions to this rule: <sho> (130 times) occurs more frequently than <cho> (68 times), <sheedy> (84) more than <cheedy> (59), <dshedy> (36) more than <dchedy> (27), <she> (25) more than <che> (2), <shee> (13) more than <chee> (1), ...
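For anyone who wants to repeat the count, here is a minimal runnable sketch of the loop above in Python. It assumes the transcription has already been split into a flat list of EVA words; the function name and the toy word list are made up for illustration and are not taken from Torsten's actual program or from any transcription.

```python
from collections import Counter

def sh_ch_pair_stats(words):
    """Count 'sh' word types/tokens and how many have a 'ch' counterpart."""
    counts = Counter(words)
    stats = Counter()
    for sh_word, sh_count in counts.items():
        if "sh" not in sh_word:
            continue
        stats["sh_types"] += 1
        stats["sh_tokens"] += sh_count
        # Look up the word obtained by replacing every 'sh' with 'ch'.
        ch_count = counts.get(sh_word.replace("sh", "ch"), 0)
        if ch_count:
            stats["types_with_ch"] += 1
            stats["tokens_with_ch"] += sh_count
            if sh_count > ch_count:
                stats["sh_more_frequent"] += 1
    return stats

# Toy input only; these counts are invented, not manuscript data.
toy = ["chol"] * 5 + ["shol"] * 4 + ["sho"] * 3 + ["cho"] * 2 + ["shey"]
stats = sh_ch_pair_stats(toy)
print(dict(stats))
```

Note that this sketch sums the 'sh' tokens for pairs where the 'ch' word exists, which is what the "3814 out of 4458" figure describes.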

There is a reason for this result. The chance for a 'sh'-word to occur on a page increases as more often the corresponding 'ch'-word appears on that page:
[link to voynichese.com]
[link to voynichese.com]

Sorry, but we are bound to the facts even if we don't like the outcome.
Those are interesting data, Torsten, but interpreting them is something else. It could be due to something like auto-copying, where the presence of one form on a page will more easily spawn the other one. But it could also be seen as prefixes to topical words, or word-initial morphological variations.
Torsten's post ended with:

Quote:Sorry, but we are bound to the facts even if we don't like the outcome.

and this is quite misleading. The counts are the facts; speculation about causes and effects is not.

All of the statistics presented in the post are based on the text of the entire MS. Then the reason is presented:

Quote:There is a reason for this result. The chance for a 'sh'-word to occur on a page increases as more often the corresponding 'ch'-word appears on that page:
[link to voynichese.com]
[link to voynichese.com]

This is the first time that statistics 'per page' enter the argument, so it is not based on what is written before.

The evidence seems to be the two links to voynichese.com that show the distribution per page of the most frequent words including 'ch' and 'sh'.

Going back to the facts, words including 'ch' are roughly twice as numerous as words including 'sh'. More than that, this holds true for similar word patterns: typically, for any word type including a 'ch', replacing 'ch' with 'sh' tends to yield a word with half the number of occurrences.

This is 'across the board'. It is an observation, regardless of what process is at the origin of this behaviour. Now statistics are always more reliable when based on larger numbers, and when looking at individual word types, most of the time we are working with relatively small samples. For any word pattern including a 'ch' that appears N times, the expected frequency of the corresponding sh-word seems to be roughly N/2, but in reality there is a distribution around that value. In some cases it will be less than N/2 and can even be zero. In other cases it will be more than N/2 and can even be greater than N. That these things happen is precisely what is shown by Torsten's statistics. There is a small number of words where the sh variant is more frequent than the ch variant. This says nothing very specific about the process that is behind the appearance of these words.
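The dispersion argument can be illustrated with a small calculation. This is only a sketch under a strong simplifying assumption that is not claimed for the MS itself: every token of a ch/sh pair is independently the 'sh' form with probability 1/3 (that is, 'ch' twice as frequent as 'sh'). The function name is hypothetical.

```python
from math import comb

def prob_sh_exceeds_ch(n, p_sh=1/3):
    """P(sh count > ch count) when each of n tokens is 'sh' with probability p_sh."""
    # 'sh' outnumbers 'ch' exactly when more than half of the n tokens are 'sh'.
    return sum(comb(n, k) * p_sh**k * (1 - p_sh)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# n is the combined number of tokens of one ch/sh word pair.
for n in (3, 9, 30, 90, 300):
    print(n, prob_sh_exceeds_ch(n))
```

Under this assumption, rare word pairs show the sh variant ahead quite often, and the chance shrinks as the pair gets more frequent.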

Also, it is shown that this happens for the text throughout the MS. Nothing is said about the behaviour per page. Of course, since a single page is a much smaller sample of text, one should expect a much greater dispersion of the statistics.

This is where the facts end.

Again, the only suggestion for the behaviour per page is given by the outputs of voynichese.com 

It is probably worth looking at them in detail to see if there is any evidence of the suggested cause. I find that there are plenty of pages where it does not hold even remotely, even for these very frequent words.
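A per-page check could be sketched along the following lines. It assumes the text is available as a mapping from page name to a list of EVA words; the function name and the toy pages are hypothetical and only show the shape of the tabulation.

```python
from collections import Counter, defaultdict

def sh_presence_by_ch_count(pages, ch_word, sh_word):
    """Map each per-page count of ch_word to (pages containing sh_word, pages total)."""
    table = defaultdict(lambda: [0, 0])
    for words in pages.values():
        counts = Counter(words)
        row = table[counts[ch_word]]
        row[0] += 1 if counts[sh_word] else 0
        row[1] += 1
    return {k: tuple(v) for k, v in sorted(table.items())}

# Hypothetical toy pages; real input would come from a transcription file.
toy_pages = {
    "p1": ["chol", "chol", "shol", "daiin"],
    "p2": ["chol", "daiin"],
    "p3": ["daiin", "dain"],
    "p4": ["chol", "chol", "dain"],
}
result = sh_presence_by_ch_count(toy_pages, "chol", "shol")
print(result)
```

With real data, the share of pages containing the sh-word in each row would show whether it actually rises with the count of the ch-word, keeping in mind that longer pages contain more of everything.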
(11-11-2019, 11:49 PM)Torsten Wrote: To check if there is a relation between words containing 'sh' and words containing 'ch' it is only necessary to count them.

There is no doubt that EVA-sh is somehow related to EVA-ch (at least because they belong to the same alphabet :D). The question is how they are related. Let's do some calculations related to the original question of the thread.

Let's assume EVA-sh and EVA-ch represent the same thing, and the author chooses one of them randomly with a probability of 50% each. I used MarcoP's graph, [link] and [link] tool for the calculations.
We have 427 occurrences of EVA-shedy and 501 of EVA-chedy. The probability that, out of 928 trials with two equally likely outcomes, one given outcome occurs 427 times or fewer is 0.83%. Say 1 in 100. That's not too encouraging, but not so improbable in itself. The thing is that when we take EVA-shol and EVA-chol, and then EVA-shey and EVA-chey, we will get roughly the same results, and since our three tests are independent we have the right to combine them. Our 1 in 100 quickly becomes 1 in 1,000,000, with a lot of words still to come...

OK, let's assume the probability value was wrong. Take the first three word pairs from MarcoP's graph and estimate the probability from them. I get 42% for EVA-sh words. But take EVA-sho (130 occurrences) and EVA-cho (69). The probability that a 42% event happens 130 times out of 199 is 0.0000000029%.
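Both figures can be checked with an exact binomial sum. The sketch below uses only the Python standard library; the helper names are made up, and it assumes that the quoted probabilities are cumulative tails ("427 or fewer", "130 or more"), which the post states for the first case but not explicitly for the second.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# EVA-shedy: 427 of 928 tokens, tested against a 50% chance of 'sh'.
p_shedy = lower_tail(427, 928, 0.5)
# EVA-sho: 130 of 199 tokens, tested against a 42% chance of 'sh'.
p_sho = upper_tail(130, 199, 0.42)
print(f"{p_shedy:.4%}")
print(f"{p_sho:.3e}")
```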

Well, maybe try yet another probability? One can try all possibilities, but to me it's obvious that every choice will fail. The EVA-sho/cho and EVA-shedy/chedy stats are very likely incompatible; their confidence intervals seem hardly to intersect.
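The remark about confidence intervals can be made concrete with a short sketch. It uses the Wilson score interval at roughly 95% (z = 1.96), which is a choice made here for illustration; the post does not say which interval was meant. The counts are the ones quoted above.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Share of the 'sh' form within each pair.
shedy = wilson_interval(427, 427 + 501)
sho = wilson_interval(130, 130 + 69)
print("shedy share:", shedy)
print("sho share:  ", sho)
print("intervals disjoint:", shedy[1] < sho[0])
```

If the two intervals do not overlap, no single fixed probability of choosing 'sh' fits both word pairs at this confidence level.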

So the choice between EVA-sh and EVA-ch isn't random. If so, it depends on context. But given the same context (EVA-edy), only one of EVA-shedy/chedy should exist. But who said the context is only the following symbols? It may be the previous ones, or even ones from previous paragraphs, or missing vowels in the case of an abjad. But then EVA-sh and EVA-ch appear to be not as similar as they seemed before...
It was my impression that Davidsch's original post was about words that begin with sh_ or ch_.
Taking that to be the case, I agree completely with nablator's comments in post #4 and MarcoP's post #5.

Using Takahashi transcription:
ch_ words: 5901 :: set of ch_ words: 1053 :: avg. length of ch_ word: 7.013
sh_ words: 3207 :: set of sh_ words: 535 :: avg. length of sh_ word: 6.766

This gives a difference in average length of about 0.25, agreeing nicely with nablator's number.
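Counts like these could be produced with something like the sketch below. It assumes a flat list of EVA words and that the average length is taken over distinct word types, measured in EVA characters; the post does not say whether types or tokens were averaged. The function name and the toy list are invented.

```python
def prefix_stats(words, prefix):
    """Return (token count, type count, mean type length) for words starting with prefix."""
    tokens = [w for w in words if w.startswith(prefix)]
    types = set(tokens)
    # Assumption: the average is over distinct types, in EVA characters.
    avg_type_len = sum(map(len, types)) / len(types) if types else 0.0
    return len(tokens), len(types), avg_type_len

# Toy list only; not taken from the Takahashi transcription.
toy = ["chol", "chol", "chedy", "shol", "shey", "daiin"]
ch_stats = prefix_stats(toy, "ch")
sh_stats = prefix_stats(toy, "sh")
print(ch_stats, sh_stats)
```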

As for sh_ and ch_ behaving the same.
[link]

There appears to be a marked difference between the two in the cosmological section, with ch_ appearing much more often within the circles.

As an observation for further research (I didn't do any stats), sh_ words seem to appear earlier in the lines than ch_ words.
sh_ words seem to like being the second word; f79r, f93r, f103v and [link] are notable.
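The missing stats on line position could be gathered with a sketch like this one. It assumes each line is already a list of EVA words in reading order; the function name and the toy lines are hypothetical.

```python
def line_position_stats(lines, prefix):
    """Return (mean 1-based position, share in second position) for words starting with prefix."""
    positions = [i for line in lines
                 for i, word in enumerate(line, start=1)
                 if word.startswith(prefix)]
    if not positions:
        return None
    mean_pos = sum(positions) / len(positions)
    second_share = positions.count(2) / len(positions)
    return mean_pos, second_share

# Toy lines only; invented for illustration.
toy_lines = [
    ["daiin", "shol", "chol"],
    ["chedy", "shey", "dain", "chol"],
    ["shol", "chol"],
]
sh_pos = line_position_stats(toy_lines, "sh")
ch_pos = line_position_stats(toy_lines, "ch")
print(sh_pos, ch_pos)
```

Since lines differ in length, comparing the two prefixes on the same set of lines is fairer than comparing either against a fixed expectation.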

Finally, if sh_ words were abbreviated versions of the same ch_ words, then the text in its decompressed form would yield a remarkable amount of alliteration.