(17-11-2019, 09:58 PM)farmerjohn Wrote: And if one thinks that EVA-sh and EVA-ch are the same based on correlation values, he must be very cunning when explaining almost equally high correlation coefficient for EVA-ok and EVA-qok
Indeed there is not only a relation between 'ch' and 'sh'. There are far more relations. One of them is indeed between 'ok' and 'qok'.
I also agree with you that word pairs with high correlation coefficients are quite normal for the Voynich manuscript (see Timm & Schinner 2019, p. 2ff).
Wasn't there a theory that the biological section was written first? In that case the chronological order could be the reverse of that table.
After digesting this thread and doing some searches myself on Voynichese.com, I think we have enough evidence to say that the digraphs [ch], [sh], [ee], [se], [es], [c*h], and [e*e] belong in one class in terms of their observed behavior and occurrence. I don't think we have enough evidence to conclude definitively that any of these are completely equivalent to each other, let alone speculate on what any of them mean symbolically, if anything. But it's clear enough to me that every vord has between 0 and 6 "c-curves" in a row, and that this can be seen as a fundamental property of a vord. These c-curves, along with an optional gallows or [d], form the core of a vord, and have a semi-complicated set of rules about how they're connected in pairs to each other with crossbars.
If the VMS is a conlang or some similar kind of symbolic classification system, the number of c-curves could be one of the values that each vord encodes, and could give us some clues as to what kind of information the system is designed to record. Or, if the VMS is meaningless and its vords artificially generated, this could mean that one of the dice rolled or one of the volvelles spun determines the number of c-curves.
I do not agree with your big picture, RenegadeHealer; it is much too soon to draw such conclusions. I want to focus on this smaller group first.
Returning here after a while, I must admit there is a lot of data in this thread, and I am now a bit lost in the discussion.
Much more info came out of the posts than expected.
I wonder whether it would be possible to examine the same text parts for ch_/sh_ correlations if we did NOT accept the Voynich vords as given, but glued the text together, on the assumption that the spaces cannot be trusted.
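One rough way to try this is to strip the word separators from each page's transliteration and count the raw ch and sh digraphs in the glued string. This is only a sketch: the function name `count_benches`, the set of separator characters, and the sample line are assumptions for illustration and are not taken from any transliteration file.

```python
def count_benches(page_text, benches=("ch", "sh"), separators=" .,\n"):
    """Count bench digraphs in a page after removing word separators."""
    # Assumption: spaces, dots and commas mark (uncertain) word breaks in EVA.
    glued = "".join(c for c in page_text if c not in separators)
    # str.count counts non-overlapping matches; "ch" and "sh" cannot
    # overlap with themselves, so nothing is missed here.
    return {bench: glued.count(bench) for bench in benches}


# Hypothetical sample line, not a real folio.
sample = "chol shol.chor daiin shey"
counts = count_benches(sample)
print(counts)
```

The per-page counts could then go into the same per-folio correlation as before. Gluing also creates digraphs across former word boundaries (a vord ending in c followed by one starting with h), which is the point of the exercise but should be kept in mind when comparing results.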
(13-11-2019, 09:37 PM)Torsten Wrote: Additionally I have calculated the token frequencies for all 612 ch/sh word pairs for all folios. The resulting correlation coefficient is +0.55. To make comparison easier I have also calculated the correlation coefficient for the word frequencies for all 612 ch/sh word pairs for the whole MS. This correlation coefficient is +0.93.
Pearson's Correlation(count(chWords@folios),count(shWords@folios)): +0.55 (n=225 folios*612 word pairs=137700,p=too close to zero to calculate)
Pearson's Correlation(count(chWords@VMS), count(shWords@VMS)): +0.93 (n=612 word pairs,p=4.54E-268)
Hi, Torsten.
I have some doubts about using correlation in these two cases.
To my mind, correlation is applicable when we have some repeatable process (with two outputs) and each new iteration is independent of the previous one. These properties must hold: we can freely add measurements, or take a subset of the existing measurements, and the correlation coefficient will be approximately the same in all cases, just more or less precise. In simple words, the correlation value can be calculated at any time, from any place, in any direction. So calculating "per page" is acceptable here, and new pages would only be to the good.
But when we calculate "per word", not only have we chosen the words manually, but the only new measurements we can add are {0, 0} (and all of them are perfectly legitimate: EVA-sheaktoiiin/cheaktoiiin are also words, so why not include them?). So here the resulting value depends heavily on the selected data.
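To make the {0, 0} point concrete, here is a toy sketch. The counts are made up (they are not VMS data) and `pearson` is a plain textbook implementation written for this example, not the tool used for the figures quoted above.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up counts for four ch/sh word pairs (not VMS data).
ch_counts = [1, 2, 3, 4]
sh_counts = [2, 1, 4, 3]
r_selected = pearson(ch_counts, sh_counts)

# Add 100 hypothetical word pairs that never occur: each contributes {0, 0}.
padding = [0] * 100
r_padded = pearson(ch_counts + padding, sh_counts + padding)

print(round(r_selected, 2), round(r_padded, 2))
```

In this toy case the coefficient rises from 0.6 to above 0.9 purely because of the added {0, 0} pairs, so the value says as much about which word pairs were admitted as about the pairs themselves.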
(21-11-2019, 01:05 PM)Davidsch Wrote: I do not agree with your big picture, RenegadeHealer; it is much too soon to draw such conclusions. I want to focus on this smaller group first.
Returning here after a while, I must admit there is a lot of data in this thread, and I am now a bit lost in the discussion.
I could be wrong. I plan on crunching and tabulating data to try to support my claim sometime this week or next. I'll post it here, with my tentative interpretation, as soon as I have it finished. I'll probably start a new thread, since my hypothesis differs slightly from the title of this thread, even though it is inspired by it.
(17-11-2019, 02:21 PM)MarcoP Wrote: These are the plots for the next four more frequent ch- words vs their sh- counterparts.
Your graphs for the next four most frequent word pairs chol/shol, chor/shor, chey/shey, and cheey/sheey are fantastic. Not only does the plot for chol/shol look similar to the plot for chor/shor, but the plot for chey/shey also looks similar to the plot for cheey/sheey. This is not a coincidence. The most frequently used words in Herbal A besides <daiin> are <chol> and <chor>. Moreover, the correlation between <chol> and <chor> is about as high as the correlation between <chol> and <shol>. Besides <chol>/<chor>, <shol>/<shor> are also frequently used in Herbal A, and a positive correlation coefficient is computed for <shol>/<shor> as well.
Note: An observation of Currier for chol/chor was: "The symbol groups 'chol' and 'chor' are very high in 'A' and often occur repeated; low in 'B'".
Pearson's Correlation(chol[396] ,shol[186]) : +0.44 (n=225,p=4E-12)
Pearson's Correlation(chor[219] ,shor[97]) : +0.22 (n=225,p=5E-11)
Pearson's Correlation(chol[396] ,chor[219]) : +0.42 (n=225,p=5E-11)
Pearson's Correlation(shol[186] ,shor[97]) : +0.22 (n=225,p=5E-11)
chol shol chor shor word count
----- ----- ----- ----- -----------
Herbal (A) 228 104 155 63 8,087
Pharma (A) 45 11 24 5 2,529
Astro 8 2 2,136
Cosmo 19 8 8 8 2,691
Herbal (B) 13 11 6 4 3,233
Stars (B) 62 24 19 9 10,673
Biological (B) 14 18 1 3 6,911
This means we are not merely observing related word pairs; we are in fact observing networks of four or more similar words.
(17-11-2019, 02:21 PM)MarcoP Wrote: Per-section correlation also tends to be low, but there are a couple of possibly interesting exceptions, e.g. chey/shey in Stars.
Indeed, the four most frequently used word types starting with 'ch' or 'sh' in the Stars section are <chedy>, <chey>, <shedy>, and <shey>. These four word types also form a network of related words.
Pearson's Correlation(chedy[501],shedy[426]): +0.84 (n=225,p=5E-60)
Pearson's Correlation(chey[344] ,shey[283]) : +0.67 (n=225,p=6E-31)
Pearson's Correlation(chedy[501],chey[344]): +0.71 (n=225,p=6E-36)
Pearson's Correlation(shedy[426],shey[283]): +0.70 (n=225,p=6E-34)
section           chey   shey  chedy  shedy  word count
--------------   -----  -----  -----  -----  ----------
Herbal (A)          55     40      1      0       8,087
Pharma (A)          21     19      1      1       2,529
Astro                2      7      4      0       2,136
Cosmo               20     16     24     17       2,691
Herbal (B)          20      6     62     35       3,233
Stars (B)          123     84    190    113      10,673
Biological (B)      94     99    210    247       6,911
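As a rough cross-check of the network idea, the section totals in the table can be correlated directly. Note the assumptions: this is not the per-folio calculation quoted above (n=225). It uses only the seven section totals, `pearson` is a plain textbook implementation written for this sketch, and because the sections differ greatly in size, part of any correlation between raw counts simply reflects section length.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Section totals copied from the table, in the order:
# Herbal A, Pharma A, Astro, Cosmo, Herbal B, Stars B, Biological B.
counts = {
    "chey": [55, 21, 2, 20, 20, 123, 94],
    "shey": [40, 19, 7, 16, 6, 84, 99],
    "chedy": [1, 1, 4, 24, 62, 190, 210],
    "shedy": [0, 1, 0, 17, 35, 113, 247],
}

# The two direct ch/sh pairs plus the two "indirect" links of the network.
pairs = [("chedy", "shedy"), ("chey", "shey"), ("chedy", "chey"), ("shedy", "shey")]
section_r = {pair: pearson(counts[pair[0]], counts[pair[1]]) for pair in pairs}

for (a, b), r in section_r.items():
    print(f"{a}/{b}: {r:+.2f}")
```

With only seven data points these section-level values are far less precise than the per-folio figures, but all four links of the network come out strongly positive, not just the two ch/sh pairs.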
Networks of four similar word types are not unusual for the MS; an example of such a network can even be found on a single page. These networks mean that besides the direct relation between two ch/sh words, indirect relations might also exist. An indirect relation for <chedy>/<shedy> is, for instance, <chedy> - <chey> - <shey> - <shedy>. In the same way, the word <chol> is also related to <chor> via <shol> - <shor>.
These indirect relations also help to understand the correlation coefficient of +0.93 for the word frequencies for all 612 ch/sh word pairs for the whole MS.
Pearson's Correlation(count(chWords@VMS), count(shWords@VMS)) : +0.93 (n=612,p=4E-268)
Pearson's Correlation(count(chWords@folios),count(shWords@folios)): +0.55 (n=225*612=137700,p=too low to calculate)
Probably unsurprisingly, I too have an interest in the question of ch vs sh: but at the same time I have a deep mistrust of low-level interpretations that first require Voynichese 'words' to be literally words (i.e. that they must necessarily follow an explicitly language-like grammar) - this seems to me to be something we should be testing rather than assuming.
As such, I prefer to look (as do cryptologists in general) to contact tables between tokens, with the inevitable Voynichological proviso there being that we don't yet know what Voynichese glyphs comprise tokens. All the same, looking at token adjacency generally gives us larger (and as a consequence, probably more statistically reliable) instance counts to work with.
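A contact table of this kind can be sketched in a few lines. Several things here are assumptions made for illustration: a "token" is simplified to a single EVA glyph, word separators are removed before counting, and the sample line is made up and is not a real folio.

```python
from collections import Counter


def following_glyphs(text, bench, separators=" .,\n"):
    """Count the glyph that directly follows each occurrence of `bench`."""
    # Assumption: word breaks are ignored, so contacts can span a space.
    glued = "".join(c for c in text if c not in separators)
    followers = Counter()
    start = glued.find(bench)
    while start != -1:
        nxt = start + len(bench)
        if nxt < len(glued):  # a bench at the very end has no follower
            followers[glued[nxt]] += 1
        start = glued.find(bench, start + 1)
    return followers


# Hypothetical sample line, not a real folio.
sample = "chedy shedy chor shdy chdy"
ch_contacts = following_glyphs(sample, "ch")
sh_contacts = following_glyphs(sample, "sh")
print(ch_contacts, sh_contacts)
```

Running the same function separately over the A pages and the B pages would give the two distributions to compare. Multi-glyph followers such as or / ol / ar / al would need a longer lookahead than this single-glyph sketch provides.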
It's well known that chor (and, to a large extent, shor) is much more common in A pages than in B pages:
- chor A = chor B x 4.82
- shor A = shor B x 2.44
Contrariwise, che and she are much more common in B pages than in A pages:
- che B = che A x 3.97
- she B = she A x 4.90
What isn't so well known (I think) is that chd and shd are vastly more common in B pages:
- chd B = chd A x 20.74
- shd B = shd A x 23.14
Of the 726 chd in B pages, 504 are chdy: and of the 162 shd in B pages, 100 are shdy.
My conclusion is that even though ch and sh have broadly similar ratios in A and B...
- ch A = ch B x 2.82
- sh A = sh B x 2.18
...the ways that ch and sh touch following tokens seem to have widely different distributions in A and B, and I think this is almost entirely incompatible with the suggestion that ch and sh are essentially expressing the same thing.
PS: it's also not widely mentioned that shch and chsh are equally abhorred in both A and B (shch A = 6, chsh A = 4, shch B = 6, chsh B = 5), a strongly consistent behaviour that is surely next to impossible for any kind of edit distance 'algorithm' to yield.
*ducks under table waiting for the almost inevitable barrage of edit distance spam to pass*
(22-11-2019, 02:27 AM)nickpelling Wrote: My conclusion is that even though ch and sh have broadly similar ratios in A and B...
- ch A = ch B x 2.82
- sh A = sh B x 2.18
...the ways that ch and sh touch following tokens seem to have widely different distributions in A and B, and I think this is almost entirely incompatible with the suggestion that ch and sh are essentially expressing the same thing.
Hi Nick,
I am not sure I can follow your argument. But I agree that statistics for the character following the two benches are interesting. Here I collected percentages for bench+char patterns in A and B. K and T correspond to ckh and cth. For instance, the first four bars represent
% of A ch- words following the che- pattern
% of A sh- words following the she- pattern
% of B ch- words following the che- pattern
% of B sh- words following the she- pattern
[attachment=3724]
ch- and sh- appear to behave rather consistently, with ch- having high values for A or B when sh- also has high values for the same "language". But the deviations between the first four bars appear to be large enough to be relevant: in both A and B, she- tends to be relatively more frequent than che-. The next four bars for bench+o are almost perfectly aligned between ch- and sh-, and all the other sequences are considerably rarer. I don't think this is what you meant, since this can be observed on the whole text and does not depend on A vs B. Anyway, it seems to be a significant difference in the behaviour of ch- vs sh-. It is likely that the relative preference for she- vs che- has already been mentioned in this thread, but following everything is becoming difficult.
The phenomenon can also be seen in the top histogram: she- words tend to have higher frequencies than the corresponding che- words. It also shows in the word-type plots: Xhe- words tend to lie above other word types (higher she- counts).
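For anyone who wants to reproduce the bench+char percentages described above, a minimal sketch follows. The function name `bench_pattern_shares` and the word list are hypothetical, the K and T codes for ckh and cth are not handled, and the sketch assumes the text has already been split into words for one "language" (A or B).

```python
from collections import Counter


def bench_pattern_shares(words, bench):
    """Percentage of bench-initial words, keyed by the glyph after the bench."""
    initial = [w for w in words if w.startswith(bench)]
    if not initial:
        return {}
    followers = Counter(w[len(bench)] for w in initial if len(w) > len(bench))
    # Denominator: all words starting with the bench, including a bare bench.
    return {glyph: 100 * n / len(initial) for glyph, n in followers.items()}


# Hypothetical word list standing in for the A pages (not real counts).
words_a = ["chol", "chor", "chey", "shol", "shey", "shey", "daiin"]
ch_shares = bench_pattern_shares(words_a, "ch")
sh_shares = bench_pattern_shares(words_a, "sh")
print(ch_shares, sh_shares)
```

Calling it once per bench and per language gives the four values that make up each group of bars in the chart, for example `ch_shares["e"]` for the share of ch- words that follow the che- pattern.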
Hi Marco,
The difference is that my stats relate to all the places in A and B where ch/sh are followed by other tokens (e.g. EVA e / or / ol / ar / al, etc), not just at the very start of words.
You have to be very wary about drawing inferences about Voynich words (particularly line-initial and line-final words), as you know well.
The stats relating to chd and shd are quite significant, in my opinion.
Cheers, Nick