ch and sh as dual-function elements

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: ch and sh as dual-function elements (/thread-5734.html)

ch and sh as dual-function elements - Labyrinthinesecurity - 13-05-2026

Working on the recipe section (folios 103r–116r), we found that ch and sh appear to carry two independent signals simultaneously.
The first is paragraph-level structure. A logistic regression using ch/sh-based features predicts tail vs no-tail paragraphs (those whose star has a tail versus those without) at a balanced accuracy of 86%, confirmed by within-folio permutation testing.
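
For readers unfamiliar with the term, here is a minimal sketch of what a within-folio permutation test could look like. It is not the code behind the 86% figure: it swaps the logistic regression for a much simpler statistic (difference in mean feature value between tail and no-tail paragraphs), and the folio names, the feature values and the helper names are all made up for illustration.

```python
import random
from collections import defaultdict
from statistics import mean

def mean_difference(rows):
    """rows: (folio, label, feature) tuples; label 1 = tail, 0 = no tail."""
    tail = [feat for _, label, feat in rows if label == 1]
    no_tail = [feat for _, label, feat in rows if label == 0]
    return mean(tail) - mean(no_tail)

def permute_within_folio(rows, rng):
    """Shuffle labels among the paragraphs of each folio, never across folios."""
    by_folio = defaultdict(list)
    for i, (folio, _, _) in enumerate(rows):
        by_folio[folio].append(i)
    labels = [label for _, label, _ in rows]
    for idxs in by_folio.values():
        folio_labels = [labels[i] for i in idxs]
        rng.shuffle(folio_labels)
        for i, lab in zip(idxs, folio_labels):
            labels[i] = lab
    return [(folio, lab, feat) for (folio, _, feat), lab in zip(rows, labels)]

def permutation_p_value(rows, n_perm=1000, seed=0):
    """Share of within-folio shuffles whose statistic is at least as extreme."""
    rng = random.Random(seed)
    observed = abs(mean_difference(rows))
    hits = sum(
        abs(mean_difference(permute_within_folio(rows, rng))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Synthetic data (hypothetical): 4 folios, 12 paragraphs each, with a
# deliberately built-in link between the label and the feature.
rng = random.Random(42)
rows = []
for folio in ["f103r", "f103v", "f104r", "f104v"]:
    for i in range(12):
        label = i % 2
        rows.append((folio, label, rng.gauss(0.20 + 0.10 * label, 0.05)))

p = permutation_p_value(rows)
print(f"within-folio permutation p-value: {p:.4f}")
```

Shuffling only inside each folio keeps folio-level differences fixed, so a small p-value cannot be explained by some folios simply having more tails and more ch/sh at the same time.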

The second is folio-level language. The ratio of (ch sh)o-type words predicts which Currier language a folio belongs to with ~99% accuracy manuscript-wide, even though all recipe folios are Currier B. The cho/che balance appears to encode a stable folio-level property operating above the paragraph.

The key thing is that these two signals are orthogonal.

RE: ch and sh as dual-function elements - tavie - 13-05-2026

I'm not sure why this is in News. It's not an article. Surely Analysis of the Text.

And can you not provide more detail about what you are arguing?

RE: ch and sh as dual-function elements - oshfdk - 13-05-2026

I've noticed a pattern with your posts in that you appear to identify some property using statistical or ML algorithms and then confirm the presence or absence of this property via permutation testing. I'm not sure these are valid results. Given a purely random sequence, one can identify many features that are just properties of this particular random sequence. When the sequence is shuffled, the features are gone, but this doesn't mean there was any significance in these features in the first place.

A quick practical example: I have a Python script that prints a random string of 50 lowercase characters and then prints a permutation of the same string.

I ran it and the first resulting string (the Original) was:

ehzcjusashtecbfuzjsrjgnyzoqqpeyvfnhzxbechhffwsddvp

Ok, I see "a feature": there are four instances of double characters here (qq, hh, ff, dd), all in the second half of the string, three of them near the very end. I can try to "confirm" that this is "a real feature" by looking at the second string (same letters, different order):

tazfscebzpogehufvwfnhyehnjurchzzbqvfqdcjsephdsxyjs

Now there is only one instance of a double character here (zz), a decrease of 75%! Interesting? Not really, because the fact is these are just two random strings made of the same letters in a completely random fashion.

The Voynich Manuscript is a fixed artifact with a certain sequence of characters. There may be a reason for some of the properties of this sequence, and there is definitely no reason at all for some other properties of this sequence. If a certain property is lost after shuffling the data, this by itself is not proof of anything.
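
The script itself was not posted, so the following is only a guess at something equivalent: a random 50-letter string, a permutation of it, and a count of adjacent double characters in each. The function names and the seed are arbitrary choices made for this sketch.

```python
import random
import string

def count_doubles(s: str) -> int:
    """Count adjacent positions where the same character appears twice."""
    return sum(1 for a, b in zip(s, s[1:]) if a == b)

def shuffled(s: str, rng: random.Random) -> str:
    """Return a random permutation of the characters of s."""
    chars = list(s)
    rng.shuffle(chars)
    return "".join(chars)

rng = random.Random(0)  # arbitrary seed, only so the run is repeatable
original = "".join(rng.choice(string.ascii_lowercase) for _ in range(50))
permuted = shuffled(original, rng)

print(original, count_doubles(original))
print(permuted, count_doubles(permuted))

# The same count applied to the two strings quoted in the post.
post_original = "ehzcjusashtecbfuzjsrjgnyzoqqpeyvfnhzxbechhffwsddvp"
post_permuted = "tazfscebzpogehufvwfnhyehnjurchzzbqvfqdcjsephdsxyjs"
print(count_doubles(post_original), count_doubles(post_permuted))
```

Running it with different seeds gives a feel for how much the number of doubles varies between strings that are, by construction, pure noise.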

RE: ch and sh as dual-function elements - Bluetoes101 - 13-05-2026

There's a fair few "not news" news posts, they should be moved to "analysis of the text" imo.
Not that they are bad posts, just not "news".

RE: ch and sh as dual-function elements - Koen G - 14-05-2026

Thread moved.

RE: ch and sh as dual-function elements - Labyrinthinesecurity - 14-05-2026

(13-05-2026, 10:45 PM)oshfdk Wrote: I've noticed a pattern with your posts in that you appear to identify some property using statistical or ML algorithms and then confirm the presence or absence of this property via permutation testing. I'm not sure these are valid results. Given a purely random sequence, one can identify many features that are just properties of this particular random sequence. When the sequence is shuffled, the features are gone, but this doesn't mean there was any significance in these features in the first place.

The tail/no-tail distinction is not something extracted from the text itself. It is an independently observable physical property of the manuscript; if it were derived from the text, the circularity would of course be fatal. So I think this makes the permutation results non-circular, unlike perhaps many similar claims about the manuscript.

RE: ch and sh as dual-function elements - oshfdk - 14-05-2026

(14-05-2026, 12:28 PM)Labyrinthinesecurity Wrote: The tail/no-tail distinction is not something extracted from the text itself. It is an independently observable physical property of the manuscript; if it were derived from the text, the circularity would of course be fatal. So I think this makes the permutation results non-circular, unlike perhaps many similar claims about the manuscript.

I'm not sure why this makes any difference; the text itself is also an observable physical property of the manuscript.

Let me try another example. Suppose we discover that the presence or absence of the tail on a star highly correlates with the sixth character of the corresponding paragraph. We shuffle the stars and paragraphs and this correspondence is gone. Does this mean anything?

For starters, if we assume a purely random sequence of tails and paragraphs, then given that there are hundreds of character positions in each paragraph, there ought to be some that appear to correlate more with the tails and some that appear to correlate less, purely by chance. The question of how likely a statistical correspondence is to appear by chance depends not only on the properties intrinsic to this correspondence, but also on the total number of different statistical observations we make. With modern computational resources and machine learning algorithms, it's possible to run analyses on billions of statistics and uncover a lot of seemingly interdependent features that do not correspond to any actual cause and effect. Proving that the result is not spurious just by shuffling the data is not possible in this case, I think.
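
The argument above can be illustrated with a small simulation on pure noise. Everything here is an assumption made for the sketch: 40 paragraphs, 200 candidate features (one random bit per "character position"), random tail labels, and a simple agreement score. It compares a naive permutation p-value for the best feature found after searching with a p-value that repeats the whole search on every shuffle (a max-statistic correction).

```python
import random

def agreement(bits, labels):
    """Share of paragraphs where the bit matches the label, folded so that
    0.5 means no association and 1.0 means a perfect (anti-)match."""
    match = sum(b == l for b, l in zip(bits, labels)) / len(labels)
    return max(match, 1 - match)

rng = random.Random(1)  # arbitrary seed
n_paragraphs, n_positions, n_perm = 40, 200, 300

# Random tails and random features: no real relationship exists by construction.
tails = [rng.randint(0, 1) for _ in range(n_paragraphs)]
features = [[rng.randint(0, 1) for _ in range(n_paragraphs)]
            for _ in range(n_positions)]

# Search all positions and keep the one that "predicts" tails best.
scores = [agreement(col, tails) for col in features]
best_k = max(range(n_positions), key=scores.__getitem__)
best = scores[best_k]

naive_hits = 0  # shuffles where the already-chosen feature does as well
max_hits = 0    # shuffles where the best of ALL features does as well
for _ in range(n_perm):
    perm = tails[:]
    rng.shuffle(perm)
    if agreement(features[best_k], perm) >= best:
        naive_hits += 1
    if max(agreement(col, perm) for col in features) >= best:
        max_hits += 1

naive_p = (naive_hits + 1) / (n_perm + 1)
corrected_p = (max_hits + 1) / (n_perm + 1)
print(f"best position {best_k}: agreement {best:.2f}")
print(f"naive p-value     {naive_p:.3f}")
print(f"corrected p-value {corrected_p:.3f}")
```

The corrected p-value can never be smaller than the naive one, because any shuffle that counts for the chosen feature also counts for the maximum over all features. Whether a given published analysis needs this kind of correction depends on how many features were tried before the reported one was selected.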