The Voynich Ninja

Pages: 1 2

Hello, I have a very quick question:

Is it already known that words containing [sh], and specifically those beginning [sh], are more common in the first line of a paragraph?

The presence of [f] and [p] in this location has been known for a while, but I am uncertain whether [sh] has been mentioned before. Can anybody recall?

(Also, seemingly, words ending [chdy], but I'm less certain on this.)

Hi Emma,
the phenomenon was discussed by Patrick in his 2022 Malta paper (Figure 3).

Patrick Feaster Wrote: Words beginning with [Sh] turn out to be distinctly more prevalent in the first lines of paragraphs than words beginning with [ch], even though they’re consistently less prevalent in lower lines

Thank you

I'm not surprised it has already been found, but glad that it's real.

I don't know if it was covered before Patrick but my assumption was that it probably would have been at some point.

I have work at the moment that I'm hoping to present for Voynich Day involving glyph behaviour at various positions in the line and paragraph with relation to both their position in the word and the scribe. sh in the top row is one of them, although it's not always demonstrably more frequent for every section e.g. the effect for initial sh is much smaller in the balneological section and seems indistinguishable from random chance. Not fully written up yet, and I'm trying to work through a few complications, but if you're also looking into this, happy to share informal observations.

Hi Emma,
I ran a script I put together in 2021, when I first learned of Patrick's research. These plots show frequencies according to vertical and horizontal position within the paragraph. Figures are expressed as the % of words containing a feature. I considered all paragraph text, ignoring uncertain spaces.

[attachment=8314]
Word-initial 'sh' (comparable with blue in figure 3 of Patrick's paper)

[attachment=8316]
'sh' in any word position

[attachment=8315]
'chdy' in any word position (~95% of occurrences are word final):

I would say that the plot above confirms that chdy is more frequent in the first line of paragraphs (though of course numbers are about one order of magnitude smaller than those for 'sh'). About 10% of the occurrences are of the pchdy type, and these of course concentrate in the first line. But also excluding 'pchdy', a preference for the first line is still there:

[attachment=8313]
plot for pchdy

[attachment=8312]
Xchdy excluding pchdy

[attachment=8318]
word initial 'chdy' (basically, the word 'chdy')

[attachment=8317]
sum of the previous two (i.e. all occurrences of chdy, excluding pchdy)

It's interesting that the isolated peak at the middle of the last row appears to be due to the word 'chdy' specifically. Contrary to all other 'chdy' patterns, the stand-alone word chdy does not seem to show a preference for the first line.

As always, it's possible I made errors, but at least the plot for initial 'sh' appears to be consistent with Patrick's plot.
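The kind of tally described in this post can be sketched as follows. This is a minimal illustration, not the script that produced the plots: the data layout (each paragraph a list of lines, each line a list of EVA words), the function name `feature_rate_by_line`, and the toy words are all assumptions, and only the vertical (line) position is tracked.

```python
from collections import defaultdict


def feature_rate_by_line(paragraphs, has_feature):
    """Tally, per line index (0 = first line of a paragraph), how many
    words satisfy `has_feature`.

    Returns {line_index: (hits, total_words, percent)}.
    Hypothetical helper: `paragraphs` is assumed to be a list of
    paragraphs, each a list of lines, each a list of EVA word strings.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for paragraph in paragraphs:
        for line_index, line in enumerate(paragraph):
            for word in line:
                totals[line_index] += 1
                if has_feature(word):
                    hits[line_index] += 1
    return {
        i: (hits[i], totals[i], 100.0 * hits[i] / totals[i])
        for i in sorted(totals)
    }


# Toy input, invented purely to exercise the function.
toy_paragraphs = [
    [["shedy", "qokeedy", "pchdy"], ["chedy", "shol", "daiin"]],
    [["sheol", "otchdy"], ["qokain", "chdy"]],
]

# Word-initial 'sh'.
initial_sh = feature_rate_by_line(toy_paragraphs, lambda w: w.startswith("sh"))

# 'chdy' anywhere in the word, excluding the 'pchdy' type.
chdy_no_p = feature_rate_by_line(
    toy_paragraphs, lambda w: "chdy" in w and "pchdy" not in w
)

for line_index, (hits, total, pct) in initial_sh.items():
    print(f"line {line_index}: {hits}/{total} words begin with sh ({pct:.1f}%)")
```

Passing the feature as a predicate keeps the same loop usable for initial 'sh', 'sh' anywhere, and the various 'chdy' cases.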

(20-03-2024, 01:24 PM)tavie Wrote: I don't know if it was covered before Patrick but my assumption was that it probably would have been at some point.

I have work at the moment that I'm hoping to present for Voynich Day involving glyph behaviour at various positions in the line and paragraph with relation to both their position in the word and the scribe. sh in the top row is one of them, although it's not always demonstrably more frequent for every section e.g. the effect for initial sh is much smaller in the balneological section and seems indistinguishable from random chance. Not fully written up yet, and I'm trying to work through a few complications, but if you're also looking into this, happy to share informal observations.

That's good to know. I'm also gearing up to present something (if I'm selected), but it was much broader in scope. Now that I'm aware of what your talk might cover, I'll adjust accordingly and try not to overlap.

(22-03-2024, 08:51 AM)MarcoP Wrote: ...

I would say that the plot above confirms that chdy is more frequent in the first line of paragraphs (though of course numbers are about one order of magnitude smaller than those for 'sh'). About 10% of the occurrences are of the pchdy type, and these of course concentrate in the first line. But also excluding 'pchdy', a preference for the first line is still there:

...

When you say that [chdy] is an order of magnitude less than [sh], are we talking about just raw token counts? Not an order of magnitude less in terms of percentage? I assume the former and not the latter, but I just want to check.

Hi Emma,
these percentages are computed on word tokens. I counted 537 tokens for 'chdy' vs 4417 for 'sh'. Since each percentage is computed by dividing by the same total number of tokens (35131), the ratio between the two is the same for counts and for percentages: about 1.5% for 'chdy' tokens vs about 12.6% for 'sh'.

EDIT: to make the plots clearer, I should probably multiply the frequencies by 100 making them actual percentages. 0.03 in the last plot corresponds to 3%.
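As a quick check of the arithmetic, using only the three counts quoted in this post (the variable names are just for illustration):

```python
# Token counts as quoted in the post.
total_tokens = 35131
chdy_tokens = 537
sh_tokens = 4417

# Percentages of all word tokens.
chdy_pct = 100 * chdy_tokens / total_tokens
sh_pct = 100 * sh_tokens / total_tokens

# Dividing both counts by the same total leaves their ratio unchanged.
count_ratio = sh_tokens / chdy_tokens
pct_ratio = sh_pct / chdy_pct

print(f"chdy: {chdy_pct:.2f}%  sh: {sh_pct:.2f}%")
print(f"sh/chdy ratio: {count_ratio:.1f} (counts), {pct_ratio:.1f} (percentages)")
```

The ratio comes out at roughly 8.2 either way, which is where "about one order of magnitude" comes from.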

Here's a display of the relative frequency of words containing [chdy] that I made using the code at [link], with top and bottom lines and first, second, and last words of lines separated out. Brighter = more prevalent. This complements Marco's plots for [chdy].

[attachment=8338]

Somehow I got sucked into this question. After a 'quick' shadow-calculation of results from the thread, I had the impression that the distribution of sh-initial words in paragraphs is qualitatively different than that of chdy-not-preceded-by-p. Namely, the former favors the literal first line of the paragraph (like p itself), while the latter shows a general upward trend. Now I am not so sure, but here is another set of plots.

The issue can be muddled by the preponderance of short paragraphs. Of all paragraphs tagged as such in IT2a-n.txt, about half are less than 5 lines long. Let us consider those that are 5 lines or longer.

In the maps below, the coordinates are line number (rows) and ordinal EVA character (columns); 355 paragraphs have been aligned and stacked together. On the left panel, each cell represents the number of sh-initial words beginning at that point. This absolute measure fades to the right and down with the fading population of longer lines and paragraphs. Dividing by the total number of words that begin at each cell gives the fractional map on the right. This density is flat-ish across the paragraph, but necessarily shows greater statistical scatter at the margins:

[link]

The greater concentration of sh-initial words in the first line specifically is visible.
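The construction of the two panels (absolute counts, then counts divided by the number of words starting in each cell) can be sketched like this. It is not the code behind the maps: the input layout (each line a string of space-separated EVA words), the choice to count the separator as one column, and the name `start_maps` are all assumptions made for illustration.

```python
def start_maps(paragraphs, is_hit, min_lines=5):
    """Build per-cell maps keyed by (line number, character offset).

    hits[(row, col)]     = words satisfying `is_hit` that begin at the cell
    totals[(row, col)]   = all words that begin at the cell
    fraction[(row, col)] = hits / totals, for cells where any word begins

    Hypothetical helper: paragraphs shorter than `min_lines` are skipped,
    and each word separator is assumed to occupy one column.
    """
    hits = {}
    totals = {}
    for paragraph in paragraphs:
        if len(paragraph) < min_lines:
            continue
        for row, line in enumerate(paragraph):
            col = 0
            for word in line.split(" "):
                if word:
                    key = (row, col)
                    totals[key] = totals.get(key, 0) + 1
                    if is_hit(word):
                        hits[key] = hits.get(key, 0) + 1
                col += len(word) + 1  # advance past the word and its separator
    fraction = {key: hits.get(key, 0) / n for key, n in totals.items()}
    return hits, totals, fraction


# Toy input, invented for the demo; min_lines is lowered so it is not filtered out.
toy = [
    ["shol daiin", "chol shedy"],
    ["shey okal", "qokal dal"],
    ["shol"],  # too short, skipped when min_lines=2
]
hits, totals, fraction = start_maps(toy, lambda w: w.startswith("sh"), min_lines=2)
for key in sorted(totals):
    print(key, hits.get(key, 0), totals[key], fraction[key])
```

Even in the toy data, a cell where only one word begins has a fraction of either 0 or 1, which is the same scatter at thinly populated margins that the fractional map shows.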

A discrete break is not visible in the (noisier) density of chdy-not-preceded-by-p:

[link]

In the upper portion, where all paragraphs are represented, there is a plausible concentration toward the middle of the line, as seen in earlier replies.

Comparing cumulative fractions in successive lines is suggestive...

[link]

...and deceptive, thanks to the much smaller population of chdy. Here, again, a statistical test might be necessary.
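The post does not say which test would be used. One common option for comparing a rate between two groups of lines (say, first lines vs. all later lines) is a two-proportion z-test, sketched below with the standard library only. The function name and the example counts are placeholders, not values from the manuscript; with counts as small as those for chdy, an exact test might be the safer choice.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test with a pooled standard error.

    k1 of n1 words in group 1 have the feature, k2 of n2 in group 2.
    Returns (z, p_value). Relies on the normal approximation, so it is
    only reasonable when the counts are not too small.
    """
    p1 = k1 / n1
    p2 = k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Placeholder counts, NOT real data: 50 of 100 words vs. 10 of 100 words.
z, p_value = two_proportion_z(50, 100, 10, 100)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

In practice `k1, n1` would be the hits and total words in first lines and `k2, n2` the same for the remaining lines, taken from whatever tally is being tested.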

Finally, the same data in yet another implementation of R/D coordinates:

[link]

Altogether, a fair illustration of how R/D coordinates can reveal some patterns but literally blur line-specific ones.

Hopefully any remaining errors and inconsistencies are negligible.

Emma May Smith

MarcoP

Emma May Smith

tavie

MarcoP

Emma May Smith

Emma May Smith

MarcoP

pfeaster

obelus