The Voynich Ninja

Full Version: f and p appear predominantly in the first lines of paragraphs
You already know ad nauseam my three main opinions on this, Jorge, so I'll just restate them for anyone who didn't see the gallows segment I did at Voynich Day:

1.  /k/ and /t/ do not always behave alike.  In the Stars section ("Scribe 3"), /k/ is extremely underrepresented in top lines relative to lower lines, while /t/ seems to be overrepresented.  There it acts more like /p/ and /f/ than like /k/.  So I'd be cautious about merging the glyphs into a binary distinction between the /k/ and /t/ group and the /p/ and /f/ group.

2. There's no easy substitution between the embellished and non-embellished gallows when you look at the rest of the word.  In Stars, /pch/ is an extremely common gallows cluster in the headline, while /ke/ - in particular /kee/ - is an extremely common cluster in the lower lines that is greatly underrepresented in the top lines.  If we say /p/ is an embellishment of /k/, where are the /pe/ words?  If we say /p/ is an embellished ligature of /ke/, then that would make /pch/ the embellished equivalent of /kech/, which is not common in the lower lines.  Etc etc.

(NB - the gallows mismatch is less of a problem for Scribe 1, although only in Herbal A, where /kch/ is more common in the lower lines than /ke/ anyway.)

3.  I can imagine there might be some difference in word choice in the top lines and that this would cause some degree of distortion in the stats.  But I struggle to believe it could be a decisive explanation, given the size of the /p/ dominance, the behaviour of /ch/ suddenly switching from being word-initial to mainly word-middle, the increase in /sh/, etc.  There could be a language change, e.g. a herbal might start with Latin names and then move on to describing the herb in the vernacular, but we see this across the different sections, and we also have to bear in mind similarly distinct behaviour at Line Start (including /ch/ becoming word-middle there as well).
(17-01-2026, 07:14 PM)tavie Wrote: /k/ and /t/ do not always behave alike.

They are not completely alike, but they are quite similar.  So lumping them into a single "tike" meta-symbol is a reasonable option when doing character-level statistics.

In fact, when doing this sort of study, one should begin by "projecting" the data down to the smallest "space" where some insight can be obtained.  If one tries to work with all the variables at once, one easily gets a big pile of numbers that one cannot digest.  There will be seven hundred and fifty anomalies and asymmetries in the numbers, all demanding attention.

I am well aware of the pe x te puzzle, and the hints that p is somehow related to k and f is related to t, rather than the other way around as one would expect from the loops.  But by collapsing them into puffs and kites, and ignoring the e modifiers, we can clearly see some things that we could not see if we tabulated all 14 gallows types separately. 

For instance, from that table it is immediately obvious that the relative frequency of tikes (%ct) in the body lines is remarkably constant across all sections, independently of the "language" and "hand".  That is a hint (not proof, of course) that tikes have a phonetic value whose frequency in paragraph text is independent of the topic.  Is the k/t ratio constant too?  That is a separate question that can be investigated separately.

That table also shows that the frequency of puffs (%cp) in body lines varies by almost a factor of 3 between sections, just among the larger ones.  That is a hint (again, not proof) that the use of puffs is strongly dependent on the topic.  It would be compatible with the theory that puffs are embellished versions of tikes and/or other glyphs.  Like capitals in modern English, outside of headlines and sentence start: they are used for proper names, hence some texts may have many, other texts may have none.
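
For anyone who wants to reproduce numbers of this kind, here is a minimal sketch of how %ct and %cp could be tallied for head lines versus body lines. The input format (tuples of section name, head-line flag, and EVA text) and the function name are assumptions made for illustration, not the method behind the table. Percentages are taken over all EVA letters, so benched gallows such as cth or cph count through their k/t/p/f letter.

```python
from collections import defaultdict

TIKES = set("kt")   # lumped "tike" gallows
PUFFS = set("pf")   # lumped "puff" gallows

def gallows_percentages(lines):
    """lines: iterable of (section, is_head_line, eva_text) tuples (hypothetical format).

    Returns {(section, is_head_line): (pct_tikes, pct_puffs)}, where each
    percentage is relative to the number of EVA letters in that group.
    """
    tally = defaultdict(lambda: [0, 0, 0])  # tikes, puffs, all letters
    for section, is_head, text in lines:
        row = tally[(section, is_head)]
        for ch in text:
            if not ch.isalpha():
                continue  # skip word separators and other markup
            row[0] += ch in TIKES
            row[1] += ch in PUFFS
            row[2] += 1
    return {key: (100 * t / n, 100 * p / n)
            for key, (t, p, n) in tally.items() if n}

# Toy input, not real transcription data.
sample = [
    ("stars", True, "pchedy"),
    ("stars", False, "qokeedy okaiin"),
]
result = gallows_percentages(sample)
```

Comparing the body-line entries across sections is then a matter of reading off `result[(section, False)]` for each section.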

That table says nothing about the te x pe puzzle; but, again, that is a plus.  That puzzle can be investigated separately, by statistical experiments focused on it.

In fact, there was this old suggestion that the hooks at the end of the horizontal arms of the puffs may be their missing e modifier suffixes.  EVA does not record this distinction; I don't recall whether this decision was much discussed, much less justified.  So I recently spent several hours going through all the puffs in my transcription file, checking them against the BL images, and changing p/f to w/z when they had hooks.  (I suppose that this data is already available somewhere, maybe in the GC transcription; but I decided to do it myself, for various other reasons.) 

I should now do some statistics on this data to try to confirm or disprove that old idea.  But I have been postponing that project because I am afraid that I will be sorely disappointed.  While doing that re-coding, I felt that the hooked/straight distinction was not well defined, not even to the level of the a/o distinction.  And that the hooks seemed to be present simply where there was space for them.  Thus I am afraid that they are indeed just meaningless calligraphic variation...

But there is another way to investigate that "hook is e" theory: it is to lump {t, k, te, ke} into a single "e-tike" meta-symbol, and p and f into a single "puff" meta-symbol, and see whether the two meta-symbols have the same next-glyph statistics.  Preferably after deleting all {a o y} and collapsing {r s} into one...
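
A rough sketch of that experiment, under stated assumptions: words are plain EVA strings, ch and sh are kept as single glyphs, and benched gallows (ckh, cth, ...) are not treated specially. The symbols "T", "P", "R" and the end-of-word marker "#" are names invented here. Note that deleting a, o, y first can create new ke/te adjacencies, which follows the order proposed above but may or may not be what one wants.

```python
import re
from collections import Counter

# Tokeniser: ch/sh first, then a gallows with an optional e modifier.
TOKEN = re.compile(r"ch|sh|[kt]e?|[pf]|.")

def collapse(word):
    """Map one EVA-style word to a list of meta-symbols."""
    word = re.sub(r"[aoy]", "", word)      # delete a, o, y
    out = []
    for tok in TOKEN.findall(word):
        if tok in ("k", "t", "ke", "te"):
            out.append("T")                # e-tike
        elif tok in ("p", "f"):
            out.append("P")                # puff
        elif tok in ("r", "s"):
            out.append("R")                # r and s merged
        else:
            out.append(tok)
    return out

def next_glyph_stats(words):
    """Count which symbol follows each e-tike and each puff ('#' = end of word)."""
    counts = {"T": Counter(), "P": Counter()}
    for w in words:
        syms = collapse(w) + ["#"]
        for cur, nxt in zip(syms, syms[1:]):
            if cur in counts:
                counts[cur][nxt] += 1
    return counts

def as_fractions(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {g: n / total for g, n in counter.items()} if total else {}

# Toy word list, not real data.
stats = next_glyph_stats(["qokeedy", "pchedy", "okaiin", "tchor", "fchy"])
tike_profile = as_fractions(stats["T"])
puff_profile = as_fractions(stats["P"])
```

If the "hook is e" idea holds, `tike_profile` and `puff_profile` should look alike on real data; how alike is alike enough would need a proper significance test, which this sketch does not include.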

All the best, --stolfi
I withdraw the post.
(17-01-2026, 07:14 PM)tavie Wrote: I can imagine there might be some difference in word choice in the top lines and that this would cause some degree of distortion in the stats.  But I struggle to believe it could be a decisive explanation, given the size of the /p/ dominance.


The digraph "th" is very common in English because it occurs in the very common words "the", "this", "that", "there", "their", "them", "thus", "then", "than", etc.  But these words rarely occur at the beginning of a herbal entry.  Therefore I bet that the frequency of "th" in the head lines of English herbals is dramatically lower than its frequency in other lines.

Being shorter than average, those words will also be less frequent at the start of a line than elsewhere in the line; and more common among the last 2-3 words of a line.  Thus I expect that the frequency of "th" will have a similar profile along the line.  No matter how wide or narrow the pages are.

(These bets assume that the text is formatted with the trivial line-breaking algorithm, without hyphenation, as in the VMS.  They may not hold for high-quality typeset text where line breaking and hyphenation take grammar and semantics into consideration.)
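
These bets could be checked on any plain-text herbal with something like the sketch below. It assumes paragraphs are given as strings, uses a greedy fill as the "trivial" line breaker, and measures the digraph rate per letter. The function names and the four buckets are choices made here for illustration. The last line of each paragraph ends because the text ends, not because of wrapping, so one may want to exclude it from the last-word bucket.

```python
def greedy_wrap(words, width):
    """Trivial line breaking: fill each line until the next word would not fit."""
    lines, cur, cur_len = [], [], 0
    for w in words:
        need = len(w) if not cur else cur_len + 1 + len(w)
        if cur and need > width:
            lines.append(cur)          # close the current line
            cur, cur_len = [w], len(w)
        else:
            cur.append(w)              # an over-long word still gets its own line
            cur_len = need
    if cur:
        lines.append(cur)
    return lines

def digraph_rate(words, digraph="th"):
    """Occurrences of the digraph per letter in the given words."""
    letters = sum(len(w) for w in words)
    hits = sum(w.lower().count(digraph) for w in words)
    return hits / letters if letters else 0.0

def profile(paragraphs, width, digraph="th"):
    """Digraph rate in head lines, body lines, line-first and line-last words."""
    head, body, first, last = [], [], [], []
    for para in paragraphs:
        for i, line in enumerate(greedy_wrap(para.split(), width)):
            (head if i == 0 else body).extend(line)
            first.append(line[0])
            last.append(line[-1])
    buckets = [("head", head), ("body", body),
               ("first_word", first), ("last_word", last)]
    return {name: digraph_rate(ws, digraph) for name, ws in buckets}

# Toy paragraph; a real test needs a full herbal text.
toy = profile(["the cat sat on the mat"], width=7)
```

Running `profile` at several values of `width` would test the claim that the profile along the line does not depend on page width.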

All the best, --stolfi
(17-01-2026, 08:45 PM)Jorge_Stolfi Wrote: Is the k/t ratio constant too?

Not constant: [link not shown]

Note: if k/t were constant, (k-t)/(k+t) would be constant too.
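
The step behind this note can be written out as follows, with r introduced here as shorthand for the ratio k/t (assuming t > 0):

```latex
\[
  \frac{k-t}{k+t}
  \;=\; \frac{k/t - 1}{k/t + 1}
  \;=\; \frac{r-1}{r+1},
  \qquad r = \frac{k}{t},
\]
\[
  \frac{d}{dr}\,\frac{r-1}{r+1} \;=\; \frac{2}{(r+1)^2} \;>\; 0 .
\]
```

Since the map from r to (r-1)/(r+1) is strictly increasing, the plotted quantity is constant exactly when k/t is constant, so variation in one implies variation in the other.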
(17-01-2026, 08:51 PM)JoJo_Jost Wrote: And so ‘Section A vs. Section B’ makes sense, too, because they are not ‘different scripts’, but a slightly modified ‘set of parameters’.

Check [link not shown].  I will be surprised if there are two natural languages that can produce a plot like the second one simply by deleting and identifying letters...

All the best, --stolfi
(17-01-2026, 09:06 PM)nablator Wrote: [the k/t ratio is] not constant: [link not shown]

Nice plot! However, the count k+t of tikes per page is small, so that ratio per page is expected to have a large sampling error.  And indeed from the plot it seems that the pages in each section vary randomly from -0.75 to +0.75, but the ratio per section varies a lot less.  Roughly, the tikes are split evenly between k and t, but Herbal has a bit fewer k than t, while Stars has a bit more...
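
To put a rough number on that sampling error, here is a small sketch that treats each tike on a page as an independent k-or-t draw. That binomial model is an assumption made here, and the counts in the example are invented. With p = k/(k+t), the index (k-t)/(k+t) equals 2p - 1, so its standard error is about 2*sqrt(p(1-p)/n) with n = k+t.

```python
import math

def kt_index(k, t):
    """The plotted quantity (k - t) / (k + t)."""
    return (k - t) / (k + t)

def kt_index_stderr(k, t):
    """Approximate standard error of the index under a binomial model (assumption)."""
    n = k + t
    p = k / n
    return 2 * math.sqrt(p * (1 - p) / n)

# Invented counts: a page with 16 tikes versus a section with 1600.
page_se = kt_index_stderr(8, 8)
section_se = kt_index_stderr(800, 800)
```

For an even split the error is 1/sqrt(n), so a page with only a handful of tikes can swing widely by chance alone, while a whole section is pinned down much more tightly.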

All the best, --stolfi