The Voynich Ninja

Inspired by Patrick's recent presentation, I thought I would look at the transition probabilities of End--End pairs: the likelihood (as a fraction of 1) that a word with a given ending will be followed by a word with another given ending. (I'm surprised nobody has done this before, so if someone has, please do say.)

All numbers are from a selection of running text in Currier B. Only word endings that occur at least 250 times have been counted, and only relationships of 0.05 or higher are shown. Also, [in] was processed as a single glyph. The sum of the probabilities listed for each ending is given in parentheses at the end of its line.
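The counting procedure just described can be sketched in Python. This is a hypothetical reconstruction under stated assumptions, not the script behind the figures below: it takes the running text as a list of EVA words in reading order, treats [in] as one glyph, requires both endings in a pair to meet the 250-token threshold, and divides by all words following the first ending (so each row sums to less than 1, as in the tables). The names `ending`, `transition_table` and `format_row` are illustrative.

```python
from collections import Counter, defaultdict

def ending(word: str) -> str:
    """Final glyph of an EVA word, treating [in] as a single glyph."""
    return "in" if word.endswith("in") else word[-1]

def transition_table(words, min_count=250, min_prob=0.05):
    """Map each frequent ending to [(next_ending, probability), ...].

    Assumptions: an ending must occur at least `min_count` times to be
    reported (as first or second member), probabilities are divided by all
    words that follow the first ending, and pairs below `min_prob` are hidden.
    """
    ends = [ending(w) for w in words if w]
    counts = Counter(ends)
    frequent = {e for e, n in counts.items() if n >= min_count}
    following = defaultdict(Counter)
    for first, second in zip(ends, ends[1:]):
        if first in frequent:
            following[first][second] += 1
    table = {}
    for first, nexts in following.items():
        total = sum(nexts.values())
        table[first] = [(second, n / total) for second, n in nexts.most_common()
                        if second in frequent and n / total >= min_prob]
    return table

def format_row(first, rows):
    """Render one row in the same style as the lists in this post."""
    body = ", ".join(f"[{second}] {p:.2f}" for second, p in rows)
    return f"[{first}]: {body} ({sum(p for _, p in rows):.2f})"
```

With a transcription loaded into `words`, looping over `transition_table(words).items()` and printing `format_row(e, rows)` would give one line per ending in the format used here.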

[d]: [y] .46, [in] .17, [l] .14, [r] .12 (.89)
[l]: [y] .47, [in] .15, [l] .15, [r] .13 (.90)
[m]: [y] .31, [in] .25, [r] .14, [l] .13, [o] .05 (.88)
[in]: [y] .46, [l] .16, [in] .15, [r] .12 (.89)
[o]: [y] .32, [in] .19, [l] .18, [r] .17, [o] .05 (.91)
[r]: [y] .39, [in] .18, [r] .17, [l] .14 (.88)
[s]: [in] .33, [y] .27, [r] .16, [l] .13, [s] .06 (.95)
[y]: [y] .49, [in] .17, [l] .14, [r] .12 (.92)

My initial thoughts are that [d, l, in, y] are all very similar. [r] is a bit lower on [y] but otherwise not hugely different. But [m, o, s] all vary quite a lot. These endings have fairly low counts (as does [d]), so there may simply be a lot of spikiness in the data. Hard to tell.
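One rough way to judge how much spikiness to expect is the binomial standard error of a transition probability, sqrt(p(1 - p)/n). This sketch assumes each following word is an independent draw, which running text is not, so treat the numbers as a ballpark only. The two values of n are the 250-token threshold and the roughly 3,500 [edy] tokens mentioned in this thread, chosen purely for illustration.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a probability p estimated from n transitions."""
    return math.sqrt(p * (1 - p) / n)

# Same probability (0.30), very different token counts
for n in (250, 3500):
    print(f"n = {n}: p = 0.30 +/- {standard_error(0.30, n):.3f}")
```

At n = 250 the standard error is about 0.03, so row differences of a few hundredths for the rarer endings could plausibly be noise; at n = 3,500 it drops below 0.01.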


Breaking the data down by final bigrams for the first feature shows no big difference: [ol] and [al] are similar to [l], [or] and [ar] are similar to [r], and [ain] and [iin] are similar to [in].

But the differences between [ey] and [edy] are worth breaking down, both as the first and as the second feature. Each occurs thousands of times: about 2,300 and 3,500 times, respectively. [$y] stands for any other word ending in [y], including [dy] not preceded by [e].
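The three-way split of [y] endings can be written as a small classifier. A sketch assuming EVA transcription strings; the function name `y_class` and the test words are illustrative. The specific suffixes [edy] and [ey] are checked before the catch-all [y], so that a word like [okaldy] (with [dy] not preceded by [e]) lands in [$y].

```python
from typing import Optional

def y_class(word: str) -> Optional[str]:
    """Classify a [y]-final EVA word as 'edy', 'ey' or '$y' (any other [y] ending).

    Words not ending in [y] return None.
    """
    if word.endswith("edy"):
        return "edy"
    if word.endswith("ey"):
        return "ey"
    if word.endswith("y"):
        return "$y"
    return None
```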

[edy]: [edy] .29, [in] .14, [$y] .14, [l] .13, [ey] .11, [r] .10
[ey]: [in] .20, [ey] .19, [edy] .15, [l] .14, [r] .12, [$y] .12

[d]: [edy] .21, [in] .17, [$y] .15, [l] .14, [r] .12, [ey] .10
[l]: [edy] .23, [in] .15, [l] .15, [$y] .14, [r] .13, [ey] .10

[m]: [in] .25, [r] .14, [l] .13, [edy] .11, [ey] .10, [$y] .10, [o] .05
[in]: [$y] .19, [l] .16, [in] .15, [edy] .14, [ey] .13, [r] .12

[o]: [in] .19, [l] .18, [r] .17, [edy] .12, [$y] .11, [ey] .09, [o] .05

[r]: [in] .18, [r] .17, [$y] .16, [l] .14, [edy] .13, [ey] .10
[s]: [in] .33, [r] .16, [l] .13, [$y] .11, [edy] .09, [ey] .07, [s] .06
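As a consistency check, the three [y] classes in each row should add back up to that ending's [y] value in the first table. The figures below are copied from the two tables in this post; the check is only arithmetic on the posted (rounded) numbers.

```python
# [y] probability from the first table
first_table_y = {"d": .46, "l": .47, "m": .31, "in": .46, "o": .32, "r": .39, "s": .27}
# ([edy], [$y], [ey]) probabilities from the split table
split = {
    "d": (.21, .15, .10),
    "l": (.23, .14, .10),
    "m": (.11, .10, .10),
    "in": (.14, .19, .13),
    "o": (.12, .11, .09),
    "r": (.13, .16, .10),
    "s": (.09, .11, .07),
}
for end, parts in split.items():
    total = sum(parts)
    status = "ok" if abs(total - first_table_y[end]) < 0.005 else "MISMATCH"
    print(f"[{end}] {total:.2f} vs {first_table_y[end]:.2f} {status}")
```

Every row matches, so the split table is consistent with the first one.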

This data looks a bit messy, but a few things can be seen:

[edy] clearly likes to cluster; I think we already knew this.
[l] and [d] have a higher preference for [edy] than the other endings do.
[ey] also has a high preference for itself.
[in] and [m] also have a preference for [ey] over [edy] once the number of tokens is taken into account.
[in], however, clearly prefers words ending in [y] that are neither [edy] nor [ey] (apparently [ky, ckhy] account for a big chunk of the difference).
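The "taking into account the number of tokens" step can be made explicit by comparing each row's [edy]:[ey] ratio with the ratio expected from token counts alone (about 3,500 to 2,300). This is a simple sketch, not necessarily the reasoning used above: it assumes that, absent any preference, a following word is [edy] or [ey] in proportion to their overall counts, and it uses only the rounded probabilities from the split table. The name `edy_vs_ey` is illustrative.

```python
# Approximate token counts from this post: [edy] ~3,500, [ey] ~2,300
BASE_EDY, BASE_EY = 3500, 2300
# (P(next = [edy]), P(next = [ey])) for each preceding ending, from the split table
split_rows = {
    "edy": (.29, .11), "ey": (.15, .19), "d": (.21, .10), "l": (.23, .10),
    "m": (.11, .10), "in": (.14, .13), "o": (.12, .09), "r": (.13, .10), "s": (.09, .07),
}

def edy_vs_ey(p_edy: float, p_ey: float) -> float:
    """Observed [edy]:[ey] ratio divided by the ratio expected from token counts."""
    return (p_edy / p_ey) / (BASE_EDY / BASE_EY)

for end, (p_edy, p_ey) in sorted(split_rows.items(), key=lambda kv: edy_vs_ey(*kv[1])):
    print(f"[{end}] {edy_vs_ey(p_edy, p_ey):.2f}")
```

A value of 1 means no preference beyond base frequency; below 1 leans toward [ey], above 1 toward [edy].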

(09-08-2024, 11:24 AM) Emma May Smith wrote: ... (I'm surprised nobody has done this before, so if they have, please do say.)

Hi Emma,
as Torsten mentioned in another thread, Sazonov did something roughly similar in 2003 (Table 3). But his data show deviations from the expected values, and he processed the whole text, so the results are not immediately comparable.

I think Tavi's choice of picking three datasets that are uniform in both illustrations and scribe is interesting. It could be worth checking whether there are major differences between Q13/Balneo.Scribe3 and Q20/Stars.Scribe2 (I guess there likely are).
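The distinction MarcoP draws matters because a raw transition probability mixes genuine attraction with how common the second ending is overall. A common way to show "deviation from the expected" is to compare each pair count with the count expected if consecutive endings were independent: row total × column total / grand total. This is a generic sketch of that idea, not a reconstruction of Sazonov's Table 3, whose exact formula is not given here; the function name and toy counts are invented for illustration.

```python
def observed_over_expected(pairs):
    """pairs: {(first, second): count}. Return {(first, second): observed / expected}.

    Expected counts assume independence: row_total * col_total / grand_total.
    Ratios above 1 mean the pair occurs more often than independence predicts.
    """
    row, col = {}, {}
    for (a, b), n in pairs.items():
        row[a] = row.get(a, 0) + n
        col[b] = col.get(b, 0) + n
    grand = sum(pairs.values())
    return {(a, b): n / (row[a] * col[b] / grand) for (a, b), n in pairs.items()}

# Toy counts (invented for illustration only)
toy = {("y", "y"): 60, ("y", "in"): 40, ("in", "y"): 30, ("in", "in"): 70}
for pair, ratio in observed_over_expected(toy).items():
    print(pair, round(ratio, 2))
```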

(09-08-2024, 03:12 PM) MarcoP wrote: ... as Torsten mentioned in another thread, Sazonov did something roughly similar in 2003 (Table 3). ...

Ah, that's no surprise. He even splits the data into [ey] and [dy]. I suppose this is a well-worn groove!

Though Sazonov's table is very valuable, I think this area offers great opportunities for further research and new findings.

Yes, this is good stuff, and some comparative results could also be informative.

In accordance with one of Sazonov's conclusions, a poem like "La Divina Commedia" should show some oddities.

Would some sample text generated with T. Timm's self-citation method show a similar pattern?

And what about some sample text encoded with Donald Fisk's syllable cipher, or with René Zandbergen's "NOT the solution to the Voynich MS" method?

The last item especially, in that both of these methods spread one 'real' word over several vords and some letter-transition dependency is part of the system. Indeed, if such a system were used, the dependencies should spread over more than two vords.
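One way to test whether dependencies reach past adjacent vords is to count pairs at a lag of k words instead of only neighbours (k = 1 is the adjacent-word case; k = 2 skips one word). A minimal sketch, assuming EVA word strings and [in] treated as one glyph; the function names and sample words are illustrative. Comparing the k = 2 table against an independence baseline would show whether any structure survives the gap.

```python
from collections import Counter, defaultdict

def ending(word: str) -> str:
    """Final glyph of an EVA word, treating [in] as a single glyph."""
    return "in" if word.endswith("in") else word[-1]

def lagged_transitions(words, k=1):
    """P(ending of word i+k | ending of word i) for every ending seen."""
    ends = [ending(w) for w in words if w]
    pairs = defaultdict(Counter)
    for a, b in zip(ends, ends[k:]):
        pairs[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in pairs.items()}

sample = ["daiin", "chedy", "qokaiin", "shol", "daiin", "chey"]
for k in (1, 2):
    print(k, lagged_transitions(sample, k))
```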

Posts in this thread, in order: Emma May Smith, MarcoP, Emma May Smith, MarcoP, RobGea.