The Voynich Ninja

Relations among pattern studies?
(11-08-2024, 03:19 PM)Emma May Smith Wrote: Even if there was a single scribe, they would need to (almost) unfailingly agree that spaces must be inserted in certain places, while also allowing a high level of optionality in other places. I feel that ascribing it all to esthetic choice is the least reasonable option. It leaves too much potential insight on the table.

In my view, the space between words works to some extent like a glyph. The sequence of glyphs is highly predictable, and similarly, it is predictable where a space can appear. In essence, the space behaves no differently from the glyphs themselves.
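To make that idea testable, here is a rough Python sketch of the kind of measurement I have in mind. The string below is a toy stand-in for a real transcription, and real EVA digraphs like [ch] would need proper tokenization first.

Code:
from collections import Counter

# Toy stand-in for a real EVA transcription; '.' marks a word break.
# A real run would tokenize digraphs like [ch] and [sh] first.
text = "daiin.shey.qokeedy.chedy.okaiin.shedy"

followed_by_space = Counter()
seen = Counter()
for i, glyph in enumerate(text):
    if glyph == '.':
        continue
    seen[glyph] += 1
    if text[i + 1:i + 2] == '.':
        followed_by_space[glyph] += 1

# If the space really behaves like a glyph, these conditional
# probabilities should be as stable as glyph-to-glyph probabilities.
for glyph in sorted(seen):
    print(f"P(space | {glyph}) = {followed_by_space[glyph] / seen[glyph]:.2f}")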
(11-08-2024, 03:19 PM)Emma May Smith Wrote: I feel that ascribing it all to esthetic choice is the least reasonable option. It leaves too much potential insight on the table.

That said, the main point I wanted to make is that raw token counts are, to some degree, poor supporting evidence for any theory built around glyph transition probabilities. If those counts depend on spaces, and spaces depend on esthetic sensibility, then we would need that sensibility to be itself statistically robust. Of course, that would not be scientific.

To be clear, I don't mean to ascribe any of this to aesthetic sensibilities.

The hypothesis I'm suggesting is only that "weak" breakpoints reflect a real systemic ambiguity, in each case with a calculable n% probability of being resolved one way or the other.  I believe the observation that [r] followed by [a] has an n% probability of spacing is as statistically defensible -- as far as it goes -- as the observation that [r] has an n% probability of being followed by [a] in the first place.  And generating an unbroken stream of glyphs randomly based on the latter set of probabilities, and then splitting it into pieces randomly based on the former set of probabilities, strikes me as a valid way of simulating the results of those two hypothetical processes interacting.
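For what it's worth, here is a minimal Python sketch of those two processes interacting. The glyph inventory and every probability in it are invented placeholders, not measured values; only the shape of the procedure matters.

Code:
import random

transitions = {            # P(next glyph | current glyph), toy values
    'o': {'k': 0.5, 'r': 0.3, 'y': 0.2},
    'k': {'a': 0.6, 'e': 0.4},
    'a': {'i': 0.5, 'r': 0.5},
    'i': {'n': 1.0},
    'n': {'o': 1.0},
    'r': {'a': 0.5, 'o': 0.5},
    'e': {'y': 1.0},
    'y': {'o': 1.0},
}
p_break = {('r', 'a'): 0.4, ('y', 'o'): 0.8, ('n', 'o'): 0.9}  # P(space | pair)

def simulate(n_glyphs, start='o'):
    # Stage 1: generate an unbroken glyph stream from the transition table.
    stream = [start]
    for _ in range(n_glyphs - 1):
        options = transitions[stream[-1]]
        stream.append(random.choices(list(options), weights=list(options.values()))[0])
    # Stage 2: split the stream using per-pair breakpoint probabilities.
    words, current = [], [stream[0]]
    for prev, nxt in zip(stream, stream[1:]):
        if random.random() < p_break.get((prev, nxt), 0.0):
            words.append(''.join(current))
            current = []
        current.append(nxt)
    words.append(''.join(current))
    return words

print(simulate(60))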

That said, I agree that the resulting word token counts are more vulnerable to error than other kinds of evidence might be, because they depend on not just one hypothesis, but on two at once -- one about glyph sequencing, another about spacing -- either of which could obviously be wrong even if the other were to be right.
(11-08-2024, 04:26 PM)pfeaster Wrote:
(11-08-2024, 03:19 PM)Emma May Smith Wrote: I feel that ascribing it all to esthetic choice is the least reasonable option. It leaves too much potential insight on the table.

To be clear, I don't mean to ascribe any of this to aesthetic sensibilities.

No worries. That was my reply to Torsten. I think we got crossposted by replying in the thread at the same time.
(10-08-2024, 03:26 AM)Torsten Wrote: For the stars section it is even possible to point to pages dominated by vords colored in plum, whereas the very next page is dominated by 'yellow' vords. Yet another page within the very same section contains an unusually high number of vords colored in green.  Therefore I would suggest that you start by building a transition table for the stars section.

I've generated a separate set of transitional probability matrices for Quire 20 as a whole (and also Quire 13 for comparison), but since you've brought up the issue of single anomalous pages, I also generated matrices for f111r (lots of "plum") and f111v (lots of "yellow"); for f108r and f108v (the other two pages in the same innermost bifolio of Q20); and for that f108+f111 bifolio as a whole.
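For anyone who wants to try this at home, a row of such a matrix could be computed roughly along these lines; the data structure and tokenization below are simplified assumptions for illustration, not my actual pipeline.

Code:
from collections import Counter

def transition_row(pages, scope, glyph):
    """P(next | glyph), counted over all words on the pages in `scope`."""
    # `pages` is assumed to map folio names to lists of tokenized words,
    # e.g. pages['f111v'] = [['q', 'o', 'k', 'a', 'iin'], ...], with the
    # tokenizer having already grouped digraphs like [ch] and [e]/[i] runs.
    counts = Counter()
    for folio in scope:
        for word in pages[folio]:
            for a, b in zip(word, word[1:]):
                if a == glyph:
                    counts[b] += 1
    total = sum(counts.values())
    return {nxt: n / total for nxt, n in counts.items()} if total else {}

# e.g. transition_row(pages, ['f108r', 'f108v', 'f111r', 'f111v'], 'k')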

It's a lot of data, and I'm not sure how best to go about sharing it without uploading more tables than I suspect anyone would want to look at.  But here's a look at how variation in just one pair of matrices plays out: the matrices for [k] and [t].  This example isn't associated specifically with Torsten's [ed] (plum), [ho] (green), or [in] (yellow), but it features one anomaly that I believe may fall into the same general category as the ones he highlighted.

With both [k] and [t], the main difference of Currier B relative to Currier A is that the probability of a transition to [ch] or [o] sharply decreases, while the probability of a transition to [e+] or [a] simultaneously rises.  With [k], a transition to [e+] was already the highest-ranked option in Currier A, but it becomes even more probable in Currier B.  With [t], the transition to [e+] instead ends up rising from fourth place up to first place.  But overall, the pattern of [>ch] and [>o] going down while [>e+] and [>a] go up is consistent across [k] and [t].  Beyond this, [>y] also goes down, but not by as much.

[attachment=8992]

If we now compare Q20 and Q13, we can see the range of variation within a couple major subsets of Currier B.  They both individually show the same patterns that distinguish Currier B from Currier A, with [>e+] and [>a] at the top and [>ch], [>o], and [>y] ranked decisively below them.  But there are also a few differences -- for example, in Q13, [>y] ranks above [>ch], but in Q20, [>ch] ranks above [>y].

[attachment=8993]

Now let's take a look at our specific target bifolio, starting with [k].

[attachment=8994]

As we've seen, the overall shift from Currier A to Currier B is marked by a drop in the probability of [>ch], [>y], and [>o], but this particular bifolio shows their probabilities dropping yet further: on every one of the four pages, each of these three transitions has a lower probability than it does in Q20 overall.  As far as the telltale ratio of [>e+] and [>a] to [>ch], [>y], and [>o] goes, the bifolio presents an unusually extreme case of the general Currier B tendency.

Q20:        [>e+] or [>a] = 80.29%   [>ch] or [>y] or [>o] = 16.63%
f108+f111:  [>e+] or [>a] = 92.50%   [>ch] or [>y] or [>o] = 5.35%
f108r:      [>e+] or [>a] = 90.65%   [>ch] or [>y] or [>o] = 8.88%
f108v:      [>e+] or [>a] = 96.24%   [>ch] or [>y] or [>o] = 3.29%
f111r:      [>e+] or [>a] = 92.48%   [>ch] or [>y] or [>o] = 4.43%
f111v:      [>e+] or [>a] = 90.37%   [>ch] or [>y] or [>o] = 4.81%

But we find a striking anomaly when we examine the probabilities of [>e+] and [>a] relative to each other.  In the bifolio as a whole, [>e+] is nearly one and a half times as probable as it is in Q20 overall, while [>a] is only about three quarters as probable.  On f108r, f108v, and f111r, the shift is even more extreme: [>e+] gets higher and [>a] gets lower.  But then f111v reverses course, with [>a] becoming more probable there than [>e+].

The same thing happens with [t].

[attachment=8995]

So f111v stands out as peculiar in that [k>a] and [t>a] are more probable there than [k>e+] and [t>e+].  But this is, curiously, an anomaly that's still consistent with the main shift that differentiates Currier B from Currier A: namely, a boost in the probabilities of both [>e+] and [>a] beyond their average Currier A levels.  It's just that on f111v most of the boost has gone to [>a] for a change.

So that's a sample of what a more "localized" analysis of transitional probability matrices might look like, for what it's worth.  I don't know if it has any advantages over other types of analysis.  If there's any interest, I could go over the matrices for some other glyphs.

(10-08-2024, 03:26 AM)Torsten Wrote: Note: The idea of using state transitions was already published by Donald Fisk back in 2017 (see Fisk 2017). For more details, see his write-up and the linked discussion.

Right, although in his case I believe he looked at transitions within words (with "start" and "end" points) but not across word breaks.  From his introduction: "One of the most surprising things about the Voynich Manuscript is that the glyphs within words follow a clearly discernible grammar, yet the words in sentences don't."
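The difference is easy to state in code. In a within-word setup, every word contributes its own start and end transitions, and nothing is ever counted across a space. A minimal sketch, with tokenization again simplified:

Code:
from collections import Counter

def within_word_transitions(words):
    # '^' and '$' stand for word-start and word-end states;
    # no transition ever crosses a word break.
    counts = Counter()
    for word in words:
        path = ['^'] + list(word) + ['$']
        counts.update(zip(path, path[1:]))
    return counts

print(within_word_transitions(['daiin', 'shey', 'okaiin']))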
I know that glyph similarity calculations have been done before, but I thought I would work them up based on transition probabilities. The score is the total of the absolute differences between the two glyphs' transition probabilities to each following glyph. So if the probability of a transition to [o] is 0.13 for [sh] but 0.17 for [ch], then 0.04 is added to the difference. The minimum difference is 0, for identical glyphs, and the maximum is 2, for glyphs with no overlapping transition probabilities.

In the chart below, [K, T, F, P] are bench gallows, [C, S] are benches, and [e, E, H, i, I, J] are one, two, and three [e] or [i]. Spaces were eliminated entirely, so they aren't included in the plot or in the transition statistics. (Also, the plot is symmetrical, as there's no ordering information for the pairs, just similarity.)
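For anyone wanting to reproduce the scoring, it amounts to an L1 distance between two glyphs' transition-probability rows. A sketch, where the example rows are toy values (only the [o] figures come from the worked example above):

Code:
def row_distance(row_a, row_b):
    # Sum of absolute differences over the union of following glyphs:
    # identical rows score 0, rows with no overlap score 2.
    glyphs = set(row_a) | set(row_b)
    return sum(abs(row_a.get(g, 0.0) - row_b.get(g, 0.0)) for g in glyphs)

sh = {'o': 0.13, 'e': 0.87}   # toy rows; only the [o] entries are from the post
ch = {'o': 0.17, 'e': 0.83}
print(row_distance(sh, ch))   # the [o] pair alone contributes 0.04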

[attachment=8996]

The most interesting thing that jumps out at me is that [m], which almost always finishes a word, is highly similar to [e], which almost never does. The odd line-start statistics for [s], [d], and [y] make the line end look like a mid-word position. Otherwise, I think the similarity is driven mostly by a few core relationships, such as a high likelihood of a following [y].

(I know it's not a great graph, but it's the one I have.)
About Patrick's presentation, it's making me wonder the same thing that usually crosses my mind whenever we enter the realm of steganography, nulls, verbose ciphers etc. In other words, whenever the true information has to be filtered out from the surrounding non-information. And that is the following:

If we remove all non-information loops, what are we left with? What does the actual information look like? Where is the actual information in Voynichese? If we isolate it, does it look "normal"?

And if the isolated hypothetical information does not look "normal", then what kind of system are we looking at? 

And if we don't know what kind of system we're looking at, then how can we know that the parts we just trimmed were non-informative in the first place?

For example, if I give you the low-entropy sequence "100 101 111 111 111", you don't really know which system I am using. Am I trying to write binary? A large number with space as separator? Something else entirely? 
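To illustrate, here is a toy calculation: the measured entropy of that same string changes completely depending on which symbol system we assume it is written in.

Code:
from collections import Counter
from math import log2

def entropy(symbols):
    # Shannon entropy of the empirical symbol distribution.
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

s = "100 101 111 111 111"
print(entropy(s.replace(' ', '')))  # read as a bit stream: ~0.72 bits/symbol
print(entropy(s.split()))           # read as space-separated tokens: ~1.37 bits/token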

I guess what I'm trying to say is that as long as we don't know how information is encoded in any hypothetical meaningful part, we also cannot say whether the low-entropy parts are meaningless. As long as we don't know the system, we cannot say what is part of it and what isn't.
Exactly right. We cannot say, because we do not know. So, how can we find out? Our information can only come from the VMs and from the world of its creation. And that world is central European and chronologically consistent with the C-14 dating of the VMs parchment.

The VMs steganographic system is composed of circular bands of text set off by different types of Stolfi's markers. Apparently this methodology is considered too obvious to be taken seriously. That, in itself, is a most excellent bit of trickery.

Why take Stolfi's markers seriously? Because, in the artist's illustration on VMs White Aries, there is a marker that shares a common side with a tub that has a blue-striped pattern.

Paired patterns of bendy argent et azure, together with red (and white) galeros, can be seen if the relevant history is known; nothing, if it is not. Structure repeatedly reaffirms the illustration's intended identifications. And these papal identifications reaffirm the significance of the markers and, ostensibly, the importance of the associated circular bands of text.
(12-08-2024, 08:46 PM)Koen G Wrote: About Patrick's presentation, it's making me wonder the same thing that usually crosses my mind whenever we enter the realm of steganography, nulls, verbose ciphers etc. In other words, whenever the true information has to be filtered out from the surrounding non-information. And that is the following:

When exploring steganography, Jürgen Hermes' paper about the Polygraphia III of Johannes Trithemius is worth reading. Trithemius (1462–1516) devised a cipher in which words are used to encode plaintext letters. In theory, you could also track the frequency of specific glyphs within a given amount of text. For instance, it might be necessary to count the number of curved glyphs between two gallows, or all instances of a glyph like EVA-[i] within a line. However, such methods would require an enormous effort to encode a relatively short message.
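To sketch the principle, here is a toy version of such a word-for-letter cipher. The wordlist is invented for illustration; Trithemius used long tables of Latin words.

Code:
lexicon = {
    'a': ['deus', 'clemens'],
    'b': ['creator', 'rex'],
    'c': ['conditor', 'iudex'],
}

def encode(plaintext):
    # Replace each plaintext letter with a whole word, so the
    # ciphertext reads as innocuous prose.
    out = []
    for i, letter in enumerate(plaintext):
        words = lexicon[letter]
        out.append(words[i % len(words)])  # vary the word choice by position
    return ' '.join(out)

print(encode('abc'))  # -> deus rex conditor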