The Voynich Ninja
Discussion of "The Voynich Manuscript: Symbol roles revisited" - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Discussion of "The Voynich Manuscript: Symbol roles revisited" (/thread-3507.html)



Discussion of "The Voynich Manuscript: Symbol roles revisited" - RenegadeHealer - 15-03-2021

Matlach, Vladimír; Janečková, Barbora; Dostál, Daniel (2020). "The Voynich Manuscript: Symbol roles revisited." Preprint (version 1.3, 20 September – 27 December 2020; version update 12 March 2021). [link], retrieved 13 March 2021.

I'm always in awe of new quantitative approaches to understanding the structure of Voynichese. Statistics and quantitative textual analysis are probably not the only tools needed to solve this mystery, and may not even prove to be the most important ones. Even so, I have no doubt that these tools, wielded correctly, can yield a lot of valuable clues. Mathematics is not my strength, and so I found the math used by Matlach et al. hard to wrap my head around. I'm not really in a position to judge whether their selection and implementation of statistical tools was ideal, and am curious to hear what experienced textual analysts here have to say on this matter.

On a positive note, the authors of this paper ask an ambitious and potentially helpful question: In the VMs text, what patterns of glyph occurrence should we expect to find, and how does this compare to what we do find? This question is ambitious because in such a unique and unprecedented text, "what we should expect to find" is far from clear or agreed upon. The authors compute metrics for specimens of natural language plaintexts and enciphered texts, as well as Torsten Timm's attempt at reverse engineering a meaningless VMs-like text. They demonstrate that in natural language texts, glyphs cannot be expected to recur at regular, mathematically predictable intervals. But they do exactly this in both the original VMs and Timm's synthetic VMs. The idea of the VMs being a natural language plaintext has never looked more untenable. None of this is new or earth-shattering.

The authors' goal is to identify Voynichese glyphs that might potentially be ligatures of other glyphs. They first demonstrate the viability of their ligature-finding tool on natural language samples. From what I gather, a large part of this involves comparing the observed occurrence of strings of 2~3 glyphs in a text, to the expected occurrence of that same string in a randomly generated pattern made of the same glyph set. (Please correct me on this if I misunderstand.) Thus, their null hypothesis appears to be "The VMs's text is meaningless". I think this is a wise starting point.
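If I've understood the approach correctly, a toy version of that observed-versus-expected comparison might look like this (this is my own sketch of the idea, not the authors' actual statistic):

```python
from collections import Counter

def bigram_surprise(text):
    """Ratio of each observed bigram count to the count expected under the
    'meaningless text' null, where glyphs occur independently at their
    observed unigram frequencies."""
    n = len(text)
    uni = Counter(text)                                  # unigram counts
    big = Counter(text[i:i + 2] for i in range(n - 1))   # bigram counts
    ratios = {}
    for pair, observed in big.items():
        # expected count of this bigram if glyph order carried no information
        expected = (uni[pair[0]] / n) * (uni[pair[1]] / n) * (n - 1)
        ratios[pair] = observed / expected
    return ratios

# ratios well above 1 flag glyph pairs that co-occur far more often than
# chance: the kind of pair a ligature hunt would start from
print(bigram_surprise("daiindaiindaiin"))
```

Run on a real transcription, the pairs with the highest ratios would be the natural ligature candidates.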

I'm happy to see that Matlach et al. have read, understood, and taken seriously the challenge to a meaningful VMs text put forth by Timm and Schinner. What would have been even more interesting, though, is if they had used the output of T&S's algorithm as a control when computing ngram frequency, rather than a more vaguely defined "random chance occurrence of glyphs". Because one of Timm's major points is that the arrangement of glyphs in the VMs is not random, but that doesn't necessarily mean it's meaningful. Matlach et al. do demonstrate that T&S's synthetic VMs and the original VMs are two distinct texts. But they have not falsified the null hypothesis. They have only demonstrated that Timm and Schinner's exact algorithm, as published in its current version, produces an output with statistically significant differences from the original VMs. It could be that T&S's algorithm is on the right track, but just needs some tweaking. Regardless, Matlach et al. have made a satisfactory case for a meaningful VMs text still being possible, at least for now.

A few things about these authors' hunt for ligatures make me worry. For one thing, the thorny problem of EVA [f] and EVA [p] strongly preferring the first lines of paragraphs goes unaddressed. If these two glyphs are ligatures, and are mostly confined to first lines, it seems logical that their component ngrams should occur, non-ligated, fairly regularly everywhere but first lines and labels. This is not what we find, though. The proposed components of EVA [f] and EVA [p] (EVA [id] and EVA [qd], respectively) do not occur anywhere in the text.

Speaking of which, the starring role played by EVA [i] in these authors' ligature formations is odd to me, in light of the mounting evidence from other researchers that EVA [i] is probably not an independent glyph.

Finally, the authors seem to get a bit subjective and arbitrary — "greedy" as the authors phrase it — as to which component glyphs are favored for each ligature candidate. I'd be willing to believe that EVA [n] is indeed "a ligature of [i] + space", in other words, that [n] is simply the way [i] is written at the end of a vord. But I don't yet see a good reason to favor EVA [m] being a ligature of EVA [i] + [d], as opposed to, say, [i] + [l].

I was hoping the authors would conclude their experiment by taking a reliable EVA transcription (why Takahashi?), substituting ligature candidates with their suspected component glyphs, and then performing statistical analyses of these substituted texts. I hope this is a part of their future publications.
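A minimal version of that substitution experiment is easy to sketch (the table below uses the paper's proposed decompositions as I understand them; treat both entries as hypothetical):

```python
def expand_ligatures(text, table):
    """Replace each candidate ligature with its proposed components, so the
    expanded text can be re-run through any downstream statistic."""
    # note: replacements happen in dict order, so an entry must not produce
    # glyphs that a later entry would then re-expand
    for ligature, components in table.items():
        text = text.replace(ligature, components)
    return text

# hypothetical candidates discussed above: [n] = [i] + word break,
# [m] = [i] + [d]
candidates = {"n": "i ", "m": "id"}
expanded = expand_ligatures("daiin chor dam", candidates)
```

The interesting part would then be comparing entropy, vord-length distributions, and so on between the original and expanded texts.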


RE: Discussion of "The Voynich Manuscript: Symbol roles revisited" - MichelleL11 - 18-03-2021

Hi, Renegade:

Thanks so much for notifying us that this has been revised.  I agree that the first half of this paper is just another way (although a unique one; I enjoyed learning about time series analysis) of showing mathematically the same thing that has been said so many other ways before -- the VM text just doesn't act like natural language.  But the one piece I found very interesting was this section at page 10, last paragraph:


[quoted excerpt from page 10 of the paper]



Why do I find the overabundance of integer frequencies and common divisibility by 40 or 160 interesting?  Because I am on constant lookout for the VM's "Period 17" data.  A seventeen-based periodicity is what began the process that finally broke the Zodiac Killer's 340 cipher.  Granted, it took a while, but without that initial clue, I'm sure Dave and company would still be running tests.
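For anyone who wants to hunt for that kind of periodicity themselves, here is a crude sketch (mine, not anything from the Z340 work): score each candidate period by how often a symbol repeats at that lag, which is essentially a discrete autocorrelation on the glyph stream.

```python
def best_period(seq, max_period=40):
    """Return the lag (2..max_period) at which symbols most often repeat."""
    n = len(seq)
    scores = {}
    for p in range(2, max_period + 1):
        hits = sum(seq[i] == seq[i + p] for i in range(n - p))
        scores[p] = hits / (n - p)       # fraction of matching pairs at lag p
    return max(scores, key=scores.get)   # first lag with the highest score

# a perfectly periodic toy sequence is recovered exactly
print(best_period("odacg" * 20))  # 5
```

On real ciphertext the peak would of course be far noisier than this toy case.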

Could this periodicity of the discrete Fourier transform frequencies as integers (first) and then being divisible by 40 and 160 (second) be routes in to better understanding the text creation process?  My understanding is that what Fourier transforms do to time series data is essentially act as filters to separate noise from non-noise in the data sets.  They are commonly used in electrical engineering to "pull out" regular signals when the time series data is messy.

For example, this is how it is discussed in Brockwell & Davis, a book cited as "an excellent introduction to the theory" of time series analysis in the cited reference, Venables & Ripley (2013).  Here's a link to the whole book if anyone's interested: [link]



[image: signal-plus-noise example graph from Brockwell & Davis]



The dotted lines in this signal + noise graph are achieved (revealed) by using the discrete Fourier transform to know which portion of the data should be eliminated to achieve a "smoothing" of the data.  In this particular example, eliminating the high-frequency fraction of the Fourier series expansion of the signal (everything above f = 0.035) allowed for an estimate of the underlying signal.
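As far as I can tell, the recipe is something like the following (a generic low-pass sketch with a made-up test signal, assuming NumPy; the 0.035 cutoff is the one from the book's example, and someone who knows this better should correct me):

```python
import numpy as np

def lowpass_smooth(x, cutoff):
    """Zero the Fourier components above `cutoff` (cycles per sample), then
    invert the transform: a crude estimate of the underlying signal."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))
    spectrum[freqs > cutoff] = 0.0          # discard the high-frequency "noise"
    return np.fft.irfft(spectrum, n=len(x))

t = np.arange(400)
signal = np.sin(2 * np.pi * 0.01 * t)       # slow sinusoid, f = 0.01
noisy = signal + 0.3 * np.random.default_rng(0).normal(size=t.size)
smooth = lowpass_smooth(noisy, cutoff=0.035)
```

Here `smooth` should sit much closer to the clean sinusoid than `noisy` does, which is the "dotted line" effect in the book's figure.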

I simply don't know the math well enough, but could some sort of selective smoothing, as was done with the electrical signal graphed above to find and smooth out its "noise component", allow for a true signal to be pulled out from the VM data?  Or is the pattern described in the paper sufficient to figure out some sort of useful pattern for understanding how the text was constructed without "smoothing" the data?  I'd love to hear from the paper's authors, or from someone with enough math background, as to what that paragraph could mean practically.

But, that potential rabbit hole aside, I did want to comment that although I applaud the effort in the "ligatureness" analysis, I'm not particularly convinced.  That's because I don't see how turning some subset of combinations of glyphs not currently seen next to each other in the text into a single glyph is going to explain either the "chaotic behavior" OR, more importantly, the weird positioning of gallows glyphs within lines, paragraphs, and on the page in general.  Instead, I believe that it is this selective positioning that results in the "chaotic" behavior of these particular glyphs rather than some sort of glyph combination process.  

In fact, this is what I would ask the authors to show: knock down any selected glyph combination that is currently common in the text into a single symbol, and show that this causes that single symbol to start exhibiting the chaotic behavior attributed to the gallows glyphs in the time series analysis.  With this kind of data, I would begin to buy into the idea that being a ligature can cause the "abnormal" time series behavior.  Until then, I hypothesize it is because gallows glyphs show up overwhelmingly in certain locations within the text at the line, paragraph, and page level (unlike the rest of the glyphs, which are more internally associated with each other).
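That experiment could be mocked up along these lines (my sketch of the proposal, with toy text; the real test would use a full transcription and the paper's own time-series statistic):

```python
from statistics import mean, pstdev

def merge_bigram(text, bigram, symbol="1"):
    """Collapse every occurrence of a common bigram into one new symbol."""
    return text.replace(bigram, symbol)

def gap_profile(text, symbol):
    """Mean and spread of distances between successive occurrences of a
    symbol: a crude stand-in for the paper's time-series measures."""
    positions = [i for i, c in enumerate(text) if c == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return mean(gaps), pstdev(gaps)

text = "cholchorcholdaincholqokchol"
merged = merge_bigram(text, "ch")
# compare the interval behaviour of the merged symbol against that of the
# gallows glyphs; the ligature claim would need the two to look alike
print(gap_profile(merged, "1"))
```

If merging a genuinely common pair does not reproduce the gallows' "chaotic" interval pattern, that would count against the ligature explanation.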

I had hoped that Julian Bunn's page imagery data would support my assertion, but it doesn't seem he did the gallows -- or at least he didn't report them in his blog posting: [link]

But with all that verbiage spilled, I still think this is very interesting and hope to get some insight from those more mathematically trained than I as to the applicability of such analysis to our problem.


RE: Discussion of "The Voynich Manuscript: Symbol roles revisited" - RobGea - 18-03-2021

Hi MichelleL11, here is some gallows stuff by Julian Bunn:

[link]
[link]

And this one, slightly less relevant:
[link]


RE: Discussion of "The Voynich Manuscript: Symbol roles revisited" - RobGea - 22-03-2021

(15-03-2021, 01:23 AM)RenegadeHealer Wrote: — "greedy" as the authors phrase it —

why Takahashi?

I think the use of "greedy" is as a computational term for the class of algorithm used: [link]
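Right — in that sense a "greedy" ligature search would, at each step, commit to the currently most frequent glyph pair and never look back. A toy version (my own sketch, essentially byte-pair encoding rather than the paper's exact procedure):

```python
from collections import Counter

def greedy_merges(text, steps=3):
    """At each step, merge the single most frequent bigram into a fresh
    placeholder symbol; greedy because earlier choices are never revisited."""
    history = []
    for k in range(steps):
        pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
        if not pairs:
            break
        best, _ = pairs.most_common(1)[0]   # commit to the top pair only
        text = text.replace(best, chr(0x2460 + k))  # placeholder ①, ②, ③, ...
        history.append(best)
    return text, history
```

The weakness RenegadeHealer points at is visible here: when two pairs are nearly tied, the algorithm still picks exactly one and everything downstream depends on that pick.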

Two advantages of Takahashi transcription are:
1) All folios except the Rosette (fRos) are transcribed. (Only the Zandbergen-Landini transcription has better coverage.)
2) It is very easy to parse out the words, and very little pre-processing is needed to get the data into a nice format for computation; i.e. it parses like butter :)

The Zandbergen-Landini transcription is not so well known and not so easy to parse, but I am sure its use will become more common.

- - - -
I do not understand most of this paper, but currently I really, really like the ideas it contains.


RE: Discussion of "The Voynich Manuscript: Symbol roles revisited" - RenegadeHealer - 29-03-2021

(18-03-2021, 01:11 AM)MichelleL11 Wrote: Why do I find the overabundance of integer frequencies and common divisibility by 40 or 160 interesting?  Because I am on constant lookout for the VM's "Period 17" data.  A seventeen-based periodicity is what began the process that finally broke the Zodiac Killer's 340 cipher.  Granted, it took a while, but without that initial clue, I'm sure Dave and company would still be running tests.

Could this periodicity of the discrete Fourier transform frequencies as integers (first) and then being divisible by 40 and 160 (second) be routes in to better understanding the text creation process?  My understanding is that what Fourier transforms do to time series data is essentially act as filters to separate noise from non-noise in the data sets.  They are commonly used in electrical engineering to "pull out" regular signals when the time series data is messy.

For example, this is how it is discussed in Brockwell & Davis, a book cited as "an excellent introduction to the theory" of time series analysis in the cited reference, Venables & Ripley (2013).  Here's a link to the whole book if anyone's interested: [link]

Hi Michelle. In addition to reading the source you provided, I'm going to have to pick my father-in-law's brain about Fourier transformations. He's an electrical engineer and ham radio fanatic who invented and produces a machine called the [link]. As best I understand it, the linearizer is a radio-signal clarifier that greatly increases the signal-to-noise ratio of an amplified transmission. This allows for greater bandwidth and more efficient use of power in transmitting information over radio waves. (I'm probably getting a lot of this wrong, so anyone who knows this better than I do, don't be shy about correcting me. I fix people, not machines.)

In any event, yes, the paragraph you quoted from Matlach et al. caught my attention as well, and I hope that someone more mathematically inclined than me can use it as a starting point for statistical analysis.

The number seventeen occurs in (at least) two interesting places in the VMs that I've noticed. It's the number of glyphs in each quarter of the middle ring of [link], as well as the number of single glyphs before the beginnings of lines on each half of the text of [link]. Interesting that these two pages are next to each other on the same side of one bifolio. It seems you were the first on this forum to bring up Patrick Feaster's blog, where he plays with the idea that [link] relates to the process by which Voynichese was created. Granted, he's not the first to think of this, but he fleshes out the idea better than most.


RE: Discussion of "The Voynich Manuscript: Symbol roles revisited" - MichelleL11 - 29-03-2021

(29-03-2021, 03:51 PM)RenegadeHealer Wrote:
(quoting MichelleL11) Why do I find the overabundance of integer frequencies and common divisibility by 40 or 160 interesting?  Because I am on constant lookout for the VM's "Period 17" data.  A seventeen-based periodicity is what began the process that finally broke the Zodiac Killer's 340 cipher.

I have realized to my chagrin that it was a period-19 pattern in the Zodiac-340 cipher.  Will take this opportunity to correct that!  But I do agree 17 is a weird number for the VM (does this give me an excuse for my error? Rolleyes), and I still think something could come from that in the future.

FWIW, I sent an e-mail with my questions to Matlach’s academic e-mail and invited him to answer either me or on the board, so maybe we’ll be hearing more.  Of course, anything your father-in-law could add is much welcome.