The Voynich Ninja

Full Version: A new Timm & Schinner publication regarding the Malta conference
For reference, PCA done on all folios, using word frequency vectors, separates Currier languages A/B pretty well, even though the two clouds are contiguous; see [link] posted by MarcoP here: [link].

I wonder if low frequencies and unreliable transliteration make the cloud of points more scattered or "fuzzy".
(23-08-2023, 02:29 AM) kckluge Wrote: The problem is that when you say, "Let us, for the moment, assume two well-separated domains, A and B." there is an extent to which that is setting up a strawman. It is entirely possible to have multiple populations with overlapping tails in some feature space; the existence of overlapping tails isn't evidence against the existence of multiple underlying distributions.

Obviously we did not quite get your point before (maybe subconsciously caused by the slightly less than polite wording). The absence of a "domain boundary" in our figure 1 was never meant to serve as sufficient condition for the non-existence of two clearly separated text domains, but rather as necessary condition. Our proposed continuous A-B evolution is not based on this result (see our 2020 paper). In hindsight, we should have emphasized this more clearly in the text.

Nevertheless, the methodology behind direct inspection of the cosine distance is not complete garbage (despite its undoubted limitations). In your counter-example with the two bivariate normal distributions, you have separated the two subpopulations by approximately the largest distance at which the effect still remains invisible.

[attachment=7546]

Note that a separation by 4σ (as in your example) to 5σ also approximately defines the boundary where the hypothesis of a single population becomes statistically highly insignificant. The VMS A/A, B/B, and A/B distributions show a much bigger overlap (according to your graph), so any separation effects are clearly under the sensitivity threshold of the cosine distance method. Unfortunately, the underlying actual distributions (and their corresponding metric distances) are not trivial to evaluate in this case, and yes, we are aware of the crux of small sample sizes.

Interestingly, this method most likely still produces results beyond random noise when applied to the VMS. In our 2020 paper we proposed a sorting order of the VMS sections based on token statistics, corresponding to the A-B evolution. The following table shows the averaged cosine distance between the respective section and Bio (our assumed endpoint of the A-B evolution).

Code:
Section  <cos(φ)>
Herbal A    0.17
Pharma A    0.24
Astro       0.25
Cosmo       0.47
Herbal B    0.37
Recipes B   0.53
Bio B       0.74

With the exception of Cosmo, the cos(φ) values grow monotonically with the proposed section ranking. Is this mere coincidence? As for Cosmo: while the cosine distance would suggest ranking this section after Herbal B (rather than before it), we put more weight on our previous token-based analysis here (because of the extremely small number of folios).
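
As an illustration of the quantity in the table above, here is a minimal sketch of how an averaged cos(φ) between folios could be computed from token counts. It assumes cos(φ) is the usual cosine of the angle between two word-frequency vectors; the function names and the toy folios are hypothetical and are not taken from the paper or from any real transliteration.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two token-count vectors."""
    # Only tokens present in both folios contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty folio has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def mean_similarity(section: list, reference: list) -> float:
    """Average cos(phi) over all folio pairs (section folio, reference folio).

    A folio is never compared with itself, so a section can also be
    compared with itself (e.g. Bio against Bio).
    """
    pairs = [(f, r) for f in section for r in reference if f is not r]
    if not pairs:
        raise ValueError("no folio pairs to compare")
    return sum(cosine_similarity(f, r) for f, r in pairs) / len(pairs)


# Toy folios (hypothetical token lists, for illustration only).
folio_x = Counter("shedy ol chedy qokedy shedy".split())
folio_y = Counter("shedy chedy qokain ol ol".split())
folio_z = Counter("daiin chol chor daiin".split())

print(cosine_similarity(folio_x, folio_y))
print(mean_similarity([folio_x, folio_z], [folio_y]))
```

With raw counts, a few very frequent tokens dominate the result, and short folios give noisy vectors, which is the small-sample problem mentioned above.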

See also the following table showing the frequencies for the most common token types for the Biological section (shedy, ol, chedy, qokedy, qokain, qokeedy, qol, qokal, shey, and chey):
Code:
          Herbal (A)  Pharma (A)  Cosmo+Astro  Herbal (B)    Stars (B)    Bio (B)
shedy        (0.0%)    1 (0.0%)    17 (0.4%)    35 (1.1%)   113 (1.1%)   247 (3.6%)
ol        58 (0.7%)   39 (1.5%)    43 (0.9%)    35 (1.1%)   111 (1.0%)   233 (3.4%)
chedy      1 (0.0%)    1 (0.0%)    24 (0.5%)    62 (1.9%)   190 (1.8%)   210 (3.0%)
qokedy       (0.0%)    1 (0.0%)     5 (0.1%)    39 (1.2%)    61 (0.6%)   164 (2.4%)
qokain     1 (0.0%)    1 (0.0%)     6 (0.1%)     5 (0.2%)   105 (1.0%)   159 (2.3%)
qokeedy      (0.0%)      (0.0%)     4 (0.1%)     9 (0.3%)   137 (1.3%)   153 (2.2%)
qol        2 (0.0%)    4 (0.2%)       (0.0%)     1 (0.0%)    28 (0.3%)   116 (1.7%)
qokal      2 (0.0%)    3 (0.1%)    16 (0.3%)     9 (0.3%)    41 (0.4%)   107 (1.5%)
shey      40 (0.5%)   19 (0.8%)    23 (0.5%)     6 (0.2%)    84 (0.8%)    99 (1.4%)
chey      55 (0.7%)   21 (0.8%)    22 (0.5%)    20 (0.6%)   123 (1.2%)    94 (1.4%)

We agree that the rigorous statistical significance of the direct folio comparison, and in particular of our figure 1, is difficult to assess. More advanced methods like PCA might give better insight, but come with their own string of potential problems.
(22-08-2023, 08:46 AM) ReneZ Wrote: Let me start by saying that I have full confidence in Lisa Fagin Davis' identification of the five hands.

Especially if one wishes to consider Lisa Fagin Davis's identification, requesting some supporting evidence becomes essential. This approach could potentially address certain inconsistencies within her work. For example, Lisa Davis associates all folios in Currier A with scribe 1, except for folio f58, which she attributes to scribe 3. However, without understanding the rationale behind assigning scribe 3 to folio f58, resolving this matter remains challenging.

After Davis's initial identification of the five scribes, it became relatively straightforward for her to discern between them. Consequently, providing a page-by-page documentation of her work should be a manageable task. However, Davis didn't address my concerns. Instead, Davis claims that the Archetype software "was just a way to help me get started" and that she actually "examined every page manually before assigning scribes to each". This new statement contradicts her previous statements that she used the Archetype software for her Voynich manuscript analysis [see Davis 2020, p. 8 and Davis 2022 p. 2]. In my eyes, such a response speaks for itself.


(22-08-2023, 08:46 AM) ReneZ Wrote: The second important point is that it is methodically wrong to base the number of scribes on textual statistics. There is no reason to believe that the two are linked to each other. In fact, people have been trying to find such a correlation since Lisa's results were published, so far without a clear correlation, apart from, of course, the original Currier A vs. Currier B correlation with his Hand 1 vs. Hand 2.

However, Davis argues that "distinctive word-use patterns for each of these hands" exist [Lisa Davis 2022, p. 5].

Davis even argues "René Zandbergen has recently observed that the work of Scribe 4 (Language B) can be defined by two additional tests: the relatively small frequency of the [qo] bigram, and the equally small frequency of [ed]. In other words, in addition to the shape of the [k], [n], and [f], the frequency of [qo] and [ed] can help identify the work of Scribe 4." [Lisa Davis 2022, p. 6].

However, you wrote the referenced webpage in 2016. Davis, on the other hand, did not write anything about Scribe 4 before 2020. Therefore Scribe 4 is not even mentioned on your web page. This is what you actually wrote back in 2016: "The very common character combination <qo> is almost completely absent in the zodiac pages and the rosettes page, but appears everywhere else." [Zandbergen 2016]. Anyway, since you wrote the webpage back in 2016, it is not possible to use this webpage as independent confirmation of a hypothesis published in 2020.

It is also wrong to associate the Zodiac/Astronomy folios f67 - f73 with Language B. Moreover, a small frequency of [ed] would indicate an association with Language A instead of Language B. Davis even writes herself on the very same page "A test for Language B is the frequent use of word-final [dy] ... and the bigram [ed], which shows the same pattern." [Davis 2022, p. 6]. It is also possible to point to numerous instances of chedy, shedy, otedy, ... etc. on these folios [see link].

The Zodiac/Astronomy section contains exceptionally many labels, and it has long been known that the <qo> bigram is underrepresented in labels. Therefore the Zodiac/Astronomy section was attributed to neither Currier A nor B. Additionally, there are other Voynich manuscript folios not attributed to "Scribe 4" that exhibit exceptionally low <qo> and <ed> frequencies (see for example the bifolio f1r, f1v, f8r, f8v). [see link, p. 4]
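
To make the <qo> and <ed> test concrete, here is a minimal sketch of how the share of words containing a given bigram could be measured for a folio or a set of labels. The function name is hypothetical, and the sample word list is just a handful of EVA word types mentioned in this thread, not the text of any actual folio.

```python
def bigram_share(words: list, bigram: str) -> float:
    """Fraction of words that contain the given bigram at least once."""
    if not words:
        return 0.0  # nothing to measure on an empty page
    return sum(1 for w in words if bigram in w) / len(words)


# Hypothetical sample: EVA word types quoted in this thread.
sample = ["chedy", "shedy", "otedy", "qokedy", "qokain", "ol", "chey", "shey"]

print(bigram_share(sample, "qo"))
print(bigram_share(sample, "ed"))
```

Running the same function separately on label words and on paragraph words of one folio would show how much of a low <qo> share is explained by labels alone, which is the confound described above.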

All these mistakes in just two sentences are not even an exception. 

For instance, Lisa Davis also writes "Claire Bowern and Luke Lindemann of Yale University have recently conducted an initial analysis of word-frequency patterns in the work produced by each of these five scribes and have identified distinctive word-use patterns for each of these hands." [Lisa Davis 2022, p. 5]. However, the referenced paper "The Linguistics of the Voynich Manuscript." by Claire Bowern and Luke Lindemann does not contain such an analysis. Davis probably had the paper "Topic Modeling in the Voynich Manuscript" by Rachel Sterneck, Annie Polish, and Claire Bowern in mind. But this paper also does not identify any distinctive word-use patterns for any of the five hands. The paper actually states "Overall, the results suggest that 'topic' as defined by NMF is not quite synonymous with hands. We cannot create a complete one-to one mapping between NMF topics and Voynich scribes" [Sterneck et al., 2020, p. 12].
(30-08-2023, 07:08 PM) Torsten Wrote: The Zodiac/Astronomy section contains exceptionally many labels, and it has long been known that the <qo> bigram is underrepresented in labels. Therefore the Zodiac/Astronomy section was attributed to neither Currier A nor B. Additionally, there are other Voynich manuscript folios not attributed to "Scribe 4" that exhibit exceptionally low <qo> and <ed> frequencies (see for example the bifolio f1r, f1v, f8r, f8v). [see link, p. 4]
If the VMS is codified based on n-grams, there is an easy reason for this: n-grams with <qo> are in general longer than n-grams without it.
In the limited space of a label, where there is more than one way to codify, an n-gram without <qo> will be preferred.
What is needed is a second look at the alphabet. EVA might work well for computer analysis, but it does not work for proper linguistics. If EVA q were followed by u, it would be easy to assume that the language is Latin or one of the Romance languages. Logically, EVA q is a consonant, but any consonant other than Q would fit better.
Furthermore, computer analysis has determined that EVA qo is a prefix. It has also been found that the language has many different suffixes, which points to a highly inflected language.
In my search to improve EVA for transcription, I came to the firm conclusion that EVA q stands for P. PO is then a prefix that can be found in medieval dictionaries. In Slovenian, PO is a prefix, a preposition, and can form the first two letters of a root word. As a prefix, it indicates completed action and can be applied to verbs in different tenses, as well as to nouns. Because of these multiple functions, PO is very frequent in Slovenian.

While many VM letters closely resemble Latin letters, the four tall glyphs are the exceptions. It is also interesting that the VM does not contain the letter shapes K and T. Even Latin writing often contains the k letter form, although c was used for the sound K. It is therefore reasonable to conclude that EVA K and T can be used for transcription as well.

Since I had changed EVA q to p, I needed to find another Latin equivalent for EVA p. EVA p and f have similar properties: both have a unique shape, alluding to the similarity of the sounds they represent. They can also stand alone as a separate word, which means that they contain an unwritten semivowel, but at the same time they can also be followed by a vowel. I assigned them to the sv, cv, zw, sue sounds, which were written differently in different writing conventions.

There are plenty of clues in the Voynich alphabet, if we compare it to the alphabets used in Latin, German, and Italian manuscripts, to conclude that the language is Slavic. The one-legged f was most likely meant to be cv/zw (for CVET or Russian KVET, blossom), but the two glyphs are used interchangeably in the VM. The fancy two-loop tall glyph suits the sound sv perfectly (like in swelling).

Since both EVA p and f act like phonetic syllables (sue, cue) or as bi-glyphs (sv, cv), they can be followed by a consonant or a vowel. This means that EVA p and f cannot be abbreviations for EVA ke/te. A similar rule applies to k and t, ch and sh. It looks like the author had difficulty spelling the words where the semivowel (which in OCS was a separate letter) was dropped. He often used a space to indicate the missing semivowel, particularly if the semivowel was stressed.

If I had to guess why the author invented such a fancy letter form for SV and CV, I would say that he was considering the importance of the words that start with these glyphs, which are usually the initial letters of the first word in a paragraph. Most such words are associated with spiritual things, like holy, Saviour, freedom, world, star, blessed, advice, etc. (The letter would also make a nice abbreviation for sveti, holy.) Because of the flexible word order and the tendency to place the most important word at the beginning, the VM paragraphs often start with SV.

Regardless of how many scribes participated in the creation of the VM, there is only one language. The differences are caused by inflections that change the suffix.
(30-08-2023, 07:08 PM) Torsten Wrote: However, you wrote the referenced webpage in 2016. Davis, on the other hand, did not write anything about Scribe 4 before 2020. Therefore Scribe 4 is not even mentioned on your web page. This is what you actually wrote back in 2016: "The very common character combination <qo> is almost completely absent in the zodiac pages and the rosettes page, but appears everywhere else." [Zandbergen 2016]. Anyway, since you wrote the webpage back in 2016, it is not possible to use this webpage as independent confirmation of a hypothesis published in 2020.

I wrote the two related web pages around 2000. Any later dates can only have been a minor update.
But in any case that is irrelevant.

Textual statistics (frequencies of characters, words, combinations) were not an input to the handwriting analysis of Lisa. She may have pointed to them to indicate some correlation, but that is obviously a different thing.

It is premature to mix them. Future work may find correlations or it may not. This is because the textual statistics would be affected by a combination of:
- subject matter, in case the text is meaningful
- arbitrary impact from having too small samples, or samples of very different lengths
- encoding strategy, in case the text is meaningful and has been encoded
- different scribes applying different ways to implement the author's instructions (similar to the third point)
We cannot separate this yet.

The topic of this variation of textual statistics per page continues to interest me, and I am only too aware of the shortcomings in the two pages I have on this topic.

In the first case I used words, but that has a number of inherent problems, not just the dimensionality and the undersampling. That's why I used bigrams in the second case, for which the sampling is much better. However, I made a sub-optimal use of the cosine function.

For the time being I am concentrating on other things...
(31-08-2023, 08:12 AM) ReneZ Wrote: I wrote the two related web pages around 2000. Any later dates can only have been a minor update.
But in any case that is irrelevant.

The relevant statement "The very common character combination qo is almost completely absent in the zodiac pages and the rosettes page, but appears everywhere else." was added in 2016 (see link).

(31-08-2023, 08:12 AM) ReneZ Wrote: Textual statistics (frequencies of characters, words, combinations) were not an input to the handwriting analysis of Lisa.

There is no doubt that Lisa was aware of this sentence before her first paper was published, as she cited this statement earlier at the BSA Annual Meeting in 2020 (see the video from the [link], minute 31).

(31-08-2023, 08:12 AM) ReneZ Wrote: She may have pointed to them to indicate some correlation,

Lisa unmistakably argues that these statistics would confirm her scribal distinctions: "External evaluation supports these scribal distinctions" [Lisa Davis 2022, p. 7]. But since these statistics were known beforehand, such an argument is not valid.

Anyway, the entire debate surrounding external evidence supporting the scribal identifications only becomes necessary due to a contrast between the observations described by Lisa and our own findings. According to Lisa a horizontal crossbar for EVA-k indicates that the quill had been lifted after completing the vertical (two strokes), whereas a bowed crossbar more likely is the result of writing the glyph with a single stroke. To our "untrained" eyes "the variability of writing EVA-k appears very consistent throughout the entire manuscript: For nearly every page it is possible to find instances of EVA-k written with bowed and with horizontal crossbar, respectively. Additionally, on nearly every page some instances of EVA-k show an overlap between vertical stroke and crossbar, also indicating two strokes." (Timm & Schinner 2023, p. 8).

If we were mistaken it should be easy to provide a page-by-page documentation for all the instances of EVA-k written in one stroke that we might have overlooked. By providing such documentation, along with an explanation clarifying why all the instances of EVA-k with an overlap between vertical stroke and crossbar do not necessarily imply that the quill had been lifted after completing the vertical, this could effectively resolve the entire debate.
I really do not know what you are trying to argue.

If another palaeographer would go into a discussion with Lisa, then that would make sense, but not in this case.

As Karl Kluge already pointed out, the existence of several different hands does not by itself rule out that the text could be meaningless. Just imagine this scenario (which is one of many possible scenarios):

A draft could have been created by one person, and this was copied into a final form, on parchment, by collaborating scribes.

I'm not making any statement about the likelihood of this scenario.

From my personal perspective, I am just interested in the variation of textual statistics, in particular how they can be explained, or quantified better. I am not yet ready to conclude whether there are gradual changes or quantum jumps. In case there are quantum jumps, these may or may not coincide with page boundaries, or bifolios.

I have seen some indication that the main change from Currier language A to language B happens in the course of the zodiac section. All the jumps coinciding with bifolio changes that we observe can be ascribed to the page order having been disturbed. (Even though it also isn't just as simple as that).
(01-09-2023, 02:11 AM) ReneZ Wrote: I really do not know what you are trying to argue.

If another palaeographer would go into a discussion with Lisa, then that would make sense, but not in this case.

"Along with anyone else, Torsten is more than welcome to disagree with me in accordance with his own observations, as no one should accept my conclusions without critical analysis." [link].

To my eyes the <k>-glyphs in <sokeey> in f107r.P.42 and in <todky> in f107r.P.45 were written with two quill strokes. 
[attachment=7563]

According to Lisa the <k> written by Scribe 1 and Scribe 3 is "distinguished by a sharp angle at the top of the first vertical as the quill changes direction, a bowed crossbar, a round loop, and a very slight foot at the base of the second vertical." [Lisa Fagin Davis, 2022, p. 3]. According to Lisa "a bowed bar tends to result from a smooth directional change from the top of the first vertical, while a horizontal crossbar is the result of lifting the quill after completing the vertical." [Lisa Fagin Davis, 2022, p. 3]

In other words, I don't see what Lisa sees.

For more details see also [link] by Jan. B. Hurych (2007) as well as the [link] by Timm & Schinner 2023.
(01-09-2023, 06:38 AM) Torsten Wrote: According to Lisa the <k> written by Scribe 1 and Scribe 3 is "distinguished by a sharp angle at the top of the first vertical as the quill changes direction, a bowed crossbar, a round loop, and a very slight foot at the base of the second vertical." [Lisa Fagin Davis, 2022, p. 3]. According to Lisa "a bowed bar tends to result from a smooth directional change from the top of the first vertical, while a horizontal crossbar is the result of lifting the quill after completing the vertical." [Lisa Fagin Davis, 2022, p. 3]

Also found in the 2020 article: How Many Glyphs and How Many Scribes? Digital Paleography and the Voynich Manuscript.

There is also the question "Is the glyph formed by one or two strokes?" I suppose it refers to EVA-k, as the change of direction implies a single stroke. But how does one write EVA-k in a single stroke with a quill? Is it even possible?

Such a long vertical upstroke would be risky on a surface that is not perfectly smooth.

Or should this be understood as apparently one stroke (no visible disconnection)?