The Voynich Ninja

Full Version: What are the characteristics of Labelese?
Koen says: "It doesn't prove that these are True Labels, but it is still something that has to be taken into account." Maybe I am being unfair. I suppose this test told me something that I would have expected to be the case anyway, so the result hasn't advanced my thinking. However, I definitely agree that the result has some value, though more precise tests would also be of value.
What do I think are the likely implications of my thinking for Voynichese as a whole?

Voynichese contains a large proportion of "null word" filler text, I think likely a larger proportion in the main text than in the labels. Again, I suspect that null words fall into spelling "clusters" of words with similar spellings, i.e. variants of a standard spelling "formula" or pattern. I have studied sentence text much less than label text, but I think it not unlikely that the author was biased towards using different null word clusters in different contexts. Different languages could just be different "null languages", i.e. shifting mental biases towards different null words or null formulae. This is all very speculative, but it is the most consistent explanation that I can come up with on the basis of my familiarity with the evidence.
One thing that I mentioned on Nick's blog, and that I also think worth mentioning here, is that if we have a large amount of filler text (null words), then most of the statistical tests applied to Voynichese are likely to be of very little use. They provide statistics on the aggregate of null and non-null text, which are quite different things, and therefore those statistics don't tell us much about either.
I moved this thread to "text" since it's gotten rather large for "questions to experts".
(30-08-2019, 03:07 PM)ReneZ Wrote: Here's a slightly different view on the labels. Let's just consider the 300 zodiac labels.

None of them are 'very frequent' Voynich words. If we look at the five most frequent words, which I suspect to be:
daiin   chedy   Shedy   chol   aiin

and I count their occurrence in the MS using the table at the bottom of [link], then I arrive at 2549 word tokens (apart from counting errors).

The total number of word tokens in the MS is of the order of 38,000.
This means that 6.4% of all word tokens are one of the above five, but among the labels they don't appear.

If one were to take (arbitrarily) 300 words in the manuscript, then the probability that none of them is one of the above five is 2 * 10^-9.

This just confirms what we knew: the labels are not standard VMs text, but something different.
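
The figure quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the 300 labels behave like independent draws from the running text, which is exactly the null hypothesis being rejected, and the function name `prob_none_in_sample` is made up for illustration.

```python
def prob_none_in_sample(p_hit: float, n_words: int) -> float:
    """Probability that n_words independent draws all miss a set of words
    that together make up a share p_hit of the text."""
    return (1.0 - p_hit) ** n_words


N_LABELS = 300  # number of zodiac labels considered in the post

# Share quoted in the post for the five most frequent words.
p_quoted = prob_none_in_sample(0.064, N_LABELS)

# Share obtained if the total is taken as exactly 38,000 tokens
# (the post only says "of the order of 38,000").
p_rounded = prob_none_in_sample(2549 / 38_000, N_LABELS)

print(f"share 6.4%:        {p_quoted:.1e}")
print(f"share 2549/38000:  {p_rounded:.1e}")
```

With the quoted 6.4% share the result is a little over 2 * 10^-9. Taking the total as exactly 38,000 gives a share nearer 6.7% and a probability just under 10^-9, so the conclusion does not depend on the rounding.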

Dear René, 

what you write here doesn't fit the description we have given in our paper. We never argued that words are homogeneously distributed. On the contrary, we wrote "No obvious rule can be deduced which words form the top-frequency tokens at a specific location, since a token dominating one page might be rare or missing on the next one" (Timm & Schinner 2019, p. 3). For instance, only daiin is common to the whole manuscript. But even daiin is not homogeneously distributed within the VMS (see Timm 2015, p. 25). On page 6 we even use this observation as a counterargument to the language hypothesis (see Timm & Schinner 2019, p. 6).

Back in 2017 I already answered a similar argument. It seems as if you didn't read the answer:

For the self-citation method it is something different if you write text or if you write labels. In one case you can copy words from previous lines; in the other case there is some distance between the text and the place where you write. Most of the time you have only previous labels available as source words. Moreover, if the labels are arranged in circular form you would probably turn the page while writing them. This makes it harder to copy labels, which are then upside down. Therefore it is expected that more unique words are used as labels.

In some cases labels have to fill a certain amount of space. In these cases it is not expected that the scribe chooses a word that doesn't fit into the given space.

chol is typical for Currier A, and chedy and Shedy are frequently used only in Currier B. Labels occur mostly in the Pharmaceutical section, the Astronomical section and the Cosmological section. Even if the Pharmaceutical section is counted as Currier A, these are just the sections between Currier A and B (see Timm & Schinner 2019, p. 6). The pages in Currier B only rarely use labels. Therefore the only place where you can expect a word like chedy as a label is the Biological section. And in the Biological section you can find at least a label otol Shedy in <f77v.L.1>.

There are three labels using daiin:
<f67r2.X.6>      tol daiin
<f68v2.R.12>     dchedal daiin
<f72r3.S1.9>     oteey daiin

There is one label using aiin:
<f68v2.R.11>    adairchdy aiin

There are also labels similar to daiin and chol:
<f68r1.S.17>     ordaiin
<f68r2.S.2>      odaiin
<f75r.L.7>       dainy
<f68r2.S.5>      dchol
Torsten,

this thread is about finding properties of the labels. The statistic I computed is one example of a very clear property, where the label text very distinctly differs from the main text. The purpose of my post was not at all to address your paper.

For any kind of explanation of the Voynich MS text, this is one of a list of properties that need to be explained or at least taken into account.
For any explanation that involves arbitrary text generation, like Gordon Rugg's or yours, it requires that 'something different' must have been done. Your earlier answer to a similar argument (where I didn't actually compute any probability figure) did not argue that this was not the case.

There are more such points, and these were already brought up when Gordon first presented his method, but these do not concern the labels.

So, not to leave any ambiguity, the post shows that the running text and the label text (specifically the zodiac labels) have quite different properties. The running text includes high-frequency words that never appear among the labels, but many if not most labels are 'valid' words that appear somewhere in the running text. This fits with the idea that the running text has sentences, while the labels are only nouns, adjectives or numbers (for example). Of course it is not proof of it.

It might be even more interesting to make a similar analysis specifically for the pharmaceutical pages, because here the labels and the running text are mixed on the same pages, and the Currier language is essentially A, even though it has its own 'flavour'. However, I am not as familiar with these labels.
(03-09-2019, 05:50 AM)ReneZ Wrote: Torsten,

The purpose of my post was not at all to address your paper.


René, you argue that you expect to see some random results for the self-citation method, and you falsely suggest that repeated self-copying has something in common with Gordon Rugg's Cardan grille hypothesis:
(30-08-2019, 06:52 PM)ReneZ Wrote: If all of the VMs text is somehow the result of a random process, in the style of Rugg or Timm, then we would see something different.


This clearly addresses the self-citation method described in my paper. You also characterize the self-citation method as some random process.
Please note that in our paper we "presented some evidence that not only frequency and similarity of tokens are correlated, but also similarity and relative position. The closer two words are (with respect to their edit distance), the more likely these words also can be found written in close vicinity (i.e. on the same page)" (Timm & Schinner 2019, p. 6). In other words, we argue that the VMS text is far from random. Moreover, our argument for a meaningless text is: "the high regularities of the VMS text significantly limit the maximal amount of information possibly hidden within the 'container', virtually rendering it useless" (Timm & Schinner 2019, p. 17).
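
For readers unfamiliar with the term, edit distance counts the single-character insertions, deletions and substitutions needed to turn one word into another. The sketch below is a plain Levenshtein implementation applied to label words quoted earlier in this thread. It is an illustration only: it treats each transliteration character as one unit, which is an assumption and not necessarily the similarity measure used in the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


# Word pairs taken from the labels listed earlier in the thread.
pairs = [("daiin", "odaiin"), ("daiin", "ordaiin"),
         ("daiin", "dainy"), ("chol", "dchol")]
distances = {pair: edit_distance(*pair) for pair in pairs}

for (first, second), dist in distances.items():
    print(f"{first} -> {second}: {dist}")
```

A distance of 1 or 2 is what "similar" means in the label lists above. Testing whether such near neighbours really cluster on the same page would additionally need page positions for every token, which this sketch does not attempt.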

(03-09-2019, 05:50 AM)ReneZ Wrote: For any explanation that involves arbitrary text generation, like Gordon Rugg's or yours, it requires that 'something different' must have been done. Your earlier answer to a similar argument (where I didn't actually compute any probability figure) did not argue that this was not the case.

In my earlier answer I clearly said that it is something different: "For the self-citation method it is something different if you write text or if you write labels."

In our paper we write: "As for the actual selection process of source words, it is clear from the results of section 2 (as well as simply suggested by the scribe's convenience) that they are to be chosen at least from the same page." (Timm & Schinner 2019, p. 10). We also say "However, all pages containing at least some lines of text do have in common that pairs of frequently used words with high mutual similarity appear." (Timm & Schinner 2019, p. 3). In other words, the resonance effect requires at least some amount of text. This means that for the self-citation method more repetition is expected on pages full of text and more unique words on pages containing less text. A page with many labels contains fewer words than a page full of text. Therefore a different outcome is expected (see [link], p. 7ff).
One thing that I have touched upon and that I think is worth emphasising is that the way human beings think is very different from the way modern computers operate, though of course attempts have been and are being made with artificial intelligence to emulate the kind of processes that operate within the human mind, and in animal minds as well. (I do support the notion of Strong AI, i.e. that the human mind is a computer in the broad sense of the term.)

A standard modern computer, or for that matter in many instances a much older computer, can achieve with ease some tasks that are near impossible for most human beings (savants may be an exception). So when trying to implement or define an algorithm to mimic the behaviour of the human author of the Voynich (I am not aware of anyone yet claiming that the manuscript was written by machine), one must be aware of what a human is reasonably capable of computationally, given questions like cognitive load, cognitive biases and general cognitive processes.

Consistently picking a random (pseudorandom) number between 1 and 26 is trivial for a computer, but very difficult for a human being without some mechanical tool like a pack of cards or a many-sided die. Human beings can easily have biases, say towards their favourite number, so when repeatedly generating random numbers those subliminal biases can come into play. My brief study of labels makes me inclined to think those cognitive biases are evident there; this is one reason why I don't think a Cardan grille or equivalent mechanism was used, though I don't completely rule out the possibility that some very simple mechanism or technique of that kind may have been applied. I think one has to consider what human beings are inclined to do to reduce the cognitive load on their brains.

So I will state what I think this means in the context of null words or filler text. Sometimes copying pieces of text from one line to another, as Torsten mentions, is mentally much easier than producing an original line of text. A bias towards using similarly spelled words near or next to each other is mentally less draining than consistently producing more original null words, even if those null words fall within a narrow framework or defined structure. If the null word filler text theory is correct, the sheer quantity of null text the author would have had to produce is considerable, so unless an individual is absurdly disciplined, cognitive shortcuts would have been bound to be employed, either consciously or unconsciously, to reduce the mental demands on them. I certainly would have done so. It could be argued that I am really talking about a kind of laziness on the part of the author, but given this quantity of text this laziness really makes sense.

In conclusion, any encryption theory of the Voynich must take realistic account of what would have been mentally practical for the author. So, for example, a very complex encryption procedure, whilst it may be easy for a computer to implement, could very easily be completely impractical for a human being who aimed to produce a manuscript in a reasonable period of time. Coding simulations or encryption implementation algorithms can be very valuable, but we should always ask whether a human being could or realistically would be able to implement such an algorithm. (I should say that I am not pointing my finger at specific people in these remarks, but rather making a general point.)
I must add that I do think the author employed mental tools or techniques, such as having null words conform to a standard format or formula, which makes them much easier to generate. Having a way to easily recognise null words is also important, so some kind of formula would be necessary for that reason too.
This looks like a very good approach, but I think we should expect a fairly high level of encryption (compared to other 15th century manuscripts), since IMO the author of the VMS must be the person who conceived the drawings, and their meanings were hidden from us until very recently (I'm talking about the biblical imagery hidden in the plants). So I don't think a person who used images in such a way to express or hide his ideas would think differently when it comes to the text.

Maybe he filled it with nulls if he had a clear motive, but I don't think that's the case; IMO the text must contain at least some amount of information.