The Voynich Ninja

Bigram = phoneme theory (language agnostic)
(06-03-2019, 02:33 PM)-JKP- Wrote: There are two ways Voynichese is positional... the groupings of glyph combinations, and where they can occur in a token.
The first goes very well with the natural idea of a 'verbose' cipher with homophones and nulls; the second not really. One way to resolve the apparent contradiction is to have the same glyph play different roles, just like the Latin con-/-us abbreviation, but that still severely restricts the possibilities.
(06-03-2019, 12:15 PM)nablator Wrote: Hello Geoffrey,

It should be noted that the frequencies of very common bigrams (not just EVA-ed, which differs between "languages" A and B) vary a lot from page to page. For example, f. 15v has the highest frequency of EVA-or, and ff. 58r/v have the highest frequency of EVA-al. On the other hand, some very common bigrams are missing or almost missing on some pages: there is no EVA-dy on ff. 5v, 6r, 19v, 25v and 35v, and no EVA-or on f. 26r.

If these very common bigrams stand for some cleartext letter or phoneme by themselves, then to account for the high variability in frequency there may be homophones: other bigrams that play the same role. It should be possible (in principle) to identify them by finding an optimum partition of the set of common bigrams (or any set of common patterns) that keeps the frequency of each group of bigrams (or patterns) as stable as possible over the pages of large, relatively homogeneous portions of the VMs (one or several quires). I haven't tried it yet, maybe someone else has...
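For anyone who wants to try it, here is a minimal Python sketch of the kind of search nablator describes. All function and variable names are placeholders; it assumes the transcription is already loaded as a {folio: EVA string} dictionary and that a shortlist of candidate bigrams has been chosen (the loader and the shortlist are not shown). Starting from singleton groups, it greedily merges the two groups whose pooled per-page frequency becomes most stable.

    from collections import Counter
    from itertools import combinations
    import statistics

    def bigram_freqs_per_page(pages, bigrams):
        """Relative frequency of each candidate bigram on each page.
        pages: {folio_id: EVA text} -- hypothetical input format."""
        table = {}
        for folio, text in pages.items():
            counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
            total = max(sum(counts[b] for b in bigrams), 1)
            table[folio] = {b: counts[b] / total for b in bigrams}
        return table

    def variability(groups, table):
        """Sum over groups of the coefficient of variation of the
        group's pooled frequency across pages (lower = more stable)."""
        score = 0.0
        for group in groups:
            series = [sum(row[b] for b in group) for row in table.values()]
            mean = statistics.mean(series)
            if mean > 0:
                score += statistics.pstdev(series) / mean
        return score

    def greedy_partition(bigrams, table):
        """Repeatedly merge the pair of groups whose union most reduces
        the overall variability; stop when no merge improves it."""
        groups = [frozenset([b]) for b in bigrams]
        while len(groups) > 1:
            best, best_score = None, variability(groups, table)
            for g1, g2 in combinations(groups, 2):
                trial = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
                score = variability(trial, table)
                if score < best_score:
                    best, best_score = (g1, g2), score
            if best is None:
                break
            g1, g2 = best
            groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
        return groups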

Thank you for this very useful and significant observation. I have to admit, sadly, that it brings me back to the "meaningless text" hypothesis, which I consider the null hypothesis in the linguistic analysis of the Voynich ms text. In this scenario, the author simply used whichever particular glyph combinations most suited his fancy while writing that particular page. In this case, one would analyze the glyphs and combinations and vords more as elements of artwork than as meaningful linguistic content. By this hypothesis, f. 15v has a lot of [or] in the same way that one particular painting may have a lot of the color blue, ff. 58r/v have a lot of [al] as another painting may have a lot of the color yellow, whereas on f. 26r the author didn't feel like putting any [or] on it, as a painter may choose not to use the color blue in a particular piece of artwork.

I repeat, I don't like the idea of this null hypothesis being true, and I very much hope it is wrong. But I think it's a mistake to lose sight of the possibility that it may be true, and we should always consider this possibility as we attempt to explain observed phenomena in our analysis.

But, analyzing the data under the natural language hypothesis, yes, your point about some bigrams being homophones of other bigrams is a plausible explanation. However, this hypothesis may then lead to further complications in attempting to analyze the rest of the text and develop a complete correspondence of the entire Voynich character/bigram inventory with the entire alphabet/abjad of the underlying language. I know this from extensive personal experience. I recall one hypothesis where my attempt to account for the phenomena you describe led me to posit that [ok], [yk], [ot], [yt], [ky], and [ty] all represented the same single phoneme! Of course the problem with such an analysis is that one quickly runs out of bigrams or characters to represent the rest of the language, and the text becomes an extremely repetitive cacophony of the same small number of sounds/letters. I have found the same issue with [or], [ar], [os], and [s]. Distinguishing them each as a distinct phoneme leads to problems making sense of particular individual passages of text; conflating groups of them as the same phoneme may solve the local problem, but leads to an illogical and unnatural linguistic structure of the character inventory and ms text as a whole.

Here is another example I have observed of the phenomenon that you cite: the final glyph [n] is ubiquitous throughout all sections of the ms text, usually as part of [ain] or [aiin]. But it is strangely absent on f. 27v, and it is strikingly rare among the astronomical star(/planet?) labels on ff. 68r1-68r3. Only 3 of the 65 star labels contain [n], and one of them has it in the quite rare medial position, in [oiinar] on f. 68r1. There is also [ordaiin] on the same page, and [odaiin] on f. 68r2.

I also notice that [n] is quite rare among the plant/root *labels* in the pharmacological section: it occurs with any frequency on those pages only in the paragraph text.

This leads me to suspect that the [ain] / [aiin] suffix may be some kind of grammatical morpheme that occurs frequently in grammatical linguistic text, but not in isolated label names.
I've long suspected that words ending [in, iin, iiin] could be reflecting some aspect of prosody. I don't have the proof and looking for it hasn't turned up anything solid. It would go some way to explaining certain aspects of its distribution and the interrelationships of the words containing [i].

On the topic of bigrams:
  • The "gappiness" of the bigram table of all possible combinations speaks to a confounding factor inherent in the glyphs, such as sound values, which governs their combination (see the counting sketch after this list).
  • Words can be constructed incrementally, one glyph at a time: [chedy] > [kchedy] > [okchedy] > [qokchedy].
  • There are a handful of sequences of single glyphs which suggest that the writer conceived of them as being discrete.
  • For any given bigram its place and relationship in a word is often governed by one glyph or the other, and not a combination of the two. For example: all bigrams of the form [ch*] will have a similar leftward relationship regardless of the second glyph. There are exceptions, but the rule is true enough to suggest that we're dealing with two glyphs with different identities rather than a whole with its own identity.
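On that first point, a minimal counting sketch, assuming a transliteration in which every glyph maps to a single character (EVA is only approximately that, since e.g. [ch] and [sh] span two characters, so treat the numbers as indicative only):

    from collections import Counter

    def bigram_gappiness(text):
        """Count glyph pairs and report the fraction of cells in the
        glyph-by-glyph table that never occur at all."""
        counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
        glyphs = sorted(set(text) - {" "})  # drop the word separator
        empty = sum(1 for a in glyphs for b in glyphs
                    if counts[a + b] == 0)
        return counts, empty / len(glyphs) ** 2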
I really like the statement of @nablator:

(06-03-2019, 01:47 PM)nablator Wrote: ... I have a nagging feeling that we are missing something very obvious and very artificial in the way Voynichese is constructed.

This is because it coincides exactly with my own nagging feeling.
If one wants to solve a cipher, one should attack it at its weakest point, which is where it shows the clearest signatures (i.e. in the statistics).
The Voynich text is not necessarily 'just' a cipher, but the same rule should be applied.

While the anomalous conditional entropy is long known, this is just a single number. What is more important is that this number reflects a completely different distribution of bigrams.
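For concreteness, here is a minimal sketch of the two numbers in question, again assuming a one-character-per-glyph transliteration: the single-character entropy h1 and the conditional entropy h2 = H(pair) - h1, both in bits per character.

    import math
    from collections import Counter

    def entropies(text):
        """Single-character entropy h1 and conditional entropy
        h2 = H(pair) - h1, both in bits per character."""
        def H(counts):
            n = sum(counts.values())
            return -sum(c / n * math.log2(c / n) for c in counts.values())
        h1 = H(Counter(text))
        h2 = H(Counter(text[i:i + 2] for i in range(len(text) - 1))) - h1
        return h1, h2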

Regardless of whether one wants to approach the problem by finding a method to convert the Voynich text back to plain text, or the other way around, by finding a way to convert a normal plain text to Voynichese, this is one of the most fundamental problems to resolve. (It is not the only one!!)

Clearly, attempts to simply find a translation table for characters and a matching language can only fail if they do nothing to explain the fundamental differences in the bigram statistics.

I also very much like a recent statement of @davidsch (in another thread) which was along the lines of: "presenting lots of complicated graphics won't help". This is very true, as shown by all the repeated attempts to do the same thing over and over again. However, those very few who take the time to take a closer look have a much greater chance of not failing in the same way.

Find the weakest spot and find a way to explain it.
Bennett put forward the first quantitative information about this in the seventies. Ever since then, I have not seen any reasonably successful explanation of 'what could have caused it', apart from the generic proposal of a 'verbose cipher'.
Here we go slightly off topic, but a "something very obvious" would be that we have lost a nomenclator.
I also like the words of nablator: "I have a nagging feeling that we are missing something very obvious and very artificial in the way Voynichese is constructed."
  
The most obvious thing of all is that when we look at the script we don't see letters. Only two or three glyphs look like letters of the Roman alphabet, but they only seem so. All we see are glyphs that we have to interpret, trying to put ourselves in the shoes of a man of the 15th century.
The most difficult thing is not to interpret the glyphs but to understand the mentality of this man.
Which glyphs have the highest and lowest effects on entropy? That is, how would we normalise the entropy value of the text to that of natural language written in the plain, with the fewest possible changes?
(09-03-2019, 12:25 AM)Emma May Smith Wrote: Which glyphs have the highest and lowest effects on entropy? That is, how would we normalise the entropy value of the text to that of natural language written in the plain, with the fewest possible changes?

That is a very good question. From what I have seen, this cannot be confined to just a few glyphs or characters.
The ones that have the largest impact are the ones that create the most restrictive combinations.
And this seems to apply to almost all of them.
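One crude way to probe this (a sketch only; outright deletion is just one of several possible probes, and it reuses the entropies() helper sketched earlier in the thread): remove every occurrence of each glyph in turn and see how far the conditional entropy h2 moves.

    def glyph_impact(text):
        """Rank glyphs by how far deleting every occurrence shifts h2;
        glyphs that create the most restrictive combinations should
        move it the most. Relies on entropies() from the earlier
        sketch; whitespace separators are skipped."""
        base = entropies(text)[1]
        deltas = {g: entropies(text.replace(g, ""))[1] - base
                  for g in set(text) if g.strip()}
        return sorted(deltas.items(), key=lambda kv: abs(kv[1]),
                      reverse=True)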

Jim Reeds did an experiment along these lines of thought in the 1990's.
One could take the most frequent bigram, and consider that this bigram is really meant to be a single character that has been written out with two symbols.
One could then replace them all, and repeat the process for the next most frequent pair.
One could do this repeatedly in the hope to 'normalise' the entire distribution of pairs.

While I don't think he showed the details, he did say that it did not lead anywhere.
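Since the details were not published, the following is only a guess at a reconstruction of that procedure (a sketch under assumptions, again reusing entropies() from the earlier sketch). Each round replaces the currently most frequent bigram, ignoring pairs that span a word break, with a fresh single symbol from the Unicode private-use range, and records how h2 evolves.

    from collections import Counter

    def merge_rounds(text, rounds=20):
        """Iteratively replace the most frequent bigram with a new
        single symbol and track the conditional entropy h2."""
        history = [entropies(text)[1]]
        for k in range(rounds):
            pairs = Counter(p for p in (text[i:i + 2]
                            for i in range(len(text) - 1))
                            if " " not in p)
            if not pairs:
                break
            top = pairs.most_common(1)[0][0]
            text = text.replace(top, chr(0xE000 + k))  # private-use char
            history.append(entropies(text)[1])
        return text, history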

If I may just refer once more to [link], then Figure 10 there shows (I think) quite clearly how fundamental the difference is. Every little square is a bigram. For the plain text on the left, there are large areas where almost all combinations seem possible, while for the Voynich text on the right (FSG transcription) combinations are quite restricted all over the area.
(09-03-2019, 04:24 PM)ReneZ Wrote: Jim Reeds did an experiment along these lines of thought in the 1990's.
One could take the most frequent bigram, and consider that this bigram is really meant to be a single character that has been written out with two symbols.
One could then replace them all, and repeat the process for the next most frequent pair.
One could do this repeatedly in the hope to 'normalise' the entire distribution of pairs.

While I don't think he showed the details, he did say that it did not lead anywhere.

While I understand the point of your whole post, I struggle to translate the entropy value into how the glyphs work. I accept that the numbers are correct, but I guess I don't really have a "feel" for them. This is my failing, not a failing of the concept of entropy.

I do wonder what the successive stages of Jim Reeds' experiment would look like in terms of entropy. Or even something simpler, focussing on the most restrictive glyph: if you just deleted every example of [q] at the start of a word followed by [o], what would the outcome be?
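That last experiment is easy to sketch (assuming an EVA-style transcription where word breaks are spaces, periods or newlines; the separator convention is an assumption here): delete [q] wherever it is word-initial and followed by [o], then compare the entropies before and after.

    import re

    def drop_word_initial_q(text):
        """Delete [q] at the start of a word when followed by [o];
        word starts are taken to be after a space, period or newline,
        or at the start of the text."""
        return re.sub(r"(^|[ .\n])q(?=o)", r"\1", text)

    # e.g. compare entropies(text) with entropies(drop_word_initial_q(text))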
Wait, surely that procedure described by Rene, by combining glyphs, would REDUCE entropy (degree of disorder)?
Or am I missing something?