The Voynich Ninja

Full Version: Old Polish (geoffreycaveney's theory)
Entirely correct, Rene. For me the greatest value in the experiment is that it negates the argument that Voynichese words don't contain enough information to be real language. They do, but the information is just watered down. 

Additionally, as I mentioned before, the original post contained two mistakes which helped increase entropy:

- the entire VM was used, not just one section 
- a number of spaces were accidentally removed 

At this moment I am working on a revision which should set everything straight. I will look at HA, HB, Q13 and Q20 separately. I just finished the optimal list for Herbal A and it looks completely different.
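For readers who want to reproduce the kind of statistic being discussed, here is a minimal Python sketch of character entropy and conditional (second-order) character entropy. The function names and the sample string are assumptions made for illustration; this is not the code used in the experiment, and a real comparison would run on a full transcription of one section.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a frequency table (a Counter)."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def char_entropy(text):
    """First-order entropy: how unpredictable a single character is."""
    return entropy(Counter(text))


def conditional_entropy(text):
    """H(next char | previous char) = H(character pairs) - H(first characters)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)


# Hypothetical sample, not a real transcription excerpt.
sample = "qokeedy.qokedy.chedy.daiin"
print(char_entropy(sample), conditional_entropy(sample))
```

Under this definition, a text in which each character fully determines the next one has a conditional entropy of zero, which is why merging frequent glyph sequences into single units can raise the value.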
(17-10-2020, 07:54 AM)ReneZ Wrote: Well, the original objection from Marco still stands, namely that the selection of the VCI is based on just one of a great number of different possible verbose ciphers.
Let's say that this number is 100. Koen and Marco perhaps tried about that number of different combinations in the 'entropy hunting 3' work, but there are many more that they did not try. So let's stay conservative and use 100.
The next step is mapping the VCI to plain text characters. For an alphabet size of (say) 25, it is well known that there are 25! (factorial) different ways of doing that, which is of the order 10**25. However, that is clearly pessimistic. Let's again be conservative and assume that we know which are the 5 vowels and the 20 consonants. We can swap the vowels freely (5!) and for each consonant we would have 3 options. This gives 5! times 3**20 possibilities, which is close to one million, and there is no particular reason to pick one or the other except for a 'feeling' or 'intuition'. 
The probability of having it right is already 1 in 10**8.
Next, the words need to be adjusted to make them 'real old Polish words'. Again, being conservative, let's say that only every second word has two different options. Now there are 38 words, so that gives 2**19 different versions, which is half a million.
In the end, the possibility of having the right Polish text is conservatively estimated as 1 in 5 * 10**13.
How should this number be interpreted?
It means that you could have, with equal probability, followed 5 * 10**13 different procedures to arrive at an old Polish plain text without ever knowing, along the way, if it was the right one.
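The arithmetic in the quoted estimate can be checked with a short Python sketch. The variable names are assumptions for illustration; the inputs (100 ciphers, 5 vowels, 20 consonants with 3 options each, 19 two-way word choices) are the ones stated in the quote. Evaluated exactly, 5! times 3**20 comes to roughly 4.2 * 10**11 rather than one million, so the rounded figures in the quote understate the count under its own assumptions. The sketch prints both versions.

```python
from math import factorial

verbose_ciphers = 100                  # candidate verbose ciphers, as assumed in the quote
all_mappings = factorial(25)           # unconstrained mapping of a 25-letter alphabet
vowel_orders = factorial(5)            # 5 vowels swapped freely
consonant_choices = 3 ** 20            # 3 options for each of 20 consonants
constrained_mappings = vowel_orders * consonant_choices
word_variants = 2 ** 19                # every second word of 38 has two readings

exact_total = verbose_ciphers * constrained_mappings * word_variants
quoted_total = verbose_ciphers * 10**6 * (5 * 10**5)   # rounded figures from the quote

print(f"25!                 = {all_mappings:.2e}")
print(f"5! * 3**20          = {constrained_mappings:.2e}")
print(f"2**19               = {word_variants}")
print(f"total (as quoted)   = {quoted_total:.1e}")
print(f"total (exact terms) = {exact_total:.1e}")
```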

Of course there are many different possible decipherings of a verbose cipher or of any cipher. One could say the same thing of Linear B or of any previously unknown writing system or cipher. One could even say the same thing about the English text in Latin letters that we are writing right now. Technically, if we didn't know the script in advance, each letter could represent any sound. But that doesn't mean that we can therefore say that "the probability that the way we are reading these words as English is actually the 'right English text' is only 1 in 26! or 1 in 10**8". 

The critical issue is, having chosen one possible deciphering of the script as one's hypothesis, does it work or not? In other words, how closely do the Slavic VCI readings resemble the Old Polish / Silesian words? And how coherent is the Old Polish / Silesian text that is thus produced? Those are the key issues that I see in evaluating the interpretation.
Linear B was not solved by arbitrarily picking one out of a trillion possible mappings and claiming that the result is meaningful when in reality it is not.
(17-10-2020, 10:05 AM)Koen G Wrote: Entirely correct, Rene. For me the greatest value in the experiment is that it negates the argument that Voynichese words don't contain enough information to be real language. They do, but the information is just watered down.
Additionally, as I mentioned before, the original post contained two mistakes which helped increase entropy:
- the entire VM was used, not just one section 
- a number of spaces were accidentally removed 
At this moment I am working on a revision which should set everything straight. I will look at HA, HB, Q13 and Q20 separately. I just finished the optimal list for Herbal A and it looks completely different.

Koen, I see a basic problem with this new idea of producing separate verbose ciphers to maximize entropy in each of four sections of the ms text separately. If you then claim that the "best" verbose cipher is completely different for each section, this logically leads to the hypothesis that each section was composed in an essentially different writing system, all of which just happen to use the same set of glyphs. This strikes me as quite implausible, to say the least. Does anyone really believe that in the sequence "[okeody]" the meaningful segments are ok+e+od+y in one section, o+k+eo+dy in another, and possibly o+ke+o+d+y in yet another? If anyone believes that this can actually be the case, then let's just be honest and admit that there's no hope of deciphering the text of the manuscript, if the script actually represents so many essentially entirely different writing systems.

I think the original decision to seek the verbose cipher with the best possible entropy and conditional entropy statistics for the entire manuscript text was correct.
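To make the segmentation point concrete, here is a small Python sketch showing how one and the same glyph string splits differently under different unit inventories. The three inventories are toy assumptions chosen only to reproduce the three segmentations of [okeody] mentioned above; they are not anyone's proposed cipher tables, and greedy longest-match is just one possible segmentation strategy.

```python
def segment(word, inventory):
    """Greedy longest-match segmentation.

    Returns the list of units, or None if some position matches no unit.
    """
    units = sorted(inventory, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                out.append(unit)
                i += len(unit)
                break
        else:
            return None  # no unit fits at position i
    return out


# Hypothetical per-section inventories (illustration only).
inventories = {
    "section 1": {"ok", "e", "od", "y"},
    "section 2": {"o", "k", "eo", "dy"},
    "section 3": {"o", "ke", "d", "y"},
}

for name, inventory in inventories.items():
    print(name, "+".join(segment("okeody", inventory)))
```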
I have a very basic comment to make in regard to the recent posts in this thread concerning the evaluation of this hypothesis and its probability of being correct:

It is simply not possible to properly evaluate a theory about a text written in an unknown script representing a Slavic language, if one does not have any specific linguistic knowledge of Slavic languages and Slavic linguistics. Without such knowledge, one cannot possibly be qualified to evaluate such a theory.
There are hundreds of wildly different Voynich theories...and yet the response when they are challenged is always the same.  

 I don't know why it always disappoints me.
(17-10-2020, 04:18 PM)geoffreycaveney Wrote: It is simply not possible to properly evaluate a theory about a text written in an unknown script representing a Slavic language, if one does not have any specific linguistic knowledge of Slavic languages and Slavic linguistics. Without such knowledge, one cannot possibly be qualified to evaluate such a theory.

Any theory that tries to explain the Voynich MS text as a case of language X has to meet the known properties of the Voynich MS text and the proposed language X.

Both.

It can be rejected without knowledge of language X, if it does not meet the properties of the Voynich MS.

Furthermore, the theory needs to be acceptable in terms of logic, and it has to be statistically sound.
These two points are also independent of the proposed language.

Of course, if you find anyone who has knowledge of Slavic languages, and who thinks that your plain text is sensible, then your proposed solution could deserve more serious attention.

The famous proposed solution by Hauer and Kondrak failed on all these counts (i.e. not even considering the Hebrew language). Of course, it was also rejected by people who really know Hebrew.
(17-10-2020, 05:55 PM)ReneZ Wrote: Any theory that tries to explain the Voynich MS text as a case of language X has to meet the known properties of the Voynich MS text and the proposed language X.
Both.
It can be rejected without knowledge of language X, if it does not meet the properties of the Voynich MS.
Furthermore, the theory needs to be acceptable in terms of logic, and it has to be statistically sound.
These two points are also independent of the proposed language.
Of course, if you find anyone who has knowledge of Slavic languages, and who thinks that your plain text is sensible, then your proposed solution could deserve more serious attention.
The famous proposed solution by Hauer and Kondrak failed on all these counts (i.e. not even considering the Hebrew language). Of course, it was also rejected by people who really know Hebrew.

I agree that the support of other Slavic scholars is critical in order to deserve and attract more serious attention. No argument there. 

I also agree that of course any theory has to meet the known properties of the Voynich MS text. But I do not see how your earlier points about the large number of total possible verbose cipher analyses of the script "refute" my theory or show it to be in contradiction to any known properties of the Voynich MS text in any way. My theory does not claim that it is the only possible verbose cipher analysis of the script and text. I do think it is interesting that I was able to develop my theory based on an independent statistical analysis that showed one particular possible verbose cipher analysis produced significantly higher conditional entropy statistics. But that is just an interesting observation, that is all. I do not claim at all that Koen's verbose cipher analysis and entropy statistics are concrete evidence in favor of my theory. It is better to have high entropy than low entropy, so it is a good sign, but in itself the improved entropy statistics of course do not provide evidence that the resulting interpretation represents a Slavic language or any particular language. 

But by the same token, none of these statistics provide any evidence that my theory fails to meet the properties of the Voynich MS. In fact, the step of transforming the MS text from the EVA transcription into my Slavic VCI alphabet is by far the least debatable or arguable part of my method of interpretation that I have presented here. I am completely, entirely transparent about how I perform the EVA->VCI transformation. No actual data in the MS text is lost or changed as a result of this process. It is repeatable and reversible. Any statistical properties of the Voynich MS text that appear in the EVA transcription will also appear in my VCI alphabet interpretation, with the suitable adjustments for a verbose cipher.

(For example, the total number of occurrences of the EVA sequence [tar] will be identical to the combined total number of occurrences of the VCI sequences <pal>, <bal>, and <mal>, since VCI <p>, <b>, and <m> constitute the representations of all combinations that end in EVA [t]: [t], [ot], and [qot] respectively. Such statistics can be further broken down: EVA [qotar] = VCI <mal> exactly, with no ambiguity or possible alternate interpretations or representations at all. If the EVA transcription reads [qotar], my VCI interpretation must read it as <mal>, and if <mal> appears in VCI, it must represent [qotar] in EVA. I have given myself no degree of freedom whatsoever in performing this step. Then to analyze the statistics of VCI <bal>, it must represent EVA [otar] without a preceding [q], so the number of VCI <bal> = the number of EVA [otar] - [qotar]. Likewise, VCI <pal> must represent EVA [tar] without a preceding [o], so the number of VCI <pal> = the number of EVA [tar] - [otar]. With such entirely rule-based adjustments made for the entire script, all statistical properties of the Voynich MS text that appear in EVA, will also appear in VCI.)
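The counting identities in the parenthetical above can be checked mechanically. The following Python sketch implements only the single rule described there ([tar], [otar], [qotar] mapping to <pal>, <bal>, <mal>) and leaves every other glyph untouched. The function names and the toy EVA line are assumptions for illustration; it is not the full EVA->VCI table, and the round trip holds here only because the toy line contains no other source of the letters involved.

```python
import re

# Only the one rule discussed above; the rest of the VCI table is not modelled.
TO_VCI = {"qo": "mal", "o": "bal", None: "pal"}
TO_EVA = {"mal": "qotar", "bal": "otar", "pal": "tar"}


def eva_to_vci(text):
    """Rewrite [qotar]/[otar]/[tar] as <mal>/<bal>/<pal>."""
    return re.sub(r"(qo|o)?tar", lambda m: TO_VCI[m.group(1)], text)


def vci_to_eva(text):
    """Inverse of eva_to_vci for this one rule."""
    return re.sub(r"mal|bal|pal", lambda m: TO_EVA[m.group(0)], text)


# Hypothetical toy line, not a real transcription excerpt.
eva = "qotar otar tar qotar daiin otar chedy tar tar"
vci = eva_to_vci(eva)

n_tar, n_otar, n_qotar = (eva.count(s) for s in ("tar", "otar", "qotar"))
n_pal, n_bal, n_mal = (vci.count(s) for s in ("pal", "bal", "mal"))

print(vci)
print("tar:", n_tar, "= pal + bal + mal:", n_pal + n_bal + n_mal)
print("bal:", n_bal, "= otar - qotar:", n_otar - n_qotar)
print("pal:", n_pal, "= tar - otar:", n_tar - n_otar)
```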

The more debatable and arguable part of my method, by far, is the step where I interpret the Slavic VCI sequences as Old Polish / Silesian words. Here is where a certain small measure of ambiguity must be introduced into the interpretation, which I argue is in line with the measure of ambiguity and inconsistency that existed in Old Polish spelling in the medieval period as it was written in the Latin alphabet also. My very first post in this thread emphasized this point and cited substantial evidence of such Old Polish spelling practices. So yes, my interpretation can for example read the single VCI character <c> as Polish "c", "ć", "cz", "k", or "ch" -- but this is similar to the type of ambiguity that existed in actual attested medieval Old Polish spelling! Also, according to my interpretation, not every Old Polish / Silesian vowel is written in the VCI representation. I think this is reasonable as a medieval scribal abbreviated spelling practice, but it does introduce another debatable and arguable aspect of my method. 

For example, I think Ruby Novacna approached the discussion in the right way by analyzing and raising questions about my interpretation of VCI <pdzo> as Polish "pizdą", to which I responded with my own analysis and arguments. In my view, this is the process by which my theory and interpretation can be fairly evaluated.
(18-10-2020, 04:46 PM)geoffreycaveney Wrote: But I do not see how your earlier points about the large number of total possible verbose cipher analyses of the script "refute" my theory or show it to be in contradiction to any known properties of the Voynich MS text in any way

It only shows that the probability that your solution is the right one is astronomically low.
(18-10-2020, 05:29 PM)ReneZ Wrote:
(18-10-2020, 04:46 PM)geoffreycaveney Wrote: But I do not see how your earlier points about the large number of total possible verbose cipher analyses of the script "refute" my theory or show it to be in contradiction to any known properties of the Voynich MS text in any way
It only shows that the probability that your solution is the right one is astronomically low.

But I did not merely choose my particular interpretation of the script at random from among all possible interpretations. Let us even set aside the possible statistical meaningfulness of Koen's particular verbose cipher analysis as opposed to all other possible such verbose cipher analyses. On its own merits alone, my VCI interpretation at least has an inherent internal logic that is far from random. No, it is not the only possible such logical system, and no, this alone does not mean it is necessarily correct. But its inherent internal logic does demonstrate that it cannot be treated as simply one possible system chosen entirely at random from among all imaginable existing possibilities. Allow me to summarize the logical features of my Slavic VCI interpretation of the script of the Voynich MS text:

1. EVA [o+X] = voiced counterpart of voiceless EVA [X]

2. EVA [cXh] ligature / [X+ch] = palatalized counterpart of EVA [X]

3. EVA [qo+X] = nasal consonant at same place of articulation as EVA [X] and [o+X]

It is not such a simple task to develop an interpretation of the script that incorporates all these systematic features of a typical consonant phoneme inventory of a language in such a logical way. It is far from being merely one random possible interpretation among the countless totality of all possible interpretations. How many such interpretations can one actually develop that treat the voiced/voiceless distinction, the plain/palatalized distinction, and the nasal phonemes in such a logical way? It is not so simple to do so at all, much less to do so in countless multitudes of different ways.
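The three rules listed above can be written down as a small feature parser. This Python sketch is only an illustration of the rule structure: the `Features` type, the regular expression, and the way the rules combine (for example an [o] prefix on a [cXh] ligature) are assumptions, and no phonetic values are assigned to any glyph.

```python
import re
from typing import NamedTuple, Optional


class Features(NamedTuple):
    base: str       # the bare EVA glyph X
    voiced: bool    # rule 1: [o+X]
    palatal: bool   # rule 2: [cXh] ligature or [X+ch]
    nasal: bool     # rule 3: [qo+X]


# Optional prefix, then either a c-X-h ligature or X with an optional "ch".
PATTERN = re.compile(
    r"(?P<prefix>qo|o)?(?:c(?P<lig>[a-z])h|(?P<plain>[a-z])(?P<ch>ch)?)"
)


def parse(group: str) -> Optional[Features]:
    """Split one EVA glyph group into a base glyph plus the three features."""
    m = PATTERN.fullmatch(group)
    if m is None:
        return None
    prefix = m.group("prefix")
    base = m.group("lig") or m.group("plain")
    palatal = m.group("lig") is not None or m.group("ch") is not None
    return Features(base, voiced=(prefix == "o"), palatal=palatal, nasal=(prefix == "qo"))


for group in ("t", "ot", "qot", "cth", "tch"):
    print(group, parse(group))
```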