Hi Koen,
The method I discussed for the Latin Aelredus file does not work for English. In general, English is rather far from Voynichese according to these two measures. The closest sample in your corpus appears to be Eng_Polandrellovepoems, but that file has only 949 words long enough to be transformed by my regex, while the Latin file has 2863 (in both cases, I considered only the first 10k words). The transformed English file (marked 'q' in the plot) is probably still largely understandable, since less than 10% of the words are altered.
[attachment=5075]
In order to transform the English file into something comparable with Voynichese, I used two different approaches.
Method 1
A more aggressive version of what I did for Latin. I used this regex:
sed -e 's/\([a-z]\)[a-z][a-z][a-z][a-z][a-z]*\([a-z]\)/\1z\2/'
This "compresses" all words of 6 or more characters, preserving only the first and last letters. The result is the point marked 'z', which is shifted to the left but also has a low biword entropy. I then applied a second step in which I prefixed the character 'x' to 20% of the words: this increases entropy, and the result 'z1' is comparable with the Voynichese samples.
Original:
wher is this prynce that conquered his right within ingland master of all his foon and after fraunce be
q:
wher is this prynce that cqed his right within ingland master of all his foon and after fraunce be
z:
wher is this pze that czd his right wzn izd mzr of all his foon and after fze be
z1:
wher is this xpze that czd his right xwzn izd mzr of all xhis foon and after fze xbe
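For anyone who wants to reproduce the two steps without sed, here is a rough Python sketch. The regex mirrors the sed expression above. The `prefix_x` helper, its random selection of words and the seed are my own assumptions, since the post only says that 20% of the words get the prefix.

```python
import random
import re

# Same pattern as the sed expression: first letter, at least four
# middle letters, last letter (so words of 6 or more letters).
LONG_WORD = re.compile(r"([a-z])[a-z]{4,}([a-z])")

def compress(word):
    """Step 1: replace the middle of a long word with 'z'."""
    return LONG_WORD.sub(r"\1z\2", word, count=1)

def prefix_x(words, fraction=0.2, seed=0):
    """Step 2 (assumed rule): prefix 'x' to a random fraction of the words."""
    rng = random.Random(seed)
    n = round(len(words) * fraction)
    chosen = set(rng.sample(range(len(words)), n))
    return ["x" + w if i in chosen else w for i, w in enumerate(words)]

original = ("wher is this prynce that conquered his right within ingland "
            "master of all his foon and after fraunce be")
z = [compress(w) for w in original.split()]
z1 = prefix_x(z)
print(" ".join(z))   # step 1 output
print(" ".join(z1))  # step 2 output; which words get 'x' depends on the seed
```

Step 1 is deterministic and reproduces the 'z' line above. Step 2 only matches the 'z1' line in the proportion of prefixed words, not in which words are picked.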
Method 2
This time I did not alter the words themselves, so H1 is unchanged. I modified the word order by taking groups of 4 consecutive words and rearranging each group randomly. This obviously increases biword entropy.
sorted:
wher is prynce this right his that conquered of within master ingland foon his all and after very be fraunce
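A minimal Python sketch of the shuffling, with a check that H1 really is unchanged. I am assuming here that H1 is the Shannon entropy of the word frequency distribution and that biword entropy is the Shannon entropy of adjacent word pairs. The function names and the seed are hypothetical, and the exact definitions behind the plot may differ.

```python
import math
import random
from collections import Counter

def shuffle_in_groups(words, size=4, seed=0):
    """Shuffle each run of `size` consecutive words independently."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(words), size):
        group = words[i:i + size]
        rng.shuffle(group)
        out.extend(group)
    return out

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def word_entropy(words):
    # Assumed definition of H1: entropy of single-word frequencies.
    return entropy(words)

def biword_entropy(words):
    # Assumed definition: entropy of adjacent word pairs.
    return entropy(zip(words, words[1:]))

words = ("wher is this prynce that conquered his right within ingland "
         "master of all his foon and after fraunce be").split()
shuffled = shuffle_in_groups(words)
print(word_entropy(words), word_entropy(shuffled))
print(biword_entropy(words), biword_entropy(shuffled))
```

On a sample this short almost every word pair is unique, so the biword values say very little. The effect only becomes visible on a text of several thousand words.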
Both methods result in unreadable text. Personally, I find something like Method 1 much more likely. All medieval manuscripts contain inconsistencies that likely result in higher entropy with respect to a modern edition: this could result in something similar to the second step. The first step, resulting in the production of homographs, is more difficult to explain. Bowern and Lindemann pointed out that "systematic conflation of phonemic distinctions, such as conflating all vowels to a single character" results in lower character entropy, similar to Voynichese; this should also result in an increased number of homographs (e.g. by collapsing t,th,d and all vowels, "time", "theme", "tome", "demo", "dime", "dome" could all be written "twmw"). I guess that transformations like this could preserve readability, but this clearly depends on how many different sounds are conflated. Anyway, I agree that readability is important and that the relevance of these results is quite limited.
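The homograph effect in the "twmw" example can be checked with a few lines of Python. The `conflate` rule below is only my reading of that example (t, th, d all written 't', every vowel written 'w'). It is not a mapping proposed by Bowern and Lindemann.

```python
import re
from collections import defaultdict

def conflate(word):
    """Assumed rule from the example: th/d -> 't', any vowel -> 'w'."""
    word = re.sub(r"th|d", "t", word)
    return re.sub(r"[aeiou]", "w", word)

words = ["time", "theme", "tome", "demo", "dime", "dome", "master"]

# Group the original words by their conflated spelling.
homographs = defaultdict(list)
for w in words:
    homographs[conflate(w)].append(w)

for spelling, sources in homographs.items():
    print(spelling, sources)
```

Run on a whole word list, the size of the largest groups gives a rough idea of how much ambiguity a given conflation introduces.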
The method discussed by RenegadeHealer is impressive in that it shows a complex transformation that the reader can undo effortlessly. Unfortunately, it results in higher entropy at both the character and the word level, so it does not seem to be similar to what happens with the VMS.
(29-12-2020, 12:55 PM)MichelleL11 Wrote: Does it still follow Zipf’s Law, for example, or does such wholesale substitution of the middle of words cause a loss of that quality?
Hi Michelle, since in the case of Latin I only altered long words, I expect that Zipf's Law is not significantly affected. But there are other Voynichese features that cannot be explained by the method I applied here (e.g. character entropy or line effects).
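I have not actually run this check, but one simple way to test the expectation would be to compare the rank-frequency curve before and after the transformation. The sketch below fits a straight line to log(frequency) against log(rank). The function names are hypothetical, and a least-squares slope is only a crude indicator of Zipf-like behaviour.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Word frequencies sorted from most to least frequent."""
    return sorted(Counter(words).values(), reverse=True)

def zipf_slope(freqs):
    """Least-squares slope of log(frequency) vs log(rank).

    A value near -1 is the classic Zipf pattern. Needs at least 2 ranks.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic sample with frequencies exactly proportional to 1/rank.
sample = []
for rank in range(1, 7):
    sample.extend(["w%d" % rank] * (120 // rank))

print(zipf_slope(rank_frequency(sample)))
```

Applying `zipf_slope` to the original and the transformed word lists and comparing the two values would show whether the compression of long words changes the curve noticeably.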
(29-12-2020, 12:55 PM)MichelleL11 Wrote: Does this kick “q” up to a very high level in the character stats or is the impact something that would be otherwise hidden?
I expect that the frequency of 'q' increases considerably. This is one of many reasons that make it clear that this is not how Voynichese was written. These experiments are just an attempt to understand more of what word entropy values mean.
(29-12-2020, 12:55 PM)MichelleL11 Wrote: I know you are very interested in the reduplication stats and I would guess that this manipulation didn’t cause this - maybe do we need some sort of re-ordering of the words to get the repetition?
My opinion is that reduplication originates in the underlying text. If this is true, either the underlying language is not European or the text is highly anomalous in this respect. But here I am speculating. In any case, a systematic re-ordering that generates repetitions would likely result in a lower biword entropy, while Voynichese shows the opposite when compared with ordinary European texts.
(29-12-2020, 12:55 PM)MichelleL11 Wrote: I hope you don’t mind these questions and thanks again for sharing these results!
Questions are always welcome! Whenever I try something I seem to end up with more questions than answers: this is what makes this hobby so addictive! Of course, in order to put together something like a plausible theory, it is necessary to take everything into account. But I think that also exploring single features in isolation can be instructive.