TLDR at the bottom. I may have misunderstood something and/or made errors. I hope I did.
The discussion started by Bakker continued in the post I linked yesterday [link]: a post by Victor Mair [link]. From [link], I see that Mair is a Sinologist who has been at UPenn since 1979; the page also has a list of selected publications. Peter Bakker's publications are listed here [link].
Mair's post was also commented on by Richard Sproat [link] (a computational linguist at Google):
Quote: The problem with injunctions such as Bakker's (or Victor's) that it is not worth trying to decipher something because it is probably a hoax, or there isn't enough text for Shannon unicity, or whatever, is that such appeals universally fail to have any stopping power on the enthusiast. And why should they? If the existence of dozens or hundreds of equally plausible previous "decipherments" of a corpus fail to dissuade them, why should other considerations?
Witness the hundreds of attempts to decipher the Phaistos Disk, or the Indus Valley corpus.
I am not sure I fully understand all that these linguists say, but I think they agree that the text is not decipherable.
Bakker points out that the script appears to be alphabetical: characters alternate like vowels and consonants, and word length is comparable to that of alphabetically written languages. But if one tries to match the text with a written language, no matches are found. In addition to what Bakker says, two of the problems are the binomial distribution of word lengths and the high frequency of reduplication (in particular, the fact that the most frequent word, daiin, often appears as daiin.daiin).
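As a concrete illustration of the reduplication point, here is a minimal sketch of how one could count adjacent repetitions; the token list below is a made-up stand-in, not a real transliteration (with a full EVA transliteration one would first split the text into word tokens):
Code:
# Count adjacent exact repetitions (e.g. "daiin daiin") in a sequence of word tokens.
# The token list is a tiny made-up sample; a real test would load a full EVA
# transliteration and split it into word tokens first.
from collections import Counter

tokens = ["fachys", "ykal", "ar", "ataiin", "shol", "shory",
          "daiin", "daiin", "qokeey", "okaiin", "daiin", "chedy"]

adjacent_repeats = [(a, b) for a, b in zip(tokens, tokens[1:]) if a == b]
print(f"{len(adjacent_repeats)} adjacent repetitions out of {len(tokens) - 1} word pairs")
print(Counter(a for a, _ in adjacent_repeats).most_common(5))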
One can then consider one of these ideas (as discussed in [link]):
1. Words are numbers, i.e. entries in a nomenclator, as in Rene's mod2 system
2. Verbose ciphers
I understand that Bakker did not discuss these options because they are outside the scope of linguistic investigation. If one of these two hypotheses is true, the text as it appears tells us nothing at all about the underlying language, its morphology and grammar, with the possible exception that (if there are no null symbols) reduplication should still be a feature of the plaintext. From this perspective, the problem is purely cryptographic.
1. Number-based nomenclator
I think it is obvious that the "numbers" idea means "undecipherable". Almost 70% of the roughly 8,000 Voynich word types are hapax legomena, and they account for about 14% of word tokens; 20% of the tokens belong to types that appear fewer than 5 times. As is well known, even relatively frequent word types do not form long repeating sequences: many of the repeating sequences include reduplication. The vocabulary size [link] for a moderately inflected language is more than 100,000 words.
The 14% of word tokens that are hapax legomena are totally hopeless. But the main problem is the size of the search space: something like 8k^100k ≈ 1.0E+390000 possible codebooks (for each language one wants to consider).
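To keep the arithmetic in one place, a few lines of Python reproduce these back-of-envelope figures (the counts are the rounded ones quoted in this post, not numbers computed from an actual transliteration):
Code:
# Back-of-envelope figures for the nomenclator hypothesis, using the rounded
# counts quoted above.
import math

word_types   = 8_000      # approximate number of distinct Voynich word types
hapax_share  = 0.70       # ~70% of types occur exactly once...
hapax_tokens = 0.14       # ...and they cover ~14% of the word tokens
lexicon_size = 100_000    # vocabulary of a moderately inflected language

# Conceivable codebooks that assign each of the ~100k lexicon entries
# one of the ~8k observed word types:
log10_space = lexicon_size * math.log10(word_types)
print(f"~{round(hapax_share * word_types):,} hapax types, ~{hapax_tokens:.0%} of tokens")
print(f"nomenclator search space ~ 10^{log10_space:,.0f}")   # about 10^390,000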
2. Verbose cipher
The verbose cipher idea looks better, but I think it is also unapproachable. A first implication of the idea is that labels are too short to be verbosely encoded words (see Koen's comment [link]). We should then conclude that labels are meaningless, but with a leap of faith we can still hope that there is something meaningful in the main text. Anyway, since [link], we should deduce that spaces are not significant and treat the whole text as a uniform string: if we don't ignore spaces, words are too short for a verbose cipher.
We should set an upper bound on the length of the verbose encoding of a plaintext character: without such a limit the search space is infinite, but the smaller the threshold, the higher the risk of missing the true decipherment. Bigrams could be a good starting point.
We have the beginning of our coded string:
fachysykalarataiinsholshorycthresykorsholdy
without a hint of how many words it encodes or in what language. We know that about 300 different bigrams appear in the manuscript, and we can now try mapping each bigram to a character of the alphabets of all the languages and dialects we want to consider.
Again, the search space is huge, though immensely smaller than for the nomenclator: something like 300^26 ≈ 2.5E+64 (for each target language, assuming an alphabet size similar to English). If one could test 1,000 options per nanosecond, searching this space exhaustively would still take on the order of 10^45 years.
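Here is a sketch of the bookkeeping this involves, using the string above; non-overlapping bigram segmentation is just one arbitrary choice, and the ~300 figure refers to the whole manuscript, not to this short fragment:
Code:
# Segment the space-stripped opening line into non-overlapping bigrams and
# redo the search-space / brute-force-time arithmetic.
stream = "fachysykalarataiinsholshorycthresykorsholdy"   # spaces already removed
bigrams = [stream[i:i + 2] for i in range(0, len(stream) - 1, 2)]
print(bigrams[:8], f"... {len(set(bigrams))} distinct bigrams in this fragment")

distinct_bigrams = 300    # roughly 300 distinct bigrams in the whole manuscript
alphabet_size = 26        # one bigram per plaintext letter, English-sized alphabet
space = distinct_bigrams ** alphabet_size
print(f"candidate mappings per language: ~{space:.1e}")   # ~2.5e64

rate_per_second = 1e12    # 1,000 candidate mappings per nanosecond
years = space / rate_per_second / (3600 * 24 * 365)
print(f"exhaustive search: ~{years:.1e} years")           # ~8e44 years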
We had to give up labels, so we are not in the happy position of [link], with small groups of words that have a specific context: we have no words and no context. Mappings will have to be evaluated on the basis of the quality of the decoding of the main text, something difficult to automate and suspiciously similar to what Yokubinas, Cheshire and Ardic do.
With such a huge number of variables, one will likely get several locally good decipherments for various languages, and it would be impossible to judge them reliably without significant knowledge of XV Century forms of Georgian or Scottish Gaelic etc. I guess this is the problem that Sproat refers to when he mentions Shannon unicity and the many different but equally plausible "decipherments".
Also, the Currier A/B drift means that the mapping will not be homogeneous: does a locally good decipherment break down because it is not correct, or because the drift affects the mapping?
This is much simplified: we don't know whether 2 characters is a reasonable length for the verbose encoding of a single plaintext character; actually, there are EVA trigrams that look like good candidates to encode a single character. Different sequences could have different lengths, some plaintext characters might not be encoded at all (abjad), there could be null characters, null sequences, etc.
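Just to make concrete what one candidate attempt looks like once these extra degrees of freedom are allowed, here is a toy greedy decoder; the mapping table is invented purely for illustration (it mixes bigrams, a trigram and a null sequence) and is in no way a proposed decipherment:
Code:
# Toy greedy decoder for a variable-length verbose cipher: longest match first.
# The table maps EVA bigrams/trigrams to Latin letters and is purely invented;
# nothing here is a proposed decipherment.
TOY_TABLE = {
    "cth": "p",                                   # a trigram for one plaintext letter
    "fa": "m", "ch": "a", "ys": "r", "yk": "t", "al": "e",
    "ar": "s", "at": "o", "ai": "n",
    "in": "",                                     # "" marks a null sequence
}

def decode(stream: str, table: dict) -> str:
    lengths = sorted({len(k) for k in table}, reverse=True)   # try longest groups first
    out, i = [], 0
    while i < len(stream):
        for n in lengths:
            chunk = stream[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            out.append("?")   # no code group matches here: mark it and move on
            i += 1
    return "".join(out)

print(decode("fachysykalarataiin", TOY_TABLE))    # gibberish, as expected from a toy table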
The idea of a verbosely encoded abjad could maybe allow us to preserve labels and word spaces (verbose encoding inflates word length while dropping vowels shrinks it, so the two effects could compensate). The search space would also be smaller (something like 300^20 ≈ 3.5E+49 for a ~20-consonant alphabet), but still many orders of magnitude too large, and more ambiguity would be introduced by the absence of vowels.
TLDR: I think that, if we must totally discard the phonetically-written-weird-language hypothesis, these three experts are right and the Voynich manuscript is either totally meaningless or undecipherable.