Koen:
I (still) haven't read the paper attentively, but my first impression is that they do not argue anything definite in this respect. They are fine with the precondition that the text is anagrammed, and that this anagramming appears random from the viewpoint of the observer (i.e. no rule is apparent). Whether there actually is a rule or not, and what answer they assume to this question, I cannot say from only having glanced at the paper briefly. Maybe they assume nothing.
Anyway, I recognize two possibilities for anagramming not to introduce a one-way encryption:
a) there is a deterministic procedure
b) there is no deterministic procedure, but encrypted words are anagrammed into vords on a 1-to-1 basis. E.g., the word "milk" is consistently represented as "klim" throughout the entire text, and the word "apple" as "palep". A person accustomed to "milk" as "klim" and to "apple" as "palep" will have no difficulty in decryption, although there is no single procedure which will reverse both "klim" to "milk" and "palep" to "apple" at the same time. Essentially this is an anagrammatic reference mapping of the whole vocabulary.
Option a) will have no problems with an abjad; option b) most probably will.
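Option b) can be sketched in a few lines. This is only an illustration of the idea, not a claim about the actual cipher: the word pairs are the hypothetical examples from above, and the point is that an arbitrary per-word anagram table, while following no single rule, is still invertible because the mapping is 1-to-1.

```python
# Sketch of option (b): a fixed, word-level anagram table covering the
# whole vocabulary. Each rearrangement is arbitrary (no common rule),
# but since every word maps to a unique vord, the table can be inverted.
# The example words are hypothetical, not actual Voynichese readings.

anagram_table = {
    "milk": "klim",
    "apple": "palep",
}
# Decryption is just the inverted lookup table.
reverse_table = {v: k for k, v in anagram_table.items()}

def encrypt(text: str) -> str:
    return " ".join(anagram_table.get(w, w) for w in text.split())

def decrypt(text: str) -> str:
    return " ".join(reverse_table.get(w, w) for w in text.split())

print(encrypt("milk apple"))   # klim palep
print(decrypt("klim palep"))   # milk apple
```

The decryption requires no procedure at all, only familiarity with the table, which matches the "person accustomed to the mapping" scenario above.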
Not to say that the solution is as simple as an anagrammed abjad - I don't think so. There are indicators that something more complex is going on there - mostly the combined characters (benched gallows in the first place, but not exclusively them), and also the gallows coverage. Combined characters suggest some kind of superposition (like, you know, in "XIV", "X" combined with "IV" is mapped to 14, while "I" combined with "V" is mapped to 4), while the gallows coverage suggests some kind of operator applied to the glyphs covered. Both ideas reach beyond simple substitution and anagramming, whether abjad or not.
Sorry for the intrusion, but a deterministic algorithm doesn't automatically mean it's reversible. For example, sorting is not reversible at all, while number multiplication is not reversible in real time. MD5 digest calculation is an example of a very simple deterministic procedure which is also irreversible. Of course, it's doubtful that something similar was used to rearrange letters.
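The sorting case is worth spelling out, since it comes up repeatedly in this thread. A minimal sketch: sorting a word's letters is perfectly deterministic, yet two different words can collide on the same output, so no inverse function can exist.

```python
# Deterministic but irreversible: alphabetically sorting a word's
# letters always produces the same output for the same input, yet
# distinct plaintext words can collide, so the step cannot be undone
# without extra information (a dictionary, context, etc.).

def sort_letters(word: str) -> str:
    return "".join(sorted(word))

# Two different plaintext words yield the same ciphertext:
print(sort_letters("night"))  # ghint
print(sort_letters("thing"))  # ghint
```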
Interesting discussion. So then what I would wonder next is, is there an anagramming procedure which can account for Voynichese statistics on the one hand and be reversible on the other? Because to get from Hebrew to Voynichese through anagramming alone, you'd probably need some degree of sorting, which means some degree of information loss, right?
(31-01-2018, 11:24 PM)Koen Gh. Wrote: Interesting discussion. So then what I would wonder next is, is there an anagramming procedure which can account for Voynichese statistics on the one hand and be reversible on the other? Because to get from Hebrew to Voynichese through anagramming alone, you'd probably need some degree of sorting, which means some degree of information loss, right?
They compared the text with modern languages, including modern Hebrew. If there are significant differences, the match will be spurious. You have to match against languages known at the time the text was written. As they didn't do this, I don't think that the paper has anything useful to say about the Voynich Manuscript.
On the more general issue of anagramming, there's no information loss if you reorder letters systematically, e.g. with "thing" becoming "iht gn" and "night" becoming "gin th", but if you reorder letters alphabetically, "thing" and "night" both become "ghint" and you have lost information. Even then, there's still probably enough redundancy in the text to extract most of the original meaning computationally (but it would be slow and laborious to do this by hand), e.g. if your last word decodes as "right" and it's followed by "ghint", that should probably decode as "thing" rather than "night".
I think the most useful aspect of this research is as a "filter" for searching, sorting, selecting, and identifying text in natural languages.
Just as one possible example, imagine a Google Books project in which a large number of books in different languages are scanned. OCR software is used to create PDF transliterations. The OCR software is "tuned" to recognize certain fonts and certain quirks that are specific to each language in order to get the best possible result.
Currently a human tells the software the font (e.g., German black-letter), the language, etc. If the technology developed by these researchers were incorporated into OCR software, then language recognition might become part of the basic OCR package in a more complete and practical way than present versions, and then certain aspects of creating the transliteration could be fine-tuned by software rather than by human intervention.
The software they have developed is not really specific to cryptology and doesn't really come at unknown texts in the most efficient ways (and there are many ways, depending on the structure of the text and how it was encrypted), and I doubt that that was their original impetus or primary goal. It's tantalizing, from a research point of view, to try it out on the Voynich to see what might happen, and I don't blame them for trying it, but we can see from the results, and also from the various critiques of their results, that cryptology-aware researchers perceive the shortcomings of software developed for another purpose when applied to this more specialized problem.
---------------
I wouldn't be surprised, however, if this now becomes a race. Programmers are gamers (at least a high proportion of them are), and gamers are competitive. The media created a frenzy over this and there may be many new software developers who jump in the game to try their hand, especially considering the code was released and can now be readily modified.
Quote:So then what I would wonder next is, is there an anagramming procedure which can account for Voynichese statistics on the one hand and be reversible on the other?
That's exactly the question that the paper prompts one to ask. For sure there would be no complete answer, because the whole anagramming approach leaves anything beyond the level of a single vord out of scope. But nevertheless it's a direction worth thinking in, I suppose.
Regarding pure anagramming, what I would look at would be supposed sets of homogeneous objects that are labeled. For example, the "Voynich stars" of f68r1 and f68r2, or the "Voynich moons" of f67r2. The twelve moons in f67r2 suggest the idea of months, but the problem is that the respective labels do not correspond to month names in other languages in terms of similarity to each other (e.g. the sequence of suffixes). Maybe the situation is better with anagramming? One could look here, with success not guaranteed, not least because we have no guarantee that the labels ever stand for month names: the author may have been particularly careful not to betray his code through obvious labels, and the plant mnemonics joining the botanical and pharma sections somewhat suggest that.
(31-01-2018, 07:45 PM)Koen Gh. Wrote: Rene: [...] What I mean is though, that Voynichese being the result of a one-way cipher is not compatible with linguistic solutions like we see proposed here.
Well, one has to be careful. I have not seen any acceptable linguistic solution, so these should not be used to judge whether a one-way cipher is unlikely.
At the same time, the authors of this paper propose that anagramming has been used. This is done at the word level.
While they can resolve many or most words using their computer algorithms, this would not have been possible for a human, within a reasonable amount of time.
Farmerjohn is right about the irreversibility. Even anagramming according to specified rules may be irreversible, such as sorting the characters alphabetically.
Furthermore, irreversibility is not an absolute thing, as already indicated above w.r.t. computer vs. human.
Finally, the anagramming at the word level is a special case. If one uses (a computer and) a dictionary, it no longer matters whether the anagramming was done arbitrarily or according to some rule. The only problem is if several plain text words use the same set of characters ('greens' and 'genres' in their example).
*IF* the Voynich MS was anagrammed in this manner, at the word level, it would certainly have been according to some rule. Otherwise, Zipf's law would no longer be observed.
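The dictionary approach described above can be sketched as a signature index: every known word is filed under its alphabetically sorted letters, and any anagrammed form is resolved by one lookup. This is a minimal illustration, not the paper's actual code; the only residual ambiguity is a signature shared by several words, as with 'greens' and 'genres'.

```python
# Word-level deanagramming with a dictionary: index every known word
# by its sorted-letter "signature". A lookup then recovers the word
# no matter how its letters were shuffled, whether by rule or at random.
from collections import defaultdict

def signature(word: str) -> str:
    return "".join(sorted(word))

def build_index(dictionary: list[str]) -> dict[str, list[str]]:
    index = defaultdict(list)
    for word in dictionary:
        index[signature(word)].append(word)
    return index

index = build_index(["greens", "genres", "manuscript"])

# A shared signature is the only failure mode:
print(index[signature("reensg")])      # ['greens', 'genres'] -- ambiguous
print(index[signature("tpircsunam")])  # ['manuscript'] -- unique
```

This also makes Rene's point concrete: for the computer it is irrelevant whether the shuffle followed a rule, while a human without the index would face an enormous search.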
(01-02-2018, 12:55 AM)-JKP- Wrote: I wouldn't be surprised, however, if this now becomes a race.
"Let's get ready to rumble!"

This is a better effort than any of the other articles I've seen so far:
[link]
The links on that page seem to be shuffled around quite a bit, from time to time.
I guess you meant this one:
[link]
which is indeed the best I have seen of all.
However, it is also not perfect. They overlook two things:
1) the statement:
Quote:So when the pair say that Hebrew was the highest scoring match for the manuscript without rating the likelihood, this is a bit of a meaningless boast. “Someone has to have the highest score,” says Argamon.
is doing insufficient justice to the close match given in Figure 4. At first I also considered that it could just be the tail end of the probability curve, but it is justified to call it an outlier, and it is not understood why this happens. This has to be investigated further.
2) After settling on Hebrew as the source language, they do the remaining 'decoding' based on old Hebrew (Table 4), not modern Hebrew.
There are several other problems with the paper that have not been highlighted by anyone.
The bottom line remains, of course, that the conclusion of anagrammed Hebrew, or any Hebrew (old or new), is not valid.