The Voynich Ninja
Voynich anagramming - Printable Version




RE: Voynich anagramming - bi3mw - 29-06-2020

(29-06-2020, 10:25 AM)doranchak Wrote: The file shows the anagram count, the sorted letters shared by the anagrams, and the list of vords.
Again for comparison the sorted "all.txt":


RE: Voynich anagramming - Koen G - 29-06-2020

I wonder how many anagrams are left in Voynichese if we eliminate very rare attestations? 

Also, this won't affect the results much, but in a few cases we are looking at an anagram of EVA and not Voynichese, specifically when benched gallows are involved.
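
To make the benched-gallows caveat concrete, here is a minimal sketch that compares two anagram keys: one over single EVA letters, and one that keeps each benched gallows together as a single glyph. It assumes the benched gallows are the EVA sequences cth, ckh, cph and cfh (any capitalisation); the function names are hypothetical and this is not the tool used for the attached results.

```python
import re

def eva_key(vord):
    """Anagram key over individual EVA letters."""
    return "".join(sorted(vord.lower()))

def glyph_key(vord):
    """Anagram key that keeps each benched gallows as one unit.

    Assumption: benched gallows are written cth, ckh, cph or cfh in EVA.
    """
    glyphs = re.findall(r"c[tkpf]h|.", vord.lower())
    return tuple(sorted(glyphs))

# "cthy" and "tchy" are anagrams letter by letter in EVA ...
print(eva_key("cthy") == eva_key("tchy"))      # True
# ... but not once the benched gallows is treated as a single glyph.
print(glyph_key("cthy") == glyph_key("tchy"))  # False
```

Words without benched gallows get equivalent groupings under both keys, so only a subset of the groups would change.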


RE: Voynich anagramming - doranchak - 29-06-2020

(29-06-2020, 10:59 AM)Koen G Wrote: I wonder how many anagrams are left in Voynichese if we eliminate very rare attestations?
Is there a vord dictionary with occurrence frequencies?
If so, maybe we could score anagram groups based on how common the member words are.


RE: Voynich anagramming - bi3mw - 29-06-2020

(29-06-2020, 11:06 AM)doranchak Wrote: Is there a vord dictionary with occurrence frequencies?
You mean something like this?


RE: Voynich anagramming - doranchak - 29-06-2020

(29-06-2020, 11:16 AM)bi3mw Wrote: You mean something like this?

Perfect - thanks!

Results are attached.
Sample for Voynichese:

%coverage anagramcount sortedletters anagrams
1.459642 2 lo [lo, ol]
1.3804572 4 cdehy [dchey, chedy, cheyd, yched]
1.1640184 2 dehsy [shedy, dshey]
1.0584384 3 chlo [olch, chol, lcho]
0.992451 2 or [or, ro]
0.9317426 2 ar [ar, ra]
0.9106266 2 cehy [echy, chey]
0.8156047 4 deekoqy [qekeody, qoekedy, qokeedy, qkeeody]
0.7496173 2 ehsy [shey, yshe]
0.7232223 2 dy [dy, yd]

Sample interpretation:
  • 1.3804572:  1.3804572% of the entire Voynich text is represented by these words, which are anagrams of each other.
  • 4:  This group has 4 anagrams.
  • cdehy:  The sorted letters shared by each word.
  • [dchey, chedy, cheyd, yched]:  The anagrams.
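
For anyone who wants to reproduce this kind of table, here is a minimal sketch of how the four columns could be computed from a list of word tokens. It is an assumption about the method rather than the actual tool: words are grouped by their sorted letters, and coverage is the share of all tokens that belong to a group with at least two distinct members. The function name and the toy token list are made up for illustration.

```python
from collections import Counter, defaultdict

def anagram_coverage(tokens):
    """Return rows of (%coverage, anagram count, sorted letters, anagrams)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    groups = defaultdict(list)
    for word in counts:
        groups["".join(sorted(word))].append(word)
    rows = []
    for key, words in groups.items():
        if len(words) < 2:
            continue  # a single word type is not an anagram group
        covered = sum(counts[w] for w in words)
        # List the most frequent variant first.
        words = sorted(words, key=lambda w: -counts[w])
        rows.append((100.0 * covered / total, len(words), key, words))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows

# Toy input: 8 tokens, 4 of which belong to the {ol, lo} group.
tokens = ["ol", "ol", "ol", "lo", "daiin", "daiin", "daiin", "daiin"]
print(anagram_coverage(tokens))  # [(50.0, 2, 'lo', ['ol', 'lo'])]
```
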
Sample for English:

%coverage anagramcount sortedletters anagrams
3.1306357 3 adn [dan, and, dna]
1.0553013 2 no [no, on]
1.0246451 2 asw [was, saw]
0.8078868 2 as [as, sa]
0.5659407 2 fmor [form, from]
0.5438335 3 aer [are, era, ear]
0.48521057 2 hist [hits, this]
0.4244203 2 an [na, an]
0.39555076 2 ahs [ash, has]
0.39310804 2 not [not, ton]

Seems like Voynichese has stronger "anagrammability" than English.
Would be interesting to compare this with other languages.
If other languages are highly anagrammable, then why?


RE: Voynich anagramming - Torsten - 29-06-2020

I have added the word counts for the most frequent examples. (Note: In two cases only a part of the word is readable. The words are ra* and olch***. They occur on folio 41r and 101v2.)

%coverage anagramcount sortedletters [anagrams] (counts)
1.459642 2 lo [ol, lo]  (537, 15)
1.3804572 4 cdehy [chedy, dchey, yched, cheyd] (501, 18, 3, 1)
1.1640184 2 dehsy [shedy, dshey] (426, 14)
1.0584384 2 chlo [chol, lcho, olch***] (396, 3, 0)
0.992451 2 or [or, ro] (363, 10)
- - ar [ar, ra*] (350, 0)
0.9106266 2 cehy [chey, echy] (344, 1)
0.8156047 4 deekoqy [qokeedy, qoekedy, qekeody, qkeeody] (305, 2, 1, 1)
0.7496173 2 ehsy [shey, yshe] (283, 1)
0.7232223 2 dy [dy, yd] (270, 3)
0.7232223 2 dekoqy [qokedy, qkeody] (272, 2)
0.67571133 2 adl [dal, ald] (253, 3)
0.58332896 2 chor [chor, rcho] (219, 1)
0.56749195 2 aiikno [okaiin, koaiin] (212, 3)
0.498865 2 hlos [shol, lsho] (186, 2)
0.49358603 3 eekoy [okeey, ykeeo, oekey] (177, 7, 3)


RE: Voynich anagramming - Koen G - 29-06-2020

Thanks, Torsten. There are a lot of very rare occurrences in there, so I wonder whether many of these cases aren't the result of some form of noise rather than structural anagramming possibilities.
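
One way to test the noise idea would be to drop rarely attested variants and see which groups survive. The sketch below uses the counts from Torsten's table for two groups; the threshold of 5 occurrences is an arbitrary assumption, and the function name is hypothetical.

```python
def prune_rare(group_counts, min_count=5):
    """Keep variants seen at least min_count times.

    A group is dropped entirely if fewer than two variants remain,
    because it no longer contains any anagram pair.
    """
    kept = {}
    for key, members in group_counts.items():
        frequent = {w: n for w, n in members.items() if n >= min_count}
        if len(frequent) >= 2:
            kept[key] = frequent
    return kept

# Counts taken from the table above.
groups = {
    "cdehy": {"chedy": 501, "dchey": 18, "yched": 3, "cheyd": 1},
    "cehy": {"chey": 344, "echy": 1},
}
print(prune_rare(groups))  # {'cdehy': {'chedy': 501, 'dchey': 18}}
```

With this threshold the cehy group disappears and the cdehy group shrinks from four variants to two, which is the kind of effect the question is about.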


RE: Voynich anagramming - ReneZ - 29-06-2020

This is all quite interesting!

With respect to Eva, let me put the question / comment in perspective.
Obviously, cht and tch are anagrams of each other.
Now should cTh not also be seen as an anagram of both?
A question that can't be answered but I tend to think that it makes sense.
So, in that respect, Eva is superior to all other alphabets ;) (except of course Frogguy)

On the real question, anagrams are determined based on word boundaries, which are notoriously uncertain.
The results shown seem to be based on one of the Takeshi transliterations.
I wonder how different they would be for the other ones, with or without uncertain spaces.
Undoubtedly, they will still be high, but there could be considerable variation.
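
A rough sketch of how the two readings could be compared, assuming a transliteration line in which "." marks a certain word break and "," an uncertain one (an assumption about the file format; check it against the files actually used). The sample line and the function name are made up for illustration.

```python
import re

def split_vords(line, split_uncertain):
    """Split a transliteration line into vords.

    Assumption: '.' is a certain space and ',' an uncertain one.
    split_uncertain=True treats ',' as a word break;
    split_uncertain=False joins the text on both sides of it.
    """
    if split_uncertain:
        parts = re.split(r"[.,]", line)
    else:
        parts = line.replace(",", "").split(".")
    return [p for p in parts if p]

line = "qokeedy.ol,chedy.dar"  # hypothetical line, not a real locus
print(split_vords(line, True))   # ['qokeedy', 'ol', 'chedy', 'dar']
print(split_vords(line, False))  # ['qokeedy', 'olchedy', 'dar']
```

Running the anagram count on both token lists would show how much of the effect depends on the uncertain spaces.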

I have all the files available, but not the tool to check for anagrams...


RE: Voynich anagramming - bi3mw - 29-06-2020

(29-06-2020, 12:49 PM)ReneZ Wrote: I have all the files available, but not the tool to check for anagrams...

You can create a dictionary (dictionary.txt) and then run anagram.awk over it with this bash wrapper, one pass per initial letter (a-z).

Code:
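#!/bin/bash
# Run anagram.awk once per initial letter and collect the output in all.txt.
: > all.txt    # start from an empty file so reruns do not append duplicates
for x in {a..z}
do
    gawk -f anagram.awk dictionary.txt | grep "^${x}" > "${x}.txt"
    cat "${x}.txt" >> all.txt
done
# Prefix each line with its field count, then sort numerically on that count.
awk '{ print NF, $0 }' all.txt > all2.txt
sort -n all2.txt > all3.txt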



RE: Voynich anagramming - Anton - 29-06-2020

Adding to what Rene said, some of the suggested anagrams would just be transcription errors (I found a couple in Torsten's post from yesterday). But it's obvious that there are many genuine ones.

My initial thought was that anagramming is the product of the scribe (for a given set of glyphs produced by an "algorithm", each scribe then orders it in his own way). But with 15 anagrams from a single set, this is out of the question.

What I could suggest as further steps in this direction:

a) observe the frequency distribution of anagrams, i.e. for a given set of glyphs, is there a "dominating" anagram, and is this systematic across all sets?

b) look at what the vocabulary becomes if we reduce it by anagrams. That is, replace all anagrammed variants with a single word type (say, the sorted set).
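
Both steps could be sketched roughly as follows, using a few of the counts from Torsten's table as input. The function names are hypothetical, and "dominance" is defined here, as an assumption, as the share of a group's tokens taken by its most frequent variant.

```python
from collections import Counter

def anagram_key(word):
    return "".join(sorted(word))

def dominance(counts):
    """Step a): for each anagram group, the top variant and its token share."""
    groups = {}
    for word, n in counts.items():
        groups.setdefault(anagram_key(word), {})[word] = n
    result = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # not an anagram group
        top = max(members, key=members.get)
        result[key] = (top, members[top] / sum(members.values()))
    return result

def reduced_vocabulary(counts):
    """Step b): merge all variants into one type, the sorted letter set."""
    reduced = Counter()
    for word, n in counts.items():
        reduced[anagram_key(word)] += n
    return reduced

# Counts taken from the table earlier in the thread.
counts = {"ol": 537, "lo": 15, "or": 363, "ro": 10, "dy": 270, "yd": 3}
for key, (top, share) in dominance(counts).items():
    print(key, top, round(share, 3))
print(len(counts), "types reduce to", len(reduced_vocabulary(counts)))
```

On this small sample every group has one variant that takes well over 90% of the tokens, which is what a "dominating" anagram would look like; whether that holds across all sets is the open question.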