19-05-2025, 03:55 AM
Hi All,
Grouping vords into vordgroups, based on the vords that come before and after them, looks possible.
The past few weeks I have been programming in Python to do some analysis and statistically informed guessing on the grammar of Voynichese. I am happy to share these partial, preliminary results with you. There seem to be statistically significant groupings of vords: groups that are either more or less likely than expected to precede or follow certain other groups of vords.
What I am looking for:
I would like to turn these results into an academic paper. That is why I am looking for an academic Voynich researcher who would like to collaborate. These results look statistically significant to my amateur eyes, but I still have to do some p-value calculations; I think a chi-square test would be the appropriate one for this.
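As a starting point for those p-values, here is a minimal sketch of a chi-square goodness-of-fit test for a single transition from the output posted below. In that output the cheody group (446 vords) is followed by the qokain group 21.08% of the time (94 of 446), while the qokain group's relative size is 15.88%. The sketch treats the 446 transitions as independent and the 15.88% baseline as exactly known. Neither is strictly true for running text, so the p-value is only a rough guide. The function name is hypothetical.

```python
import math

def chi_square_one_cell(observed, total, expected_share):
    """Chi-square goodness-of-fit test (1 degree of freedom) for one transition.

    Compares `observed` hits out of `total` against `expected_share`,
    using the two categories "followed by this group" and "not".
    """
    expected_hit = total * expected_share
    expected_miss = total - expected_hit
    observed_miss = total - observed
    chi2 = ((observed - expected_hit) ** 2 / expected_hit
            + (observed_miss - expected_miss) ** 2 / expected_miss)
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# cheody group: 446 vords, 94 of them followed by the qokain group
# (21.08%), against a relative size of 15.88% for the qokain group.
chi2, p = chi_square_one_cell(observed=94, total=446, expected_share=0.1588)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The same function can be run on every row of the transition tables. With hundreds of rows tested, a multiple-comparison correction (for example Bonferroni) would also be needed.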
assumptions:
+ A and B are different languages.
+ Each vord corresponds to a word in a real language.
+ The real language has some form of positional grammar, e.g. some prepositions are directly followed by a noun in the genitive case.
+ Each vord has only one meaning, and every time a vord is used it means only that one meaning (the statistics still work even if this one isn't true).
method:
Language A and language B are processed separately.
For each vord, besides its frequency, also tally the preceding and following vords, respecting paragraph boundaries and ignoring line breaks.
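Here is a minimal sketch of this tallying step. It assumes the transcription has already been parsed into paragraphs, where each paragraph is a list of lines and each line is a list of vords. That input format and the function name are hypothetical. The lines of a paragraph are joined so that line breaks are ignored, and no transition is counted across a paragraph boundary.

```python
from collections import Counter, defaultdict

def tally_transitions(paragraphs):
    """Count vord frequencies plus the vords directly before and after each vord."""
    freq = Counter()
    preceding = defaultdict(Counter)  # vord -> Counter of vords right before it
    following = defaultdict(Counter)  # vord -> Counter of vords right after it
    for paragraph in paragraphs:
        # Ignore line breaks: the whole paragraph is one vord sequence.
        vords = [v for line in paragraph for v in line]
        freq.update(vords)
        # Only neighbouring pairs inside the same paragraph are counted.
        for prev, nxt in zip(vords, vords[1:]):
            following[prev][nxt] += 1
            preceding[nxt][prev] += 1
    return freq, preceding, following

# Toy input: one paragraph of two lines, then a second paragraph.
paragraphs = [
    [["daiin", "chedy"], ["qokain", "chedy"]],
    [["chedy", "ol"]],
]
freq, preceding, following = tally_transitions(paragraphs)
print(dict(following["chedy"]))  # {'qokain': 1, 'ol': 1}
```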
Put all vords that appear at least 4 times into small vordgroups of up to around 5 vords each, based on how similar the vords coming before and after them are.
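One possible reading of "similarity in vords coming before and after" is the cosine similarity between context-count vectors. The sketch below combines that measure with a greedy rule: take the most frequent unassigned vord as a seed, then add its most similar unassigned vords up to the size limit. The measure, the greedy rule and the `min_similarity` cut-off are all assumptions for illustration, not necessarily what the original program does. `min_count=4` and `max_size=5` follow the numbers given above.

```python
import math
from collections import Counter

def context_vector(vord, preceding, following):
    """Merge preceding and following counts into one sparse vector."""
    vec = Counter()
    for v, n in preceding.get(vord, {}).items():
        vec[("prev", v)] += n
    for v, n in following.get(vord, {}).items():
        vec[("next", v)] += n
    return vec

def cosine(a, b):
    dot = sum(n * b[k] for k, n in a.items() if k in b)
    norm = (math.sqrt(sum(n * n for n in a.values()))
            * math.sqrt(sum(n * n for n in b.values())))
    return dot / norm if norm else 0.0

def initial_groups(freq, preceding, following,
                   min_count=4, max_size=5, min_similarity=0.5):
    # Only vords seen at least `min_count` times, in a fixed order.
    candidates = sorted((v for v, n in freq.items() if n >= min_count),
                        key=lambda v: (-freq[v], v))
    vectors = {v: context_vector(v, preceding, following) for v in candidates}
    unassigned = list(candidates)
    groups = []
    while unassigned:
        seed = unassigned.pop(0)  # most frequent vord still unassigned
        sims = {v: cosine(vectors[seed], vectors[v]) for v in unassigned}
        ranked = sorted(unassigned, key=lambda v: (-sims[v], v))
        members = [seed] + [v for v in ranked[: max_size - 1]
                            if sims[v] >= min_similarity]
        unassigned = [v for v in unassigned if v not in members]
        groups.append(members)
    return groups

# Toy counts: a and b share contexts (x _ z), c and d share (y _ w);
# e is too rare to be grouped.
freq = {"a": 5, "b": 4, "c": 4, "d": 6, "e": 1}
preceding = {"a": Counter({"x": 5}), "b": Counter({"x": 4}),
             "c": Counter({"y": 4}), "d": Counter({"y": 6})}
following = {"a": Counter({"z": 5}), "b": Counter({"z": 4}),
             "c": Counter({"w": 4}), "d": Counter({"w": 6})}
groups = initial_groups(freq, preceding, following)
print(groups)  # [['d', 'c'], ['a', 'b']]
```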
Now score all vordgroups against each other and merge the most similar ones, looking not at individual vord-to-vord transitions but at transitions from vordgroup to vordgroup.
Merge until the desired number of vordgroups is left.
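Here is a sketch of this merging loop. It assumes each group is described by its group-to-group transition profile: the shares of its following and preceding vords that fall into each group. It also assumes that "most similar" means the smallest L1 distance between two profiles. Both choices are assumptions. Vords outside any group are pooled as "other". The profiles are recomputed after every merge, and every merge compares all pairs of groups, so in this sketch the cost grows quickly with the number of groups.

```python
from collections import Counter
from itertools import combinations

def shares(counts):
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}

def l1(p, q):
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def group_profiles(bigrams, groups):
    """Per group: (shares of following groups, shares of preceding groups).

    `bigrams` maps (previous vord, next vord) -> count.
    """
    group_of = {v: g for g, members in groups.items() for v in members}
    out = {g: Counter() for g in groups}
    inc = {g: Counter() for g in groups}
    for (prev, nxt), n in bigrams.items():
        gp = group_of.get(prev, "other")
        gn = group_of.get(nxt, "other")
        if gp in out:
            out[gp][gn] += n
        if gn in inc:
            inc[gn][gp] += n
    return {g: (shares(out[g]), shares(inc[g])) for g in groups}

def merge_until(bigrams, groups, target):
    """Repeatedly merge the two groups with the most similar profiles."""
    groups = {g: set(m) for g, m in groups.items()}
    while len(groups) > max(target, 1):
        prof = group_profiles(bigrams, groups)  # recomputed after every merge

        def distance(pair):
            (out_a, in_a), (out_b, in_b) = prof[pair[0]], prof[pair[1]]
            return l1(out_a, out_b) + l1(in_a, in_b)

        a, b = min(combinations(sorted(groups), 2), key=distance)
        groups[a] |= groups.pop(b)  # the merged group keeps the name `a`
    return groups

# Toy data: a and b behave identically (both go to c and come from c);
# d is in no group, so c -> d counts as a transition to "other".
groups = {"g1": {"a"}, "g2": {"b"}, "g3": {"c"}}
bigrams = Counter({("a", "c"): 5, ("b", "c"): 5, ("c", "a"): 3,
                   ("c", "b"): 3, ("c", "d"): 1})
print(merge_until(bigrams, groups, target=2))
```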
Score each of the most frequent vords against all groups to see if any of them would fit better in another group.
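Here is a sketch of this reassignment check. It assumes a vord "fits" a group when its own transition profile over groups is close (small L1 distance) to the pooled profile of that group's other members. The vord is left out of its own group's pool so that it cannot simply match itself. The function names and `top_n` are hypothetical.

```python
from collections import Counter

def vord_profile(vord, bigrams, group_of):
    """Counts of the groups that follow and precede one vord."""
    out, inc = Counter(), Counter()
    for (prev, nxt), n in bigrams.items():
        if prev == vord:
            out[group_of.get(nxt, "other")] += n
        if nxt == vord:
            inc[group_of.get(prev, "other")] += n
    return out, inc

def shares(counts):
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}

def l1(p, q):
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def best_group(vord, bigrams, groups):
    """Score a vord against every group; a lower score is a better fit."""
    group_of = {v: g for g, members in groups.items() for v in members}
    v_out, v_in = vord_profile(vord, bigrams, group_of)
    scores = {}
    for g, members in sorted(groups.items()):
        g_out, g_in = Counter(), Counter()
        for m in members - {vord}:  # leave the vord out of its own group's pool
            m_out, m_in = vord_profile(m, bigrams, group_of)
            g_out.update(m_out)
            g_in.update(m_in)
        scores[g] = l1(shares(v_out), shares(g_out)) + l1(shares(v_in), shares(g_in))
    return min(scores, key=scores.get), scores

def reassign(bigrams, groups, freq, top_n=100):
    """Move each of the `top_n` most frequent vords to its best-fitting group."""
    groups = {g: set(m) for g, m in groups.items()}
    for vord in sorted(freq, key=lambda v: (-freq[v], v))[:top_n]:
        current = next((g for g, m in groups.items() if vord in m), None)
        if current is None:
            continue  # only vords that are already in a group are checked
        target, _ = best_group(vord, bigrams, groups)
        if target != current:
            groups[current].discard(vord)
            groups[target].add(vord)
    return groups

# Toy data: a and b sit between p's, c and d between q's;
# x is filed under g1 but behaves like c and d.
bigrams = Counter({("p", "a"): 3, ("a", "p"): 3, ("p", "b"): 3, ("b", "p"): 3,
                   ("q", "c"): 3, ("c", "q"): 3, ("q", "d"): 3, ("d", "q"): 3,
                   ("q", "x"): 4, ("x", "q"): 4})
groups = {"gp": {"p"}, "gq": {"q"}, "g1": {"a", "b", "x"}, "g2": {"c", "d"}}
freq = {"x": 4, "a": 3, "b": 3, "c": 3, "d": 3}
print(reassign(bigrams, groups, freq))
```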
safeguards:
By only looking at the more frequent vords, there is less wiggle room than there would be if unique vords were assigned to whichever group increases the score.
Because the non-frequent vords are still counted in the total used for the percentage calculation, the transition frequencies to each of the labeled groups come out lower.
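Here is a toy illustration of this safeguard, assuming shares are computed per following group as in the transition tables. Transitions into vords that belong to no labeled group land in "other" (the `< other>` row in the output). They still count in the denominator, so the labeled shares come out lower than they would if those transitions were dropped. The function name and the data are made up.

```python
from collections import Counter

def following_shares(vord_sequence, group_of, group, include_other=True):
    """Percentage of each following group after the vords of `group`."""
    counts = Counter()
    for prev, nxt in zip(vord_sequence, vord_sequence[1:]):
        if group_of.get(prev) == group:
            counts[group_of.get(nxt, "other")] += 1
    if not include_other:
        counts.pop("other", None)  # drop transitions into unlabeled vords
    total = sum(counts.values())
    return {g: round(100 * n / total, 2) for g, n in counts.items()}

group_of = {"a": "A", "b": "B"}  # rare1 and rare2 are in no group
seq = ["a", "b", "a", "rare1", "a", "b", "a", "rare2"]
print(following_shares(seq, group_of, "A"))                       # {'B': 50.0, 'other': 50.0}
print(following_shares(seq, group_of, "A", include_other=False))  # {'B': 100.0}
```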
cons/doubts/possible improvements:
The algorithm I built is designed to find these patterns, and it has not yet been tested on random noise or on samples of other languages.
It takes a lot of time: the merging step takes around one hour on my poor old PC.
There may be some bugs in the search algorithm. It is not very stable: the groups that come out are different every time. Probably something in Python gives a different order on every run. Maybe it is also an omen that the method is flawed.
The output is very long, but with over 27000 vords analysed that is somewhat inevitable. A big part of the output is guessing, for each of the non-frequent vords, which group it would fit best.
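One way to run such a test is a permutation (shuffle) baseline. Shuffling the vords within each paragraph keeps every vord's frequency but destroys the word order. Any structure that still shows up in shuffled text therefore comes from the method rather than from any grammar. The sketch below computes an empirical p-value for a statistic. `pair_count` is only a toy stand-in; in practice the statistic would be whatever score the grouping program reports, plugged in as a function (an assumption). Because the full program takes about an hour per run, a cheaper statistic or only a few shuffles may be needed.

```python
import random

def shuffled(paragraphs, rng):
    """Copy of the text with the vords shuffled inside each paragraph."""
    out = []
    for para in paragraphs:
        vords = list(para)
        rng.shuffle(vords)
        out.append(vords)
    return out

def permutation_p_value(paragraphs, statistic, n_shuffles=999, seed=0):
    """Share of shuffled texts scoring at least as high as the real text."""
    rng = random.Random(seed)  # fixed seed: the same result on every run
    observed = statistic(paragraphs)
    at_least = sum(statistic(shuffled(paragraphs, rng)) >= observed
                   for _ in range(n_shuffles))
    return observed, (at_least + 1) / (n_shuffles + 1)

def pair_count(paragraphs, first="a", second="b"):
    """Toy statistic: how often `second` directly follows `first`."""
    return sum(1 for p in paragraphs for x, y in zip(p, p[1:])
               if x == first and y == second)

# Toy text with strict order: "b" always follows "a".
text = [["a", "b", "c", "d"] * 5 for _ in range(10)]
observed, p = permutation_p_value(text, pair_count, n_shuffles=199)
print(observed, p)
```

A statistic that does not depend on order, such as the total vord count, should get p = 1.0. That is a quick way to check the harness itself.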
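If the instability comes from iteration order, one likely source is Python's string hash randomization. Since Python 3.3, a set of strings can be iterated in a different order in each new Python process. Dicts keep insertion order, but sets do not. Here is a minimal sketch of making the order deterministic, assuming the grouping code iterates over sets or picks the first of several equally good candidates. It sorts before iterating, breaks ties by name, and gives any randomized step its own seeded generator. The function names are hypothetical. Running once with the environment variable PYTHONHASHSEED set to a fixed value is a quick way to check whether this is the cause. If the groups still change with a fixed order, that would point more towards the method itself.

```python
import random

def stable_order(vords):
    """Fixed order for a collection of vords, independent of set/hash order."""
    return sorted(vords)

def best_pair(distances):
    """Pair with the smallest distance; ties are broken by the pair's names."""
    return min(distances, key=lambda pair: (distances[pair], pair))

def seeded_shuffle(items, seed):
    """Reproducible shuffle: same items and same seed give the same result."""
    items = sorted(items)  # start from a fixed order, not the set's order
    random.Random(seed).shuffle(items)
    return items

members = {"shody", "cheody", "sheody"}
print(stable_order(members))  # ['cheody', 'sheody', 'shody']
print(best_pair({("g2", "g3"): 0.5, ("g1", "g4"): 0.5, ("g1", "g2"): 0.9}))  # ('g1', 'g4')
```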
It really feels great to be standing on the shoulders of giants and looking further than anyone before.
Thanks to the members of Voynich Ninja and to the maintainers of Voynich websites.
Maybe this work can help someone else make a breakthrough.
the results:
Statistics for language A and language B are in the same file: first all the A output, then all the B output.
line in file   chapter
1              Language A
13               initial groups
99               merging
380              vordgroup stats
847              transition tables
894              moving vords to other groups
1217             guessing but not adding to groups of other vords
4466             vordgroup stats
5024             transition tables
5079           Language B
5087             initial groups
5233             merging
5775             vordgroup stats
6149             transition tables
6197             moving vords to other groups
6850             guessing but not adding to groups of other vords
11434            vordgroup stats
11930            transition tables
Code:
vordgroup 56: cheody 56 446
members: ['cheody', 'opar', 'shek', 'sheody', 'shody', 'she', 'opchedy', 'psheody', 'opchdy', 'cheky', 'tchy', 'chekaiin', 'ytedy', 'olkchedy', 'ytchey', 'ytody', 'cholky', 'chcphy', 'lkeey', 'ycheedy', 'shor', 'olky', 'sshey', 'shckhey', 'keol', 'teeody', 'shaiin', 'lkeeedy', 'ycheeo', 'cheoty', 'shekeey', 'chotal']
num members: 32
vord count: 446
groupname: cheody
lesser likely following : chedy 5.16% instead of 16.02%
more likely following : daiin 6.95% instead of 3.37%
lesser likely followed by : chedy 8.07% instead of 16.02%
more likely followed by : qokain 21.08% instead of 15.88%
coming from group <groupname> followed by <groupname> which has a relative size of <x>
---------------------------------------------------------
< other> -> 31.39% <cheody> 37.44% -> < other>
< chedy> -> 5.16% <cheody> 8.07% -> < chedy> rel size: 16.02%
< ol> -> 6.50% <cheody> 6.50% -> < ol> rel size: 12.64%
< aiin> -> 11.66% <cheody> 3.14% -> < aiin> rel size: 10.69%
< daiin> -> 6.95% <cheody> 3.59% -> < daiin> rel size: 3.37%
< qokain> -> 14.13% <cheody> 21.08% -> < qokain> rel size: 15.88%
< dar> -> 2.02% <cheody> 2.69% -> < dar> rel size: 2.14%
< okaiin> -> 0.90% <cheody> 0.90% -> < okaiin> rel size: 0.72%
< okain> -> 0.90% <cheody> 0.45% -> < okain> rel size: 0.78%
< okeey> -> 0.67% <cheody> 0.22% -> < okeey> rel size: 0.52%
< otar> -> 0.67% <cheody> 0.22% -> < otar> rel size: 0.58%
< otaiin> -> 1.57% <cheody> 3.36% -> < otaiin> rel size: 1.28%
< o> -> 6.73% <cheody> 3.36% -> < o> rel size: 4.40%
< oty> -> 2.02% <cheody> 1.35% -> < oty> rel size: 1.59%
< shol> -> 1.35% <cheody> 1.35% -> < shol> rel size: 0.81%
< am> -> 2.02% <cheody> 2.91% -> < am> rel size: 1.92%
< cheody> -> 2.24% <cheody> 2.24% -> < cheody> rel size: 1.90%
< chedaiin> -> 0.00% <cheody> 0.00% -> < chedaiin> rel size: 0.36%
< yteedy> -> 2.47% <cheody> 0.45% -> < yteedy> rel size: 0.53%
=========================================================
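When reading a transition table like the one above, it can help to divide each share by the neighbouring group's relative size. A ratio well above 1 means the transition is more common than that group's overall frequency would suggest; a ratio well below 1 means it is rarer. The sketch parses rows in exactly the layout shown above (the regular expression is written for that layout only). It uses rows copied from the cheody output and prints both ratios: the first for the group preceding cheody, the second for the group following it. The function name is hypothetical.

```python
import re

ROW = re.compile(
    r"<\s*(?P<group>[^>\s]+)>\s*->\s*(?P<before>[\d.]+)%\s*<[^>]+>\s*"
    r"(?P<after>[\d.]+)%\s*->\s*<[^>]+>\s*rel size:\s*(?P<size>[\d.]+)%"
)

def lift_ratios(table_text):
    """{group: (preceding share / rel size, following share / rel size)}."""
    ratios = {}
    for line in table_text.splitlines():
        m = ROW.search(line)
        if not m:
            continue  # skips the '< other>' row, which has no relative size
        size = float(m["size"])
        if size == 0:
            continue
        ratios[m["group"]] = (float(m["before"]) / size, float(m["after"]) / size)
    return ratios

table = """\
< other> -> 31.39% <cheody> 37.44% -> < other>
< chedy> -> 5.16% <cheody> 8.07% -> < chedy> rel size: 16.02%
< daiin> -> 6.95% <cheody> 3.59% -> < daiin> rel size: 3.37%
< qokain> -> 14.13% <cheody> 21.08% -> < qokain> rel size: 15.88%
< yteedy> -> 2.47% <cheody> 0.45% -> < yteedy> rel size: 0.53%
"""
for group, (before, after) in lift_ratios(table).items():
    print(f"{group:>8}: preceding x{before:.2f}, following x{after:.2f}")
```

For example, the daiin row gives about 2.06 (6.95 / 3.37), which matches the "more likely following : daiin" line. The yteedy row reaches about 4.66, but on only roughly 11 transitions (2.47% of 446). Rows like that are where a significance test matters most.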