The Voynich Ninja

Various Graphs and Analyses
(01-12-2025, 06:57 PM)srjskam Wrote: I had done some comparisons with natural languages.

Unfortunately many people have already tried to find some direct comparison with almost every modern or mediaeval language, and have all failed to find anything conclusive. It is an idea almost every newcomer to this manuscript has, and seems to be becoming a fruitless line of research.
(02-12-2025, 07:25 AM)Jorge_Stolfi Wrote: Thanks for the tables, but I don't see how to read them. Are there ": : patterns" in them?
...
In natural languages these ": : patterns" seem rare. Even when German grammar seems to allow "in" or "vor" to pair indifferently with "die" or "der", in any particular text you will find that "in" has a definite tendency to partner with "die" and "vor" with "der". Or vice versa.

The table wasn't the best (and it also used a very bad color scheme for the colorblind), so I made a new one. These are simply pair counts from a German text (Das Lob der Narrheit by Desiderius Erasmus). Of course different combinations have different frequencies, but I think this shows pretty clearly that there are sets of words that prefer the company of another set: (sie|sich)(nicht|in|mit) and (in|mit|von)(der|den|dem). (Never mind the heretical capitalization; I did it for normalization.) The table is arbitrarily cut at the 20 most paired words.

[attachment=12783]

I should clarify that I'm not claiming this is what can be seen in the Currier B chart; it's just an experiment to see if there is any similarity. The answer is: not very much.
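For anyone who wants to build a similar table for another text, here is a minimal Python sketch of how such pair counts could be produced. It is not srjskam's actual script: the tokenization (runs of word characters), the lowercasing as "normalization", and the rule used to pick the "most paired" words (total number of pairs a word takes part in, in either position) are all assumptions.

```python
import re
from collections import Counter

def pair_counts(text):
    """Count adjacent word pairs (w1, w2) after lowercasing the text.

    Assumption: a 'word' is a run of word characters; punctuation and
    sentence boundaries are ignored, so pairs may span two sentences.
    """
    words = re.findall(r"\w+", text.lower())
    return Counter(zip(words, words[1:]))

def most_paired(pairs, n=20):
    """Return the n words that take part in the most pairs (either position)."""
    freq = Counter()
    for (w1, w2), k in pairs.items():
        freq[w1] += k
        freq[w2] += k
    return [w for w, _ in freq.most_common(n)]

def pair_table(pairs, words):
    """Rows = first word, columns = second word, both restricted to `words`.
    Missing pairs read as 0 because `pairs` is a Counter."""
    return [[pairs[(w1, w2)] for w2 in words] for w1 in words]
```

Usage would be along the lines of `pairs = pair_counts(text)` followed by `pair_table(pairs, most_paired(pairs))`, where `text` is the full German text read from a file of your choice.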

(02-12-2025, 07:25 AM)Jorge_Stolfi Wrote: And you must be aware that many languages (including Classical Latin) make little or no use of prepositions and articles, and use word order or declensions instead. Articles and prepositions as separate words are a characteristic feature of Romance and Germanic languages.

Yes, that's why I specified "languages that have prepositions and articles". Also add Greek to that soup.
(02-12-2025, 12:21 PM)dashstofsk Wrote: How do you get 41 for or aiin? I get 52 in language B.

Are you sure those are Currier B matches? Running a query on daiin.net gives [link] results for B and [link] on the whole manuscript. If that isn't the explanation, maybe the transcriptions use different spacing: oraiin vs. or aiin?

(02-12-2025, 12:31 PM)dashstofsk Wrote: Unfortunately many people have already tried to find some direct comparison with almost every modern or mediaeval language, and have all failed to find anything conclusive. It is an idea almost every newcomer to this manuscript has, and seems to be becoming a fruitless line of research.

The notion that Voynichese is a European language (that the proposer him/herself happens to speak) and vords are words 1:1 is of course getting very old, but I don't think this conclusively rules out it being a European language. If, say, vords are syllables or morphemes you'd still probably have small frequent words represented with a single vord, and thus show these effects.
I don't know if this helps, but I've found that there are two types of r, two types of s, and two types of ch.
(02-12-2025, 06:56 PM)Doireannjane Wrote: there are two types of r

I use the GC transliteration for my analysis. There appear to be four variants of r. I like to convert them all to the same character.

[attachment=12785]

I also loaded GC2a-n.txt into a text editor and counted the occurrences of or aiin ('.oy.am.' in GC). 52 seems to be about the right number.


[EDIT]
But now I have repeated the computation using the ZL transliteration and got

[attachment=12790]

44 is now the number under ZL. So that explains the difference. It is just a difference in the transliterations.
[/EDIT]
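The same count can be made with a short script instead of a text editor, which also makes the spacing question explicit. The sketch below assumes an IVTFF-like line layout (locus tags in `<...>`, '.' as a definite word space, ',' as an uncertain space); real GC and ZL files have more conventions than this handles, and the sample lines are made up. The hypothetical `uncertain_is_space` flag shows how a single spacing decision (reading or,aiin as "or aiin" or as "oraiin") changes the total. Pairs that straddle a line break are not counted.

```python
import re

def vords(line, uncertain_is_space=True):
    """Split one transliteration line into vords.

    Assumed format (IVTFF-like): <...> tags, '.' = word space,
    ',' = uncertain space. Other markup is not handled.
    """
    line = re.sub(r"<[^>]*>", "", line).strip()   # drop locus tags / comments
    if not uncertain_is_space:
        line = line.replace(",", "")              # read 'or,aiin' as 'oraiin'
    return [w for w in re.split(r"[.,]", line) if w]

def count_pair(lines, first, second, uncertain_is_space=True):
    """Count `first` immediately followed by `second` on the same line."""
    total = 0
    for line in lines:
        ws = vords(line, uncertain_is_space)
        total += sum(1 for a, b in zip(ws, ws[1:]) if (a, b) == (first, second))
    return total

sample = ["<f1r.1>or.aiin.daiin", "qokedy.or,aiin", "oraiin.or.aiin"]  # made-up lines
print(count_pair(sample, "or", "aiin"))                            # 3
print(count_pair(sample, "or", "aiin", uncertain_is_space=False))  # 2
```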
(02-12-2025, 06:37 PM)srjskam Wrote: The table wasn't the best (and it also used a very bad color scheme for the colorblind), so I made a new one. These are simply pair counts from a German text (Das Lob der Narrheit by Desiderius Erasmus)... I think this shows pretty clearly that there are sets of words that prefer the company of another set: (sie|sich)(nicht|in|mit) and (in|mit|von)(der|den|dem). (Never mind the heretical capitalization; I did it for normalization.) The table is arbitrarily cut at the 20 most paired words.

Thanks!  

Indeed, the word pairs that can occur are only a small subset of all possible pairs, and the distinction between allowed and forbidden pairs is quite well marked in most cases.

Also, there is a fairly small number of 2x2 submatrices that have all four counts positive (that is, quadruples {A1,A2} and {B1,B2} where all four A-B pairs are allowed).  And only a subset of these really follow the ": : pattern" (with R close to zero). 
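A minimal sketch of that selection step, assuming the pair counts are held in a dictionary keyed by (first word, second word). The function name and the `min_count` threshold are hypothetical; `min_count=1` means "all four counts positive".

```python
from itertools import combinations

def positive_quadruples(pairs, firsts, seconds, min_count=1):
    """Return every ((A1, A2), (B1, B2)) whose four pairs Ai-Bj all have
    count >= min_count. Pairs missing from the dictionary count as 0."""
    found = []
    for A in combinations(firsts, 2):
        for B in combinations(seconds, 2):
            if all(pairs.get((a, b), 0) >= min_count for a in A for b in B):
                found.append((A, B))
    return found

# Counts for the sie/sich entries, as listed in the R table of this post.
pairs = {("sie", "nicht"): 12, ("sie", "in"): 10, ("sie", "mit"): 7,
         ("sich", "nicht"): 16, ("sich", "in"): 14, ("sich", "mit"): 13}
print(positive_quadruples(pairs, ["sie", "sich"], ["nicht", "in", "mit"]))
```

Only the quadruples that pass this filter are worth examining further for the ": : pattern".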

But only some of the 2x2 submatrices that you marked really fit the ": : pattern".

Barring bugs, here are some of them, sorted by increasing R:
('mit', 'von')   ('der', 'den')    [[18 10] [18 10]]  D =    0  R = 0.0000
('sie', 'sich')  ('nicht', 'in')   [[12 10] [16 14]]  D =   -8  R = 0.0116
('mit', 'von')   ('den', 'dem')    [[10 17] [10 18]]  D =  -10  R = 0.0134
('mit', 'von')   ('der', 'dem')    [[18 17] [18 18]]  D =  -18  R = 0.0143
('sie', 'sich')  ('in', 'mit')     [[10  7] [14 13]]  D =  -32  R = 0.0636
('sie', 'sich')  ('nicht', 'mit')  [[12  7] [16 13]]  D =  -44  R = 0.0717
('in', 'mit')    ('der', 'dem')    [[16 10] [18 17]]  D =  -92  R = 0.0959
('in', 'von')    ('der', 'dem')    [[16 10] [18 18]]  D = -108  R = 0.1093
('in', 'mit')    ('der', 'den')    [[16 20] [18 10]]  D = -200  R = 0.1928
('in', 'von')    ('der', 'den')    [[16 20] [18 10]]  D = -200  R = 0.1928
('in', 'mit')    ('den', 'dem')    [[20 10] [10 17]]  D = -240  R = 0.2967
('in', 'von')    ('den', 'dem')    [[20 10] [10 18]]  D = -260  R = 0.3097
[Sorry for the misalignment.  Blame this shitty forum software...]

Thus, while {mit,von} pair up indifferently with {der,den}, the other quadruples show at least a bit of bias. Some of that can be attributed to sampling error (17 is essentially the same as 18, etc.). The most biased of those quadruples is {in,von} x {den,dem}: "in" definitely prefers to be followed by "den", and "von" by "dem".

And, again, these biases are a common feature of texts in natural language.  

Now let me see again the table for Voynichese...

All the best, --stolfi

PS here is the code that created that table, if you care:
Code:
#! /usr/bin/python3
# Last edited on 2025-12-02 17:14:25 by stolfi

from math import sqrt
from sys import stdout as out

def main():
  # Each entry: (row words A, column words B, 2x2 pair counts ((A1B1, A1B2), (A2B1, A2B2))).
  submats = (
    ( ('sie','sich'), ('nicht','in'),  ((12,10),(16,14)) ),
    ( ('sie','sich'), ('nicht','mit'), ((12, 7),(16,13)) ),
    ( ('sie','sich'), ('in','mit'),    ((10, 7),(14,13)) ),

    ( ('in','mit'),  ('der','den'),  ((16,20),(18,10)) ),
    ( ('in','mit'),  ('der','dem'),  ((16,10),(18,17)) ),
    ( ('in','mit'),  ('den','dem'),  ((20,10),(10,17)) ),

    ( ('in','von'),  ('der','den'),  ((16,20),(18,10)) ),
    ( ('in','von'),  ('der','dem'),  ((16,10),(18,18)) ),
    ( ('in','von'),  ('den','dem'),  ((20,10),(10,18)) ),

    ( ('mit','von'),  ('der','den'),  ((18,10),(18,10)) ),
    ( ('mit','von'),  ('der','dem'),  ((18,17),(18,18)) ),
    ( ('mit','von'),  ('den','dem'),  ((10,17),(10,18)) ),
  )

  for A, B, M in submats:
    (a, b), (c, d) = M
    D, T, R = compute_R(a, b, c, d)
    # Tuples must be converted to str (!s) before they can be padded to width 25.
    out.write(f"{A!s:25} {B!s:25} [[{a:4d} {b:4d}] [{c:4d} {d:4d}]]")
    out.write(f" {D = :8d} {R = :6.4f}\n")
  return
  # ----------------------------------------------------------------------

def compute_R(a, b, c, d):
  # Returns (D, T, R) for the 2x2 count matrix [[a,b],[c,d]].
  # If the determinant a*d - b*c is positive, the columns are swapped, giving
  # [[b,a],[d,c]] with determinant -(a*d - b*c) and trace b+c; so D <= 0 always.
  D = a*d - b*c
  if D > 0:
    D = -D
    T = b + c
  else:
    T = a + d
  # Eigenvalues are T/2 +/- sqrt(T^2/4 - D), both real because D <= 0.
  L1 = abs(T/2 + sqrt(T*T/4 - D))
  L2 = abs(T/2 - sqrt(T*T/4 - D))
  # R = |smaller eigenvalue| / |larger eigenvalue|.  R = 0 iff D = 0,
  # i.e. iff the two rows are proportional (no bias of B on A).
  R = min(L1, L2)/max(L1, L2)
  return D, T, R
  # ----------------------------------------------------------------------

main()
(02-12-2025, 09:25 PM)Jorge_Stolfi Wrote: ...

You've really lost me now. I don't see the value or meaning of this metric. Let's take this chart of a hypothetical mystery language:

[attachment=12788]

To me it's reasonable to say from this that beta and delta form a class of words that is usually followed by another class consisting of two and four. This is a "::" in the sense I mean it. What are the scores for beta-delta-two-four, alpha-beta-one-three, and beta-gamma-three-four? Isn't the first combination clearly more prominent than all other possible combinations?
(02-12-2025, 10:29 PM)srjskam Wrote: To me it's reasonable to say from this that beta and delta form a class of words that is usually followed by another class consisting of two and four. This is a "::" in the sense I mean it. What are the scores for beta-delta-two-four, alpha-beta-one-three, and beta-gamma-three-four? Isn't the first combination clearly more prominent than all other possible combinations?

The combinations that I listed are all the 2x2 submatrices in the parts of the matrix that you highlighted in brown.

You had three words {in,mit,von} that formed pairs with {der,den,dem}.  The fact that all nine pairs have significant counts is already an interesting observation.  

But, that given, it is natural to ask whether the choice of the right member among {der,den,dem} is independent of which of the other three words {in,mit,von} came before it.  Or whether there are biases, like "von" preferring to pair with "dem", and "mit" with "der" or "den".  The R number is a measure of such bias.  
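For reference, here is what the `compute_R` function in the script posted earlier in the thread calculates, written out as formulas. This is a reading of that code, not a derivation given in the thread; the symbol $\Delta$ for the raw determinant is introduced here for clarity.

```latex
\begin{aligned}
M &= \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \Delta = ad - bc,\\
(D,\,T) &= \begin{cases} (-\Delta,\; b + c) & \text{if } \Delta > 0 \text{ (columns swapped)},\\ (\Delta,\; a + d) & \text{otherwise,} \end{cases}\\
\lambda_{\pm} &= \frac{T}{2} \pm \sqrt{\frac{T^2}{4} - D}, \qquad
R = \frac{\min(|\lambda_+|,\,|\lambda_-|)}{\max(|\lambda_+|,\,|\lambda_-|)}.
\end{aligned}
```

Since $D = -|ad - bc| \le 0$, both eigenvalues are real and $\lambda_+ \lambda_- = D$. So $R = 0$ exactly when $ad = bc$, that is, when the rows $(a, b)$ and $(c, d)$ are proportional: the choice between B1 and B2 does not depend on whether A1 or A2 came first, which is the ": : pattern". For example, for {in,von} x {den,dem}, $a=20, b=10, c=10, d=18$ gives $\Delta = 260 > 0$, hence $D = -260$, $T = 20$, $\lambda_\pm = 10 \pm \sqrt{360} \approx 28.974,\,-8.974$, and $R \approx 0.3097$, matching the listing.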

But the formulas I gave only work for 2x2 matrices; that is, one of two words {A1,A2} followed by one of two words {B1,B2}.  Thus I broke your 3x3 submatrix into the nine possible 2x2 submatrices, and computed the R for each one.
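A minimal sketch of that splitting step. The 3x3 block of counts below is reassembled from the {in,mit,von} x {der,den,dem} entries listed earlier in the thread, and `compute_R` here mirrors the R computation of the script posted there (returning only R). The helper name `all_2x2` is hypothetical.

```python
from itertools import combinations
from math import sqrt

def compute_R(a, b, c, d):
    """R for the 2x2 block [[a, b], [c, d]], following the earlier script."""
    D = a * d - b * c
    if D > 0:
        D, T = -D, b + c          # swap columns so the determinant is <= 0
    else:
        T = a + d
    s = sqrt(T * T / 4 - D)
    l1, l2 = abs(T / 2 + s), abs(T / 2 - s)
    return min(l1, l2) / max(l1, l2)

def all_2x2(rows, cols, M):
    """Yield ((A1, A2), (B1, B2), R) for every 2x2 submatrix of count table M."""
    for i, j in combinations(range(len(rows)), 2):
        for p, q in combinations(range(len(cols)), 2):
            R = compute_R(M[i][p], M[i][q], M[j][p], M[j][q])
            yield (rows[i], rows[j]), (cols[p], cols[q]), R

# 3x3 block reassembled from the {in,mit,von} x {der,den,dem} counts above.
rows = ("in", "mit", "von")
cols = ("der", "den", "dem")
M = ((16, 20, 10),
     (18, 10, 17),
     (18, 10, 18))

for A, B, R in sorted(all_2x2(rows, cols, M), key=lambda t: t[2]):
    print(A, B, f"R = {R:.4f}")
```

Sorted by R, the nine results run from 0.0000 for {mit,von} x {der,den} up to 0.3097 for {in,von} x {den,dem}, the same values as in the listing.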

It is possible to compute similar metrics directly for a 3x3 submatrix, but interpreting the result is complicated.

It is not worth computing the R if a 2x2 submatrix has a row or column with only zero (or small) elements.  That matrix will give a low R, but it will not mean anything.  

But otherwise it is worth listing the two word pairs and their R, even if some of the four counts are small.  For example, in your VMS-B table, the words from A ={okedy,otedy} pair up with the words from B={qokeedy,qokedy} with submatrix [[0,4.71],[2.04,4.06]].  That "0" means that otedy will pair with both B words more than expected by chance, but okedy will pair up significantly only with qokedy.  That is a significant asymmetry. (The R value comes out as ~0.29, quite far from zero.)
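A sketch combining the two rules of thumb above: skip a 2x2 block if some row or column has only zero or small entries, otherwise compute R. The function `worth_computing` and its threshold `small=1.0` are hypothetical choices, not ones given in the thread; the values are the VMS-B submatrix quoted above.

```python
from math import sqrt

def compute_R(a, b, c, d):
    """Eigenvalue ratio R, as in the script earlier in the thread."""
    D = a * d - b * c
    if D > 0:
        D, T = -D, b + c      # swap columns so the determinant is <= 0
    else:
        T = a + d
    s = sqrt(T * T / 4 - D)
    l1, l2 = abs(T / 2 + s), abs(T / 2 - s)
    return min(l1, l2) / max(l1, l2)

def worth_computing(a, b, c, d, small=1.0):
    """False if some row or column of [[a, b], [c, d]] has no entry >= small.
    The threshold is an assumption; tune it to the scale of the data."""
    return min(max(a, b), max(c, d), max(a, c), max(b, d)) >= small

# {okedy,otedy} x {qokeedy,qokedy}, values from the VMS-B table discussed above
M = (0, 4.71, 2.04, 4.06)
if worth_computing(*M):
    print(f"R = {compute_R(*M):.2f}")   # R = 0.29
```

The filter also protects `compute_R` from dividing by zero when the whole block is empty.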

I don't know how to use that information to decipher the VMS, but the existence of such asymmetric pairs is one of the many bits of circumstantial evidence that the content is meaningful.

All the best, --stolfi