The Voynich Ninja
A One-Page Ledger Method for Generating Voynich-Like Text - Printable Version


RE: A One-Page Ledger Method for Generating Voynich-Like Text - Jorge_Stolfi - 23-05-2026

(23-05-2026, 02:44 PM)Dunsel Wrote: The lines represent ED1 relationships. Two nodes are connected if one form can be transformed into the other by a single insertion, deletion, or substitution.  The actual physical lengths of the lines are not meaningful by themselves. The graph layout uses a spring-force algorithm that tries to pull highly connected regions together while pushing weakly connected regions apart.

Thanks!  Yes, I am familiar with that spring-force method for automatic graph drawing.

I wonder if one could use the width of the lines to convey some useful information? 

Like, let A and B be two words at edit distance 1, so that they would be connected by a line in that graph.  Let Fr(x,y) denote the estimated frequency of the biword x.y (occurrences of word type x immediately followed by word type y) in the text in question.  We could compute a "semantic similarity" S(A,B) of the two words by comparing the distributions Fr(A,y) with Fr(B,y), or Fr(x,A) with Fr(x,B), or both. Then one could draw that graph with line width proportional to S(A,B).
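
A minimal sketch of how S(A,B) might be computed from biword counts. The post does not say how the two distributions should be compared, so cosine similarity is an assumption here, as are the function names, the averaging used for the "both" mode, and the toy token list. The text is treated as one continuous token sequence, ignoring line and paragraph breaks.

```python
from collections import Counter
from math import sqrt

def context_counts(tokens, word, side="right"):
    """Count the words immediately after (right) or before (left) `word`."""
    counts = Counter()
    for prev, nxt in zip(tokens, tokens[1:]):
        if side == "right" and prev == word:
            counts[nxt] += 1      # contributes to Fr(word, y)
        elif side == "left" and nxt == word:
            counts[prev] += 1     # contributes to Fr(x, word)
    return counts

def cosine(p, q):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def similarity(tokens, a, b, mode="both"):
    """S(a, b) from right contexts, left contexts, or the mean of the two."""
    right = cosine(context_counts(tokens, a, "right"), context_counts(tokens, b, "right"))
    left = cosine(context_counts(tokens, a, "left"), context_counts(tokens, b, "left"))
    return {"right": right, "left": left, "both": (left + right) / 2}[mode]

# Toy token sequence, for illustration only (not a real transcription excerpt).
tokens = "daiin chol dy chor dy daiin shol dy chol daiin".split()
print(similarity(tokens, "chol", "chor", "right"))
```

A line width could then be derived as something like `1 + 4 * S`. With rare words the context counts are tiny, so the score will be dominated by sampling noise unless a minimum-count threshold or some smoothing is applied.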

For more meaningful results, you should consider only one section at a time, and only those sections with substantial text: Herbal A and B, Bio, and Starred Parags.  And exclude the pages with undefined topic, like f1r, f66r, f85r1, f86v5, f86v6, f86v3, etc., plus the bottom of [link] (the star-less parags).  And beware of quantization and sampling noise when comparing the two distributions.

All the best, --stolfi


RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 23-05-2026

(23-05-2026, 03:22 PM)Jorge_Stolfi Wrote: I wonder if one could use the width of the lines to convey some useful information?

I fed your suggestions into codex. It took it a bit to get it, "I think" correct.  Let me know if you see any discrepancies and I'll fix them.  

Here's chol. I picked herbal scribe 1 so there's no chedy.

   

Full herbal scribe 1:

   

And its text report:

Context folio range: [link] to f66v; scribe: 1; mode: combined context; minimum similarity: 0.00; tokens used after word filters: 6617

Total vocabulary: 2053
Total tokens: 6617
Number of ED1 components: 248
Largest component size: 1795
Percent vocabulary in largest component: 87.43%
Percent tokens in largest component: 96.10%
Top 20 largest components by forms and token coverage:
    1:  1795 forms ( 87.43%),    6359 tokens ( 96.10%), top: daiin, chol, chor, dy, chy, shol, sho, cthy, dain, dar
    2:    3 forms (  0.15%),      3 tokens (  0.05%), top: okodar, qokodar, qokorar
    3:    2 forms (  0.10%),      2 tokens (  0.03%), top: cfhodar, cfholdar
    4:    2 forms (  0.10%),      2 tokens (  0.03%), top: cheoeees, cheoiees
    5:    2 forms (  0.10%),      2 tokens (  0.03%), top: oaorar, oporar
    6:    2 forms (  0.10%),      2 tokens (  0.03%), top: oeesody, oeesordy
    7:    2 forms (  0.10%),      2 tokens (  0.03%), top: okchaldy, opchaldy
    8:    2 forms (  0.10%),      2 tokens (  0.03%), top: pchooiin, pchroiin
    9:    2 forms (  0.10%),      2 tokens (  0.03%), top: qockhom, qockhor
    10:    2 forms (  0.10%),      2 tokens (  0.03%), top: sarar, satar
    11:    2 forms (  0.10%),      2 tokens (  0.03%), top: sheekal, sheekol
    12:    1 forms (  0.05%),      1 tokens (  0.02%), top: aiios
    13:    1 forms (  0.05%),      1 tokens (  0.02%), top: cfarsa
    14:    1 forms (  0.05%),      1 tokens (  0.02%), top: chaies
    15:    1 forms (  0.05%),      1 tokens (  0.02%), top: chakod
    16:    1 forms (  0.05%),      1 tokens (  0.02%), top: charochy
    17:    1 forms (  0.05%),      1 tokens (  0.02%), top: chckhaly
    18:    1 forms (  0.05%),      1 tokens (  0.02%), top: chckhan
    19:    1 forms (  0.05%),      1 tokens (  0.02%), top: chckom
    20:    1 forms (  0.05%),      1 tokens (  0.02%), top: chdlety
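
For readers who want to reproduce figures like these, here is a rough sketch of how ED1 components could be computed. It is not the code in Ed1_Network_Mapper_GUI.py. It assumes edits are counted per EVA character, uses a simple all-pairs comparison with union-find, and runs on a made-up token list.

```python
from collections import Counter

def is_ed1(a, b):
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                           # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]    # one substitution at position i
    return a[i:] == b[i + 1:]            # one extra character in b at position i

def ed1_components(vocab):
    """Group word forms into connected components of the ED1 graph (union-find)."""
    words = sorted(vocab)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if is_ed1(a, b):
                parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    # Largest components first, ties broken alphabetically.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))

# Toy token list, for illustration only.
tokens = "daiin daiin dain chol chor shol okodar qokodar qokorar aiios".split()
counts = Counter(tokens)
components = ed1_components(counts)
for rank, forms in enumerate(components, start=1):
    n_tokens = sum(counts[w] for w in forms)
    top = ", ".join(sorted(forms, key=lambda w: -counts[w])[:10])
    print(f"{rank}: {len(forms)} forms ({100 * len(forms) / len(counts):.2f}%), "
          f"{n_tokens} tokens ({100 * n_tokens / len(tokens):.2f}%), top: {top}")
```

The all-pairs loop is quadratic in vocabulary size, which is still only a couple of million comparisons for about 2000 forms.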

New UI:
Removing parts of pages would be a bit of a challenge. If enough pages are selected, removing a full page like [link] shouldn't make a huge difference.

   
  • Uploaded Ed1_Network_Mapper_GUI.py to the repo; it creates these charts and output.  Not in the README description yet.
  • Updated Create_Mappings_Gui_v1.py.  This creates the mappings file used by the network mapper.  It will now accept either one of my json transcriptions or a text file. It strips Gutenberg headers if you wish to use those.
  • Voynich_Transcription_Export_v2.py creates the json transcriptions I use.  NOTE: most of my files are not set up to use non-EVA transcriptions, even though this file will export them.



RE: A One-Page Ledger Method for Generating Voynich-Like Text - Jorge_Stolfi - 23-05-2026

(23-05-2026, 04:12 PM)Dunsel Wrote: I fed your suggestions into codex. It took it a bit to get it, "I think" correct.

Thank you!

I cannot tell if the line thicknesses are right or wrong, but they suggest other more specific tests.

Could you please do daiin, on the Starred Parags section?

All the best, --stolfi


RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 23-05-2026

(23-05-2026, 07:09 PM)Jorge_Stolfi Wrote:
(23-05-2026, 04:12 PM)Dunsel Wrote: I fed your suggestions into codex. It took it a bit to get it, "I think" correct.

Thank you!

I cannot tell if the line thicknesses are right or wrong, but they suggest other more specific tests.

Could you please do daiin, on the Starred Parags section?

All the best, --stolfi

Recipe/Stars full backbone

   

daiin network

   

And I ran another with your update on Scribe 2 chedy.  Here you can see the darker connection lines.

   


RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 23-05-2026

(23-05-2026, 07:09 PM)Jorge_Stolfi Wrote:
(23-05-2026, 04:12 PM)Dunsel Wrote: I fed your suggestions into codex. It took it a bit to get it, "I think" correct.

Thank you!

I cannot tell if the line thicknesses are right or wrong, but they suggest other more specific tests.

Could you please do daiin, on the Starred Parags section?

All the best, --stolfi

Ah, figured out your contexts.  Here are all 3: both, left, and right.

Both
   

Left
   

Right
   

And since the distinction is subtle, I put all 3 into an animated gif. 

   


RE: A One-Page Ledger Method for Generating Voynich-Like Text - Torsten - 26-05-2026

(21-05-2026, 03:29 PM)Jorge_Stolfi Wrote: Sorry, I don't understand this argument.

Take for example
  56 otedy 56 oteedy
   2 ytedy 12 yteedy


If, after a suitable warm-up period, the words otedy and oteedy are equally frequent (as shown), and the mutation process can create ytedy from otedy, it should also create yteedy from oteedy. Then ytedy and yteedy should be equally frequent too.  But their ratio is only 1:6.

As I see it, the only ways your model would create the above counts are (1) the mutation of the prefix o->y is sensitive to whether the suffix is edy or eedy, or vice-versa, or (2) the seed text had those four words in those approximate skewed ratios (maybe no ytedy at all), and the mutation rules cannot create enough ytedy from otedy or from yteedy to raise the ytedy:yteedy ratio above 1:6.  Isn't that so?

If (1) is the case, then the method is even more complicated than it seemed at first.

If (2) is the case, then the method must be relying a lot more on the seed text being "Voynichese-like".  Which essentially replaces the question "how could the Author have generated the VMS text" with "how could the Author have generated a seed text with the same vocabulary and word frequencies as the VMS text".

No?

No, there are far more words than just these four. They exist in a multidimensional network of dozens of related forms. The scribe doesn't choose between "otedy" and "oteedy" in isolation — he chooses among the entire visible pool of similar words:

       -kedy          -keedy          -key         -keey
o-     okedy (118)    okeedy (105)    okey (63)    okeey (177)
y-     ykedy (23)     ykeedy (30)     ykey (8)     ykeey (58)
ot-    otedy (155)    oteedy (100)    otey (57)    oteey (140)
yt-    ytedy (24)     yteedy (28)     ytey (13)    yteey (28)

And the network extends far beyond these sixteen forms. Here is a larger sample for the ok-/k-/t-/ot- prefix group alone (not even including the y- or qo- variants):

         ok-             k-             t-             ot-
-aiiin   okaiiin (4)     kaiiin (3)     taiiin (1)     otaiiin (1)
-aiin    okaiin (212)    kaiin (65)     taiin (42)     otaiin (154)
-ain     okain (144)     kain (48)      tain (16)      otain (96)
-an      okan (5)        kan (3)        tan (1)        otan (5)
-aiir    okaiir (6)      kaiir (—)      taiir (—)      otaiir (4)
-air     okair (22)      kair (14)      tair (13)      otair (21)
-ar      okar (129)      kar (52)       tar (43)       otar (141)
-ail     okail (1)       kail (1)       tail (—)       otail (1)
-al      okal (138)      kal (23)       tal (20)       otal (143)
-am      okam (26)       kam (9)        tam (—)        otam (47)
-os      okos (8)        kos (3)        tos (4)        otos (4)
-or      okor (34)       kor (26)       tor (23)       otor (46)
-ol      okol (82)       kol (37)       tol (48)       otol (86)
-o       oko (8)         ko (2)         to (2)         oto (9)
-y       oky (102)       ky (25)        ty (16)        oty (115)
-ey      okey (63)       key (14)       tey (11)       otey (57)
-eey     okeey (177)     keey (44)      teey (20)      oteey (140)
-eeey    okeeey (27)     keeey (11)     teeey (1)      oteeey (8)
-chey    okchey (32)     kchey (21)     tchey (22)     otchey (31)
-chy     okchy (39)      kchy (29)      tchy (24)      otchy (48)
-shy     okshy (10)      kshy (5)       tshy (5)       otshy (4)

Twenty-one suffix rows, four prefix columns, eighty-four cells — and nearly every cell is filled with an attested word. There is no 1:1 relationship between two similar Voynich words. Each word exists in a network of variants. The frequency of any specific word reflects its entire history of being produced and being used as a source for further copying — across all production sessions and across the full evolutionary gradient.
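
A grid of this kind could be tabulated from a word-frequency list roughly as follows. This is only a sketch: the function names are hypothetical, and the counts are a small subset copied from the table above rather than a full word list.

```python
# Word counts copied from three rows of the grid above (absent forms omitted).
counts = {
    "okaiin": 212, "kaiin": 65, "taiin": 42, "otaiin": 154,
    "okain": 144, "kain": 48, "tain": 16, "otain": 96,
    "okaiir": 6, "otaiir": 4,
}
PREFIXES = ["ok", "k", "t", "ot"]
SUFFIXES = ["aiin", "ain", "aiir"]

def build_grid(counts, prefixes, suffixes):
    """Return {suffix: {prefix: count or None}} for every prefix+suffix combination."""
    return {s: {p: counts.get(p + s) for p in prefixes} for s in suffixes}

def fill_ratio(grid):
    """Fraction of grid cells that hold an attested word."""
    cells = [c for row in grid.values() for c in row.values()]
    return sum(c is not None for c in cells) / len(cells)

grid = build_grid(counts, PREFIXES, SUFFIXES)
print(" " * 8 + "".join(f"{p + '-':<16}" for p in PREFIXES))
for s, row in grid.items():
    cells = (f"{p + s} ({'-' if row[p] is None else row[p]})" for p in PREFIXES)
    print(f"{'-' + s:<8}" + "".join(f"{c:<16}" for c in cells))
print(f"filled cells: {fill_ratio(grid):.0%}")
```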

The "ok-" and "ot-" forms are consistently more frequent than "k-" and "t-" forms — because "ok-" and "ot-" are the full forms while "k-" and "t-" are the prefix-reduced variants. But within each prefix group, the ratios aren't uniform either — "okeey" (177) is more frequent than "okey" (63), "okaiin" (212) is more frequent than "okain" (144), reflecting which forms the scribe happened to use as sources more often.

Some cells are empty — "kaiir" (—), "tail" (—), "tam" (—). Not because a rule forbids them but because the scribe never happened to produce them. Different contingent choices would fill different cells.

That is what I mean by "frequent words are more likely to be selected as copying templates, generating more variants." Not that each word generates its variants at equal rates — but that the entire network of similar words feeds back on itself, with frequent forms generating more variants and rare forms generating fewer.
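
The feedback described here can be illustrated with a toy simulation. It is not the procedure from Timm's papers: the rule list, the modification probability, and the seed word are all assumptions chosen for illustration. The one feature it shares with the description is that the source word is drawn from the text written so far, so frequent forms are copied, and modified, more often.

```python
import random
from collections import Counter

# Hypothetical modification rules, loosely based on examples from this thread.
RULES = [("ok", "ot"), ("ot", "ok"), ("edy", "eedy"),
         ("eedy", "edy"), ("edy", "ey"), ("ch", "sh")]

def modify(word, rng):
    """Apply one randomly chosen applicable rule; return the word unchanged if none applies."""
    options = [(old, new) for old, new in RULES if old in word]
    if not options:
        return word
    old, new = rng.choice(options)
    return word.replace(old, new, 1)

def generate(seed_words, n_tokens, p_modify=0.5, rng_seed=0):
    """Grow a text by copying earlier tokens, modifying each copy with probability p_modify."""
    rng = random.Random(rng_seed)
    text = list(seed_words)
    while len(text) < n_tokens:
        source = rng.choice(text)  # uniform over tokens, so frequent types are picked more often
        text.append(modify(source, rng) if rng.random() < p_modify else source)
    return text

text = generate(["okedy"], 500)
for word, n in Counter(text).most_common(8):
    print(f"{word:<10}{n}")
```

Different seeds give different frequency tables, which is the contingency the post refers to.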

(For a more complete word grid documenting these networks across the entire VMS vocabulary, see [link] or Timm 2014, pp. 66–82.)


RE: A One-Page Ledger Method for Generating Voynich-Like Text - Torsten - 26-05-2026

(22-05-2026, 12:03 AM)ReneZ Wrote: Why would a realistic copy-mutate system stay conservative? There is nothing that dictates this.
The number of rules observed over tens of thousands of words are quite complex, and they are indeed rules.

This is, in a way, backwards logic.
We see that the word variations are very strict. Therefore, if the text was generated by modifying previous words, it would have to have followed strict rules. That is the correct direction of the logic.
There is no reason to assume that there would be very strict rules (which are then broken somewhat gradually).

EDIT:
Let's do some rough counts.
The word chedy could be considered to have four characters. 
Limiting to edit distance 1:
Each of these could be changed into another, leading to 4 times, say, 20 options.
Each of these could be deleted, leading to 4 more.
A new character could be added in each of 5 slots, so 5 times 20 more.
6 pairs could be swapped (not sure if that counts as edit distance 1).
We are close to 200 alternatives. 
Possibly 10 exist.

We can consider two alternative methods for a creation of a meaningless text using word permutations.

Method A: 
first, a vocabulary is set up using word patterns and their variations
then, a text is composed by somewhat arbitrarily picking words from this vocabulary/dictionary

Method B:
a text is generated by creating new words from previous ones 'on the fly'
then, the resulting vocabulary is the collection of all these words

It should be clear that the very limited set of allowed permutations much better fits with method A than method B.

The calculation of 200 possible edit-distance-1 modifications treats each EVA character as an independent substitutable unit. But the scribe doesn't work in EVA: he works at the stroke level or at the level of whole prefix and suffix groups. The natural stroke-level modifications of "chedy" are far fewer than 200:

- "ch" → "sh" (one stroke) = "shedy"
- "e" → "ee" (one stroke) = "cheedy"
- "edy" → "ey" (replace edy with ey) = "chey"
- "ch-" → "ok-" (replace the prefix) = "okedy"

The scribe didn't use an algorithm to find all theoretically possible modifications. He did what humans tend to do: why bother inventing new modification rules if the set generated so far is already sufficient? So he kept to the repertoire of modification rules he had already used. The "200 possible" alternatives include modifications like replacing "e" with "k" or "d" with "q", changes no scribe would make because they cross stroke-family boundaries and established replacement patterns. The gap between 200 and 10 isn't evidence for a pre-designed vocabulary. It's evidence that the modification process operates either at the stroke level or at the level of whole prefix or suffix groups, but not at the EVA character level.
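
The two counts being contrasted can be made concrete. The sketch below reproduces the rough arithmetic from the quoted post (four units, about 20 glyphs, six swaps) and then applies only the four group-level rules listed above to "chedy". The function names are hypothetical, and the rule list is just those four examples, not a complete rule set.

```python
def rough_ed1_count(n_units, alphabet_size, n_swaps=0):
    """Rough count of single-edit variants, following the arithmetic in the quoted post."""
    substitutions = n_units * alphabet_size       # each unit changed into another
    deletions = n_units                           # each unit removed
    insertions = (n_units + 1) * alphabet_size    # a new unit in each slot
    return substitutions + deletions + insertions + n_swaps

# The four stroke-level or group-level modifications of "chedy" named in the post.
GROUP_RULES = [("ch", "sh"), ("e", "ee"), ("edy", "ey"), ("ch", "ok")]

def group_variants(word):
    """Apply each rule once, at its first match, to produce the group-level variants."""
    return [word.replace(old, new, 1) for old, new in GROUP_RULES if old in word]

print(rough_ed1_count(4, 20, n_swaps=6))   # 80 + 4 + 100 + 6 = 190
print(group_variants("chedy"))             # ['shedy', 'cheedy', 'chey', 'okedy']
```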

And more fundamentally: it is impossible to produce everything that is theoretically possible. Every choice determines what comes next. After writing "chedy", the scribe sees "chedy". He modifies it to "shedy". Now "shedy" is visible. He modifies that to "shey". Each step constrains the next. The 190 "unmade" modifications were never live options: the scribe was never at a point where those were the natural next step from what was visible on the page. This is why on one page "-edy" forms dominate while on the next page "-ol" or "-aiin" forms do: the visible source words differ, producing different paths. Note: on [link] ([link]) there is even one paragraph with "-ody" words and one paragraph with "-edy" words.

This is true of any text. In theory Shakespeare could have written an uncounted number of possible plays. He wrote his version of Hamlet. We don't ask "why didn't he write one of the other possible plays or a different version of Hamlet?" because each word chosen determined what came next. The same applies here. The variants we see are the path taken. The variants we don't see are paths the scribe was never standing at the fork to take — or chose not to.


RE: A One-Page Ledger Method for Generating Voynich-Like Text - ReneZ - 26-05-2026

(4 hours ago)Torsten Wrote: The calculation of 200 possible edit-distance-1 modifications treats each EVA character as an independent substitutable unit. But the scribe doesn't work in EVA

I also was not using Eva in my example, by treating 'ch' as a single unit.

Interestingly, it does not even make a great difference.
Going from ch to sh in Eva is a single change, as is the equivalent S to Z in Currier.
Similarly, going from ch to cth is a single change, as is the equivalent S to X in Currier.

One could argue that the same is true if one looks at the writing, which is of course the correct thing to do.

My main point was that the great majority of possible changes are (apparently) forbidden.
The existence of a very large set of relatively strict rules strongly suggests that there is still something else going on. It is not just a matter of creating meaningless words based on small changes to previous words.

The rules are non-trivial too. A relatively simple (potential) change from e to a is allowed in some contexts but not in others.
One can introduce an f intruding in a ch before a ch, but not before one or two e's.

Etc, etc.