The Voynich Ninja

Full Version: A One-Page Ledger Method for Generating Voynich-Like Text
Here's a very quick preliminary test. In my opinion, the result is acceptable for such a short text.

[attachment=15818]

Code:
#Word Types: 2177
#Word Tokens: 9502
#Words top 120
1 123 daiin
2 96 daiinal
3 63 qokainal
4 61 qokeey
5 54 okaiin
6 50 cheol
7 50 chey
8 50 chor
9 49 chol
10 47 chdy
11 47 chhy
12 47 okaiinol
13 42 cheor
14 42 okol
15 39 qokedyol
16 39 sheody
17 38 chedy
18 37 sheedy
19 36 chinal
20 36 qokeedyol
21 35 chear
22 33 cheey
23 32 cheydy
24 32 sheey
25 31 dair
26 31 okedy
27 31 otal
28 31 shdy
29 30 dain
30 29 chdyol
31 29 oteedy
32 29 sheol
33 28 otaiinol
34 28 sheor
35 27 olor
36 27 oteey
37 27 oteeydy
38 27 qokeeydy
39 27 shol
40 26 aiin
41 26 chain
42 26 cheeydy
43 26 okeey
44 26 otey
45 26 qokeydy
46 26 qoky
47 25 chedyol
48 25 choiin
49 25 okeedy
50 24 choldy
51 23 aiiin
52 23 chal
53 23 choky
54 23 okar
55 23 olol
56 23 shedy
57 23 shey
58 22 aiir
59 22 cholky
60 22 okeol
61 22 olky
62 21 chety
63 21 okody
64 21 shdyol
65 21 sheoraiin
66 21 shor
67 21 ykeol
68 20 aiinal
69 20 cheo
70 20 dady
71 20 okeal
72 20 oteal
73 20 qokal
74 20 ytchy
75 20 ytor
76 19 chty
77 19 okey
78 19 oldy
79 19 otar
80 19 otchy
81 19 ykol
82 18 cheedy
83 18 cheeey
84 18 cheoar
85 18 cheoor
86 18 choraiin
87 18 okady
88 18 olar
89 18 olchy
90 18 otear
91 18 oteeey
92 18 oteody
93 18 qoin
94 18 shal
95 18 ykaiin
96 18 ytaiin
97 18 yteol
98 17 chchy
99 17 okor
100 17 otain
101 17 oteol
102 17 oteydy
103 17 qochy
104 17 qokdyol
105 17 qokedy
106 17 qoteo
107 17 sheydy
108 17 shody
109 17 ykeedy
110 16 chir
111 16 choty
112 16 okair
113 16 okeeydy
114 16 olkydy
115 16 otaiin
116 16 qoal
117 16 qokaiin
118 16 qokeeody
119 16 qokor
120 16 qoty
(28-05-2026, 09:58 PM)bi3mw Wrote: Here's a very quick preliminary test. In my opinion, the result is acceptable for such a short text.

And you see what I was saying about word length. You have no words of length 1, 2 or 3, so your distribution is shifted way to the right. Voynich word length is centered on 5, not 6.

Look at this output from my generator:

Output tokens: 8274
Output vocabulary size: 2525
Output hapax count: 1551
Voynich comparison tokens: 8519
Voynich vocabulary size: 2436
Voynich hapax count: 1642

So, your vocabulary is in the ballpark, maybe a little light. Add those length 1 and 2 tokens and that'll bring up the vocabulary count, and it will likely bring your word length closer to the Voynich if you keep the token count the same.
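
The three counts compared above (tokens, vocabulary size, hapax) can be reproduced with a few lines of Python. This is a minimal sketch, not the code behind the numbers in this thread: the function name `vocab_stats` and the tokenization (split on whitespace and on the dot used as a word separator in transliterations) are assumptions.

```python
from collections import Counter

def vocab_stats(text):
    """Return (token count, vocabulary size, hapax count).

    Assumption: words are separated by whitespace or by '.'.
    """
    tokens = text.replace(".", " ").split()
    counts = Counter(tokens)
    hapax = sum(1 for c in counts.values() if c == 1)  # words seen exactly once
    return len(tokens), len(counts), hapax

sample = "daiin chol daiin okaiin chol daiin qokeey"
print(vocab_stats(sample))
```

Running the same function on a generated text and on a Voynich slice of similar size gives directly comparable figures.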

Now, look at my top 20 bigrams and trigrams compared to the Voynich. It still needs some tweaking but it's in the ballpark. Is yours?

Code:
Top 20 output bigrams (100 pages):
  ho: 2062
  ch: 1689
  sh: 1551
  ai: 1319
  ol: 1241
  he: 1196
  ey: 1037
  in: 989
  da: 884
  ii: 851
  ar: 761
  or: 626
  th: 621
  ct: 611
  od: 579
  hy: 568
  ha: 513
  cp: 451
  ph: 421
  ot: 363

Top 20 Voynich bigrams (100 pages):
  ch: 3082
  ho: 1920
  ai: 1433
  in: 1374
  ii: 1281
  ol: 1239
  he: 1116
  da: 1116
  sh: 1033
  dy: 1024
  hy: 948
  ok: 930
  or: 922
  qo: 764
  ot: 749
  ar: 562
  ey: 554
  od: 548
  ee: 541
  eo: 467

Top 20 output trigrams (100 pages):
  sho: 850
  aii: 736
  hey: 734
  cho: 729
  hol: 706
  iin: 638
  dai: 537
  cth: 529
  cph: 388
  she: 362
  che: 361
  hod: 307
  ckh: 259
  oda: 252
  ain: 249
  hoy: 243
  dar: 190
  cfh: 188
  chy: 187
  har: 184

Top 20 Voynich trigrams (100 pages):
  cho: 1236
  iin: 1181
  aii: 1139
  dai: 728
  che: 718
  hol: 596
  chy: 492
  sho: 467
  hor: 467
  cth: 435
  tch: 420
  kch: 392
  edy: 366
  qok: 359
  she: 295
  hey: 291
  ody: 253
  qot: 251
  otc: 244
  heo: 241
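
Lists like the ones above can be produced with a short character n-gram counter. The sketch below is an illustration only: `top_ngrams` is a hypothetical name, and it assumes n-grams are counted inside words and never across word boundaries, which the thread does not specify.

```python
from collections import Counter

def top_ngrams(words, n, k=20):
    """Return the k most frequent character n-grams, counted within words."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts.most_common(k)

words = ["daiin", "chol", "chor"]
print(top_ngrams(words, 2, 5))
print(top_ngrams(words, 3, 5))
```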

Look at my word length compared to the Voynich then compare it to yours.

[attachment=15819]

Look at my zipf curve compared to the Voynich. Does yours compare?  I'm betting that until you fix the word length it'll be a "no".

[attachment=15820]
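
One common way to put a number on a Zipf curve is to fit a straight line to log(frequency) against log(rank) and report the slope. The sketch below does that with ordinary least squares over all ranks. It is an assumption about method, not a description of how the charts in this thread were made; `zipf_slope` is a hypothetical name.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy corpus whose frequencies are exactly 60 / rank.
toy = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12
print(zipf_slope(toy))
```

A slope near -1 is the classic Zipf shape; a steeper (more negative) slope means the most common words take a larger share of the text.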

These are the very basic numbers a generator needs to be "plausible" on before you can even call it a generator. AND, to be a workable generator it has to be mostly blind about it. I do use a seed page. After that, it generates the rest on its own just using my ledger. And all the numbers you see mine generate are emergent. I didn't create any code to force a zipf curve. I do have code that manages word length, and that's something a scribe could do with their eyeballs.

Here's the big thing you have to think about. If you sat down and tried to create something that looked like the Voynich and YOU were in the 15th century, what "tools" would you need other than a quill, some ink and maybe some pounce to erase things with? Any generator we come up with today needs to be doable by a 15th-century human. That is why the Naibbe cypher looks great and would technically be human doable, but it would require dice and cards and lookup tables... extremely cumbersome. Those are the type of things you need to consider.

Creating Voynich-like text is easy.

Creating a method that explains how a 15th century scribe could do it, that's the hard part.

Edit: If you want to really see how you're doing, put up some real Voynich comparison charts, word length for word length.
Ran some new tests on my generated code vs the Voynich.

This test measures novelty, predictability, word beginnings, word middles, word endings, and how strongly words cluster into families.  It's from some code I developed months ago.  It basically trains on half the text then tests on the other half to see how they compare.   

In this test the generated code is mostly in the ballpark, with the exception of bigram reject. That measure takes the bigrams seen in one half of the text and checks how many bigrams in the other half fall outside that set. For my generator the figure is very low, which means it's producing too many close relatives. It stays inside the learned families extremely well, but the real herbal text appears to generate more new combinations while still obeying its structural constraints.
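
A minimal sketch of a bigram reject measure, under one reading of the description above: collect the bigrams of the training half, then report the percentage of bigram occurrences in the test half that were never seen in training. The name `bigram_reject_rate` and the choice to count occurrences rather than distinct bigrams are assumptions.

```python
def bigram_reject_rate(train_words, test_words):
    """Percent of test-half bigram occurrences not seen in the training half."""
    seen = {w[i:i + 2] for w in train_words for i in range(len(w) - 1)}
    test_bigrams = [w[i:i + 2] for w in test_words for i in range(len(w) - 1)]
    if not test_bigrams:
        return 0.0
    rejected = sum(1 for b in test_bigrams if b not in seen)
    return 100.0 * rejected / len(test_bigrams)

train_half = ["daiin", "chol"]
test_half = ["chor", "dain"]
print(bigram_reject_rate(train_half, test_half))
```

In the toy example only "or" is new to the test half, so one bigram occurrence out of six is rejected.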

BPC is bits per character which measures entropy or how predictable text is.  BPC2 uses bigrams, BPC3 trigrams.  ΔBPC is the difference between the two.
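
To make the BPC figures concrete, here is a sketch of bits per character under a character model trained on one half and scored on the other. It takes BPC2 to mean conditioning on one previous character (`order=1`) and BPC3 on two (`order=2`). The add-one smoothing, the `^` and `$` boundary markers, and the function name `bpc` are assumptions; the smoothing behind the numbers in this thread is not stated, so absolute values will differ.

```python
import math
from collections import Counter

def bpc(train_words, test_words, order):
    """Bits per character of test_words under a character model that
    conditions on the previous `order` characters (add-one smoothed)."""
    pad = "^" * order                      # start-of-word padding
    alphabet = {"$"}                       # "$" marks end of word
    for w in list(train_words) + list(test_words):
        alphabet.update(w)
    context_counts, ngram_counts = Counter(), Counter()
    for w in train_words:
        s = pad + w + "$"
        for i in range(order, len(s)):
            context_counts[s[i - order:i]] += 1
            ngram_counts[s[i - order:i + 1]] += 1
    bits, chars = 0.0, 0
    for w in test_words:
        s = pad + w + "$"
        for i in range(order, len(s)):
            seen = ngram_counts[s[i - order:i + 1]]
            context = context_counts[s[i - order:i]]
            p = (seen + 1) / (context + len(alphabet))
            bits -= math.log2(p)
            chars += 1
    return bits / chars if chars else 0.0

train = ["daiin", "chol", "chor", "daiin", "okaiin"]
test = ["daiin", "chol"]
bpc2 = bpc(train, test, order=1)
bpc3 = bpc(train, test, order=2)
print(round(bpc2, 4), round(bpc3, 4), round(bpc2 - bpc3, 4))
```

The last printed value is BPC2 minus BPC3, which is how the ΔBPC figures in this thread relate to the two BPC figures.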

The ED1 tests look at how many words in the training set are edit distance 1 or copies of words in the test set.  Token asks how many words have a close relative in the test set and type asks how many unique words have a close relative.  Again, it's off by a bit and is probably related to the bigram reject difference.  97% of the unique training words are ED 1 or 0 to a test word.  
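
The ED≤1 token and type rates can be sketched as below. The function names are hypothetical, and the direction of the comparison (each word of one half is checked against the vocabulary of the other half) is an assumption based on the description above. The pairwise scan is quadratic in vocabulary size, which is fine for texts of a few thousand types.

```python
def within_one_edit(a, b):
    """True if a and b are equal or differ by one insert, delete or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                        # make a the shorter (or equal) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                             # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]      # one substitution
    return a[i:] == b[i + 1:]              # one insertion into a

def ed1_hit_rates(words, reference):
    """Return (token %, type %) of words with an ED<=1 relative in reference."""
    ref = set(reference)
    cache = {}

    def hit(w):
        if w not in cache:
            cache[w] = w in ref or any(within_one_edit(w, r) for r in ref)
        return cache[w]

    types = set(words)
    token_rate = 100.0 * sum(hit(w) for w in words) / len(words)
    type_rate = 100.0 * sum(hit(w) for w in types) / len(types)
    return token_rate, type_rate

print(ed1_hit_rates(["chol", "chol", "daiin", "qokeey"], ["chor", "dain"]))
```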

And I think I know why these are issues. I tried to keep everything a copy or ED1 from its source, and it almost never generates a random new word. It appears there's a bit more novelty in the Voynich than I'm allowing for.

So again, not perfect.  In the neighborhood?  Mostly, but needs work.

Metric | Herbal Voynich | RK42 | Difference
Bigram reject % | 0.5655 | 0.1044 | much lower
BPC2 | 2.4472 | 2.8717 | higher
BPC3 | 2.1978 | 2.5697 | higher
ΔBPC | 0.2494 | 0.3020 | higher
Initial BPC | 3.0692 | 3.1479 | slightly higher
Medial BPC | 3.3306 | 3.4473 | slightly higher
Final BPC | 2.6014 | 2.9747 | higher
ED≤1 Token | 95.7988% | 98.9295% | higher
ED≤1 Type | 88.0416% | 96.9503% | much higher
(28-05-2026, 10:22 PM)Dunsel Wrote: And you see what I was saying about word length. You have no words of length 1, 2 or 3, so your distribution is shifted way to the right. Voynich word length is centered on 5, not 6.

Problem solved—not perfect, but good enough.

[attachment=15831]

============================================================
PAGE 49
============================================================

chedyol ckh otecheo otdy otain qok sheol choeeyky
ytchy ykhol okor cheolky oteydy okar chockhy ytol
oteydy ch oldain sheol daar sheol cheesey char
l olal ykol cholody okodar choolchdy qokshey shain
qokchy otalshy oka qoteshy ok daseky ira otair
oteydy aiirody qooraiin cheky chky oleeol ote shey
qotor aiindy oleey chody qotey kal ytal oldyol
ok qo otdy choty oldyol otockey dair otardar
sheekchey qokchedy sheey choindy qocphy qotey ykchy chteey
oteor rot ykeey qokychdy oteal che chchy qokeeydy
qo ch shcthy cheedalaiin aiolky qokaiiin otal aithar
ykchy okear dair oteor sheey qo aiiinol

============================================================
PAGE 50
============================================================

sheol oka ykeydy oka ykchy chear otey aiir
lol yld choolky aiir chear osh oka choldar
charain qochy cheor otdady ol ok shoky oleydy
qokeey ytal ol ytey shdy shkshy chty daiin
chty okoraiin otor chodaiin ollchy olardy darody yteey
ykar cheor ykeey daiin okedy okar okaiin qofaiin
shkear otosey daiin sheey oleydy okeey qokeey qoeeor
ol checphey qochey chindy ch shdchy cheokeor daiin
chey daiir ykhol oltaiin okol ch shedy shechy
chor che che chedy chtain shedy chekydy daiin
qot oteeo qoin qolchy okchy cheol qot oksheo
okshor shol ykdy qotarain okaiin qolshey cheol
(29-05-2026, 05:16 AM)bi3mw Wrote:
(28-05-2026, 10:22 PM)Dunsel Wrote: And you see what I was saying about word length. You have no words of length 1, 2 or 3, so your distribution is shifted way to the right. Voynich word length is centered on 5, not 6.

Problem solved—not perfect, but good enough.

Now you're looking much more Voynich and you have at least one metric in the ballpark.  I'm guessing adding those short words reduced all the long words and brought that chart back closer to reality.  You're now matching word length distribution for a natural language.  The Voynich tends to have slightly shorter words than a natural language.  And, that bell shaped standard distribution should also start to bring your zipf curve into line.
(29-05-2026, 10:52 AM)Dunsel Wrote: And, that bell shaped standard distribution should also start to bring your zipf curve into line.

[attachment=15835]
In conclusion, I would like to clarify once again that my model is based primarily (87.4%) on a table containing 485 entries (syllable combinations). The ledger words only fill in the gaps not covered by short_words and fixed words. They therefore account for only a small percentage, but they are necessary to ensure that all words are generated. Otherwise, the model as a whole would not function properly.

Proportions of word generation:
-----------------------------------------

Short words:  1634 (17.2%)
Common words:  4474 (47.1%)
Common combinations:  2195 (23.1%)
Ledger words:  1197 (12.6%)
Total: 9500
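
The percentages in this list, and the 87.4% figure quoted above it, follow directly from the four counts. A quick arithmetic check in Python, using only the numbers given in the list (the variable names are illustrative):

```python
counts = {
    "Short words": 1634,
    "Common words": 4474,
    "Common combinations": 2195,
    "Ledger words": 1197,
}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
# Everything that is not a ledger word comes from the predefined table.
table_share = round(100 * (total - counts["Ledger words"]) / total, 1)

print(total, shares, table_share)
```

So the 87.4% is the combined share of short words, common words and common combinations, and the ledger supplies the remaining 12.6%.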

The high proportion of predefined words suggests that the author may indeed have worked with a table book or combinatorics. While this would be labour-intensive, it would still be feasible for a 15th-century author.

Of course, that doesn't prove the manuscript is meaningless. A table book could also encode a real language or a cipher system.

[attachment=15847]
(26-05-2026, 04:11 PM)Dunsel Wrote:
(26-05-2026, 03:07 PM)Torsten Wrote: I prefer to analyze the text the scribe actually wrote rather than speculate about the texts he didn't write, since that number is endless.

I will politely disagree with this statement. A model does not just have to explain what the scribe wrote. It also has to explain why the text did not fall apart into garbage.  A lot of objections in these discussions are really variations of “why don’t we see this?” or “why didn’t the scribe do that?” They are questions about constraint.

In my own experiments, most copy/mutate systems fail pretty quickly unless the constraints are tight enough. They drift, bloat, repeat badly, or start producing obvious nonsense. And, given enough generations, even my generator has issues. That failure is informative. It tells us the Voynich text is not just “anything goes” mutation. So I think the unwritten text matters too. If a proposed method could easily generate endless bad Voynichese, then we need to explain why the manuscript consistently avoided doing exactly that.

If we don't explore the possibilities of what wasn't created then we'll never have acceptance for what is created.

You raise a fair point — but it's a different question from the one I answered. Zandbergen asked why some unspecified modifications are absent. I said: because the scribe took one path and not another, and we can only analyze the path taken. That stands.

Your question is different: why doesn't the text fall apart into garbage? That's a question about constraint, and it has a straightforward answer.

The copying process inherits structure from the source. Each new word is a modification of an existing word that already has valid structure. A modification of a valid word almost always produces another valid word — because the stroke-level constraints are preserved by the modifications taken. The structure doesn't need to be maintained by external rules — it's carried forward by the copying process itself.

And the scribe has eyes. If a modification produces something that looks wrong, he sees it and doesn't write it. Your own "don't look stupid guardrails" describe this exactly. The scribe isn't following a ledger. He's looking at what he produced and judging whether it looks acceptable. Human aesthetic judgment is the constraint. 

His visual judgment follows the stroke-level structure of the writing system. Schwerdtfeger described in 2008 four design rules: (1) line-glyphs can follow line-glyphs or 'a'; (2) curve-glyphs and 'a' can follow curve-glyphs; (3) the 'l'-glyph can be used as a curve-glyph or as a line-glyph; and (4) gallows glyphs count as curve glyphs (see Timm & Schinner 2020, p. 10). These aren't rules the scribe memorized — they're descriptions of what visual similarity produces. A scribe modifying strokes naturally stays within these constraints because crossing them would look wrong to his eye.

This is also why your generator struggles — and why any generator will. For the human mind it is difficult not to repeat itself. The variation comes from imperfect repetition — not from randomness. The consistency comes from habitual patterns — not from explicit rules. A computer needs both dice (for variation) and rules (for consistency) as separate mechanisms. A human produces both from one source — his own cognitive habits. That's why modeling a human scribe in Python is, as you put it, "a nightmare." You can't model the human mind like a computer — because the human produces variation and structure from the same process, while the computer needs separate mechanisms for each.

The manuscript doesn't fall apart because every word was seen and accepted by a human before it became part of the source pool for the next word. The constraints are the pattern recognition abilities and the habits of the scribe himself.
(30-05-2026, 07:30 PM)Torsten Wrote: The manuscript doesn't fall apart because every word was seen and accepted by a human before it became part of the source pool for the next word. The constraints are the pattern recognition abilities and the habits of the scribe himself.

I couldn't agree with you more. And again, that brings human psychology into the equation. Which patterns?  And then, how do you model those patterns?  At what point do the weights and rules become more than a 15th century scribe can do?  That's where I am now.

For example:

In the other post on <ed> I mentioned I did some machine learning, trying to find syllables. I refined that down to where only 3 words in Takahashi and 14 in Zandbergen/Landini can't be mapped as having good, learnable n-grams in Scribe 1 content. It came up with a count of 150 main n-grams and 50 more that act as connectors. I added that learned chunk data into my generator and it produced the code blocks I pasted below.

The ledger model still attempts a mutation and validates it. The new word is then passed off to the chunk model to see if it contains valid n-gram chunks with valid neighbors. It looks more Voynich-like than previous runs and still matches the statistics pretty well.
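
The mutate-then-validate step described above can be sketched as follows. This is a toy illustration, not the generator discussed in this thread: the chunk set, the alphabet, and the names `segments_into_chunks`, `mutate_ed1` and `next_word` are all hypothetical, and the "valid neighbors" check is left out. A candidate is accepted only if it can be split entirely into allowed chunks; if no mutation passes, the sketch falls back to a plain copy of the source word.

```python
import random

def segments_into_chunks(word, chunks):
    """True if word can be written as a concatenation of allowed chunks."""
    ok = [True] + [False] * len(word)      # ok[i]: word[:i] is segmentable
    for end in range(1, len(word) + 1):
        ok[end] = any(ok[start] and word[start:end] in chunks
                      for start in range(end))
    return ok[len(word)]

def mutate_ed1(word, alphabet, rng):
    """Apply one random insertion, deletion or substitution (word is non-empty)."""
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "substitute":
        i = rng.randrange(len(word))
        return word[:i] + rng.choice(alphabet) + word[i + 1:]
    i = rng.randrange(len(word) + 1)       # insert (also used for 1-letter deletes)
    return word[:i] + rng.choice(alphabet) + word[i:]

def next_word(source, chunks, alphabet, rng, tries=50):
    """Return a chunk-valid ED1 mutation of source, or source itself."""
    for _ in range(tries):
        candidate = mutate_ed1(source, alphabet, rng)
        if segments_into_chunks(candidate, chunks):
            return candidate
    return source                          # fallback: plain copy

toy_chunks = {"ch", "sh", "ol", "or", "o", "d", "y", "aiin"}  # hypothetical
toy_alphabet = "acdhilnorsy"                                  # hypothetical
rng = random.Random(0)
print([next_word("chol", toy_chunks, toy_alphabet, rng) for _ in range(5)])
```

The segmentation test is a standard word-break check, so its cost grows with the square of the word length, which is negligible for words this short.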

But is this still human doable? I think so, but it may require more than just the scribe's eyes. There may have been an audible component, where the scribes were familiar enough with the output tokens that they pronounced them. I know I pronounce Voynich words to see if they sound right. I pronounce words as I write them. Might a Voynich scribe do the same?

Code:
<f1v>
<f1v.P1.1;K>        fchol.shaiin.odad.fchol.odd.ose.cfhealy.chol
<f1v.P1.2;K>        chotey.yoo.or.syaiir.cther.ckhea.chtain
<f1v.P1.3;K>        pchol.ycheeo.chol.shodd.pose.odad
<f1v.P1.4;K>        pydaey.okain.yor.pchol.oydar.pchol.kshiy.yorr
<f1v.P1.5;K>        fchol.os.fchol.s.daiin
<f1v.P1.6;K>        osdar.okod.qos.daiim.pshol.ycheeo
<f1v.P1.7;K>        oar.okeol.oateos.opeos.daiin.fchol.chool.mor
<f1v.P1.8;K>        os.chol.oos.pchol.kaiin
<f1v.P1.9;K>        cfhoain.fcho.chal.rokain.cchol.fchol.cthar.s
<f1v.P1.10;K>        kor.chol.chols.ydain.s.lchol.sshey
<f1v.P1.11;K>        chool.diiin.fchol.kor.s.fchol.yocho

<f25r>
<f25r.P1.1;K>        ros.dloctha.sory.dain.eey.dloctha.koshey.shol.daraii
<f25r.P1.2;K>        kshy.dlocta.ycho.kshy.dain.ikan
<f25r.P1.3;K>        cthooldar.chodaiils.cthy.okaiirg.chodaiil.ctholdar
<f25r.P1.4;K>        chotey.odar.cfhol.csol.shory.dloctha.ecthoary
<f25r.P1.5;K>        kdail.dlocta.dlocfha.cthey.dair.dain.daid
<f25r.P1.6;K>        cphoy.dlocta.dloctha.tchol.dloctha.cthoary
<f25r.P2.7;K>        okar.chal.dair.cphoy.cfal.dlocta.okaiir.eos
<f25r.P2.8;K>        oycheey.cphodaiil.daire.chhealy.chealy
<f25r.P2.9;K>        oar.tchol.dlocta.tchol.dlocfha.dlocpha.dlocka.dlocpha.cphealy
<f25r.P2.10;K>      kaiirg.eckhoary.fchol.tchol.dlocphe.dair
<f25r.P2.11;K>      kail.dlocka.os.pchil.otchol.dlocka.ckdey.pchol.dsory
<f25r.P2.12;K>      dlocpha.pchol.cthol.oaky.chodaiila.ctoil

<f50v>
<f50v.P1.1;K>        dan.cphoy.chor.dshkoldy.dan
<f50v.P1.2;K>        kyey.teody.ydain.cphar.dkchar.y
<f50v.P1.3;K>        cfhol.cphesaiin.chor.dshpoldy.sckhey
<f50v.P1.4;K>        cshyds.kodshey.cphoy.cthar.chor.kon.choydr.dain
<f50v.P1.5;K>        oas.chol.cphar.shok.shon.otol
<f50v.P1.6;K>        shos.rchol.oiin.scphey.ooiin.sho.cphar.oydar
<f50v.P2.7;K>        y.ro.oeykoal.rchol.kchol.fchol.chodor
<f50v.P2.8;K>        rol.olys.ytais.eolys.dshpoldy.cthey.y.sho
<f50v.P2.9;K>        okol.fchol.dshkoldy.y.dshpoldy
<f50v.P2.10;K>      cfhyl.chor.ytain.stchey.chsy
<f50v.P2.11;K>      eoe.ckhar.rchol.dshpolday.ddchar.oteos
<f50v.P2.12;K>      cphey.ckhey.oekoal.oteol.chey.stchey.cfhol
<f50v.P2.13;K>      y.oas.chey.chol.alg
<f50v.P2.14;K>      pydeeyl.rol

Page-count rule: output pages = Voynich pages = 100
Voynich pages available: 225
Voynich pages compared: 100
Generated pages: 100

Output tokens: 8234
Output vocabulary size: 2463
Output hapax count: 1557
Voynich comparison tokens: 8519
Voynich vocabulary size: 2436
Voynich hapax count: 1642
Ledger invalid: 0

Operation counts:
  external_copy: 1065
  external_copy_fallback: 14
  external_ed1: 1961
  external_short_copy: 13
  initial_gallows_construct: 88
  initial_gallows_seed: 10
  local_copy: 951
  local_copy_fallback: 164
  local_ed1: 1954
  novel_ed1: 1830
  seed: 184

Gallows action counts:
  attested_ed1: 2903
  gallows_delete: 54
  gallows_insert: 63
  gallows_substitute_in: 35
  gallows_substitute_out: 121
  gallows_swap: 162
  initial_construct: 88
  initial_seed: 10

Top 20 output bigrams (100 pages):
  ho: 2214
  ch: 1934
  he: 1295
  sh: 1274
  ol: 1254
  ai: 1116
  ey: 997
  in: 887
  da: 815
  th: 778
  ct: 749
  or: 739
  ii: 720
  ar: 685
  ha: 510
  od: 499
  hy: 468
  cp: 382
  eo: 373
  ee: 368

Top 20 Voynich bigrams (100 pages):
  ch: 3082
  ho: 1920
  ai: 1433
  in: 1374
  ii: 1281
  ol: 1239
  he: 1116
  da: 1116
  sh: 1033
  dy: 1024
  hy: 948
  ok: 930
  or: 922
  qo: 764
  ot: 749
  ar: 562
  ey: 554
  od: 548
  ee: 541
  eo: 467

Top 20 output trigrams (100 pages):
  cho: 969
  hol: 836
  sho: 702
  cth: 682
  hey: 677
  aii: 571
  dai: 520
  iin: 505
  che: 497
  cph: 352
  ain: 333
  she: 332
  tho: 275
  hod: 262
  oda: 226
  kch: 221
  ckh: 218
  eey: 204
  cfh: 186
  dar: 181

Top 20 Voynich trigrams (100 pages):
  cho: 1236
  iin: 1181
  aii: 1139
  dai: 728
  che: 718
  hol: 596
  chy: 492
  sho: 467
  hor: 467
  cth: 435
  tch: 420
  kch: 392
  edy: 366
  qok: 359
  she: 295
  hey: 291
  ody: 253
  qot: 251
  otc: 244
  heo: 241

After generating 1000 pages, it starts to look less Voynich-like but still has familiar patterns.

Code:
<f500v>
<f500v.P1.1;K>      shokcheey.ods.lochy.ol.dar.cfhaiin.shkaiir.moaokchy
<f500v.P1.2;K>      opsho.cfhol.kooishey.ol.cfhaiin.ol
<f500v.P1.3;K>      oyr.maokchy.kachys.ool.chol.oal.chol.shoasho
<f500v.P2.4;K>      fool.oal.keishey.oalg.cthol.lochy.y.ol
<f500v.P2.5;K>      maokhy.cfhol.dain.ool.oal.cpod
<f500v.P2.6;K>      oaokchy.ol.ool.ol.oaikdon
<f500v.P2.7;K>      cphey.y.shear.ord.ckhaiin.shtasho.cthey.ckhhaeiin.oaidoon
<f500v.P2.8;K>      ool.maokhg.okchey.okeey.sor.cpod.lol
<f500v.P2.9;K>      old.deydd.cthol.she.dain
<f500v.P2.10;K>      ol.shey.dair.kol.ols.ostairin.tol.roe.ol
<f500v.P2.11;K>      sheo.old.cfhol.shory.ol.cod.shyoolso.ror.cfhoaiin
<f500v.P2.12;K>      maokhy.ois.shoolsho.lshoolsho.oil.toche.skaiir

And here's page 10,000.  I assumed if the model was going to collapse, it would have done so by this point.  I can't provide any stats to compare to Voynich other than to say it's a bit heavy on longer words which could be adjusted.

Code:
<f5000v>
<f5000v.P1.1;K>      kore.dain.shey.chy.otaiin.qosoier.kore
<f5000v.P1.2;K>      raiisoey.cthal.cthodaiils.pdeiimy.airg.qosoilr.ylsoier
<f5000v.P1.3;K>      dain.chal.y.chal.oor.chodain.cthodaiils.odaiin
<f5000v.P1.4;K>      dpdaeim.cthy.ore.qosoier.daicthy.y.kshy
<f5000v.P1.5;K>      qosoier.odar.dain.kshy.olsdeoey.eefy
<f5000v.P1.6;K>      chy.dain.oor.qoosoieor.dair.lshytoo.cthodaiils.qore
<f5000v.P1.7;K>      dain.airg.cphey.kodaiin.dair.alshyoe
<f5000v.P2.8;K>      fairg.s.sshyood.qosoilr.chok
<f5000v.P2.9;K>      sory.olsoior.olsooer.dair.alshyey.alshiey
<f5000v.P2.10;K>    fair.cthodaiils.fairg.olsoier.oor.sshysod.dair.pachys.s
<f5000v.P2.11;K>    oir.alshyoe.alshyee.qosoier.shey.chok.olsoar.ckhealy
(30-05-2026, 04:28 PM)bi3mw Wrote: Of course, that doesn't prove the manuscript is meaningless. A table book could also encode a real language or a cipher system.

Ran some of my tests on your output.  Hope you don't mind a critical analysis.  

Test | His Output | Takahashi Slice | Comment
Tokens | 9,447 | 9,447 | Matched sample size
Types | 1,757 | 2,739 | Far fewer unique words
Hapax | 683 | 1,856 | Far fewer one-off words
Type/Token Ratio | 18.6% | 29.0% | More repetitive vocabulary
Hapax / Type Share | 38.9% | 67.8% | Major novelty deficit
Mean Word Length | 5.04 | 4.87 | Very close
Zipf Slope | -1.021 | -0.841 | Too steep. Common words dominate too much
Length Distribution JSD | 0.0047 | - | Excellent match
ED≤1 Token Hit | 89.55% | - | Lower than expected
ED≤1 Type Hit | 77.52% | - | Weak word-family structure
Bigram Reject Rate | 2.94% | - | Reasonably good
Trigram Overlap | 79.84% | - | Moderate match
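
For readers unfamiliar with the "Length Distribution JSD" row: it compares two word-length histograms with the Jensen-Shannon divergence, where 0 means identical distributions. A minimal sketch follows. The function names are hypothetical, and base-2 logarithms are an assumption (the base used for the 0.0047 figure is not stated); with base 2 the value ranges from 0 to 1.

```python
import math
from collections import Counter

def length_distribution(words):
    """Map word length -> share of tokens with that length."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

generated = ["daiin", "chol", "ol", "qokeey", "y"]
reference = ["daiin", "chor", "or", "okaiin", "s", "chedy"]
print(jsd(length_distribution(generated), length_distribution(reference)))
```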

My eyeballs say your text looks good, almost too good perhaps. The Voynich has its own license on weird, which I suspect is why your hapax and novelty are low.