The Voynich Ninja - A One-Page Ledger Method for Generating Voynich-Like Text


Here's a very quick preliminary test. In my opinion, the result is acceptable for such a short text.

[attachment=15818]

Code:
#Word Types: 2177

#Word Tokens: 9502

#Words top 120

123 daiin 

96 daiinal 

63 qokainal 

61 qokeey 

54 okaiin 

50 cheol 

50 chey 

50 chor 

49 chol 

47 chdy 

47 chhy 

47 okaiinol 

42 cheor 

42 okol 

39 qokedyol 

39 sheody 

38 chedy 

37 sheedy 

36 chinal 

36 qokeedyol 

35 chear 

33 cheey 

32 cheydy 

32 sheey 

31 dair 

31 okedy 

31 otal 

31 shdy 

30 dain 

29 chdyol 

29 oteedy 

29 sheol 

28 otaiinol 

28 sheor 

27 olor 

27 oteey 

27 oteeydy 

27 qokeeydy 

27 shol 

26 aiin 

26 chain 

26 cheeydy 

26 okeey 

26 otey 

26 qokeydy 

26 qoky 

25 chedyol 

25 choiin 

25 okeedy 

24 choldy 

23 aiiin 

23 chal 

23 choky 

23 okar 

23 olol 

23 shedy 

23 shey 

22 aiir 

22 cholky 

22 okeol 

22 olky 

21 chety 

21 okody 

21 shdyol 

21 sheoraiin 

21 shor 

21 ykeol 

20 aiinal 

20 cheo 

20 dady 

20 okeal 

20 oteal 

20 qokal 

20 ytchy 

20 ytor 

19 chty 

19 okey 

19 oldy 

19 otar 

19 otchy 

19 ykol 

18 cheedy 

18 cheeey 

18 cheoar 

18 cheoor 

18 choraiin 

18 okady 

18 olar 

18 olchy 

18 otear 

18 oteeey 

18 oteody 

18 qoin 

18 shal 

18 ykaiin 

18 ytaiin 

18 yteol 

17 chchy 

17 okor 

17 otain 

17 oteol 

17 oteydy 

17 qochy 

17 qokdyol 

17 qokedy 

17 qoteo 

17 sheydy 

17 shody 

17 ykeedy 

16 chir 

16 choty 

16 okair 

16 okeeydy 

16 olkydy 

16 otaiin 

16 qoal 

16 qokaiin 

16 qokeeody 

16 qokor 

16 qoty

(28-05-2026, 09:58 PM)bi3mw Wrote: Here's a very quick preliminary test. In my opinion, the result is acceptable for such a short text.

And you see what I was saying about word length. You have no words of length 1, 2 or 3, so your distribution is shifted way to the right. Voynich word length is centered on 5, not 6.

Look at this output from my generator:

Output tokens: 8274
Output vocabulary size: 2525
Output hapax count: 1551
Voynich comparison tokens: 8519
Voynich vocabulary size: 2436
Voynich hapax count: 1642

So, your vocabulary is in the ballpark, maybe a little light. Add those length 1 and 2 tokens and that'll bring up the vocabulary count, and it will likely bring your word length closer to the Voynich if you keep the token count the same.
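For anyone who wants to reproduce these three counts on their own output, here is a minimal sketch. It assumes the text is already split into a list of words; the function name `vocab_stats` and the toy sample are made up for illustration and are not taken from either generator.

```python
from collections import Counter

def vocab_stats(words):
    """Return (token count, type count, hapax count) for a list of words."""
    counts = Counter(words)
    # A hapax is a word type that occurs exactly once.
    hapax = sum(1 for c in counts.values() if c == 1)
    return len(words), len(counts), hapax

# Toy sample only; a real run would load a transliteration or generator output.
sample = "daiin chol daiin okaiin chor chol daiin qoky".split()
tokens, types, hapax = vocab_stats(sample)
print(tokens, types, hapax, f"hapax share of types: {hapax / types:.1%}")
```

With the numbers posted above, the hapax share of the vocabulary works out to 1551/2525 (about 61%) for the generated text and 1642/2436 (about 67%) for the Voynich sample.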

Now, look at my top 20 bigrams and trigrams compared to the Voynich. It still needs some tweaking but it's in the ballpark. Is yours?

Code:
Top 20 output bigrams (100 pages):

  ho: 2062

  ch: 1689

  sh: 1551

  ai: 1319

  ol: 1241

  he: 1196

  ey: 1037

  in: 989

  da: 884

  ii: 851

  ar: 761

  or: 626

  th: 621

  ct: 611

  od: 579

  hy: 568

  ha: 513

  cp: 451

  ph: 421

  ot: 363

Top 20 Voynich bigrams (100 pages):

  ch: 3082

  ho: 1920

  ai: 1433

  in: 1374

  ii: 1281

  ol: 1239

  he: 1116

  da: 1116

  sh: 1033

  dy: 1024

  hy: 948

  ok: 930

  or: 922

  qo: 764

  ot: 749

  ar: 562

  ey: 554

  od: 548

  ee: 541

  eo: 467

Top 20 output trigrams (100 pages):

  sho: 850

  aii: 736

  hey: 734

  cho: 729

  hol: 706

  iin: 638

  dai: 537

  cth: 529

  cph: 388

  she: 362

  che: 361

  hod: 307

  ckh: 259

  oda: 252

  ain: 249

  hoy: 243

  dar: 190

  cfh: 188

  chy: 187

  har: 184

Top 20 Voynich trigrams (100 pages):

  cho: 1236

  iin: 1181

  aii: 1139

  dai: 728

  che: 718

  hol: 596

  chy: 492

  sho: 467

  hor: 467

  cth: 435

  tch: 420

  kch: 392

  edy: 366

  qok: 359

  she: 295

  hey: 291

  ody: 253

  qot: 251

  otc: 244

  heo: 241
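A rough sketch of how lists like these can be produced. It assumes n-grams are counted over EVA characters inside each word and never across word boundaries; the post does not say which convention was used, so that is a guess, and `char_ngrams` is a hypothetical helper name.

```python
from collections import Counter

def char_ngrams(words, n):
    """Count character n-grams inside each word (no cross-word n-grams)."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts

# Toy sample only.
words = ["daiin", "dain", "chol", "chor"]
for gram, count in char_ngrams(words, 2).most_common(5):
    print(f"  {gram}: {count}")
```

Comparing two texts is then a matter of running this on each and lining up the `most_common(20)` lists, ideally after normalising by the total n-gram count so that different text lengths do not skew the comparison.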

Look at my word length compared to the Voynich then compare it to yours.

[attachment=15819]
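A minimal sketch of the word-length comparison behind a chart like this one. It assumes length is measured in EVA characters (so "ch" counts as two), which may differ from how the attached chart was built; the helper names are hypothetical.

```python
from collections import Counter

def length_distribution(words):
    """Map word length -> share of tokens with that length."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}

def mean_length(words):
    """Average token length in characters."""
    return sum(len(w) for w in words) / len(words)

# Toy sample only.
words = ["ol", "daiin", "chol", "qokeey", "s"]
for length, share in length_distribution(words).items():
    print(length, f"{share:.0%}")
print("mean length:", mean_length(words))
```

Printing both distributions side by side, length for length, makes a missing left tail (no words of length 1 to 3) obvious at a glance.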

Look at my Zipf curve compared to the Voynich. Does yours compare? I'm betting that until you fix the word length, it'll be a "no".

[attachment=15820]
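One common way to put a number on a Zipf curve is the slope of a straight-line fit to log(frequency) against log(rank). The sketch below does an ordinary least-squares fit over all word types; this is an assumption about method, not a description of how the attached chart was made, and other fitting choices (for example ignoring the hapax tail) give different slopes.

```python
import math
from collections import Counter

def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy corpus with frequencies 120, 60, 40, 30, 24 (proportional to 1/rank).
toy = []
for rank, word in enumerate(["daiin", "chol", "chor", "okaiin", "qoky"], start=1):
    toy.extend([word] * (120 // rank))
print(round(zipf_slope(toy), 3))
```

A slope near -1 is the classic Zipf shape; a more negative slope means the most common words dominate more strongly.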

These are the very basic numbers a generator needs to be "plausible" with before you can even call it a generator. AND, to be a workable generator it has to be mostly blind about it. I do use a seed page. After that, it generates the rest on its own just using my ledger. And all the numbers you see mine generate are emergent. I didn't create any code to force a Zipf curve. I do have code that manages word length, and that's something a scribe could do with their eyeballs.

Here's the big thing you have to think about: if you sat down and tried to create something that looked like the Voynich and YOU were in the 15th century, what "tools" would you need other than a quill, some ink and maybe some pounce to erase things with? Any generator we come up with today needs to be doable by a 15th century human. Which is why the Naibbe cipher looks great and would technically be human doable, but would require dice and cards and lookup tables... extremely cumbersome. Those are the kinds of things you need to consider.

Creating Voynich-like text is easy.

Creating a method that explains how a 15th century scribe could do it, that's the hard part.

Edit: If you want to really see how you're doing, put up some real Voynich comparison charts, word length for word length.

Ran some new tests on my generated text vs the Voynich.

This test measures novelty, predictability, word beginnings, word middles, word endings, and how strongly words cluster into families. It's from some code I developed months ago. It basically trains on half the text then tests on the other half to see how they compare.

In this test the generated text is mostly in the ballpark, with the exception of bigram reject. That measure takes the bigrams from one half of the text and checks them against the bigrams in the other half; the reject rate is the share that don't match. For my generator it's very low, which means it's producing too many close relatives. It stays inside the learned families extremely well, but the real herbal text appears to generate more new combinations while still obeying its structural constraints.
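Here is one possible reading of the bigram reject measure, as a sketch. It assumes character bigrams inside words, and it counts the share of bigram occurrences in the test half that never appear in the training half. The actual test code is not shown in the thread, so the direction of the comparison and the token-based counting are both assumptions.

```python
def word_bigrams(word):
    """All character bigrams inside one word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def bigram_reject_rate(train_words, test_words):
    """Percent of test-half bigram occurrences unseen in the training half."""
    seen = {bg for w in train_words for bg in word_bigrams(w)}
    test_bigrams = [bg for w in test_words for bg in word_bigrams(w)]
    if not test_bigrams:
        return 0.0
    rejected = sum(1 for bg in test_bigrams if bg not in seen)
    return 100.0 * rejected / len(test_bigrams)

# Toy halves only; a real run would split the token list down the middle.
train = ["daiin", "chol"]
test = ["dain", "chor"]
print(f"{bigram_reject_rate(train, test):.2f}%")
```

A very low value means the second half almost never uses a bigram the first half did not already contain, which is what "staying inside the learned families" looks like at the character level.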

BPC is bits per character which measures entropy or how predictable text is. BPC2 uses bigrams, BPC3 trigrams. ΔBPC is the difference between the two.
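A simplified sketch of these quantities. It computes the plug-in conditional entropy of a character given the previous one (order 2) or previous two (order 3) characters, measured on a single text, with "^" as start padding and "$" as an end-of-word symbol. Those padding symbols and the in-sample estimate are assumptions; the test described above trains on one half and scores the other, which this sketch does not do.

```python
import math
from collections import Counter

def conditional_bpc(words, order):
    """Bits per character given the previous (order - 1) characters."""
    k = order - 1
    context_counts = Counter()
    pair_counts = Counter()
    for w in words:
        padded = "^" * k + w + "$"  # "^" and "$" are assumed boundary marks
        for i in range(k, len(padded)):
            context = padded[i - k:i]
            context_counts[context] += 1
            pair_counts[(context, padded[i])] += 1
    total = sum(pair_counts.values())
    bits = 0.0
    for (context, _char), count in pair_counts.items():
        # Weight each event by its frequency, score by P(char | context).
        bits -= (count / total) * math.log2(count / context_counts[context])
    return bits

# Toy sample only.
words = ["daiin", "dain", "chol", "chor", "okaiin", "qokeey"]
bpc2 = conditional_bpc(words, 2)
bpc3 = conditional_bpc(words, 3)
print(f"BPC2={bpc2:.4f} BPC3={bpc3:.4f} delta={bpc2 - bpc3:.4f}")
```

The table's ΔBPC values follow the same subtraction: 2.4472 - 2.1978 = 0.2494 for the herbal text and 2.8717 - 2.5697 = 0.3020 for the generated text.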

The ED1 tests look at how many words in the training set are a copy of, or within edit distance 1 of, a word in the test set. Token asks how many word occurrences have a close relative in the test set, and type asks how many unique words do. Again, it's off by a bit, which is probably related to the bigram reject difference: 97% of the unique training words are ED 1 or 0 to a test word.
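A sketch of the ED≤1 token and type rates under the reading given above: for each word in one set, check whether the other set contains the same word or a word one insertion, deletion or substitution away. The function names are hypothetical, and the brute-force scan over the reference vocabulary is only meant for samples of a few thousand types.

```python
def within_one_edit(a, b):
    """True if a and b are equal or differ by one insert/delete/substitute."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra character in b at position i

def ed1_hit_rates(query_words, reference_words):
    """Return (token rate, type rate) of query words with a close relative."""
    reference = set(reference_words)
    hit = {w: w in reference or any(within_one_edit(w, r) for r in reference)
           for w in set(query_words)}
    token_rate = sum(hit[w] for w in query_words) / len(query_words)
    type_rate = sum(hit.values()) / len(hit)
    return token_rate, type_rate

# Toy sample only.
query = ["daiin", "daiin", "chol", "qokeedy"]
reference = ["dain", "chor", "okal"]
print(ed1_hit_rates(query, reference))
```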

And I think I know why these are issues. I tried to keep everything a copy or ED1 from its source, and it almost never generates a random new word. It appears there's a bit more novelty in the Voynich than I'm allowing for.

So again, not perfect. In the neighborhood? Mostly, but needs work.

Metric	Herbal Voynich	RK42	Difference
Bigram reject %	0.5655	0.1044	much lower
BPC2	2.4472	2.8717	higher
BPC3	2.1978	2.5697	higher
ΔBPC	0.2494	0.3020	higher
Initial BPC	3.0692	3.1479	slightly higher
Medial BPC	3.3306	3.4473	slightly higher
Final BPC	2.6014	2.9747	higher
ED≤1 Token	95.7988%	98.9295%	higher
ED≤1 Type	88.0416%	96.9503%	much higher

(28-05-2026, 10:22 PM)Dunsel Wrote: And you see what I was saying about word length. You have no words of length 1, 2 or 3, so your distribution is shifted way to the right. Voynich word length is centered on 5, not 6.

Problem solved—not perfect, but good enough.

[attachment=15831]

============================================================
PAGE 49
============================================================

chedyol ckh otecheo otdy otain qok sheol choeeyky
ytchy ykhol okor cheolky oteydy okar chockhy ytol
oteydy ch oldain sheol daar sheol cheesey char
l olal ykol cholody okodar choolchdy qokshey shain
qokchy otalshy oka qoteshy ok daseky ira otair
oteydy aiirody qooraiin cheky chky oleeol ote shey
qotor aiindy oleey chody qotey kal ytal oldyol
ok qo otdy choty oldyol otockey dair otardar
sheekchey qokchedy sheey choindy qocphy qotey ykchy chteey
oteor rot ykeey qokychdy oteal che chchy qokeeydy
qo ch shcthy cheedalaiin aiolky qokaiiin otal aithar
ykchy okear dair oteor sheey qo aiiinol

============================================================
PAGE 50
============================================================

sheol oka ykeydy oka ykchy chear otey aiir
lol yld choolky aiir chear osh oka choldar
charain qochy cheor otdady ol ok shoky oleydy
qokeey ytal ol ytey shdy shkshy chty daiin
chty okoraiin otor chodaiin ollchy olardy darody yteey
ykar cheor ykeey daiin okedy okar okaiin qofaiin
shkear otosey daiin sheey oleydy okeey qokeey qoeeor
ol checphey qochey chindy ch shdchy cheokeor daiin
chey daiir ykhol oltaiin okol ch shedy shechy
chor che che chedy chtain shedy chekydy daiin
qot oteeo qoin qolchy okchy cheol qot oksheo
okshor shol ykdy qotarain okaiin qolshey cheol

(29-05-2026, 05:16 AM)bi3mw Wrote:
(28-05-2026, 10:22 PM)Dunsel Wrote: And you see what I was saying about word length. You have no words of length 1, 2 or 3, so your distribution is shifted way to the right. Voynich word length is centered on 5, not 6.

Problem solved—not perfect, but good enough.

Now you're looking much more Voynich and you have at least one metric in the ballpark. I'm guessing adding those short words reduced the share of long words and brought that chart back closer to reality. You're now matching the word length distribution of a natural language. The Voynich tends to have slightly shorter words than a natural language. And that bell-shaped distribution should also start to bring your Zipf curve into line.

(29-05-2026, 10:52 AM)Dunsel Wrote: And that bell-shaped distribution should also start to bring your Zipf curve into line.

[attachment=15835]

In conclusion, I would like to clarify once again that my model is based primarily (87.4%) on a table containing 485 entries (syllable combinations). The ledger words only fill in the gaps not covered by short_words and fixed words. They therefore account for only a small percentage, but they are necessary to ensure that all words are generated. Otherwise, the model as a whole would not function properly.

Proportions of word generation:
-----------------------------------------

Short words: 1634 (17.2%)
Common words: 4474 (47.1%)
Common combinations: 2195 (23.1%)
Ledger words: 1197 (12.6%)
Total: 9500
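As a quick arithmetic check of the breakdown above, the sketch below recomputes each share from the posted counts and sums the three non-ledger categories, which is where the 87.4% figure comes from. Only the counts are taken from the post; the variable names are made up.

```python
# Counts copied from the "Proportions of word generation" listing.
counts = {
    "Short words": 1634,
    "Common words": 4474,
    "Common combinations": 2195,
    "Ledger words": 1197,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
# Everything that is not a ledger word comes from the predefined material.
predefined_share = round(100 * (total - counts["Ledger words"]) / total, 1)

for name, share in shares.items():
    print(f"{name}: {counts[name]} ({share}%)")
print(f"Total: {total}")
print(f"Predefined share: {predefined_share}%")
```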

The high proportion of predefined words suggests that the author may indeed have worked with a table book or combinatorics. While this would be labour-intensive, it would still be feasible for a 15th-century author.

Of course, that doesn't prove the manuscript is meaningless. A table book could also encode a real language or a cipher system.

[attachment=15847]

(26-05-2026, 04:11 PM)Dunsel Wrote:
(26-05-2026, 03:07 PM)Torsten Wrote: I prefer to analyze the text the scribe actually wrote rather than speculate about the texts he didn't write, since that number is endless.

I will politely disagree with this statement. A model does not just have to explain what the scribe wrote. It also has to explain why the text did not fall apart into garbage. A lot of objections in these discussions are really variations of “why don’t we see this?” or “why didn’t the scribe do that?” They are questions about constraint.

In my own experiments, most copy/mutate systems fail pretty quickly unless the constraints are tight enough. They drift, bloat, repeat badly, or start producing obvious nonsense. And, given enough generations, even my generator has issues. That failure is informative. It tells us the Voynich text is not just “anything goes” mutation. So I think the unwritten text matters too. If a proposed method could easily generate endless bad Voynichese, then we need to explain why the manuscript consistently avoided doing exactly that.

If we don't explore the possibilities of what wasn't created then we'll never have acceptance for what is created.

You raise a fair point — but it's a different question from the one I answered. Zandbergen asked why some unspecified modifications are absent. I said: because the scribe took one path and not another, and we can only analyze the path taken. That stands.

Your question is different: why doesn't the text fall apart into garbage? That's a question about constraint, and it has a straightforward answer.

The copying process inherits structure from the source. Each new word is a modification of an existing word that already has valid structure. A modification of a valid word almost always produces another valid word — because the stroke-level constraints are preserved by the modifications taken. The structure doesn't need to be maintained by external rules — it's carried forward by the copying process itself.

And the scribe has eyes. If a modification produces something that looks wrong, he sees it and doesn't write it. Your own "don't look stupid guardrails" describe this exactly. The scribe isn't following a ledger. He's looking at what he produced and judging whether it looks acceptable. Human aesthetic judgment is the constraint.
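To make the copy-and-modify idea concrete, here is a deliberately crude sketch: each new word is either a copy or a one-edit modification of a word already written, and a candidate that fails an acceptance check is dropped in favour of a plain copy. Everything here is an assumption for illustration: the alphabet, the copy probability, the random choice of source word, and the `is_acceptable` callback standing in for the scribe's visual judgment. It is not either poster's generator.

```python
import random

ALPHABET = "acdehiklnoqrsty"  # hypothetical subset of EVA letters

def mutate_once(word, rng):
    """Apply one random insert, delete or substitute to a word."""
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    # Substitution (also used when a one-letter word cannot be shortened).
    # It may pick the same letter again, which simply yields a copy.
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]

def generate(seed_words, n_new, copy_prob=0.4,
             is_acceptable=lambda w: True, seed=0):
    """Grow a text where every word derives from an earlier word."""
    rng = random.Random(seed)
    pool = list(seed_words)
    for _ in range(n_new):
        source = rng.choice(pool)
        candidate = source if rng.random() < copy_prob else mutate_once(source, rng)
        if not is_acceptable(candidate):
            candidate = source  # rejected by the "eye": fall back to a copy
        pool.append(candidate)  # accepted words become sources themselves
    return pool[len(seed_words):]

print(" ".join(generate(["daiin", "chol", "okaiin"], 12,
                        is_acceptable=lambda w: len(w) <= 7)))
```

With a permissive `is_acceptable` the output drifts quickly, which is the point under discussion: how much of the constraint has to live in that one acceptance step.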

His visual judgment follows the stroke-level structure of the writing system. Schwerdtfeger described in 2008 four design rules: (1) line-glyphs can follow line-glyphs or 'a'; (2) curve-glyphs and 'a' can follow curve-glyphs; (3) the 'l'-glyph can be used as a curve-glyph or as a line-glyph; and (4) gallows glyphs count as curve glyphs (see Timm & Schinner 2020, p. 10). These aren't rules the scribe memorized — they're descriptions of what visual similarity produces. A scribe modifying strokes naturally stays within these constraints because crossing them would look wrong to his eye.

This is also why your generator struggles — and why any generator will. For the human mind it is difficult not to repeat itself. The variation comes from imperfect repetition — not from randomness. The consistency comes from habitual patterns — not from explicit rules. A computer needs both dice (for variation) and rules (for consistency) as separate mechanisms. A human produces both from one source — his own cognitive habits. That's why modeling a human scribe in Python is, as you put it, "a nightmare." You can't model the human mind like a computer — because the human produces variation and structure from the same process, while the computer needs separate mechanisms for each.

The manuscript doesn't fall apart because every word was seen and accepted by a human before it became part of the source pool for the next word. The constraints are the pattern recognition abilities and the habits of the scribe himself.

(30-05-2026, 07:30 PM)Torsten Wrote: The manuscript doesn't fall apart because every word was seen and accepted by a human before it became part of the source pool for the next word. The constraints are the pattern recognition abilities and the habits of the scribe himself.

I couldn't agree with you more. And again, that brings human psychology into the equation. Which patterns? And then, how do you model those patterns? At what point do the weights and rules become more than a 15th century scribe can do? That's where I am now.

For example:

In the other post on <ed> I mentioned I did some machine learning, trying to find syllables. I refined that down to where only 3 words in Takahashi and 14 in Zandbergen/Landini can't be mapped as having good, learnable n-grams in Scribe 1 content. It came up with a count of 150 main n-grams and 50 more that act as connectors. I added that learned chunk data into my generator and it produced the code blocks I pasted below.

The ledger model still attempts a mutation and validates it. The new word is then passed off to the chunk model to see if it contains valid n-gram chunks with valid neighbors. It looks more Voynich-like than previous runs and still matches the statistics pretty well.
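The second validation step can be pictured as a segmentation check: a candidate word passes only if it can be cut entirely into chunks from the learned inventory. The sketch below shows that idea with a standard dynamic-programming word-break test. The chunk set is a made-up placeholder, not the 150 + 50 n-grams mentioned above, and the "valid neighbors" check is left out because the thread does not describe it.

```python
def can_segment(word, chunks):
    """True if word can be split end to end into chunks from the inventory."""
    # reachable[i] is True when word[:i] can be built from chunks.
    reachable = [False] * (len(word) + 1)
    reachable[0] = True
    for end in range(1, len(word) + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in chunks:
                reachable[end] = True
                break
    return reachable[-1]

# Hypothetical chunk inventory, for illustration only.
CHUNKS = {"ch", "sh", "ol", "or", "d", "aiin", "qo", "k", "ee", "y", "o"}

for candidate in ["chol", "daiin", "qokeey", "chxl"]:
    print(candidate, can_segment(candidate, CHUNKS))
```

In a pipeline like the one described, a mutation that fails this check would be discarded and another attempt made.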

But is this still human doable? I think so, but it may require more than just the scribe's eyes. There may have been an audible component, where the scribes were familiar enough with the output tokens that they pronounced them. I know I pronounce Voynich words to see if they sound right. I pronounce words as I write them. Might a Voynich scribe do the same?

Code:
<f1v>

<f1v.P1.1;K>        fchol.shaiin.odad.fchol.odd.ose.cfhealy.chol

<f1v.P1.2;K>        chotey.yoo.or.syaiir.cther.ckhea.chtain

<f1v.P1.3;K>        pchol.ycheeo.chol.shodd.pose.odad

<f1v.P1.4;K>        pydaey.okain.yor.pchol.oydar.pchol.kshiy.yorr

<f1v.P1.5;K>        fchol.os.fchol.s.daiin

<f1v.P1.6;K>        osdar.okod.qos.daiim.pshol.ycheeo

<f1v.P1.7;K>        oar.okeol.oateos.opeos.daiin.fchol.chool.mor

<f1v.P1.8;K>        os.chol.oos.pchol.kaiin

<f1v.P1.9;K>        cfhoain.fcho.chal.rokain.cchol.fchol.cthar.s

<f1v.P1.10;K>        kor.chol.chols.ydain.s.lchol.sshey

<f1v.P1.11;K>        chool.diiin.fchol.kor.s.fchol.yocho

<f25r>

<f25r.P1.1;K>        ros.dloctha.sory.dain.eey.dloctha.koshey.shol.daraii

<f25r.P1.2;K>        kshy.dlocta.ycho.kshy.dain.ikan

<f25r.P1.3;K>        cthooldar.chodaiils.cthy.okaiirg.chodaiil.ctholdar

<f25r.P1.4;K>        chotey.odar.cfhol.csol.shory.dloctha.ecthoary

<f25r.P1.5;K>        kdail.dlocta.dlocfha.cthey.dair.dain.daid

<f25r.P1.6;K>        cphoy.dlocta.dloctha.tchol.dloctha.cthoary

<f25r.P2.7;K>        okar.chal.dair.cphoy.cfal.dlocta.okaiir.eos

<f25r.P2.8;K>        oycheey.cphodaiil.daire.chhealy.chealy

<f25r.P2.9;K>        oar.tchol.dlocta.tchol.dlocfha.dlocpha.dlocka.dlocpha.cphealy

<f25r.P2.10;K>      kaiirg.eckhoary.fchol.tchol.dlocphe.dair

<f25r.P2.11;K>      kail.dlocka.os.pchil.otchol.dlocka.ckdey.pchol.dsory

<f25r.P2.12;K>      dlocpha.pchol.cthol.oaky.chodaiila.ctoil

<f50v>

<f50v.P1.1;K>        dan.cphoy.chor.dshkoldy.dan

<f50v.P1.2;K>        kyey.teody.ydain.cphar.dkchar.y

<f50v.P1.3;K>        cfhol.cphesaiin.chor.dshpoldy.sckhey

<f50v.P1.4;K>        cshyds.kodshey.cphoy.cthar.chor.kon.choydr.dain

<f50v.P1.5;K>        oas.chol.cphar.shok.shon.otol

<f50v.P1.6;K>        shos.rchol.oiin.scphey.ooiin.sho.cphar.oydar

<f50v.P2.7;K>        y.ro.oeykoal.rchol.kchol.fchol.chodor

<f50v.P2.8;K>        rol.olys.ytais.eolys.dshpoldy.cthey.y.sho

<f50v.P2.9;K>        okol.fchol.dshkoldy.y.dshpoldy

<f50v.P2.10;K>      cfhyl.chor.ytain.stchey.chsy

<f50v.P2.11;K>      eoe.ckhar.rchol.dshpolday.ddchar.oteos

<f50v.P2.12;K>      cphey.ckhey.oekoal.oteol.chey.stchey.cfhol

<f50v.P2.13;K>      y.oas.chey.chol.alg

<f50v.P2.14;K>      pydeeyl.rol

Page-count rule: output pages = Voynich pages = 100

Voynich pages available: 225

Voynich pages compared: 100

Generated pages: 100

Output tokens: 8234

Output vocabulary size: 2463

Output hapax count: 1557

Voynich comparison tokens: 8519

Voynich vocabulary size: 2436

Voynich hapax count: 1642

Ledger invalid: 0

Operation counts:

  external_copy: 1065

  external_copy_fallback: 14

  external_ed1: 1961

  external_short_copy: 13

  initial_gallows_construct: 88

  initial_gallows_seed: 10

  local_copy: 951

  local_copy_fallback: 164

  local_ed1: 1954

  novel_ed1: 1830

  seed: 184

Gallows action counts:

  attested_ed1: 2903

  gallows_delete: 54

  gallows_insert: 63

  gallows_substitute_in: 35

  gallows_substitute_out: 121

  gallows_swap: 162

  initial_construct: 88

  initial_seed: 10

Top 20 output bigrams (100 pages):

  ho: 2214

  ch: 1934

  he: 1295

  sh: 1274

  ol: 1254

  ai: 1116

  ey: 997

  in: 887

  da: 815

  th: 778

  ct: 749

  or: 739

  ii: 720

  ar: 685

  ha: 510

  od: 499

  hy: 468

  cp: 382

  eo: 373

  ee: 368

Top 20 Voynich bigrams (100 pages):

  ch: 3082

  ho: 1920

  ai: 1433

  in: 1374

  ii: 1281

  ol: 1239

  he: 1116

  da: 1116

  sh: 1033

  dy: 1024

  hy: 948

  ok: 930

  or: 922

  qo: 764

  ot: 749

  ar: 562

  ey: 554

  od: 548

  ee: 541

  eo: 467

Top 20 output trigrams (100 pages):

  cho: 969

  hol: 836

  sho: 702

  cth: 682

  hey: 677

  aii: 571

  dai: 520

  iin: 505

  che: 497

  cph: 352

  ain: 333

  she: 332

  tho: 275

  hod: 262

  oda: 226

  kch: 221

  ckh: 218

  eey: 204

  cfh: 186

  dar: 181

Top 20 Voynich trigrams (100 pages):

  cho: 1236

  iin: 1181

  aii: 1139

  dai: 728

  che: 718

  hol: 596

  chy: 492

  sho: 467

  hor: 467

  cth: 435

  tch: 420

  kch: 392

  edy: 366

  qok: 359

  she: 295

  hey: 291

  ody: 253

  qot: 251

  otc: 244

  heo: 241

After generating 1000 pages, it starts to look less Voynich-like but still has familiar patterns.

Code:
<f500v>

<f500v.P1.1;K>      shokcheey.ods.lochy.ol.dar.cfhaiin.shkaiir.moaokchy

<f500v.P1.2;K>      opsho.cfhol.kooishey.ol.cfhaiin.ol

<f500v.P1.3;K>      oyr.maokchy.kachys.ool.chol.oal.chol.shoasho

<f500v.P2.4;K>      fool.oal.keishey.oalg.cthol.lochy.y.ol

<f500v.P2.5;K>      maokhy.cfhol.dain.ool.oal.cpod

<f500v.P2.6;K>      oaokchy.ol.ool.ol.oaikdon

<f500v.P2.7;K>      cphey.y.shear.ord.ckhaiin.shtasho.cthey.ckhhaeiin.oaidoon

<f500v.P2.8;K>      ool.maokhg.okchey.okeey.sor.cpod.lol

<f500v.P2.9;K>      old.deydd.cthol.she.dain

<f500v.P2.10;K>      ol.shey.dair.kol.ols.ostairin.tol.roe.ol

<f500v.P2.11;K>      sheo.old.cfhol.shory.ol.cod.shyoolso.ror.cfhoaiin

<f500v.P2.12;K>      maokhy.ois.shoolsho.lshoolsho.oil.toche.skaiir

And here's page 10,000. I assumed if the model was going to collapse, it would have done so by this point. I can't provide any stats to compare to Voynich other than to say it's a bit heavy on longer words which could be adjusted.

Code:
<f5000v>

<f5000v.P1.1;K>      kore.dain.shey.chy.otaiin.qosoier.kore

<f5000v.P1.2;K>      raiisoey.cthal.cthodaiils.pdeiimy.airg.qosoilr.ylsoier

<f5000v.P1.3;K>      dain.chal.y.chal.oor.chodain.cthodaiils.odaiin

<f5000v.P1.4;K>      dpdaeim.cthy.ore.qosoier.daicthy.y.kshy

<f5000v.P1.5;K>      qosoier.odar.dain.kshy.olsdeoey.eefy

<f5000v.P1.6;K>      chy.dain.oor.qoosoieor.dair.lshytoo.cthodaiils.qore

<f5000v.P1.7;K>      dain.airg.cphey.kodaiin.dair.alshyoe

<f5000v.P2.8;K>      fairg.s.sshyood.qosoilr.chok

<f5000v.P2.9;K>      sory.olsoior.olsooer.dair.alshyey.alshiey

<f5000v.P2.10;K>    fair.cthodaiils.fairg.olsoier.oor.sshysod.dair.pachys.s

<f5000v.P2.11;K>    oir.alshyoe.alshyee.qosoier.shey.chok.olsoar.ckhealy

(30-05-2026, 04:28 PM)bi3mw Wrote: Of course, that doesn't prove the manuscript is meaningless. A table book could also encode a real language or a cipher system.

Ran some of my tests on your output. Hope you don't mind a critical analysis.

Test	His Output	Takahashi Slice	Comment
Tokens	9,447	9,447	Matched sample size
Types	1,757	2,739	Far fewer unique words
Hapax	683	1,856	Far fewer one-off words
Type/Token Ratio	18.6%	29.0%	More repetitive vocabulary
Hapax / Type Share	38.9%	67.8%	Major novelty deficit
Mean Word Length	5.04	4.87	Very close
Zipf Slope	-1.021	-0.841	Too steep. Common words dominate too much
Length Distribution JSD	0.0047	-	Excellent match
ED≤1 Token Hit	89.55%	-	Lower than expected
ED≤1 Type Hit	77.52%	-	Weak word-family structure
Bigram Reject Rate	2.94%	-	Reasonably good
Trigram Overlap	79.84%	-	Moderate match
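For the length-distribution row, a Jensen-Shannon divergence between the two word-length histograms can be sketched as below. It uses log base 2, so 0 means identical distributions and 1 means no overlap at all; the base used for the 0.0047 figure in the table is not stated, so treat that as an assumption. The helper names are hypothetical.

```python
import math
from collections import Counter

def length_probs(words):
    """Map word length -> probability, from a list of words."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(dist):
        return sum(v * math.log2(v / mix[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)

# Toy samples only.
generated = ["daiin", "chol", "okaiin", "qokeey", "ol"]
reference = ["daiin", "chor", "s", "otal", "shedy"]
print(round(jsd(length_probs(generated), length_probs(reference)), 4))
```

The ratio rows in the table can be checked directly from its own counts: 1,757/9,447 is 18.6% and 683/1,757 is 38.9% for the generated text, against 2,739/9,447 (29.0%) and 1,856/2,739 (67.8%) for the Takahashi slice.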

My eyeballs say your text looks good, almost too good perhaps. The Voynich has its own license on weird. Which I suspect is why your hapax and novelty are low.
