The Voynich Ninja

Pages: 1 2 3 4 5 6 7 8 9

I prefer giving people the benefit of the doubt, so to me there is still a possibility that this is a carefully designed and thorough attempt at identifying some structure in the MS, just with a few technical problems and somewhat badly explained. But for this to be true, for the attempt to actually turn out thorough and well designed, the paper would have to greatly outperform the presentation given so far. So far I haven't seen anything that would make me curious enough to attempt replicating this beyond the first couple of steps, and normally I'm very curious about new Voynich theories and ideas. I think I was the first in this thread to actually run parts of the provided Python code and report on it, including running it with a different seed and comparing the results of the first model, and I immediately posted a few questions. I haven't got a single satisfactory answer from the topic starter to any of my questions, just some generic or contradictory replies which made no sense to me. Maybe it's a communication issue. I'll wait for the paper, but my hopes are not high.

(27-04-2025, 03:18 PM)nablator Wrote: Two small issues in preprocess_eva_seed.py:

1. It extracts at token_index = 0 the folio ID ($F) from the page header line:

f1r,0,a,0
...
f1v,0,a,0
...
f2r,0,b,228
...
f2v,0,b,228

The test line.strip().startswith("<!") doesn't work because line includes the line header.

2. It keeps comments between "<!" and ">" that should be skipped. For example:
f1r,13,doodle,13

Note: the removal of comments would also fix the page header issue because the page header is a comment. Something like this would work: line = re.sub(r"<!.*?>", "", line)
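
A minimal sketch of the comment-stripping idea from the quoted post, using Python's re.sub. The two sample lines are made up for illustration and only imitate the layout described above; strip_comments is a hypothetical helper name, not a function from preprocess_eva_seed.py.

```python
import re

# Non-greedy match of one inline comment: "<!" up to the nearest ">".
COMMENT_RE = re.compile(r"<!.*?>")

def strip_comments(line: str) -> str:
    """Remove every <!...> comment from one transliteration line."""
    return COMMENT_RE.sub("", line)

# Made-up sample lines (assumptions, not copied from the real file).
header = "<f1r>      <! $F=a $Q=A>"
text = "<f1r.13> daiin<!doodle>.chol"

# The startswith test never fires: the line begins with "<f1r", not "<!".
print(header.strip().startswith("<!"))
print(repr(strip_comments(header)))
print(repr(strip_comments(text)))
```

After stripping, a page header line carries no tokens at all, so a tokenizer that skips empty content would no longer emit the folio ID as token 0.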

For checking the stability of results you could try the "reference" transliteration (no comments, no extended EVA). The Takeshi Takahashi transliteration is outdated (1999).

Yes! I temporarily closed GitHub to make some updates. I have already fixed the preprocessing. I will post it here shortly.

Approaching the question from a new perspective has been highly enriching. It is always valuable to start from a dialectical process as the engine of scientific inquiry.

Leaving behind the hypothesis of lunar phases, which might have involved a certain degree of arbitrariness, as Radio said, we have completely reformulated the analysis toward a model of thematic structure based on clustering pages, aiming to determine whether macroblocks exist. Also, we fixed the pre-processing code to clean EVA as Nab suggested.

By applying a systematic pipeline that combines thematic modeling (LDA, NMF), hierarchical clustering, and strict statistical validations (Random Forest, Permutation Test, Bootstrap, Hopkins Statistic), along with comparative controls using authentic and random texts, the results are pretty interesting.

Without forcing any parameters, the pages of the Voynich Manuscript tend to cluster spontaneously into three large macroblocks, which notably coincide with the traditional divisions that have been proposed for a long time (Botany, Zodiac/Astronomy, Pharmacology, and Recipes). Each section has different repeated words.

The macroblocks are derived automatically from the page topic distributions (LDA) combined with hierarchical clustering, and are not manually imposed.

These groupings do not appear in the control texts (a point I have taken very much into account following earlier discussions) or in synthetic random controls, and they disappear when the text is shuffled. This may support the hypothesis that there is a real and coherent internal structure within the Voynich Manuscript, likely of a thematic nature.

I maintain that the existence of blocks that vanish when permutation or ablation tests are applied indicates a structured linguistic intention.
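
To make the shuffling argument concrete, here is a minimal sketch of a permutation test on page-level cluster labels. It is not the pipeline described in this thread: the statistic (how often neighbouring pages share a label), the toy label sequence, and the names adjacency_score and permutation_p_value are all assumptions chosen for illustration.

```python
import random

def adjacency_score(labels):
    """Fraction of consecutive pages that share a cluster label."""
    same = sum(a == b for a, b in zip(labels, labels[1:]))
    return same / (len(labels) - 1)

def permutation_p_value(labels, n_perm=2000, seed=0):
    """Estimate how often a random page order scores as high as the real one."""
    rng = random.Random(seed)
    observed = adjacency_score(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacency_score(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy input: 60 pages falling into three contiguous blocks (made-up data).
labels = [0] * 20 + [1] * 20 + [2] * 20
p = permutation_p_value(labels)
print(adjacency_score(labels), p)
```

A small p-value here only says that the labels are more contiguous than a random ordering would give. It does not by itself say why they are contiguous.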

As with any analytical process, errors are always possible, but that is the beauty of dialectical progress: advancing together as a community to explore new directions.

I will share the complete pipeline shortly.

Hi Urtx13,

I only skimmed this thread, but maybe you can address this question I have based on the code in the GitHub repo:

Is this summary correct:
You seem to train a TF classifier model on the complete Voynich, all folios, with some value added to each folio during training.
Then you test the model and all the folios come back correctly identified by the classifier model.

This is not how you normally test classifiers. Normally, when you have limited example data, you remove part of the data from the training set and validate that your classifier, trained on the remaining data, classifies the removed parts correctly. When you present a classifier with data it was trained on, it will give you the classification it was trained to produce. Maybe this process is hidden in one of the libraries you call, but I don't see it happen inside your code itself.

Can you elaborate on this?
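
The point about held-out data can be illustrated with a small standard-library sketch. It uses a 1-nearest-neighbour classifier as a stand-in (an assumption, not the model in the repository) on synthetic vectors whose labels are assigned at random, so there is nothing real to learn. All names and sizes below are hypothetical.

```python
import random

def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def accuracy(train_X, train_y, eval_X, eval_y):
    """Share of evaluation points whose predicted label is correct."""
    hits = sum(predict_1nn(train_X, train_y, x) == y
               for x, y in zip(eval_X, eval_y))
    return hits / len(eval_X)

rng = random.Random(0)
# Synthetic "pages": random feature vectors with randomly assigned labels.
X = [[rng.random() for _ in range(5)] for _ in range(200)]
y = [rng.randrange(4) for _ in range(200)]

# Hold out the last 50 items; the data are already in random order.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

train_acc = accuracy(X_train, y_train, X_train, y_train)  # evaluated on seen data
test_acc = accuracy(X_train, y_train, X_test, y_test)     # evaluated on unseen data
print(train_acc, test_acc)
```

Scoring on the training items gives a perfect result even though the labels are noise, because each point is its own nearest neighbour. Only the held-out score reflects whether anything was learned.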

(29-04-2025, 02:32 PM)davidd Wrote: Hi Urtx13,

I only skimmed this thread, but maybe you can address this question I have based on the code in the GitHub repo:

Is this summary correct:
You seem to train a TF classifier model on the complete Voynich, all folios, with some value added to each folio during training.
Then you test the model and all the folios come back correctly identified by the classifier model.

This is not how you normally test classifiers. Normally, when you have limited example data, you remove part of the data from the training set and validate that your classifier, trained on the remaining data, classifies the removed parts correctly. When you present a classifier with data it was trained on, it will give you the classification it was trained to produce. Maybe this process is hidden in one of the libraries you call, but I don't see it happen inside your code itself.

Can you elaborate on this?

Yes, you are completely correct. That is why I am making some changes to the code. Since the aim may differ a little, I opened another topic. Please check it!

I'm a maths ignoramus so finding it hard to follow the details of this theory, but I have a question - if the folios are not in their original order but have been reshuffled over the years, does the cyclical pattern still hold up?

(29-04-2025, 04:40 PM)Pepper Wrote: I'm a maths ignoramus so finding it hard to follow the details of this theory, but I have a question - if the folios are not in their original order but have been reshuffled over the years, does the cyclical pattern still hold up?

No worries!

When we shuffle the text, the pattern disappears. But when we analyze the original manuscript, the pattern re-emerges clearly. This suggests that the thematic structure follows an internal logic that withstands the physical disorder of the folios. So even if the current folio order is not original, the cyclical thematic structure still emerges robustly.

So I come from a non-code background and I struggle with math. I’m not an expert in linguistics. I’m not an expert in anything really, I’m just a teacher. I likely have nothing much to offer here, but I do like to ask questions and to understand things. I am also fairly new to the community, but I’ve sat and read as much as I could in the last couple of weeks.
This is hard for me because there’s a lot of back-and-forth, and some of it is questions and some of it is arguments. From what I can see, I think you changed your stance, but I’m not quite sure. Could you clarify? Have you gone from four lunar phases to three main ideas?

Also - You said “the pages of the Voynich Manuscript tend to cluster spontaneously into three large macroblocks, which notably coincide with the traditional divisions that have been proposed for a long time”

How do they spontaneously cluster? Is it by token or by line? I’d love to know how they do that according to the data that you have. And what are the three large macroblocks? What is their shape or, because I don’t like math, what’s their flavor, I guess?

About the tools you are using: is it an algorithm? Is it Python code, or is it an AI? If you train an AI, how do you train it? Is it ChatGPT, or are you getting in there and actually coding an AI yourself?
I’m interested in your process, the technology you use, etc.

Also, I edited this like 5 times because I am struggling with this forum’s text box lol.

I have my own theory that I’ve been working on for the past two months. I don’t really want to talk much about it because it’s nowhere near ready. Learning how others do things and see things is how scientific study works, after all.

Welcome to the forum!
Just a heads up, I've done some mild statistics work before and I still don't know what's happening in this thread...

It looks like while I've been trying to wrap my head around the earlier hypothesis based on four phases, it may have been superseded by a new hypothesis based on three macroblocks.

Still, is this a fair layperson's summary of the earlier hypothesis, stripped of the methodology used to develop it? -->

"The text of the Voynich Manuscript cycles repeatedly through four distinct phases with identifiable features, producing a kind of large-scale textual rhythm. Shuffling the folios out of order destroys this cycle, which thus appears to be a real larger-scale structural pattern."

After reading through the thread (but not checking out the GitHub repo, which has been offline), I'm still unclear on the period of this proposed cycle. There's talk of identifying the phases of individual "folios," but parts of the discussion make me think this may really refer to pages (such as f1r) rather than to literal folios (such as f1, including f1r and f1v). And then:

(24-04-2025, 10:41 AM)Urtx13 Wrote: These cycles show up between folios and inside them, and they break when shuffled. So it’s not just the order; it’s baked into the content.

All this leaves me uncertain as to the cycle's period: is there one phase per page, or per folio, or can pages contain multiple phases, or....?

Also, as to the shuffling: was this carried out on individual pages (f1r, f1v, f2r, f2v, f3r, etc.), or whole folios (f1, f2, f3, f4, f5), or bifolios (i.e., keeping the bifolios themselves intact, but organizing and nesting them differently)? I wonder whether this makes a difference, bearing in mind Oshfdk's suggestion:

(24-04-2025, 11:52 AM)oshfdk Wrote: For example, there can be differences in how a page was structured based on where in a quire it appears, which could produce cyclical features.

Urtx13 wrote (possibly in reply to that last quotation):

(24-04-2025, 12:02 PM)Urtx13 Wrote: We also checked within individual folios, to rule out recto/verso or binding effects. The structure persists locally.

I'm curious what form this last experiment took. What specifically was ruled out, and what patterning persisted at what local level?
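
The difference between shuffling pages and shuffling whole folios, as asked about above, can be sketched in a few lines of Python. This is only an illustration under assumptions: the page list is a toy example, shuffle_pages and shuffle_folios are hypothetical names, and bifolio-level shuffling is left out because it needs the quire structure, which is not given in this thread.

```python
import random
import re

# Toy page list (an assumption); real page IDs would come from the transliteration.
pages = ["f1r", "f1v", "f2r", "f2v", "f3r", "f3v"]

def shuffle_pages(pages, rng):
    """Shuffle every page independently; recto and verso may be separated."""
    out = list(pages)
    rng.shuffle(out)
    return out

def shuffle_folios(pages, rng):
    """Shuffle whole folios; each recto stays next to its own verso."""
    folios = {}
    for page in pages:
        folio_id = re.match(r"f\d+", page).group(0)  # "f1r" -> "f1"
        folios.setdefault(folio_id, []).append(page)
    order = list(folios)
    rng.shuffle(order)
    return [page for folio_id in order for page in folios[folio_id]]

rng = random.Random(1)
print(shuffle_pages(pages, rng))
print(shuffle_folios(pages, rng))
```

If a pattern survives folio-level shuffling but not page-level shuffling, that would point to a recto/verso effect rather than to the order of folios, which is why the granularity of the shuffle matters for interpreting the result.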


oshfdk

Urtx13

Urtx13

davidd

Urtx13

Pepper

Urtx13

anyasophira

Koen G

pfeaster