The Voynich Ninja

Full Version: Hoax theory discovery by running lang analysis program according to my methodology
The proposed methodology aims to discover the exact process by which the VMS might have been created, if it was made for the sole purpose of impressing people at the time and possibly generating life-changing income.

(I won't be able to run any programs for the protocol below, which is why I'm posting it here.)

As per [link], the deviation is apparently significant, e.g. 2.5 for Voynichese compared to 3.5-4+ for Latin or Greek, and likewise for other such metrics.

Assuming:

-> When copying and glyphizing, the scribes might have wanted to obfuscate further, e.g. by cutting out words. We standardize the scribes' behavior into potential patterns of 'organic transformation'.

Methodology:

First data set: take a body of texts they might have had access to. Ideally these are texts plausibly available to a Central European scribe in the early 15th century: Latin herbals, Arabic medical texts, Hebrew manuscripts, and similar period-appropriate works.

Second set (broader): assuming we may not know precisely what materials the scribes had access to, cast a wider net across contemporaneous written traditions.

Run simulations against the VMS, preferably both whole and in parts by scribe, as well as permutations, to find patterns of 'organic transformation' potentially applied by the scribes that yield the Voynichese 2.5 (or other metrics) when the originals are transformed using various patterns.

Expected results:

We might find an array of algorithms that are logical to a human scribe and that produce the 2.5 figure instead of 3.5 or 4+. Some of them look UNCANNILY PROBABLE, e.g. skipping the first 3 and last 3 lines of a standard page over a 40-page window surprisingly yields this. Bonus points if we find out that scribe 2 taught scribe 3 their tactic while scribes 1 and 4 had their own separate method, and both yield 2.5 by copying the middle of the page in a circle.

Potential difficulties: 1. Too many probable patterns yield it. 2. Can't discern scribes (small difficulty)

Finalizing: potentially get a hoax hypothesis that is functional from A to Z (pun intended), describing the full process of the scribes crafting it for the gold grab, and that actually makes sense given their psychology when creating the VMS.


I hope for someone here to run it and let us know the results?
(31-05-2026, 04:20 PM)maskci Wrote: Run simulations against VMS and preferably whole and in parts by scribe author, as well as permutations, to find patterns of 'organic transformation' potentially done by the scribes that yield Voynichese 2.5 or other metrics from transforming the originals using various patterns.

I'd be happy to run simulations if I knew what 'organic transformation' means.

(31-05-2026, 04:20 PM)maskci Wrote: Some of them look UNCANNILY PROBABLE EG skip first 3 lines and last 3 of a standard page over a 40 page window surprisingly yields this.

Really? Why?
(31-05-2026, 04:20 PM)maskci Wrote: I hope for someone here to run it and let us know the results?

Run what? Simulations? Algorithms? 

It feels like this theory is basically "has anyone tried to work out how voynichese may have been written? maybe that will solve it".
(31-05-2026, 06:33 PM)nablator Wrote: I'd be happy to run simulations if I knew what 'organic transformation' means.

High-probability transformation methods (possibly low effort, to optimize energy expenditure) from the original source to Voynichese that are potentially easy and intuitive.

A hoaxer's goal is to fill pages convincingly enough to get paid; whether they also needed to do it quickly is an open question. Their education and experience must have been thorough and extensive. So we can also assume they knew perfectly well how to fake texts and felt confident about it; optimizing for speed while not making mistakes in the process might have been safe and effective for them.

We could run a sort of preliminary test: take some key features of the source database and see which selections of the data already produce something closer to Voynichese. Maybe taking only the first letter of each word, maybe only the last, maybe both, maybe skipping lines, and removing words below x characters beforehand.
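A minimal sketch of what these preliminary transformations could look like in Python. The function names, the sample sentence, and the exact reading of each rule (e.g. that "both" means first plus last letter) are my assumptions for illustration, not a settled definition of the method.

```python
def drop_short_words(words, min_len):
    """Remove words shorter than min_len characters."""
    return [w for w in words if len(w) >= min_len]

def first_letters(words):
    """Keep only the first letter of each word."""
    return [w[0] for w in words if w]

def last_letters(words):
    """Keep only the last letter of each word."""
    return [w[-1] for w in words if w]

def first_and_last(words):
    """Keep first and last letter; one-letter words stay as they are."""
    return [w[0] + w[-1] if len(w) > 1 else w for w in words if w]

def skip_lines(lines, head, tail):
    """Drop the first `head` and last `tail` lines of a page."""
    end = len(lines) - tail
    return lines[head:end] if end > head else []

# Illustrative sample only, not taken from any proposed source database.
sample = "in principio creavit deus caelum et terram".split()
print(drop_short_words(sample, 3))
print(first_and_last(sample))
```

Each function takes a list and returns a list, so they can be chained in any order to build the combined variants.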

I'd love to run tests myself but can't now.

I'd try to see how far the entropy moves from 3.5-4.5 depending on which of the simple transformations I mentioned is applied to the original db. Then I'd think about how to run it against the VMS to try to derive the exact method they used.
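To make "degree of movement" measurable, here is a rough sketch of two common entropy measures in Python. Which measure the 2.5 and 3.5-4.5 figures refer to is not stated in this thread, so treating them as either of these is an assumption; the function names are hypothetical.

```python
import math
from collections import Counter

def char_entropy(text):
    """First-order Shannon entropy of the characters, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(text):
    """Entropy of a character given the previous one: H(pairs) - H(first of pair)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = sum(pairs.values())
    h_pairs = -sum(c / n * math.log2(c / n) for c in pairs.values())
    h_first = -sum(c / n * math.log2(c / n) for c in firsts.values())
    return h_pairs - h_first

def entropy_shift(original, transformed, measure=char_entropy):
    """How far a transformation moved the chosen measure (negative = lower)."""
    return measure(transformed) - measure(original)
```

Running `entropy_shift` on a source text and each transformed variant would give one number per transformation, which is the kind of comparison described above.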

The key is creating a couple of databases of text samples they could have used, working as a team at such scale, to turn everything into glyphs. It seems they didn't go letter for letter and possibly wanted to optimize for speed, so the methods suggested by others who studied entropy seem very arduous and unnecessary for an experienced team. From what I gather, certain parts seem rushed and of below-standard quality, while some aspects, especially the flashy ones, feature amazing levels of detail and craftsmanship. I know this is only my impression, but I believe this was simply to tick all the boxes for rich patrons: it has so much, and something exceptional in every category. But most of it is just good enough to pass all tests.

I'm not sure if I forgot to emphasize this: the most important aspect of the main database for my proposed study is preparing the data by pages rather than as one body of text, assuming they would read and copy from the sources page by page. So this db needs fields like page number and line, so that it can be analyzed from the organic perspective of scribes working in a team.

The text should be analyzed not only as a whole, but also by transformations from the perspective of a scribe looking at pages A and B in all configurations. The A->B transition might also be a variant, not just page by page, integrating the methods and trying to predict their supposed working environment.
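As a sketch of what such a page-aware record could look like in Python: the class name, the `source_id` field, and the grouping helper are hypothetical, only "page number" and "line" come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceLine:
    source_id: str  # hypothetical identifier for the source book
    page: int       # page number within that source
    line: int       # line number within that page
    text: str       # transcribed text of the line

def group_by_page(records):
    """Return {(source_id, page): [line texts in line order]}."""
    pages = defaultdict(list)
    for rec in sorted(records, key=lambda r: (r.source_id, r.page, r.line)):
        pages[(rec.source_id, rec.page)].append(rec.text)
    return dict(pages)
```

Keeping one record per line means page-level rules (such as skipping lines at the top or bottom of a page) can be applied without re-parsing the text.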
(31-05-2026, 06:34 PM)eggyk Wrote: It feels like this theory is basically "has anyone tried to work out how voynichese may have been written? maybe that will solve it".
It remains significant to me that Timm and Schinner said:
Quote:Keep in mind that the VMS was created by a human writer who had complete freedom to vary some details of the generating algorithm on the spur of a moment. An exact reproduction of all of his/her mental rules is not only most likely impossible, but would still leave the problem of unpredictable random (aesthetic) decisions.
and
Quote:Most likely, it is impossible to devise an exact mathematical proof that an arbitrary set of strings is truly meaningless, or not. This would involve a general method to compute upper boundaries to the Kolmogorov complexity.
Plucking a scholarly work's concessio statements like this borders on strawman, so let me disclaim they are embedded in larger points worth considering. But the paper that launched this ship casts serious doubt on the idea that there is even a strict algorithm to be found. Without fully devolving into a semantic debate, I have some reservations about calling the process that generated the Voynich "the generating algorithm" if we think a significant part of it is likely to be aesthetic decisions. Still, I take the main direction of the authors' point: it's one thing to claim that this supports the idea that auto-citation can create the patterns we see in the manuscript, but it's another for them to have to prove the manuscript was generated that way down to a very refined level of detail.

This distinction between a strict algorithm and a copy process that can be algorithmically modelled becomes much more acute if you're hoping to make fine-grained identifications. (Or if you're hoping other people will, as the case may be.) It may well be that this is possible, and that Timm and Schinner conceded too much, so people are free to take this on as a challenge. After all, arguing stronger claims were possible is a whole genre in academic writing. But people should heed where, and more importantly why, they limited their conclusions if they are hoping to extend them.
Your methodology is somewhat lacking in detail; as it stands it would be difficult to "run".

"The key is creating a couple databases of text samples"
   -- Please do so; afaik there are few of these. OCR some relevant texts, correct them, decide what to do about abbreviations if any, etc.

"High probability transformation methods (possibly low effort to optimize energy expenditure)"
   -- How is this to be quantified? What algorithms should / could be used?
   -- If by "energy expenditure" you mean biophysical energy, then how do we answer the question 'how many Calories does it take to write daiin with a quill pen?'

"We could run a sort of a preliminary, take some key things the source database has,"
   -- 'sort of', 'some key things': these terms are not very helpful, please specify.

"produce something closer to Voynichese."
   -- By what metric? There are many statistics for Voynichese, which ones do you suggest?

Hopefully you can add some more detail to your method so that it can be followed.
(31-05-2026, 08:08 PM)RobGea Wrote:
"The key is creating a couple databases of text samples"
   -- Please do so; afaik there are few of these. OCR some relevant texts, correct them, decide what to do about abbreviations if any, etc.

"High probability transformation methods (possibly low effort to optimize energy expenditure)"
   -- How is this to be quantified? What algorithms should / could be used?
   -- If by "energy expenditure" you mean biophysical energy, then how do we answer the question 'how many Calories does it take to write daiin with a quill pen?'

"We could run a sort of a preliminary, take some key things the source database has,"
   -- 'sort of', 'some key things': these terms are not very helpful, please specify.

"produce something closer to Voynichese."
   -- By what metric? There are many statistics for Voynichese, which ones do you suggest?

Hopefully you can add some more detail to your method so that it can be followed.

I will think about a properly detailed plan soon, but here are some specifics about what I already thought of. 

1. The first stage would be to understand which texts the scribes had access to and their sources of inspiration: collect by country, gather the data, clean it up. Use this per country, add some simulated data if we need more data points, or date back known texts based on references.

The original database should be organized by bifolia, and the work should proceed on that presumption.

2. The second stage would involve working with this database by running entropy and NLP tests on these texts on a per-bifolium basis: bifolia get processed one by one, but grouped book by book.
A. Two variables: remove all words shorter than x or longer than y; run it exhaustively until curves are produced.
B. First letter remains, last letter remains, length-based with variable deviation.
C. A+B combined; get the curves.
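Variant A above can be sketched as a simple parameter sweep in Python. This assumes the "curve" is character entropy as a function of (x, y); the function names are hypothetical and the choice of entropy measure is mine, not something fixed by the plan.

```python
import math
from collections import Counter

def char_entropy(text):
    """First-order Shannon entropy of the characters, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def length_filter(words, x, y):
    """Remove words shorter than x or longer than y."""
    return [w for w in words if x <= len(w) <= y]

def sweep(words, max_len):
    """Return {(x, y): entropy} for every window 1 <= x <= y <= max_len."""
    curve = {}
    for x in range(1, max_len + 1):
        for y in range(x, max_len + 1):
            kept = length_filter(words, x, y)
            # Words are re-joined with spaces, so the space counts as a character.
            curve[(x, y)] = char_entropy(" ".join(kept))
    return curve
```

The number of (x, y) windows grows quadratically with `max_len`, so an exhaustive run per bifolium stays cheap for realistic word lengths.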

3. The third stage would be to take the top cases produced by stage 2 and run them against the VM.
Do it sequentially: a bifolio of the VM [bound to] a bifolio of the source book, then analysis. Use all possible datasets of the books, running them against VM bifolia, to try to find a glyphizing matrix.

This way we get a set of possibilities: 3.1. glyph per word, unprocessed; 3.2. glyph per word, processed; 3.3. glyph per characters; and 3.4. glyph per character separately.

Common combos in the VM: try to match them similarly, treating these combos as glyphs.

Just getting the best results and trying to find even single rules, where something matches close to perfectly.

There will be closest matches in this methodology, and they prove or disprove this theory.
Then it is about tweaking the parameters: maybe change how they read (top to bottom, right to left), think of more ways that would have been easiest for them, and how they divided labor.

Example of a small breakthrough: VM pages by scribe 1 in dialect A, run against e.g. pages 40-42 of a specific Latin book in variant 2.B.2, yield 50 counts of character Z matching 49 counts in the source book, 40 counts of combo Y matching 40 patterns in the source book, 20 counts of combo X... and so on.
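One possible way to score that kind of match in Python, as a rough sketch: compare the sorted count profiles of two token sequences while ignoring which symbol carries which count. The function names and the distance (sum of absolute count differences) are assumptions chosen for illustration.

```python
from collections import Counter

def count_profile(tokens):
    """Counts sorted from most to least frequent, ignoring token identity."""
    return sorted(Counter(tokens).values(), reverse=True)

def profile_distance(a, b):
    """Sum of absolute differences between two count profiles (0 = same shape)."""
    pa, pb = count_profile(a), count_profile(b)
    size = max(len(pa), len(pb))
    pa += [0] * (size - len(pa))  # pad the shorter profile with zeros
    pb += [0] * (size - len(pb))
    return sum(abs(x - y) for x, y in zip(pa, pb))
```

In the example above, 50 against 49, 40 against 40 and so on would give a small distance. A small distance alone is weak evidence, though, since many unrelated texts of similar length have similar profiles.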
You make a hidden assumption that the Voynich author took some existing text and transformed it to get the Voynich Manuscript.

It doesn't have to be true, even if the VM is a hoax. The author didn't need any text as his starting point. He could just "freestyle", creating a stream of words without any inspiration.

And even if he had an inspiration, we don't know the nature of his transformation. So how could we recognize it? 

You suggest some methods in the latest post but in a rather loose way. Maybe try to refine them and use them with some selected text.
Is this to prove gibberish or prove meaning? Just curious.

Also: "First data set - Take a body of texts they might have had access too, ideal assumption, texts plausibly available to a Central European scribe in the early 15th century, Latin herbals, Arabic medical texts, Hebrew manuscripts, and similar period-appropriate works."

That first data set you're describing sounds incredible, but "take a body of texts" makes it sound like there are hundreds of transcribed manuscripts out there with corrected OCR and correct abbreviations, ready to process: texts in Latin, local dialects, Hebrew, etc. Unless I am mistaken, this is the actual work that takes time and must be done carefully. Your step one is more like steps 1-100. Just because a lot of manuscripts have pictures online doesn't mean the text is ready to process.
(31-05-2026, 08:01 PM)rikforto Wrote: But the paper that launched this ship casts serious doubt on the idea that there is even a strict algorithm to be found.

I agree with this point. 

A strict algorithm could be found if it involved not generating a meaningless text on the fly, but a transformation of a meaningful text. And this may still be only in theory.

To use a parallel: 
Homophonic ciphers are substitution ciphers where (among other things) the encoder has a choice, for each character, between several code symbols. He can pick one at will.
This means that, when the code table is available, the encoding is non-deterministic, while the decoding is fully deterministic. The encoding is still strict, in a way, but it will not be possible to set up a machine that creates the same cipher text as any known (historical) example.
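A toy illustration of this asymmetry in Python. The code table is invented for the example and is not taken from any historical cipher; the random choice merely stands in for the encoder picking a symbol at will.

```python
import random

# Hypothetical code table: each plaintext letter has several possible symbols.
TABLE = {"a": ["1", "2", "3"], "b": ["4", "5"], "c": ["6"]}
# Every symbol belongs to exactly one letter, so decoding is unambiguous.
REVERSE = {sym: letter for letter, syms in TABLE.items() for sym in syms}

def encode(plain, rng):
    """Non-deterministic: pick any of the allowed symbols for each letter."""
    return [rng.choice(TABLE[ch]) for ch in plain]

def decode(cipher):
    """Deterministic: each symbol maps back to exactly one letter."""
    return "".join(REVERSE[s] for s in cipher)

message = "abcab"
for seed in range(3):
    cipher = encode(message, random.Random(seed))
    print(cipher, "->", decode(cipher))
```

Different seeds can produce different cipher texts for the same message, yet every one of them decodes to the same plaintext, which is why reproducing one particular cipher text by machine is not possible in general.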

Setting up such a machine, that successfully emulates the human behaviour in spite of this problem, is not proof of method. I rather see it as evidence for successful tuning to the result.