Could synthetic “fake Voynich” tests tell us what we’re dealing with? - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Could synthetic “fake Voynich” tests tell us what we’re dealing with? (/thread-4920.html)
Could synthetic “fake Voynich” tests tell us what we’re dealing with? - Phoenixjn - 09-09-2025

Hello,

Rather than trying to decode the VMS outright, I think we can answer other questions about the MS with the help of AI. For example, we should be able to decide with high probability whether the MS has real meaning or not, and I propose a system for doing that.

My own take: what best fits the evidence is that someone commissioned the VMS to be exactly what it is: a mysterious, impressive book that nobody can read because it contains no meaning. This is highly plausible and fits the physical, linguistic and cultural evidence of the time. I believe it's a pseudo-text, made to look like a language but without actual meaning, for the purpose of impressing people, and I think AI can help us decide one way or the other.

The question to explore is: what kind of system is this text most like? The idea is to generate large sets of synthetic manuscripts under different assumptions and see which "universe" the Voynich statistically belongs to. For example:
Then we can measure each synthetic universe against a Voynich "fingerprint panel" (word shapes, entropy, Zipf’s law, affix patterns, section differences, etc.). Rather than asking "what does it say?", this approach asks "what system is it most like?" If structured pseudo-language consistently fits better than ciphered Latin or conlang universes, that's powerful evidence. This wouldn’t solve the translation, but it would be an important step in understanding the MS, and it would be one box checked off.

Does this kind of “synthetic benchmarking” sound worth trying? Has anyone attempted something like this at scale?

Anyway, here's where AI did a lot of the work in building an outline for how the experiment might go with only off-the-shelf tools. The goal is to see which universe (ciphered language, real language, conlang, structured pseudo-language, shorthand/abbreviation, etc.) best reproduces the Voynich’s full statistical “fingerprint.” No, I don't have expertise in this kind of research. I'm only seeing where AI can point, to help us check off some boxes and let those with the expertise run with it.

1) Define the universes (generate many fakes)

Make 200–2,000 synthetic manuscripts, each matched in length and page/line structure to the VM. Each fake follows one hypothesis with tunable knobs:

A. Ciphered Natural Language
B. Real Language (no cipher) shaped to VM layout
C. Conlang (meaningful but invented)
D. Structured Pseudo-Language (no semantics)
E. Shorthand/Abbreviation Universe
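To make the "tunable knobs" idea concrete, here is a minimal sketch of a generator for universe D (structured pseudo-language): words are assembled from slot templates (prefix + mid + suffix), so the output has rigid Voynich-like word structure but no semantics. The glyph groups below are illustrative placeholders, not a real Voynichese grammar, and the slot model is an assumption for demonstration only.

```python
import random

# Illustrative "knobs" for universe D: slot inventories a generator could tune.
# These strings are made up for the sketch, not an actual EVA word grammar.
PREFIXES = ["qo", "o", "ch", "sh", ""]
MIDS     = ["ke", "te", "ol", "ai", "ee"]
SUFFIXES = ["dy", "in", "y", "n", ""]

def pseudo_word(rng):
    """One meaningless but structurally constrained word."""
    return rng.choice(PREFIXES) + rng.choice(MIDS) + rng.choice(SUFFIXES)

def pseudo_line(rng, n_words=8):
    """One 'manuscript line' of pseudo-words."""
    return " ".join(pseudo_word(rng) for _ in range(n_words))

def pseudo_manuscript(n_lines=100, seed=0):
    """A reproducible synthetic 'manuscript' as a list of lines.
    A fixed seed makes each synthetic manuscript regenerable."""
    rng = random.Random(seed)
    return [pseudo_line(rng) for _ in range(n_lines)]
```

Real generators for universes A–E would need many more knobs (line-position effects, section differences, word-length distributions), but even a toy like this can be run through the same fingerprint panel as the VM itself.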
2) Build the Voynich “fingerprint panel”

Compute the same metrics for the true VM and for every synthetic manuscript:

Token/Type structure
Local dependencies
Morphology & segmentation
Positional/structural signals
Compressibility / model fit
Clustering/embedding
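A few of the panel metrics above are simple enough to sketch directly. The function below computes a token/type ratio, character-level Shannon entropy, and a Zipf slope (least-squares fit of log frequency against log rank) for any whitespace-tokenized text; the metric names and the choice of exactly these three are mine, not a fixed part of the proposal.

```python
import math
from collections import Counter

def fingerprint(text):
    """A small slice of the 'fingerprint panel' for one manuscript
    (real or synthetic). Returns a dict so panels are comparable."""
    tokens = text.split()
    # Token/type structure: type-token ratio
    ttr = len(set(tokens)) / len(tokens)
    # Compressibility proxy: Shannon entropy in bits per character
    chars = Counter(c for c in text if not c.isspace())
    total = sum(chars.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in chars.values())
    # Zipf slope: least-squares fit of log(freq) vs log(rank)
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return {"ttr": ttr, "char_entropy": entropy, "zipf_slope": slope}
```

For a natural-language text the Zipf slope typically sits near -1; running the same function over the VM transliteration and every synthetic lets you compare like with like.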
3) Scoring: which universe fits best? Use multiple, complementary criteria:
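One simple way such scoring could work, sketched under my own assumptions rather than as the thread's actual criteria: z-score each metric by the universe's own spread (so no metric dominates by scale), take the mean distance from the VM's fingerprint to each universe's synthetics, and rank universes by that distance. This is a crude stand-in for the ABC fitting mentioned later in the outline.

```python
import statistics

def universe_score(vm_fp, synthetic_fps):
    """Mean z-scored Euclidean distance from the VM fingerprint to one
    universe's synthetic fingerprints; lower = better fit."""
    keys = list(vm_fp)
    sds = {}
    for k in keys:
        vals = [fp[k] for fp in synthetic_fps]
        sds[k] = statistics.pstdev(vals) or 1.0  # guard against zero spread
    dists = []
    for fp in synthetic_fps:
        d2 = sum(((fp[k] - vm_fp[k]) / sds[k]) ** 2 for k in keys)
        dists.append(d2 ** 0.5)
    return sum(dists) / len(dists)

def best_universe(vm_fp, universes):
    """universes maps a name (e.g. "D: pseudo-language") to a list of
    fingerprint dicts from that universe's synthetic manuscripts."""
    return min(universes, key=lambda name: universe_score(vm_fp, universes[name]))
```

A serious version would also report how often each universe wins under resampling, since a single ranking without uncertainty is exactly the kind of result critics in this thread would dismiss.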
4) Robustness checks
5) Tooling (all off-the-shelf)
6) Workflow & timeline (lean team)

Week 1–2: Data wrangling (VM EVA, Latin/Italian corpora), page/line schema, metric code scaffolding
Week 3–6: Implement generators A–E; unit tests; produce first 500 synthetics
Week 7–8: Compute full fingerprint panel; initial ranking
Week 9–10: ABC fitting per universe; robustness/ablations
Week 11–12: Write-up, plots, release code & datasets (repro pack)

7) Readouts you can trust (what “success” looks like)
8) “Citizen-science” version (solo, laptop-friendly)
9) Pitfalls & how to avoid them
RE: Could synthetic “fake Voynich” tests tell us what we’re dealing with? - oshfdk - 09-09-2025

(09-09-2025, 03:34 PM)Phoenixjn Wrote: Does this kind of “synthetic benchmarking” sound worth trying?

No, at least not to me. Many reasons, but the primary ones are:

1) Modern AIs appear to be all-knowing universal systems that can reliably produce high-quality content across many domains, but they can't, and they won't.

2) Moreover, even if this experiment were possible and we got a magic machine that would produce, say, 100 enciphered Latin herbals, 100 enciphered Arabic herbals, and 100 pseudo-language herbals, and we got lucky and found that according to statistical metric A the Voynich MS is 80% Latin herbal, 10% Arabic herbal and 10% pseudo-language, what would we do with this information? I see absolutely no use for it. You certainly won't be able to exclude that the Voynich MS is X or Y based on the similarity argument; the most you can say is that it's not very similar to X or Y. It could still be X or Y, just an unusual specimen of X or Y. So, no actually useful information.

RE: Could synthetic “fake Voynich” tests tell us what we’re dealing with? - Phoenixjn - 10-09-2025

(09-09-2025, 05:03 PM)oshfdk Wrote:
(09-09-2025, 03:34 PM)Phoenixjn Wrote: Does this kind of “synthetic benchmarking” sound worth trying?

Or, now or in the future, we do generate thousands of fake texts reliably with the help of AI, the tests do work, and they do consistently and repeatably pin the VMS with high confidence to a particular universe of texts. I wouldn't call AI a magic machine. ChatGPT 5 is already a PhD-level expert in everything related to language (and math/stats/physics). I think there will be a way to leverage that to perform this experiment, if not now then probably within a few versions.
RE: Could synthetic “fake Voynich” tests tell us what we’re dealing with? - oshfdk - 10-09-2025

(10-09-2025, 12:54 AM)Phoenixjn Wrote: Or, now or in the future, we do generate thousands of fake texts reliably with the help of AI, the tests do work, and they do consistently and repeatably pin the VMS with high confidence to a particular universe of texts.

I'm not sure about others, but I will just dismiss this result as irrelevant without much thinking. Showing that the Voynich MS is similar to a particular set of texts doesn't prove that it belongs to that set. A snake is more similar to a garden hose than to a dog, but I doubt it would be reasonable to treat a snake as garden inventory.

RE: Could synthetic “fake Voynich” tests tell us what we’re dealing with? - dexdex - 10-09-2025

(10-09-2025, 12:54 AM)Phoenixjn Wrote: ChatGPT 5 is already a PhD-level expert in everything related to language (and math/stats/physics).

No, it isn't. For one, a PhD will refuse to answer a senseless question.