Speculative fraud hypothesis - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Theories & Solutions (https://www.voynich.ninja/forum-58.html)
+--- Thread: Speculative fraud hypothesis (/thread-4877.html)
RE: Speculative fraud hypothesis - Jorge_Stolfi - 13-09-2025

(Yesterday, 07:53 AM)Torsten Wrote: You are assuming the existence of a fixed seed text from which words are copied and modified. That is not the case. ... In the self-citation model, the Voynich text functions simultaneously as both the source and the outcome of the copying process.

No. I explicitly wrote that the source text available for copying grows as the algorithm progresses, and that is why the word-pair distribution changes and tends to the random x random limit. But you need a seed text to start the process. You admit as much:

Quote:The algorithm requires only a minimal seed (e.g., a single line of text) to initialize. ... In our implementation, we used line f103v.P.9 of the VMS as seed—<pchal shal shorchdy okeor okain shedy pchedy qotchedy qotar ol lkar

But that seed already shows the distinctive, non-trivial, non-"European" Voynichese word structure. How did the Author come up with that seed, and why?

If you use a short text as a seed, for the first page or so you would get only repeated fragments of that text, with a few mutations. Did you find any part of the VMS where the text looks like that?

If the mutation probability is cranked up in order to hide that "small seed" effect, then the mutation procedure must be complicated enough to preserve the structure of Voynichese words, and tuned to produce each segment with the right probabilities. But then the generated text would quickly lose the "repetitiveness" character that was supposed to justify your method. Indeed, the algorithm would quickly become equivalent to a zero-order Markov model, with a word distribution that is an attractor of the mutation procedure M; namely, a distribution P such that P(x) = sum{ P(y) * Prob(M(y) = x) : y }.

Quote:... to generate a corpus of more than 10,000 words. The resulting text contained 7,678 Voynich words (70%) and 3,156 non-Voynich words (30%).

But surely the percentage of Voynich words was higher than 70% at the beginning (when the output was mostly copies of fragments of the seed line) and lower than 70% near the end (where most words were the result of multiple mutation steps). And the percentage must have been decreasing, unless the mutation procedure was complicated and finely tuned as per above. And the word-pair distribution must already have been visibly tending to that of a zero-order Markov model, namely random x random.

Here are some tests of your algorithm with a 14-word seed text in English (a bit longer than the one you used above). The mutation algorithm randomly deletes a letter, with increasing probability if the word is long; or inserts a letter chosen with the approximate English letter frequencies, with increased probability if the word is short; or replaces a random letter by a loosely similar letter (vowel by vowel, stop by stop, sibilant by sibilant). (This algorithm is not trivial and somewhat "tuned" to English, but I suppose that it is still considerably simpler and less "tuned" than the mutation procedure you used for Voynichese, correct?)

For each combination of parameters, the algorithm was used to generate N = 100000 words, and the first and last 100 were printed. [EDIT: changed slightly how the {p_mutate} parameter is used and re-created the examples.]
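For concreteness, here is a minimal Python sketch of the kind of generator and mutation procedure just described. It is an editorial illustration, not the actual test code: the names p_reset and p_mutate are taken from the printouts below, but their exact interpretation, the length-dependent probability scalings, and the letter-frequency table are all assumptions.

Code:
import random

VOWELS = "aeiou"
STOPS = "pbtdkg"
SIBILANTS = "szcjx"
# very rough English letter frequencies, used when inserting a letter
LETTERS = "etaoinshrdlcumwfgypbvkjxqz"
WEIGHTS = [13, 9, 8, 8, 7, 7, 6, 6, 6, 4, 4, 3, 3,
           2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]

def mutate(word):
    """Delete, insert, or substitute a single letter, as described above.
    The length-dependent probabilities are assumed scalings."""
    n = len(word)
    i = random.randrange(n)
    p_del = min(0.8, 0.08 * n)           # deletion more likely for long words
    p_ins = max(0.05, 0.5 - 0.08 * n)    # insertion more likely for short words
    r = random.random()
    if r < p_del and n > 1:
        return word[:i] + word[i + 1:]
    if r < p_del + p_ins:
        c = random.choices(LETTERS, weights=WEIGHTS)[0]
        return word[:i] + c + word[i:]
    for group in (VOWELS, STOPS, SIBILANTS):   # substitute a loosely similar letter
        if word[i] in group:
            c = random.choice(group.replace(word[i], ""))
            return word[:i] + c + word[i + 1:]
    return word  # letter has no similar group: leave unchanged

def generate(seed, n_words, p_reset, p_mutate):
    """Self-citation: copy words sequentially from the text produced so far,
    occasionally mutating them; the copy source grows with the output."""
    text = list(seed)
    pos = 0
    out = []
    while len(out) < n_words:
        if random.random() < p_reset:
            pos = random.randrange(len(text))  # jump to a new copy position
        w = text[pos % len(text)]
        pos += 1
        if random.random() < p_mutate:
            w = mutate(w)
        out.append(w)
        text.append(w)                         # output feeds back into the source
    return out

seed = "the native hue of resolution is sicklied over with the pale cast of thought".split()
words = generate(seed, 100000, p_reset=0.100, p_mutate=0.100)
print(" ".join(words[:100]), "\n...\n", " ".join(words[-100:]))

The essential self-citation property is the final text.append(w): every emitted word immediately becomes available as copy material, so the source is never fixed.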
seed = ['the', 'native', 'hue', 'of', 'resolution', 'is', 'sicklied', 'over', 'with', 'the', 'pale', 'cast', 'of', 'thought']

=== N = 10000 p_reset = 0.100 p_mutate = 0.100 ===

resolution is sicklied over sicklied over with the pale cast the native hue oj resolution i sicklied over wigh the rpale resolution i sicklied over wigh the rpale sicklied over wigh oj resolution i sicklied over wigh the rpale resolution i sicklied over wigh the rpale is wigh hue oj resolution e sicklied over wigh the rpale resolution i sicklied oser wigh the rpale sicklied over wigh oj resolution i sicklied over wigh the rpale resoluetion i sicklied over wigh the rpale is wigh hue oj resolution e sicklied over wigh the rpale resolution i sicklied oser wigh the cast
...
e sicklied over widh oser wigh lthe rpale sicklied over wigh oj oj resolution be o sicklied over wigh rpane fesolution i sicklied orer wih phe rpale resolutuona i sicklied is wygh the rpale is wigh wigh dhe rpale wogh resolution i sicklied i siklied oser wih the cst the native hue oj i sicklied sicklied over wtigh wiygh phe rpal wigh the npale wygh hue o suckliied oer sicklied wigh resolution e sickliud over sicklied ogver hue ij resoluion pavo cart rpale relolution i sicknied ozer nwigh the the i sicklied thw pale the rale over wigh el sicklied

Note that the first 100 generated words are essentially repetitions of fragments of the seed text. As the algorithm progresses, the first few words that happened to be copied become increasingly likely to be copied again, so that the text at first becomes even more repetitive. Then, as mutations accumulate, the output becomes random tosses of variations of those few lucky words.

=== N = 10000 p_reset = 0.100 p_mutate = 0.700 ===

resalution i sicklied over us sickhied over witj the pale cst ov hu otf resolution is sicklyed ovepr wizth the plale casnt of thoufght reasalution ogf ogf ogf oxf uxf uxj uxj uxj uxj uxj uxr uxr uxr wxr pale dst ol hu otf renolution is sicklyid vepr wuzth he psale cesnt lof thoufgh reasalutione ogf obf og yxf uxm uxj uxl uxmj us sickhiet ovur wtj thu pyle kst oz fu ot resolutio is vicklyed vepr wizth the dlale os sichied over witl ethe pane fcst af riesolution is sickvied ovev wits the pale casat af thought resalutio e
...
dgsis snipkle seqr wuzp ie sily evnt jos ghumgh gl obj tult refti i liihiut ujs abn ymy wisai ht ewnj ussvv utlve fhea vbmsg l rsilultian wns yqr obq vqon wbes cdasa il movwen pask rejoltion em rkebsoluta uh vklyeqe m casa vicklyep cvepr vckloed ghofbv reasluione ogb qaj pan thoght rjaluio zwoj chus balbe cyat of exvwh ij fe k resovutio ys vcklyed nepr oizkh ut reolta is evckloed evw ovqery os tjhoyghd rvulutin oaje pnlae apesvtpb tmovgh asaluqone

Here, the high mutation probability eventually renders the seed irrelevant, and the output soon becomes a zero-order Markov text with a word distribution defined by the mutation procedure. The output does not look like English at all, because the mutation procedure is not sufficiently complicated and tuned.
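If the similarity criterion is membership in a reference lexicon, the claimed divergence is easy to measure. Below is a small diagnostic sketch that reuses the hypothetical generate function above; for the Voynichese run, the lexicon would be the VMS word list rather than the seed's word forms.

Code:
def lexicon_fraction(words, lexicon, window=1000):
    """Per-window fraction of generated words that belong to a reference
    lexicon; a downward trend is the 'divergence' discussed above."""
    return [sum(w in lexicon for w in words[s:s + window]) / window
            for s in range(0, len(words) - window + 1, window)]

seed = "the native hue of resolution is sicklied over with the pale cast of thought".split()
words = generate(seed, 100000, p_reset=0.100, p_mutate=0.700)
for k, f in enumerate(lexicon_fraction(words, set(seed))):
    print(f"window {k:3d}: {f:.2f} of words are unmutated seed forms")

A curve that starts near 1.0 and decays toward a plateau set by the mutation procedure would support the argument above; a flat curve would not.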
=== N = 10000 p_reset = 0.700 p_mutate = 0.100 ===

witf cast o is resolution us resolution of thought o is resolution us resolution tought casp u witf is of resolution the native native native native native native native witf cast nativi is resolution resolution resolution us thought o resolution native natijve native resolution tought native native native pale resolution native natine witf resolution o is of native palw cast o is resolution of o is resolution nativi native cast us thought ir native witf cysp u witf hue o thought ntivi resolution resolution naytive natijve cast cast o is resolution resolutigon native resolugtion cast native witf cysp native is
...
tought is is is resolution uf resolutigan natinde rslution witf resolution xo native resolution native rsolution o o ih resolution reslugtion is resoltion is resomution native is witf is ovwr of with us witf thougmt resolution cysp hought witf o is resolution witf mytie resolugion resolution native cast sicklied native us resolution natuve native palw o sicklied resolution resolution native o resolution of u cast is onf hought sative cast with resolution resolution native u native is natinde resolution ovwr ntive resootion native thw is natinbe resolution is wvitf witf witf u sicklied witf native natuve i witf reslugtion es

=== N = 10000 p_reset = 0.400 p_mutate = 0.400 ===

of the cast hue iwith rasolution pale cast thought ast hue iwith sicklied pave caist of hue owith of native pae caszt thought ast pive kaist of vue owith o owixh of native paw caszt thought art owixh of nnative pae cuszt thoughb pav cast owith thought ast hue caw caszt bhought art native of the ast pale cast vue owith cost thought ast huw iwit rasolution pale cast thought asbt nnativi pae owith w owaxh of native pai caszt thought art owixh cast thought asbt nnativi owikh of cost dhought as huw iwib rasolution thoghb pav nntivi o owixh
...
thouht cal pott thoubht raslution though caszt tfought ast sickiek cist iwibv hyu thyukht casx oj o sibbkeg heua abx art oh fr tougrt ieitv ceost dhooght thouhgt qule theuhg ptule hae itf buist huo natie thdee totj fbough pawlea iu owdth uf rasolution paa cuszt thoghb paywv cwst oiith cast oh piva casc vue dhgc ckaft ast gthought thighb owiph wiph hyught of hue thoghct iwib tloghqa oh fr of raoltaon casg cuszt s hlue iabt ast oj iwit ihf oth fr tougrt kqahw iwith caszt iwtexh r tvough vue csq of eh oiith l huef vasg sicglied pavek

All the best, --jorge

RE: Speculative fraud hypothesis - dexdex - 13-09-2025

(Yesterday, 11:27 AM)Jorge_Stolfi Wrote: But surely the percentage of Voynich words was higher than 70% at the beginning (when it was mostly copies of fragments of the seed line) and less than 70% near the end (where most words were the result of multiple mutation steps). And the percentage must have been decreasing; unless the mutation procedure was complicated and finely tuned as per above.

This argument seems fallacious: the space of results grows like a tree, so divergence from a specific outcome is expected. What should be compared is the size of the result space, which can be estimated by running the algorithm with different seeds and comparing their pairwise differences. That way, you get an estimate of the variation in result space depending on parameter space.
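In code, that estimate might look like the following sketch, building on the hypothetical generate function above. The seed lines are made up, and Jaccard distance over word-form sets is just one possible difference measure.

Code:
from itertools import combinations

def jaccard_distance(a, b):
    """Distance between two outputs: 1 - |A∩B|/|A∪B| over word-form sets."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)

# hypothetical seed lines; in practice one would use lines of comparable length
seeds = [
    "the native hue of resolution is sicklied over with the pale cast of thought",
    "now is the winter of our discontent made glorious summer by this sun of york",
    "all the world is a stage and all the men and women merely players on it",
]
outputs = [generate(s.split(), 10000, p_reset=0.1, p_mutate=0.1) for s in seeds]
for (i, a), (j, b) in combinations(enumerate(outputs), 2):
    print(f"seed {i} vs seed {j}: word-form distance {jaccard_distance(a, b):.3f}")

Small pairwise distances across seeds would indicate a comparatively narrow result space for those parameters; distances near 1 would indicate a wide one.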
In other words, the algorithm doesn't purport to generate an infinite number of Voynich-like manuscripts, nor to always give a great match to the Voynich. It just allows a Voynich-length output that looks similar enough, indicating that generation by a similar method is plausible. In other words, if the Voynich lies in the result space, then there could have been a seed (perhaps even now lost!) that generated it.

An interesting question is whether the algorithm's result space is too wide to be plausible: a thousand monkeys at typewriters will eventually generate anything, including the Voynich, and that doesn't make them a plausible algorithm for generating the Voynich. But the algorithm in Timm's article is pretty basic and simple, and it retains the characteristics of the initial seed for the length of the VMS, so I expect the result space (at the level of Voynich length, that is) to be comparatively narrow, similar to the "gibberish after all?" article from the Voynich conference.

Also, the algorithm used for Voynichese is described in Timm's published article, so I'm not sure why you keep asking how 'tuned' it is...

RE: Speculative fraud hypothesis - Jorge_Stolfi - 13-09-2025

(Yesterday, 01:38 PM)dexdex Wrote: Voynich-length output that looks similar enough

The question is what "similar" means. To someone who has never seen English text, this sentence may look "similar enough" to English:

reasluione ogb qaj pan thoght rjaluio zwoj chus balbe cyat of exvwh ij fe k resovutio ys vcklyed nepr oizkh ut reolta is evckloed evw ovqery os tjhoyghd rvulutin oaje pnlae apesvtpb tmovgh asaluqone

The previous message pointed out that, in the output of their test run, "70% were Voynich words", implying that the similarity criterion was just that: the percentage of words (word instances or word forms, not clear) that were in the VMS lexicon. If that was the criterion, then the "divergence" (a drop in the similarity as the algorithm progresses) is a problem, because it means that a significant part of the similarity was due to the fact that the seed text had been taken from the VMS.

All the best, --jorge

RE: Speculative fraud hypothesis - Jorge_Stolfi - 13-09-2025

(Yesterday, 01:38 PM)dexdex Wrote: I'm not sure why you keep asking how 'tuned' it is

Sorry for my confusing language. The "unless" did not mean that I did not know. I was referring to the generic algorithm, where the Mutate procedure is a parameter. The specific Mutate that they used was highly tuned to Voynichese.

RE: Speculative fraud hypothesis - dexdex - 13-09-2025

(Yesterday, 03:36 PM)Jorge_Stolfi Wrote: The previous message pointed out that, in the output of their test run, "70% were Voynich words"; implying that the similarity criterion was just that, namely the percentage of words (word instances or word forms, not clear) that were in the VMS lexicon. If that was the criterion, then the "divergence" (a drop in the similarity as the algorithm progresses) is a problem, because it means that a significant part of the similarity was due to the fact that the seed text had been taken from the VMS.

That was not the only criterion: various Zipfian laws as well as frequency distributions were also compared in the article.

RE: Speculative fraud hypothesis - Torsten - 13-09-2025

(Yesterday, 11:27 AM)Jorge_Stolfi Wrote: Here are some tests of your algorithm with a 14-word seed text in English (a bit longer than the one you used above). The mutation algorithm randomly deletes a letter, with increasing prob if the word is long; or inserts a letter chosen with the approximate English letter frequency, with increased prob if the word is short; or replaces a random letter by a loosely similar letter (vowel by vowel, stop by stop, sibilant by sibilant).
Keep in mind that the VMS was produced by a human scribe, not by a computer program. A 15th-century writer could not have executed an algorithm that randomly deletes or inserts letters, since neither computers nor random number generators were available to him. (Sidenote: someone from the 15th century wouldn't even have understood the concept of randomness. The word "random" originated in Old French as randon, meaning "speed" or "force," and entered English around the early 14th century, referring to haste or violence. The modern statistical meaning, implying equal chances for all outcomes, emerged in the late 19th century.)

Instead, the scribe relied on visual recognition and cognitive processes: scanning the text for source words and applying intuitive modifications. In such a context, it is far more natural to substitute glyphs with visually similar ones than to introduce or remove glyphs at random.
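To make the contrast concrete, a substitution-only mutation step might look like the sketch below. The table of "visually similar" EVA glyph pairs is an assumption for illustration only (commonly cited look-alikes such as the gallows k/t and f/p, the benches ch/sh, and o/a); it is not the rule set of Timm and Schinner's program.

Code:
import random

# Assumed, illustrative groups of EVA glyphs often described as look-alikes;
# NOT the actual similarity rules used by Timm & Schinner.
SIMILAR = [("o", "a"), ("ch", "sh"), ("k", "t"), ("f", "p"), ("r", "s")]

def substitute_similar(word):
    """Replace one glyph by a visually similar one; never insert or delete,
    mimicking a scribe who varies a copied word by eye."""
    for group in random.sample(SIMILAR, len(SIMILAR)):
        present = [g for g in group if g in word]
        if present:
            old = random.choice(present)
            new = random.choice([g for g in group if g != old])
            return word.replace(old, new, 1)
    return word  # nothing recognizably similar: copy the word unchanged

print(substitute_similar("qokeedy"))  # e.g. 'qoteedy' or 'qakeedy'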
(Yesterday, 11:27 AM)Jorge_Stolfi Wrote: (This algorithm is not trivial and somewhat "tuned" to English, but I suppose that this is still considerably simpler and less "tuned" than the mutation procedure you used for Voynichese, correct?)

Our algorithm is designed to approximate how a human scribe might have carried out the self-citation method. This is necessarily more complex than a purely mechanical procedure, since it must approximate the human ability to recognize, compare, and adapt patterns. While a human scribe can intuitively judge whether two glyphs or words appear similar, a computer program requires explicit rules to determine which glyphs count as visually similar.

Note: our argument is that the self-citation method could have been executed with ease by a medieval scribe, without the aid of any additional tools. We do not claim that a computer was involved in the creation of the Voynich text, nor that our computer simulation fully captures the complexity of human behavior. Rather, our aim is to demonstrate the feasibility of generating a text as rich and complex as the VMS through the strikingly simple mechanism of the self-citation method.

RE: Speculative fraud hypothesis - Torsten - 13-09-2025

(Yesterday, 03:36 PM)Jorge_Stolfi Wrote: The previous message pointed out that, in the output of their test run, "70% were Voynich words"; implying that the similarity criterion was just that, namely the percentage of words (word instances or word forms, not clear) that were in the VMS lexicon.

The similarity criteria in our work are not based on counting how many word instances also appear in the VMS lexicon—we do not even mention this. Instead, our focus is on reproducing the manuscript's key statistical properties, which we describe in detail in our paper [see Timm & Schinner].

(Yesterday, 03:36 PM)Jorge_Stolfi Wrote: If that was the criterion, then the "divergence" (a drop in the similarity as the algorithm progresses) is a problem, because it means that a significant part of the similarity was due to the fact that the seed text had been taken from the VMS.

If you read our paper, you will see that we present a detailed analysis of the Voynich text. Building on this analysis, we introduce a concrete text-generation algorithm—the "self-citation" process—which could have been executed easily by a medieval scribe without any additional tools. We argue that the self-citation method offers the most concise and effective way to account for the features of the Voynich Manuscript.

An experiment by Gaskell and Bowern also strengthens our viewpoint. Gaskell and Bowern recruited volunteers to write short "gibberish" documents as the basis for a statistical comparison with the VMS and with linguistically meaningful texts. They write: "Our results are generally consistent with the proposal of Timm and Schinner that the VMS was generated by a process of 'self-citation': that is, that the VMS scribe(s) generated the text largely by copying or modifying words appearing earlier in the same section". They further write, in reference to the self-citation method: "Informal interviews and class discussions confirmed that many participants did indeed adopt this type of approach to create their texts, although they generally did so intuitively rather than by developing an explicit algorithm such as that published by Timm and Schinner" [Gaskell & Bowern, p. 8].

In my eyes it is further noteworthy that Gaskell and Bowern also report "[...] greater biases in character placement within lines and word placement within sections [...]" as a result of their experiment.

Another paper, by Bowern and Lindemann, reports the test persons' motivation for the word repetitions: "We tested this point in an undergraduate class and found that beyond about 100 words, the task of writing language-like non-language is very difficult. It is too easy to make local repetitions [...]" This is an important point, because it clarifies that any scribe creating language-mimicking gibberish will sooner or later replace the tedious task of inventing more and more words with the much easier reduplication of existing text (and stick with this strategy) [see Bowern & Lindemann].