13-09-2025, 11:27 AM
(13-09-2025, 07:53 AM)Torsten Wrote: You are assuming the existence of a fixed seed text from which words are copied and modified. That is not the case. .... In the self-citation model, the Voynich text functions simultaneously as both the source and the outcome of the copying process.
No. I explicitly wrote that the source text available for copying grows as the algorithm progresses, and that is why the word pair distribution changes and tends to the random x random limit. But you need a seed text to start the process. You admit as much:
Quote:The algorithm requires only a minimal seed (e.g., a single line of text) to initialize. ... In our implementation, we used line f103v.P.9 of the VMS as seed—<pchal shal shorchdy okeor okain shedy pchedy qotchedy qotar ol lkar
But that seed already shows the distinctive, non-trivial, non-"European" Voynichese word structure. How did the Author come up with that seed, and why?
If you use a short text as a seed, for the first page or so you would get only repeated fragments of that text, with a few mutations. Did you find any part of the VMS where the text looks like that?
If the mutation probability is cranked up in order to hide that "small seed" effect, then the mutation procedure must be complicated enough to preserve the structure of Voynichese words, and tuned to produce each segment with the right probabilities. But then the generated text would quickly lose the "repetitiveness" character that was supposed to justify your method. Indeed the algorithm would quickly become equivalent to a zero-order Markov model, with a word distribution that is an attractor of the mutation procedure M. Namely, a distribution P such that P(x) = sum{ P(y) * Prob(M(y) = x) : y }.
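To make that fixed-point condition concrete, here is a minimal sketch that finds such a distribution P by repeated application of a mutation kernel. The three "words" a, b, c and their transition probabilities are purely hypothetical; they are not taken from any real mutation procedure.

```python
def stationary(kernel, iters=1000):
    """Iterate P(x) <- sum over y of P(y) * Prob(M(y) = x), starting from uniform."""
    words = list(kernel)
    p = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        q = {w: 0.0 for w in words}
        for y, row in kernel.items():
            for x, prob in row.items():
                q[x] += p[y] * prob
        p = q
    return p

# Hypothetical toy kernel: kernel[y][x] = Prob(M(y) = x); each row sums to 1.
kernel = {
    "a": {"a": 0.8, "b": 0.2},
    "b": {"a": 0.1, "b": 0.7, "c": 0.2},
    "c": {"b": 0.1, "c": 0.9},
}

P = stationary(kernel)
for word, prob in P.items():
    print(word, round(prob, 4))
```

The resulting P no longer depends on the starting distribution, which is the sense in which the seed becomes irrelevant.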
Quote:... to generate a corpus of more than 10,000 words. The resulting text contained 7,678 Voynich words (70%) and 3,156 non-Voynich words (30%).
But surely the percentage of Voynich words was higher than 70% at the beginning (when it was mostly copies of fragments of the seed line) and less than 70% near the end (where most words were the result of multiple mutation steps). And the percentage must have been decreasing, unless the mutation procedure was complicated and finely tuned as described above. And the word pair distribution must already have been visibly tending to that of a zero-order Markov model, namely random x random.
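A check of this kind is easy to run on any generated corpus. The sketch below computes the fraction of words that belong to a reference vocabulary in successive windows; the function name and the tiny example data are my own, chosen only for illustration.

```python
def windowed_fraction(words, vocabulary, window):
    """Fraction of in-vocabulary words in each consecutive full window."""
    vocab = set(vocabulary)
    fractions = []
    for start in range(0, len(words) - window + 1, window):
        chunk = words[start:start + window]
        fractions.append(sum(w in vocab for w in chunk) / window)
    return fractions

# Tiny illustrative input: later words have drifted away from the vocabulary.
example = ["the", "of", "oj", "the", "wigh", "rpale"]
print(windowed_fraction(example, {"the", "of"}, window=3))
```

A steadily falling sequence of fractions would indicate that the 70% figure is only an average over the run, not a stable property of the output.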
Here are some tests of your algorithm with a 14-word seed text in English (a bit longer than the one you used above). The mutation algorithm does one of three things: it deletes a random letter, with probability increasing with word length; or inserts a letter chosen according to approximate English letter frequencies, with higher probability if the word is short; or replaces a random letter with a loosely similar one (vowel by vowel, stop by stop, sibilant by sibilant). (This algorithm is not trivial and is somewhat "tuned" to English, but I suppose that it is still considerably simpler and less "tuned" than the mutation procedure you used for Voynichese, correct?) For each combination of parameters, the algorithm was used to generate N = 10000 words, and the first and last 100 were printed.
[EDIT: changed slightly how the {p_mutate} parameter is used and re-created the examples.]
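For readers who want to experiment, here is a minimal sketch of a generator of this kind. It is not the script that produced the runs below. The meaning I give to p_reset (probability of jumping to a random position in the text so far) and p_mutate (probability of mutating the copied word), the letter classes, the frequency table, and the weights in mutate() are all assumptions made for illustration.

```python
import random

# Assumed letter classes for "loosely similar" substitutions (lowercase a-z only).
CLASSES = ["aeiouy", "pbtdkg", "szcjx", "lrmnwhfvq"]
# Approximate English letter frequencies (percent), used for insertions.
LETTERS = "etaoinshrdlcumwfgypbvkjxqz"
WEIGHTS = [12.7, 9.1, 8.2, 7.5, 7.0, 6.7, 6.3, 6.1, 6.0, 4.3, 4.0, 2.8, 2.8,
           2.4, 2.4, 2.2, 2.0, 2.0, 1.9, 1.5, 1.0, 0.8, 0.15, 0.15, 0.1, 0.07]

def mutate(word, rng):
    """Delete, insert, or substitute one letter; the weights are hypothetical."""
    n = len(word)
    w_delete = float(n) if n > 1 else 0.0  # long words lose letters more often
    w_insert = 8.0 / n                     # short words gain letters more often
    w_subst = 3.0
    op = rng.choices(["delete", "insert", "subst"],
                     [w_delete, w_insert, w_subst])[0]
    if op == "delete":
        i = rng.randrange(n)
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(n + 1)
        return word[:i] + rng.choices(LETTERS, WEIGHTS)[0] + word[i:]
    i = rng.randrange(n)
    for cls in CLASSES:
        if word[i] in cls:
            return word[:i] + rng.choice(cls) + word[i + 1:]
    return word  # character outside a-z: leave the word unchanged

def generate(seed, n_words, p_reset, p_mutate, rng):
    """Copy words from the text generated so far, seed included."""
    text = list(seed)
    pos = rng.randrange(len(text))
    while len(text) < len(seed) + n_words:
        if rng.random() < p_reset:
            pos = rng.randrange(len(text))  # jump to a new source position
        word = text[pos]
        if rng.random() < p_mutate:
            word = mutate(word, rng)
        text.append(word)  # the output immediately becomes copyable source
        pos += 1
    return text[len(seed):]

seed = ("the native hue of resolution is sicklied over "
        "with the pale cast of thought").split()
out = generate(seed, 10000, p_reset=0.1, p_mutate=0.1, rng=random.Random(1))
print(" ".join(out[:100]))
print("...")
print(" ".join(out[-100:]))
```

Because the details differ, this sketch will not reproduce the listings below word for word, but it should show the same qualitative behavior when the two parameters are varied.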
seed = ['the', 'native', 'hue', 'of', 'resolution', 'is', 'sicklied', 'over', 'with', 'the', 'pale', 'cast', 'of', 'thought']
=== N = 10000 p_reset = 0.100 p_mutate = 0.100 ===
resolution is sicklied over sicklied over with the pale cast the
native hue oj resolution i sicklied over wigh the rpale resolution i
sicklied over wigh the rpale sicklied over wigh oj resolution i
sicklied over wigh the rpale resolution i sicklied over wigh the
rpale is wigh hue oj resolution e sicklied over wigh the rpale
resolution i sicklied oser wigh the rpale sicklied over wigh oj
resolution i sicklied over wigh the rpale resoluetion i sicklied
over wigh the rpale is wigh hue oj resolution e sicklied over wigh
the rpale resolution i sicklied oser wigh the cast
...
e sicklied over widh oser wigh lthe rpale sicklied over wigh oj oj
resolution be o sicklied over wigh rpane fesolution i sicklied orer
wih phe rpale resolutuona i sicklied is wygh the rpale is wigh wigh
dhe rpale wogh resolution i sicklied i siklied oser wih the cst the
native hue oj i sicklied sicklied over wtigh wiygh phe rpal wigh the
npale wygh hue o suckliied oer sicklied wigh resolution e sickliud
over sicklied ogver hue ij resoluion pavo cart rpale relolution i
sicknied ozer nwigh the the i sicklied thw pale the rale over wigh
el sicklied
Note that the first 100 generated words are essentially repetitions of fragments of the seed text. As the algorithm progresses, the first few words that happened to be copied become increasingly likely to be copied again, so that the text at first becomes even more repetitive. Then, as mutations accumulate, the output becomes random tosses of variations of those few lucky words.
=== N = 10000 p_reset = 0.100 p_mutate = 0.700 ===
resalution i sicklied over us sickhied over witj the pale cst ov hu
otf resolution is sicklyed ovepr wizth the plale casnt of thoufght
reasalution ogf ogf ogf oxf uxf uxj uxj uxj uxj uxj uxr uxr uxr wxr
pale dst ol hu otf renolution is sicklyid vepr wuzth he psale cesnt
lof thoufgh reasalutione ogf obf og yxf uxm uxj uxl uxmj us sickhiet
ovur wtj thu pyle kst oz fu ot resolutio is vicklyed vepr wizth the
dlale os sichied over witl ethe pane fcst af riesolution is sickvied
ovev wits the pale casat af thought resalutio e
...
dgsis snipkle seqr wuzp ie sily evnt jos ghumgh gl obj tult refti i
liihiut ujs abn ymy wisai ht ewnj ussvv utlve fhea vbmsg l
rsilultian wns yqr obq vqon wbes cdasa il movwen pask rejoltion em
rkebsoluta uh vklyeqe m casa vicklyep cvepr vckloed ghofbv
reasluione ogb qaj pan thoght rjaluio zwoj chus balbe cyat of exvwh
ij fe k resovutio ys vcklyed nepr oizkh ut reolta is evckloed evw
ovqery os tjhoyghd rvulutin oaje pnlae apesvtpb tmovgh asaluqone
uigf ebov oqqd caonat om lenoliotifo iin scklii sn if vcklyeq vwh iz
hu t resolutio is vcknyed vepr
Here, the high mutation probability eventually renders the seed irrelevant, and the output soon becomes a zero-order Markov text with word distribution defined by the mutation procedure. The output does not look like English at all, because the mutation procedure is not sufficiently complicated and tuned.
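One way to quantify how close a stretch of output is to a zero-order Markov text is the mutual information between adjacent words, which is zero when the word pair distribution is exactly random x random. The sketch below uses the simple plug-in estimate; the function name is my own, and the estimate is biased upward when the sample is small relative to the vocabulary, so it is best used to compare stretches of equal length.

```python
import math
from collections import Counter

def adjacent_mutual_information(words):
    """Plug-in estimate, in bits, of the mutual information between
    each word and the word that follows it."""
    pairs = list(zip(words, words[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    first = Counter(x for x, _ in pairs)
    second = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (first[x] * second[y]))
    return mi

# A strictly alternating text is fully predictable from the previous word.
print(adjacent_mutual_information(["a", "b"] * 50))
# A constant text carries no information from one word to the next.
print(adjacent_mutual_information(["a"] * 10))
```

Comparing the value for the first and last few thousand words of a run would show whether the copying structure survives or washes out.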
=== N = 10000 p_reset = 0.700 p_mutate = 0.100 ===
witf cast o is resolution us resolution of thought o is resolution
us resolution tought casp u witf is of resolution the native native
native native native native native witf cast nativi is resolution
resolution resolution us thought o resolution native natijve native
resolution tought native native native pale resolution native natine
witf resolution o is of native palw cast o is resolution of o is
resolution nativi native cast us thought ir native witf cysp u witf
hue o thought ntivi resolution resolution naytive natijve cast cast
o is resolution resolutigon native resolugtion cast native witf cysp
native is
...
tought is is is resolution uf resolutigan natinde rslution witf
resolution xo native resolution native rsolution o o ih resolution
reslugtion is resoltion is resomution native is witf is ovwr of with
us witf thougmt resolution cysp hought witf o is resolution witf
mytie resolugion resolution native cast sicklied native us
resolution natuve native palw o sicklied resolution resolution
native o resolution of u cast is onf hought sative cast with
resolution resolution native u native is natinde resolution ovwr
ntive resootion native thw is natinbe resolution is wvitf witf witf
u sicklied witf native natuve i witf reslugtion es
=== N = 10000 p_reset = 0.400 p_mutate = 0.400 ===
of the cast hue iwith rasolution pale cast thought ast hue iwith
sicklied pave caist of hue owith of native pae caszt thought ast
pive kaist of vue owith o owixh of native paw caszt thought art
owixh of nnative pae cuszt thoughb pav cast owith thought ast hue
caw caszt bhought art native of the ast pale cast vue owith cost
thought ast huw iwit rasolution pale cast thought asbt nnativi pae
owith w owaxh of native pai caszt thought art owixh cast thought
asbt nnativi owikh of cost dhought as huw iwib rasolution thoghb pav
nntivi o owixh
...
thouht cal pott thoubht raslution though caszt tfought ast sickiek
cist iwibv hyu thyukht casx oj o sibbkeg heua abx art oh fr tougrt
ieitv ceost dhooght thouhgt qule theuhg ptule hae itf buist huo
natie thdee totj fbough pawlea iu owdth uf rasolution paa cuszt
thoghb paywv cwst oiith cast oh piva casc vue dhgc ckaft ast
gthought thighb owiph wiph hyught of hue thoghct iwib tloghqa oh fr
of raoltaon casg cuszt s hlue iabt ast oj iwit ihf oth fr tougrt
kqahw iwith caszt iwtexh r tvough vue csq of eh oiith l huef vasg
sicglied pavek
All the best, --jorge