Koen G > 02-07-2025, 08:26 PM
(02-07-2025, 07:05 PM)quimqu Wrote: This also makes me wonder: are you suggesting that Voynichese might behave similarly because it encodes something like a “morphologically rich” or non-repetitive learned language — perhaps even analogous to medieval Latin?
quimqu > 02-07-2025, 08:46 PM
magnesium Wrote:
This could be interesting to test systematically with different kinds of ciphertexts. Are there classes of substitution ciphers, for instance, that produce readily decipherable ciphertexts which exhibit anomalously low, VMS-like GPT predictability? If a given type of cipher, encrypting a wide range of plaintexts, consistently appears to be more GPT-predictable than the VMS, then that kind of cipher probably isn't consistent with the VMS. But this method would probably be overkill: monoalphabetic substitution ciphers would likely show up as much more predictable in this analysis than the VMS, but we can rule out monoalphabetic substitution ciphers using much less computationally expensive techniques.
Hello! Check this!
I will work with the ciphers and the GPT.
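Magnesium's point about monoalphabetic substitution being cheap to rule out follows from the cipher's structure: it is a fixed one-to-one relabeling of the alphabet, so it preserves the plaintext's letter-frequency profile exactly, and classical frequency analysis suffices. A minimal sketch (the function names and seeding scheme here are illustrative, not anything from this thread):

```python
import random
import string

def make_substitution_key(seed=None):
    """Build a random monoalphabetic key: a one-to-one map over a-z."""
    rng = random.Random(seed)
    plain = list(string.ascii_lowercase)
    cipher = plain[:]
    rng.shuffle(cipher)
    return dict(zip(plain, cipher))

def encrypt(text, key):
    """Relabel letters through the key; everything else passes through."""
    return "".join(key.get(c, c) for c in text.lower())

# Because the relabeling is one-to-one, the ciphertext has exactly the
# same letter-frequency distribution as the plaintext (just with the
# labels permuted), which is why a GPT is overkill for this cipher class.
```

A frequency histogram of the output matches the plaintext's histogram up to relabeling, which is the statistical signature that rules this class out cheaply.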
(02-07-2025, 08:26 PM)Koen G Wrote: (02-07-2025, 07:05 PM)quimqu Wrote: This also makes me wonder: are you suggesting that Voynichese might behave similarly because it encodes something like a “morphologically rich” or non-repetitive learned language — perhaps even analogous to medieval Latin?
I don't think so. If you remove all potential suffixes from Voynichese, there's not enough left in terms of roots. At least that's my intuitive feeling about it. Maybe it's worth an experiment.
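A first pass at the experiment Koen G suggests could be as simple as stripping a candidate suffix list and counting the distinct residues that remain. A minimal sketch (the suffix inventory below is purely hypothetical, chosen for illustration, and is not a claim about Voynichese morphology):

```python
def strip_suffix(word, suffixes):
    """Remove the longest matching suffix, keeping at least one character."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return word[: -len(suf)]
    return word

# Purely hypothetical suffix list, for illustration only.
SUFFIXES = ["aiin", "ain", "dy", "in", "y"]

def root_inventory(words, suffixes=SUFFIXES):
    """Distinct residues ('roots') left after stripping suffixes."""
    return {strip_suffix(w, suffixes) for w in words}
```

If the inventory of distinct roots collapses to a small set relative to the vocabulary, that would support the intuition that there is "not enough left in terms of roots."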
quimqu > 02-07-2025, 08:55 PM
(02-07-2025, 07:20 PM)oshfdk Wrote: I just think that using a black box like GPT for analysis should be accompanied by a very strict definition of what exactly the inputs mean, how exactly we interpret the outputs, and why. Black boxes already introduce a lot of uncertainty by themselves; when we multiply uncertainties, weird things can happen.
Jorge_Stolfi > 02-07-2025, 11:10 PM
(02-07-2025, 04:10 PM)quimqu Wrote: I trained several nanoGPT models (roughly 1.1M parameters each) on corpora limited to 11,000 words each. The corpora included:
ReneZ > Yesterday, 01:30 AM
Jorge_Stolfi > Yesterday, 03:59 AM
(02-07-2025, 04:10 PM)quimqu Wrote: The GPT model is a type of neural network trained to predict the next token (e.g., word or character) in a sequence, based on the context of the previous ones.
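The next-token idea quimqu describes can be illustrated with a far simpler model than nanoGPT: a bigram counter that predicts the most frequent successor of the previous character. This is a toy stand-in, not the model used in the thread, but it shows concretely what "predict the next token from context" means:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """For each character, count which characters follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, context):
    """Greedy prediction: the most frequent successor of the last character."""
    last = context[-1]
    if last not in counts:
        return None
    return counts[last].most_common(1)[0][0]
```

A GPT does the same job with a learned, context-sensitive distribution over the whole preceding sequence instead of a single-character count table; its per-token loss is what the predictability comparisons in this thread measure.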
quimqu > Yesterday, 07:35 AM
(02-07-2025, 11:10 PM)Jorge_Stolfi Wrote: (02-07-2025, 04:10 PM)quimqu Wrote: I trained several nanoGPT models (roughly 1.1M parameters each) on corpora limited to 11,000 words each. The corpora included:
In case you want to try a wider sample, here are some texts that I collected a while ago:
stopsquark > Yesterday, 07:38 AM
stopsquark > Yesterday, 08:01 AM
quimqu > Yesterday, 08:07 AM
(Yesterday, 08:01 AM)stopsquark Wrote: If you're using a pretrained model, though, what it's pretrained on matters a lot.