dashstofsk > 25-06-2025, 04:46 PM
(25-06-2025, 03:04 PM)oshfdk Wrote: I think no matter what algorithm one can propose for randomly generating a lot of pseudo text, it's probably quite trivial to adapt this mechanism for actual encoding, just by adding a few constraints on the randomness.
(25-06-2025, 03:04 PM)oshfdk Wrote: I think it makes sense to test how this would work for non-European languages
dashstofsk > 25-06-2025, 08:25 PM
(25-06-2025, 05:32 PM)davidma Wrote: Could these be nulls? Or encoded space characters?
Jorge_Stolfi > 17-11-2025, 12:39 AM
(25-06-2025, 04:46 PM)dashstofsk Wrote: Also I don't know much about non-European languages.
ReneZ > 17-11-2025, 09:24 AM
JoJo_Jost > 4 hours ago
dashstofsk > 3 hours ago
JoJo_Jost > 3 hours ago
oshfdk > 1 hour ago
(4 hours ago)JoJo_Jost Wrote: But not enough: You can also test this: If you ignore the spaces in Latin and split at random points (keeping the same word-length distribution), the assumed dependency between word beginnings and ends sometimes even collapses to values lower than those in the VMS (and the same happens in MHD)! So if you assume that the tokens are not “words,” your whole neat theory falls apart.
So, according to your logic, would Latin without spaces be “even more artificial” than VMS? The test does not measure a property of the construction. It measures a property of the cutting.
Thus, your findings say something entirely different from what you claim. They demonstrate the conspicuousness of the VMS spaces, not the artificiality of the words.
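For anyone who wants to try the re-cutting test described above, here is a minimal sketch in Python. It makes several assumptions that are not from the thread: the "dependency between word beginnings and ends" is measured here as the mutual information between each token's first and last character, the random cut reuses a shuffled copy of the original word lengths, and the function names `resegment` and `first_last_mi` are only illustrative. The measure actually used in the posts above may well be different.

```python
import math
import random
from collections import Counter


def resegment(words, rng):
    """Drop the original spaces and re-cut the character stream.

    Token lengths are a shuffled copy of the original word lengths,
    so the word-length distribution is preserved exactly.
    """
    stream = "".join(words)
    lengths = [len(w) for w in words]
    rng.shuffle(lengths)
    tokens, pos = [], 0
    for n in lengths:
        tokens.append(stream[pos:pos + n])
        pos += n
    return tokens


def first_last_mi(tokens):
    """Mutual information in bits between first and last character.

    This is an assumed stand-in for the 'dependency' discussed above.
    """
    pairs = Counter((t[0], t[-1]) for t in tokens if t)
    total = sum(pairs.values())
    firsts, lasts = Counter(), Counter()
    for (f, l), c in pairs.items():
        firsts[f] += c
        lasts[l] += c
    mi = 0.0
    for (f, l), c in pairs.items():
        mi += (c / total) * math.log2(c * total / (firsts[f] * lasts[l]))
    return mi
```

To use it, split a transcription on whitespace, compute `first_last_mi(words)` for the original cutting, and compare it with `first_last_mi(resegment(words, random.Random(seed)))` over several seeds. Mutual information estimated this way is biased upward on small samples, so it probably only makes sense to compare texts of similar length.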