Aga Tentakulus > 16-01-2023, 01:30 AM
MarcoP > 16-01-2023, 01:10 PM
(15-01-2023, 04:56 AM)ReneZ Wrote: Hi Marco, thanks for this. I remember your earlier post, but had not thought of it in this context.
Let me try to understand.
In my two example texts (one derived from a meaningful plain text, the other from a fully scrambled text), the first step, finding likely function words, would lead to exactly the same result. The fact that the second is meaningless is not detected, because it still has some 'meaning' hidden deeply inside it.
It then depends on the next step, clustering word types, whether this meaninglessness can be detected. This would require that the method takes the distance between words into account. If it does not, it will still consider the scrambled text just as meaningful as the original one. I don't know the answer to this.
Smith and Witten Wrote:The relative size of the intersection of the first-order successors of two function words is a measure of how often the words are used in similar syntactic structures. Where two closed-class words share an unusually common structural usage, we assume that they are functionally similar.
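A rough Python sketch of the measure Smith and Witten describe (whitespace tokenization, the placeholder file name, and the normalisation by the smaller successor set are my own choices, not taken from their paper):

Code:
from collections import defaultdict

def successor_sets(tokens):
    # Map each word type to the set of types that immediately follow it.
    succ = defaultdict(set)
    for w, nxt in zip(tokens, tokens[1:]):
        succ[w].add(nxt)
    return succ

def successor_overlap(succ, w1, w2):
    # Relative size of the intersection of the first-order successor sets.
    s1, s2 = succ.get(w1, set()), succ.get(w2, set())
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / min(len(s1), len(s2))

tokens = open("corpus.txt").read().lower().split()
succ = successor_sets(tokens)
print(successor_overlap(succ, "the", "a"))  # high overlap suggests a similar syntactic role

Since the successor sets depend only on which word types occur immediately after each candidate function word, scrambling does change them; whether the clustering step actually exploits this ordering information is exactly the open question above.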
(15-01-2023, 04:56 AM)ReneZ Wrote: With respect to MATTR, the scrambling should be clearly visible in the result. Ideally the curve should be flat with some random noise on top. However, a "non-flat" MATTR does not indicate meaning, of course.
I really can't remember if this was tested at the time when MATTR was discussed here.
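For reference, MATTR is just the type-token ratio computed in a sliding window over the token stream; a minimal sketch (the window size and whitespace tokenization are arbitrary choices here):

Code:
def mattr_curve(tokens, window=500):
    # Type-token ratio at every window position (naive O(n*w) version).
    return [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]

tokens = open("text.txt").read().lower().split()
curve = mattr_curve(tokens)
# A scrambled text should give a roughly flat curve apart from noise;
# a meaningful text typically shows slower drifts driven by topic changes.
print(min(curve), max(curve), sum(curve) / len(curve))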
What the experiments presented at the conference show is that human-generated meaningless text does not appear random. This is not unexpected. Any test for 'meaning' should be able to distinguish human-generated meaningless text from computer-generated random text.
Koen G > 16-01-2023, 01:35 PM
(16-01-2023, 01:10 PM)MarcoP Wrote: This showed that scrambling increases the rate of reduplication in linguistic texts, but lowers it in the Voynich manuscript.
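A quick way to reproduce this kind of comparison is to count exact adjacent word repetitions before and after shuffling (a single shuffle and exact-match reduplication are simplifications; the original experiment may have been defined differently):

Code:
import random

def reduplication_rate(tokens):
    # Fraction of adjacent token pairs that are identical.
    pairs = list(zip(tokens, tokens[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

tokens = open("text.txt").read().split()
scrambled = tokens[:]
random.shuffle(scrambled)
print("original: ", reduplication_rate(tokens))
print("scrambled:", reduplication_rate(scrambled))

In a linguistic text exact adjacent repetition is rarer than chance, so shuffling raises the rate; the quoted result is that the Voynich text, with its well-known excess of reduplication, behaves the opposite way.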
MichelleL11 > 16-01-2023, 02:29 PM
degaskell > 16-01-2023, 05:37 PM
(16-01-2023, 02:29 PM)MichelleL11 Wrote: Another text to include, if possible, would be text produced using Torsten Timm’s algorithm to see how it holds up under this kind of analysis.
I would go further and suggest using multiple texts produced by a range of variations on Torsten's algorithm, in order to determine what kinds of structure can and can't be produced by self-citation as a class of methods, rather than just what structure is present in a single document generated by self-citation.
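To make the idea concrete, here is a toy self-citation generator. It is not Torsten Timm's actual algorithm; the window size, mutation probability, and mutation rule are placeholders for exactly the kind of parameters one would vary across such a family of generated texts:

Code:
import random
import string

def mutate(word, p=0.3):
    # With probability p, replace one character of the copied word.
    if word and random.random() < p:
        i = random.randrange(len(word))
        word = word[:i] + random.choice(string.ascii_lowercase) + word[i + 1:]
    return word

def self_cite(seed_words, length=1000, window=50):
    text = list(seed_words)
    while len(text) < length:
        source = random.choice(text[-window:])  # "cite" a recently written word
        text.append(mutate(source))
    return text

print(" ".join(self_cite(["daiin", "chedy", "qokeey"], length=60)))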
MarcoP > 16-01-2023, 07:01 PM
Gaskell and Bowern Wrote: A more significant limitation of this work is that, because of the short length of our text samples, we are unable to test whether gibberish can replicate the larger structural features (such as “topic words”) which have been observed in the VMS [5–7]. At present, these features pose a serious challenge to proponents of the hoax hypothesis. However, while it is premature to assume that gibberish can replicate these features, it is equally premature to assume that it cannot; in theory, the properties of a scribe’s gibberish might drift considerably over the course of the weeks or months required to generate a VMS-length manuscript, introducing significant large-scale nonrandomness. If the scribe took breaks between sections, or only kept out material from the current section to reference when copying vocabulary, further spatial patterns might arise. Insofar as possible, our results appear consistent with this hypothesis.
Quote: Compared to meaningful texts, gibberish has lower mean information content (compression); lower mean conditional character entropy (entropy); higher mean occurrences of repeated characters and words (repeated_chars, repeated_words); higher mean bias in where characters appear in a line (charbias_mean) and where words appear in a 200-word section (wordbias_mean); higher mean autocorrelation of word lengths (wordlen_autocorr; see below)
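Two of these features, as I would read the names (the paper's exact definitions may differ): conditional character entropy over adjacent character pairs, and lag-1 autocorrelation of word lengths.

Code:
import math
from collections import Counter

def conditional_char_entropy(text):
    # H(next char | previous char), in bits, over adjacent character pairs.
    bigrams = Counter(zip(text, text[1:]))
    prev_counts = Counter(text[:-1])
    total = sum(bigrams.values())
    h = 0.0
    for (prev, _), n in bigrams.items():
        h -= (n / total) * math.log2(n / prev_counts[prev])
    return h

def wordlen_autocorr(tokens, lag=1):
    # Pearson correlation between word lengths and the same series shifted by lag.
    x = [len(w) for w in tokens]
    a, b = x[:-lag], x[lag:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((i - ma) * (j - mb) for i, j in zip(a, b))
    var = math.sqrt(sum((i - ma) ** 2 for i in a) * sum((j - mb) ** 2 for j in b))
    return cov / var if var else 0.0

text = open("text.txt").read()
print(conditional_char_entropy(text))
print(wordlen_autocorr(text.split()))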
Aga Tentakulus > 16-01-2023, 07:18 PM