(15-01-2023, 04:56 AM)ReneZ Wrote: Hi Marco, thanks for this. I remember your earlier post, but had not thought of it in this context.
Let me try to understand.
In my two example texts (one derived from a meaningful plain text, the other from a fully scrambled text), the first step: finding likely function words, would lead to exactly the same result. The fact that the second is meaningless is not detected, and that is due to the fact that it still has some 'meaning' hidden deeply inside it.
It then depends on the next step: clustering word types, whether this meaninglessness can be detected. This would require that the method is taking the distance between words into account. If it does not, it will still consider the scrambled text just as meaningful as the original one. I don't know the answer to this.
Hi Rene,
from what I understand of the paper, things are exactly as you say. There is a general problem that (like e.g. Sukhotin's algorithm) Smith and Witten's method was not designed to output a reliability measure, but in my opinion this is something that can be easily added to it.
The first step (identifying function words based on frequency) requires using multiple texts, since the intersection of the most frequent words from multiple sources is much more reliable (some content words will be frequent because of the specific context, e.g. "plant" in a herbal). If one considers the different Voynich sections as individual texts, the small intersection of the 1% most frequent word types could be sufficient to make the method fail early.
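To make the first step concrete, here is a minimal sketch of the intersection idea. It is my own illustration, not code from the paper: the function names, the `fraction` parameter and the toy texts are all assumptions.

```python
from collections import Counter

def top_types(tokens, fraction=0.01):
    """Return the most frequent word types (the top `fraction` of all types)."""
    counts = Counter(tokens)
    k = max(1, int(len(counts) * fraction))  # always keep at least one type
    return {word for word, _ in counts.most_common(k)}

def function_word_candidates(texts, fraction=0.01):
    """Keep only the types that are among the most frequent in every text."""
    return set.intersection(*(top_types(t, fraction) for t in texts))

# Toy texts (hypothetical): "plant" is frequent only because of the herbal context.
herbal = "the plant has the root and the leaf of the plant".split()
recipe = "take the water and the oil of the rose".split()

# A large fraction is used only because the toy texts are tiny.
print(function_word_candidates([herbal, recipe], fraction=0.3))
```

With the Voynich sections as `texts`, an empty or nearly empty result would be the early failure mentioned above.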
Coming to the scrambled vs non-scrambled example, as you say, problems show in the following step. Function words are further categorized according to the intersection between the sets of different words that can immediately follow (this is actually similar to Sukhotin's algorithm).
Smith and Witten Wrote:The relative size of the intersection of the first-order successors of two function words is a measure of how often the words are used in similar syntactic structures. Where two closed-class words share an unusually common structural usage, we assume that they are functionally similar.
The significance of the intersection is computed by comparing actual counts with expected counts "under the assumption of random sampling". In a randomly scrambled text, all variation will be due to statistical fluctuations and no significant clustering will be possible.
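One way to attach a reliability measure to this step is a shuffle test. The sketch below is not Smith and Witten's formula: I am assuming a Jaccard ratio for the "relative size of the intersection" and estimating the random-sampling expectation by scrambling the tokens. All names are hypothetical.

```python
import random
from collections import defaultdict

def successor_sets(tokens, words):
    """Map each word in `words` to the set of types that immediately follow it."""
    succ = defaultdict(set)
    for cur, nxt in zip(tokens, tokens[1:]):
        if cur in words:
            succ[cur].add(nxt)
    return succ

def overlap(tokens, a, b):
    """Relative size of the successor intersection of a and b (Jaccard, an assumption)."""
    succ = successor_sets(tokens, {a, b})
    union = succ[a] | succ[b]
    return len(succ[a] & succ[b]) / len(union) if union else 0.0

def overlap_z_score(tokens, a, b, n_shuffles=200, seed=0):
    """Compare the observed overlap with its distribution in scrambled copies."""
    rng = random.Random(seed)
    observed = overlap(tokens, a, b)
    shuffled = list(tokens)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        samples.append(overlap(shuffled, a, b))
    mean = sum(samples) / len(samples)
    std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
    return (observed - mean) / std if std > 0 else 0.0

tokens = "the cat saw a cat and the dog saw a dog".split()
print(overlap(tokens, "the", "a"), overlap_z_score(tokens, "the", "a"))
```

If the input is itself a scrambled text, the observed overlap is just one more draw from the shuffled distribution, so the score should hover around zero for every pair of function words.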
Though the method described in the paper is very simple (and certainly not robust), one could argue that this line of reasoning was expanded in continuous-bag-of-words approaches, where the N words immediately preceding and following (instead of the single following word) are used for modelling word types. In combination with neural networks, this led to Word2Vec and ultimately contributed to recent AIs like GPT.
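For comparison, this is roughly what the input to a continuous-bag-of-words model looks like: each word is paired with its N neighbours on either side. It is only a sketch of the windowing, with a hypothetical function name, and no neural network is involved.

```python
def context_windows(tokens, n=2):
    """Pair each token with up to n preceding and n following tokens."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - n):i]
        right = tokens[i + 1:i + 1 + n]
        pairs.append((left + right, target))
    return pairs

for context, target in context_windows("a b c d".split(), n=1):
    print(context, "->", target)
```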
(15-01-2023, 04:56 AM)ReneZ Wrote: With respect to MATTR, the scrambling should be clearly visible in the result. Ideally the curve should be flat with some random noise on top. However, a "non-flat" MATTR does not indicate meaning, of course.
I really can't remember if this was tested at the time when MATTR was discussed here.
What the experiments presented at the conference show is that human-generated meaningless text does not appear random. This is not unexpected. Any test for 'meaning' should be able to distinguish human-generated meaningless text from computer-generated random text.
I don't think MATTR on randomly scrambled text has been discussed much. That looks like an interesting line of investigation. A while ago, I posted about the effects of scrambling on full and partial reduplication.
This showed that scrambling increases the rate of reduplication in linguistic texts, but lowers it in the Voynich manuscript.
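For anyone who wants to try the scrambling comparison, here is a rough sketch of both measures. The assumptions are mine: the "curve" is taken to be the type-token ratio of each sliding window, the window length of 50 is arbitrary, and only full reduplication (the same word twice in a row) is counted, since partial reduplication needs a similarity criterion that is not spelled out here.

```python
import random

def window_ttrs(tokens, window=50):
    """Type-token ratio of every sliding window of `window` tokens."""
    if len(tokens) < window:
        raise ValueError("text is shorter than the window")
    return [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]

def mattr(tokens, window=50):
    """Moving-average type-token ratio: the mean of the window ratios."""
    ratios = window_ttrs(tokens, window)
    return sum(ratios) / len(ratios)

def full_reduplication_rate(tokens):
    """Fraction of adjacent token pairs in which the same word is repeated."""
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return repeats / (len(tokens) - 1)

def scrambled(tokens, seed=0):
    """Return a randomly shuffled copy of the token list."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out

# Hypothetical usage, with `tokens` holding the words of a transliteration:
# print(mattr(tokens), mattr(scrambled(tokens)))
# print(full_reduplication_rate(tokens), full_reduplication_rate(scrambled(tokens)))
```

Averaging over several seeds would give a better idea of how much of the difference is just noise.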
Another possibly related measure was discussed in Cárdenas et al. 2016, as mentioned elsewhere.