In Support of Guido Pérez's Suggestion - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: In Support of Guido Pérez's Suggestion (/thread-5213.html)

In Support of Guido Pérez's Suggestion - dashstofsk - 06-01-2026

I think we have been a bit hasty in dismissing his effort as 'slop'. What he seems to be saying is that if each paragraph were on a uniform topic, there would be similarity in the words: certain words would not be used randomly throughout the manuscript but would be closely congregated in places, and this would be apparent in a statistical comparison of successive lines. But his analysis seems not to show this. His conviction: "we would expect a higher overlap in function words or core thematic vocabulary", "VMS text behaves as if there was a 'reset' at almost every line break", "lines are very independent".

I cannot believe that any AI program can ever be clever enough to initiate such an effort and find ways to do the statistical analysis. I would very much like to see some more on this idea. It might turn out to be significant. I would like to encourage Guido to continue with his idea, but perhaps to present it in a different way.

RE: In Support of Guido Pérez's Suggestion - Koen G - 06-01-2026

I understand that there is a useful idea in there, but his paper was AI slop. Orphaned reference list, one of the references doesn't exist. The whole thing is illegible, and I doubt he understands it himself. Of course, the idea may be worth exploring and actually testing.

RE: In Support of Guido Pérez's Suggestion - Jorge_Stolfi - 06-01-2026

(06-01-2026, 02:34 PM) dashstofsk Wrote: I think we have been a bit hasty in dismissing his effort as 'slop'.

I agree. The paper does make sense, even though I think the details of the methodology are flawed and these flaws invalidate the conclusion. He made the faux pas of using LLMs to create the bibliography, and did not check whether the references even existed. I hope he fixes the flaws and tries again.
All the best, --stolfi

RE: In Support of Guido Pérez's Suggestion - tavie - 06-01-2026

There's nothing wrong with comparing word types in line pairs. What is unacceptable on this forum is using an LLM to do it and produce a slop paper.

If a person wishes to appeal the locking and moving of their thread, then under the [link hidden] (Rule 5), they need to PM the relevant moderator. The vast majority of the LLM slop theorists whose topics I lock do indeed follow this rule and PM me (there are so many that I have a separate inbox folder). But Koen and I don't lock threads unless we are extremely confident they contain LLM slop, so it's very unlikely we would reopen them.

People who have their thread locked and moved to the Slop Bucket are free to attempt to do all the work themselves, without any LLM involvement, and post a new thread. But if that new thread still contains LLM slop, it'll be locked and they will likely be banned.

RE: In Support of Guido Pérez's Suggestion - dashstofsk - 06-01-2026

(06-01-2026, 03:12 PM) Koen G Wrote: Orphaned reference list, one of the references doesn't exist. The whole thing is illegible

It might just be that he is a newcomer to writing academic papers and is in need of constructive advice. Aren't you being a bit too hard on him?

(06-01-2026, 03:12 PM) Koen G Wrote: I doubt he understands it himself.

I don't see this. He knows about Jaccard similarity, and knows enough to generate and present a table of continuity statistics by codicological section. I also believe he is correct when he wrote "any viable account of MS 408 must therefore explain not only token statistics, but the observed structural independence between diagrammatic units." He highlighted this statement as being particularly significant.
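
For readers unfamiliar with the Jaccard similarity mentioned above: for two sets of words A and B it is |A ∩ B| / |A ∪ B|. The following is only a minimal sketch of how a consecutive-line comparison could be computed, not the code used in the paper or by anyone in this thread. The function names and the toy lines are hypothetical, and treating both '.' and ',' as word separators is an assumption about the transliteration format.

```python
import re


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lines: define the index as 0
    return len(set_a & set_b) / len(union)


def mean_consecutive_jaccard(lines):
    """Mean Jaccard index over every pair of successive lines."""
    # Assumption: '.' and ',' both act as word separators.
    words = [[w for w in re.split(r"[.,]", line) if w] for line in lines]
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy EVA-like lines, invented for illustration only (not real data).
toy_lines = ["daiin.chol.shol", "chol.daiin.qokeedy", "okal.otar.shey"]
print(mean_consecutive_jaccard(toy_lines))
```

In the toy input the first pair shares two of four word types and the second pair shares none, so the mean reflects one similar pair and one independent pair.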

RE: In Support of Guido Pérez's Suggestion - dashstofsk - 06-01-2026

(06-01-2026, 03:16 PM) Jorge_Stolfi Wrote: I hope he fixes the flaws and tries again.

I hope so too. But unfortunately his experience of this forum and his treatment here have probably dented his enthusiasm. The danger is that a piece of new and potentially productive research might not now happen.

RE: In Support of Guido Pérez's Suggestion - Koen G - 06-01-2026

Then I hope he takes the opportunity to start over with a new thread.

RE: In Support of Guido Pérez's Suggestion - RobGea - 07-01-2026

guidoperez's [1] core idea seems interesting on the surface. The perspective (line similarity) was new, to me at least, and looking at things in a new way is almost always useful to some degree. I'm kinda torn on whether it reveals anything new or not, but hopefully guidoperez will try again.

Also, there is nothing to stop anyone doing something similar or replicating the study, as long as they give credit in an appropriate reference. That's what science is all about.

References
1. Pérez, G. J. (2026). Beyond Linguistic Decipherment: A Structural Analysis of the Voynich Manuscript as a Diagrammatic Information System (DIS) (2.0). Zenodo.

RE: In Support of Guido Pérez's Suggestion - nablator - 07-01-2026

(07-01-2026, 12:23 AM) RobGea Wrote: I'm kinda torn on whether it reveals anything new or not, but

I wanted to give guidoperez the benefit of the doubt. Now after checking a few things, I doubt there is anything salvageable in there. It's all either miscalculated or AI-hallucinated.

I checked the "J" statistics of the VMS: the inter-paragraph consecutive-line Jaccard index of word similarity, keeping all spaces ('.' and ','), with lines extracted from RF1b-er and paragraph starts and ends from ZL3a.
Global (all paragraphs): 0.032
Q13: 0.051
Q20: 0.029

Q13 is more repetitive, no surprise, we knew that from MATTR statistics.

Note: the C lines in the old interlinear file are missing a lot of pages; there is only one page of Q20 (f106v), so I used the complete RF basic EVA transliteration instead.

There is no single baseline value for Latin herbals. It can be higher or lower than that: both are possible, I checked. It depends: some medieval texts use "ad" a lot, along with other NLP "stop words" like "et" and "vel", others not so much.

The VMS doesn't have a short list of extremely common "stop words" that appear in almost every line. This is certainly true, and it is the only fact worth discussing.

RE: In Support of Guido Pérez's Suggestion - Jorge_Stolfi - 07-01-2026

(07-01-2026, 11:52 AM) nablator Wrote: I wanted to give guidoperez the benefit of the doubt. Now after checking a few things, I doubt there is anything salvageable in there. It's all either miscalculated or AI-hallucinated.

I think that the first paper is contaminated by LLM slop, and I agree that the way the statistics were computed is quite wrong in many ways, making the results meaningless. But the idea does not seem to be LLM hallucination.

The first flaw was defining "segment" as one line for the VMS, but as a paragraph or a recipe for the control texts. The J index should increase with segment size (token count), so it is important that all files have segments of similar sizes. Using the "C" transcription was another error.

Quote: The VMS doesn't have a short list of extremely common "stop words" that appear in almost every line, this is certainly true and it is the only fact worth discussing.

Indeed. Rather than trying to compensate for those common words, it would be more informative to list the most common words in the A ∩ B sets for each text.

All the best, --stolfi
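
The two suggestions in the last post (segments of similar size, and listing the words that fall in A ∩ B) could be sketched roughly as below. This is an illustration under stated assumptions, not anyone's actual method: the function names are hypothetical, the fixed segment size of 4 tokens is arbitrary, and the Latin-like token stream is invented rather than taken from a real herbal.

```python
from collections import Counter


def fixed_size_segments(tokens, size):
    """Cut a token stream into consecutive segments of exactly `size` tokens.

    A trailing partial segment is dropped, so every text compared this way
    has segments with the same token count.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def shared_word_counts(segments):
    """Count how often each word type falls in A ∩ B over successive segment pairs."""
    counts = Counter()
    for a, b in zip(segments, segments[1:]):
        counts.update(set(a) & set(b))
    return counts


# Toy Latin-like token stream, invented for illustration only.
tokens = (
    "recipe herbam et tere cum vino et da "
    "bibere et sanabitur ad dolorem capitis"
).split()
segments = fixed_size_segments(tokens, 4)

# The words that most often carry the overlap between successive segments.
print(shared_word_counts(segments).most_common(3))
```

Applying the same segment size to the VMS and to each control text would keep the comparison from being driven by segment length, and the resulting list shows directly which words are responsible for the overlap.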