Why is there even a Voynich B? - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Why is there even a Voynich B? (/thread-4975.html)
RE: Why is there even a Voynich B? - Jorge_Stolfi - 12-10-2025

(12-10-2025, 12:04 PM)quimqu Wrote: Frequent or function words, like “os”, “la”, or “que”, behave very similarly across all texts, so NMF spreads them over several topics.

Not sure I understand this point. But the words that I listed in my previous post occur exclusively in only one of the three "languages" (DC1+DC2, DC3, and DC4).

I suspect that the problem is that NMF had a large rate of mis-classification, so that (say) 10-20% of the DC4 parags were classified as topic 0 or 1, instead of topic 2. Is this the case?

Could you please list a couple of parags from DC3 that were so mis-classified? Maybe they are neither Portuguese nor Spanish, but just "Iberian"...

All the best, --jorge

RE: Why is there even a Voynich B? - quimqu - 12-10-2025

(12-10-2025, 01:44 PM)Jorge_Stolfi Wrote: Not sure I understand this point. But the words that I listed in my previous post occur exclusively in only one of the three "languages" (DC1+DC2, DC3, and DC4).

Those words only occur in one of the three language variants, that’s correct. The issue is that NMF doesn’t know that. It doesn’t look at which file a word comes from; it only sees a big table of word counts. When it decomposes that table, each topic is a weighted combination of all the words, and every word gets some weight in every topic, even if that word only appears in one language.

So “nao” might only appear in DC1–DC2, but because NMF works with continuous values, it can still give “nao” a small non-zero weight in another topic if that helps to approximate the overall structure of the data. This doesn’t mean the model thinks “nao” appears elsewhere, it just reflects how the math spreads variance across components.
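The decomposition described above can be sketched in a few lines. This is only a minimal illustration, not the code behind the results in this thread: the word-count table is invented toy data, the function names (`nmf`, `frob_error`) are made up for the example, and the fitting method (plain multiplicative updates, written in pure Python) is an assumption chosen for readability rather than whatever library was actually used.

```python
import random

# Toy word-count table: rows = paragraphs, columns = words.
# Invented for illustration only; it is NOT the corpus discussed in the thread.
WORDS = ["nao", "muito", "no", "mucho", "que"]
V = [
    [2, 1, 0, 0, 1],  # Portuguese-like paragraphs
    [1, 2, 0, 0, 1],
    [2, 2, 0, 0, 2],
    [0, 0, 2, 1, 1],  # Spanish-like paragraphs
    [0, 0, 1, 2, 1],
    [0, 0, 2, 2, 2],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius error between v and the product w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (n x m) into w (n x k) and h (k x m), all entries non-negative.

    w holds the topic weights per paragraph, h the word weights per topic.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start, so every word starts with weight in every topic.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


W, H = nmf(V, k=2)
for topic, row in enumerate(H):
    # Word weights per topic: "soft" values, not yes/no membership.
    print("topic", topic, {word: round(x, 3) for word, x in zip(WORDS, row)})
for i, row in enumerate(W):
    # A hard label only appears when we pick the largest topic weight ourselves.
    print("paragraph", i, "dominant topic:", row.index(max(row)))
```

Note that nothing in `nmf` knows which paragraph came from which file; the per-paragraph label is only obtained afterwards by taking the largest entry in each row of `W`, which is where an apparent mis-classification of a short paragraph can come from.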
NMF doesn’t make hard assignments like “this word belongs only to topic 1.” It produces soft, overlapping topics, so a word that is exclusive in reality can still have small weights in other topics for mathematical reasons. In fact, that's why I find it so interesting for the Voynich topic/language analysis: we don't know what the text means (if it means anything), but we can detect different constructions in it.

First, let me attach the same topic assignment plot that I attached in my previous post, but horizontally, so you can see that there are in fact not many paragraphs that are wrong. There are only 4 Portuguese paragraphs that are labelled as Spanish and one Spanish paragraph that is labelled as Portuguese. All the phonetic Portuguese paragraphs were labelled correctly.

Here are some lines that were assigned to the wrong language. I have marked the most heavily weighted words with colors for the topics in the paragraph. You will see words in up to three colors (a tie), in two colors, in a single color, or with no color at all (meaning the model does not take them into account).

If we go to the first sentence in Portuguese that is labelled as Spanish, we have:

Most words are weighted in different topics, but specifically "cosme", "tio", "tio cosme" and "como" make the Spanish topic weigh more.

The second one is:

In this case it is almost a tie, but the words weigh a bit more in the Spanish topic.

If we go to the Spanish paragraph labelled as Portuguese:

In this case, as in the previous one, the words weigh more for the Portuguese topic.

In general, you can see that the NMF has found the languages almost perfectly. I think now you can better understand my results in my [link] about the Voynich. If you have suggestions, doubts, or anything else that helps us advance on this topic, please feel free to ask.

RE: Why is there even a Voynich B? - Jorge_Stolfi - 12-10-2025

Thanks for the detailed reply!
So, what can we conclude by comparing these results to your previous one of the VMS? Does NMF find the difference between Voynichese A and B greater or smaller than that between Spanish and Portuguese? Or that between official and phonetic Portuguese? I mean, in terms of number of apparently mis-classified pages?

Those ambiguous parags that you listed are barely above the 5 word cut-off. I suppose that the NMF is more accurate when it works with bigger chunks of text. You analyzed the VMS at both the page and parag level, right? But the VMS parags are still rather large.

How would the NMF perform if you deleted from the PT+ES dataset all parags with fewer than (say) 30 words? Would it get a perfect score?

All the best, --jorge

RE: Why is there even a Voynich B? - quimqu - 12-10-2025

I will post news about the topics this week (hopefully tomorrow) in my thread. The topic models detect 3 different languages/dialects. I have also been working with subtopics of those topics (but this is a bit tricky).

RE: Why is there even a Voynich B? - Dunsel - 09-11-2025

(08-10-2025, 08:13 PM)Jorge_Stolfi Wrote: So my best three guesses for the A/B difference are that (1) the Author changed the spelling system between the two sets, or (2) the Author switched to another Dictator with a different dialect, or (3) the two sets come from two different source texts written in different dialects.

Have you considered that perhaps it was just one author spread over many years? Just thinking that if it is a constructed language, the author may have sat down and written part of the book, gone off to other things and come back to it, say, 10 years later. The language may have evolved into something slightly different over those years, which would explain a good bit.

BTW, pleasure to finally get to pick the brain of a legend.

Thanks.

RE: Why is there even a Voynich B? - Jorge_Stolfi - 10-11-2025

(09-11-2025, 09:57 PM)Dunsel Wrote: Have you considered that perhaps it was just one author spread over many years? Just thinking that if it is a constructed language

Well, but I believe it is not a constructed language, but a real one. So if it was just the passage of time, what would have changed is the Author's spelling system, which would be my item (1).

All the best, --stolfi

RE: Why is there even a Voynich B? - Dunsel - 10-11-2025

(10-11-2025, 12:17 AM)Jorge_Stolfi Wrote: (09-11-2025, 09:57 PM)Dunsel Wrote: Have you considered that perhaps it was just one author spread over many years? Just thinking that if it is a constructed language

Or, perhaps added more morphemes (forgive me if I'm not using the correct word there), or perhaps a better description is evolved, as the language developed, kinda like a normal language would change over the years?

I'm just picturing some old alchemist trying to jot down his notes in some language he created so he doesn't get branded a heretic for delving into naked women in pools of goo and magic. Perhaps he worked on the book for a while, went on to other documents, evolved the language and came back to it years later. House caught fire and that book was the only work he managed to salvage.

Just thinking out loud.

Thanks.

RE: Why is there even a Voynich B? - JoJo_Jost - 10-11-2025

I suspect that the basis of the Voynich manuscript was several notebooks on various topics belonging to a regionally known doctor, monk or alchemist, which he wrote in an extremely abbreviated form and ‘encrypted’ by writing some of it in mirror image (possibly a left-hander, no, not Leonardo) and also placing spaces randomly. This was copied after his death in the hope that someone might be able to decipher it at some point.
The copyists could not decipher the text and wrote down what they thought they saw. I explain the different ‘languages’ by the fact that a new transcriber simply interpreted the underlying shorthand differently in some places, precisely because he did not understand it.

I would be really interested to know whether this theory would fit with your outcomes?