Mauro > 29-11-2024, 08:12 PM
Mauro > 29-11-2024, 09:54 PM
(29-11-2024, 01:24 PM)nablator Wrote: (29-11-2024, 10:08 AM)Mauro Wrote: (voynichese.com transcription, words with 'rare' characters ('g', 'x', 'v', 'z' and 'c', 'h' appearing alone) excluded, 7700 total words remaining)
Voynichese.com uses the old Takeshi Takahashi transliteration (1995) without unclear characters (all '?' removed)...
Try [link] instead, it's better.
Mauro > 29-11-2024, 10:04 PM
(29-11-2024, 07:50 PM)nablator Wrote: (29-11-2024, 01:41 PM)Mauro Wrote: And surely I can try with the RF1a-n transcription. Just, can you point me to a link? Ideally it should be a single .txt file without any metadata or added remarks (that would save a lot of asinine work).
The link is in my post #13.
All separators converted to spaces, no metadata: ivtt -x7 RF1a-n.txt RF1a-n-x7.txt
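For readers without a C compiler at hand, the effect of that command (metadata dropped, every word separator turned into a plain space) can be loosely approximated in Python. This is only a sketch assuming IVTFF-style input lines such as `<f1r.P.1;H> fachys.ykal...`; the real ivtt tool handles many more options and edge cases:

```python
import re

def strip_ivtff(text):
    """Rough approximation of `ivtt -x7`: drop metadata and
    convert all word separators to plain spaces.
    (Assumes IVTFF-style input; not a replacement for ivtt.)"""
    words = []
    for line in text.splitlines():
        line = line.strip()
        # '#' lines are comments/metadata in IVTFF files
        if not line or line.startswith('#'):
            continue
        # drop the locus identifier, e.g. '<f1r.P.1;H>'
        line = re.sub(r'^<[^>]*>\s*', '', line)
        # drop inline comments in braces and inline '<...>' markup
        line = re.sub(r'\{[^}]*\}', '', line)
        line = re.sub(r'<[^>]*>', '', line)
        # '.' and ',' are word separators; '=' and '-' end a line
        line = re.sub(r'[.,=\-]', ' ', line)
        words.extend(line.split())
    return ' '.join(words)

sample = "<f1r.P.1;H>  fachys.ykal.ar.ataiin-{plant}shol"
print(strip_ivtff(sample))  # fachys ykal ar ataiin shol
```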
oshfdk > 30-11-2024, 03:42 AM
(29-11-2024, 07:36 PM)Mauro Wrote: I think you can find all the answers in my post #14. For an example of how a functional element could be identified, see the note in small characters at the end of the point "It's a syllabary".
Quote: It's a syllabary. I actually tried a little to investigate more, and I even found a reasonable way to convert fields to syllables, but then I realized the most frequent word in Voynichese (daiin) has 2 chunks, which would be two syllables, while in all the languages for which I have statistics I can trust (English, Italian, Spanish, Latin, Classic Greek, Koine Greek, (rather old) German, (rather old) French) the most frequent words, by far, all have only one syllable. On the upside, if it's a syllabary, the chances of finding a solution do not decrease much beyond the baseline (just divide by the number of all possible languages). The 'decoding' worked roughly like this: chunks such as 'aain' encode CV/VC/V syllables: the slots can be arranged to get two fields with about 3 bits of information, enough for vowels, and one field for a consonant (with ~14 possible consonants, which would be more or less enough for Latin, much less so for English). Chunks such as 'Cedy' would encode CVC syllables: it's possible to get a field for 5 vowels (but only in the first syllable; choices are limited to three after the first) and two fields for the consonants (but one of them is limited to about 9 choices).
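The field capacities mentioned in the quote can be sanity-checked with a little information-theory arithmetic. The inventory sizes below are the ones quoted (5 vowels, ~14 consonants, a 9-choice field), not fitted values:

```python
from math import log2

# How many bits must a slot-field carry to select one item
# from each inventory mentioned in the quote?
vowels = 5       # a e i o u (Latin-like inventory)
consonants = 14  # the ~14 choices mentioned above

print(f"vowel field:     {log2(vowels):.2f} bits")      # ~2.32
print(f"consonant field: {log2(consonants):.2f} bits")  # ~3.81
# A field carrying about 3 bits (8 distinct values) covers the
# vowels with room to spare; a field limited to 9 choices carries
# log2(9) ~ 3.17 bits, short of English's larger consonant set.
print(f"9-choice field:  {log2(9):.2f} bits")
```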
Mauro > 30-11-2024, 10:03 AM
(30-11-2024, 03:42 AM)oshfdk Wrote: (29-11-2024, 07:36 PM)Mauro Wrote: I think you can find all the answers in my post #14. For an example of how a functional element could be identified, see the note in small characters at the end of the point "It's a syllabary".
It's exactly the analysis in post #14 that I have trouble understanding. I still have a very poor comprehension of how a grammar relates to potential features of the text. If I understand this correctly, any list of words can be approximated (?) via any of multiple grammars that are incompatible with each other (in the sense that they can generate quite different sets of strings). As far as I can see, almost all proposed grammars generate a vastly larger number of word types than are actually contained in the manuscript. In this case, I suppose, if it's possible to compute the overlap between all the strings predicted by grammar A and all the strings predicted by grammar B, this overlap would probably represent a very tiny fraction of both spaces. Does this mean that either grammar is likely a very poor model of the underlying text?
[Edit] My understanding was that grammars for Voynichese were originally meant just to show that there is some structure to the words. For this, grammars serve as a good tool. However, unless there is a very tight grammar that, say, covers 95% of the text and produces less than 10x the number of known word types....
(30-11-2024, 03:42 AM)oshfdk Wrote: ..... I have little understanding of what we can learn by further refining these loose grammars.
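The overlap computation described above is easy to sketch for small slot grammars, since each one generates a finite set of strings. The two grammars below are invented toys for illustration only, not proposed models of Voynichese:

```python
from itertools import product

def generate(slots):
    """Enumerate every string a simple slot grammar can produce.
    Each slot is a list of alternatives; '' means the slot is empty."""
    return {''.join(p) for p in product(*slots)}

# Two toy (invented) slot grammars, each generating 36 word types
grammar_a = [['', 'q'], ['o', 'a'], ['k', 't', 'l'], ['y', 'dy', 'aiin']]
grammar_b = [['', 'o'], ['q', 'a', ''], ['k', 't'], ['y', 'edy', 'aiin']]

words_a = generate(grammar_a)
words_b = generate(grammar_b)
overlap = words_a & words_b

print(len(words_a), len(words_b), len(overlap))
print(f"overlap fraction of A: {len(overlap) / len(words_a):.2f}")
```

Even for these two deliberately similar toys the shared strings are a minority of either space; for realistic grammars generating millions of strings the same set-intersection idea applies, though enumeration may need to be replaced by sampling.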
oshfdk > 30-11-2024, 10:57 AM
(30-11-2024, 10:03 AM)Mauro Wrote: Let me re-work the 'syllabary hypothesis' for my example, I hope it will be enough (and the example does not at all imply I want to propose it as a possible actual solution!).
(30-11-2024, 10:03 AM)Mauro Wrote: But I wouldn't say there is nothing to gain in exploring the grammars path: after all, Voynich words obviously have a (weird) structure, which probably has some function in the manuscript (or, in case it's a meaningless text, is some consequence of the pseudo-random method used to generate the gibberish). Even just finding a class of grammars where we cannot say exactly which one is right, but can be confident that one of them is probably correct, would be a step forward.
Mauro > 30-11-2024, 11:03 AM
Mauro > 30-11-2024, 12:07 PM
(30-11-2024, 10:57 AM)oshfdk Wrote: (30-11-2024, 10:03 AM)Mauro Wrote: Let me re-work the 'syllabary hypothesis' for my example, I hope it will be enough (and the example does not at all imply I want to propose it as a possible actual solution!).
Thank you for explaining this further. If I understand it right, even if we stumble upon the correct grammar, it looks like we don't have any way to tell it apart from incorrect grammars with similar loose predictive power (that is, a huge number of possible strings).
(30-11-2024, 10:57 AM)oshfdk Wrote: I don't know how you identify these grammars. If by some sort of machine learning process, maybe it could be possible to split the words into training and validation subsets? To check whether the grammar learned on a subset of words is as efficient for the rest of the text.
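The held-out check suggested in that quote could look something like the sketch below. Everything here is a stand-in: the word list is a tiny toy sample, and the "learner" just memorises its training words, which is precisely the overfitting failure a validation split is meant to expose:

```python
import random

def coverage(grammar_words, observed_words):
    """Fraction of observed word types that the grammar generates."""
    hits = sum(1 for w in observed_words if w in grammar_words)
    return hits / len(observed_words)

# Toy word list (stand-in for the manuscript's word types)
words = ['daiin', 'ol', 'chedy', 'aiin', 'shedy', 'chol', 'or', 'ar',
         'qokeedy', 'qokedy', 'okaiin', 'dar', 'shey', 'okal', 'dal']
random.seed(0)
random.shuffle(words)
split = int(0.8 * len(words))
train, held_out = words[:split], words[split:]

# A real learner would induce a slot grammar from `train`; here the
# "grammar" simply memorises the training set.
learned = set(train)
print(f"train coverage:    {coverage(learned, train):.2f}")     # 1.00
print(f"held-out coverage: {coverage(learned, held_out):.2f}")  # 0.00
```

A grammar that generalises would keep most of its coverage on the held-out words; a memorising one collapses to zero, as above.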
(30-11-2024, 10:03 AM)Mauro Wrote: But I wouldn't say there is nothing to gain in exploring the grammars path: after all, Voynich words obviously have a (weird) structure, which probably has some function in the manuscript (or, in case it's a meaningless text, is some consequence of the pseudo-random method used to generate the gibberish). Even just finding a class of grammars where we cannot say exactly which one is right, but can be confident that one of them is probably correct, would be a step forward.
(30-11-2024, 10:57 AM)oshfdk Wrote: Let's say, hypothetically, we learn that the correct intended grammar for the whole text is A + B + C + D (we can't know this before we know the method used to create the text, so this is a pure thought experiment). Let's say that we know that A can be either 'q' or 'y' or null, etc. for B, C and D. I still don't understand how knowing all these facts would give us anything in terms of identifying the method used to produce the text. The only thing I can see is that we would be able to generate new conforming Voynichese words, with no clue about their meaning or function. Maybe we could try to guess something from the number of entries in each slot, but overall I just see no clear path from identifying the correct grammar (which could be impossible in the first place) to identifying the meaning or function.
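The A + B + C + D decomposition in that quote can be made concrete in a few lines. The slot contents below are invented for illustration (only slot A's 'q'/'y'/null comes from the quote); the point is just how quickly the space of conforming words outgrows a realistic lexicon:

```python
from itertools import product

# Hypothetical four-slot grammar; A is from the quote, B/C/D invented
A = ['', 'q', 'y']       # 'q', 'y', or null
B = ['o', 'a']
C = ['k', 't', 'd']
D = ['y', 'aiin', 'edy']

# Every conforming word is one choice per slot, concatenated
conforming = {''.join(w) for w in product(A, B, C, D)}
print(len(conforming))  # 3 * 2 * 3 * 3 = 54 distinct strings
print('qokaiin' in conforming)
```

Even this tiny toy generates 54 types from 11 slot entries; the loose published grammars for Voynichese scale this up to vastly more strings than the ~8,000 word types actually attested, which is exactly the over-generation problem discussed above.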
(30-11-2024, 10:57 AM)oshfdk Wrote: BTW, do you have any thoughts about the binomial distribution of word lengths, as identified by Stolfi (if I'm not mistaken)? Does it put any constraints on possible grammars?
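The binomial shape Stolfi observed does connect naturally to slot grammars: if a word has n slots and each slot is independently filled with probability p, word length follows a Binomial(n, p) distribution. The parameters below are illustrative round numbers, not values fitted to the manuscript:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Toy model: 9 slots, each filled independently with probability 0.5.
# A slot grammar of this kind predicts a binomial length distribution,
# which is one (weak) constraint a candidate grammar should satisfy.
n, p = 9, 0.5
for k in range(n + 1):
    print(f"length {k}: {binomial_pmf(k, n, p):.3f}")
```

The converse does not hold, of course: a binomial-looking length distribution is compatible with many generation methods, so it constrains rather than identifies the grammar.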
ReneZ > 01-12-2024, 04:23 AM
(29-11-2024, 08:45 AM)ReneZ Wrote: At first I was a bit confused as the 'first missing word' did not match mine, but I based the checks on the reference transliteration (RF-1a).
Mauro > 01-12-2024, 09:48 AM
(01-12-2024, 04:23 AM)ReneZ Wrote: (29-11-2024, 08:45 AM)ReneZ Wrote: At first I was a bit confused as the 'first missing word' did not match mine, but I based the checks on the reference transliteration (RF-1a).
It's taken me a few days to double-check this.
I had written that the first word not modeled by M. Zattera's grammar was the 150th-ranked 'choty'. This has 40 occurrences in RF-1a. However, I had indeed overlooked 'cheky', ranked 89th with 65 occurrences.
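The check being described (walk down the frequency-ranked word types and find the first one a grammar fails to generate) is straightforward to script. The text and the accept-function below are toy stand-ins, not RF-1a data or Zattera's actual grammar:

```python
from collections import Counter

def first_unmodeled(text_words, grammar_accepts):
    """Return (rank, word, count) of the highest-frequency word type
    the grammar does not generate, or None if all are covered."""
    ranked = Counter(text_words).most_common()
    for rank, (word, count) in enumerate(ranked, start=1):
        if not grammar_accepts(word):
            return rank, word, count
    return None

# Toy stand-ins: a tiny "text" and a "grammar" that rejects 'cheky'
text = ['daiin'] * 5 + ['cheky'] * 3 + ['choty'] * 2
accepts = lambda w: w != 'cheky'
print(first_unmodeled(text, accepts))  # (2, 'cheky', 3)
```

Run against the full RF-1a word list and a real grammar's accept test, this is exactly the kind of double-check that surfaces an overlooked word like 'cheky'.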
(01-12-2024, 04:23 AM)ReneZ Wrote: Of course you are welcome to use the term 'word chunks' if you think it fits your model.
(01-12-2024, 04:23 AM)ReneZ Wrote: I can recommend looking at the tool 'ivtt', which is available as a single C source file, with documentation.
It will eliminate all sorts of boring and time-consuming editorial work.
@nablator also used it to create the clean file for you.
Quote:ivtt -x7 RF1a-n.txt RF1a-n-x7.txt