• An attempt at extracting grammar from vord order statistics.
  • RE: An attempt at extracting grammar from vord order statistics.

    dashstofsk > 31-05-2025, 12:38 PM

    (31-05-2025, 07:25 AM) MarcoP Wrote: in our [link]

    The variability of character combinations across word breaks has puzzled me too. The tables and charts in your paper would be better if you created a matrix of affinities for the most frequent word prefixes and suffixes, something like the one I have made, which I hope you might like to see. It was generated for the Bio B2 pages.
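    A minimal Python sketch of such a cross-word-break affinity matrix. It assumes that "affinity" means the observed count of a (suffix, next prefix) pair divided by the count expected if suffix and prefix were independent, so values above 1 mean the pair is favoured. The function name, the fixed prefix/suffix length `k` and the `top` cutoff are illustrative choices, not the method behind the matrix mentioned above.

```python
from collections import Counter

def cross_break_affinity(words, k=1, top=5):
    """Observed/expected ratios for (suffix of word i, prefix of word i+1).

    `words` is a flat list of transliterated words in reading order
    (line and paragraph breaks are not treated specially here).
    Affinity = observed pair count / count expected under independence,
    an assumed definition chosen for illustration.
    """
    # Pair the last k characters of each word with the first k of the next word.
    pairs = [(a[-k:], b[:k]) for a, b in zip(words, words[1:])]
    pair_counts = Counter(pairs)
    suffix_counts = Counter(s for s, _ in pairs)
    prefix_counts = Counter(p for _, p in pairs)
    n = len(pairs)

    # Keep only the most frequent suffixes (rows) and prefixes (columns).
    suffixes = [s for s, _ in suffix_counts.most_common(top)]
    prefixes = [p for p, _ in prefix_counts.most_common(top)]
    matrix = [
        [pair_counts[(s, p)] / (suffix_counts[s] * prefix_counts[p] / n) for p in prefixes]
        for s in suffixes
    ]
    return suffixes, prefixes, matrix
```

    With EVA words, `k=2` would compare two-character endings and beginnings; which lengths are most informative is an open choice.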
  • RE: An attempt at extracting grammar from vord order statistics.

    davidd > 31-05-2025, 12:49 PM

    (30-05-2025, 05:09 PM) MarcoP Wrote:
    (27-05-2025, 11:04 PM) Rafal Wrote: I have a question, Davidd. Do you have any simple statistical measure that would tell us how "good" these groups are? Something like a correlation coefficient or some measure of total variance explained?

    This definitely requires more statistical knowledge than I have; Rene or Nablator can certainly make better suggestions. My unreliable guess is that, as a simple measure of quality, one could compute the probability that a graph generates the text under examination. For instance, one could use the model [link] discussed in Brown et al., "Class-based n-gram models of natural language".



    The probability of generating word W_i, given that the preceding word is W_{i-1}, is the probability of W_i occurring as a member of its class C_i multiplied by the probability that class C_{i-1} (the class of W_{i-1}) is followed by class C_i (the arrows in Davidd’s graphs, assuming that the weights of the arrows leaving a class add up to 100%).

    One can compute such probabilities for all words W_1, ..., W_n and multiply them together to get the overall probability of the whole passage.

    I guess that Pr(W_i | C_i) is simply the number of W_i tokens divided by the total number of tokens assigned to C_i.

    This system appears to be simple enough, but I think it can only compare models based on the same number of classes and texts of identical length.
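    A minimal Python sketch of the class-based bigram score described above. It assumes each word belongs to exactly one class, supplied as a dictionary `word_class` (a hypothetical input standing in for Davidd's groups), estimates every probability by counting in the same passage, and sums logarithms instead of multiplying raw probabilities, which would underflow for a text of any real length. The first word is not scored, since it has no predecessor.

```python
import math
from collections import Counter

def class_bigram_log_prob(words, word_class):
    """Log-probability of a passage under a class-based bigram model.

    log Pr ~= sum over i >= 2 of log Pr(W_i | C_i) + log Pr(C_i | C_{i-1}).
    `word_class` maps every word to one class label (hypothetical input).
    Probabilities are counted from the same passage; the first word is not scored.
    """
    classes = [word_class[w] for w in words]
    word_counts = Counter(words)
    class_counts = Counter(classes)
    # Class-to-class transitions: the arrows in the graph.
    transitions = Counter(zip(classes, classes[1:]))
    # Number of arrows leaving each class, so each class's weights sum to 1.
    outgoing = Counter(classes[:-1])

    log_p = 0.0
    for i in range(1, len(words)):
        prev_c, c, w = classes[i - 1], classes[i], words[i]
        p_word = word_counts[w] / class_counts[c]              # Pr(W_i | C_i)
        p_trans = transitions[(prev_c, c)] / outgoing[prev_c]  # Pr(C_i | C_{i-1})
        log_p += math.log(p_word) + math.log(p_trans)
    return log_p
```

    One common adjustment, not part of the suggestion above, is to divide the result by `len(words) - 1` to get an average log-probability per predicted word, which makes passages of different lengths easier to compare. It does not settle the number-of-classes problem: a model with more classes will generally fit the text it was counted from better.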

    The standard deviation of a binomial distribution scales with the square root of the sample size.
    Imagine you have a six-sided die. How many throws do you need to test whether it is fair? Mathematically you can never be sure it is totally fair, but you can put upper bounds on how unfair it is, i.e. on the maximum deviations of the odds. If you throw it 60 times, you have less information than if you throw it 600 times. When statisticians say that the expected count for each face in 600 throws is 100, they of course don't mean that they literally expect exactly 100 throws of each value, only something close to 100.
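    A small worked example of the point about sample size, using only the standard library. For a fair die, the count of any one face in n throws is binomial with p = 1/6, so its standard deviation is sqrt(n·p·(1−p)). The absolute spread grows with n, but the spread relative to the expected count shrinks as 1/sqrt(n). The function name is illustrative.

```python
import math

def face_count_spread(throws, p=1 / 6):
    """Expected count, SD and relative SD for one face of a fair die."""
    expected = throws * p
    sd = math.sqrt(throws * p * (1 - p))  # binomial standard deviation
    return expected, sd, sd / expected

for n in (60, 600):
    expected, sd, rel = face_count_spread(n)
    print(f"{n} throws: expect {expected:.0f} per face, SD {sd:.2f} ({rel:.1%} of expected)")
# 60 throws: expect 10 per face, SD 2.89 (28.9% of expected)
# 600 throws: expect 100 per face, SD 9.13 (9.1% of expected)
```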

    My plan is to regard being in a group and transitioning to another group as throwing a die, and to calculate how fair that die is, i.e. how far the observations deviate from a "fair" die, assuming that the odds of transitioning to a group would normally depend only on the relative size of the group you're transitioning to.
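    One standard way to turn the "fair die" idea into a number is Pearson's chi-square goodness-of-fit statistic. The sketch below assumes, as described above, that the expected transition counts out of a group are proportional to the sizes of the target groups; the function and argument names are hypothetical. It returns the statistic and its degrees of freedom; a p-value can then be read from a chi-square table or, if SciPy is available, from `scipy.stats.chi2.sf(stat, dof)`. The test itself assumes independent transitions and is usually considered unreliable when expected counts fall below about 5.

```python
def transition_chi_square(observed, group_sizes):
    """Pearson chi-square statistic for the transitions out of one group.

    observed:    {target_group: number of observed transitions into it}
    group_sizes: {target_group: size of that group, assumed > 0}; under the
                 "fair die" hypothesis the transition odds are proportional to size.
    Returns (statistic, degrees_of_freedom).
    """
    total = sum(observed.values())
    size_total = sum(group_sizes.values())
    stat = 0.0
    for group, size in group_sizes.items():
        expected = total * size / size_total
        # Groups never transitioned into count as 0 observed.
        stat += (observed.get(group, 0) - expected) ** 2 / expected
    return stat, len(group_sizes) - 1
```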
  • RE: An attempt at extracting grammar from vord order statistics.

    dashstofsk > 31-05-2025, 01:29 PM

    (31-05-2025, 12:49 PM) davidd Wrote: how far those observations deviate from a "fair"

    Using statistical techniques on the VMS is not going to be easy. We all know that the text is full of oddities and irregularities. One being that the frequencies of line first words and line last words are different to the remainder of the text. Perhaps the authors also made mistakes. Perhaps, knowing that the text would be undecipherable, they did not feel it important to be correct. The VMS might just be a sloppy piece of work. If you are planning to use hypothesis-testing techniques to attach a significance level to some hypothesis, then these oddities will bias your calculations.
  • RE: An attempt at extracting grammar from vord order statistics.

    Rafal > 31-05-2025, 02:07 PM

    Quote: One being that the frequencies of line first words and line last words are different to the remainder of the text.

    My very vague intuition is that some real words may be built from several vords. Think of something like hyphenation, where a word is broken with a hyphen:

    [Image: hyphenation_1.png]

    Here the hyphen works as a sign that the word continues on the next line, so it appears mainly at the end of a line. Just like the "m" sign in the VM ;)

    Maybe vords only allow one to write down syllables, not long words? If that were the case, there would have to be some trick, some sign to mark that the next vord is a continuation of the previous vord and they together make a complete word in final language. Such hyphen signs would probably leave some patterns in the text.

    Just thinking aloud.
  • RE: An attempt at extracting grammar from vord order statistics.

    dashstofsk > 31-05-2025, 04:51 PM

    (31-05-2025, 02:07 PM) Rafal Wrote: and they together make a complete word in final language

    Using the GC transliteration, for language B the average length of line-first words is 0.4 greater than the overall average, while line-last words are shorter than average by only 0.12. The shortness of the last words can partly be explained by the authors wanting to finish a sentence in the remaining space. I gave some views on this in [link]. But it also seems unlikely to me that joining a line's last word to the next line's first word would result in many valid-looking words.
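    A rough Python sketch of both measurements, assuming the transliteration has already been parsed into lines of words (parsing the GC file format is not shown). "Valid-looking" is approximated here, as an assumption, by "occurs as a word elsewhere in the same text", which is stricter than a judgement about word structure. Line pairs that cross a paragraph break would ideally be excluded; this sketch does not do that.

```python
from statistics import mean

def line_position_stats(lines):
    """Line-initial/line-final word length versus the overall mean, plus a join test.

    lines: list of lines, each a list of transliterated words.
    Returns (first_minus_all, last_minus_all, joined_hits, joined_total).
    """
    all_words = [w for line in lines for w in line]
    overall = mean(len(w) for w in all_words)
    first = mean(len(line[0]) for line in lines if line)
    last = mean(len(line[-1]) for line in lines if line)

    # Join each line-final word to the next line's first word and check
    # whether the result occurs as an ordinary word in the text.
    vocab = set(all_words)
    joins = [a[-1] + b[0] for a, b in zip(lines, lines[1:]) if a and b]
    hits = sum(j in vocab for j in joins)
    return first - overall, last - overall, hits, len(joins)
```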

       
  • RE: An attempt at extracting grammar from vord order statistics.

    Bluetoes101 > 31-05-2025, 07:15 PM

    When I was looking at "what follows what", I found that EVA "m" follows the rules of a letter built from a "backslash": it behaves like "r" rather than "s", and like "l" rather than "y". That is to say, it follows either a preceding "backslash", or a "curve" followed by "o", which switches (transitions, modifies) the next glyph into a backslash.

    Apart from the strange double "m" I circled, you can see that all the highlighted examples follow this, and if you look at the "r" and "l" elsewhere in a paragraph, you will see they follow the same "rules". In my opinion this shows that "m" is not something outside the normal Voynich word "rules".
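    A rough way to check this "what follows what" behaviour numerically is to compare which character precedes each glyph. The sketch below counts predecessors within words and compares the resulting profiles with cosine similarity. It treats every EVA character separately and does not model the "backslash"/"curve" stroke classes, which would need a hand-made mapping; the function names are hypothetical.

```python
import math
from collections import Counter

def predecessor_profile(words, glyph):
    """Count the character immediately before each occurrence of `glyph` in words."""
    counts = Counter()
    for w in words:
        for i in range(1, len(w)):
            if w[i] == glyph:
                counts[w[i - 1]] += 1
    return counts

def cosine(p, q):
    """Cosine similarity of two count profiles: 1.0 = same shape, 0.0 = no overlap."""
    dot = sum(p[k] * q[k] for k in set(p) | set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```

    For example, `cosine(predecessor_profile(words, "m"), predecessor_profile(words, "r"))` could be set against the same score for "s", "l" and "y" to see whether "m" groups with "r" and "l" as described.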

    In the first example, if we remove "m" and treat it as "-", the word becomes "daiycheor", which wouldn't be typical because of the "iy" pairing. If we presume that "line start" has something weird going on and also remove "y", we get "daicheor", which would not be typical either because of the "ic" pairing. In fact, removing letters one at a time, you would have to go all the way to "r" ("dair") to get something typical. I can't see many examples of the others making something valid when "m" is treated as "-" either.
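    The "remove a glyph and see whether the result is typical" check can be approximated by looking up each adjacent character pair of the candidate word in bigram counts taken from the rest of the text. This is a sketch under the assumption that "typical" means "every bigram occurs at least `min_count` times"; the threshold and the function names are illustrative.

```python
from collections import Counter

def bigram_counts(words):
    """Count adjacent character pairs within words."""
    return Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))

def atypical_bigrams(candidate, counts, min_count=1):
    """Return the bigrams of `candidate` seen fewer than `min_count` times."""
    return [candidate[i:i + 2] for i in range(len(candidate) - 1)
            if counts[candidate[i:i + 2]] < min_count]
```

    With counts from a full transliteration, `atypical_bigrams("daiycheor", counts, min_count=5)` would be expected to flag "iy" if that pair is as rare as suggested above.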

    I have absolutely no clue why "m" gravitates to the right side of lines, though. If we are reading some sort of normal left-to-right writing of various things, it does seem that it must be something other than just a letter of the alphabet. It just doesn't feel to me like it joins words together, or that it falls outside the "what follows what" preferences.

    Edit: the circled "m" actually looks like "g" now that I look again.
  • RE: An attempt at extracting grammar from vord order statistics.

    R. Sale > 31-05-2025, 08:49 PM

    Looking at the Voynichese snippet in Post #46, line 7, the first two highlighted vords appear to be the same, except that the second vord has a macron added to its first glyph. What does this indicate about the use/purpose of this macron? Does it modify a particular glyph or create a new one?
  • RE: An attempt at extracting grammar from vord order statistics.

    dashstofsk > 31-05-2025, 09:30 PM

    (31-05-2025, 07:15 PM) Bluetoes101 Wrote: absolutely no clue why "m" gravitates to the right

    The image is from f3r. Something about this was discussed in [link], where it was suggested that EVA m and g are one and the same character.


    (31-05-2025, 08:49 PM) R. Sale Wrote: or create a new one?

    I am also inclined to think that the characters Sh and ch are not entirely different. If you look at the matrices of affinities in the quoted post, you will see that the values for character pairs beginning with Sh and with ch are much the same, which suggests a commonality.
  • RE: An attempt at extracting grammar from vord order statistics.

    Bluetoes101 > 31-05-2025, 11:28 PM

    (31-05-2025, 09:30 PM) dashstofsk Wrote:
    (31-05-2025, 07:15 PM) Bluetoes101 Wrote: absolutely no clue why "m" gravitates to the right

    The image is from f3r. Something about this was discussed in [link], where it was suggested that EVA m and g are one and the same character.

    It would be the simplest answer, but the more I've looked at the text, the less sure I am. If EVA "m/g" were always line-final, some fancy flourish would be fairly normal, I think; at least, I have seen it happen a fair bit in the limited number of manuscripts I have looked at. The issue is that this character being "line-end" is a bit of a myth: it seems to pop up mostly on the right side of the text, but it's not an ending flourish, and in the example I gave there are 3 of them in the same line. Part of me likes the idea of the flourish indicating "!" or "?" (or similar), but then you'd think we would have a version for most glyphs. Anyway, I think its right-leaning position suggests that whatever it is requires previous context, which might mean it is meaningful in some way beyond a fancy flourish... or not.
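    One way to quantify "mostly on the right side, but not strictly line-final" is to record the relative position of each word ending in m within its line, together with the share of those words that are actually the last word. A sketch, assuming lines are already parsed into lists of EVA words; counting a single-word line as position 1.0 is an arbitrary choice, and the function name is hypothetical.

```python
def glyph_line_positions(lines, suffix="m"):
    """Relative positions (0 = first word, 1 = last word) of words ending in `suffix`.

    Also returns the share of such words that are the final word of their line.
    """
    positions = []
    final = 0
    for line in lines:
        n = len(line)
        for i, w in enumerate(line):
            if w.endswith(suffix):
                positions.append(i / (n - 1) if n > 1 else 1.0)
                final += (i == n - 1)
    share_final = final / len(positions) if positions else 0.0
    return positions, share_final
```

    A histogram of `positions` would show whether the words cluster towards the right without being confined to the last slot, which is the distinction drawn above.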
  • RE: An attempt at extracting grammar from vord order statistics.

    MarcoP > 01-06-2025, 07:13 AM

    I guess that we are going off-topic with respect to Davidd’s research. My favorite hypothesis with respect to EVA:m is that it is an abbreviation. I have no conclusive evidence, but in my opinion there are hints pointing in that direction:

    1.
    The downward stroke added to EVA:r to produce EVA:m is similar to the downward stroke that turns the Latin letter ‘r’ into an abbreviation for ‘ris’. Examples from [link].

    2.
    If one removes the final -m from a Voynich word X-m and searches for words sharing the same root, the most common results are ‘X-iin’ or ‘X-in’; e.g. qokam matches qokaiin/qokain, dam matches daiin, etc.

    EVA_Q13
    qokam 5: qokain 162, qokal 104, qokaiin 87
    dam 4: daiin 76, dar 57, dal 51
    am 3: aiin 24, ar 15, al 14
    ram 3: raiin 12, rain 8, ral 7
    lom 3: lol 17, lor 10, lo 5

    EVA_Q20
    am 18: aiin 122, ar 111, al 84
    otam 16: otaiin 77, otain 52, otar 50
    qokam 12: qokaiin 118, qokain 100, qokal 39
    okam 8: okaiin 97, okain 67, okal 38
    dam 8: daiin 110, dain 42, dar 34


    3.
    Actual scribal abbreviations sometimes show a preference for the right-most part of lines (as scribes shortened words to squeeze them into the page). In 2016, I shared an example (from [link]) commenting on [link].
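    The root matching described in point 2 can be sketched as follows. This is a guess at the procedure, not MarcoP's documented method: a word is assumed to "share the root" of an X-m word if it starts with X and is a different word, and the top three matches are ranked by frequency, which gives lists of the same shape as the ones above. The function name is hypothetical.

```python
from collections import Counter

def root_matches(words, ending="m", top=3):
    """For each word ending in `ending`, list the most frequent words sharing its root.

    Returns {word: (count, [(match, count), ...])}.
    """
    ranked = Counter(words).most_common()
    results = {}
    for word, n in ranked:
        # Skip words without the ending, and the bare ending itself (empty root).
        if not word.endswith(ending) or len(word) <= len(ending):
            continue
        root = word[: -len(ending)]
        matches = [(v, c) for v, c in ranked if v != word and v.startswith(root)]
        results[word] = (n, matches[:top])
    return results
```

    Note that short roots such as "a" (from "am") match every word beginning with that letter, so those lists are less specific than the ones for longer roots like "qoka".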