dashstofsk > 31-05-2025, 12:38 PM
(31-05-2025, 07:25 AM)MarcoP Wrote: in our [link]
davidd > 31-05-2025, 12:49 PM
(30-05-2025, 05:09 PM)MarcoP Wrote: (27-05-2025, 11:04 PM)Rafal Wrote: I have a question, Davidd. Do you have any simple statistical measure that would tell us how "good" these groups are? Something like a correlation coefficient or some measure of total variance explained?
This definitely requires more statistical knowledge than I have; Rene or Nablator can certainly make better suggestions. My unreliable guess is that, as a simple measure of quality, one could compute the probability that a graph generates the text under examination. For instance, one could use [link], discussed in Brown et al., "Class-based n-gram models of natural language".
The probability of generating word Wi, given that the preceding word is Wi-1, is the product of two terms: the probability of Wi occurring as a member of its class Ci, and the probability that class Ci-1 (Wi-1's class) is followed by class Ci (the arrows in Davidd's graphs, assuming that the weights of the arrows going out of a class add up to 100%). That is, Pr(Wi|Wi-1) = Pr(Wi|Ci) × Pr(Ci|Ci-1).
One can compute such probabilities for all words W1...Wn and multiply all of them to get the overall probability for the whole passage.
I guess that Pr(Wi|Ci) is simply the number of Wi tokens divided by the number of all tokens assigned to Ci.
This system appears to be simple enough, but I think it can only compare models based on the same number of classes and texts of identical length.
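To make the calculation concrete, here is a minimal Python sketch of the class-based bigram score described above. It is only an illustration under stated assumptions: the function name `class_bigram_log_prob` and the `word_class` mapping are hypothetical, each word is assumed to belong to exactly one class, both probabilities are estimated by relative frequency from the same text being scored, and the first token is skipped because it has no predecessor. It sums logarithms instead of multiplying raw probabilities, since the product over a long passage would underflow.

```python
import math
from collections import Counter


def class_bigram_log_prob(tokens, word_class):
    """Log-probability of `tokens` under a class-based bigram model.

    Pr(Wi | Wi-1) = Pr(Wi | Ci) * Pr(Ci | Ci-1)

    `word_class` maps every word to exactly one class (an assumption).
    Both factors are relative frequencies taken from `tokens` itself.
    """
    classes = [word_class[w] for w in tokens]

    word_count = Counter(tokens)                      # tokens per word
    class_count = Counter(classes)                    # tokens per class
    trans_count = Counter(zip(classes, classes[1:]))  # class -> class arrows
    out_count = Counter(classes[:-1])                 # arrows leaving each class

    log_p = 0.0
    for i in range(1, len(tokens)):
        w, c, prev = tokens[i], classes[i], classes[i - 1]
        p_word = word_count[w] / class_count[c]           # Pr(Wi | Ci)
        p_trans = trans_count[(prev, c)] / out_count[prev]  # Pr(Ci | Ci-1)
        log_p += math.log(p_word) + math.log(p_trans)
    return log_p


# Toy example with made-up tokens and classes (not real transliteration data).
tokens = ["a", "b", "a", "c"]
word_class = {"a": "X", "b": "Y", "c": "Y"}
score = class_bigram_log_prob(tokens, word_class)
print(score, math.exp(score))
```

In the toy example, class X is always followed by Y and Y by X, so the transition terms are all 1, and the only uncertainty is choosing "b" or "c" within class Y (probability 1/2 each time).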
dashstofsk > 31-05-2025, 01:29 PM
(31-05-2025, 12:49 PM)davidd Wrote: how far those observations deviate from a "fair"
Rafal > 31-05-2025, 02:07 PM
Quote:One being that the frequencies of line first words and line last words are different to the remainder of the text.
![[Image: hyphenation_1.png]](https://www.atlantiswordprocessor.com/en/help/images/hyphenation_1.png)

dashstofsk > 31-05-2025, 04:51 PM
(31-05-2025, 02:07 PM)Rafal Wrote: and they together make a complete word in final language
Bluetoes101 > 31-05-2025, 07:15 PM
R. Sale > 31-05-2025, 08:49 PM
dashstofsk > 31-05-2025, 09:30 PM
(31-05-2025, 07:15 PM)Bluetoes101 Wrote: absolutely no clue why "m" gravitates to the right
(31-05-2025, 08:49 PM)R. Sale Wrote: or create a new one?
Bluetoes101 > 31-05-2025, 11:28 PM
(31-05-2025, 09:30 PM)dashstofsk Wrote: (31-05-2025, 07:15 PM)Bluetoes101 Wrote: absolutely no clue why "m" gravitates to the right
The image is from f3r. Something about this was discussed in [link], where it was suggested that m and r are one and the same character.
MarcoP > 01-06-2025, 07:13 AM