ReneZ > 25-12-2025, 12:11 AM
(24-12-2025, 03:13 PM)Jorge_Stolfi Wrote: Map all instances of k t p f K T P F to f. (Why not t? Because [link] would become t17v...)
Jorge_Stolfi > 25-12-2025, 01:23 AM
(25-12-2025, 12:11 AM)ReneZ Wrote: (24-12-2025, 03:13 PM)Jorge_Stolfi Wrote: Map all instances of k t p f K T P F to f. (Why not t? Because [link] would become t17v...)
You can use bitrans, which would know not to change the metadata in the file.
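A minimal sketch of the substitution discussed above, assuming the transliteration file wraps locators and other metadata in angle brackets (for example `<f17v.1>`), so that anything inside `<...>` must be left alone. The function name `merge_gallows` and the sample line are hypothetical. This is not a description of how bitrans works internally; it only illustrates how a replacement can skip the metadata.

```python
import re

# Gallows letters (both cases) all collapse to "f", as proposed in the post.
GALLOWS_TO_F = str.maketrans("ktpfKTPF", "f" * 8)

# Assumption: metadata (locators, inline comments) is wrapped in <...>.
TAG = re.compile(r"(<[^>]*>)")

def merge_gallows(line: str) -> str:
    """Map k t p f K T P F to f, leaving <...> spans untouched."""
    # With a capturing group, re.split keeps the matched tags in the result,
    # so the odd-indexed parts are exactly the <...> spans.
    parts = TAG.split(line)
    return "".join(
        part if i % 2 else part.translate(GALLOWS_TO_F)
        for i, part in enumerate(parts)
    )

print(merge_gallows("<f17v.1> pchodol.okeedy"))
```

Splitting on the tags first keeps the rule simple: the text between tags is translated and the tags themselves are copied through unchanged.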
kckluge > 28-12-2025, 09:30 AM
(23-12-2025, 11:08 AM)dashstofsk Wrote: Am I right in understanding that these PCA plots are computed just on character pair frequencies?
Quote:And that these plots are being used to judge how A-like or B-like certain pages are?
Quote:It seems to me that this might lead to wrong conclusions.
[...]
Surely you need to include some such additional measures alongside character pairs when deciding if groups of pages are related.
Jorge_Stolfi > 28-12-2025, 01:33 PM
(28-12-2025, 09:30 AM)kckluge Wrote: there is no continuous Bayesian likelihood "A-like" or "B-like" value, there's binary "in the clump on the left" or "in the clump on the right."
dashstofsk > 28-12-2025, 05:05 PM
(28-12-2025, 09:30 AM)kckluge Wrote: your using a different transcription alphabet
(28-12-2025, 09:30 AM)kckluge Wrote: relative frequencies of prefixes & suffixes, but didn't describe how they were defined
kckluge > 29-12-2025, 07:56 AM
(28-12-2025, 05:05 PM)dashstofsk Wrote: (28-12-2025, 09:30 AM)kckluge Wrote: your using a different transcription alphabet
I use the GC transliteration for my analysis work. I convert 101-C characters to ee. I used only paragraph text, including 'Pb' text, and excluded labels and radial or circular text.
(28-12-2025, 09:30 AM)kckluge Wrote: relative frequencies of prefixes & suffixes, but didn't describe how they were defined
The prefixes I used were the most frequent prefixes of words on the pages labelled as language B. A word such as okeedy would contribute 5 prefixes to the list: o, ok, oke, okee, okeed. There is no harm in adding the longer strings, since they can be expected to appear low in the frequency list. Similarly, the suffixes it contributes to the list would be y, dy, edy, eedy, keedy.
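The prefix and suffix lists described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the whole word is excluded from both lists (inferred from okeedy contributing 5 strings, not 6), and words are taken to be plain strings in whatever transliteration alphabet is in use.

```python
from collections import Counter

def proper_prefixes(word):
    """All leading substrings shorter than the word itself."""
    return [word[:i] for i in range(1, len(word))]

def proper_suffixes(word):
    """All trailing substrings shorter than the word, shortest first."""
    return [word[i:] for i in range(len(word) - 1, 0, -1)]

def affix_frequencies(words):
    """Count every proper prefix and suffix over a list of word tokens."""
    prefixes, suffixes = Counter(), Counter()
    for w in words:
        prefixes.update(proper_prefixes(w))
        suffixes.update(proper_suffixes(w))
    return prefixes, suffixes

print(proper_prefixes("okeedy"))
print(proper_suffixes("okeedy"))
```

Calling `affix_frequencies` on the words of the language B pages and then `most_common(n)` on each counter would give the kind of "most frequent" lists mentioned in the post; the cutoff `n` is not stated there.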
Other measures that might be useful to try: frequency of long words (6 or 7 GC-101 characters), frequency of words containing e or some other character, ratios of t to k, ratios of iin to in.
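A rough sketch of how these per-page measures might be computed from a list of word tokens. Several details are assumptions, since the post does not specify them: the function name `page_measures` is hypothetical, word length is counted in characters of the alphabet the words are given in, an `in` that is part of `iin` is not counted as a separate `in`, and a ratio with a zero denominator is returned as `None`.

```python
import re

def page_measures(words, long_lengths=(6, 7), marker="e"):
    """Return a dict of simple per-page statistics for a list of words."""
    n = len(words)
    text = " ".join(words)  # spaces keep matches from spanning two words

    def ratio(a, b):
        return a / b if b else None

    # Assumption: count "iin" and "in" only when not preceded by another "i",
    # so "daiin" counts once as "iin" and not at all as "in".
    iin_count = len(re.findall(r"(?<!i)iin", text))
    in_count = len(re.findall(r"(?<!i)in", text))

    return {
        "long_word_freq": ratio(sum(len(w) in long_lengths for w in words), n),
        "marker_word_freq": ratio(sum(marker in w for w in words), n),
        "t_to_k": ratio(text.count("t"), text.count("k")),
        "iin_to_in": ratio(iin_count, in_count),
    }

print(page_measures(["okeedy", "daiin", "dain", "otedy", "qokain"]))
```

Each value is a single number per page, so these could be placed alongside the character-pair frequencies as extra columns when comparing pages.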
In my opinion the PCA plots do have some limitations. I just wanted to highlight that they could lead to wrong conclusions. Correlation maps such as the ones I gave on [link] are more useful to me for visualising how closely pages relate to each other.
dashstofsk > 29-12-2025, 10:21 AM
(29-12-2025, 07:56 AM)kckluge Wrote: You may want to do more preprocessing of the v101 transcription -- Glen's intent in having (for example) '9' and '(' variants for EVA 'y', or '7' and '8' for EVA 'd' (or, by my count, 6 variants of EVA 'r') wasn't to claim those were actually different glyphs, he was just giving you the option.
nablator > 29-12-2025, 01:25 PM
dashstofsk > 29-12-2025, 03:03 PM