quimqu > 24-08-2025, 09:41 AM
(24-08-2025, 07:24 AM)obelus Wrote: a quantitative measure of effect size might help; conventional for these data would be "Cramér's V."
(24-08-2025, 07:24 AM)obelus Wrote: How was the text sample partitioned for classification? The number of text blocks tagged as paragraphs in the RF transliteration is less than 300.
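For readers who have not met Cramér's V: it rescales the Chi² statistic of a contingency table to a 0 to 1 range, V = sqrt(Chi² / (n * min(r-1, c-1))), so that association strength can be compared across tables of different size. A minimal sketch in plain Python follows. The function name and the example counts are hypothetical, not taken from the analysis in this thread, and the code assumes no row or column of the table is entirely zero.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table (list of rows of counts).

    Assumes every row and every column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson Chi² statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by sample size and the smaller table dimension minus one
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Made-up counts for illustration only (rows = topics, columns = hands)
example = [[30, 5], [4, 25]]
print(round(cramers_v(example), 3))
```

A value near 0 means topic and hand are close to independent; a value near 1 means each topic is almost exclusive to one hand.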
oshfdk > 24-08-2025, 01:23 PM
(23-08-2025, 11:28 PM)quimqu Wrote: This is a very interesting question.
I performed a Chi² test to examine the correlation between NMF-derived topics and scribal hands, separately for Currier languages A and B.
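To make the "separately for A and B" step concrete, here is a rough sketch of how the topic-by-hand counts might be tabulated per Currier language before any test is run. It assumes each text unit has already been given a language label, a dominant topic and a hand; the record layout, the function name and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict

def tables_by_language(records):
    """Build one topic x hand contingency table per Currier language.

    records: iterable of (language, topic, hand) tuples.
    Returns {language: (topics, hands, table)}, where table[i][j] is the
    number of text units with topics[i] written by hands[j].
    """
    counts = defaultdict(Counter)
    for language, topic, hand in records:
        counts[language][(topic, hand)] += 1

    result = {}
    for language, pair_counts in counts.items():
        topics = sorted({t for t, _ in pair_counts})
        hands = sorted({h for _, h in pair_counts})
        # Counter returns 0 for topic/hand pairs that never occur
        table = [[pair_counts[(t, h)] for h in hands] for t in topics]
        result[language] = (topics, hands, table)
    return result

# Made-up records for illustration only
records = [("A", 0, 1), ("A", 0, 1), ("A", 1, 2), ("B", 0, 2), ("B", 2, 3)]
for language, (topics, hands, table) in sorted(tables_by_language(records).items()):
    print(language, topics, hands, table)
```

Each resulting table can then be passed to whatever Chi² routine is in use, so the two languages never share a table.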
quimqu > 24-08-2025, 01:41 PM
(24-08-2025, 01:23 PM)oshfdk Wrote: this should be run from scratch, first identifying the topics using NMF separately for language A and language B. Just to avoid any possibility that the language information affects the outcome.
oshfdk > 24-08-2025, 03:05 PM
(24-08-2025, 01:41 PM)quimqu Wrote: If I follow your suggestion, I end up with topics 0 to N for language A and topics 0 to M for language B, but then I lose any potential connection between the two sets of topics. So, I feel this approach would fragment the analysis too much.
quimqu > 24-08-2025, 07:24 PM
(22-08-2025, 11:11 PM)magnesium Wrote: If you want to analyze a given Naibbe ciphertext as if it were a synthetic Voynich B, divide each ciphertext into four equal portions, aka each ~5000-5500 tokens long, and then subdivide from there. Each fourth will roughly correspond with one of the original plaintext sections. There are no exact equivalents to folios in these ciphertexts, but you could explore the statistical effect of smaller subdivisions by treating each ciphertext as if it were a corpus of N different documents each one roughly (total/N) tokens long, just as you have been doing with the various folios of the VMS.
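The subdivision described in the quote can be sketched as follows: cut the token sequence into four contiguous quarters, then cut each quarter into equally sized pseudo-documents. This is only an illustration under the assumption that the ciphertext is already a flat list of tokens; both function names and the `docs_per_quarter` parameter are hypothetical.

```python
def split_evenly(tokens, n_parts):
    """Split a list into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(tokens), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks absorb the remainder, one token each
        size = base + (1 if i < extra else 0)
        chunks.append(tokens[start:start + size])
        start += size
    return chunks

def pseudo_documents(tokens, docs_per_quarter):
    """Quarter the token list, then subdivide each quarter into pseudo-documents."""
    return [split_evenly(quarter, docs_per_quarter)
            for quarter in split_evenly(tokens, 4)]

# Toy input: 22 placeholder tokens instead of a real ciphertext
tokens = [f"tok{i}" for i in range(22)]
for quarter in pseudo_documents(tokens, 2):
    print([len(doc) for doc in quarter])
```

Keeping the chunks contiguous preserves token order within each pseudo-document, which matters if the topic model is meant to mimic a folio-by-folio analysis.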
quimqu > 25-08-2025, 09:27 PM