Aga Tentakulus > 15-12-2025, 09:31 AM
dashstofsk > 15-12-2025, 11:03 AM
(13-12-2025, 02:22 PM)LisaFaginDavis Wrote: 104v and 115r (conjoint) are more closely related than, say, 104v and 105r (consecutive).
dashstofsk > 15-12-2025, 11:49 AM
nablator > 15-12-2025, 02:14 PM
(15-12-2025, 11:49 AM)dashstofsk Wrote: Brighter colours show where there is better correlation.
The idea is to test various methods to find out what makes the studied property appear (a "very strong textual correlation across conjoint bifolia" in Q13 and Q20). LSA similarity is a black box: it does not help us understand exactly what matters and what does not. Is it word order or proximity? Sub-word tokens (BPE segmentation, or a prefix-stem-suffix decomposition by some grammar)? Glyph bigram/trigram statistics?
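As one concrete alternative to a black-box score, here is a minimal sketch of the glyph bigram option: count glyph n-grams per page and compare pages by cosine similarity. The function names and the toy word lists are hypothetical (EVA-like strings made up for illustration, not transliteration data), and counting n-grams only inside words is an assumption, not something established in this thread.

```python
from collections import Counter
from math import sqrt


def glyph_ngrams(words, n=2):
    """Count glyph n-grams inside each word (n-grams never span a word break)."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy pages, NOT real transliteration data.
page_a = "qokeedy shedy qokain chedy".split()
page_b = "qokeedy chedy okain shedy".split()
page_c = "daiin chol chor daiin".split()

sim_ab = cosine(glyph_ngrams(page_a), glyph_ngrams(page_b))
sim_ac = cosine(glyph_ngrams(page_a), glyph_ngrams(page_c))
print(f"a~b: {sim_ab:.3f}  a~c: {sim_ac:.3f}")
```

Changing `n` to 3, or replacing `glyph_ngrams` with whole-word counts, gives a quick way to see which level of the text carries the conjoint signal.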
dashstofsk > 15-12-2025, 03:09 PM
DG97EEB > 23-12-2025, 12:21 PM

dashstofsk > 23-12-2025, 01:41 PM
nablator > 23-12-2025, 01:53 PM
(23-12-2025, 12:21 PM)DG97EEB Wrote: Fascinating analysis.. Have you looked at Cosine similarity based purely on token commonality?
DG97EEB > 23-12-2025, 07:43 PM
(14-12-2025, 03:37 PM)nablator Wrote: (13-12-2025, 02:22 PM)LisaFaginDavis Wrote: In other words, 104v and 115r (conjoint) are more closely related than, say, 104v and 105r (consecutive).
Yes, but there are 4 exceptions in Q20 (4 of 10 versos are not more closely related to the conjoint recto than to the consecutive recto), so this is not a "very" strong textual correlation across conjoint bifolia. There is probably a better metric than the one I'm using. And, of course, the result would be different if the bifolia were reordered, so there may be an ordering that makes consecutive pages more closely related.
Comparison of the vocabulary (set of words) of Q20 pages using the RF1b-er transliteration (without '?' words, with all spaces kept, i.e. all '.' and all ','). The % is the [link not shown] coefficient.
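For anyone who wants to repeat this kind of vocabulary comparison, here is a rough sketch. The name of the coefficient linked in the post is not visible here, so the Jaccard index below is an assumption and may not be the one actually used. The page labels come from the thread, but the word sets are made up for illustration and are not taken from RF1b-er.

```python
def jaccard(a, b):
    """Jaccard index of two word sets: |A & B| / |A | B| (assumed coefficient)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conjoint_wins(vocab, verso, conjoint_recto, consecutive_recto):
    """True if the verso shares more vocabulary with its conjoint recto
    than with the consecutive recto."""
    return (jaccard(vocab[verso], vocab[conjoint_recto])
            > jaccard(vocab[verso], vocab[consecutive_recto]))


# Hypothetical vocabularies, NOT real page contents.
vocab = {
    "104v": {"qokeedy", "shedy", "chedy", "qokain"},
    "115r": {"qokeedy", "shedy", "okain", "chedy"},
    "105r": {"daiin", "chedy", "chol", "qokal"},
}

score = jaccard(vocab["104v"], vocab["115r"])
result = conjoint_wins(vocab, "104v", "115r", "105r")
print(f"104v~115r: {score:.0%}, conjoint closer: {result}")
```

Running `conjoint_wins` over every verso in a quire and counting the `False` results is one way to tally exceptions like the 4 out of 10 mentioned above.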