Dunsel > Yesterday, 01:02 AM
(Yesterday, 12:37 AM)oshfdk Wrote: If VMS was a cipher and different cipher tables (keys, mappings) were used for different folios, this behavior of ed would be easy to explain. Say, for some folios ed maps to plaintext ST, for other folios ed maps to plaintext GP, very different statistics. However, even under this scenario it's hard to explain why it's only ed that behaves like this.
Is it possible to probe trigrams/tetragrams and see whether any particular longer ngram that contains ed causes this effect?
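One way to run that probe, as a rough sketch only: count every within-word n-gram that contains ed, page by page, and compare the rates across pages. The `pages` dict (page id mapped to a space-separated transliteration string, one character per glyph) and both function names are assumptions made for illustration, not anything taken from the plots in this thread.

```python
from collections import Counter


def ngrams_containing(text, target, n):
    """Count within-word n-grams of length n that contain `target`."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            gram = word[i:i + n]
            if target in gram:
                counts[gram] += 1
    return counts


def per_page_ngram_rates(pages, target="ed", n=3):
    """Map page id -> {ngram: count / glyphs on that page}.

    `pages` is a hypothetical dict of page id -> transliteration string;
    spaces are treated as word breaks and are not counted as glyphs.
    """
    rates = {}
    for page_id, text in pages.items():
        glyphs = sum(len(word) for word in text.split())
        counts = ngrams_containing(text, target, n)
        rates[page_id] = (
            {gram: c / glyphs for gram, c in counts.items()} if glyphs else {}
        )
    return rates


if __name__ == "__main__":
    sample = {"pageA": "qokedy chedy daiin", "pageB": "daiin chol chor"}
    for page_id, page_rates in per_page_ngram_rates(sample).items():
        print(page_id, page_rates)
```

Calling it with `n=4` gives the tetragram version. If one longer n-gram carries the effect, its per-page rate should track the ed rate while the other n-grams containing ed stay flat.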
oshfdk > Yesterday, 01:07 AM
(Yesterday, 01:02 AM)Dunsel Wrote: Excellent question. You'd think that if ed exhibits that behavior, then edy would too. But it doesn't.
Fontanellean > Yesterday, 01:07 AM
(Yesterday, 12:29 AM)Dunsel Wrote: I understand what you're saying about bigram statistics being driven by common words and subject matter. That absolutely happens in natural languages. Your examples make sense in that context. But what I'm seeing here doesn’t look like normal topic drift. I'm not just saying that “ed is frequent.” I'm saying that a very high-frequency bigram simply disappears on many pages, and then appears heavily on others, within and across topics. That kind of page-level on/off behavior is what caught my attention.
If there’s a natural-language example where a very high-frequency digraph shows this kind of behavior, I’d genuinely be interested to see it. It would help clarify whether what we’re seeing here is unusual or not.
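For anyone who wants to test a natural-language text for the same on/off pattern, here is a minimal sketch under stated assumptions: `pages` is a hypothetical dict of page id to plain text, bigrams are counted within words only, and "off" simply means zero occurrences on the page. It is not the method behind the plots in this thread.

```python
def bigram_rate(text, bigram):
    """Fraction of within-word bigram positions on a page equal to `bigram`."""
    words = text.split()
    positions = sum(max(len(w) - 1, 0) for w in words)
    hits = sum(
        1
        for w in words
        for i in range(len(w) - 1)
        if w[i:i + 2] == bigram
    )
    return hits / positions if positions else 0.0


def on_off_summary(pages, bigram):
    """Return (per-page rates, fraction of pages where the bigram is absent)."""
    rates = {page_id: bigram_rate(text, bigram) for page_id, text in pages.items()}
    if not rates:
        return rates, 0.0
    absent = sum(1 for r in rates.values() if r == 0.0)
    return rates, absent / len(rates)


if __name__ == "__main__":
    sample = {"p1": "chedy qokedy", "p2": "chol daiin", "p3": "shedy"}
    rates, absent_fraction = on_off_summary(sample, "ed")
    print(rates, absent_fraction)
```

Running the same summary over a paginated natural-language text, with pages of comparable length, would show whether any of its high-frequency digraphs has a similar share of pages with zero occurrences.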
oshfdk > Yesterday, 01:11 AM
(Yesterday, 01:07 AM)Fontanellean Wrote: You don't even have to look at bigrams. Take an instruction manual with one page written in British English and another in Pinyin for Mandarin. "Z" will hardly appear at all on the former but will be all over the place in the latter.
Dunsel > Yesterday, 01:15 AM
(Yesterday, 01:07 AM)Fontanellean Wrote: (Yesterday, 12:29 AM)Dunsel Wrote: I understand what you're saying about bigram statistics being driven by common words and subject matter. That absolutely happens in natural languages. Your examples make sense in that context. But what I'm seeing here doesn’t look like normal topic drift. I'm not just saying that “ed is frequent.” I'm saying that a very high-frequency bigram simply disappears on many pages, and then appears heavily on others, within and across topics. That kind of page-level on/off behavior is what caught my attention.
If there’s a natural-language example where a very high-frequency digraph shows this kind of behavior, I’d genuinely be interested to see it. It would help clarify whether what we’re seeing here is unusual or not.
You don't even have to look at bigrams. Take an instruction manual with one page written in British English and another in Pinyin for Mandarin. "Z" will hardly appear at all on the former but will be all over the place in the latter.
Dunsel > Yesterday, 01:27 AM
(Yesterday, 01:07 AM)oshfdk Wrote: (Yesterday, 01:02 AM)Dunsel Wrote: Excellent question. You'd think that if ed exhibits that behavior, then edy would too. But it doesn't.
For clarity, could you show ed trigrams on the same graph as all other trigrams, just in a different color?
Given these charts are cut off at Y~0.5, I'm not sure how they relate to other trigrams.
oshfdk > Yesterday, 02:16 AM
Jorge_Stolfi > Yesterday, 03:10 AM
(Yesterday, 12:29 AM)Dunsel Wrote: If there’s a natural-language example where a very high-frequency digraph shows this kind of behavior, I’d genuinely be interested to see it. It would help clarify whether what we’re seeing here is unusual or not.
Dunsel > Yesterday, 03:55 AM
(Yesterday, 03:10 AM)Jorge_Stolfi Wrote: What is the average size of a page in your plots? Culpeper's herbal has on average ~3900 letters per "page".
All the best, --stolfi
Dunsel > Yesterday, 05:27 AM