Finally, here are the counts for crust-mantle-core (CMC) patterns that have a core (a gallows). This time I distinguish between simple gallows and platform gallows.
Again, the CMC pattern of a word is obtained by deleting all "O" elements {a} {o} {y},
and mapping the remaining elements to these classes:
- "Q" just {q}.
- "D" the /dealers/ {d} {l} {r} {s}.
- "X" the /benches/ {ch} {sh} {ee} with an optional 'e' suffix.
- "G" the /simple gallows/ {k} {t} {p} {f} with optional 'e' suffix.
- "H" the /platform gallows/ {cth} {ckh} {cph} {cfh} with optional 'e' or 'h' suffix.
- "N" the /codas/ {n} {in} {iin} {iiin} {m} {im} {iim} {iiim} {ir} {iir} {iiir}.
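The mapping above can be sketched in code. This is only an illustration, not the procedure actually used for the counts below: the names `build_elements` and `cmc_pattern` are made up here, the repeated {iir} in the coda list is read as {iiir}, and the tie-breaking (try the longest element first, backtrack if the rest of the word cannot be parsed) is an assumption. It also ignores the fractional weighting of tokens.

```python
def build_elements():
    """Map every element string from the class list to its class letter."""
    table = {"q": "Q"}
    for c in "aoy":
        table[c] = "O"                       # deleted from the pattern
    for c in "dlrs":
        table[c] = "D"                       # dealers
    for b in ("ch", "sh", "ee"):
        table[b] = table[b + "e"] = "X"      # benches, optional 'e' suffix
    for g in "ktpf":
        table[g] = table[g + "e"] = "G"      # simple gallows, optional 'e'
        for suffix in ("", "e", "h"):
            table["c" + g + "h" + suffix] = "H"   # platform gallows
    for k in range(4):
        table["i" * k + "n"] = table["i" * k + "m"] = "N"   # codas
    for k in range(1, 4):
        table["i" * k + "r"] = "N"           # assumes {ir} {iir} {iiir}
    return table

ELEMENTS = build_elements()
MAX_LEN = max(len(e) for e in ELEMENTS)

def cmc_pattern(word):
    """Return the CMC pattern of word, or None if it cannot be parsed."""
    def parse(pos):
        if pos == len(word):
            return ""
        # Assumed tie-break: longest element first, with backtracking.
        for n in range(min(MAX_LEN, len(word) - pos), 0, -1):
            cls = ELEMENTS.get(word[pos:pos + n])
            if cls is None:
                continue
            rest = parse(pos + n)
            if rest is not None:
                return ("" if cls == "O" else cls) + rest
        return None
    return parse(0)
```

With these assumptions, the illustrative strings `qokeedy`, `shckhy` and `daiin` come out as `QGXD`, `XH` and `DN`. The backtracking matters for `qokeedy`: taking {ke} greedily would leave a stray {e}, so the parse falls back to {k} followed by {ee}.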
In this modified CMC model, a valid word with core ("G" or "H") must have the form
Q^q D^d X^x (G|H) X^y D^e N^n
where q and n are each 0 or 1, and d+e and x+y are each in 0..3.
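The form above can be checked mechanically. The sketch below is one way to do it, with a hypothetical function name `is_valid_cmc`: a regular expression checks the order of the classes, and the two sum limits are checked separately on the captured groups.

```python
import re

# Order of classes: Q? D* X* (G|H) X* D* N?  (counts are checked below).
CMC_RE = re.compile(r"(Q?)(D*)(X*)([GH])(X*)(D*)(N?)")

def is_valid_cmc(pattern, max_d=3, max_x=3):
    """True if pattern has the form Q^q D^d X^x (G|H) X^y D^e N^n
    with d+e <= max_d and x+y <= max_x."""
    m = CMC_RE.fullmatch(pattern)
    if m is None:
        return False
    _, d, x, _, y, e, _ = m.groups()
    return len(d) + len(e) <= max_d and len(x) + len(y) <= max_x
```

For example, `is_valid_cmc("QGXD")` and `is_valid_cmc("GXXXD")` are true, while `is_valid_cmc("XXGXX")` fails the bench limit and `is_valid_cmc("DXDG")` fails the ordering.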
Here are the counts of the patterns with G core (left) and with H core (right):
COUNTING TOKENS WITH G AND H CORE BY CMC PATTERN
      tokens     frac  pattern        tokens     frac  pattern
14060.375000  1.00000  TOTAL     1487.250000  1.00000  TOTAL
 2008.125000  0.14282  GD         394.750000  0.26542  XH
 1538.750000  0.10944  GXD        375.250000  0.25231  HD
 1314.500000  0.09349  QGD        315.000000  0.21180  H
 1175.375000  0.08359  GN          76.750000  0.05161  XHD
 1025.000000  0.07290  GX          58.500000  0.03933  QH
  858.500000  0.06106  QGXD        55.000000  0.03698  HN
  813.875000  0.05788  QGN         41.000000  0.02757  QHD
  687.000000  0.04886  QGX         33.000000  0.02219  HX
  483.000000  0.03435  G           31.750000  0.02135  HDD
  417.625000  0.02970  XG          21.000000  0.01412  DXH
  410.750000  0.02921  QG          11.250000  0.00756  HDN
  274.625000  0.01953  XGD         10.000000  0.00672  DH
  255.500000  0.01817  GDD          8.000000  0.00538  HXD
  244.750000  0.01741  DGD          6.500000  0.00437  XXH
  240.875000  0.01713  DGXD         6.000000  0.00403  DHD
  234.625000  0.01669  DGN          5.500000  0.00370  HDDD
  201.000000  0.01430  XGX          5.000000  0.00336  XHN
  191.500000  0.01362  XGN          4.500000  0.00303  XHX
  189.000000  0.01344  GDN          4.000000  0.00269  QHX
  174.125000  0.01238  DGX          3.250000  0.00219  HDDN
  170.125000  0.01210  GXDD         3.000000  0.00202  QXH
  111.750000  0.00795  GXDN         2.000000  0.00134  DDH
  102.687500  0.00730  DG           2.000000  0.00134  QDH
  101.625000  0.00723  XGXD         2.000000  0.00134  QHDD
   75.000000  0.00533  XXG          2.000000  0.00134  QHN
   69.000000  0.00491  GXX          2.000000  0.00134  QHXD
   67.250000  0.00478  QGDD         1.500000  0.00101  DDXH
   62.000000  0.00441  GXN          1.000000  0.00067  DXHD
   39.500000  0.00281  GXXD         1.000000  0.00067  DXXH
   34.500000  0.00245  QGDN         1.000000  0.00067  DXXHD
   33.000000  0.00235  QGXDD        1.000000  0.00067  HXDD
   27.875000  0.00198  GDDN         1.000000  0.00067  HXN
   18.500000  0.00132  XGDD         1.000000  0.00067  XXHD
   18.375000  0.00131  DDGXD        0.250000  0.00017  XHDD
   16.875000  0.00120  DDGD         0.250000  0.00017  XHDDD
   16.875000  0.00120  DXG          0.250000  0.00017  XHDN
   16.750000  0.00119  QGXDN
   16.500000  0.00117  QGXN
   16.500000  0.00117  QGXX
   16.500000  0.00117  XXGX
   16.125000  0.00115  GDDD
   15.500000  0.00110  DDGN
   14.125000  0.00100  DGDD
   13.500000  0.00096  GXDDD
   13.000000  0.00092  QDGXD
   12.250000  0.00087  DDGX
   12.000000  0.00085  XGDN
   11.750000  0.00084  DGXX
   10.750000  0.00076  XXGD
    9.750000  0.00069  XXGN
    9.500000  0.00068  QGXXD
    9.250000  0.00066  DGDN
    9.125000  0.00065  DGXDD
    8.562500  0.00061  DDG
    8.500000  0.00060  QDGX
    8.000000  0.00057  QDGN
    7.500000  0.00053  DGXDN
    7.500000  0.00053  XGXX
    7.250000  0.00052  QDG
    7.000000  0.00050  XGXN
    6.750000  0.00048  DXGX
    6.500000  0.00046  DGXXD
    4.750000  0.00034  GXDDN
    4.625000  0.00033  DXGD
    4.500000  0.00032  DXGXD
    4.125000  0.00029  QDGD
    4.000000  0.00028  QGDDD
    3.750000  0.00027  DXGN
    3.000000  0.00021  DGXN
    3.000000  0.00021  DXXG
    3.000000  0.00021  XGXDD
    2.500000  0.00018  DXXGD
    2.000000  0.00014  GXXN
    2.000000  0.00014  QGDDN
    2.000000  0.00014  QGXXN
    1.625000  0.00012  DGDDN
    1.500000  0.00011  DDXXG
    1.500000  0.00011  QDDGXD
    1.500000  0.00011  QXG
    1.500000  0.00011  XXGXD
    1.250000  0.00009  XGXDN
    1.000000  0.00007  DXXGX
    1.000000  0.00007  GXXDN
    1.000000  0.00007  QGXDDD
    1.000000  0.00007  QXGN
    1.000000  0.00007  QXGXD
    1.000000  0.00007  QXXG
    1.000000  0.00007  XGDDN
    1.000000  0.00007  XGXDDN
    1.000000  0.00007  XGXXD
    0.500000  0.00004  DDDGN
    0.500000  0.00004  DDXGX
    0.500000  0.00004  GXXDD
    0.500000  0.00004  GXXDDD
    0.500000  0.00004  GXXX
    0.500000  0.00004  GXXXD
    0.500000  0.00004  QGXXDD
    0.500000  0.00004  XGDDD
    0.500000  0.00004  XXGXN
    0.250000  0.00002  DGXXDD
    0.250000  0.00002  GDDDN
    0.250000  0.00002  XXGDN
    0.125000  0.00001  DDGXDN
I don't know yet what to conclude from these numbers.
For either class of core, the formula above allows 2 x 10 x 10 x 2 = 400 possible patterns: 2 choices for q, 10 pairs (d,e) with d+e <= 3, 10 pairs (x,y) with x+y <= 3, and 2 choices for n. But only 103 "G" patterns occur in the parags text, and only 36 "H" patterns, even with fractional counting. Obviously some combinations of q,d,x,y,e,n are so rare that they could be excluded from the CMC model; but I don't see a simple rule yet.
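The count of 400 can be confirmed by brute-force enumeration. This is a small sketch with a made-up function name `enumerate_patterns`; it just generates every string allowed by the formula for a given core letter.

```python
def enumerate_patterns(core, max_d=3, max_x=3):
    """All strings Q^q D^d X^x core X^y D^e N^n allowed by the formula."""
    patterns = []
    for q in range(2):
        for n in range(2):
            for d in range(max_d + 1):
                for e in range(max_d + 1 - d):          # keeps d+e <= max_d
                    for x in range(max_x + 1):
                        for y in range(max_x + 1 - x):  # keeps x+y <= max_x
                            patterns.append(
                                "Q" * q + "D" * d + "X" * x + core
                                + "X" * y + "D" * e + "N" * n
                            )
    return patterns

g_patterns = enumerate_patterns("G")
h_patterns = enumerate_patterns("H")
```

Both lists have 400 distinct entries, so the 103 observed "G" patterns and 36 observed "H" patterns cover roughly a quarter and a tenth of the allowed space, respectively.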
One notable thing is that words with H core are not only ~1/10 as common as those with G core, but the distribution of H patterns also decays significantly faster. In particular, there is a large drop from the first three H patterns (each above 21% of the H tokens) to the fourth one (about 5%), whereas the G frequencies decline gradually.
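To put rough numbers on that comparison, here is a short sketch using only the totals and the four largest counts of each column from the table. The helper names `top3_share` and `drop_3_to_4` are made up for this illustration, and the two measures are just one possible way to quantify "decays faster".

```python
# Totals and the four most frequent pattern counts, copied from the table.
G_TOTAL, G_TOP4 = 14060.375, [2008.125, 1538.75, 1314.5, 1175.375]
H_TOTAL, H_TOP4 = 1487.25, [394.75, 375.25, 315.0, 76.75]

def top3_share(top, total):
    """Fraction of all tokens covered by the three most common patterns."""
    return sum(top[:3]) / total

def drop_3_to_4(top):
    """Ratio between the third and the fourth most common pattern counts."""
    return top[2] / top[3]

h_to_g_ratio = H_TOTAL / G_TOTAL
print(f"H/G token ratio:      {h_to_g_ratio:.3f}")
print(f"top-3 share, G vs H:  {top3_share(G_TOP4, G_TOTAL):.3f} "
      f"vs {top3_share(H_TOP4, H_TOTAL):.3f}")
print(f"3rd/4th ratio, G vs H: {drop_3_to_4(G_TOP4):.2f} "
      f"vs {drop_3_to_4(H_TOP4):.2f}")
```

The three most common H patterns account for about 73% of the H tokens, against about 35% for the three most common G patterns; the third H pattern is about four times as frequent as the fourth, while for G the ratio is close to 1.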
The most common "G" pattern with three benches "X" is XXGX, which occurs 16.5 times (~0.1% of all tokens with G core). There are no "H" patterns with three benches.
That may be just a consequence of "H" patterns being less common, but it is also consistent with the theory that an "H" element should be counted as one bench for the rule x+y <= 3.
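The bench counts behind this remark can be checked directly against the 36 "H" patterns in the table. In the sketch below, `bench_load` is a hypothetical helper that counts the "X" elements of a pattern and optionally counts the "H" core as one more bench. It only restates the observation; it does not decide between the two explanations.

```python
# The 36 H-core patterns from the right-hand column of the table.
H_PATTERNS = """
XH HD H XHD QH HN QHD HX HDD DXH HDN DH HXD XXH DHD HDDD XHN XHX
QHX HDDN QXH DDH QDH QHDD QHN QHXD DDXH DXHD DXXH DXXHD HXDD HXN
XXHD XHDD XHDDD XHDN
""".split()

def bench_load(pattern, h_counts_as_bench=False):
    """Number of benches in a pattern, optionally counting H as one bench."""
    load = pattern.count("X")
    if h_counts_as_bench:
        load += pattern.count("H")
    return load

max_plain = max(bench_load(p) for p in H_PATTERNS)
max_with_h = max(bench_load(p, h_counts_as_bench=True) for p in H_PATTERNS)
```

No observed "H" pattern has more than two "X" elements, so if the "H" is counted as a bench the largest load is exactly 3, the same limit that the "G" patterns reach.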
All the best, --stolfi