21-05-2022, 05:56 PM
Bigrams across uncertain spaces.
All errors are mine and there probably are some.
Extract the text from ZL_ivtff_2a using IVTT, from the Windows command line: "ivtt -x7 -s0 -h0 ZL_ivtff_2a.txt ZL2adot-comma.txt"
The output file contains pure text; it keeps the commas and dots, and also keeps '?', '*' and the @123; notation.
Find all the commas denoting uncertain spaces, take the EVA character on either side of the comma, and concatenate them to form a bigram, i.e. 'X,Y' = 'XY'.
Find all the dots denoting certain spaces, take the EVA character on either side of the dot, and concatenate them to form a bigram, i.e. 'X.Y' = 'XY'.
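The two extraction steps above could be sketched in Python roughly like this. It is not the script I actually ran, just a minimal illustration. The names PAIR, boundary_bigrams and sample are made up for the example, the sample string is invented rather than taken from the transliteration, and it assumes an EVA character is a single lowercase ASCII letter, so any pair touching '?', '*' or an @123; code is skipped.

```python
import re
from collections import Counter

# Assumption: an EVA character is one lowercase ASCII letter.
# The lookahead makes matches overlap, so in 'a.b.c' both 'ab' and 'bc' are found.
PAIR = re.compile(r"(?=([a-z])([.,])([a-z]))")

def boundary_bigrams(text):
    """Return (comma_counts, dot_counts) for bigrams straddling each separator."""
    comma, dot = Counter(), Counter()
    for left, sep, right in PAIR.findall(text):
        target = comma if sep == "," else dot
        target[left + right] += 1
    return comma, dot

# Invented sample line, only to show the behaviour.
sample = "daiin.ol,kedy.qokal,g.a.r?.ol"
comma, dot = boundary_bigrams(sample)
print("comma:", dict(comma))
print("dot:  ", dict(dot))
```

Whether pairs next to '?' or '*' should be dropped or counted is a judgment call; dropping them keeps the bigrams limited to readable characters.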
Count them, rank them, and compute two simple stats:
abs(Percentage divide) =
the percentage occurrence of the Comma-bigram and the percentage occurrence of the Dot-bigram; take whichever number is higher and divide it by the lower
abs(Rank Comma - Rank Dot) =
the absolute value of the Comma-bigram rank subtracted from the Dot-bigram rank
I did the 'percent divide' and 'rank subtract' because they make the differences easier to spot and provide a simple way to compare the two sets.
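As a rough sketch of the two stats (again not the original script; the function names are made up for illustration), the arithmetic could look like the Python below. Two things are inferred from the posted figures rather than stated anywhere: tied counts share a rank and the next rank is skipped (R2, R2, R4), and the ratio seems to be taken from percentages already rounded to three decimals, since 0.183 / 0.003 gives the 61.0 shown for 'lg'.

```python
def competition_ranks(counts):
    """Rank bigrams by descending count; ties share a rank (1, 2, 2, 4, ...)."""
    ranks, prev, rank = {}, None, 0
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    for position, (bigram, n) in enumerate(ordered, start=1):
        if n != prev:
            rank, prev = position, n
        ranks[bigram] = rank
    return ranks

def percent(count, total):
    """Percentage occurrence, rounded to three decimals as in the table."""
    return round(100 * count / total, 3)

def percent_divide(c_pct, d_pct):
    """Higher percentage divided by the lower (assumes both are above zero)."""
    low, high = sorted((c_pct, d_pct))
    return round(high / low, 3)

# Counts, totals and ranks for 'ra' as posted in the table.
COMMA_TOTAL, DOT_TOTAL = 2737, 30890
c_pct, d_pct = percent(285, COMMA_TOTAL), percent(730, DOT_TOTAL)
print(c_pct, d_pct, percent_divide(c_pct, d_pct), abs(1 - 14))
# -> 10.413 2.363 4.407 13
```

Dividing the unrounded percentages instead would shift some of the ratios slightly, most visibly for rare bigrams such as 'lg'.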
Top 12 shown here (13 rows, because two bigrams tie at rank 12):
Code:
Commas (X,Y): 2737 total    Dots (X.Y): 30890 total     abs(Percentage divide)  abs(Rank Comma - Rank Dot)
(Rank, count, bigram, %)    (Rank, count, bigram, %)
('R1', 285, 'ra', 10.413)   ('R14', 730, 'ra', 2.363)   4.407                   13
('R2', 147, 'lc', 5.371)    ('R10', 1146, 'lc', 3.71)   1.448                   8
('R2', 147, 'lk', 5.371)    ('R33', 201, 'lk', 0.651)   8.25                    31 --lk
('R4', 125, 'ls', 4.567)    ('R15', 672, 'ls', 2.175)   2.1                     11
('R5', 119, 'sa', 4.348)    ('R28', 245, 'sa', 0.793)   5.483                   23
('R6', 101, 'yk', 3.69)     ('R18', 443, 'yk', 1.434)   2.573                   12
('R7', 93, 'ol', 3.398)     ('R39', 118, 'ol', 0.382)   8.895                   32 --ol
('R8', 90, 'ld', 3.288)     ('R17', 569, 'ld', 1.842)   1.785                   9
('R9', 83, 'ro', 3.033)     ('R6', 1355, 'ro', 4.387)   1.446                   3
('R10', 78, 'lo', 2.85)     ('R11', 996, 'lo', 3.224)   1.131                   1
('R11', 73, 'yd', 2.667)    ('R7', 1275, 'yd', 4.128)   1.548                   4
('R12', 68, 'yt', 2.484)    ('R23', 312, 'yt', 1.01)    2.459                   11
('R12', 68, 'ok', 2.484)    ('R52', 65, 'ok', 0.21)     11.829                  40 --ok
Observations:
The 'o<character>' family turns up a lot with high abs(Rank Comma - Rank Dot) scores: ol, ok, or, oa, ot.
The bigram 'lg' has the highest abs(Rank Comma - Rank Dot) score:
comma dot
('R58', 5, 'lg', 0.183) ('R206', 1, 'lg', 0.003) 61.0 148 --lg
One conclusion is that at least some of those 'l,g' bigrams are real bigrams and any apparent space is a scribal artifact.
Questions:
What does it mean when the Dot-bigram occurrence percentage is higher than the Comma-bigram occurrence percentage? For example:
('R38', 15, 'yo', 0.548) ('R2', 2687, 'yo', 8.699) 15.874 36
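Part of what makes this hard to read is that the 'percent divide' stat always puts the larger number on top, so the direction is lost: 15.874 for 'yo' is dot over comma, while 11.829 for 'ok' is comma over dot. A small Python sketch that keeps the direction next to the ratio is below. The helper name directed_ratio is made up for illustration, the percentages are copied from the rows posted above, and it does not answer the question, it only makes the two cases easier to tell apart.

```python
def directed_ratio(c_pct, d_pct):
    """Return (ratio, side): larger percentage over smaller, and which side was larger.

    Assumes both percentages are above zero.
    """
    if c_pct == d_pct:
        return 1.0, "equal"
    if c_pct > d_pct:
        return round(c_pct / d_pct, 3), "comma"
    return round(d_pct / c_pct, 3), "dot"

# (comma %, dot %) as posted for three of the bigrams.
rows = {"yo": (0.548, 8.699), "ok": (2.484, 0.21), "lg": (0.183, 0.003)}
for bigram, (c_pct, d_pct) in rows.items():
    ratio, side = directed_ratio(c_pct, d_pct)
    print(bigram, ratio, side)
```

With the side recorded, rows like 'yo' (more typical across a certain space) can be listed separately from rows like 'ok' and 'lg' (more typical across an uncertain space).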
Data attached:[attachment=6556]