julian > 04-10-2016, 06:55 AM
davidjackson > 04-10-2016, 08:24 AM
Quote: This random process continues over many pairings/chromosome and many generations, using selection between each generation to refine the pairings (I'll spare you the details!).
That's fine, but only if it builds on the previous generation to produce a coherent translation across all words. It should be discarding unmatched chromosomes and keeping the favourable ones that link together, but it seems to be doing this only on a word-by-word basis, not across the whole transcription.
Quote: [3] erratum interim ratis da carum
Meanwhile, the error was that the raft was expensive?
-JKP- > 04-10-2016, 08:32 AM
(04-10-2016, 06:55 AM)julian Wrote: I've recently repurposed my genetic algorithm (GA) code to use EVA rather than Voyn_101. The GA seems to do better with EVA, and I'd like to report an interesting result using Latin as a base language for f3r (a folio I picked at random).
The way this works is that the GA reads in the EVA transcription for the given folio(s), line by line and word by word, and as it does so it creates frequency tables of all the ngrams it finds. Right now it uses ngrams up to 3 glyphs long.
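A minimal Python sketch of this counting step, assuming each EVA letter counts as one glyph (so EVA ch is two glyphs) and that n-grams never cross a word boundary; `ngram_counts` is a hypothetical name, not the author's code:

```python
from collections import Counter


def ngram_counts(lines, max_n=3):
    """Frequency table per n (1..max_n) of glyph n-grams inside words.

    Each EVA letter is treated as one glyph, and n-grams never cross a
    word boundary (both are assumptions about the original program).
    """
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        # EVA transcriptions often separate words with '.', so accept both.
        for word in line.replace(".", " ").split():
            for n in range(1, max_n + 1):
                for i in range(len(word) - n + 1):
                    tables[n][word[i:i + n]] += 1
    return tables


# The first two lines of f3r as quoted in the post.
tables = ngram_counts(["tsheos qopal chol cthol daimm",
                       "ycheor chor dam qotcham cham"])
print(tables[2]["ch"], tables[3]["cho"])  # 5 2
```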
It then reads in a very large Latin word list, to use as a validation dictionary.
It then prepares a set of Latin letters, nulls and scribal abbreviations, currently numbering around 60 items in total.
Then it randomly pairs each EVA ngram with one of the Latin letters, nulls or abbreviations, and applies that pairing (called a chromosome in the jargon) to all lines and words in the EVA, so as to produce new words in plaintext. Each plaintext word is checked against the Latin dictionary and scored: a valid word gets a high score, and a long one gets a higher score. All the word scores are summed. If a consecutive sequence of valid Latin words appears, the overall score of the chromosome increases according to the length of the sequence. The idea is to reward chromosomes that produce sequences of long, valid Latin words.
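The pairing-and-scoring step could look roughly like the sketch below. The post does not say how a word is split into n-grams or what the exact weights are, so greedy longest-match segmentation, the `word_bonus`/`run_bonus` values and the toy pairing are all assumptions:

```python
def decode_word(word, chromosome, max_n=3):
    """Split an EVA word into n-grams by greedy longest match and replace
    each n-gram with its paired Latin string (a null pairs with "")."""
    out, i = [], 0
    while i < len(word):
        for n in range(min(max_n, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if piece in chromosome:
                out.append(chromosome[piece])
                i += n
                break
        else:
            # No pairing covers this glyph: keep it as-is (assumption).
            out.append(word[i])
            i += 1
    return "".join(out)


def fitness(lines, chromosome, lexicon, word_bonus=10, run_bonus=5):
    """Sum per-word scores, then reward runs of consecutive valid words.

    Runs may continue across line breaks; each run of length >= 2 adds
    run_bonus * length when it ends. Both weights are hypothetical.
    """
    score, run = 0, 0
    for line in lines:
        for word in line.split():
            plain = decode_word(word, chromosome)
            if plain in lexicon:
                score += word_bonus + len(plain)  # longer valid words score more
                run += 1
            else:
                if run >= 2:
                    score += run_bonus * run
                run = 0
    if run >= 2:  # close a run that reaches the end of the text
        score += run_bonus * run
    return score


# Hypothetical toy pairing, not the chromosome from the post.
toy = {"ch": "r", "o": "a", "l": "tis", "r": "tum"}
lexicon = {"ratis", "ratum"}
print(decode_word("chol", toy), decode_word("chor", toy))  # ratis ratum
print(fitness(["chol chor", "dam"], toy, lexicon))  # 40
```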
This random process continues over many pairings (chromosomes) and many generations, using selection between generations to refine the pairings (I'll spare you the details!).
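Since the selection details are spared, here is only a generic GA loop under stated assumptions: tournament selection, uniform crossover, per-gene mutation and elitism. These operators and the `evolve` name are hypothetical and not necessarily what the author used; `fitness` is any function that scores a chromosome, such as the scoring described above.

```python
import random


def evolve(ngrams, symbols, fitness, pop_size=50, generations=100,
           mutation_rate=0.02, tournament=3, seed=0):
    """Evolve chromosomes (dicts mapping each n-gram to a Latin symbol).

    Operators are assumptions: tournament selection, uniform crossover,
    per-gene random mutation, and elitism (the best chromosome survives).
    """
    rng = random.Random(seed)
    population = [{g: rng.choice(symbols) for g in ngrams}
                  for _ in range(pop_size)]

    def pick(scored):
        # Tournament: the fittest of a few randomly drawn chromosomes.
        return max(rng.sample(scored, tournament), key=lambda p: p[0])[1]

    for _ in range(generations):
        scored = [(fitness(c), c) for c in population]
        scored.sort(key=lambda p: p[0], reverse=True)
        next_gen = [scored[0][1]]  # elitism: keep the current best unchanged
        while len(next_gen) < pop_size:
            a, b = pick(scored), pick(scored)
            # Uniform crossover: each n-gram inherits its pairing from a or b.
            child = {g: a[g] if rng.random() < 0.5 else b[g] for g in ngrams}
            for g in ngrams:
                if rng.random() < mutation_rate:
                    child[g] = rng.choice(symbols)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Toy usage: the "right" answer pairs every n-gram with "x".
ngrams, symbols = ["a", "ch", "o", "ol", "y", "dai"], ["x", "um", "ra", ""]


def score(chromosome):
    return sum(chromosome[g] == "x" for g in ngrams)


best = evolve(ngrams, symbols, score, generations=60)
print(score(best))
```

Elitism makes the best score non-decreasing from one generation to the next, which is one simple way to keep the favourable pairings found so far.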
Here are details for one of the better results (with a score of over 22000):
A) The list of letters, nulls and abbreviations used is as follows:
'a', 'b', 'c', 'd', 'e', 'g', 'h', 'i', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'x', 'y', 'z',
' ', ' ', ' ', (nulls)
'qu',
'ra', 're', 'ca', 'ci', 'co', 'us', 'os', 'is', 'ur', 'um', 'er', 'in', 'im', 'nt', 'nd',
'quo', 'cum', 'con', 'cun', 'cus', 'cre', 'car', 'cer', 'cri', 'cis',
'ent', 'est', 'rum', 'tis', 'tum', 'tur', 'ter', 'mum',
'ntum', 'quon', 'eius', 'etam'
B) The best chromosome pairing VM glyphs with the Latin ngrams in A) includes the following:
[font=voynich] a = r[/font]
[font=voynich] 8 = t[/font]
[font=voynich] c = re[/font]
[font=voynich] h = ur[/font]
[font=voynich] o = er[/font]
[font=voynich] y = tum[/font]
[font=voynich] s = u[/font]
[font=voynich] k = [font=Arial]est[/font][/font]
[font=voynich] 9 = um[/font]
[font=voynich] 8a = c[/font]
[font=voynich] co = m[/font]
[font=voynich] ii = <null>[/font]
[font=voynich] 4o = in[/font]
(The remaining pairs are omitted for brevity.)
I found the 9 = um equivalence that the GA discovered to be striking (Brumbaugh claimed this equivalence in his solution), but I suppose it's sort of obvious.
B) The best pairing produces the following valid Latin words on f3r:
'ratis', 'carus', 'ratum', 'cum', 'inque', 'cercis', 'erratum', 'interim', 'da', 'carum', 'uterum', 'certis', 'ra', 'ius', 'caro', 'pratis', 'inda', 'is', 'pratum', 'us', 'istis', 'sus', 'sum', 'corda', 'iratum', 'irent', 'inest', 'iterum', 'tergum', 'istum', 'peius', 'creo', 'irem'
(All these words appear in the Latin word list I'm using.)
C) The longest sequence of valid words is 16 words long (spanning folio lines 2, 3, 4 and 5):
"ratum cum inque cercis erratum interim ratis da carum uterum ratum certis cercis cercis ra ius"
D) The translation, using this ngram pairing, of the first few lines of f3r (each EVA line is followed by its Latin rendering):
tsheos qopal chol cthol daimm
estseru inquonrtis ratis carus dciscis
ycheor chor dam qotcham cham
umterim ratum cum inque cercis
ochor qocheor chol daiin cthy
erratum interim ratis da carum
schey chor chal cham cham cho
uterum ratum certis cercis cercis ra
qokol chololy s cham cthol
ius ratisusum u cercis carus
ychtaiin chor cthom otal dam
umturestcarum ratum caro prtis cum
otchol qodaiin chom shom damo
pratis inda racis iscis cumer
ysheor chor chol oky damo
umsim ratum ratis coum cumer
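The 16-word run reported above can be recovered from the decoded lines by a single scan. A minimal sketch (hypothetical `longest_valid_run`); here the list of valid words produced on f3r stands in for the full Latin word list, which is enough for these lines:

```python
def longest_valid_run(words, lexicon):
    """Return the longest stretch of consecutive words found in the lexicon."""
    best_start = best_len = 0
    start = length = 0
    for i, word in enumerate(words):
        if word in lexicon:
            if length == 0:
                start = i
            length += 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            length = 0
    return words[best_start:best_start + best_len]


# Decoded lines 1-5 of f3r and the valid-word list, both copied from the post.
decoded = [
    "estseru inquonrtis ratis carus dciscis",
    "umterim ratum cum inque cercis",
    "erratum interim ratis da carum",
    "uterum ratum certis cercis cercis ra",
    "ius ratisusum u cercis carus",
]
lexicon = {
    "ratis", "carus", "ratum", "cum", "inque", "cercis", "erratum", "interim",
    "da", "carum", "uterum", "certis", "ra", "ius", "caro", "pratis", "inda",
    "is", "pratum", "us", "istis", "sus", "sum", "corda", "iratum", "irent",
    "inest", "iterum", "tergum", "istum", "peius", "creo", "irem",
}
words = " ".join(decoded).split()  # runs may cross line breaks
run = longest_valid_run(words, lexicon)
print(len(run), " ".join(run))
# 16 ratum cum inque cercis erratum interim ratis da carum uterum ratum certis cercis cercis ra ius
```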
I expect the Latin above makes no sense at all, but I find the "look and feel" of the word lengths and the vocabulary size encouraging.
I'd welcome suggestions of Latin abbreviations, prefixes and suffixes that I could include in (or remove from) the list in A) above (which I gleaned mostly from d'Imperio's summary of Cappelli).
Julian
MarcoP > 04-10-2016, 10:44 AM
(04-10-2016, 06:55 AM)julian Wrote: I'd welcome suggestions of Latin abbreviations, prefixes and suffixes that I could include in (or remove from) the list in A) above (which I gleaned mostly from d'Imperio's summary of Cappelli).
ThomasCoon > 04-10-2016, 02:33 PM
julian > 04-10-2016, 05:53 PM
(04-10-2016, 08:24 AM)davidjackson Wrote: An interesting concept: translation by brute force, eh?
My concern is that there doesn't seem to be consistency in your translation. For example, in the first three lines above, the suffix os appears as -ru, -im and -tum.
[font=Eva]he[/font] appears as -se-, -ter- and -erra-.
Quote: This random process continues over many pairings/chromosome and many generations, using selection between each generation to refine the pairings (I'll spare you the details!).
That's fine, but only if it builds on the previous generation to produce a coherent translation across all words. It should be discarding unmatched chromosomes and keeping the favourable ones that link together, but it seems to be doing this only on a word-by-word basis, not across the whole transcription.
Quote: [3] erratum interim ratis da carum
Meanwhile, the error was that the raft was expensive?
(04-10-2016, 02:33 PM)ThomasCoon Wrote: This is wonderful work, Julian! Much praise!
I just have one question: when your program checks a Latin dictionary for word validity, does the program also account for the fact that nouns may be in different grammatical cases? (Ex: in a dictionary only the form femina might be listed for "woman", but that noun can also appear as feminae, feminam, feminis, feminas, feminarum in a Latin text). It looks like you have accounted for that, but I just wanted to ask.
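If the word list holds only dictionary headwords, one way to cover inflected forms is to generate them. A minimal sketch for regular first-declension nouns like femina only, assuming vowel length is ignored (so the ablative singular is spelled like the nominative); other declensions, adjectives and verbs would need their own ending tables, and `first_declension_forms` is a hypothetical helper:

```python
def first_declension_forms(nominative):
    """Spellings of a regular first-declension noun such as 'femina'.

    Vowel length is ignored, so the ablative singular (femina) and the
    nominative singular collapse into one spelling.
    """
    if not nominative.endswith("a"):
        raise ValueError("expected a first-declension nominative ending in -a")
    stem = nominative[:-1]
    # Singular: -a, -ae, -am; plural: -ae, -arum, -is, -as.
    endings = ["a", "ae", "am", "arum", "is", "as"]
    return {stem + ending for ending in endings}


print(sorted(first_declension_forms("femina")))
# ['femina', 'feminae', 'feminam', 'feminarum', 'feminas', 'feminis']
```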
(04-10-2016, 08:32 AM)-JKP- Wrote:
I'm always interested in genetic algorithms.
Unfortunately, I don't have time to comment overall right now (my workday hasn't ended yet), but I wanted to point out that the Latin 9 abbreviation (EVA-y), which usually stands for -um or -us at the end of a word, means com- or con- when it's at the beginning of a word.
Also, the EVA-j shape is actually three different abbreviations in Latin: -cis, -ris and -tis. The first part of the shape (straight or curved) determines which one it is. The -cis is usually pretty clear; the -ris and -tis are sometimes less clear, depending on the scribe, and are sometimes distinguished by context.
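A fixed chromosome pairs each n-gram with one string everywhere, so position-dependent readings (the 9-shape) and one shape with several readings (the j-shape) would need candidate expansions instead. A minimal sketch, assuming EVA y and j are the only special glyphs and leaving every other letter untouched; `glyph_options` and `expansions` are hypothetical names:

```python
from itertools import product


def glyph_options(word, i):
    """Candidate Latin readings for the glyph at position i of an EVA word."""
    glyph = word[i]
    if glyph == "y" and i == 0:
        return ["con", "com"]         # 9-shape at the start of a word
    if glyph == "y" and i == len(word) - 1:
        return ["um", "us"]           # 9-shape at the end of a word
    if glyph == "j":
        return ["cis", "ris", "tis"]  # stroke shape decides; unknown from EVA
    return [glyph]                    # all other glyphs left as-is here


def expansions(word):
    """Every combination of the candidate readings, in order."""
    options = [glyph_options(word, i) for i in range(len(word))]
    return ["".join(parts) for parts in product(*options)]


print(expansions("ydy"))  # ['condum', 'condus', 'comdum', 'comdus']
```

A dictionary lookup can then keep whichever expansions are valid words.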
R. Sale > 04-10-2016, 08:27 PM
Anton > 04-10-2016, 08:41 PM
Quote: If there was some way of computing whether the translated Latin makes sense grammatically, that would be extremely useful: the chromosome's score could be boosted if it produced Latin that made sense.
julian > 04-10-2016, 09:20 PM
(04-10-2016, 08:27 PM)R. Sale Wrote: Among other inconsistencies, 'ratis' and 'ratum' originate from multiple VMs words.
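To check a claim like this over a whole folio, one could invert the word-by-word alignment between EVA lines and decoded lines. A minimal sketch (hypothetical `sources_by_plaintext`), assuming each EVA line and its decoded line have the same number of words, as in julian's translation listing for f3r; the sample pairs are the first three from that listing:

```python
from collections import defaultdict


def sources_by_plaintext(pairs):
    """Map each decoded word to the set of EVA words that produced it.

    `pairs` holds (eva_line, decoded_line) tuples aligned word by word.
    """
    sources = defaultdict(set)
    for eva_line, decoded_line in pairs:
        eva_words, decoded_words = eva_line.split(), decoded_line.split()
        if len(eva_words) != len(decoded_words):
            raise ValueError(f"misaligned line: {eva_line!r}")
        for eva, decoded in zip(eva_words, decoded_words):
            sources[decoded].add(eva)
    return sources


pairs = [
    ("tsheos qopal chol cthol daimm", "estseru inquonrtis ratis carus dciscis"),
    ("ycheor chor dam qotcham cham", "umterim ratum cum inque cercis"),
    ("ochor qocheor chol daiin cthy", "erratum interim ratis da carum"),
]
sources = sources_by_plaintext(pairs)
# Decoded words that come from more than one distinct EVA word.
many_to_one = {w: sorted(s) for w, s in sources.items() if len(s) > 1}
print(many_to_one)
```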
Is it possible to shift to triglyphs alone? And pump up the Latin vocabulary? Or to add quadglyphs like Latin '-orum'? It seems to me that the more complex elements frequently found in Latin are the ones to search for, while the monoglyphs and diglyphs create noise that obscures those potential patterns.
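One data-driven way to pick longer Latin targets such as '-orum' is to count the commonest word endings in the validation word list itself. A minimal sketch (hypothetical `common_endings`); the sample words are only a toy list, not the word list used in the GA:

```python
from collections import Counter


def common_endings(words, n, top=10):
    """Most frequent n-letter word endings, ignoring words of length <= n."""
    counts = Counter(w[-n:] for w in words if len(w) > n)
    return counts.most_common(top)


toy_words = ["dominorum", "servorum", "puerorum", "rosarum", "ratum"]
print(common_endings(toy_words, 4, top=1))  # [('orum', 3)]
```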
R. Sale > 05-10-2016, 12:19 AM