Hey all, fairly new here. I've been a viewer for a while, and I'm starting to test things to help me understand the problem better.
I'm treating the manuscript as a homophonic cipher over Italian and trying to optimise against a codebook. The basic idea: take a Voynich word, assign it an Italian word, then check whether the decoded line produces trigrams (three-word sequences) that actually appear in real Italian text. If swapping in a different Italian word improves the count, keep it, then repeat thousands of times. I don't know Italian, so it's a lot of "copy, paste, check, translate, repeat". I'm basically checking whether the decoded text produces word sequences that look like real Italian. No secrets here: I'm comparing against this specific book [link] for Italian.
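For anyone curious what I mean, the loop is roughly this. This is just a minimal sketch with toy data, not my actual script; the trigram set, candidate list, and function names are placeholders:

```python
# Sketch of the swap-and-keep loop: toy data stands in for the real
# codebook and the real Italian trigram reference.
import random

# toy "known Italian trigrams" set (mine comes from the reference book)
ITALIAN_TRIGRAMS = {
    ("la", "vita", "nuova"),
    ("nel", "mezzo", "del"),
    ("mezzo", "del", "cammin"),
}

CANDIDATES = ["la", "vita", "nuova", "nel", "mezzo", "del", "cammin", "di"]

def trigram_score(words):
    """Count how many 3-word windows appear in the reference set."""
    return sum(
        tuple(words[i:i + 3]) in ITALIAN_TRIGRAMS
        for i in range(len(words) - 2)
    )

def hill_climb(voynich_line, codebook, steps=1000, seed=0):
    """Randomly reassign one codebook entry per step; keep non-worsening swaps."""
    rng = random.Random(seed)
    best = trigram_score([codebook[w] for w in voynich_line])
    for _ in range(steps):
        v_word = rng.choice(voynich_line)
        old = codebook[v_word]
        codebook[v_word] = rng.choice(CANDIDATES)
        score = trigram_score([codebook[w] for w in voynich_line])
        if score >= best:
            best = score            # accept ties so the search can drift across plateaus
        else:
            codebook[v_word] = old  # revert worsening swaps
    return codebook, best

line = ["qokeedy", "shedy", "daiin", "chedy"]
book = {w: "di" for w in line}
book, score = hill_climb(line, book)
print(score)
```

One thing worth noting: if you only accept strict improvements, the search gets stuck on plateaus where no single swap completes a trigram, which is why the sketch accepts ties too.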
So it felt like it was working, scores were climbing, but I'm a bit stuck: basically 45% of my entries are collapsing into "i", "e", and "a". They seem to help form valid trigrams with almost anything, so the optimiser keeps picking them. The trigram score looked great, but the decoded text reads like trash, just "i e a i" etc. with occasional real words.
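One possible fix I'm considering (not from my current script, just an idea): weight each matched trigram by how informative it is instead of counting all matches equally, so filler trigrams made of "i/e/a" barely pay out. The counts below are made-up toy numbers; real ones would come from the corpus:

```python
# Hedged sketch: score matched trigrams by -log(probability), so rare,
# informative trigrams are worth far more than common filler. Toy counts.
import math

# hypothetical corpus counts for two trigrams
TRIGRAM_COUNTS = {
    ("e", "i", "a"): 9000,        # filler: very common, low information
    ("nel", "mezzo", "del"): 40,  # rare, highly informative
}
TOTAL = sum(TRIGRAM_COUNTS.values())

def weighted_score(words):
    """Sum -log(p) over matched trigrams: rare matches reward more."""
    score = 0.0
    for i in range(len(words) - 2):
        tri = tuple(words[i:i + 3])
        if tri in TRIGRAM_COUNTS:
            score += -math.log(TRIGRAM_COUNTS[tri] / TOTAL)
    return score

print(weighted_score(["e", "i", "a"]))          # tiny reward
print(weighted_score(["nel", "mezzo", "del"]))  # much larger reward
```

With this kind of scoring the optimiser can no longer win by spamming the three most promiscuous words, because each of those matches is worth almost nothing.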
Anyone else run into this? Is this just a fault of my method, or is there a better dataset based on 1400s/1500s Italian? Or does anyone have a better idea how to pull out of this rut? I think I've hit a wall. Is there by any chance an existing source of n-grams for period Italian? I can't seem to find one, or maybe I just don't know how to search for it. I don't mind if it's a paid resource I need to purchase.
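Failing a ready-made dataset, one option is to build trigram counts yourself from any plain-text period corpus (a Decameron transcription, say). A rough sketch; the file name is hypothetical and the tokeniser is deliberately crude:

```python
# Build word-trigram counts from a plain-text Italian corpus file.
# "decameron.txt" is a placeholder; any period text file would do.
import re
from collections import Counter

def trigram_counts(path):
    """Lowercase the text, pull out word tokens, count 3-word windows."""
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[a-zàèéìòù]+", f.read().lower())
    return Counter(zip(words, words[1:], words[2:]))

# counts = trigram_counts("decameron.txt")
# counts.most_common(10)  # inspect the most frequent trigrams
```

Old-Italian spelling is inconsistent, so whatever corpus you use, it helps that the counts come from the same era you're decoding into rather than modern Italian.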
It does make me wonder: if we had access to a supercomputer, could we basically brute-force our way to success here, if we could assume other things such as shorthand and exclude those entries, for example?
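A quick back-of-envelope suggests raw brute force is hopeless even with a supercomputer; the candidate count below is an assumption I picked for illustration:

```python
# Rough size of the brute-force search space. The manuscript has on the
# order of 8000 distinct word types; 1000 Italian candidates per type
# is an illustrative assumption, not a measured figure.
import math

voynich_types = 8000   # distinct Voynich word types (roughly)
candidates = 1000      # assumed Italian candidates per type

digits = voynich_types * math.log10(candidates)
print(f"~10^{digits:.0f} possible codebooks")
```

That is around 10^24000 assignments, so any realistic attack has to be a guided search (like the hill-climbing I'm doing), not exhaustive enumeration; constraints like shorthand would shrink the space but nowhere near enough.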