(11-10-2025, 08:37 AM)quimqu Wrote: Distinguishing whether that reflects stylistic, dialectal, or linguistic variation is much less straightforward.
Perhaps one can answer that question by looking at the frequencies of pairs of words in consecutive positions, preferably excluding the first and last words of each line and the first line of each paragraph.
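As a rough sketch of that counting step (the function name and the assumption that paragraphs are separated by blank lines and words by whitespace are mine, not part of any existing tool):

```python
import re
from collections import Counter

def interior_pair_counts(text):
    """Count consecutive word pairs, skipping the first and last word of
    each line and the whole first line of each paragraph."""
    counts = Counter()
    # Assumption: paragraphs are separated by one or more blank lines.
    for parag in re.split(r"\n\s*\n", text):
        lines = [ln.split() for ln in parag.splitlines() if ln.strip()]
        for words in lines[1:]:        # drop the first line of the paragraph
            inner = words[1:-1]        # drop the first and last word of the line
            counts.update(zip(inner, inner[1:]))
    return counts
```

Lines with fewer than four words contribute nothing, since no interior pair is left after trimming.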
If the A/B difference is due to spelling or dialect difference, there should be a roughly 1-1 mapping W -> m(W) between word types. Then it should be possible to match most of the most common word pairs in a way that is consistent with that mapping.
That is, if "W1 W2" is a common pair in language A, then "m(W1) m(W2)" should be common in language B.
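One possible way to put a number on "consistent with that mapping" is sketched below. The score, the name mapping_consistency, and the top_n cutoff are my own assumptions for illustration; other definitions may well be better.

```python
from collections import Counter

def mapping_consistency(pairs_a, pairs_b, m, top_n=100):
    """Among the top_n most common pairs of A whose two words are both
    covered by the candidate mapping m, return the fraction whose image
    (m[W1], m[W2]) is among the top_n most common pairs of B."""
    top_b = {pair for pair, _ in pairs_b.most_common(top_n)}
    covered = [(w1, w2) for (w1, w2), _ in pairs_a.most_common(top_n)
               if w1 in m and w2 in m]
    if not covered:
        return 0.0                     # nothing to judge yet
    hits = sum((m[w1], m[w2]) in top_b for w1, w2 in covered)
    return hits / len(covered)
```

Here pairs_a and pairs_b are Counter objects keyed by (word, word) tuples, and m is a plain dict from A words to B words. A score near 1 would support a spelling or dialect mapping; a low score on a mapping that looks plausible word by word would argue against it.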
Since daiin is the most common word in both A and B, we can start by guessing that m(daiin) = daiin. Then we should look for the most common word pairs "daiin W" and "W daiin", and see whether we can guess m(W) for some of those words.
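To collect the candidates for that guessing step, something like the following could be used. It is only a sketch: the helper name neighbours and the cutoff k are hypothetical, and it assumes the pair counts are stored as a Counter keyed by (word, word) tuples.

```python
from collections import Counter

def neighbours(pair_counts, seed, k=10):
    """Return two lists: the k most frequent words W in pairs "seed W",
    and the k most frequent words W in pairs "W seed"."""
    after = Counter()
    before = Counter()
    for (w1, w2), n in pair_counts.items():
        if w1 == seed:
            after[w2] += n
        if w2 == seed:
            before[w1] += n
    return ([w for w, _ in after.most_common(k)],
            [w for w, _ in before.most_common(k)])
```

Running it with seed "daiin" on the A counts and again on the B counts gives two short ranked lists per side, which can then be compared by eye to propose values of m(W).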
However, it is important to test this method with two texts in the same known language, with the same subject but different dialects and spellings. If it does not work on such a test example, then we should try to understand why...
[link] are four texts that could be used for that purpose. The files are in UTF-8 encoding. They are non-overlapping extracts from the same novel, with three different spelling systems and translated into a different but closely related language. Paragraphs are separated by blank lines. More details are in [link]. For better comparison with Voynichese, you may want to ignore all lines starting with "#", delete the string "\emph", map everything to lower case, map all letters with diacritics to ASCII letters without them, and replace all punctuation by spaces. You may also want to delete all paragraphs that are too short (mostly dialog lines).
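Those cleanup steps could look roughly like this. It is a sketch under my own assumptions: the function name normalize is hypothetical, and the min_words threshold for "too short" is an arbitrary placeholder that should be tuned on the actual files.

```python
import re
import unicodedata

def normalize(text, min_words=20):
    """Apply the cleanup steps listed above to one control text."""
    # Ignore all lines starting with "#".
    text = "\n".join(ln for ln in text.splitlines() if not ln.startswith("#"))
    # Delete the string "\emph" and map everything to lower case.
    text = text.replace("\\emph", "").lower()
    # Strip diacritics: decompose, then drop the combining marks.
    # Letters with no decomposition (for example "ø") are left as they are.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation (anything that is not a word character or
    # whitespace, plus the underscore) by spaces.
    text = re.sub(r"[^\w\s]|_", " ", text)
    kept = []
    for parag in re.split(r"\n\s*\n", text):
        lines = [" ".join(ln.split()) for ln in parag.splitlines() if ln.strip()]
        # Assumption: "too short" means fewer than min_words words in total.
        if sum(len(ln.split()) for ln in lines) >= min_words:
            kept.append("\n".join(lines))
    return "\n\n".join(kept)
```

Line breaks inside paragraphs are preserved, so the result can still be used for statistics that depend on line position.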
If this method works with such control texts but not with the VMS Herbal A/B, then either we are dealing with two different languages (or very different "dialects"), or A and B were copied from two distinct sources with very different "formulas". Compare for example a typical entry from [link] with [link]:
- Herba Bortines. For a twisted mouth due to some ailment. Take the leaves of this herb, cook them with wine, and apply the poultice for thirty days; it will restore the mouth to its normal state, and it is proven. Also, when cooked with red wine in the form of a poultice and applied to a cold gout for fifteen days, it heals the gout, it is proven. It is gathered in May. It grows in wild mountains and cold places.
- Sium aquaticum is a little shrub which is found in the water — upright, fat, with broad leaves similar to hipposelinum, yet somewhat smaller and aromatic — which is eaten (either boiled or raw) to break stones [kidney, bladder] and discharge them. Eaten they also induce the movement of urine, are abortifacient, expel the menstrual flow, and are good for dysentery. (Crateuas speaks of it thus: it is a herb like a shrub, little, with round leaves, bigger than black mint, similar to eruca). It is also called anagallis aquatica, schoenos aromatica, as well as a sort of juncus odoratus, darenion, or laver
Apart from the two entries being about different plants (one imaginary, one real) with different uses, note the very different syntactic structures and common phrases. The first one relies more on imperative and future-tense constructions ("take the leaves", "it will restore", etc.), whereas the second prefers descriptive statements ("is eaten", "they also induce", etc.). If such differences were to persist over many entries, they would probably result in "language" differences like those between VMS A and B.
All the best, --jorge