I give credit to davidsch, on his blog, for inspiring this idea. For all I know it's already been done, or at least attempted. If so, I'd be interested to peruse the finished product and hear how it went. If not, I'm considering making this my next little side project. Any feedback would be appreciated.
The VMS's vocabulary isn't all that large: there are only 1,109 types with four or more tokens. I had an idea recently that it might be helpful to create a Voynichese lexicon that lists every vord type occurring in the VMS, in [some sort of] alphabetical order. Obviously we don't have the meanings for any of them. But there are other potentially helpful pieces of data that could be given for each entry, for example:
- Total number of tokens
- Overall rank order based on total number of tokens. I'd use bolding and/or asterisks and/or larger font size for the 350 most common vords, so that they are easy to index, and stand out at page-level view.
- Token breakdown by Currier A pages and Currier B pages, and the frequency differential of tokens in A versus B.
- Token breakdown by apparent subject section (Herbal A, balneological, etc.)
- Token breakdown by line position (line-start, midline, just before drawing, just after drawing, line-end, or label)
- Vords significantly more likely to occur in a line containing this vord than a line without it, broken down by Currier A vs. B
- Rate of reduplication
I'm sure there are many more potentially helpful metrics that could be entered for each entry that I haven't thought of.
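As a rough sketch of how a few of these metrics (token count, rank, Currier A/B breakdown, reduplication rate) could be computed, here is a small Python example. The input format (a list of `(currier, tokens)` pairs, one per line of text), the function name `build_lexicon`, and the sample lines are all assumptions made up for illustration; they are not taken from any real transliteration file.

```python
from collections import Counter, defaultdict

def build_lexicon(lines):
    """Build one entry per vord type.

    lines: iterable of (currier, tokens), where currier is "A" or "B" and
    tokens is the list of EVA vords on one line (assumed input format).
    """
    total = Counter()
    by_currier = defaultdict(Counter)
    redup = Counter()
    for currier, tokens in lines:
        for i, tok in enumerate(tokens):
            total[tok] += 1
            by_currier[tok][currier] += 1
            # Count a reduplication when a vord is immediately repeated.
            if i + 1 < len(tokens) and tokens[i + 1] == tok:
                redup[tok] += 1

    lexicon = {}
    # Rank 1 is the most frequent type; ties are broken alphabetically.
    ranked = sorted(total, key=lambda t: (-total[t], t))
    for rank, tok in enumerate(ranked, start=1):
        n = total[tok]
        lexicon[tok] = {
            "tokens": n,
            "rank": rank,
            "A": by_currier[tok]["A"],
            "B": by_currier[tok]["B"],
            "redup_rate": redup[tok] / n,
        }
    return lexicon

# Invented sample lines, for illustration only (not real VMS text).
sample = [
    ("A", ["daiin", "chol", "daiin", "daiin"]),
    ("B", ["qokeedy", "qokeedy", "chedy", "daiin"]),
]
lexicon = build_lexicon(sample)
for vord in sorted(lexicon):  # plain alphabetical order of the EVA strings
    print(vord, lexicon[vord])
```

The section and line-position breakdowns could be added the same way, by carrying one more label per line or per token and keeping one more counter for it.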
One way such a lexicon might be useful is to look for evidence of inflection. If Voynichese is any sort of symbolic language, it's very possible that vords expressing similar meanings are written similarly, and make use of affixation (suffixes, prefixes, and/or infixes) to show subtle but significant differences in the meaning of the vord, its grammatical function, or its relationship to other vords in the line. This is a reasonable idea because it's highly likely that the author spoke at least one language that inflects words to nuance their meanings and function.
It might be interesting to list together all the types that appear to have a similar stem but different endings, along with the metrics for each apparent variation. I wonder if high-level patterns in token occurrence could be found, which would lend support to the idea that, for example, EVA [otolar] and [otolaiin] are inflected forms of [otol]. Then, can anything be generalized about the situations where [otol] becomes [otolar]? This could at least establish that [otol], [otolar], and [otolaiin] are meaningfully connected, and give us more clues not only to the meaning of [otol], but also to that of [-ar] and [-aiin] on the ends of other vords.
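A minimal sketch of that grouping step, assuming we already have a dictionary of token counts per type. The function name `group_by_stem`, the list of candidate endings, and the counts are all hypothetical; a split is only accepted when the bare stem is itself an attested type, which is one possible criterion among many.

```python
from collections import defaultdict

def group_by_stem(counts, suffixes):
    """Group vord types under an attested stem.

    counts: dict of vord type -> token count.
    suffixes: candidate endings to test (an assumption, not an established list).
    """
    families = defaultdict(dict)
    for vord, n in counts.items():
        for suffix in suffixes:
            if not vord.endswith(suffix):
                continue
            stem = vord[: -len(suffix)]
            # Only accept the split if the bare stem is itself an attested type.
            if stem in counts:
                families[stem][suffix] = n
    return dict(families)

# Invented counts, for illustration only.
counts = {"otol": 10, "otolar": 3, "otolaiin": 1, "daiin": 50, "chol": 20}
families = group_by_stem(counts, ["ar", "aiin"])
print(families)
```

Here [daiin] is not split into [d] + [-aiin], because [d] is not in the type list. Whether that restriction is too strict is an open question.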
I wouldn't be surprised if this approach showed a large number of the VMS's many one-token types (hapax legomena) to be inflected or compounded forms of much more common types.
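One crude way to test this would be to check, for each one-token type, whether it begins with a much more common type. The sketch below does only that prefix check, so it would miss prefixed or infixed forms. The name `hapax_bases`, the `min_base_tokens` threshold (set to four here only to echo the figure above), and the counts are assumptions for illustration.

```python
def hapax_bases(counts, min_base_tokens=4):
    """Map each one-token type to (base, remainder), where base is the
    longest proper prefix that is a type with at least min_base_tokens tokens.

    counts: dict of vord type -> token count.
    """
    common = {v for v, n in counts.items() if n >= min_base_tokens}
    result = {}
    for vord, n in counts.items():
        if n != 1:
            continue
        # Try the longest proper prefix first, then shorter ones.
        for cut in range(len(vord) - 1, 0, -1):
            if vord[:cut] in common:
                result[vord] = (vord[:cut], vord[cut:])
                break
    return result

# Invented counts, for illustration only.
counts = {"otol": 10, "otolaiin": 1, "chol": 20, "cholor": 1, "shey": 1}
bases = hapax_bases(counts)
print(bases)
```

The share of hapax legomena that get a base this way could then be compared against a shuffled or randomly generated word list, to see whether the match rate is higher than chance.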