Koen G > 06-12-2017, 09:26 AM
Koen G > 07-12-2017, 10:25 AM
Koen G > 10-12-2017, 10:33 PM
Quote: The two attached histograms are based on the length of word occurrences (not word "types" as in Rene's graph discussed above).
Latin has a spike for words 2 characters long. These correspond to a number of hugely frequent particles like prepositions (in, ex, ab, de...), conjunctions (et, ac...), pronouns (id, tu, me...).
Latin also has a long tail of long words, which of course become less frequent as length grows. In the Vulgate, 9% of the words are 10 or more characters long.
Voynichese has a simpler distribution, with a single peak at length 5 (matching the central, but secondary, peak for Latin). Long words are rarer: less than 1% are 10 or more characters long.
The mapping from Voynichese to Latin should stretch the histogram in two opposite directions, possibly by making short words shorter and long words longer.
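A distribution like the ones described can be reproduced in a few lines. Below is a minimal sketch (my own, not the code behind the attached histograms) that tabulates the fraction of word occurrences at each length, which is enough to compare, say, the 2-character spike or the 10+ character tail across corpora:

```python
from collections import Counter

def length_histogram(text):
    """Fraction of word occurrences (tokens, not types) at each length."""
    words = text.split()
    counts = Counter(len(w) for w in words)
    total = len(words)
    return {length: counts[length] / total for length in sorted(counts)}

# Illustrative snippet of Vulgate Latin, not a full corpus
sample = "in principio creavit deus caelum et terram"
hist = length_histogram(sample)
```

Running this over the full Vulgate versus a Voynichese transliteration (with word breaks taken at the transcribed spaces) would yield the two histograms being compared.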
Koen G > 10-12-2017, 10:36 PM
Quote: It is very much simplified with respect to the graphs posted by VViews: I have only produced five-bar histograms (the first bar is the beginning of the word, the last bar the end). The colouring of the graphs was done manually (more frequent letters are darker).
This is based on unique words in Takahashi’s transcription. I have applied the following substitutions:
ckh -> K
cth -> T
cfh -> F
cph -> P
ch -> C
sh -> S
iin -> M
in -> N
eee -> W
ee -> E
I only include plots for the 25 most frequent "characters".
The process was rather complex and (as always) I can't rule out that I have made errors.
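The substitution step above can be sketched as an ordered sequence of string replacements; the order matters (e.g. `ckh` must be consumed before `ch`, and `iin` before `in`), which the order of the list already respects. A minimal sketch, with illustrative EVA strings:

```python
# Ordered so that longer patterns are consumed before their substrings
# (ckh before ch, iin before in, eee before ee).
SUBSTITUTIONS = [
    ("ckh", "K"), ("cth", "T"), ("cfh", "F"), ("cph", "P"),
    ("ch", "C"), ("sh", "S"),
    ("iin", "M"), ("in", "N"),
    ("eee", "W"), ("ee", "E"),
]

def normalize(word):
    """Collapse EVA ligatures and sequences into single placeholder characters."""
    for pattern, replacement in SUBSTITUTIONS:
        word = word.replace(pattern, replacement)
    return word

normalize("chedy")  # -> "Cedy"
normalize("daiin")  # -> "daM"
```

Applying `normalize` to each unique word in the transcription before counting letter positions gives one "character" per glyph group, as in the described plots.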
Koen G > 14-12-2017, 10:04 AM
Koen G > 29-01-2018, 09:48 PM
(29-01-2018, 09:12 PM)MarcoP Wrote: These graphs represent the expected (red) and actual (green) counts for consecutive words in which the last letter of the first word is the same as the first letter of the second word. For instance (where '.' represents a word boundary):
but.these
others.said
seventh.hour
[...]
In my opinion, this evidence (if confirmed) could support the word-boundary transformations discussed in the linked post.
It is clear that, if what we observe in the VMS is due to euphonic transformations, these are quite different from those that happen in the Romance languages. There, transformations are limited to short, frequent words (mostly prepositions and articles) and typically affect the end of the word. The phenomena discussed by Emma affect longer words and mostly seem to happen at the beginning of words (I am thinking in particular of the changes she describes, conditioned on the ending of the preceding word).
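The expected-vs-actual comparison described above can be approximated as follows. This is my own sketch of one reasonable null model (treating adjacent words as independent draws), not necessarily the computation behind the posted graphs:

```python
from collections import Counter

def boundary_match_counts(words):
    """Actual count of adjacent pairs where the last letter of the first word
    equals the first letter of the second, and the count expected by chance
    under independence of adjacent words."""
    n = len(words)
    n_pairs = n - 1
    actual = sum(1 for a, b in zip(words, words[1:]) if a[-1] == b[0])
    last = Counter(w[-1] for w in words)
    first = Counter(w[0] for w in words)
    expected = n_pairs * sum((last[c] / n) * (first[c] / n) for c in last)
    return actual, expected

# Toy running text echoing the examples above (but.these, seventh.hour, ...)
tokens = "but these others said seventh hour of the evening".split()
actual, expected = boundary_match_counts(tokens)
```

A green bar well above or below the red one would then indicate that word boundaries interact with neighbouring letters more (or less) often than chance predicts.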
Koen G > 04-02-2018, 10:43 PM
Quote:In linguistics, Heaps' law (also called Herdan's law) is an empirical law which describes the number of distinct words in a document (or set of documents) as a function of the document length (so called type-token relation).
Heaps' law means that as more instance text is gathered, there will be diminishing returns in terms of discovery of the full vocabulary from which the distinct terms are drawn.
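The type-token relation behind Heaps' law (commonly fitted as V(n) ≈ K·n^β with 0 < β < 1) can be traced empirically by recording how many distinct words have been seen after each token. A minimal sketch:

```python
def type_token_curve(tokens):
    """Number of distinct word types seen after each successive token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

# Toy stream: growth slows as the small vocabulary is exhausted
type_token_curve("a b a c b a d".split())  # -> [1, 2, 2, 3, 3, 3, 4]
```

Plotting such a curve for Voynichese against curves for natural-language, random, and cipher texts is the kind of comparison the quoted tests refer to.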
Quote:These rapid tests should not be relied upon for any real conclusions, but they do seem to point towards Voynichese tokens having unique individual identities. There is more work to be done in analysing random and cipher texts, but I hope the tools I include in this post may help other researchers carry out their own tests.