(19-04-2025, 01:08 PM)ReneZ Wrote: unless you consider language-dependence, i.e. some words appearing more in A and others more in B.
I have considered the languages A and B separately. The tables of statistics that I have presented in this thread are for the separate languages. I believe that the split between A and B is significant and that each needs to be examined separately. Statistics applied to the whole of the manuscript are sometimes biased towards A or B (e.g. the distribution of daiin). If the issue is why there is an A/B split, then I have already proposed a scenario for this in [link not shown].
(19-04-2025, 01:00 PM)dashstofsk Wrote: Unfortunately, my analysis of iin, in does not seem to me to be consistent with the meaningful language hypothesis.
This is assuming the text is a cipher? I'm not sure I understand how any distribution of almost anything in a ciphertext could be inconsistent with a meaningful plaintext.
(19-04-2025, 02:17 PM)oshfdk Wrote: This is assuming the text is a cipher?
No, I do not assume that. I have never had much belief in the manuscript being in cypher. I have posted my disbelief on the subject in the following thread: [link not shown]
I made some checks on the vocabulary of the different sections of the VMS and I have to say the situation is not as clear-cut as I thought before.
I used the RF1a-n transcription, sections 'Herbal Currier A', 'Herbal Currier B', 'Stars Currier B', 'Pharmaceutical Currier A' and 'Balneological Currier B' as defined in the metadata of the transcription (I manually assigned the pages marked 'Text only' to each section).
I checked the most frequent words with -aiin/-ain endings: daiin/dain, aiin/ain, qokaiin/qokain, okaiin/okain, otaiin/otain.
Herbal Currier A shows different ratios between the couplets: daiin/dain = 4.18, aiin/ain = 3.82, qokaiin/qokain = 1.34, okaiin/okain = 1.92, otaiin/otain = 1.92. The dependence of the -aiin/-ain ending ratio on what precedes it supports the idea that -aiin and -ain are different.
Herbal Currier B shows ratios rather similar to Currier A: daiin/dain = 3.7, aiin/ain = 3.86, qokaiin/qokain = 1.33, okaiin/okain = 1.7, otaiin/otain = 1.43. It too supports -aiin/-ain being different.
However....
Stars Currier B has a very different ratio for daiin/dain from both Herbal A and Herbal B: daiin/dain = 2.47, while the other ratios are more similar: aiin/ain = 3.23, qokaiin/qokain = 1.21, okaiin/okain = 1.57, otaiin/otain = 1.48.
Pharmaceutical Currier A has completely different ratios: daiin/dain = 8.23, aiin/ain = 8.8. This supports, instead, the idea that -aiin/-ain are the same and that in those days the scribe was particularly 'aiin-ish'. The qokaiin/qokain ratio is 2 but, while these are common words in Herbal A/Herbal B/Stars B, they are rare in Pharmaceutical A, appearing just 2 and 1 times respectively, so their ratio is a dubious number. The same goes for otaiin/otain: three and zero occurrences respectively, and this is the only case I've seen where there are more otaiin than okaiin (why Pharmaceutical A has so few okaiin/otaiin with respect to the bulk of the manuscript I cannot say...).
Balneological Currier B is an outlier: all the ratios are more or less skewed toward -ain, to the point that some even invert: daiin/dain = 1.65, aiin/ain = 2, qokaiin/qokain = 0.52, okaiin/okain = 0.71, otaiin/otain = 0.5 (I'd note that Balneological Currier B is an outlier also when considering its vocabulary as a whole, i.e. shedy/chedy/qokedy/qokain are much more frequent than in the rest of the VMS). This supports both aiin/ain being different (dependence of the ratios on the prefix) and aiin/ain being equal (all the ratios skewed towards -ain).
Tentative (non-)conclusion: inside each section the aiin/ain ratios differ depending on what precedes them, which supports aiin/ain being different (with the notable exception of Pharmaceutical A, where unfortunately only two ratios can be calculated reasonably). However, the ratios vary a lot among different sections, and more or less in the same direction for each word couplet, which instead supports aiin/ain being equivalent. I still think they are not the same, but not as firmly as before I looked, and all in all I can't make sense of all the statistical differences I saw between the different sections.
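For anyone who wants to repeat the count, here is a minimal sketch of how such ratios could be computed from a list of words for one section. It is not the script used for the numbers above: the function name `ending_ratios`, the `min_count` cutoff and the toy word list are all assumptions made for illustration. A ratio is reported as `None` when either count is too small to be meaningful, as with the rare qokaiin/qokain case.

```python
from collections import Counter

def ending_ratios(words, stems=("d", "", "qok", "ok", "ot"), min_count=5):
    """Return {stem: count(stem+'aiin') / count(stem+'ain')}.

    A stem maps to None when either count is below min_count,
    because a ratio built on a handful of tokens is unreliable.
    """
    counts = Counter(words)
    ratios = {}
    for stem in stems:
        n_aiin = counts[stem + "aiin"]
        n_ain = counts[stem + "ain"]
        if n_aiin < min_count or n_ain < min_count:
            ratios[stem] = None
        else:
            ratios[stem] = n_aiin / n_ain
    return ratios

# Toy word list (hypothetical, NOT real counts from the manuscript).
sample = ["daiin"] * 8 + ["dain"] * 2 + ["qokaiin"] * 2 + ["qokain"]
print(ending_ratios(sample, min_count=2))
```

Running the same function once per section, on the words of that section only, would give one row of ratios per section to compare.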
(19-04-2025, 07:07 PM)Mauro Wrote: I can't make sense of all the statistical differences I saw between the different sections
Certainly the ratios are higher in the language A pages. This follows because the frequencies of in are lower in the language A pages. I have attached plots of the frequencies of the suffixes iin and in for the major work groups to show this.
Generally I like to split the manuscript into six groups:
Bio B2
Herbal A1
Herbal B2
Pharma A1
Stars B3
Text B2
These being the most uniform. Each work group seems to exhibit its own language characteristics, in addition to the major differences between A and B. We all know about this: differences in the frequencies of eo, ka, ol, q, d etc. Again, if the issue is why this is happening, then the scenario I pointed to earlier might explain this. In short, it might be because the various sections were not written at the same time, and between each there might have been a shift in the author's use of his own script.
Could "m" be a variant of "d"? Also, what's going on at the top-right corner of f3r? "daimm", but one "m" is a minim plus a flourish and the other is an "e" plus the same flourish?
Ah, oops, I thought EVA-g was different. In that case, I think voynichese.com has the wrong transcription, and I'll amend my first question to be, could "g" be a variant of "d"?
(10-05-2025, 11:00 PM)extent_of_foxes Wrote: what's going at the top-right corner of f3r? "daimm"
I take it you are curious to know more about the funny character m. This might be a good opportunity to say something about what I think might be happening. In my opinion m might be the same as r. There are several pieces of evidence that led me to this possibility.
Firstly have a look at some matrices of affinities for characters. Here they are, for the Herbal A1 pages and for the Bio B2 pages.
You will notice that the values for the character pairs ar, am, or, om are all high. What is this thing that I call affinity? It is a measure of the liking that each character in a pair has for the other character. Thus when the measure for ar in Herbal A1 is 5, this means that ar occurs 5 times as often as would be expected if the characters a and r appeared randomly. In Bio B2 the value is 9.21. Neither r nor m has any particular affinity with any other character. These affinity values immediately suggest that there is some commonality in the two characters.
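To make the definition concrete, here is a minimal sketch of one way to compute such an affinity value. It assumes that "expected" means the count predicted from how often the first character appears on the left of any pair and the second on the right of any pair, that only adjacent characters inside a word are counted, and that every transcription character is a single letter. The function name `affinity` and the toy words are hypothetical, and the matrices above may have been built differently.

```python
from collections import Counter

def affinity(words, first, second):
    """Observed count of the pair first+second divided by the count
    expected if the two characters were placed independently.

    Assumed definition: expected = (pairs starting with `first`)
    * (pairs ending with `second`) / (all pairs).
    """
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))  # adjacent character pairs inside each word
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    left = sum(c for (a, _), c in pairs.items() if a == first)
    right = sum(c for (_, b), c in pairs.items() if b == second)
    expected = left * right / total
    return pairs[(first, second)] / expected if expected else 0.0

# Toy data (hypothetical): "ar" occurs twice as often as chance predicts.
toy = ["ar", "ar", "ol", "ol"]
print(affinity(toy, "a", "r"))
```

A value near 1 means the pair occurs about as often as chance predicts; values well above 1 mean the two characters seek each other out.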
Now list the top m words and then count the words in which m is replaced by r.
Generally you will see that where there is an m word there will be an r word, and that the top m words generally give top r words. Again, commonality.
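The comparison can be sketched in a few lines. This is only an illustration of the idea, not the script behind the list above; the function name `m_r_pairs` and the toy word list are assumptions, and every m in a word is replaced, not only a final one.

```python
from collections import Counter

def m_r_pairs(words, top=10):
    """For the `top` most frequent words containing 'm', return tuples
    (m_word, m_count, r_word, r_count), where r_word is m_word with
    every 'm' replaced by 'r'."""
    counts = Counter(words)
    m_words = [(w, c) for w, c in counts.most_common() if "m" in w]
    result = []
    for w, c in m_words[:top]:
        r_word = w.replace("m", "r")
        result.append((w, c, r_word, counts[r_word]))
    return result

# Toy word list (hypothetical, NOT real counts from the manuscript).
toy_words = ["dam"] * 3 + ["dar"] * 5 + ["om"] * 2 + ["or"] * 4 + ["daiin"]
for row in m_r_pairs(toy_words):
    print(row)
```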
Also there is the visual evidence. From [link not shown]:
[attachment=10583]
Both characters start with a downstroke and have an upwards loop from near the top. The difference is that m continues the loop with a downward swing. Otherwise both characters have the same 'body'. In both the top loop has the same forward lean. Why the downswing? I suspect that it might be just a particular whim of the authors. All of them tend to like to finish words with a flourish: l, y, r, n are most frequently placed at the end of a word. But also perhaps it pleases the authors to occasionally close a line with a visual kick to emphasize the ending. And we all know that m occurs generally at the line ends. Definitely so by hands 2 and 3, who write in language B. Hand 1 (language A) likes to use it mid-word also (notably f3r, f3v, f24r, f24v, f52r, f54r, f54v. But notice also a strange thing: of these, the pairs f3r/v, f24r/v and f54r/v are on opposite sides of a page, which suggests that the author might have done the pairs in one sitting).
(11-05-2025, 04:07 PM)dashstofsk Wrote: I take it you are curious to know more about the funny character m. This might be a good opportunity to say something about what I think might be happening. In my opinion m might be the same as r. There are several pieces of evidence that led me to this possibility.
If symbols have any meaning at all, then the image you included in your post seems to me like strong evidence that r and m are not the same. It shows two line-ending r's and two line-ending m's right on top of one another. To me it would appear strange to use two different versions of the same character this way.
The affinity statistics just show that they behave similarly, but maybe it's just a consequence of the curve-line system, if the system itself corresponds to some ground truth.
By the way, did you try doing row/column cosine similarity using the values from the affinity matrix?
I've found a cosine similarity chart I made some time ago. I'm not sure how accurate it is, but from these numbers it looks like m is less similar to r (0.33 preceding-context similarity, 0.36 following-context similarity) than, say, to k or s for the preceding context (similarity of the character-affinity vectors before m and before these characters), or to l, -ain, y for the following context (similarity of the character-affinity vectors after the characters in question).
[attachment=10587]
There is a caveat though: this chart obviously uses some custom tokenization, treating ain/aiin/am as full tokens, and I find some of the numbers strange. I'm not using this chart to show any specific behavior, just to reiterate that character statistics do not necessarily mean anything by themselves; in the end it's up to interpretation.
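For reference, the cosine similarity between two context vectors can be sketched as below. This is not the code that produced the chart: the function name `cosine`, the dict-of-affinities layout and the numbers in `context` are all hypothetical. Each vector maps a neighbouring character to an affinity value, and missing entries count as zero.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts
    {neighbouring character: affinity value}. Missing keys are zero."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical preceding-context vectors (NOT values from the chart).
context = {
    "m": {"a": 3.0, "o": 4.0},
    "r": {"a": 6.0, "o": 8.0},   # same profile as m, just scaled
    "k": {"e": 2.0},             # shares no context with m
}
print(cosine(context["m"], context["r"]))
print(cosine(context["m"], context["k"]))
```

Cosine similarity ignores overall scale, so two characters with proportional affinity profiles score 1 even if one is much rarer than the other.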
Interestingly, we seem to have been doing similar kinds of computations on character statistics, but you ended up believing that the MS is likely a hoax (if I understand your position correctly), and I believe that it's most likely a (meaningful) ciphertext.