Emma May Smith > 26-05-2019, 11:18 AM
Quote:when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail.
Anton > 26-05-2019, 01:00 PM
Quote:I'm most interested in this claim:
Quote:when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail.
bi3mw > 26-05-2019, 03:00 PM
Torsten > 26-05-2019, 07:57 PM
(25-05-2019, 10:20 PM)Emma May Smith Wrote: (25-05-2019, 10:03 PM)Torsten Wrote: (25-05-2019, 08:54 PM)Emma May Smith Wrote: It seems like the language network graphs are also wonky. The Vietnamese one looks particularly bad, but it's easier to point out the errors on the Greek graph: Why is it full of errors like this?
- Multiple single letters are shown as being unconnected to anything despite two letter words which contain that letter existing elsewhere in the graph.
- [ll] occurs at least twice unconnected by any chain.
- [mn] occurs in two different networks.
- [ma] and [mo] are unconnected to either of the two [mn] despite an "edit distance" of one.
- [ot'] is unconnected to [tot'] despite an "edit distance" of one.
- [de] is connected to, um, [de], but not to anything else.
- [o] is connected to both [oe] and [ok], but not [ot].
Please look into the Greek [link]. The text is using stress marks. What you interpret as [de] is in fact written as δὲ and δέ. Anyway, do you really believe that the picture for Greek becomes different if the marks are removed?
Hi Torsten, there are no such stress marks represented on the words in question, though they are represented on other words. Besides, that only accounts for one of my objections. Nor does it account for the weirdness in the Vietnamese graph.
This could all be very quickly cleared up if you gave us access to your paper. I'm keen to know if you've made any advance from five years ago, as [link]. I'm sorry to judge your paper by your old research and the peripheral information, but you leave us no other choice.
Cryptologia will have given you a certain number of free to access papers you can share with your peers, and you also have the right to share preprint versions of the paper. Can you at least state that you will do this at some point in the future?
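The objections in the list above can be checked mechanically. The following is only a rough sketch, assuming the network graphs link two words whenever their Levenshtein edit distance is exactly one (the paper's actual construction is not available in this thread, so that rule is an assumption). The word list is just the bracketed transliterations quoted in the list, not the full Greek graph, and the function names are made up for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def components(words):
    """Group words into networks, linking pairs at edit distance one."""
    words = sorted(set(words))
    neighbours = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if edit_distance(a, b) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for w in words:
        if w in seen:
            continue
        stack, group = [w], set()
        while stack:
            x = stack.pop()
            if x in group:
                continue
            group.add(x)
            stack.extend(neighbours[x] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Transliterations quoted in the objections (a sample, not the whole graph).
sample = ["o", "oe", "ok", "ot", "ma", "mo", "mn", "ll"]
groups = components(sample)
for group in groups:
    print(group)
```

Under this linking rule [ma], [mo] and [mn] end up in one network and [o] is linked to [ot] as well as [oe] and [ok], which is the behaviour the objections expect. If the published graph uses a different rule, the grouping would of course differ.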
Torsten > 26-05-2019, 08:52 PM
(26-05-2019, 03:00 PM)bi3mw Wrote: What I noticed: the word frequency in the "self-citation text generator" has a different course than in the VMS (Top 30).
Torsten > 26-05-2019, 09:12 PM
(26-05-2019, 01:00 PM)Anton Wrote: Quote:I'm most interested in this claim:
Quote:when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail.
That's a very interesting angle. In particular, it is interesting in which detail they do differ, and what is their relation to most frequent words on the whole. If the page-frequent words are the same as the corpus-frequent words, then that would be a bit trivial.
bi3mw > 26-05-2019, 09:41 PM
(26-05-2019, 08:52 PM)Torsten Wrote: ....
Did you use numbers for the whole VMS or for the VMS without labels? For the whole VMS I would expect 836 [daiin]-tokens, 537 [ol]-tokens, 501 [chedy]-tokens etc.
Emma May Smith > 26-05-2019, 09:55 PM
(26-05-2019, 09:12 PM)Torsten Wrote: (26-05-2019, 01:00 PM)Anton Wrote: Quote:I'm most interested in this claim:
Quote:when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail.
That's a very interesting angle. In particular, it is interesting in which detail they do differ, and what is their relation to most frequent words on the whole. If the page-frequent words are the same as the corpus-frequent words, then that would be a bit trivial.
There is much more said in section 2. For instance, we also say that "a token dominating one page might be rare or missing on the next one".
See for instance the pages [link] and f1v. On page [link] the most frequent tokens are [daiin] and [dain], whereas on page [link] only one instance of [daiin] exists:
[link] daiin (7) / dain (6)
[link] chol (5) / shol (4) / ... / daiin (1)
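The "two of three differ in only one detail" claim can be tested page by page. Below is a minimal sketch that reads "one detail" as a Levenshtein edit distance of one, which is an assumption and may not match the paper's definition. The counts are the ones quoted in this post; the tokens elided by "..." are left out, the page keys are placeholders, and `similar_top_pairs` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similar_top_pairs(counts: Counter, k: int = 3):
    """Pairs among the k most frequent tokens that are one edit apart."""
    top = [word for word, _ in counts.most_common(k)]
    return [(a, b) for a, b in combinations(top, 2)
            if edit_distance(a, b) == 1]


# Counts as quoted in the post; page names are placeholders.
pages = {
    "first page": Counter({"daiin": 7, "dain": 6}),
    "second page": Counter({"chol": 5, "shol": 4, "daiin": 1}),
}
result = {name: similar_top_pairs(counts) for name, counts in pages.items()}
for name, pairs in result.items():
    print(name, pairs)
```

Running the same function over every page and counting how many return a non-empty list would give the "more than half of the pages" figure, and comparing the returned pairs with the corpus-wide top words would address Anton's question about whether the result is trivial.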
Torsten > 26-05-2019, 10:07 PM
(26-05-2019, 11:18 AM)Emma May Smith Wrote: I'm most interested in this claim:
Quote:when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail.
This is quite a strong claim and yet has the greatest implications. It's certainly nothing I've ever seen before, though is perhaps not surprising given the general word structure of the Voynich text.
(26-05-2019, 11:18 AM)Emma May Smith Wrote: If we look at only the ten most common words in the text we can see some such pairs/groups: [daiin, aiin], [chedy, shedy], [chol, ol, or, ar], [dar, ar]. There are many, many more the further down the wordlist we go.
This would suggest that, if we allow ourselves the ability to look at the most common ten words (not what they claim) then we would always find such a similar pairing.
(26-05-2019, 11:18 AM)Emma May Smith Wrote: Figure 3 (page 6) shows just how many pairings some words can have.
(26-05-2019, 11:18 AM)Emma May Smith Wrote: Moreover, it would suggest that the similar pairs should tend to be the same words. If the common word pairs on any page tend to be common overall, then the import of the claim is reduced. It is almost like claiming that "a" and "an" (or "an" and "and" for that matter) occur frequently in an English language text.
(26-05-2019, 11:18 AM)Emma May Smith Wrote: When we look at the additional material on Github this is indeed what we find. Under section 1.3 you can look at the manuscript page by page and the most common word pairs are listed. From a casual browse we can see that in general the same words do occur as pairs time and again. Sometimes we see some word pairs which are uncommon, but they are not the majority.
We also see an obvious change from Currier A to Currier B, though some word pairs are still shared.
(26-05-2019, 11:18 AM)Emma May Smith Wrote: My worry is that this observation about the occurrence of similar word pairs is a very simple and obvious fact arising from the word structure, rather than anything deeper.
It illustrates the way in which the rigid word structure in the Voynich text, coupled with most "structurally valid" words actually occurring, creates lots of similar words in general.
(26-05-2019, 11:18 AM)Emma May Smith Wrote: I think it's best now to turn to Figure 4 (page 8) as a lot of the argument hinges on this.
The graph shows that the edit distance between words is lower the nearer together the words occur. Words are more similar by up to 0.3 of an edit distance within 30 lines (Herbal A) or 60 lines (Quire 20), before reaching a stable edit distance beyond those numbers of lines. A comparison with a text in English (Alice in Wonderland) is made.
I don't feel that the comparison text is really fair, as a coherent story with a single topic is likely to be much more "flat" in terms of word choice.
The Voynich text is likely to switch topic on a regular basis, and would have more profitably been compared with other herbals.
Indeed, I note that many pages in the Herbal A section have about 10 to 15 lines of text, which could account for much of the difference in edit distance if the similar words are actually different morphological forms of the same word. Likewise, Quire 20 pages have around 30 to 60 lines, meaning that some kind of thematic ordering could influence the similarity of the vocabulary.
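For anyone who wants to try the Figure 4 comparison on other texts (other herbals, for instance), here is one possible way to compute it. This is only a sketch of one plausible reading, assuming the curve is the mean Levenshtein distance between all word pairs drawn from two lines a given number of lines apart; the paper's exact procedure is not given in this thread. The three toy lines reuse words mentioned above, but their arrangement is invented, and `mean_distance_by_gap` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def mean_distance_by_gap(lines, max_gap):
    """Mean edit distance between words on lines that are `gap` lines apart."""
    totals = defaultdict(lambda: [0, 0])  # gap -> [sum of distances, pair count]
    for i, line_a in enumerate(lines):
        for gap in range(1, max_gap + 1):
            j = i + gap
            if j >= len(lines):
                break
            for a, b in product(line_a, lines[j]):
                totals[gap][0] += edit_distance(a, b)
                totals[gap][1] += 1
    return {gap: total / n for gap, (total, n) in sorted(totals.items()) if n}


# Toy input: each inner list is one line of text (arrangement is invented).
toy_lines = [["ol"], ["or"], ["chedy"]]
curve = mean_distance_by_gap(toy_lines, max_gap=2)
print(curve)
```

Feeding in a real transcription split by page or section would show whether the dip at short gaps lines up with typical page lengths, as suggested above.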
Emma May Smith > 26-05-2019, 10:50 PM
Quote:This would mean that every page has its own topic.