Dunsel > 9 hours ago
(Today, 09:56 AM) dashstofsk Wrote: Anyone intending to create a hoax manuscript would have done it in secret and not in some manuscript workshop or factory. Under the hoax hypothesis the writer was not bound by any contract of work, was under no obligation to do it quickly, or indeed to do any of it at all. He wrote as much or as little as he wanted, whenever he wanted. Perhaps he could also only do it in his spare time.
Writing section by section over a length of time is reasonable and can explain the different 'topics', handwriting styles, language clusters, and quality of drawings.
Dunsel > 9 hours ago
(11 hours ago) Jorge_Stolfi Wrote: (Yesterday, 05:13 PM) Dunsel Wrote: This is the closest chart I have to a less abrupt transition and you may remember it from my work on <ed>. It's showing the switch between scribe 1 and <ho> and scribe 2 and <ed>. And yes, the pages are reordered, but the background is shaded according to Currier A and B. So there is a gradual change but...
Thanks for the chart. But suppose that there are two sets of pages: Set A where ed is never used and the frequency of ho varies randomly and uniformly between 0.05 and 0.55, and set B where the frequency of ho varies randomly and uniformly between 0.05 and 0.10, while that of ed varies randomly and uniformly between 0.00 and 0.40.
If you plot the frequencies with the pages in some random order, but with all A pages first then the B pages, what you will see is just that: two different random distributions.
But if you sort the pages by decreasing frequency of ho or of the ratio ho/ed, the plot would look just like your plot. You will see a gradual transition from (ho = 0.55, ed = 0.00) to (ho = 0.05, ed = 0.40). A transition that does not exist in the data, and was created wholly by the sorting.
All the best, --stolfi
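The sorting effect described in the post above can be checked with a small simulation. This is only a sketch: the page counts (50 per set), the seed, and the helper names `make_pages`, `ratio`, `sort_pages`, and `block_means` are assumptions made for illustration; only the frequency ranges come from the post. Pages with ed = 0 are given an infinite ho/ed ratio and ordered among themselves by ho.

```python
import math
import random


def make_pages(n_a=50, n_b=50, seed=0):
    """Build two synthetic page sets using the frequency ranges from the post."""
    rng = random.Random(seed)
    # Set A: ed never used, ho uniform in [0.05, 0.55]
    pages = [{"set": "A", "ho": rng.uniform(0.05, 0.55), "ed": 0.0}
             for _ in range(n_a)]
    # Set B: ho uniform in [0.05, 0.10], ed uniform in [0.00, 0.40]
    pages += [{"set": "B", "ho": rng.uniform(0.05, 0.10),
               "ed": rng.uniform(0.0, 0.40)} for _ in range(n_b)]
    return pages


def ratio(page):
    """ho/ed ratio, treated as infinite when ed is never used."""
    return math.inf if page["ed"] == 0 else page["ho"] / page["ed"]


def sort_pages(pages):
    """Sort by decreasing ho/ed ratio, breaking ties by decreasing ho."""
    return sorted(pages, key=lambda p: (ratio(p), p["ho"]), reverse=True)


def block_means(pages, block=10):
    """Mean (ho, ed) over consecutive blocks of pages, to show the trend."""
    rows = []
    for i in range(0, len(pages), block):
        chunk = pages[i:i + block]
        rows.append((sum(p["ho"] for p in chunk) / len(chunk),
                     sum(p["ed"] for p in chunk) / len(chunk)))
    return rows


if __name__ == "__main__":
    pages = make_pages()
    for label, order in (("A then B, unsorted", pages),
                         ("sorted by ho/ed", sort_pages(pages))):
        print(label)
        for ho, ed in block_means(order):
            print(f"  ho={ho:.2f}  ed={ed:.2f}")
```

In the unsorted order the block means should show one step between the A blocks and the B blocks. In the sorted order, ho falls block by block across the A pages and ed tends to rise across the B pages, even though both sets were drawn from fixed distributions.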
oshfdk > 8 hours ago
(10 hours ago) Dunsel Wrote: The system is trying to maintain textual plausibility and family continuity, not maximize combinatorial exploration. The Voynich appears to behave similarly. The existence of hundreds of theoretical edit possibilities does not mean a human production process would actually use them.
Jorge_Stolfi > 8 hours ago
(Yesterday, 04:49 PM) Dunsel Wrote: “They didn’t know statistics.” - They did not need formal probability theory. Repetitive human copy-mutate behavior can naturally generate language-like statistical structure without mathematical understanding.
Quote: “Why fool people with language-like properties if nobody could measure them?” - Medieval cryptographers absolutely knew what ordinary ciphers looked like, and Voynichese does not resemble a normal substitution cipher or shorthand system. The goal may have been plausibility without easy detection as either plain language or conventional cipher.
Quote: People love the mysterious. It's one of the reasons we're all here.
Quote: Look at the lengths to which people TODAY go in order to scam others out of money. I cannot believe that the 15th century was any different in that regard.
dashstofsk > 8 hours ago
(9 hours ago) Dunsel Wrote: ... this workshop thinking they could create collections of these books to sell.
... One small group to write the book and one charming salesman to promote it.
Dunsel > 8 hours ago
(8 hours ago) oshfdk Wrote: I'm not sure I understand how human production would achieve this. It is trivial on a computer that can process the whole text up to the present moment, compute counts, check presence, etc. Humans can only physically look at past text and notice some words that are present there.
| Source Window | Copy | ED1 | ED2 | Any Match |
|---|---|---|---|---|
| Same page | 11 | 25 | 25 | 61 |
| 1 page back | 3 | 4 | 5 | 12 |
| 2 pages back | 0 | 0 | 2 | 2 |
| 3 pages back | 0 | 0 | 0 | 0 |
| 4 pages back | 1 | 0 | 0 | 1 |
| 5 pages back | 0 | 0 | 1 | 1 |
| Far lookback | 0 | 1 | 0 | 1 |
Dunsel > 7 hours ago
(8 hours ago) dashstofsk Wrote: (9 hours ago) Dunsel Wrote: ... this workshop thinking they could create collections of these books to sell.
... One small group to write the book and one charming salesman to promote it.
Somehow I cannot see that the trade in books and manuscripts was as quick and easy as it is today. I cannot see that there would have been a big market for such manuscripts. Not enough to justify a team of opportunists occupying themselves full time on the sharp practice.
I very strongly believe that the VMS is a collection of individual pieces of work, written section by section over a length of time; the reward from each section providing the enthusiasm to write the next one. It seems so logical. It just would have been too risky to try to write the whole manuscript not knowing if you would be able to find a buyer for it.
oshfdk > 7 hours ago
(8 hours ago) Dunsel Wrote: Let's look at some of the things I've had to consider and would also have to be considered in your model.
All of those rules, you could look at the page and know almost instantly how to proceed. Software needs numbers. Weights, probabilities. So could your proposed method be modelled? Yes. Would this be easy? Oh, I can say for certain it would not.
- Current line position - Where are we on the line? Do we need a gallows token here?
- Current page position - Do we need to create a new paragraph? Does this need to be a short line to precede a paragraph on the next line?
- Selected source token - Which of those 20 words are we going to choose? Or, are we going to choose a word on the page we just wrote?
- Selected mutation type - Are we going to copy the word or mutate it? We need a probability.
- Copy vs mutate probability - What percentage of words are copied and which are modified?
- Glyph insertion/deletion weights - When a word is chosen to modify, what letter can be deleted? What letter can be inserted or swapped for another?
- Length-preservation bias - When we mutate, do we need to keep the same word length or can we shorten or lengthen it?
- Repeat-nearby-family bias - Do we need to make this word look like another word we just wrote down?
- Avoid-too-many-new-forms bias - Have we used a word and its mutation too many times already? Do we need to find a new word?
- Proposed token - Ok, we have chosen the word and we're ready to write it.
- Rejection reason - Does the word look stupid? Does it have 4 vowels? 5 consonants? Have we repeated the word 4 times? Does it look like a possible word with vowels and consonants? Accept or reject and start over.
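To make the "software needs numbers" point concrete, the core of the list above (source token, copy vs mutate, mutation weights, length bias, proposal, rejection) can be sketched as a loop. Everything numeric here is an assumption for illustration, not a value from the thread: the `GLYPHS` alphabet is a stand-in, and `copy_prob`, `keep_length_prob`, `lookback`, and `max_nearby_repeats` are hypothetical knobs. Line and page position are left out to keep the sketch short.

```python
import random

GLYPHS = "acdehiklnoqrsty"  # hypothetical stand-in alphabet, for illustration only


def mutate(word, rng, keep_length_prob=0.5):
    """Apply one random edit: substitution (keeps length), deletion, or insertion."""
    if rng.random() < keep_length_prob:
        pos = rng.randrange(len(word))
        return word[:pos] + rng.choice(GLYPHS) + word[pos + 1:]
    if len(word) > 1 and rng.random() < 0.5:
        pos = rng.randrange(len(word))
        return word[:pos] + word[pos + 1:]               # delete one glyph
    pos = rng.randrange(len(word) + 1)
    return word[:pos] + rng.choice(GLYPHS) + word[pos:]  # insert one glyph


def generate(seed_words, n_tokens, copy_prob=0.3, lookback=20,
             max_nearby_repeats=2, seed=0):
    """Copy-or-mutate generator; returns (new words, log of every proposal)."""
    rng = random.Random(seed)
    text = list(seed_words)
    log = []  # (source token, proposed token, rejection reason or None)
    attempts = 0
    while len(text) < len(seed_words) + n_tokens:
        attempts += 1
        if attempts > 100 * n_tokens:
            raise RuntimeError("too many rejections; loosen the thresholds")
        source = rng.choice(text[-lookback:])   # selected source token
        copied = rng.random() < copy_prob       # copy vs mutate decision
        proposed = source if copied else mutate(source, rng)
        reason = None
        # Rejection step: here only "repeated too often in the last 5 words".
        if text[-5:].count(proposed) >= max_nearby_repeats:
            reason = "repeated too often nearby"
        log.append((source, proposed, reason))
        if reason is None:
            text.append(proposed)
    return text[len(seed_words):], log


if __name__ == "__main__":
    words, log = generate(["daiin", "chol", "chedy", "okal"], 30)
    print(" ".join(words))
    print(sum(1 for _, _, r in log if r), "proposals rejected")
```

Every decision a person would make at a glance shows up here as a parameter that has to be given a value, which is the difficulty the list is pointing at.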
Dunsel > 6 hours ago
(8 hours ago) Jorge_Stolfi Wrote: But forgeries have a logic. To make the investment and risk worthwhile, the forger will try to maximize the attractiveness of the work to potential buyers. A painting forger will try to produce a fake that looks like a Rembrandt, a Pollock, a Van Gogh, or at least the work of some minor known painter; not the work of a pensioner who just watched a couple of Bob Ross videos....
And the VMS does not look at all like that. Its 240 pages are 240 missed opportunities to make the work more appealing to the alleged targets of the scam -- whether they were rich nobles, scholars, physicians, alchemists...
Dunsel > 6 hours ago
(7 hours ago) oshfdk Wrote: (8 hours ago) Dunsel Wrote: Let's look at some of the things I've had to consider and would also have to be considered in your model.
Ok, we can add the line/page relative/absolute position, this sounds reasonable.
As for the rest of the list, it's mostly OK, except the generic "Does the word look stupid?" and also "Have we used a word and its mutation too many times already?" - absolutely not. I don't think we can expect the scribes to keep a tally of all the words they have written. If some word occurs 2-3 times in the sample of 20 tokens, it can be excluded of course; otherwise it's unreasonable to expect the scribe to keep count of all the tokens used.
No, I meeeeeeannnnnnnnnnnn youu youu youu youu youu need to youu need to make surrreeee it doesn't look stupid. I have a 'method' for filtering out bad words. And trust me, if you don't do that, you will get garbage. My method is a 5-word moving window. It looks at the last 5 words to make sure it's not producing too many duplicate words. It looks at consonant and vowel positions in the words to make sure that letters, vowels, or consonants are not being repeated too often. The Voynich is surprisingly pronounceable because it does have letters that act and look like vowels and consonants. Again, these are all human-doable. You can't just tell a piece of code to not make these words look stupid. It needs numbers.
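A rough sketch of what such a 5-word moving-window filter might look like follows. This is not the poster's actual code: the class name `WindowFilter`, the `VOWEL_LIKE` set, and every threshold are assumptions chosen only to show where the numbers have to go.

```python
from collections import deque

VOWEL_LIKE = set("aeioy")  # assumption: which glyphs count as vowel-like


class WindowFilter:
    """Reject words that repeat too often nearby or have implausible glyph runs."""

    def __init__(self, window=5, max_copies=3, max_vowel_run=3,
                 max_consonant_run=4, max_same_glyph_run=2):
        self.recent = deque(maxlen=window)  # the last `window` accepted words
        self.max_copies = max_copies
        self.max_vowel_run = max_vowel_run
        self.max_consonant_run = max_consonant_run
        self.max_same_glyph_run = max_same_glyph_run

    def reason(self, word):
        """Return a rejection reason, or None if the word is acceptable."""
        same_run = class_run = 0
        prev_glyph = prev_is_vowel = None
        for glyph in word:
            is_vowel = glyph in VOWEL_LIKE
            same_run = same_run + 1 if glyph == prev_glyph else 1
            class_run = class_run + 1 if is_vowel == prev_is_vowel else 1
            prev_glyph, prev_is_vowel = glyph, is_vowel
            if same_run > self.max_same_glyph_run:
                return "same glyph repeated too often"
            limit = self.max_vowel_run if is_vowel else self.max_consonant_run
            if class_run > limit:
                return "vowel-like or consonant-like run too long"
        if self.recent.count(word) >= self.max_copies:
            return "too many copies in the recent window"
        return None

    def offer(self, word):
        """Accept the word into the window if it passes; return True or False."""
        if self.reason(word) is not None:
            return False
        self.recent.append(word)
        return True


if __name__ == "__main__":
    f = WindowFilter()
    for w in ["daiin", "daiin", "daiin", "daiin", "meeeeeean", "chedy"]:
        print(w, "->", f.reason(w) or "ok")
        f.offer(w)
```

Rejected words are not added to the window, so a run of rejections does not push earlier accepted words out of view.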