kostrubaty > Yesterday, 05:13 AM
oshfdk > Yesterday, 12:12 PM
Jorge_Stolfi > Yesterday, 01:27 PM
(Yesterday, 12:12 PM)oshfdk Wrote:
(Yesterday, 05:13 AM)kostrubaty Wrote: So here's my analysis [link]
Results strongly indicate that the text was generated via a simple methodology (order-3 Markov chain).
Also, this shows that there is zero measurable correlation between the text and the imagery. Does it prove anything? Guess not; however, the results seem to leave little to the imagination.
Your generated sample, which you describe as "indistinguishable from Voynichese to the eye", appears a bit odd to me:
qoeeal shd shos lsheodain oteedy dchdar ol ar okold qokal qopchdy or olkchy qokeey yteey otair deeol shar chedy chor ykodaiin
It's reasonably good, which is of little surprise given the order-3 Markov chain
JoeyB > Yesterday, 02:06 PM
kostrubaty > 9 hours ago
(Yesterday, 12:12 PM)oshfdk Wrote: I'm not sure an order-3 Markov chain is "simple". Can you explain how XV century scribes would implement it? For starters, this requires some well-designed state transitions, and a lot (thousands?) of them. It's one thing to approximate some sequence of characters using a Markov chain; it's a different thing to explain how this sequence of characters was produced.
Well, if you look at the values, the entropy is ~2. That means the characters are chosen from a very limited pool: given a character, you only have 2 options to choose from for the next one. They could have used some form of volvelle with cutouts, but it could be just a simple flow diagram.
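A side note on the "~2 means 2 options" reading: if the ~2 figure is a conditional entropy measured in bits (an assumption, since the post does not state the unit), the effective number of equally likely next characters is 2^H, which is about 4 rather than 2. A minimal Python sketch of that arithmetic follows. The helper names `conditional_entropy` and `perplexity` are made up for illustration, and the sample line from this thread is used only as toy input, far too short for a meaningful estimate.

```python
from collections import Counter
from math import log2

def conditional_entropy(text, order=1):
    """Entropy in bits of the next character, given the previous `order` characters."""
    ctx_counts = Counter()   # how often each context occurs
    pair_counts = Counter()  # how often each (context, next char) pair occurs
    for i in range(order, len(text)):
        ctx = text[i - order:i]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, text[i])] += 1
    total = sum(pair_counts.values())
    h = 0.0
    for (ctx, _ch), n in pair_counts.items():
        # p(ctx, ch) * log2 p(ch | ctx), summed over all observed pairs
        h -= (n / total) * log2(n / ctx_counts[ctx])
    return h

def perplexity(h_bits):
    """Effective number of equally likely choices for an entropy given in bits."""
    return 2 ** h_bits

# Toy input: the generated line quoted in this thread.
sample = ("qoeeal shd shos lsheodain oteedy dchdar ol ar okold qokal qopchdy "
          "or olkchy qokeey yteey otair deeol shar chedy chor ykodaiin")

h = conditional_entropy(sample, order=1)
print(f"conditional entropy: {h:.2f} bits, effective choices: {perplexity(h):.1f}")
print("an entropy of exactly 2 bits corresponds to", perplexity(2.0), "choices")
```

Either way the pool is small, so the volvelle or flow-diagram idea is not affected much by the difference between 2 and 4.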
Also, your generated sample, which you describe as "indistinguishable from Voynichese to the eye" appears a bit odd to me:
qoeeal shd shos lsheodain oteedy dchdar ol ar okold qokal qopchdy or olkchy qokeey yteey otair deeol shar chedy chor ykodaiin
It's reasonably good, which is of little surprise given the order-3 Markov chain, however I don't think qoeeal looks very plausible (it doesn't appear in the MS), and neither does lsheodain or ykodaiin, though they look less iffy, while okold doesn't appear as a standalone word, dchdar is quite rare, deeol only appears once as a standalone word and once as a prefix. So, to my eye this looks like very unlikely Voynichese, because it's packed with rare and unusual words.
kostrubaty > 9 hours ago
(Yesterday, 01:27 PM)Jorge_Stolfi Wrote:
(Yesterday, 12:12 PM)oshfdk Wrote:
(Yesterday, 05:13 AM)kostrubaty Wrote: So here's my analysis [link]
Results strongly indicate that the text was generated via a simple methodology (order-3 Markov chain).
Also, this shows that there is zero measurable correlation between the text and the imagery. Does it prove anything? Guess not; however, the results seem to leave little to the imagination.
Your generated sample, which you describe as "indistinguishable from Voynichese to the eye", appears a bit odd to me:
qoeeal shd shos lsheodain oteedy dchdar ol ar okold qokal qopchdy or olkchy qokeey yteey otair deeol shar chedy chor ykodaiin
It's reasonably good, which is of little surprise given the order-3 Markov chain
Indeed. An order-3 Markov can probably also generate text that will be "indistinguishable from English to the eye" (of someone who cannot read English). Here is the output of Claude Shannon's classic experiment: "IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID. PONDENOME OF DEMONSTURES OF THE REPTAGIN IS. REGOACTIONA OF CRE"
An order-3 Markov will reproduce the frequency distributions of characters, bigrams, and trigrams of the training set. Thus it will reproduce the character entropies H0, H1, and H2, and any correlations of characters across word gaps. It will also approximate correlations over longer distances, because of chained correlations (the 4th letter depends on letters 2 and 3, but these depend on letter 1, so letter 4 will partly depend on letter 1 too). IIRC, there is a good chance that it will produce output with a Zipf-like word frequency distribution.
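For readers who want to see the mechanism, here is a minimal Python sketch of such a generator. It follows the convention used in this post, where "order 3" means the next character is drawn given the previous 2 (a trigram model). The names `train` and `generate` are hypothetical, this is not the code from the repository under discussion, and the training text is just the one sample line quoted in the thread.

```python
import random
from collections import Counter, defaultdict

def train(text, context_len=2):
    """Count, for every context of `context_len` characters, which character follows it."""
    table = defaultdict(Counter)
    for i in range(context_len, len(text)):
        table[text[i - context_len:i]][text[i]] += 1
    return table

def generate(table, seed, length, rng=None):
    """Extend `seed` (which must be exactly context_len characters) up to `length` characters."""
    rng = rng or random.Random(0)
    k = len(seed)
    out = seed
    while len(out) < length:
        options = table.get(out[-k:])
        if not options:  # context never seen in training: stop early
            break
        chars, weights = zip(*options.items())
        # Draw the next character in proportion to its observed frequency.
        out += rng.choices(chars, weights=weights)[0]
    return out

# Toy training text: the generated line quoted in this thread.
sample = ("qoeeal shd shos lsheodain oteedy dchdar ol ar okold qokal qopchdy "
          "or olkchy qokeey yteey otair deeol shar chedy chor ykodaiin")

table = train(sample, context_len=2)
out = generate(table, seed=sample[:2], length=60)
print(out)
```

By construction every trigram in the output also occurs in the training text, which is why trigram statistics are matched when the training set is large. With an input this small the generator will mostly hand pieces of the sample back.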
An order-3 Markov may fail to capture some subtler properties of the training set, such as the distribution of word type lengths or a rule like "a word can have at most one letter 't' or 'k', but not both". But the "core-mantle-crust" structure of Voynichese words makes it possible for an order-3 Markov to "learn" the rule "a word can have at most one gallows". That's because the last 3 glyphs will almost always tell the generator whether it is in the "prefix" part ("uphill", before the gallows) or in the "suffix" part ("downhill", after the gallows); and thus it can avoid generating a second gallows in the latter.
On the other hand, an order-3 Markov may miss the rules "at most three benches" (Ch, Sh, ee; counting platform gallows like CTh) and "at most three dealers" (d, l, r, s) that seem to hold for Voynichese words. Do we see exceptions to these rules in the output of your generator?
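The three rules mentioned above can be checked mechanically on generator output. The following Python sketch makes several assumptions that are not stated in this post: words are in lowercase EVA transcription, the gallows are t, k, p, f (only t and k are named above), platform gallows (cth, ckh, cph, cfh) count both as a gallows and as a bench, and glyphs are split by greedy longest match so that the s in sh is not counted as a dealer. The names `word_stats` and `violations` are hypothetical.

```python
import re

# Greedy tokenizer: longest multi-character glyphs first, then single letters.
TOKEN = re.compile(r"cth|ckh|cph|cfh|ch|sh|ee|[a-z]")

GALLOWS = {"t", "k", "p", "f"}            # assumption: full EVA gallows set
PLATFORM = {"cth", "ckh", "cph", "cfh"}   # platform gallows
BENCHES = {"ch", "sh", "ee"} | PLATFORM   # platform gallows counted as benches
DEALERS = {"d", "l", "r", "s"}

def word_stats(word):
    """Count gallows, benches, and dealers in one word."""
    tokens = TOKEN.findall(word.lower())
    return {
        "gallows": sum(t in GALLOWS or t in PLATFORM for t in tokens),
        "benches": sum(t in BENCHES for t in tokens),
        "dealers": sum(t in DEALERS for t in tokens),
    }

def violations(word):
    """List which of the three rules the word breaks (empty list if none)."""
    s = word_stats(word)
    found = []
    if s["gallows"] > 1:
        found.append("more than one gallows")
    if s["benches"] > 3:
        found.append("more than three benches")
    if s["dealers"] > 3:
        found.append("more than three dealers")
    return found

# Toy input: the generated line quoted in this thread.
sample = ("qoeeal shd shos lsheodain oteedy dchdar ol ar okold qokal qopchdy "
          "or olkchy qokeey yteey otair deeol shar chedy chor ykodaiin")

for w in sample.split():
    if violations(w):
        print(w, violations(w))
```

Running this over a large generated corpus and over the transcription itself would give the exception rates for each rule side by side.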
All the best, --stolfi
kostrubaty > 9 hours ago
(Yesterday, 02:06 PM)JoeyB Wrote: Nice, thanks for sharing all of that, esp. the GitHub repo. FWIW, when I have played with Claude in the past, I watched it talk itself out of the markov-4 story when I pressed it with two things. First, it basically broke when I forced it to adversarially explain Tavie's vertical impact effect. That thread is active again now, but it's basically that the first glyph of a line is conditioned on the first glyph of the line above it, and it avoids repeating that glyph (and then there are directed transitions, right: o->q, y->o, etc.). But the left-to-right character chain Claude is proposing has no page in it; the models I played with didn't have a variable for "look at the line above", and it was only checking reading order. Tavie's work and the threads on this were gold for beating up the linear model.
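One rough way to test the line-above effect on any text, real or generated, is to compare how often consecutive lines start with the same glyph against a shuffled baseline. This Python sketch is only an illustration of that idea, not the method used in the threads mentioned above. The function names are hypothetical, the toy lines are made up, and page and paragraph boundaries are ignored.

```python
import random

def initial_repeat_rate(lines):
    """Fraction of consecutive line pairs whose first glyphs are identical."""
    firsts = [ln.strip()[0] for ln in lines if ln.strip()]
    pairs = list(zip(firsts, firsts[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

def shuffled_baseline(lines, trials=1000, seed=0):
    """Mean repeat rate when the same first glyphs are put in random order."""
    rng = random.Random(seed)
    firsts = [ln.strip()[0] for ln in lines if ln.strip()]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(firsts)
        total += initial_repeat_rate(firsts)
    return total / trials

# Made-up toy lines, for illustration only (not manuscript text).
toy_lines = ["qokal shar", "okold chor", "ykodaiin ol", "otair or", "qokeey ar", "oteedy shd"]

print("observed repeat rate:", initial_repeat_rate(toy_lines))
print("shuffled baseline:   ", shuffled_baseline(toy_lines))
```

If line starts really avoid the glyph above, the observed rate should sit clearly below the baseline. A purely left-to-right chain that treats line breaks as ordinary gaps has no obvious reason to produce that gap.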
The second thing I had trouble with was figuring out if the model was really finding coupling or was just memorizing. Claude got very excited that it had reproduced coupling early, but then it would turn out it was just handing the samples back. IIRC, the threads on generators and held-out vocabulary were super helpful in pressing this point.
Good luck!
oshfdk > 8 hours ago
(9 hours ago)kostrubaty Wrote: As for indistinguishable -> It simply means that, given 2 text excerpts, one in Voynichese and one generated, you're unable to tell which is which. You're able to tell only because you've checked against the known text.
(9 hours ago)kostrubaty Wrote: This actually neatly explains why the structure is like this: they probably prepared the cartouches by prefilling the first letter of each line.
Then the scribe would use the diagram to look up the text and write it down letter by letter.
Jorge_Stolfi > 8 hours ago
(9 hours ago)kostrubaty Wrote: Well, if you look at the values, the entropy is ~2. That means the characters are chosen from a very limited pool: given a character, you only have 2 options to choose from for the next one. They could have used some form of volvelle with cutouts, but it could be just a simple flow diagram.
Quote: As for indistinguishable -> It simply means that, given 2 text excerpts, one in Voynichese and one generated, you're unable to tell which is which.
oshfdk > 8 hours ago
(8 hours ago)Jorge_Stolfi Wrote: For a Markov of order 3 the Author would need a table with ~20x20 = 400 entries listing the alternatives for the next character given the previous 2, each with its probability.