08-02-2023, 01:59 AM
Thanks Marco, I'll try out the script as soon as I find the time.
Thanks Koen, I finally understand now. You used the Excel STANDARDIZE function to assign a z-score to each frame value of a manuscript using the average and stdev of the entire corpus.
How many manuscripts does your corpus contain, and how many of each language?
If you find the time, could you make a graph with the average values of the corpus with error bars (the stdev or standard error) for comparison? And would you mind posting graphs of the raw non-normalized values of the VM texts?
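For what it's worth, here is how I understand the normalization, as a minimal Python sketch. Excel's STANDARDIZE(x, mean, standard_dev) returns (x - mean) / standard_dev. I am assuming that the mean and stdev are taken per frame size across all texts in the corpus and that the sample stdev is used, which may not match the actual spreadsheet. The function name and the toy numbers are made up for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def standardize_by_frame(corpus):
    """Z-score each text's value per frame size against the whole corpus.

    corpus: dict mapping text name -> list of values, one per frame size,
            with every list using the same frame sizes in the same order.
    Returns (z_scores, means, stdevs, standard_errors).
    Assumption: sample stdev (n - 1), taken per frame size.
    """
    n_texts = len(corpus)
    # One column per frame size, holding the values of all texts.
    columns = list(zip(*corpus.values()))
    means = [mean(col) for col in columns]
    stdevs = [stdev(col) for col in columns]
    # Standard error of the mean, the other candidate for error bars.
    std_errors = [s / sqrt(n_texts) for s in stdevs]
    z_scores = {
        name: [(v - m) / s for v, m, s in zip(values, means, stdevs)]
        for name, values in corpus.items()
    }
    return z_scores, means, stdevs, std_errors


# Toy values, not real measurements.
toy_corpus = {
    "text_a": [0.2, 0.5],
    "text_b": [0.4, 0.7],
    "text_c": [0.6, 0.9],
}
z, mu, sd, se = standardize_by_frame(toy_corpus)
```

The `means` with `stdevs` (or `std_errors`) are what would go into the corpus-average graph with error bars. A frame size where every text has the same value gives a stdev of zero and a division error, so that case would need separate handling.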
So if your new normalized VM graphs are indeed correct, what does this tell us?
.)all VM sections you measured perform in an extremely similar way and fundamentally different from other texts
.)all VM sections more or less follow a highly predictable hyperbolic or x^1/x function with very low values at small frames and a flat and linear distribution from medium to large frames.
.)randomly shuffling the text of other manuscripts drastically modifies and unifies curves, making them quite similar to those of the VM but less flat at medium frame sizes.
.)randomly shuffling VM text only superficially modifies curves, also making them less flat at medium frame sizes
.)Thorsten's artificial 'Voynichese' performs extremely similarly to actual Voynichese here
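To make the shuffling comparison concrete, here is a rough Python sketch of how I understand MATTR (moving-average type-token ratio): slide a frame of fixed size over the token list one token at a time, take the number of distinct tokens divided by the frame size in each position, and average over all positions. This is not necessarily how Koen's or Marco's scripts compute it; the function names and the toy token list are my own assumptions.

```python
import random


def mattr(tokens, frame):
    """Moving-average type-token ratio for one frame size."""
    n = len(tokens)
    if frame <= 0 or frame > n:
        raise ValueError("frame must be between 1 and len(tokens)")
    # Token counts inside the current frame.
    counts = {}
    for tok in tokens[:frame]:
        counts[tok] = counts.get(tok, 0) + 1
    type_total = len(counts)  # distinct tokens, summed over all frames
    for i in range(frame, n):
        leaving = tokens[i - frame]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        entering = tokens[i]
        counts[entering] = counts.get(entering, 0) + 1
        type_total += len(counts)
    n_frames = n - frame + 1
    return type_total / (frame * n_frames)


def shuffled(tokens, seed=0):
    """Return a randomly reordered copy; the seed only makes runs repeatable."""
    copy = list(tokens)
    random.Random(seed).shuffle(copy)
    return copy


# Toy token list, not real Voynichese.
toy = "a b a b c c".split()
original_curve = [mattr(toy, f) for f in range(1, len(toy) + 1)]
shuffled_curve = [mattr(shuffled(toy), f) for f in range(1, len(toy) + 1)]
```

Shuffling keeps the vocabulary and the token frequencies and only destroys the order, so any difference between the two curves comes from how the tokens are arranged. At a frame equal to the full text length both curves necessarily give the same value.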
I'm no expert on this topic but these are my 2cts:
I find it very, very hard to believe it's possible to arbitrarily write a text with properties that generate such perfect curves, either deliberately in a manuscript with meaning or as pseudorandom nonsense. I might be wrong but I don't see how this would be possible. To me the VM texts appear to have been generated by some sort of algorithm, just as in Thorsten's experiment.
Clearly the VM doesn't behave like any ordinary text, including repetitive poetry. It would therefore be interesting to compare it to something different with a more formalized structure like accounting or bookkeeping texts containing Roman numerals. But I still doubt this could explain the MATTR properties. Maybe we need to look into non-contemporary sources like computer code as well? I wonder if it's possible to find anything human-made that generates such MATTR curves.
I was never fond of the 'meaningless text hypothesis' but seeing those graphs I can't help but wonder. Maybe there is still meaning encoded, but there appears to be some sort of low-entropy, highly predictable carrier function. Clearly the text isn't randomly shuffled, see the inhomogeneities thread. In fact it is quite the opposite and shows some peculiar degree of order and predictability on vord, line, paragraph, page and quire level. I have no idea how to reconcile this with the MATTR data suggesting a close relation with randomness though.