The Voynich Ninja
Statistical properties of notebook jottings at the line, paragraph, and page level?




Statistical properties of notebook jottings at the line, paragraph, and page level? - RenegadeHealer - 20-02-2021

I write manuscripts. In medical school, I was known affectionately as that quirky student who attended every class, and furiously hand-wrote as much of what was said, and presented on the screen, as I could manage to write down. I challenged myself to write as grammatically correctly and completely as possible, so that I'd be able to use my notebook as a text to study later. It seemed to work; I can attest that writing something out by hand is a great way to commit it to memory.  But I eventually realized that writing down every word in grammatically correct and complete sentences wasn't necessary. I just needed to get down the most important words, or even just pieces of the most important words. This was plenty enough to commit the important fact to memory, and recall it later. It was just hard to allow myself to do it.

I'm wondering if anyone has ever studied and statistically analyzed the way most people abbreviate quick notes in a notebook, for later reference. I'm not talking on the level of letters and words, so much as on the level of lines, sentences, paragraphs, and pages. I wonder how much this varies based on the language used, and the personal preferences and ease of handwriting of the writer. There are two competing demands that I see, which place some universal constraints on how notes get abbreviated. The notes need to be quick to write, such that the writer can keep up with what's happening around him. But they also have to contain enough information that the writer can retrieve it unambiguously later on. I wouldn't be surprised if some general statements can be made about what words tend to be kept, and what words tend to be omitted, when people take notes in a notebook. In particular, I imagine function words are often omitted, unless they're absolutely necessary for interpreting the note correctly.

Notebooks intended to be read by people other than the writer, without the presence of the writer, can't be as liberal with grammar and word omission as notebooks only intended for the writer's eyes. Still, I reckon they're much less dense with function words and complete sentences than published books.

Has anyone found any verbatim transcriptions of a lengthy notebook, in any language and from any point in history? I'm currently looking for some. If I find any that are usable, I'd be very interested to let Marco, Nablator, and other computational linguistics / statistics gurus here run their tests on them, and see how they match up to transcriptions of both the VMs and to published works in known languages. I wouldn't be surprised to find most notebooks sporting:

  • A high type to token ratio
  • Line as a functional unit effects
  • Page and paragraph clustering of similar words
  • Weak word order
  • Ambiguous spacing and spelling
  • Loose grammar and punctuation
  • A slow but steady drift in the rules of abbreviation and overall style of notetaking, from the first page to the last

I fail to see how fast and "just meaningful enough" notetaking would naturally lead to high rates of exact reduplication, though.
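
Two of the quantities above are easy to pin down on a transcription. What follows is only a rough sketch in Python, not anyone's established method: it assumes one line of text per string and whitespace-separated words, and the sample lines are invented for illustration. The function names type_token_ratio and reduplication_rate are hypothetical.

```python
def type_token_ratio(tokens):
    """Distinct word types divided by total word tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def reduplication_rate(lines):
    """Share of adjacent word pairs (within a line) whose two words are identical."""
    pairs = 0
    repeats = 0
    for line in lines:
        words = line.split()
        for left, right in zip(words, words[1:]):
            pairs += 1
            if left == right:
                repeats += 1
    return repeats / pairs if pairs else 0.0


# Invented note-style lines, purely illustrative (not real data).
notes = [
    "pt fever 3 days",
    "cough dry no sputum",
    "rx fluids rest rest",
]
tokens = [word for line in notes for word in line.split()]
ttr = type_token_ratio(tokens)
redup = reduplication_rate(notes)
print(f"TTR = {ttr:.3f}, reduplication rate = {redup:.3f}")
```

Pairs are counted within lines only, on the assumption that the line is the functional unit. TTR also depends heavily on text length, so any comparison between a notebook and the VMs would need samples of equal size.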

If the VMs was someone's notebook, which I think is a reasonable idea given the crudeness of its images and layout (Fisk, 2017), then a transcription of a long, old notebook in a known, readable language might be a good apples-to-apples comparison.

The bad news is, if the VMs really is a notebook, abbreviated in typical ways but written in a novel script and/or language, that does not give me confidence in our ability to decode it. To anyone who has ever borrowed someone else's class notebook, and had to call the owner to make sense of a lot of the notes, I don't think this needs much explanation. This is especially true if the ways that notetakers condense language tend to vary a lot based on the writer, language, and subject matter, and few generalizations about this condensation process can be made.


RE: Statistical properties of notebook jottings at the line, paragraph, and page level? - Koen G - 20-02-2021

It's an interesting idea, and I agree something along these lines would explain certain properties of Voynichese. 

However, the active glyph set of Voynichese appears to be quite small. Wouldn't abbreviations increase the glyph set, if anything? Of course this problem can be solved in a number of ways (like verbose cipher). 

Interpret the glyph set as you will though, positional rigidity remains your enemy. Not just for glyphs that could be abbreviation signs. All glyphs. 

In other words, I'm afraid the notebook scenario cannot account for character entropy, at least not without developing a radically different understanding of Voynichese first. But others may correct me on this.
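
For anyone who wants to check this on a notebook transcription, here is a minimal Python sketch. It assumes that "character entropy" means the usual pair of measures: the entropy of single characters (h1) and the conditional entropy of a character given the one before it (h2). The function names and the two toy strings are made up for illustration. Rigid glyph placement should show up as a low h2.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def char_entropies(text):
    """Return (h1, h2): single-character entropy and the conditional
    entropy of a character given the preceding character, in bits."""
    chars = list(text)
    h1 = entropy(Counter(chars))
    pairs = Counter(zip(chars, chars[1:]))
    firsts = Counter(chars[:-1])
    # H(next | previous) = H(previous, next) - H(previous)
    h2 = entropy(pairs) - entropy(firsts)
    return h1, h2


# Toy strings: strictly alternating characters versus a looser arrangement.
rigid_h1, rigid_h2 = char_entropies("abababababab")
loose_h1, loose_h2 = char_entropies("aabbaabbaabb")
print(f"rigid: h1 = {rigid_h1:.3f}, h2 = {rigid_h2:.3f}")
print(f"loose: h1 = {loose_h1:.3f}, h2 = {loose_h2:.3f}")
```

Both toy strings use the same two characters equally often, so h1 is identical, and only h2 separates the rigid arrangement from the loose one. Whether abbreviation in real notebooks pushes h2 up or down is exactly what a transcription would have to show.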


RE: Statistical properties of notebook jottings at the line, paragraph, and page level? - RenegadeHealer - 20-02-2021

Oh for sure, Koen. The rigidity of glyph placement, and even the use of a novel glyph set, are not explained by this idea. What I set out to account for is mostly the weak-to-nonexistent word order, which remains a glaring problem even to anyone who makes sense of the odd glyph placement and word structure. Even the best anatomy-of-a-vord researchers (Tiltman, Reed, Stolfi, Smith, Ponzi, Fisk) express unease and chagrin in their writing about how their "grammar" systems don't touch the problem of vord order being rather nonexistent.

Perhaps fallaciously, I'm approaching glyph order and word order as two separate hard-to-account-for problems. Because honestly, I cannot conceive of a single solution that could possibly explain both, which is compatible with a meaningful text. In my imagination, the writer used one algorithm to change words to vords, and another algorithm entirely (syntax and grammar simplification, as in notebooks) to render lines of vords.


RE: Statistical properties of notebook jottings at the line, paragraph, and page level? - Koen G - 21-02-2021

Ah, I understand. I agree it is a good idea to tackle one problem at a time. 

My prediction is that TTR will remain language-dependent, but TTR is not our greatest problem (it only is when looking at really small windows, i.e. reduplication patterns).

What I would test then is whether you get the "bag of words" effect on the one hand and reduplication patterns on the other. The first will probably depend on the style of the note taker? It would be interesting if someone with experience along these lines could test this. Reduplication patterns are probably trickier, and I suspect you need specific circumstances for these to occur regularly.
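
One possible way to put a number on the "bag of words" effect, offered as a sketch under stated assumptions rather than a settled test: measure how predictable the next word is from the previous one, then compare that with the same lines after the words in each line have been shuffled. If the two values are close, word order carries little information. The helper names, the number of shuffles, and the sample line are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def next_word_entropy(lines):
    """Conditional entropy (bits) of a word given the previous word, within lines."""
    pairs = Counter()
    firsts = Counter()
    for words in lines:
        for left, right in zip(words, words[1:]):
            pairs[(left, right)] += 1
            firsts[left] += 1
    return entropy(pairs) - entropy(firsts)


def shuffled_baseline(lines, trials=100, seed=0):
    """Mean next-word entropy after shuffling the words inside each line."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = [rng.sample(words, len(words)) for words in lines]
        total += next_word_entropy(shuffled)
    return total / trials


# Invented example with perfectly fixed word order, repeated 20 times.
ordered = [["take", "two", "pills", "daily"]] * 20
observed = next_word_entropy(ordered)
baseline = shuffled_baseline(ordered)
print(f"observed = {observed:.3f} bits, shuffled baseline = {baseline:.3f} bits")
```

A text with strict word order gives an observed value well below the shuffled baseline. A true bag of words would give two values that are nearly equal. Shuffling within lines keeps each line's vocabulary intact, so line-level and page-level clustering of similar words is not disturbed by the comparison.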


RE: Statistical properties of notebook jottings at the line, paragraph, and page level? - DONJCH - 21-02-2021

(21-02-2021, 12:07 AM)Koen G Wrote: I suspect you need specific circumstances for these to occur regularly

Like when the lecture gets boring maybe?