I write manuscripts. In medical school, I was known affectionately as that quirky student who attended every class, and furiously hand-wrote as much of what was said, and presented on the screen, as I could manage to write down. I challenged myself to write as grammatically correctly and completely as possible, so that I'd be able to use my notebook as a text to study later. It seemed to work; I can attest that writing something out by hand is a great way to commit it to memory. But I eventually realized that writing down every word in grammatically correct and complete sentences wasn't necessary. I just needed to get down the most important words, or even just pieces of the most important words. This was plenty enough to commit the important fact to memory, and recall it later. It was just hard to allow myself to do it.
I'm wondering if anyone has ever studied and statistically analyzed the way most people abbreviate quick notes in a notebook, for later reference. I'm not talking on the level of letters and words, so much as on the level of lines, sentences, paragraphs, and pages. I wonder how much this varies based on the language used, and the personal preferences and ease of handwriting of the writer. There are two competing demands that I see, which place some universal constraints on how notes get abbreviated. The notes need to be quick to write, such that the writer can keep up with what's happening around him. But they also have to contain enough information that the writer can retrieve it unambiguously later on. I wouldn't be surprised if some general statements can be made about what words tend to be kept, and what words tend to be omitted, when people take notes in a notebook. In particular, I imagine function words are often omitted, unless they're absolutely necessary for interpreting the note correctly.
Notebooks intended to be read by people other than the writer, without the presence of the writer, can't be as liberal with grammar and word omission as notebooks only intended for the writer's eyes. Still, I reckon they're much less dense with function words and complete sentences than published books.
Has anyone found any verbatim transcriptions of a lengthy notebook, in any language and from any point in history? I'm currently looking for some. If I find any that are usable, I'd be very interested to let Marco, Nablator, and other computational linguistics / statistics gurus here run their tests on them, and see how they compare with both transcriptions of the VMs and published works in known languages. I wouldn't be surprised to find most notebooks sporting:
- A high type-to-token ratio
- Line as a functional unit effects
- Page and paragraph clustering of similar words
- Weak word order
- Ambiguous spacing and spelling
- Loose grammar and punctuation
- A slow but steady drift in the rules of abbreviation and overall style of notetaking, from the first page to the last
I fail to see how fast and "just meaningful enough" notetaking would naturally lead to high rates of exact reduplication, though.
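For anyone who wants to try this on a notebook transcription, here is a rough sketch of how two of the measures above could be computed: the type-to-token ratio and the rate of exact reduplication (a word immediately followed by itself). Everything in it is my own assumption, not an established method: the function names are made up, the tokenizer just pulls out runs of letters, and the sample line is invented. Keep in mind that type-to-token ratio drops as a text gets longer, so it only makes sense to compare samples of about the same length.

```python
import re

def tokenize(text):
    # Simplistic assumption: a "word" is any run of letters, lowercased.
    # A real transcription would need its own word-splitting rules.
    return re.findall(r"[^\W\d_]+", text.lower())

def type_token_ratio(tokens):
    # Distinct word forms divided by total words.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def reduplication_rate(tokens):
    # Share of adjacent word pairs in which both words are identical.
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return repeats / (len(tokens) - 1)

if __name__ == "__main__":
    # Invented sample line, standing in for a real notebook transcription.
    sample = "pain pain left arm worse at night night"
    words = tokenize(sample)
    print("type-to-token ratio:", type_token_ratio(words))
    print("reduplication rate:", reduplication_rate(words))
```

Running the same two functions over equal-sized chunks of a notebook, a published book, and a VMs transcription would give a first, crude side-by-side comparison. The line, paragraph, and page effects in the list would need the line and page breaks preserved in the transcription, which this sketch ignores.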
If the VMs was someone's notebook, which I think is a reasonable idea given the crudeness of its images and layout (Fisk, 2017), then a transcription of a long, old notebook in a known, readable language might be a good apples-to-apples comparison.
The bad news is, if the VMs really is a notebook, abbreviated in typical ways but written in a novel script and/or language, that does not give me confidence in our ability to decode it. To anyone who has ever borrowed someone else's class notebook, and had to call the owner to make sense of a lot of the notes, I don't think this needs much explanation. This is especially true if the ways that notetakers condense language tend to vary a lot based on the writer, language, and subject matter, and few generalizations about this condensation process can be made.