The Voynich Ninja

Full Version: Syllables at the end of a line in VMS
In the VMS, lines very often end so neatly that one could suspect the last words of a line are "filler words". If that is so, and the last words consist only of "character salad", this could affect whether syllables occur in them and, if so, whether their frequency deviates from that of other comparison texts. I therefore counted the syllables in the last word of each line and put them in relation to the total number of lines. The value for the (Latin-mapped) VMS deviates very clearly from the Italian Divina Commedia and the Latin Bible. I would have liked to include the Ackermann (German) as well, but unfortunately that text is only available to me in a reformatted version. I would be grateful for links to other useful comparison texts.


Here are the results in detail:

Dante Alighieri: La divina commedia (ita)
Total letters and spaces: 530871
Total syllables: 161284
Syllable percentage: 30.381015350%
Average syllables per word: 1.651214218
Total syllables in last words of lines: 34805

Average syllables per line ( last words ): 2.427296185


Part of the Bible: Latin Vulgate (lat)
Total letters and spaces: 1252196
Total syllables: 397977
Syllable percentage: 31.782324812%
Average syllables per word: 2.095453971
Total syllables in last words of lines: 24565

Average syllables per line ( last words ): 2.174086202


Voynich Manuscript: (alphabet mapped to latin)
Total letters and spaces: 234658
Total syllables: 55220
Syllable percentage: 23.532119084%
Average syllables per word: 1.457530486
Total syllables in last words of lines: 7528

Average syllables per line ( last words ): 1.442974890

The link to the code:
[link]
(12-10-2023, 06:12 PM)bi3mw Wrote: Voynich Manuscript: (alphabet mapped to latin)

If you know where vowels are, good for you. :)
(12-10-2023, 06:12 PM)bi3mw Wrote: In the VMS, lines very often end so neatly that one could suspect the last words of a line are "filler words". If that is so, and the last words consist only of "character salad",

I have no expertise in the matter, so this is more a question to those better informed, but I often wonder, when I read about "nulls" and "fillers", how those would affect the information content of the MS. The VMS has quite a rigid way of writing that restricts the number of possible events and thus the information conveyed, and it can certainly only contain very limited information.
Since the Shannon entropy measures, to some extent, the average information you get from all the events, if we also remove "nulls" and "fillers" from that group of potential events, how much information is really left to the reader?
(12-10-2023, 07:43 PM)nablator Wrote:
(12-10-2023, 06:12 PM)bi3mw Wrote: Voynich Manuscript: (alphabet mapped to latin)

If you know where vowels are, good for you. :)

Well, "know" is probably a bit high ;)

When mapping the VMS, I simply replaced the letters with correspondingly frequent letters in Latin texts.

cat Voynich_full_clean01.txt | tr oehyacdiklrstnqpmfgxzv IEUTASRNMOCDLPBQGHFXYZ
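The same frequency-rank substitution can be sketched in Python (a hypothetical `rank_map` helper, not the script actually used above): rank the source characters by descending frequency and pair them with a target alphabet that is already in frequency order, as the `tr` call does.

```python
from collections import Counter

def rank_map(source_text, target_freq_order):
    # Order source characters by descending frequency and pair them
    # with a target alphabet already sorted by frequency.
    src_order = [c for c, _ in Counter(source_text).most_common()]
    return dict(zip(src_order, target_freq_order))

# Toy example (no frequency ties, so the ranking is unambiguous):
text = "ooohhe"
mapping = rank_map(text, "IEU")  # o -> I, h -> E, e -> U
mapped = "".join(mapping.get(c, c) for c in text)
print(mapped)  # IIIEEU
```

Note that characters with equal counts are ranked in order of first occurrence, so ties make the mapping somewhat arbitrary.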

See this conversation:
[link]


This presupposes, of course, that Latin is the correct assumption. Interestingly, the value is even higher if you simply keep the original alphabet. As an experiment, one could change the mapping incrementally until one gets a value of, say, over 2. Letter frequencies from other (medieval) texts could serve as a template. This would have to be done with a program, since it would be tedious to do by hand. - If this were to succeed, my basic assumption would of course be disproved.
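The incremental experiment could be sketched as a simple hill climb (hypothetical code; the scoring function, e.g. the average syllables per last word, is left as a parameter): repeatedly swap two target letters in the substitution mapping and keep the swap only if the score improves.

```python
import random

def hill_climb(mapping, score, steps=1000, seed=0):
    # Greedy search: try a random swap of two target letters and
    # keep it only when the score goes up.
    rng = random.Random(seed)
    keys = list(mapping)
    best = dict(mapping)
    best_score = score(best)
    for _ in range(steps):
        a, b = rng.sample(keys, 2)
        cand = dict(best)
        cand[a], cand[b] = cand[b], cand[a]
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score

# Toy check with a score that counts letters mapped to themselves:
toy = dict(zip("abcd", "badc"))
found, sc = hill_climb(toy, lambda m: sum(k == v for k, v in m.items()))
```

A greedy climb like this can get stuck in local optima; restarts with different seeds are the usual workaround.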

Voynich Manuscript ( with original alphabet )
Total letters and spaces: 234658
Total syllables: 63340
Syllable percentage: 26.992474154%
Average syllables per word: 1.671857678
Total syllables in last words of lines: 9140

Average syllables per line ( last words ): 1.751964731
(13-10-2023, 09:53 AM)Scarecrow Wrote: Since the Shannon entropy measures, to some extent, the average information you get from all the events, if we also remove "nulls" and "fillers" from that group of potential events, how much information is really left to the reader?
The Shannon entropy h1 doesn't limit the amount of information much, as it is close to that of normal texts. The conditional entropy h2 is a better upper bound, but there is no way to know how much of the information is actually wasted by the encoding process. There could be any number of nulls that we wouldn't be able to identify.
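For concreteness, h1 and h2 here are the unigram and conditional character entropies. A minimal sketch of how they can be estimated from a text (plain maximum-likelihood counts, no smoothing):

```python
import math
from collections import Counter

def h1(text):
    # Unigram entropy in bits per character.
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

def h2(text):
    # Conditional entropy H(next char | current char):
    # bigram entropy minus the entropy of the conditioning character.
    n = len(text) - 1
    bigram_h = -sum(c / n * math.log2(c / n)
                    for c in Counter(zip(text, text[1:])).values())
    return bigram_h - h1(text[:-1])
```

A perfectly alternating text like "abababab" has h1 = 1 bit but h2 near 0, since each character fully determines the next; VMS text is not that extreme, but its h2 is well below European-language values.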
(13-10-2023, 10:37 AM)bi3mw Wrote: When mapping the VMS, I simply replaced the letters with correspondingly frequent letters in Latin texts.
cat Voynich_full_clean01.txt | tr oehyacdiklrstnqpmfgxzv IEUTASRNMOCDLPBQGHFXYZ

Well, you should know that simple substitution is impossible: the conditional character entropy is preserved by this mapping, and it's far too low for Latin or any European language. There is no way to count syllables unless you've actually deciphered the text.

If you believe that a simple substitution is nevertheless correct, why not use Sukhotin's algorithm or some other algorithm to detect vowels? [link]

Anyway, there isn't necessarily a link between a supposed "character salad" at the end of lines and the number of syllables (counted by a program that doesn't work for any language)... :rolleyes:
(13-10-2023, 11:19 AM)nablator Wrote: There is no way to count syllables unless you've actually deciphered the text.

Well, the punch line of the whole thing is that a lot of the last words of a line were simply not generated with the (so far unknown) procedure. The VMS would therefore not be "of one piece". This possible peculiarity of the last words as "filler words" also allows conjectures (e.g. meaningless text with a tendency toward low syllable formation), which, applied to the whole manuscript, would indeed make little sense. But, given these concrete assumptions, one can very well examine whether conspicuously few syllables are to be found or not.

P.S.: Mapping the alphabet throughout the manuscript is, of course, easier than mapping only the last words of a line. But this should not lead to the assumption that one wants to "decode" the entire manuscript.

Note: I have only now read the changes in your last post. Sukhotin's algorithm would certainly be a possibility, but I wanted to keep it as simple as possible for now.
(13-10-2023, 11:19 AM)nablator Wrote: why not use Sukhotin's algorithm or some other algorithm to detect vowels?

Is the code correct?

Code:
#!/usr/bin/env python
from collections import defaultdict
from operator import itemgetter
import re
import sys

def classify_chars(text):
    if not text:
        return set(), set()
    text = f"{text}{text[0]}"  # wrap around so the last character also has a right neighbor
    matrix = defaultdict(int)
    row_sums = defaultdict(int)
    consonants = set()
    vowels = set()
    for first, second in zip(text, text[1:]):
        consonants.add(first)
        if first != second:
            matrix[first, second] += 1
            matrix[second, first] += 1
            row_sums[first] += 1
            row_sums[second] += 1
    while any(row_sum > 0 for row_sum in row_sums.values()):
        vowel = max(row_sums.items(), key=itemgetter(1))[0]
        vowels.add(vowel)
        consonants -= vowels
        for consonant in consonants:
            frequency = matrix.get((consonant, vowel), 0)
            row_sums[consonant] -= 2 * frequency
        row_sums = {key: value for key, value in row_sums.items() if key not in vowels}
    return vowels, consonants

def sukhotin_syllabification(text):
    # Crude heuristic: classify characters with Sukhotin's algorithm,
    # then close a "syllable" after every consonant.
    vowels, consonants = classify_chars(text)
    syllables = []
    current_syllable = ""

    for char in text:
        current_syllable += char
        if char in consonants:
            syllables.append(current_syllable)
            current_syllable = ""

    return syllables

def calculate_syllable_statistics(text):
    total_letters = len(re.sub(r'[^\w\s]', '', text))  # Keep word characters and whitespace; drop punctuation
    total_syllables = len(sukhotin_syllabification(text))
 
    lines = text.split('\n')
    last_word_syllables = 0
    total_lines = len(lines)

    for line in lines:
        words = line.split()
        if words:
            last_word = words[-1]
            last_word_syllables += len(sukhotin_syllabification(last_word))

    total_words = sum(len(line.split()) for line in lines)
    average_syllables_per_word = total_syllables / total_words if total_words > 0 else 0
    average_syllables_per_line = last_word_syllables / total_lines if total_lines > 0 else 0

    # Calculate the percentage using the rule of three
    syllable_percentage = (total_syllables * 100) / total_letters

    return total_syllables, total_letters, syllable_percentage, average_syllables_per_word, last_word_syllables, average_syllables_per_line

def detect_syllables_in_file(filename):
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
            total_syllables, total_letters, syllable_percentage, average_syllables_per_word, last_word_syllables, average_syllables_per_line = calculate_syllable_statistics(text)
            return total_syllables, total_letters, syllable_percentage, average_syllables_per_word, last_word_syllables, average_syllables_per_line
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return None, None, None, None, None, None

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python syllable_detection.py <text_file>")
    else:
        filename = sys.argv[1]
        total_syllables, total_letters, syllable_percentage, average_syllables_per_word, last_word_syllables, average_syllables_per_line = detect_syllables_in_file(filename)

        if total_syllables is not None and total_letters is not None and syllable_percentage is not None and average_syllables_per_word is not None and last_word_syllables is not None and average_syllables_per_line is not None:
            print(f"Total letters and spaces: {total_letters}")
            print(f"Total syllables: {total_syllables}")
            print(f"Syllable percentage: {syllable_percentage:.9f}%")
            print(f"Average syllables per word: {average_syllables_per_word:.9f}")
            print(f"Total syllables in last words of lines: {last_word_syllables}")
            print(f"Average syllables per line ( last words ): {average_syllables_per_line:.9f}")
I don't know Python and I don't understand what you are trying to do, sorry.
Sukhotin's algorithm is quite straightforward and can be done by hand.
Of course this is not efficient for long texts, but it would allow you to check your code against a worked example.

[link]

(Sorry, I also don't know Python.)
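As a hand-checkable example, here is a minimal, self-contained Sukhotin sketch (independent of the script above, and without its wrap-around of the first character), applied to a word small enough to verify on paper:

```python
from collections import Counter, defaultdict

def sukhotin_vowels(text):
    # Sukhotin's heuristic: the letter that most often alternates with
    # the others is declared a vowel; its contribution is then removed
    # and the step repeats while any positive sum remains.
    pairs = Counter()
    sums = defaultdict(int)
    for a, b in zip(text, text[1:]):
        if a != b:
            pairs[a, b] += 1
            pairs[b, a] += 1
            sums[a] += 1
            sums[b] += 1
    vowels = set()
    while sums and max(sums.values()) > 0:
        v = max(sums, key=sums.get)
        vowels.add(v)
        del sums[v]
        for c in sums:
            sums[c] -= 2 * pairs[c, v]
    return vowels

print(sukhotin_vowels("banana"))  # {'a'}
```

On "banana" the sums are a:5, n:4, b:1, so "a" is picked first; subtracting its adjacencies drives the remaining sums negative and the algorithm stops. With a real text you would strip spaces and line breaks first, since otherwise the space character is almost always classified as a "vowel".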

Edit:
Doing this will also not be the answer to all questions. It requires having a good idea of what the individual characters in the text are.
For this, one should not use Eva. I would recommend something along the lines of the FSG alphabet, or this:
[link]
These two options tend to generate quite similar statistics.