The Voynich Ninja

Binomial distribution in VMS
Pages: 1 2 3 4 5 6 7
Question: What if the words in the VMS have been shortened or expanded with fill characters (here X) so that they end up corresponding to a binomial distribution (with whatever system)?

Here is a line from a comparison text (Regimen sanitatis):

Input example:
Capitulum primum De regulis sumptis ex parte elementorum nostro corpori occurrentium ab extra

Output example:
Ca primum DeXXXXX regul sump exXXXX parteX elemento nost corpo occu abXXXX extra
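A quick way to confirm that each output word was produced only by truncation or X-padding, and to see how the word-length counts shift, is a check like the one below. It is a sketch: the function names are hypothetical, and it assumes words are separated by whitespace and that X is the only fill character.

```python
from collections import Counter

def is_truncation_or_padding(original, modified, fill="X"):
    """True if `modified` is a prefix of `original`, or `original` followed only by fill characters."""
    if len(modified) <= len(original):
        return original.startswith(modified)
    return modified.startswith(original) and set(modified[len(original):]) == {fill}

def length_counts(line):
    """Map word length -> number of words of that length, sorted by length."""
    return dict(sorted(Counter(len(w) for w in line.split()).items()))

src = "Capitulum primum De regulis sumptis ex parte elementorum nostro corpori occurrentium ab extra"
out = "Ca primum DeXXXXX regul sump exXXXX parteX elemento nost corpo occu abXXXX extra"

pairs = list(zip(src.split(), out.split()))
print(all(is_truncation_or_padding(a, b) for a, b in pairs))  # True
print(length_counts(src))  # {2: 3, 5: 2, 6: 2, 7: 3, 9: 1, 11: 1, 12: 1}
print(length_counts(out))  # {2: 1, 4: 3, 5: 3, 6: 4, 7: 1, 8: 1}
```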

Word-length distribution of the entire modified text (Regimen sanitatis):

[attachment=9032]
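For reference, the target distribution that the modified text is fitted to, choose(9, k-1)/2^9 as used in the code, can be tabulated exactly with the standard library. Because C(9, k-1) is zero for k > 10, lengths 11 to 15 get probability 0 and lengths 1 to 10 already sum to 1, so the normalization step in the script (with max_length = 15) changes nothing. A small sketch:

```python
from math import comb

n, max_length = 9, 15

# Number of ways (out of 2**n) to get word length k: C(n, k-1), which is 0 for k > n + 1
counts = {k: comb(n, k - 1) for k in range(1, max_length + 1)}

for k, c in counts.items():
    if c:
        print(f"{k:2d}  {c:3d}/{2 ** n}  {c / 2 ** n:.4f}")

print("sum =", sum(counts.values()), "/", 2 ** n)  # sum = 512 / 512
```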

Code:
import sys
import numpy as np
from scipy.special import comb

def calculate_binomial_distribution(n, max_length):
    """Compute a binomial distribution over the word lengths 1..max_length."""
    k_values = np.arange(1, max_length + 1)
    # Binomial probabilities choose(n, k-1) / 2^n (n = 9 in this script);
    # comb() returns 0 when k-1 > n, so lengths above n+1 get probability 0
    probabilities = np.array([comb(n, k - 1) / (2 ** n) for k in k_values])
 
    # Normalize the distribution so that it sums to 1
    probabilities /= np.sum(probabilities)
 
    return probabilities

def adjust_word_lengths(words, target_distribution):
    """Fit the word lengths to the target distribution by truncating or lengthening words."""
    adjusted_words = []
    max_word_length = len(target_distribution)
 
    length_bins = np.arange(1, max_word_length + 1)
    length_probs = np.array(target_distribution)
 
    for word in words:
        current_length = len(word)
        # Draw a random target length for this word from the target distribution
        target_length = np.random.choice(length_bins, p=length_probs)
     
        # If the target length is shorter, truncate the word
        if target_length < current_length:
            adjusted_words.append(word[:target_length])
        # If the target length is longer, pad the word with 'X'
        elif target_length > current_length:
            adjusted_words.append(word + 'X' * (target_length - current_length))
        else:
            adjusted_words.append(word)  # Length already matches: keep the word unchanged
 
    return adjusted_words

def process_text(file_path):
    """Read the text from the file, adjust the word lengths and return the new text."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            lines = file.readlines()
    except FileNotFoundError:
        print(f"Error: The file {file_path} was not found.")
        sys.exit(1)
    except IOError as e:
        print(f"Error: An error occurred while reading the file: {e}")
        sys.exit(1)
 
    max_word_length = 15  # Maximum word length
    n = 9  # Number of trials of the binomial distribution
 
    # Compute the binomial target distribution
    target_distribution = calculate_binomial_distribution(n, max_word_length)
 
    adjusted_lines = []
    for line in lines:
        words = line.split()
        adjusted_words = adjust_word_lengths(words, target_distribution)
        adjusted_lines.append(' '.join(adjusted_words))
 
    return '\n'.join(adjusted_lines)

def main():
    if len(sys.argv) != 2:
        print("Usage: python adjust_word_length.py <filename>")
        sys.exit(1)
 
    file_path = sys.argv[1]
    new_text = process_text(file_path)
    print("Modified text:")
    print(new_text)

if __name__ == "__main__":
    main()
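Whether the script's output actually matches the target can also be checked numerically rather than only by looking at the plot. A minimal standard-library sketch; the helper names and the use of total variation distance (half the summed absolute difference between observed and target frequencies) as the yardstick are assumptions for illustration, not part of the script above:

```python
from collections import Counter
from math import comb

def binomial_target(n=9, max_length=15):
    """Target probabilities P(length = k) = C(n, k-1) / 2**n for k = 1..max_length."""
    return [comb(n, k - 1) / 2 ** n for k in range(1, max_length + 1)]

def total_variation(words, target):
    """Half the L1 distance between observed word-length frequencies and `target`.
    Words longer than len(target) count entirely as mismatch."""
    counts = Counter(len(w) for w in words)
    total = len(words)
    inside = sum(abs(counts.get(k, 0) / total - p) for k, p in enumerate(target, start=1))
    outside = sum(c for length, c in counts.items() if length > len(target)) / total
    return 0.5 * (inside + outside)

# A synthetic word list whose lengths follow the target exactly gives distance 0
perfect = ["a" * k for k in range(1, 11) for _ in range(comb(9, k - 1))]
print(total_variation(perfect, binomial_target()))  # 0.0
```

Applied to the words of the modified text (e.g. the string returned by process_text, split on whitespace), values close to 0 mean a close fit.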

Quote (ChatGPT):

One could design a volvelle that aims to change the word lengths of a text according to a specific distribution. This could be done through the use of rotating disks, each giving specific instructions on how words should be edited.

Example of a volvelle for word length manipulation

Here is a hypothetical description of what a volvelle could look like for this task:

    Circle 1: Defines the possible word lengths from 1 to 15 (depending on the maximum word length).
    Circle 2: Gives the probability for each word length based on a certain distribution (e.g. binomial distribution).
    Circle 3: Instructions for shortening or expanding words to achieve the target distribution.

Use of a volvelle

1. The user enters the text.
2. The volvelle is rotated to obtain the rules for shortening or expanding words.
3. The instructions are applied to the text.
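A real volvelle is deterministic, so one hypothetical way to get exact binomial frequencies without random draws is a disk with 2^9 = 512 positions that is advanced one step per word, where the target length is 1 plus the number of 1-bits in the position number. Since exactly C(9, h) of the numbers 0 to 511 have h one-bits, any 512 consecutive words receive the binomial counts exactly. This illustrates the idea only; it is not a claim about how any historical device worked:

```python
from collections import Counter

def target_length(word_index, n=9):
    """Hypothetical disk with 2**n positions, advanced one step per word.
    Target length = 1 + number of 1-bits in the current position."""
    position = word_index % (2 ** n)
    return 1 + bin(position).count("1")

counts = Counter(target_length(i) for i in range(512))
print([counts[k] for k in range(1, 11)])  # [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
```

A drawback of this naive ordering is that neighbouring positions give correlated lengths (positions 0, 1, 2, 3 give 1, 2, 2, 3); scrambling the order of the 512 positions on the disk would remove that while keeping the counts.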
And, by replacing the endings, what will that give?

Capitul9 prim9 De regul9 sumpt9 ex parte elemeng nostro corpori occurrenti9 ab extra
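The ending replacements in this line can be expressed as a small suffix table, which would make it easy to abbreviate a longer text and then plot its word-length distribution. The table below is only inferred from this one line; it is a hypothetical rule set, not a documented abbreviation system:

```python
# Hypothetical suffix rules inferred from the example line; the first matching rule wins,
# so the longer, overlapping suffix ("torum" also ends in "um") is listed first.
SUFFIX_RULES = [("torum", "g"), ("um", "9"), ("is", "9")]

def abbreviate(word):
    """Replace a known ending with its abbreviation mark, keeping at least one letter of the stem."""
    for suffix, mark in SUFFIX_RULES:
        if len(word) > len(suffix) and word.endswith(suffix):
            return word[: -len(suffix)] + mark
    return word

line = "Capitulum primum De regulis sumptis ex parte elementorum nostro corpori occurrentium ab extra"
print(" ".join(abbreviate(w) for w in line.split()))
# Capitul9 prim9 De regul9 sumpt9 ex parte elemeng nostro corpori occurrenti9 ab extra
```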
(17-08-2024, 12:14 PM) Ruby Novacna Wrote: And, by replacing the endings, what will that give?

Capitul9 prim9 De regul9 sumpt9 ex parte elemeng nostro corpori occurrenti9 ab extra

Well, this would possibly lead to a shortening of the words, but there would be no binomial distribution.

BTW: I would assume that the lengthened words might not have been filled with the same letter repeated (e.g. XXXX), but with phrases (e.g. daiin?).
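That idea can be tried by swapping the X padding in the script for a filler phrase. A sketch, assuming the filler is taken cyclically from a single EVA word (daiin here, as in the example); the function name and the cycling rule are hypothetical:

```python
from itertools import cycle, islice

def pad_with_phrase(word, target_length, phrase="daiin"):
    """Truncate `word` if it is too long; otherwise append characters taken
    cyclically from `phrase` until it reaches `target_length`."""
    if target_length <= len(word):
        return word[:target_length]
    missing = target_length - len(word)
    return word + "".join(islice(cycle(phrase), missing))

print(pad_with_phrase("ex", 6))        # exdaii
print(pad_with_phrase("De", 12))       # Dedaiindaiin
print(pad_with_phrase("sumptis", 4))   # sump
```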
(17-08-2024, 12:28 PM) bi3mw Wrote: Well, this would possibly lead to a shortening of the words, but there would be no binomial distribution.

Are you assuming or have you calculated? What does the distribution look like?
(17-08-2024, 12:42 PM) Ruby Novacna Wrote: Are you assuming or have you calculated? What does the distribution look like?

I strongly assume so, but if you have a longer text with the abbreviations, I can plot that too.
I could not recall the details of Stolfi's binomial work, so I looked it up and thought I'd add it here.

Stolfi, "On the VMS Word Length Distribution", Feb 2002

Julian Bunn, Computational attacks on the Voynich Manuscript, "Word Length Distributions", Aug 2021

-- Just a thought, does the binomial distribution hold per folio or even per section ?
(17-08-2024, 12:33 AM) bi3mw Wrote: Question: What if the words in the VMS have been shortened or expanded with fill characters (here X) so that they end up corresponding to a binomial distribution (with whatever system)?

Could it work like this?
First, you need a system to encode a language into partial Voynichese;
then you apply this secondary system to reach a binomial distribution, which results in full Voynichese.

This secondary system must have rules for the shortening and lengthening process.
For shortening, let's just say you chop off what you do not need,
but for lengthening you may need to add up to about 7 additional glyphs in rare cases,
so you would need rules to generate the glyphs for the glyph-bins 1-7.
Explanation of the code above:

    calculate_binomial_distribution:
        This function calculates the binomial distribution for the word lengths based on the formula choose(9,k-1)/2⁹. The distribution is normalized to a sum of 1.

    adjust_word_lengths:
        This function adjusts the word lengths of the words from the text to the target distribution: for each word, a target length is drawn at random from that distribution.
        If the target word length is shorter, the word is shortened accordingly.
        If the target word length is longer, the word is lengthened by appending "X".

    process_text:
        This function reads the text from a file, adjusts the word lengths using adjust_word_lengths and returns the new text.

    main:
        This function expects the path to a text file as a parameter and outputs the modified text with adjusted word lengths.
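Since adjust_word_lengths draws each target length independently of the word, one can work out exactly how rare the "up to like 7 additional glyphs" case mentioned earlier in the thread would be: a source word of length m needs at least k fill glyphs when the drawn length T satisfies T ≥ m + k. A sketch with exact fractions; the function names are illustrative:

```python
from fractions import Fraction
from math import comb

N = 9  # as in the script: P(T = t) = C(9, t-1) / 2**9 for t = 1..10

def p_target(t):
    """Exact probability that the drawn target length equals t."""
    return Fraction(comb(N, t - 1), 2 ** N) if 1 <= t <= N + 1 else Fraction(0)

def p_pad_at_least(source_length, k):
    """Probability that a word of `source_length` letters needs at least k >= 1 fill glyphs."""
    return sum((p_target(t) for t in range(source_length + k, N + 2)), Fraction(0))

for m in (1, 2, 3):
    p = p_pad_at_least(m, 7)
    print(m, p, float(p))
# 1 23/256 0.08984375
# 2 5/256 0.01953125
# 3 1/512 0.001953125
```

Under this scheme, seven or more fill glyphs are needed for about 9% of one-letter words, about 2% of two-letter words, 1 in 512 three-letter words, and never for words of four letters or more.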

[attachment=9034]
[attachment=9036]
(17-08-2024, 03:35 PM) RobGea Wrote: -- Just a thought, does the binomial distribution hold per folio or even per section ?

I have picked out the balneological section (Quire 13) as a test. The binomial distribution is clearly recognizable here.
[attachment=9037]

Here, for comparison, is folio 75r:
[attachment=9038]

And here is the complete VMS:
[attachment=9039]
Thanks for the graphs, bi3mw. Interesting that a section shows a binomial distribution while a single folio does not.

Honestly, I don't know what to make of this binomial distribution business.

1. A statistical corollary of the method used to generate Voynichese.
2. As postulated in post #1, it was done deliberately.
3. Coincidence.
4. Other?
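One mundane factor that bears on the section-versus-folio contrast, and on option 3, is sample size: a single folio contributes far fewer words than a whole quire, so even text whose word lengths are drawn exactly from the binomial target gives a ragged histogram when the sample is small. A simulation sketch; the sample sizes are arbitrary illustrations, not word counts taken from the VMS:

```python
import random
from collections import Counter
from math import comb

N = 9
TARGET = {k: comb(N, k - 1) / 2 ** N for k in range(1, N + 2)}

def sample_lengths(size, rng):
    """Word lengths drawn exactly from the target: 1 + number of heads in 9 fair coin flips."""
    return [1 + sum(rng.random() < 0.5 for _ in range(N)) for _ in range(size)]

def total_variation(lengths):
    """Half the summed absolute difference between observed and target frequencies."""
    counts = Counter(lengths)
    return 0.5 * sum(abs(counts.get(k, 0) / len(lengths) - p) for k, p in TARGET.items())

rng = random.Random(0)
for size in (50, 300, 5000, 50000):
    print(size, round(total_variation(sample_lengths(size, rng)), 3))
```

The distance typically shrinks roughly like 1/√size, so a folio-sized sample can look noticeably non-binomial even when the underlying process is exactly binomial.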