bi3mw > 12-10-2023, 06:12 PM
Scarecrow > 13-10-2023, 09:53 AM
(12-10-2023, 06:12 PM)bi3mw Wrote: In the VMS, lines very often end so neatly that one could suspect the last words of a line are "filler words". If so, those last words would consist only of "character salad".
bi3mw > 13-10-2023, 10:37 AM
(12-10-2023, 07:43 PM)nablator Wrote: (12-10-2023, 06:12 PM)bi3mw Wrote: Voynich Manuscript: (alphabet mapped to Latin)
If you know where vowels are, good for you.
nablator > 13-10-2023, 10:44 AM
(13-10-2023, 09:53 AM)Scarecrow Wrote: As the Shannon entropy measures, to some extent, the average information you get from all the events: if we remove "nulls" and "fillers" from that group of potential events, how much information is really left for the reader?

The Shannon entropy h1 doesn't limit the amount of information much, as it is close to normal. The conditional entropy h2 is better as an upper bound, but there is no way to know how much of the information is actually wasted by the encoding process. There could be any number of nulls that we would not be able to identify.
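For readers who want to reproduce these figures: h1 and h2 can be estimated with a few lines of Python. This is a minimal sketch, assuming the transcription is a plain string of symbols; h1 is the unigram entropy in bits per symbol, and h2 is the conditional entropy H(X_n | X_{n-1}), computed as H(bigrams) - H(unigrams).

```python
import math
from collections import Counter

def h1(text):
    # Unigram Shannon entropy, in bits per symbol.
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def h2(text):
    # Conditional entropy H(X_n | X_{n-1}) = H(pairs) - H(first symbols).
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    h_pairs = -sum(c / total * math.log2(c / total) for c in pairs.values())
    return h_pairs - h1(text[:-1])
```

For example, a strictly alternating text like "ababab..." has h1 close to 1 bit but h2 close to 0: each symbol is fully predictable from its predecessor.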
nablator > 13-10-2023, 11:19 AM
(13-10-2023, 10:37 AM)bi3mw Wrote: When mapping the VMS, I simply replaced the letters with correspondingly frequent letters in Latin texts.
cat Voynich_full_clean01.txt | tr oehyacdiklrstnqpmfgxzv IEUTASRNMOCDLPBQGHFXYZ
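The same frequency-rank substitution can be generated programmatically rather than by hand. A sketch, assuming the target string IEUTASRNMOCDLPBQGHFXYZ is the Latin frequency ranking used in the tr call above (ties in the source frequency counts are broken arbitrarily):

```python
from collections import Counter

# Target order: Latin letters from most to least frequent,
# as in the tr command above.
LATIN_ORDER = "IEUTASRNMOCDLPBQGHFXYZ"

def frequency_map(text, target=LATIN_ORDER):
    # Rank the source alphabet by frequency, then pair rank-for-rank
    # with the target alphabet. zip truncates, so source letters beyond
    # len(target) stay unmapped; non-letters pass through unchanged.
    letters = [c for c in text if c.isalpha()]
    ranked = [c for c, _ in Counter(letters).most_common()]
    table = str.maketrans(dict(zip(ranked, target)))
    return text.translate(table)
```

For example, frequency_map("aaab", "XY") returns "XXXY": 'a' is the most frequent source letter, so it maps to the first target letter.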
bi3mw > 13-10-2023, 01:54 PM
(13-10-2023, 11:19 AM)nablator Wrote: There is no way to count syllables unless you've actually deciphered the text.
bi3mw > 13-10-2023, 05:57 PM
(13-10-2023, 11:19 AM)nablator Wrote: why not use Sukhotin's algorithm or some other algorithm to detect vowels?
#!/usr/bin/env python
from collections import defaultdict
from operator import itemgetter
import re
import sys
def classify_chars(text):
    if not text:
        return set(), set()
    # Make the text circular so the last character also has a successor.
    text = f"{text}{text[0]}"
    matrix = defaultdict(int)
    row_sums = defaultdict(int)
    consonants = set()
    vowels = set()
    for first, second in zip(text, text[1:]):
        consonants.add(first)
        if first != second:  # Sukhotin ignores doubled characters
            matrix[first, second] += 1
            matrix[second, first] += 1
            row_sums[first] += 1
            row_sums[second] += 1
    # Repeatedly promote the character with the largest positive row sum
    # to vowel status, then discount its neighbours.
    while any(row_sum > 0 for row_sum in row_sums.values()):
        vowel = max(row_sums.items(), key=itemgetter(1))[0]
        vowels.add(vowel)
        consonants -= vowels
        for consonant in consonants:
            frequency = matrix.get((consonant, vowel), 0)
            row_sums[consonant] -= 2 * frequency
        row_sums = {key: value for key, value in row_sums.items() if key not in vowels}
    return vowels, consonants
def sukhotin_syllabification(text):
    # Crude segmentation: close a syllable at every consonant.
    vowels, consonants = classify_chars(text)
    syllables = []
    current_syllable = ""
    for char in text:
        current_syllable += char
        if char in consonants:
            syllables.append(current_syllable)
            current_syllable = ""
    if current_syllable:  # don't drop a trailing open syllable
        syllables.append(current_syllable)
    return syllables
def calculate_syllable_statistics(text):
    total_letters = len(re.sub(r'[^\w\s]', '', text))  # word characters and whitespace only
    total_syllables = len(sukhotin_syllabification(text))
    lines = text.split('\n')
    last_word_syllables = 0
    total_lines = len(lines)
    for line in lines:
        words = line.split()
        if words:
            last_word = words[-1]
            last_word_syllables += len(sukhotin_syllabification(last_word))
    total_words = sum(len(line.split()) for line in lines)
    average_syllables_per_word = total_syllables / total_words if total_words > 0 else 0
    average_syllables_per_line = last_word_syllables / total_lines if total_lines > 0 else 0
    # Syllable count as a percentage of the character count
    syllable_percentage = (total_syllables * 100) / total_letters if total_letters > 0 else 0
    return total_syllables, total_letters, syllable_percentage, average_syllables_per_word, last_word_syllables, average_syllables_per_line
def detect_syllables_in_file(filename):
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
        return calculate_syllable_statistics(text)
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return None, None, None, None, None, None
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python syllable_detection.py <text_file>")
    else:
        filename = sys.argv[1]
        results = detect_syllables_in_file(filename)
        if all(value is not None for value in results):
            (total_syllables, total_letters, syllable_percentage,
             average_syllables_per_word, last_word_syllables,
             average_syllables_per_line) = results
            print(f"Total letters and spaces: {total_letters}")
            print(f"Total syllables: {total_syllables}")
            print(f"Syllable percentage: {syllable_percentage:.9f}%")
            print(f"Average syllables per word: {average_syllables_per_word:.9f}")
            print(f"Total syllables in last words of lines: {last_word_syllables}")
            print(f"Average syllables per line (last words): {average_syllables_per_line:.9f}")
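As a sanity check of the vowel-detection step, here is a compact restatement of the same Sukhotin idea applied per word rather than to the raw text (a sketch, not the script above; applying it per word avoids the space character being classified as a "vowel", which the full-text version is prone to):

```python
from collections import Counter

def sukhotin_vowels(words):
    # Sukhotin's heuristic: the letters that most often alternate with
    # the remaining letters are assumed to be vowels.
    pair_counts = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            if a != b:  # ignore doubled letters
                pair_counts[a, b] += 1
                pair_counts[b, a] += 1
    letters = {c for w in words for c in w}
    sums = {c: sum(pair_counts[c, d] for d in letters) for c in letters}
    vowels = set()
    while sums and max(sums.values()) > 0:
        v = max(sums, key=sums.get)
        vowels.add(v)
        del sums[v]
        for c in sums:
            sums[c] -= 2 * pair_counts[c, v]
    return vowels
```

On a toy input like ["banana", "cabana"] this returns {"a"}, as expected for the only alternating letter.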
nablator > 13-10-2023, 07:37 PM
ReneZ > 14-10-2023, 02:02 AM