Thanks, Rene and Marco, for the very constructive hints. I will think about changing the alphabet. First, regarding the intended syllable detection:
The code I presented, which implements Sukhotin's algorithm, is faulty (unfortunately, I am not a programmer either). The output it writes to a text file looks like this:
CO,MME,DI,A
A,l,ig,h,ie,r
I,NF,ER,NO
I
vi,ta
os,cu,ra
sm,ar,r,it,a
du,ra
fo,rt,e
p,au,r,a
mo,rt,e
tr,ov,ai
sc,or,te
v’,in,tr,a,i
pu,nt,o
ab,b,an,do,n,ai
gi,un,to
va,lle
c,om,pu,nt,o
sp,al,l,e
p,i,an,et
ca,lle
qu,et,a
d,ur,at
pi,et,a
af,f,an,n,at,a
ri,va
gu,at,a
fu,ggi,va
......
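As usually described, Sukhotin's algorithm only classifies the letters of a text into vowels and consonants. It does not place syllable boundaries by itself, so a syllable splitter needs an extra rule on top of its vowel set. For comparison with a faulty implementation, here is a minimal stdlib sketch of the classification step only. The function name `sukhotin_vowels` is made up for this example, and letters are assumed to be alphabetic characters (apostrophes and punctuation are dropped).

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of letters that Sukhotin's algorithm classifies as vowels."""
    # Symmetric adjacency counts between distinct letters (the diagonal stays zero)
    adjacency = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        word = "".join(ch for ch in word.lower() if ch.isalpha())
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adjacency[a][b] += 1
                adjacency[b][a] += 1

    # Start with every letter as a consonant; a row sum is the letter's total adjacency
    sums = {c: sum(adjacency[c].values()) for c in letters}
    vowels = set()
    while True:
        candidates = sorted(c for c in letters if c not in vowels and sums[c] > 0)
        if not candidates:
            break
        # Promote the consonant with the largest positive sum (ties: alphabetical)
        vowel = max(candidates, key=lambda c: sums[c])
        vowels.add(vowel)
        # Letters adjacent to the new vowel become less likely to be vowels
        for c in letters - vowels:
            sums[c] -= 2 * adjacency[c][vowel]
    return vowels
```

On very small word lists the result can be wrong, because the algorithm relies on vowels and consonants alternating often enough to show up in the counts; it is meant to be run on a whole text.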
The easiest way to get correctly separated syllables in Python is to use the library "pyphen". Here is the output:
COM,ME,DIA
Ali,ghie,ri
IN,FER,NO
I
vi,ta
oscu,ra
smar,ri,ta
du,ra
for,te
pau,ra
mor,te
tro,vai
scor,te
v’in,trai
pun,to
ab,ban,do,nai
giun,to
val,le
com,pun,to
spal,le
pia,ne,ta
cal,le
que,ta
du,ra,ta
pie,ta
af,fan,na,ta
ri,va
gua,ta
fug,gi,va
......
I have also produced the syllable-separated last words of all lines in the VMS (for demonstration only).
The implementation of "pyphen" in the code looks like this:
Code:
#!/usr/bin/env python
import re
import sys

import pyphen


def count_syllables(word, dic):
    # pyphen hyphenates single words, so syllables are counted word by word
    return len([s for s in dic.inserted(word).split('-') if s])


def calculate_syllable_statistics(text, dic):
    # Count letters and whitespace (punctuation removed)
    total_letters = len(re.sub(r'[^\w\s]', '', text))
    words = text.split()
    # Hyphenating the whole text at once and splitting on '-' would glue the last
    # syllable of each word to the first syllable of the next one
    total_syllables = sum(count_syllables(word, dic) for word in words)
    total_words = len(words)

    last_word_syllables = 0
    total_lines = 0  # only lines that contain at least one word
    for line in text.split('\n'):
        line_words = line.split()
        if line_words:
            total_lines += 1
            last_word_syllables += count_syllables(line_words[-1], dic)

    average_syllables_per_word = total_syllables / total_words if total_words > 0 else 0
    average_syllables_per_line = last_word_syllables / total_lines if total_lines > 0 else 0
    # Calculate the percentage using the rule of three
    syllable_percentage = (total_syllables * 100) / total_letters if total_letters > 0 else 0
    return (total_syllables, total_letters, syllable_percentage,
            average_syllables_per_word, last_word_syllables, average_syllables_per_line)


def detect_syllables_in_file(filename):
    print("Processing, please wait...")  # progress message
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return

    dic = pyphen.Pyphen(lang='it_IT')
    (total_syllables, total_letters, syllable_percentage,
     average_syllables_per_word, last_word_syllables,
     average_syllables_per_line) = calculate_syllable_statistics(text, dic)

    # Console output
    print(f"Total letters and spaces: {total_letters}")
    print(f"Total syllables: {total_syllables}")
    print(f"Syllable percentage: {syllable_percentage:.9f}%")
    print(f"Average syllables per word: {average_syllables_per_word:.9f}")
    print(f"Total syllables in last words of lines: {last_word_syllables}")
    print(f"Average syllables per line (last words): {average_syllables_per_line:.9f}")

    # Create the output file with the syllables of the last word of each line
    output_filename = "last_word_syllables.txt"
    with open(output_filename, 'w', encoding='utf-8') as output_file:
        for line in text.split('\n'):
            words = line.split()
            if words:
                syllables = [s for s in dic.inserted(words[-1]).split('-') if s]
                # Write the syllables of the last word separated by commas
                output_file.write(','.join(syllables) + '\n')


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python syllable_detection.py <text_file>")
    else:
        filename = sys.argv[1]
        detect_syllables_in_file(filename)
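Why the syllable count has to be taken per word: when a hyphenated string containing several words is split on '-', the last syllable of one word and the first syllable of the next end up in the same piece (with the space inside), so every word boundary loses one syllable. A minimal stdlib illustration, using a hand-hyphenated string built from the output above; the two function names are made up for this example.

```python
def count_by_splitting_text(hyphenated_text):
    # Split the whole multi-word string on '-': pieces can span a word boundary
    return len([s for s in hyphenated_text.split('-') if s])


def count_per_word(hyphenated_text):
    # Split each whitespace-separated word on '-' separately
    return sum(len([s for s in word.split('-') if s]) for word in hyphenated_text.split())


sample = "com-me-dia Ali-ghie-ri"
# Splitting the whole string yields 'com', 'me', 'dia Ali', 'ghie', 'ri'
print(count_by_splitting_text(sample), count_per_word(sample))
```

The first function reports 5 syllables, the second 6, which is the correct count for "commedia Alighieri".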
This line needs to be adjusted for Latin (I am still searching the documentation):
dic = pyphen.Pyphen(lang='it_IT')
LANGUAGE_ALIASES:
'af': 'af_ZA',
'bg': 'bg_BG',
'cs': 'cs_CZ',
'da': 'da_DK',
'de': 'de_DE',
'el': 'el_GR',
'en': 'en_US',
'en_Latn_GB': 'en_GB',
'en_Latn_US': 'en_US',
'et': 'et_EE',
'hr': 'hr_HR',
'hu': 'hu_HU',
'it': 'it_IT',
'lt': 'lt_LT',
'lv': 'lv_LV',
'nb': 'nb_NO',
'nl': 'nl_NL',
'nn': 'nn_NO',
'pl': 'pl_PL',
'pt': 'pt_PT',
'pt_Latn_BR': 'pt_BR',
'pt_Latn_PT': 'pt_PT',
'ro': 'ro_RO',
'ru': 'ru_RU',
'sk': 'sk_SK',
'sl': 'sl_SI',
'te': 'te_IN',
'uk': 'uk_UA',
'zu': 'zu_ZA',
Edit: Unfortunately, I can't find any support for Latin.

I have contacted the developers.
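Until Latin support turns up, one possible stopgap is a small rule-based syllabifier. The sketch below is not part of pyphen, and `tokenize` and `syllabify_latin` are hypothetical names. It assumes simplified classical rules: ae/au/oe are always diphthongs, qu/ch/ph/th count as single consonants, a single consonant between vowels goes to the next syllable, and in longer clusters only the last consonant moves, except a mute followed by l or r (muta cum liquida), which stay together. Consonantal i/u (as in iam, uita) and hiatus cases such as po-e-ta are not handled. As another route, `pyphen.Pyphen` also accepts a `filename` argument pointing to a hyphenation dictionary file, so a Latin .dic file, if one can be found, could be loaded directly.

```python
VOWELS = set("aeiouy")
DIPHTHONGS = ("ae", "au", "oe")  # simplification: always treated as diphthongs
CONSONANT_DIGRAPHS = ("qu", "ch", "ph", "th")
MUTES = {"b", "c", "d", "g", "p", "t", "ch", "ph", "th"}
LIQUIDS = {"l", "r"}


def tokenize(word):
    """Split a word into (text, is_vowel) units, keeping diphthongs and digraphs whole."""
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIPHTHONGS:
            units.append((pair, True))
            i += 2
        elif pair in CONSONANT_DIGRAPHS:
            units.append((pair, False))
            i += 2
        else:
            units.append((word[i], word[i] in VOWELS))
            i += 1
    return units


def syllabify_latin(word):
    """Hypothetical rule-based Latin syllabifier using the simplified rules above."""
    units = tokenize(word.lower())
    nuclei = [i for i, (_, is_vowel) in enumerate(units) if is_vowel]
    if len(nuclei) < 2:
        return [word.lower()] if word else []

    starts = []  # unit index where each new syllable begins
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = [text for text, _ in units[left + 1:right]]
        if len(cluster) <= 1:
            # Hiatus (V-V) or a single consonant, which goes to the next syllable (V-CV)
            starts.append(left + 1)
        elif cluster[-2] in MUTES and cluster[-1] in LIQUIDS:
            # Muta cum liquida: both consonants start the next syllable
            starts.append(right - 2)
        else:
            # Otherwise only the last consonant moves to the next syllable
            starts.append(right - 1)

    syllables = []
    prev = 0
    for start in starts + [len(units)]:
        syllables.append("".join(text for text, _ in units[prev:start]))
        prev = start
    return syllables
```

For example, "virumque" comes out as vi, rum, que and "patris" as pa, tris. In a script like the one above, such a function could take the place of the `dic.inserted(...)` calls for a Latin text.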