oshfdk > 13-05-2026, 04:29 PM
(13-05-2026, 03:25 PM)Stefan Wirtz_2 Wrote: I don't see how those tables allow the diagnosis of something "artificial".
Someone may assume that k and t
- could be consonants of a natural language
- are each preceded by one or two of the most frequent (or all) vowels of a natural language
- which may themselves follow 1-2 consonants in initial position
- or are preceded by one or two consonants out of a limited, language-appropriate set
- and are followed by the "second" syllable in an equivalent structure.
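The syllable template hypothesized above can be sketched as a regular expression. This is only an illustration, not the forum's actual test; the vowel and consonant inventories below are placeholders, not a claim about any particular language:

```python
import re

# Hypothetical syllable template: an optional onset of up to two
# consonants from a limited set, one or two vowels, then k or t,
# repeated for a possible "second" syllable of the same shape.
VOWELS = "aeiou"        # assumed vowel inventory
ONSETS = "bdgkptsz"     # assumed limited consonant set

syllable = rf"[{ONSETS}]{{0,2}}[{VOWELS}]{{1,2}}[kt]"
word_pat = re.compile(rf"({syllable}){{1,2}}$")  # one or two such syllables

for w in ["bakit", "aat", "ookbat", "ktaa"]:
    print(w, bool(word_pat.match(w)))
```

Counting how many real word types such a template accepts, versus how many it accepts in the Voynich corpus, would be one way to compare the two.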
Stefan Wirtz_2 > 13-05-2026, 07:32 PM
(13-05-2026, 04:29 PM)oshfdk Wrote: I don't think this will produce a similar distribution.
nintus > 13-05-2026, 07:41 PM
oshfdk > 13-05-2026, 07:57 PM
(13-05-2026, 07:32 PM)Stefan Wirtz_2 Wrote: For what? What could you prove with this?
Jorge_Stolfi > 13-05-2026, 10:29 PM
(13-05-2026, 07:57 PM)oshfdk Wrote: This would show that prefixes and suffixes in the Voynich Manuscript behave in a way that does not occur in natural languages. I think this may be true for any normal representation of any natural language in the history of humankind.
#! /usr/bin/python3
# Convert pinyin tone diacritics to trailing tone numbers (1-4; 5 = neutral).
import os, sys, re
from sys import stdin as inp, stdout as out, stderr as err

def main():
    inp.reconfigure(encoding="utf-8")
    out.reconfigure(encoding="utf-8")
    out.write("# Created by {convert_pinyn_to_numeric.py} - do not edit.\n")
    out.write("# -*- coding: utf-8 -*-\n")
    # Tone-marked vowels, their bare forms, and the corresponding tone digits.
    pinyin_vows = r"āēīōūǖ" + r"àèìòùǜ" + r"áéíóúǘ" + r"ǎěǐǒǔǚ"
    unmark_vows = r"aeiouü" * 4
    ntones_nums = r"111111" + r"444444" + r"222222" + r"333333"
    pats = []
    subs = []
    nv = len(pinyin_vows)
    for i in range(nv):
        pats.append(re.compile(pinyin_vows[i]))
        subs.append(unmark_vows[i] + ntones_nums[i])
    for line in inp:
        line = line.strip()
        # Skip comments and blank lines.
        if re.match(r"[ ]*([#]|$)", line):
            continue
        # Preserve a leading locator tag like <f1.2> if present.
        m = re.fullmatch(r"([<][a-z][0-9.]+[>])[ ]*(.*)", line)
        if m is not None:
            loc = m.group(1)
            line = m.group(2)
        else:
            loc = ""
        line = re.sub(r"[\]\[.,;:()]", " ", line)
        words = line.split()
        out.write(loc)
        for word in words:
            word = word.lower()
            # Replace each toned vowel by its bare vowel plus a tone digit.
            for i in range(nv):
                word = re.sub(pats[i], subs[i], word)
            word = re.sub(r"ü", "uu", word)
            # Move the tone digit to the end of the word.
            word = re.sub(r"^(.*)([0-9])(.*)$", r"\1\3\2", word)
            # Words without a tone mark get the neutral tone.
            if re.fullmatch(r"[a-z]+", word):
                word += "5"
            out.write(" " + word)
        out.write("\n")
    return
# ----------------------------------------------------------------------
main()
oshfdk > 13-05-2026, 10:59 PM
(13-05-2026, 10:29 PM)Jorge_Stolfi Wrote: Have you tested the East Asian monosyllabic languages - Chinese, Vietnamese, Tibetan, Thai, Burmese, Lao, Khmer, ...
Jorge_Stolfi > 13-05-2026, 11:33 PM
(13-05-2026, 10:59 PM)oshfdk Wrote: As I said, I haven't tested anything yet; for the test I need a suggestion for the central character and the word-separation algorithm.
oshfdk > 14-05-2026, 10:06 AM
(13-05-2026, 11:33 PM)Jorge_Stolfi Wrote: For the core, you could use the vowel that carries the tone diacritic.
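The "toned vowel as core" idea can be sketched directly: locate the vowel carrying the tone diacritic and split the syllable into onset / core / coda around it. The function name and return shape below are illustrative, not from the thread, and the ü-with-tone case is not handled:

```python
import unicodedata

def split_syllable(syl):
    """Split a tone-marked pinyin syllable around the vowel that carries
    the tone diacritic. Returns (onset, core, coda, tone); tone 5 means
    no tone mark was found."""
    # Combining marks for tones 1-4 (macron, acute, caron, grave).
    TONES = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
    decomp = unicodedata.normalize("NFD", syl)
    tone, idx = 5, None
    letters = []
    for ch in decomp:
        if ch in TONES:
            tone = TONES[ch]
            idx = len(letters) - 1   # the preceding letter carries the tone
        else:
            letters.append(ch)
    base = "".join(letters)
    if idx is None:
        return (base, "", "", tone)
    return (base[:idx], base[idx], base[idx + 1:], tone)

print(split_syllable("zhōng"))   # ('zh', 'o', 'ng', 1)
print(split_syllable("hǎo"))     # ('h', 'a', 'o', 3)
```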
nablator > 14-05-2026, 10:59 AM
(14-05-2026, 10:06 AM)oshfdk Wrote: Here it is. I repeated @dashstofsk's computation as described in the original post. This is what I would expect from a natural language: a lot of underrepresented combinations (and a few hugely overrepresented ones). Nothing like the Voynich MS chart, in which the upper left corner mostly consists of numbers close to one.
oshfdk > 14-05-2026, 11:10 AM
(14-05-2026, 10:59 AM)nablator Wrote: For comparisons, a metric for the bumpiness of the structural decomposition could be the standard deviation of the distribution of all cells in the table, excluding the empty cells of course.
There could be a second metric for how asymmetric the distribution is, something like (sum over all (i,j) of abs(cell(i,j) - cell(j,i))) / (sum over all (i,j) of abs(cell(i,j)) + abs(cell(j,i))).
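The two proposed metrics are straightforward to compute. The table below is a made-up 3x3 example (None marks an empty cell), and treating empty cells as zero in the asymmetry metric is an assumption the post leaves open:

```python
import math

# Toy table of observed/expected ratios; None marks an empty cell.
table = [
    [1.0, 1.2, None],
    [0.8, 1.0, 2.0],
    [None, 0.5, 1.0],
]

# Metric 1: "bumpiness" = standard deviation over non-empty cells.
vals = [v for row in table for v in row if v is not None]
mean = sum(vals) / len(vals)
stdev = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))

# Metric 2: asymmetry, with empty cells counted as 0 (an assumption).
def cell(i, j):
    v = table[i][j]
    return 0.0 if v is None else v

n = len(table)
num = sum(abs(cell(i, j) - cell(j, i)) for i in range(n) for j in range(n))
den = sum(abs(cell(i, j)) + abs(cell(j, i)) for i in range(n) for j in range(n))
asymmetry = num / den    # 0 for a symmetric table, up to 1 for maximal asymmetry

print(f"stdev = {stdev:.3f}, asymmetry = {asymmetry:.3f}")
```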