The Voynich Ninja

The Shape of Words - topological structure in natural language data
Found this paper by Stephen Fitz (Keio University, Tokyo) yesterday.

This paper presents a novel method, based on the ideas from algebraic topology, for the analysis of raw natural language text. The paper introduces the notion of a word manifold - a simplicial complex, whose topology encodes grammatical structure expressed by the corpus. Results of experiments with a variety of natural and synthetic languages are presented, showing that the homotopy type of the word manifold is influenced by linguistic structure. 

The analysis includes a new approach to the Voynich Manuscript - an unsolved puzzle in corpus linguistics. In contrast to existing topological data analysis approaches, we do not rely on the apparatus of persistent homology. Instead, we develop a method of generating topological structure directly from strings of words.
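To make the idea of "generating topological structure directly from strings of words" a bit more concrete, here is a toy sketch. It is not the construction from the paper, which the abstract does not spell out. It assumes, purely for illustration, that every window of consecutive words spans one simplex, and then computes Betti numbers over GF(2). The names `build_complex`, `gf2_rank` and `betti_numbers` and the `window` parameter are made up for this example.

```python
from itertools import combinations


def build_complex(tokens, window=3):
    """Toy complex (an assumption, not the paper's method): each window of
    consecutive tokens spans a simplex on its distinct words."""
    simplices = set()
    for i in range(len(tokens) - window + 1):
        vertices = sorted(set(tokens[i:i + window]))
        # Add the simplex together with all of its non-empty faces.
        for k in range(1, len(vertices) + 1):
            for face in combinations(vertices, k):
                simplices.add(frozenset(face))
    return simplices


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as int bitmasks."""
    pivots = {}  # leading bit position -> reduced row
    rank = 0
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                rank += 1
                break
            row ^= pivots[lead]
    return rank


def betti_numbers(simplices):
    """Betti numbers (mod 2) of a complex that is closed under taking faces."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(s)
    max_dim = max(by_dim)
    # ranks[k] is the rank of the boundary map from k-simplices to (k-1)-simplices.
    ranks = {0: 0, max_dim + 1: 0}
    for k in range(1, max_dim + 1):
        index = {s: j for j, s in enumerate(by_dim[k - 1])}
        rows = []
        for s in by_dim[k]:
            mask = 0
            for v in s:
                mask |= 1 << index[s - {v}]  # face obtained by dropping v
            rows.append(mask)
        ranks[k] = gf2_rank(rows)
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]


text = "a b c a".split()
print(betti_numbers(build_complex(text, window=2)))  # [1, 1]    (a hollow triangle, i.e. a loop)
print(betti_numbers(build_complex(text, window=3)))  # [1, 0, 0] (a filled triangle)
```

In this toy version the same four words give a loop with window 2 and a contractible blob with window 3, so how words get glued into simplices matters a lot. How the paper actually does the gluing is something to check in the paper itself.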

These results show that the topology of the word manifold is influenced by linguistic structure expressed by the corpus. Furthermore, we can interpret dimensions of the word manifold by comparing natural and synthetic data.

New?
Quote: The manifold with the closest match in dimension 1 with the Voynich Manuscript is Russian, which would align with the known history of the manuscript, providing support to several mainstream theories about its possible origin.

This is interesting. However, I would not say that the assumption that the VMS could be of Russian origin is a "mainstream theory."
No articles, like Russian; maybe that's why. The paper doesn't say.
(13-01-2023, 02:30 PM)nablator Wrote: No articles, like Russian, that's why.

Or like Latin, which is much more likely given which languages were used for writing manuscripts in the period indicated by the carbon dating, particularly in the likeliest geographic sources of the identifiable images.

I have not yet read the paper, but missing this pretty basic observation in order to latch onto a relatively unlikely language such as Russian makes me go into this with a very healthy dose of skepticism.

When you hear hoofbeats outside your door, you should think “horse,” not “zebra.”