[Blog Post] Voynich Manuscript: word vectors and t-SNE visualization of some patterns

Voynich Manuscript: word vectors and t-SNE visualization of some patterns - voynichbombe - 19-01-2016


RE: Voynich Manuscript: word vectors and t-SNE visualization of some patterns - Anton - 19-01-2016

(Moved to News and changed the thread name to the article title.)

Looks potentially interesting, but the blog post itself sadly omits the methodology and the goals. Something's going on there, but from the post we can't tell what.

RE: Voynich Manuscript: word vectors and t-SNE visualization of some patterns - voynichbombe - 19-01-2016

I guess it would take more reading into the blog owner's goals and methods. Machine learning is no piece of cake.

I'll try to invite the writer to take part in the discussion.

RE: Voynich Manuscript: word vectors and t-SNE visualization of some patterns - voynichbombe - 24-01-2016

In the meantime Nick has already taken up the topic on his blog (with his tongue firmly in cheek).

I thought I'd nevertheless give my understanding of the methodology (and its fallacies, IMHO) described in the blog post. It should be noted that the post is more a rough sketch of what could be done than a conclusive study. You will notice some informed critique in the comments.
Drawing on my meager knowledge of AI/neural network training from way back, I'd describe it as follows:
First, a shallow (hence "flat") artificial neural network is trained to find vectors (distance and direction) for sets of words, in an attempt to uncover contextual relationships in an unknown language. This should work quite well, but it depends largely on "normalization" of the input text. For example, one would choose only those parts of the text that contain no ambiguities.
Things already get complicated at this stage, because many assumptions shape the output:
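To make the idea of "vectors for words" concrete, here is a minimal sketch. It is not the blog author's method: instead of a trained neural network it uses plain co-occurrence counts, which rest on the same intuition (words that appear in similar contexts get similar vectors). The function names, the whitespace tokenization and the toy input lines are all assumptions made for illustration; the toy lines are invented and not taken from any transcription.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(lines, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for line in lines:
        words = line.split()  # assumption: tokens are separated by whitespace
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy lines, for illustration only.
toy_lines = [
    "daiin chedy qokeedy",
    "daiin shedy qokeedy",
    "ol chol dar",
]
vecs = cooccurrence_vectors(toy_lines)
print(cosine(vecs["chedy"], vecs["shedy"]))  # same neighbours, so close to 1
print(cosine(vecs["chedy"], vecs["chol"]))   # no shared neighbours, so 0
```

In the toy input, "chedy" and "shedy" occur between the same two words, so they end up with near-identical vectors. A trained model would produce dense vectors rather than counts, but the comparison step works the same way.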

- EVA transcription
- natural language(s) which are culturally extinct
- the copying scribes no longer knew the language as a whole, or many of its details; hence the "!" and "*" characters stand for unrecognized/ambiguous glyphs, and lines containing them have to be ignored.
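The last point, ignoring lines that contain "!" or "*", is simple to sketch. The function name and the sample lines below are hypothetical; only the two marker characters come from the description above.

```python
# Marker characters as described above: they stand for unrecognized/ambiguous glyphs.
UNREADABLE_MARKERS = ("!", "*")


def drop_ambiguous_lines(lines, markers=UNREADABLE_MARKERS):
    """Keep only the lines that contain none of the marker characters."""
    return [line for line in lines if not any(m in line for m in markers)]


# Invented sample lines, for illustration only.
sample = ["daiin chedy", "qo*edy ol", "shedy !ar", "chol dar"]
kept = drop_ambiguous_lines(sample)
print(kept)
print(f"dropped {len(sample) - len(kept)} of {len(sample)} lines")
```

Note that this throws away whole lines, not just the affected words, so it can shrink an already small corpus considerably.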

It gets even more assumption-laden at the next stage, "machine translation", which should also work quite well, provided one already knows the meaning of some words in the unknown language. In this case it's the "star names" that the author pretends (note he uses the term deliberately) to know.

So much from my side for now. I intend to consult a friend of mine who is into autonomous robotics and everything that comes with it. I also tried to invite the author, no luck so far.

While some of you may see only show-stoppers here, I think the approach is very interesting and should be investigated by more knowledgeable peers.

RE: Voynich Manuscript: word vectors and t-SNE visualization of some patterns - Anton - 25-01-2016

I am acquainted with the ANN basics. The problem is not that, but that the article leaves out some very important points, such as:

- what are the authors' objectives
- what are the steps intended to reach them
- what are the limitations of the proposed approach (if any)
- what were the steps actually taken
- what the results mean, or are supposed to mean (I really don't understand that cluster picture: it shows that two clusters of vords somehow result, but what they are for, or what they tell us, is not clear)

As Rene pointed out elsewhere, it is not clear what the input text blocks were and whether they were fed in consistently.

It would be a very interesting result if, say, based on analysis of the VMS text except f68r1 and f68r2, we obtained the suggestion (from the ANN algorithm) that all vords serving as star labels in f68r1 and f68r2 are "similar". Is this the result obtained in the paper? I don't know. It's excessively brief and a bit messy. We just need more detail and an explicit "objective -> result" line of argument.
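One way such a hold-out check could be phrased is sketched below: given word vectors trained without the two folios, compare how similar the label vords are to each other against how similar the remaining vords are to each other. Everything here is an assumption for illustration (function names, dense vectors as plain lists, the invented toy numbers), and it says nothing about what the blog post actually did.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(vectors, words):
    """Average cosine similarity over all unordered pairs of `words`."""
    pairs = list(combinations(sorted(words), 2))
    if not pairs:
        raise ValueError("need at least two words with vectors")
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def label_cohesion(vectors, labels):
    """Return (similarity among labels, similarity among all other words).

    Labels without a vector (for example, vords that never occur outside
    the held-out folios) are skipped.
    """
    known_labels = vectors.keys() & set(labels)
    others = vectors.keys() - known_labels
    return (
        mean_pairwise_similarity(vectors, known_labels),
        mean_pairwise_similarity(vectors, others),
    )


# Invented toy vectors, for illustration only.
toy = {
    "label_a": [1.0, 0.1],
    "label_b": [0.9, 0.2],
    "word_c": [0.0, 1.0],
    "word_d": [1.0, 0.0],
    "word_e": [-1.0, 0.5],
}
inside, outside = label_cohesion(toy, {"label_a", "label_b", "label_missing"})
print(inside, outside)
```

A clearly higher first number than second would be the kind of "similar" result described above. A real test would also need a baseline from randomly chosen word sets of the same size, since a single comparison like this can mislead.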