quimqu > 11 hours ago
Jorge_Stolfi > 9 hours ago
quimqu > 7 hours ago
(9 hours ago) Jorge_Stolfi Wrote: (11 hours ago) quimqu Wrote: The UMAP projection below
Could you please explain what the two coordinates are?
All the best, --stolfi
(Since there is now another "jorge" posting to this forum, I should sign "stolfi" from now on...)
oshfdk > 7 hours ago
(7 hours ago) quimqu Wrote: UMAP is a way to turn complex data into a 2D picture so we can see patterns. In this case, each paragraph is first turned into a set of numbers (a vector) that describes how often different words appear together, a sort of fingerprint. Those fingerprints live in a huge mathematical space (maybe hundreds of dimensions).
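To make the "fingerprint" idea concrete, here is a minimal sketch of turning paragraphs into count vectors over a shared vocabulary. The tokens are made-up EVA-style words, and this is only an illustration; quimqu's actual pipeline presumably uses richer co-occurrence features than plain word counts.

```python
from collections import Counter

def fingerprint(paragraph, vocab):
    """Count-vector 'fingerprint' of a paragraph over a fixed vocabulary."""
    counts = Counter(paragraph.split())
    return [counts[w] for w in vocab]

# Two toy "paragraphs" of invented EVA-like tokens (illustrative only).
paras = [
    "daiin chol daiin shol",
    "daiin chol chol qokeedy",
]

# Shared vocabulary across all paragraphs, in a fixed order.
vocab = sorted({w for p in paras for w in p.split()})

# One vector per paragraph; each dimension is the count of one vocabulary word.
vectors = [fingerprint(p, vocab) for p in paras]
```

With a real corpus these vectors have one dimension per distinct word (hundreds of dimensions), and UMAP projects them down to the 2D picture.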
Jorge_Stolfi > 7 hours ago
(7 hours ago) quimqu Wrote: In other words, the text seems to follow a very regular pattern, where nearby paragraphs (in terms of UMAP) share nearly the same word combinations. This is amazing.
quimqu > 7 hours ago
(7 hours ago) oshfdk Wrote: (7 hours ago) quimqu Wrote: UMAP is a way to turn complex data into a 2D picture so we can see patterns. In this case, each paragraph is first turned into a set of numbers (a vector) that describes how often different words appear together, a sort of fingerprint. Those fingerprints live in a huge mathematical space (maybe hundreds of dimensions).
I've only heard a bit about UMAP and never had a chance to use it or understand what it does exactly, but some cursory reading on the internet suggests that UMAP tries to preserve the topology of the underlying set and that its low-dimensional axes are usually meaningless. Is this correct? I'm not sure how to interpret these graphs.
(7 hours ago) Jorge_Stolfi Wrote: (7 hours ago) quimqu Wrote: In other words, the text seems to follow a very regular pattern, where nearby paragraphs (in terms of UMAP) share nearly the same word combinations. This is amazing.
Indeed!
However the sequence cannot be the current page sequence as defined by the folio numbers. That must be the "true" page sequence.
Many years ago I created [link] but connected the dots in the "official" page order, and the result was a rat's nest for each section.
All the best, --stolfi
quimqu > 6 hours ago
(7 hours ago) Jorge_Stolfi Wrote: (7 hours ago) quimqu Wrote: In other words, the text seems to follow a very regular pattern, where nearby paragraphs (in terms of UMAP) share nearly the same word combinations. This is amazing.
Indeed!
However the sequence cannot be the current page sequence as defined by the folio numbers. That must be the "true" page sequence.
Many years ago I created [link] but connected the dots in the "official" page order, and the result was a rat's nest for each section.
All the best, --stolfi
oshfdk > 6 hours ago
(7 hours ago) quimqu Wrote: That's why, in natural language data, you normally get cloud-like clusters (similar topics grouped together), but in the Voynich, the result is very different: the points form continuous lines. That suggests that each paragraph is most similar to the next and previous one, like a chain or trajectory.
quimqu > 6 hours ago
(6 hours ago) oshfdk Wrote: (7 hours ago) quimqu Wrote: That's why, in natural language data, you normally get cloud-like clusters (similar topics grouped together), but in the Voynich, the result is very different: the points form continuous lines. That suggests that each paragraph is most similar to the next and previous one, like a chain or trajectory.
So, if this result is correct, could it imply that each paragraph is somehow built from the previous one? Something like self-citation or similar methods?
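The "chain" claim can be checked directly, without UMAP: if paragraphs really form a trajectory, consecutive fingerprints should be much more similar to each other than far-apart ones. Below is a minimal sketch with invented drifting vectors (not Voynich data), comparing cosine similarity of neighbours against a distant pair.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy "fingerprints" that drift slowly: each vector is close to its neighbours
# but far from the other end of the sequence.
vecs = [[10.0 - i, float(i), 1.0] for i in range(10)]

# Similarity of each paragraph to the next one...
consecutive = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
# ...versus the first paragraph against the last.
distant = cosine(vecs[0], vecs[-1])
```

In a chain-like corpus, `min(consecutive)` stays high while `distant` drops off; in an ordinary clustered corpus there is no such systematic relation between text order and similarity.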
oshfdk > 6 hours ago
(6 hours ago) quimqu Wrote: Not exactly. I need to understand and find the sequence. Note that the UMAPs shown are not calculated directly from the original words, but from the topic distribution of each paragraph. We have two topics, but it seems that the distributions shift gradually from one paragraph to the next.