Hi everyone.
I’m Joaquim Quadrada, Quim for short (hence my nickname quimqu). I’m 51, from Barcelona, and a native Catalan speaker. I was a mechanical engineer and moved into data science two years ago, after finishing a three-year master’s. Although the master’s touched on the linguistic side of data science, I’m not a linguist. I’m opening this thread because I think graph methods can serve the Voynich community as a practical, transparent way to probe structure and test ideas, and graphs are still a fairly new angle of investigation here.
By “graph” I mean a set of points and lines. The points (nodes) can be words or bits of metadata such as “section”, “Currier hand”, “writing hand”, or “position in line”. The lines (edges) connect things that co-occur or belong together. Edges can also carry extra information: weights, direction, and other attributes that describe the relationship between the nodes. Once you cast the transliteration as one or more graphs (and yes, graphs can be joined), you can ask graph-native questions: which links are unexpectedly strong once chance is controlled for, which words act as bridges between otherwise separate clusters, which small patterns (A→B→C chains or tight triangles) recur at line starts, how closely word communities align with metadata nodes (sections, hands, line position), and whether any directed paths repeat often enough to count as reusable templates. None of this decides whether the text is language or cipher, but it can highlight stable regularities, quantify them, and rank hypotheses for experts to examine.
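For anyone who wants to see what this looks like in practice, here is a minimal sketch of one way to build such a graph in Python with networkx. The toy lines and metadata values are invented placeholders for illustration, not taken from the transliteration:

```python
import networkx as nx
from itertools import combinations

# Toy data: each "line" is a list of EVA-like words plus some metadata.
# These strings are placeholders, not real transliteration data.
lines = [
    {"words": ["polor", "sheedy", "qokechy"], "section": "Herbal", "position": "L1"},
    {"words": ["daiin", "shey", "qokechy"],  "section": "Herbal", "position": "other"},
]

G = nx.Graph()
for line in lines:
    # Word nodes, connected when they co-occur in the same line.
    for w1, w2 in combinations(line["words"], 2):
        if G.has_edge(w1, w2):
            G[w1][w2]["weight"] += 1
        else:
            G.add_edge(w1, w2, weight=1, kind="cooccurrence")
    # Metadata nodes ("section:Herbal", "position:L1") linked to each word,
    # which is what makes the graph bipartite-style when projected later.
    for key in ("section", "position"):
        meta_node = f"{key}:{line[key]}"
        for w in line["words"]:
            G.add_edge(meta_node, w, kind="metadata")

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```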
I’d like this to be a brainstorming thread for ideas worth trying on top of these graphs.
As a concrete example, I started with the first lines of paragraphs (what I call L1) and compared them to all other lines. Building co-occurrence graphs, the L1 network consistently comes out denser and more tightly clustered. When I switch to a small sliding window (a “micro-syntax” view), the L1 graph splits into more distinct communities, which is what you’d expect if opening material uses more fixed combinations. I also looked for actual line-start bigrams that repeat. A couple of pairs do appear at L1 and not elsewhere, but the evidence is thin; they behave more like soft habits than hard formulas. To see broader context, I built a bipartite graph that connects words to their metadata (position, section, hand). Projecting this graph shows a clear cohort of words that lean toward L1, and it also shows which sections and Currier hands share that opening behavior. All of this is descriptive and testable; nothing here presumes a linguistic reading or a cipher.
This, for example, is the graph for the first lines of the paragraphs:
![Co-occurrence graph of paragraph-first lines (L1)](https://i.imgur.com/mZs74BP.png)
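To make the “micro-syntax” step concrete, this is roughly what the small sliding window means: only words within a short distance of each other on the same line get an edge, and communities are then detected on that restricted graph. A sketch only; the window size, the community algorithm, and the example lines below are placeholder choices, not necessarily the exact settings behind the figure above:

```python
import networkx as nx
from networkx.algorithms import community

WINDOW = 2  # only words at most 2 positions apart are linked

def window_graph(lines, window=WINDOW):
    """Co-occurrence graph where edges are restricted to a small sliding window."""
    G = nx.Graph()
    for words in lines:
        for i, w1 in enumerate(words):
            for w2 in words[i + 1 : i + 1 + window]:
                if G.has_edge(w1, w2):
                    G[w1][w2]["weight"] += 1
                else:
                    G.add_edge(w1, w2, weight=1)
    return G

# Placeholder lines standing in for L1 (paragraph-initial) material.
l1_lines = [["polor", "sheedy", "qokechy", "daiin"],
            ["tshor", "shey", "chtols", "daiin"]]

G_l1 = window_graph(l1_lines)
# Community detection: tighter, more separate clusters would suggest fixed combinations.
clusters = community.greedy_modularity_communities(G_l1, weight="weight")
for c in clusters:
    print(sorted(c))
```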
To illustrate what I mean by opening units at L1, here’s a table with the two bigrams that pass the defined thresholds: they have a positive ΔPMI versus the rest of the text (ΔPMI > 0 means the bigram is more tightly associated in L1 than in the other lines) and they always occur at the start of a line. I’ve added short KWIC (Key Word in Context) snippets for context; a sketch of the ΔPMI computation follows the table.
| Bigram | Count | ΔPMI | Line-start | KWIC (examples) |
| --- | --- | --- | --- | --- |
| polor sheedy | 2 | 6.234 | 100% | f112v: [polor sheedy] … sheedar …; f115v: [polor sheedy] … qokechy … |
| tshor shey | 2 | 4.799 | 100% | f15r: [tshor shey] … chtols … daiin; f53v: [tshor shey] … oltshey … qopchy |
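For transparency about what ΔPMI means here, this is a minimal sketch of the idea: standard pointwise mutual information over adjacent word pairs, computed separately on L1 lines and on all other lines, with the difference reported. How unseen bigrams are handled (the `floor` value below) is an implementation choice and shifts the absolute numbers, so treat the sketch as the shape of the computation rather than the exact pipeline behind the table:

```python
import math
from collections import Counter

def pmi_table(lines):
    """Unigram and adjacent-bigram counts -> PMI for each observed bigram."""
    unigrams, bigrams = Counter(), Counter()
    for words in lines:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    pmi = {}
    for (w1, w2), c in bigrams.items():
        p_joint = c / n_bi
        p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log2(p_joint / (p1 * p2))
    return pmi

def delta_pmi(bigram, l1_lines, other_lines, floor=-10.0):
    """PMI in L1 minus PMI in the remaining lines; `floor` stands in for unseen bigrams."""
    pmi_l1 = pmi_table(l1_lines).get(bigram, floor)
    pmi_rest = pmi_table(other_lines).get(bigram, floor)
    return pmi_l1 - pmi_rest
```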
What I’d like now are linguists’ eyes and instincts. If you can suggest a better unit than whole EVA “words” (for example, splitting gallows and benches, or collapsing known allographs), I will rebuild the graphs and quantify how the patterns change. If you have candidates for discourse-like items that might prefer line starts, I can measure their positional bias, role in the network, and the contexts they pull in. If there are section or hand contrasts you care about, I can compare their “profiles” in the bipartite projection and report which differences are solid under shuffles and which are noise.
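On the “solid under shuffles” point, what I have in mind is a plain permutation test: randomly reassign lines to the L1 / non-L1 groups many times and see how often chance alone matches or beats the observed value. A sketch, where `statistic` is a stand-in for whatever measure we agree to test (density, clustering, ΔPMI of a given bigram, and so on):

```python
import random

def permutation_pvalue(l1_lines, other_lines, statistic, n_shuffles=1000, seed=0):
    """Fraction of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(l1_lines, other_lines)
    pooled = list(l1_lines) + list(other_lines)
    k = len(l1_lines)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                      # scramble the L1 / non-L1 labels
        if statistic(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)         # small-sample-safe p-value
```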
I’ll keep my end practical: small, readable tables; KWIC lines for anything we flag; and ready-to-open graph files. If this sounds useful, I’ll post the current outputs and code, and then iterate with your guidance on segmentation, normalization, and targets worth testing.
My only goal is to make a common playground where your expertise drives what we measure.