(31-08-2019, 11:43 PM) Koen G wrote: What I'd like to be able to do is select a plant page and see which pages it "connects" to the most, vocabulary wise. I tried to do this manually but soon gave up, there are so many variables to keep track of.
That's what I was trying to do with my VMS Concordance. It took me years. Years.
I went through the entire manuscript and recorded which tokens occurred in which sections and with what frequency, and how they connected throughout the manuscript: a network of tokens. It was a HUGE project with minimal returns.
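Koen's request (pick a plant page and rank the pages it shares the most vocabulary with) can be roughed out in a few lines once a transcription is split into per-page token lists. The sketch below is a minimal illustration, not the method behind the concordance described here. It assumes whitespace-separated tokens and uses Jaccard similarity (shared distinct tokens divided by all distinct tokens) as one possible measure of "connection". The page labels and tokens are made-up placeholders, not real transcription data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of distinct tokens two pages share (0.0 = none, 1.0 = identical vocabulary)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_connections(pages: dict[str, list[str]], target: str) -> list[tuple[str, float]]:
    """Rank every other page by vocabulary overlap with `target`, strongest first."""
    vocab = {name: set(tokens) for name, tokens in pages.items()}
    scores = [
        (name, jaccard(vocab[target], words))
        for name, words in vocab.items()
        if name != target
    ]
    # Sort by score, highest first; equal scores keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: invented page labels and tokens, for illustration only.
pages = {
    "page_A": ["lk", "ar", "or", "lk", "dy"],
    "page_B": ["lk", "ar", "ey"],
    "page_C": ["ol", "am"],
}
print(rank_connections(pages, "page_A"))  # [('page_B', 0.4), ('page_C', 0.0)]
```

Jaccard ignores how often a token repeats on a page. If frequency matters, weighting by counts (for example, cosine similarity over token-frequency vectors) would be an alternative.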
This is the summary page for the two-glyph token lk. It documents where the token occurs, how many times it occurs on each folio, the specific sections in which it occurs, variants that add glyphs before and after it, and comments on anomalies or unusual patterns. And that's just the summary page. The token is also color-coded within the transcript, and its data is set up as charts and frequency-distribution lists. This is just a peephole into all the information I collected. I did this for every token in the manuscript.
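A per-token summary like the one described above (occurrences per folio, per section, and variants with extra glyphs before or after the token) could be tallied automatically along these lines. This is a hypothetical sketch, not the author's database. It assumes each folio is already labeled with a section and split into words. It finds variants by plain substring matching, which on an EVA-style transcription could also catch matches that straddle glyph boundaries; a real tool would need proper glyph segmentation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TokenSummary:
    per_folio: Counter = field(default_factory=Counter)    # exact matches per folio
    per_section: Counter = field(default_factory=Counter)  # exact matches per section
    variants: Counter = field(default_factory=Counter)     # longer words containing the token


def summarize(token: str, folios: dict[str, tuple[str, list[str]]]) -> TokenSummary:
    """Tally one token; `folios` maps a folio label to (section name, list of words)."""
    summary = TokenSummary()
    for label, (section, words) in folios.items():
        for word in words:
            if word == token:
                summary.per_folio[label] += 1
                summary.per_section[section] += 1
            elif token in word:
                # Assumption: substring match stands in for "glyphs before/after the token".
                summary.variants[word] += 1
    return summary


# Hypothetical toy data: invented labels, sections, and words.
folios = {
    "p1": ("pool", ["lk", "olk", "lk", "ar"]),
    "p2": ("plant", ["lkar", "ar"]),
    "p3": ("plant", ["or"]),
}
s = summarize("lk", folios)
print(s.per_folio)  # Counter({'p1': 2})
print(s.variants)   # Counter({'olk': 1, 'lkar': 1})
```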
PS: I didn't use folio numbers; I used mnemonics (whatever gave me a quick picture of the plant or pool folio in my head; it didn't have to be a correct ID, and preferably it was funny or goofy to make it easier to remember). This lets me scan through the tokens quickly and see where they occur. I cross-referenced each one with the color-coded transcript to get the folio numbers, so this is only one small part of the data and only a small part of the visualization aids.
SP stands for "small-plants" section.
You can see, for example, that lk occurs on 6 of the zodiac-figure pages, all of the pool pages, and all of the rota on the rosettes folio except rotum 3. It seems to favor the pool pages (it doesn't occur on as many plant pages as some other tokens do). It is rather sparse on the starred-text pages, but where it does occur, it occurs quite a few times on individual folios.
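The observation above combines two different measures: how many pages in a section contain a token at all (coverage), and how often it repeats on the pages where it does appear (density). Keeping the two apart makes a "sparse, but frequent when present" pattern easy to spot. A minimal sketch, assuming each folio is stored as a (section name, list of words) pair; the function name is hypothetical, and the section labels and words in the example are invented, not real counts from the manuscript.

```python
from collections import Counter


def section_profile(token: str, folios: dict[str, tuple[str, list[str]]]) -> dict[str, tuple[float, float]]:
    """For each section, return (coverage, density) for `token`.

    coverage: fraction of the section's pages on which the token appears at least once.
    density:  mean occurrences on the pages where it does appear (0.0 if it never appears).
    """
    pages = Counter()      # pages per section
    pages_hit = Counter()  # pages per section that contain the token
    hits = Counter()       # total occurrences per section
    for section, words in folios.values():
        pages[section] += 1
        n = words.count(token)
        if n:
            pages_hit[section] += 1
            hits[section] += n
    return {
        s: (pages_hit[s] / pages[s], hits[s] / pages_hit[s] if pages_hit[s] else 0.0)
        for s in pages
    }


# Hypothetical toy data: invented folio labels, sections, and words.
folios = {
    "a": ("pool", ["lk", "ar"]),
    "b": ("pool", ["lk"]),
    "c": ("stars", ["lk", "lk", "lk", "or"]),
    "d": ("stars", ["or"]),
    "e": ("stars", ["ar"]),
    "f": ("plant", ["or"]),
}
for section, (coverage, density) in section_profile("lk", folios).items():
    print(f"{section}: coverage {coverage:.2f}, density {density:.1f}")
```

In this toy example the "stars" section has low coverage (one page in three) but high density (three hits on that one page), the same shape as the starred-text pattern described above.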
I haven't released this information because it became too big and too complex, and it needs to be double-checked (which would probably take a year of dedicated attention); plus, I'm still trying to figure out what it means. It combines several applications: it uses my own fonts, my own transcript, my own database, and a couple of other utilities. It's not something one can simply hand to someone else. Also, even if there were some practical way to do it... it's almost impossible to release raw data to an academic audience if it hasn't been double-checked. The responses tend to be scathing and focused on small mistakes or omissions rather than on all the good information that can potentially be gleaned from it. Plus... the documentation that goes with it is more than 1100 pages long (not including the transcript and the database). Not exactly the sort of thing one can easily submit for publication.