The Voynich Ninja

Full Version: Categorizing the text-only pages
The VMS has several fairly well-definable sections, but some pages, especially the text-only pages (a slight misnomer: text pages plus a few other odd ones), are not so easy to place (LFD = Lisa Fagin Davis's scribe attribution):

f1r:   Q1;  intro page
f57v:  Q8;  not all text (4 people, circular text); LFD: tentatively Scribe 1
f58r:  Q8;  3 stars, sequential star points 6, 7, 8; LFD: Scribe 3
f58v:  Q8;  1 star at top, 6-pointed; LFD: Scribe 3
f66r:  Q8;  column of letters and words {der mus del page}; LFD: Scribe 5
f76r:  Q13; text only, [column of letters]; LFD: Scribe 2
fRos:  Q14; Rosette; LFD: tentatively Scribe 4
f85r1: Q14; text only; LFD: Scribe 2
f86v6: Q14; text only; LFD: Scribe 2
f86v5: Q14; text only; LFD: Scribe 2
f86v3: Q14; not all text (unfinished T-O map); LFD: Scribe 2

As a little project, I was hoping to assign every page to a definable section
and then look for patterns and links between those sections.

Here is my incomplete and provisional attempt:

f1r: pretty certain to be the introductory page.

f76r: most likely part of the balneo section; the entire quire seems to be a single unit.

fRos, f85r1, f86v6, f86v5, f86v3: I would call Quire 14 a single unit, a section in its own right.

That leaves these as the most problematic to categorize:
f57v, f58r, f58v, f66r.

Any suggestions as to how to classify/categorize these unruly pages would be welcome.
I would say that [link] should be categorized similarly to f85r2, but unfortunately I have no idea how exactly.
Hi b13mw,
Yes, I see what you mean: it's got 4 figures and circular text. My first attempt classed all pages with circles together as 'Cosmological'
and grouped all the all-text pages together as a virtual section.

[link] is an odd one. I would now hesitate to class it with f85r2, for 2 (admittedly not very strong) reasons:

1. f85r2 has a sun at the centre, f86v4 has a central moon and f86v3 has a T-O map (most often representing the earth),
    so they form a nice group, whereas [link] has a 'cloud'(?) at the centre.

2. Lisa Fagin Davis ascribes the 6 Q14 pages (f85r1, f85r2, f86v4, f86v6, f86v5, f86v3) to Scribe 2,
    and [link] possibly to Scribe 1.

I think maybe [link] and [link] are the end of the Herbal section.

Maybe f58r and [link] are an intro to the Recipes (stars with text) section.
  Pro: they both have stars next to paragraphs.
  Con: physically they are so far apart.

I should add that when I first did this I was very sceptical of the page/folio/quire order of the VMS, but since Lisa Fagin Davis published
her results I now feel much more confident that the current order is close to how it was meant to be.

Short interposed question: in which paper does Lisa Fagin Davis list the possible scribes of the folios?
Ah, the thread is here: [link]. Unfortunately, I think the free download has expired.
Project Muse: [link]
I too think [link] logically belongs close to f85r2 and f86v4: the 4 figures with one turned away seem key in both the first and the last of these 3. Also, there is a "cloud" at the center of the bottom-left roundel of Ros.

I take your point about the scribes though and "logically" may not equate to "physically".
Wish I had room on the wall to put up lots of copies and stare at them...

It's a good project, Rob.
Hi RobGea,
I looked for similar pages using an approach similar to the one discussed [link] (but with a few differences).
  • I extracted page text from the Zandbergen-Landini transliteration file ZL_ivtff_1r.txt, ignoring uncertain spaces. I considered all text (paragraphs, labels, anything). EVA was converted to CUVA.
  • For each page, I computed the % of occurrences for 7 CUVA bigrams that are known to be indicative of different dialects: SO, DY, OL, ED, OR, HO, EO (EVA:cho, dy, ol, ed, or, qo, eo).
  • I used Euclidean distance between the 7-dimensional vectors for each page to find the closest match to each text-only page.
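The three steps above could be sketched roughly as follows. This is not the code used for the results; it is a minimal illustration under stated assumptions: the seven CUVA bigrams are matched directly in EVA words via the EVA equivalents listed above (skipping the CUVA conversion), and "% of occurrences" is assumed to mean occurrences per 100 words. The names `profile` and `best_match` and the toy pages are hypothetical.

```python
import math

# Assumption: EVA equivalents of the CUVA bigrams SO, DY, OL, ED, OR, HO, EO.
EVA_PATTERNS = ["cho", "dy", "ol", "ed", "or", "qo", "eo"]


def profile(words):
    """7-dimensional vector: occurrences of each pattern per 100 words (assumed normalization)."""
    n = max(len(words), 1)
    counts = [sum(w.count(p) for w in words) for p in EVA_PATTERNS]
    return [100.0 * c / n for c in counts]


def best_match(target, pages):
    """Return (page, Euclidean distance) of the page whose profile is closest to target's."""
    ref = profile(pages[target])
    candidates = ((name, math.dist(ref, profile(words)))
                  for name, words in pages.items() if name != target)
    return min(candidates, key=lambda item: item[1])


# Toy pages (hypothetical EVA words, not real folio text).
pages = {
    "A": ["qokeedy", "chol", "daiin"],
    "B": ["qokedy", "chol", "shol"],
    "C": ["otaiin", "sar", "daiin"],
}
print(best_match("A", pages))
```

On real data, pages with very few words give noisy vectors, which is the caveat raised further down about short pages.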

As always, it is possible I made errors in the process. Take everything with a grain of salt.
These are the results (text-only page, best match, distance):
1r 42v 2.74259639758
57v 73r 2.4187029582
58r 89v1 2.44073534002
58v 44r 3.31252622631
66r 105r 1.86491393903
76r 82r 2.28714013563
Ros 105v 2.5224805252
85r1 112r 1.81561862736
86v6 107v 2.05433857969
86v5 55r 1.12671469326
86v3 94v 0.966269113653


Obviously, some pages do not contain much text, so the results are not guaranteed to be statistically meaningful. In particular, 44r, 73r and 94v have fewer than 100 words each.

The attached plots illustrate the results. I simply used PCA to reduce the number of dimensions to 2; as a side effect, the closest match is often not the closest dot in the plots. Green dots are the text-only pages and orange dots the best matches. I could not think of a simple way to connect the pairs, but I think the plots are readable. The second plot zooms in on the central area where all the relevant dots appear.
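A 2-D PCA projection like the one in the plots can be sketched with NumPy via an SVD of the centered data. The tool actually used is not stated, so this is only an assumed implementation; `pca_2d` and the random stand-in profiles are hypothetical. The check at the end shows why the closest match need not be the closest dot: projecting onto orthonormal axes never increases a distance, but it shrinks some distances more than others, so neighbours can be reordered.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # center each of the 7 dimensions
    # Rows of vt are orthonormal principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T


# Hypothetical stand-ins for 12 pages' 7-D bigram profiles.
rng = np.random.default_rng(0)
profiles = rng.uniform(0, 10, size=(12, 7))
points = pca_2d(profiles)

# Distance between two pages before and after the projection.
d7 = np.linalg.norm(profiles[0] - profiles[1])
d2 = np.linalg.norm(points[0] - points[1])
print(points.shape, d2 <= d7)
```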
I agree [link] is likely the intro of at least a plant section, given the plant on f1v.

I agree that quires 13 and 14 are single units. I do not agree that the page order and/or folding is correct or original. 

Quire 14 appears to have been refolded. 
[Image: rosette-folding.jpg][Image: UcdkcAjliuJQk0MbHAHKCTwKPYpgPN8JJk39ZuMC...KAxsHcRtmQ][Image: fRos_crd.jpg]

Right now, if you took the quire out on its own, the f85r1 text page would be the cover. It opens to the 4 people, with the f86v5 text page beside it, then folds out to the 3 pages with the 4 people still showing as the first page, and the back page is the chicken scratches.
[Image: f085r1_crd.jpg]   [Image: f085r2_crd.jpg][Image: f086v5_crd.jpg]   [Image: f085r2_crd.jpg][Image: f086v4_crd.jpg][Image: f086v6_crd.jpg]   [Image: f086v3_crd.jpg]
If you fold it the other way, with the potential binding on the green line, the chicken scratches are no longer on the back of the quire but on the front. It then opens to 2 text pages, f85r1 and f86v6, folds out to the same 3 pages we have now, and ends with the f86v5 text as the back page.
[Image: f086v3_crd.jpg]   [Image: f085r1_crd.jpg][Image: f086v6_crd.jpg]   [Image: f085r2_crd.jpg][Image: f086v4_crd.jpg][Image: f086v6_crd.jpg]   [Image: f086v5_crd.jpg]

It seems to make more sense to open the foldout and see the pictures next to the text you have just read, a kind of surprise addition, than to open the text page and see another picture next to the one you already saw, along with more text. So to me that is a point in favour of the green fold being original. The damage and soiling there could indicate this as well, although I suppose one could argue that it came from this being the foldout fold when the quire is folded the other way, as it is now.

There are several ideas about the ordering of quire 13. Mine puts [link] as the cover of the quire: 76, 80, 84, 77, 78, 81, 82, 75, 79, 83.

Quire 8 looks like the leftovers from an attempt to sort everything. I think I once read that someone (Nick?) suggested the missing pages might have been quire 14: space was left for it, but it ended up being placed after quire 13 (which I believe is related in topic) and numbered there?

Folio 57 is conjoined with 66, so the bifolio could go 2 ways: 57 first or 66 first.

[Image: f057r_crd.jpg]   [Image: f057v_crd.jpg][Image: f066r_crd.jpg]   [Image: f066v_crd.jpg]

The first way would place the text/mandala pages either randomly among plant pages or as a central text/mandala pair among plant pages, as shown, i.e. at the center of a quire.

[Image: f066r_crd.jpg]   [Image: f066v_crd.jpg][Image: f057r_crd.jpg]   [Image: f057v_crd.jpg]

Flipped the other way, it becomes a potential cover for a plant quire, with text at the front and the mandala at the back. Somehow this seems more likely to me. With the mandala page at the back, it could be a lead-in to other mandala-containing quires, maybe even right before quire 14 (I think quire 13 is better placed after it), or maybe in front of the cosmological ones, or the zodiac...

58 and 65 make a bifolio, but one leaf (both sides) is text and the other is plants, so maybe a 2-page intro?
One thing to also consider in relation to this is that folio [link] has a big-plant page on the obverse side.
I have been experimenting some more with page-similarity measures. I considered a few word-based options:
  • lexicon similarity: considering only word types, the ratio between the intersection and the union of the word types in the two pages
  • word-histogram difference: the Bhattacharyya distance between the word histograms of the two pages (this is of course based on word tokens)
  • token overlap: the number of tokens that appear in both pages divided by the average number of tokens in the two pages
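The first two measures in the list can be sketched as below. This is an illustration, not the code behind the reported numbers: lexicon similarity is written as the Jaccard index of the word-type sets, and the histogram difference is assumed to be the common form of the Bhattacharyya distance, -ln(sum of sqrt(p·q)) over normalized word-token histograms. Other variants (for example sqrt(1 - BC), the Hellinger form) would rank pages similarly but give different values. Function names are hypothetical.

```python
import math
from collections import Counter


def lexicon_similarity(page_a, page_b):
    """Shared word types divided by all word types in either page (Jaccard index)."""
    a, b = set(page_a), set(page_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def bhattacharyya_distance(page_a, page_b):
    """-ln of the Bhattacharyya coefficient of the two normalized word-token histograms."""
    ca, cb = Counter(page_a), Counter(page_b)
    na, nb = sum(ca.values()), sum(cb.values())
    # Only words present in both pages contribute to the coefficient.
    bc = sum(math.sqrt((ca[w] / na) * (cb[w] / nb)) for w in ca.keys() & cb.keys())
    return -math.log(bc) if bc > 0 else math.inf  # no shared words: infinitely far apart


page_a = "one one two three".split()
page_b = "one one two two four five".split()
print(lexicon_similarity(page_a, page_b), bhattacharyya_distance(page_a, page_b))
```

Note that the Jaccard index is a similarity (higher means closer) while the Bhattacharyya distance is a distance (lower means closer), so rankings based on the two must be sorted in opposite directions.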

As a measure of "quality" I counted how frequently the verso of a page is found among the five best matches for the recto of the page. The idea is that there are good reasons to assume that "r" and "v" were written consecutively and they often appear to share similar illustrations, dialects and the same scribe. Of course there is no guarantee that this measure is meaningful: I used it just because it seems to make sense to me and it is simple to check.
According to this criterion, the three word-based methods perform better than the bigram-based method I previously posted. Among the three, token-overlap appears to be the best (but histogram is quite close and the lexicon method is not far either).
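The "quality" check described above can be sketched as a small function. Assumptions, all hypothetical: pages are keyed by names like "1r" / "1v" (sub-pages such as "85r1" are simply skipped), the similarity function returns higher values for more similar pages (a distance such as Bhattacharyya would need to be negated), and the score is the fraction of rectos whose own verso appears among their k best matches.

```python
def verso_hit_rate(pages, similarity, k=5):
    """Fraction of rectos whose own verso is among their k most similar pages.

    pages: dict mapping names such as "1r" / "1v" to word lists (assumed naming).
    similarity: function(words_a, words_b) -> float, higher meaning more similar.
    """
    hits = tried = 0
    for name, words in pages.items():
        if not name.endswith("r"):
            continue  # only rectos are queried
        verso = name[:-1] + "v"
        if verso not in pages:
            continue  # no verso to look for
        ranked = sorted((other for other in pages if other != name),
                        key=lambda other: similarity(words, pages[other]),
                        reverse=True)
        tried += 1
        hits += verso in ranked[:k]
    return hits / tried if tried else 0.0


def jaccard(a, b):
    """Simple stand-in similarity for the usage example."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# Toy pages (hypothetical words, not real transliteration).
pages = {
    "1r": ["daiin", "chol", "shol"],
    "1v": ["daiin", "chol", "cthy"],
    "2r": ["qokedy", "okal", "dar"],
    "2v": ["qokedy", "okal", "ar"],
    "3r": ["otedy", "sal", "lkar"],
}
print(verso_hit_rate(pages, jaccard, k=1))
```

Here "3r" has no verso in the toy data, so only 1r and 2r are counted.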

An example of how the method works:
PageA: one one two three
PageB: one one two two four five

The two pages share three tokens (one, one, two); the order is not significant. The average page length is (4 + 6) / 2 = 5 tokens, so the overlap is
3/5 = 0.6
This measure falls in the 0-1 range, with 0 corresponding to no overlap between words and 1 to exactly the same word-tokens (possibly in a different order).
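The worked example can be reproduced with a multiset intersection: `Counter & Counter` keeps the minimum count of each word, which is exactly "tokens that appear in both pages". This is a sketch of the measure as described, not the original code; `token_overlap` is a hypothetical name.

```python
from collections import Counter


def token_overlap(page_a, page_b):
    """Tokens shared by both pages (with multiplicity) divided by the average page length."""
    if not page_a and not page_b:
        return 0.0
    shared = sum((Counter(page_a) & Counter(page_b)).values())
    return shared / ((len(page_a) + len(page_b)) / 2)


page_a = "one one two three".split()
page_b = "one one two two four five".split()
print(token_overlap(page_a, page_b))  # 0.6
```

Two pages with identical tokens in any order score 1, and pages with no common word score 0, matching the 0-1 range described above.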

These are the measures I found for text-only pages (three best matches for each page, best match first):
1r 8v 0.266 101r 0.222 38v 0.215
57v 66r 0.193 49v 0.161 67r1 0.114
58r 58v 0.214 70r2 0.193 86v5 0.191
58v 107v 0.251 86v6 0.245 86v5 0.224
66r 81v 0.254 78r 0.230 75v 0.224
76r 116r 0.425 75r 0.395 103r 0.388
Ros 85r1 0.184 86v5 0.176 86v6 0.167
85r1 86v6 0.281 84r 0.265 78r 0.262
86v6 86v5 0.315 107v 0.290 85r1 0.281
86v5 86v6 0.315 86v3 0.270 107r 0.266
86v3 86v5 0.270 106v 0.259 106r 0.257

The similarity between 57v and 66r / 49v is largely due to the lists of single-glyph words.
Quire 14 appears to be quite coherent in terms of shared words (though the absolute values for Ros are low, maybe unsurprisingly since the diagram looks so "special").