That's exactly what Colin and I have done with Latent Semantic Analysis. More here, and article coming soon:
The evidence from Colin's analytics shows exactly what you're asking about: a very strong textual correlation across conjoint bifolia in both the balneology and stars sections. We did NOT find that correlation across conjoint bifolia in the herbal section, which suggests that, as long suspected, each herbal page is its own semantic unit.
In other words, 104v and 115r (conjoint) are more closely related than, say, 104v and 105r (consecutive).
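
For anyone who wants to try this kind of comparison themselves, here is a rough sketch of the idea, not our actual pipeline. It uses plain bag-of-words cosine similarity as a stand-in and skips the dimensionality-reduction (SVD) step that Latent Semantic Analysis adds. The page tokens are invented placeholders, not a real transcription, and the names `cosine`, `mean_similarity`, `conjoint_pairs` and `consecutive_pairs` are made up for the illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(pages, pairs):
    """Average cosine similarity over a list of (page, page) label pairs."""
    vectors = {label: Counter(text.split()) for label, text in pages.items()}
    sims = [cosine(vectors[x], vectors[y]) for x, y in pairs]
    return sum(sims) / len(sims)


# ASSUMPTION: invented placeholder tokens, NOT real transcription data.
pages = {
    "104v": "w1 w2 w3 w1 w4",
    "105r": "w5 w6 w7 w5 w1",
    "115r": "w1 w2 w4 w3 w8",
}

# Pair lists would come from the codicology: which pages share a bifolium
# versus which pages merely follow one another in the current binding.
conjoint_pairs = [("104v", "115r")]
consecutive_pairs = [("104v", "105r")]

conjoint_mean = mean_similarity(pages, conjoint_pairs)
consecutive_mean = mean_similarity(pages, consecutive_pairs)
print(f"conjoint: {conjoint_mean:.3f}, consecutive: {consecutive_mean:.3f}")
```

With a real transcription you would fill in `pages` per folio side and build the two pair lists per section, then compare the two means section by section. A proper LSA run would first project the count vectors into a lower-dimensional space before taking the cosine.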