In the attached CSV file, the first column labels each line of Takeshi Takahashi's transcription with the paragraph it belongs to. I simply used the "end of paragraph" marker (=) from the transcription as a paragraph separator. Paragraph labels have the form folio + progressive number within the folio (from f103r_par01 to f116r_par08).
For some reason, line 29 of [link not available] was not transcribed by Takahashi: I include Stolfi's transcription instead.
The matching with stars seems to be reasonably good, but of course not perfect. For instance:
- f105r: there are 10 stars in the text, and 12 “paragraphs” in the transcription. This is because there is a right-aligned line at the height of the third star and a centrally aligned line at the very end of the page: both of these lines are counted as “paragraphs”;
- f105v: 10 stars and 10 paragraphs;
- f111v: there are 19 stars, but only 7 paragraphs are marked in the transcription: the first half of the page seems to be a single long paragraph of 25 lines. 10 lines appear at the side of this long paragraph (the last two are joined together by something similar to the common star “tails”).
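A comparison like the one above could be automated with a small sketch such as the following. It assumes only the label format described in this post (folio, then `_par`, then a number); the function name `paragraphs_per_folio` and the sample labels are hypothetical, and the star counts are the three quoted above, counted by eye.

```python
def paragraphs_per_folio(labels):
    """Count distinct paragraph labels per folio.

    Labels are assumed to look like 'f105r_par03', one per transcribed line,
    so the same label repeats for every line of a paragraph.
    """
    seen = {}
    for label in labels:
        folio, _, _ = label.partition("_par")
        seen.setdefault(folio, set()).add(label)
    return {folio: len(pars) for folio, pars in seen.items()}


# Star counts quoted in this post (counted by eye, so they may contain mistakes).
stars = {"f105r": 10, "f105v": 10, "f111v": 19}

# Hypothetical sample of first-column labels, only to show the call.
labels = ["f105r_par01", "f105r_par01", "f105r_par02", "f105v_par01"]
counts = paragraphs_per_folio(labels)

for folio, n_par in counts.items():
    n_stars = stars.get(folio)
    print(folio, "paragraphs:", n_par, "stars:", n_stars)
```

With the full label column as input, folios where the two numbers differ would be the ones to check by hand.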
Please take this with care since, as always, there could be mistakes on my side.
If we decide that this kind of numbering makes sense, this file (after verification of its correctness) can be used to compute paragraph statistics (e.g. a matrix of the number of common words between any two paragraphs).
Or a similar file could be produced with a more sophisticated system.
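As a rough illustration of the paragraph statistics mentioned above, here is a minimal sketch of the common-words matrix. Several things in it are assumptions rather than facts about the attached file: that the paragraph label is in the first column and the transcribed text in the last one, that words are separated by "." (as is usual in EVA transcriptions), and that "common words" means distinct word types shared by two paragraphs. The function names and the three sample rows are hypothetical.

```python
import csv
import io


def read_paragraphs(handle, word_sep="."):
    """Collect the set of distinct words for each paragraph label.

    Assumes label in the first column and text in the last column;
    adjust the indices if the real file is laid out differently.
    """
    order = []   # paragraph labels in order of first appearance
    words = {}   # label -> set of word types
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip empty or malformed rows
        label, text = row[0].strip(), row[-1].strip()
        if label not in words:
            words[label] = set()
            order.append(label)
        words[label].update(w for w in text.split(word_sep) if w)
    return order, words


def common_word_matrix(order, words):
    """Entry [i][j] is the number of word types shared by paragraphs i and j."""
    return [[len(words[a] & words[b]) for b in order] for a in order]


# Hypothetical rows, only to show the expected shape of the input.
sample = io.StringIO(
    "f103r_par01,daiin.chedy.qokeey\n"
    "f103r_par01,shedy.daiin\n"
    "f103r_par02,chedy.qokal.daiin\n"
)
order, words = read_paragraphs(sample)
matrix = common_word_matrix(order, words)
print(order)
print(matrix)
```

The diagonal of the matrix holds the number of distinct words in each paragraph, which makes it easy to turn the raw counts into a ratio if paragraph length needs to be taken into account.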
[I had to add a .txt suffix to the file name in order to be able to attach it]