RE: Stylometric analysis of the VMS with Stylo (R)
bi3mw > 05-11-2023, 04:22 PM
If anyone wants to do their own research, here are the necessary text files. I downloaded the HTM files from Takeshi Takahashi's website (with wget) and then converted them to text files (with html2text). If you want to work with Stylo, copy the files into a folder called "corpus" and set that folder's parent as the working directory.
all_pages.zip (Size: 131.57 KB / Downloads: 14)
Edit: The conversion to text files sometimes inserted extra line breaks. This does not matter for the analysis.
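For anyone who would rather not install html2text, here is a minimal sketch of the same conversion step in Python. It uses only the standard library's html.parser, so it is a stand-in for html2text, not the same tool. The folder names passed to build_corpus are hypothetical. The function puts the text files in a "corpus" subfolder of the working directory, which is where Stylo looks for them.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page text with surrounding whitespace and blank lines removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def build_corpus(html_dir: str, work_dir: str) -> int:
    """Convert html_dir/*.htm(l) into work_dir/corpus/<name>.txt; return the file count."""
    corpus = Path(work_dir) / "corpus"
    corpus.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(Path(html_dir).glob("*.htm*")):
        text = html_to_text(src.read_text(encoding="utf-8", errors="replace"))
        (corpus / f"{src.stem}.txt").write_text(text + "\n", encoding="utf-8")
        count += 1
    return count
```

For example, build_corpus("takahashi_htm", "vms_stylo") writes the text files into vms_stylo/corpus. In R you would then call setwd("vms_stylo") followed by stylo(). Because blank lines are dropped, the extra line breaks mentioned above do not appear in the output.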
RE: Stylometric analysis of the VMS with Stylo (R)
bi3mw > 06-11-2023, 10:25 PM
On closer inspection, I noticed that the Takahashi files contain some comments. I will remove them and post the files again as soon as I have time.
Or does anyone have a clean corpus they would like to share here?
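Below is a minimal sketch of the comment-removal step. The comment conventions here are assumptions and have not been checked against Takahashi's files: whole-line comments are taken to start with "#", inline annotations to sit in curly braces {...}, and each transcribed line to begin with a locus tag in angle brackets. Adjust the three patterns to whatever the files actually use.

```python
import re

# ASSUMPTION: guessed comment conventions, not verified against Takahashi's files.
COMMENT_LINE = re.compile(r"^\s*#")         # whole-line comment
INLINE_COMMENT = re.compile(r"\{[^}]*\}")   # inline annotation such as {plant}
LOCUS_TAG = re.compile(r"^\s*<[^>]*>\s*")   # leading locus tag such as <f1r.P1.1;H>


def strip_comments(text: str) -> str:
    """Drop comment lines, inline {...} annotations, locus tags and empty lines."""
    kept = []
    for line in text.splitlines():
        if COMMENT_LINE.match(line):
            continue
        line = INLINE_COMMENT.sub("", line)
        line = LOCUS_TAG.sub("", line)
        if line.strip():
            kept.append(line.strip())
    return "\n".join(kept)
```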
RE: Stylometric analysis of the VMS with Stylo (R)
ReneZ > 07-11-2023, 09:54 AM
(06-11-2023, 10:25 PM) bi3mw Wrote: Or does someone have a clean corpus he / she wants to share here ?
That would not be difficult. In what form would you like to have this?
- One file per page? (zipped)
- Which alphabet (Basic Eva?)
- Only paragraph text, or also labels etc.?
- Any preferred transliteration file?
RE: Stylometric analysis of the VMS with Stylo (R)
bi3mw > 07-11-2023, 02:25 PM
(07-11-2023, 09:54 AM) ReneZ Wrote: That would not be difficult. In what form would you like to have this?
- One file per page (zipped)
- Basic EVA
- Just plain text without comments, special characters, labels, etc. (if possible, use spaces as word separators)
- Transliteration does not matter.
Thanks in advance.
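Here is a minimal sketch of turning one transliterated Basic EVA line into the requested form: plain words separated by spaces. The rules are assumptions based on common EVA transliteration conventions. "." is treated as a word space and "," as an uncertain space. For "[a:o]" alternatives, the first reading is kept. Anything in angle brackets is dropped, as are all characters other than a-z. Check these rules against the documentation of the file you actually use.

```python
import re

# ASSUMPTION: common EVA conventions; verify against your transliteration file.
ALTERNATIVES = re.compile(r"\[([^:\]]*):[^\]]*\]")  # [o:a] -> keep first reading "o"
INLINE_TAGS = re.compile(r"<[^>]*>")                 # locus tags, <%>, <$>, <!comment> ...
WORD_SEPARATORS = re.compile(r"[.,]+")               # '.' = space, ',' = uncertain space
NOT_EVA = re.compile(r"[^a-z ]")                     # drops ?, !, *, =, -, digits, etc.


def to_plain_words(line: str) -> str:
    """Return the line as lowercase Basic EVA words separated by single spaces."""
    line = ALTERNATIVES.sub(r"\1", line)
    line = INLINE_TAGS.sub(" ", line)
    line = WORD_SEPARATORS.sub(" ", line)
    line = NOT_EVA.sub("", line)
    return " ".join(line.split())  # collapse runs of spaces
```

One design choice to be aware of: an unreadable character ("?") is simply deleted here, which joins the fragments on either side into one word. Dropping the whole affected word would be an equally defensible choice.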
RE: Stylometric analysis of the VMS with Stylo (R)
bi3mw > 08-11-2023, 06:11 AM
(08-11-2023, 03:48 AM) ReneZ Wrote: Just to be sure: should circular text, radial text be included?
Yes, this kind of text should be included.
(08-11-2023, 03:48 AM) ReneZ Wrote: With labels I meant star labels, zodiac labels etc...
I see. Yes, include the labels.
RE: Stylometric analysis of the VMS with Stylo (R)
ReneZ > 08-11-2023, 08:10 AM
The zip file should be available via this link:
I spent (wasted) some time trying to set up an FTP repository, but have not succeeded yet.
The zip file contains 227 short text files. The file names may look mysterious at first.
I will add some explanations in the transliteration thread.
Since you recently succeeded in setting up and running bitrans, you may want to do the same with ivtt. Then you will be able to repeat this yourself.
Edit: this was based on the RF transliteration, file RF1a-n.txt (-n stands for the native alphabet, i.e. Eva).
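ivtt is the proper tool for this job. For readers who just want to see the idea, here is a minimal sketch of splitting one transliteration file into one text file per page. It does not reproduce ivtt's options. It assumes IVTFF-style locus tags such as <f1r.1,@P0> at the start of each text line and treats lines starting with "#" as comments. The page id is taken to be the part of the tag before the first "."; this also covers foldout ids such as f67r1. Circular, radial and label loci are kept along with the paragraph text, as requested above.

```python
import re
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: IVTFF-style loci like "<f1r.1,@P0>   text"; page id = tag part before '.'.
LOCUS = re.compile(r"^<(f\d+[rv]\d*)\.[^>]*>\s*(.*)$")


def split_by_page(lines):
    """Group the text of every locus line by page id, keeping file order."""
    pages = defaultdict(list)
    for raw in lines:
        if raw.startswith("#"):
            continue  # comment line
        m = LOCUS.match(raw.strip())
        if m and m.group(2):
            pages[m.group(1)].append(m.group(2))
    return dict(pages)


def write_pages(pages: dict, corpus_dir: str) -> None:
    """Write each page's lines to corpus_dir/<page>.txt."""
    out = Path(corpus_dir)
    out.mkdir(parents=True, exist_ok=True)
    for page, texts in pages.items():
        (out / f"{page}.txt").write_text("\n".join(texts) + "\n", encoding="utf-8")
```

Page header lines such as "<f1r>" carry no line number, so the pattern skips them. A per-line cleaning step, such as converting EVA separators to spaces, could be applied to each text before writing.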
RE: Stylometric analysis of the VMS with Stylo (R)
bi3mw > 10-11-2023, 07:33 PM
Here are the plots (PCA and cluster analysis) for the different scribes:
A=Scribe 1 (red)
B=Scribe 2 (green)
C=Scribe 3 (blue)
D=Scribe 4 (black)
E=Scribe 5 (yellow)
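Stylo reads the part of each file name before the first underscore as the sample's class and colours the plot by class. That is how the A to E groups above can be produced. Below is a minimal sketch of renaming per-page files accordingly. The page_to_scribe mapping is hypothetical; you have to fill it in from whatever page-to-scribe assignment you are using.

```python
from pathlib import Path

# Letters used in the plots: A = Scribe 1 ... E = Scribe 5.
SCRIBE_LETTER = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}


def class_filename(page: str, scribe: int) -> str:
    """Build a stylo-style 'class_title.txt' name, e.g. ('f1r', 1) -> 'A_f1r.txt'."""
    return f"{SCRIBE_LETTER[scribe]}_{page}.txt"


def rename_corpus(corpus_dir: str, page_to_scribe: dict) -> list:
    """Prefix each <page>.txt with its scribe letter; pages not in the mapping are left alone."""
    renamed = []
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        scribe = page_to_scribe.get(path.stem)
        if scribe is None:
            continue
        target = path.with_name(class_filename(path.stem, scribe))
        path.rename(target)
        renamed.append(target.name)
    return renamed
```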
PCA
VMS_all_pages_PCA_100_MFWs_Culled_0__PCA__001.pdf (Size: 7.41 KB / Downloads: 21)
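The file name indicates 100 most frequent words (MFWs) with 0% culling, so no word was dropped for occurring in too few pages. Below is a minimal pure-Python sketch of what that means. The most frequent words are counted over the whole corpus, each page becomes a vector of their relative frequencies, and the pages are projected onto the first two principal components. Stylo offers both a covariance- and a correlation-based PCA. This sketch standardises the columns (the correlation variant) and finds components by power iteration with deflation. Which variant produced the plot above is not stated.

```python
import math
from collections import Counter


def mfw_table(texts: dict, n_mfw: int = 100):
    """Return the corpus-wide top-n words and each text's relative frequencies of them."""
    totals, counts = Counter(), {}
    for name, text in texts.items():
        counts[name] = Counter(text.split())
        totals.update(counts[name])
    mfw = [w for w, _ in totals.most_common(n_mfw)]
    table = {}
    for name, c in counts.items():
        size = sum(c.values()) or 1
        table[name] = [c[w] / size for w in mfw]
    return mfw, table


def pca_scores(rows, n_components=2, iters=500):
    """Scores on the first principal components of the column-standardised rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Correlation matrix of the standardised columns.
    cov = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start vector
        for _ in range(iters):  # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
        comps.append(v)
        # Deflate: remove the component just found before searching for the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return [[sum(zr[j] * c[j] for j in range(p)) for c in comps] for zr in z]
```

The sign of each component is arbitrary, so a plot made this way may be mirrored relative to Stylo's output. Pure Python is slow for 227 pages by 100 words; numpy would be the practical choice.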
Cluster
VMS_all_pages_CA_100_MFWs_Culled_0__Classic Delta__001.pdf (Size: 13.26 KB / Downloads: 15)
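"Classic Delta" in Stylo is Burrows's Delta. Each MFW frequency is turned into a z-score across all texts, and the distance between two texts is the mean absolute difference of their z-scores. Stylo then builds the dendrogram by hierarchical clustering of these distances. The linkage method used for the plot is not stated in the file name, so this minimal sketch stops at the distance matrix and a nearest-neighbour lookup. The input is assumed to be a table of relative frequencies over one shared MFW list.

```python
import math


def classic_delta(table: dict) -> dict:
    """Burrows's Classic Delta for every pair of texts.

    table maps a text name to its relative frequencies of the same MFW list.
    Returns {(name_a, name_b): delta} with name_a < name_b.
    """
    names = sorted(table)
    n = len(names)
    p = len(table[names[0]])
    means = [sum(table[t][j] for t in names) / n for j in range(p)]
    sds = [math.sqrt(sum((table[t][j] - means[j]) ** 2 for t in names) / (n - 1)) or 1.0
           for j in range(p)]
    z = {t: [(table[t][j] - means[j]) / sds[j] for j in range(p)] for t in names}
    return {
        (a, b): sum(abs(x - y) for x, y in zip(z[a], z[b])) / p
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def nearest_neighbour(deltas: dict, name: str) -> str:
    """Return the text with the smallest Delta to `name`."""
    candidates = [(d, b if a == name else a) for (a, b), d in deltas.items() if name in (a, b)]
    return min(candidates)[1]
```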
RE: Stylometric analysis of the VMS with Stylo (R)
MarcoP > 10-11-2023, 09:18 PM
I don't know how robust the PCA diagram can be, since many pages contain little text, but I would say it shows a large variance in "language" even within individual scribes. This is particularly clear for Scribe 2's Quire 13 pages (high Y = PC2) versus the Herbal pages (low Y). It is also noteworthy that Scribe 3's Q20 falls roughly between Scribe 2's two clusters.
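One way to test the concern about short pages is to rerun the analysis on longer samples only. Below is a minimal sketch of two options. The first separates pages with at least a minimum number of words from shorter ones; the threshold of 200 is an arbitrary assumption. The second pools pages that share a Stylo class prefix (the part of the file name before the first underscore) into one larger sample. Pooling by scribe would, of course, hide exactly the within-scribe variance noted above. A finer prefix, such as scribe plus quire, keeps more of it.

```python
def split_by_length(texts: dict, min_words: int = 200):
    """Separate texts with at least min_words tokens from shorter ones."""
    keep, short = {}, {}
    for name, text in texts.items():
        (keep if len(text.split()) >= min_words else short)[name] = text
    return keep, short


def pool_by_class(texts: dict) -> dict:
    """Concatenate texts that share a stylo-style class prefix (part before the first '_')."""
    pooled = {}
    for name in sorted(texts):
        cls = name.split("_", 1)[0]
        pooled[cls] = (pooled.get(cls, "") + " " + texts[name]).strip()
    return pooled
```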
RE: Stylometric analysis of the VMS with Stylo (R)
bi3mw > 12-11-2023, 12:44 PM
You can see that Scribe 1 falls largely within Currier language "A", while the remaining scribes are spread across "B" and "Unknown".