27-09-2016, 04:12 PM
Hi everyone,
[font=arial, sans-serif]I have been working on a project for the Voynich Manuscript based on the well-known 'interlinear' file. I have built a RESTful API on top of the interlinear and am presenting it here: [/font]
[font=arial, sans-serif](link not shown)[/font]
[font=arial, sans-serif]Essentially it is an online, public version of the Voynich Transcription Tool that gets mentioned from time to time. There is documentation on how to use the API, plus a set of examples showing what sort of applications can be built on top of API queries. I've implemented some classic examples, e.g. word length distribution and Sukhotin's vowel identification algorithm. I don't think these examples are of great value in themselves, but they do demonstrate a transparent, repeatable methodology with a clearly defined data set (i.e. the source code is available).[/font]
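For readers who have not met the two classic examples mentioned above, here is a minimal Python sketch of both. It is not the code from the project: the function names are illustrative, and the input is assumed to be a plain list of word strings (for instance, words taken from a transcription).

```python
from collections import Counter, defaultdict


def word_length_distribution(words):
    """Count how many words there are of each length."""
    return Counter(len(w) for w in words)


def sukhotin_vowels(words):
    """Sketch of Sukhotin's vowel identification algorithm.

    Returns the symbols classified as vowels, in the order found.
    """
    # Symmetric adjacency counts between distinct neighbouring symbols
    # (the diagonal of the matrix is treated as zero).
    adj = defaultdict(Counter)
    for w in words:
        for a, b in zip(w, w[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    letters = {ch for w in words for ch in w}
    sums = {ch: sum(adj[ch].values()) for ch in letters}

    vowels = []
    consonants = set(letters)  # every symbol starts out as a consonant
    while consonants:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(consonants), key=lambda c: sums[c])
        if sums[best] <= 0:
            break
        vowels.append(best)
        consonants.remove(best)
        # Neighbours of the new vowel become less vowel-like.
        for c in consonants:
            sums[c] -= 2 * adj[c][best]
    return vowels
```

Calling `sukhotin_vowels(["banana"])` classifies only `a` as a vowel, since `b` and `n` both drop below zero once `a` is picked.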
[font=arial, sans-serif]An example API query could be:[/font]
[font=arial, sans-serif](link not shown)[/font]
[font=arial, sans-serif]This translates as 'fetch me page (page ID not shown) of the Takahashi transcription, without interlinear comments, with columns for pageId, currierHand, illustrationType, unitCode and lineNumber, and using morpheme groupings of cfh, ckh, cph, cth, eee, iii, ch, ee, ii, qo, sh'.[/font]
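To make the shape of such a query concrete, here is a rough Python sketch that assembles a URL from the pieces listed above. Everything specific in it is an assumption for illustration only: the base URL is a placeholder, the parameter names (`page`, `transcriber`, `comments`, `columns`, `groups`) are invented, and `PAGE_ID` stands in for the page identifier. Check the API documentation for the real route and parameter names.

```python
from urllib.parse import urlencode

# Placeholder host and route; NOT the real API address.
BASE_URL = "https://example.invalid/morphemes"


def build_query(page_id, transcriber, columns, groups, include_comments=False):
    """Assemble a query URL from the pieces described in the post.

    All parameter names here are hypothetical.
    """
    params = {
        "page": page_id,
        "transcriber": transcriber,
        "comments": "true" if include_comments else "false",
        "columns": ",".join(columns),
        "groups": ",".join(groups),
    }
    # urlencode percent-encodes the commas inside each value.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query(
    page_id="PAGE_ID",  # stand-in for the page identifier
    transcriber="Takahashi",
    columns=["pageId", "currierHand", "illustrationType", "unitCode", "lineNumber"],
    groups=["cfh", "ckh", "cph", "cth", "eee", "iii", "ch", "ee", "ii", "qo", "sh"],
)
```

Keeping the columns and groups as Python lists until the last moment makes it easy to vary them between experiments and to record exactly which query produced a given result.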
[font=arial, sans-serif][font=tahoma, verdana, arial, sans-serif]There are two main methods (routes): 'tokens' and 'morphemes'. 'Tokens' returns the effectively raw transcription from the interlinear, and 'morphemes' does the same thing but applies a grouping algorithm that identifies e.g. 'qo', 'sh' and 'eee'. The morpheme groups are user-configurable. I took some feedback on this (thanks, Nick Pelling) and decided that it is an open question whether we should treat 'qo' as a single morpheme, or treat 'qot' and 'qok' as units separate from 'qo'. There are many similar questions in word morphology that would, in my opinion, be better served by clearer data.[/font][/font]
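As an illustration of what a configurable grouping step might look like, here is a minimal Python sketch that splits a word using a greedy longest-match rule. This is an assumption about one plausible approach, not a description of the algorithm the API actually uses; the function name and the default group list (copied from the example query above) are illustrative.

```python
DEFAULT_GROUPS = ("cfh", "ckh", "cph", "cth", "eee", "iii",
                  "ch", "ee", "ii", "qo", "sh")


def split_morphemes(word, groups=DEFAULT_GROUPS):
    """Split a word into configured groups, falling back to single symbols.

    Greedy longest-match: at each position the longest matching group wins.
    """
    # Longest groups first, so 'eee' is tried before 'ee'; ignore empty strings.
    ordered = sorted((g for g in groups if g), key=len, reverse=True)
    pieces = []
    i = 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                pieces.append(g)
                i += len(g)
                break
        else:
            # No group matches here, so emit the single symbol.
            pieces.append(word[i])
            i += 1
    return pieces
```

With the default groups, `split_morphemes("qokeedy")` gives `['qo', 'k', 'ee', 'd', 'y']`. Passing `groups=("qok", "ee")` instead gives `['qok', 'ee', 'd', 'y']`, which is exactly the kind of alternative reading of 'qo' versus 'qok' that a configurable group list lets you test.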
[font=arial, sans-serif]My intention in sharing this is to make experiments that people conduct on the text transparent, repeatable and shareable.[/font]
[font=arial, sans-serif]Regards,[/font]
[font=arial, sans-serif]Robin[/font]