Hi Rene,
Yes, the tool does a very similar job to Elias Schwerdtfeger's Voynich Information Browser. I believe we've followed similar paths to a similar outcome, so I doubt you will find anything new in the tool or the research I've presented. Effectively, it is just a new medium that aligns with common standards in use across the internet nowadays. The key difference is that this API creates a URL-addressable reference to the source data for an analysis.
For example, to get just the transcription by Takahashi, with inline comments removed, you would call:
[link]
Elias has made a great tool and I'm sure that many people have made good use of it. However, one thing that I am hoping to add in terms of functionality is simply to make the data consistently accessible to any web-based presentation of research. Of course, one might go to the VIB and make a query; download the result; then use that in an analysis and some presentation of that analysis - and make the source document available. The alternative with the API is simply to reference a URL, and if anyone wants to know what the source was, one can simply point to that URL. The choice of chunking of word-parts - what I call morphemes - is a big part of the API and offers a significant point of difference from the VIB.
The API also delivers the result as JSON, which is currently one of the dominant standards for online data interchange. Instead of using the format of, e.g.,
<f1r.P1.1;H>
fachys.ykal.ar...
I am delivering the token information as e.g.
[
{"pageId":"f1r","unitCode":"P1","lineNumber":"1","item":"fachys"},
{"pageId":"f1r","unitCode":"P1","lineNumber":"1","item":"ykal"},
{"pageId":"f1r","unitCode":"P1","lineNumber":"1","item":"ar"}
...
]
In order to parse <f1r.P1.1;H>
fachys.ykal.ar.. I will need to pre-process it by removing the <*> locator and then splitting on '.' and so forth. By presenting the data as a JSON array, a JavaScript programmer will easily see how to process it without these preliminary steps. In fact, most common programming languages have libraries for processing JSON data.
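The pre-processing difference can be sketched in a few lines of JavaScript. The parser below is a minimal sketch that assumes the simple locator shape shown above, not the full locator grammar:

```javascript
// A minimal sketch of the pre-processing the legacy format needs.
// It parses a line like "<f1r.P1.1;H> fachys.ykal.ar" into token
// objects; it assumes the simple locator shape shown above, not the
// full locator grammar.
function parseInterlinearLine(line) {
  const m = line.match(/^<([^.]+)\.([^.]+)\.([^;]+);([^>]+)>\s*(.*)$/);
  if (!m) return [];
  const [, pageId, unitCode, lineNumber, , text] = m;
  return text
    .split(".")
    .filter(Boolean)
    .map(item => ({ pageId, unitCode, lineNumber, item }));
}

const legacy = parseInterlinearLine("<f1r.P1.1;H> fachys.ykal.ar");

// The JSON form needs none of that - a single library call suffices:
const json = JSON.parse(
  '[{"pageId":"f1r","unitCode":"P1","lineNumber":"1","item":"fachys"}]'
);
```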
A second thing is that when people present their findings online, the source is static. However, we know that outcomes can often differ because of small and seemingly inconsequential details. For example, I attempted to replicate Stolfi's classic analysis, in which he demonstrates a fit of the word-length distribution to a certain function. In his article he makes some thought-provoking comments about the language structure. But if you choose different assumptions for the experiment, you will find that the fit to the binomial function isn't actually so great. Look, I'm not trying to disparage the experiment - I like that it is objective and he clearly explains his steps. However, the outcome is different if you change the starting conditions - you can see what I am talking about here:
[link]
If you run the plot with the defaults I put in you will get a close fit to the Binom(9, k-1). The defaults are: cfh,ckh,cph,cth,ch,sh. If you check out Stolfi's original analysis, I believe he mentions these and that's why I selected them. I deliberately wanted to get the fit that he found.
Now, please run it again with this selection: cfh,ckh,cph,cth,ch,sh,qo,iin,in,ol,al. The extra 'morphemes' I've added to the list are 'qo', 'iin', 'in', 'ol' and 'al' - a selection of morphemes that are common as prefixes and suffixes. Note that the distribution is no longer such an awesome fit. Small choices in the parsing of the text can yield significantly different results.
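To illustrate the point (this is not the tool's actual algorithm), here is a hedged sketch of how the morpheme list changes a word's counted length, using a simple greedy longest-match chunker - the real API may chunk differently:

```javascript
// Illustrative sketch only: count a word's "length" with each listed
// morpheme treated as a single unit, so the same word has different
// lengths under different morpheme selections. The greedy longest-match
// strategy here is an assumption, not the API's actual chunking rule.
function chunkedLength(word, morphemes) {
  // Try the longest morpheme at each position; anything unmatched
  // counts as a single character.
  const sorted = [...morphemes].sort((a, b) => b.length - a.length);
  let i = 0, length = 0;
  while (i < word.length) {
    const hit = sorted.find(m => word.startsWith(m, i));
    i += hit ? hit.length : 1;
    length += 1;
  }
  return length;
}

const small = ["cfh", "ckh", "cph", "cth", "ch", "sh"];
const large = [...small, "qo", "iin", "in", "ol", "al"];

// "qokchol" is 7 characters; "ch" collapses under both sets, but "qo"
// and "ol" only collapse under the larger set.
chunkedLength("qokchol", small); // 6
chunkedLength("qokchol", large); // 4
```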
We can run it a few more times - try the original set of morphemes (i.e. cfh,ckh,cph,cth,ch,sh) but with Bio-B and Herbal-A. To do this you amend the query strings:
* Bio-B - transcriber=H&isWord=1&hasFiller=0&isAmbiguous=0&illustrationType=B&currierLanguage=B
* Herbal-A - transcriber=H&isWord=1&hasFiller=0&isAmbiguous=0&illustrationType=H&currierLanguage=A
And so on...
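Those two query strings differ only in illustrationType and currierLanguage, so a small helper can generate them. This is a sketch only - the endpoint itself is omitted, and the parameter names are taken from the query strings quoted above:

```javascript
// Sketch: generate the corpus query strings quoted above from a shared
// parameter base. Endpoint URL omitted; parameter names come from the
// query strings in the text.
const base = {
  transcriber: "H", isWord: 1, hasFiller: 0, isAmbiguous: 0,
};

function corpusQuery(illustrationType, currierLanguage) {
  return Object.entries({ ...base, illustrationType, currierLanguage })
    .map(([k, v]) => `${k}=${v}`)
    .join("&");
}

const bioB = corpusQuery("B", "B");     // the Bio-B query string
const herbalA = corpusQuery("H", "A");  // the Herbal-A query string
```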
I think all the work you and others put in a few years ago is important - we need to 'carry the fire' if you take my meaning.
All the best,
Robin
(09-10-2016, 08:16 AM)davidjackson Wrote: Hi Robin,
Two questions: Do you have the list of transcriber codes? And which transcription is the most complete?
Hi David -
The list of transcriber codes is available by this API call:
[link]
Which will give you:
{"values":["H","C","F","N","U","D","X","J","G","V","Z","R","K","Q","L","P","I","T"],"dailyRequestCounter":"23"}
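Pulling the code list out of that response takes a single library call in most languages; for example, in JavaScript:

```javascript
// Parse the response shown above and extract the transcriber codes.
const resp = JSON.parse(
  '{"values":["H","C","F","N","U","D","X","J","G","V","Z","R","K","Q","L","P","I","T"],"dailyRequestCounter":"23"}'
);
const transcribers = resp.values; // 18 single-letter codes
```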
This maps to the list in the interlinear:
# Transcriber codes
# -----------------
#
# [ The following transcriber codes were inherited from INTERLN.EVT: ]
#
# C: Currier's transcription plus new additions from members of the
# voynich list as found in the file voynich.now.
# F: First study group's (Friedman's) transcription including various
# items as found in the file FSG.NEW.
# T: John Tiltman's transcription of some pages.
# L: Don Latham's recent transcription of some pages.
# R: Mike Roe's recent transcription of some pages.
# K: Karl Kluge's transcription of some labels from Petersen's copies.
# J: Jim Reed's transcription of some previously unreadable characters.
#
# [ The following codes were added by J. Stolfi after 05 Nov 1997,
# in the unfolding of "[|]" groups:
#
# D: second choice from [|] in "C" lines.
# G: second choice from [|] in "F" lines, mostly from [1609|16xx].
# I: second choice from [|] in "J" lines.
# Q: second choice from [|] in "K" lines.
# M: second choice from [|] in "L" lines.
#
# The following codes were assigned by J. Stolfi for use in
# "new" transcriptions:
#
# H: Takeshi Takahashi's full transcription (see f0.K).
# N: Gabriel Landini.
# U: Jorge Stolfi.
# V: John Grove.
# P: Father Th. Petersen (a few readings reported by K. Kluge).
# X: Denis V. Mardle.
# Z: Rene Zandbergen.
# ]
Quote: And which transcription is the most complete?
I am not sure - I haven't looked at this in as much detail as I should! The data in the API is simply a parsed version of the interlinear file - 'text16e6.txt'
I use Takahashi's transcription all the time simply because Stolfi refers to it as 'full'.
Stolfi posted a 'majority-vote' file online which I am intending to roll into the tool. I haven't started an analysis of whether it can be included within the same data-set as the 'text16e6.txt' file.
I think this is a very important question.
Thanks,
Robin
(09-10-2016, 09:31 AM)davidjackson Wrote: Hi Robin,
Another question if you will!
Is it possible to directly query for a single word using the Stolfi interlinear locator code?
IE, I want to get the label at
<f57v.X.2>
So that's folio f57v, line X, word 2.
I would assume a call like
[link]
But I can't seem to narrow it down to the word. Removing unitCode gives me the complete list (from which I can then find my query programmatically, of course).
Hi David,
I would use this call (for line number 3):
[link]
Which gives:
Code:
{
  "parameters": ["pageId=f57v", "lineNumber=3", "transcriber=H", "isWord=1"],
  "selectedColumns": ["pageId", "unitCode", "lineNumber", "item"],
  "tokens": [
    { "pageId": "f57v", "unitCode": "X", "lineNumber": "3", "item": "otardaly" },
    { "pageId": "f57v", "unitCode": "Y", "lineNumber": "3", "item": "ocfhor" },
    { "pageId": "f57v", "unitCode": "Y", "lineNumber": "3", "item": "okear" }
  ],
  "dailyRequestCounter": "25"
}
This will give you all tokens on the line - you will have to pick out the 2nd token yourself.
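Picking out the 2nd token client-side is then a short filter. This is a sketch using the response shown above; the 1-based n is my assumption, chosen to match the locator convention:

```javascript
// The response shown above, as a JavaScript object.
const response = {
  tokens: [
    { pageId: "f57v", unitCode: "X", lineNumber: "3", item: "otardaly" },
    { pageId: "f57v", unitCode: "Y", lineNumber: "3", item: "ocfhor" },
    { pageId: "f57v", unitCode: "Y", lineNumber: "3", item: "okear" },
  ],
};

// Select the nth token of a given unit; n is 1-based to match the
// interlinear locator convention.
function nthToken(resp, unitCode, n) {
  return resp.tokens.filter(t => t.unitCode === unitCode)[n - 1];
}

const word2 = nthToken(response, "Y", 2); // the token "okear"
```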
Cheers,
Robin