I put together a rough Python script based on the word boxes from Job's XML files (voynichese.com). The script goes through each word box and splits it into smaller boxes wherever strokes are not connected. Gaps of 5 pixels or less (~0.2 mm at 600 dpi) are ignored.
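For readers who want to try something similar, here is a minimal sketch of the splitting step. It is not the actual script: it assumes each word box has already been reduced to a list of booleans, one per pixel column, saying whether the column contains ink, and it splits on empty runs wider than 5 columns. The names `split_columns`, `gaps_um` and `px_to_um` are made up for illustration.

```python
from typing import List, Tuple

DPI = 600          # scan resolution mentioned in the post
MIN_GAP_PX = 5     # gaps of 5 px or less are ignored


def px_to_um(px: float, dpi: int = DPI) -> int:
    """Convert a pixel distance to micrometers (25400 um per inch)."""
    return round(px * 25400 / dpi)


def split_columns(ink: List[bool],
                  min_gap_px: int = MIN_GAP_PX) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of fragments, end exclusive.

    ink[x] is True when pixel column x contains at least one dark pixel.
    This column-projection approach is an assumption, not the original method.
    """
    fragments = []
    start = None      # first ink column of the current fragment
    last_ink = None   # most recent ink column seen
    for x, has_ink in enumerate(ink):
        if not has_ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 > min_gap_px:
            # The empty run is wide enough: close the fragment, open a new one.
            fragments.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        fragments.append((start, last_ink + 1))
    return fragments


def gaps_um(fragments: List[Tuple[int, int]]) -> List[int]:
    """Horizontal gaps between consecutive fragments, in micrometers."""
    return [px_to_um(b[0] - a[1]) for a, b in zip(fragments, fragments[1:])]
```

At 600 dpi one pixel is about 42 micrometers, so a 7-pixel gap comes out as 296 and a 6-pixel gap as 254.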
I ran the script on Q20. This is the output for the first three lines of f106r. Spaces are given in micrometers (1000 = 1 mm).
pshdar 1355 shoefy 2625 yteedy 2498 sho 296 l 2328 korchy 2709 she 423 ky 1905 otchedy 3810 o 423 kshed 2074 qotedy 1651 qoted 1736 yteeo 381 dy
sh 296 edch 254 y 2625 yt 254 chedy 1947 chees 1439 otshes 2540 o 847 kcho 2244 chdy 2074 qo 254 tee 296 dy 2963 ch 423 e 508 d 762 ch 381 e 296 d 254 y 1990 ch 339 e 423 dy 1228 qota 508 r 931 rod
d 296 sh 296 es 847 l 466 che 339 dy 1609 lkch 296 edy 2921 ytchdy 2074 o 847 r 550 ch 423 eo 550 s
In this image, spaces are coloured according to width in micrometers (<800 green, <1600 blue, <2400 purple; wider spaces red):
[attachment=8299]
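The colour scheme amounts to a simple threshold lookup. A minimal sketch, assuming widths are in micrometers as in the output above (the function name `gap_colour` is made up):

```python
def gap_colour(gap_um: int) -> str:
    """Map a gap width in micrometers to the colour used in the image."""
    if gap_um < 800:
        return "green"
    if gap_um < 1600:
        return "blue"
    if gap_um < 2400:
        return "purple"
    return "red"
```

For example, the gaps 296, 847, 1609 and 2921 from the third line map to green, blue, purple and red respectively.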
Histograms for the distribution of spaces for a few last-first combinations:
[attachment=8300]
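Anyone wanting to rebuild such histograms from the text output could group the gaps by last-first pair roughly as follows. This is only a sketch: it assumes the output alternates fragment, gap, fragment, and it takes the last and first transliteration character of each fragment as the pair key, so "ch" and "sh" are treated as two characters. The function names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_line(line: str) -> Tuple[List[str], List[int]]:
    """Split an output line into fragments and the gaps between them."""
    tokens = line.split()
    fragments = tokens[0::2]                 # even positions: text fragments
    gaps = [int(t) for t in tokens[1::2]]    # odd positions: gaps in micrometers
    return fragments, gaps


def gaps_by_pair(line: str) -> Dict[str, List[int]]:
    """Group gaps by 'last.first' character pair, e.g. 'r.c'."""
    fragments, gaps = parse_line(line)
    pairs = defaultdict(list)
    for left, gap, right in zip(fragments, gaps, fragments[1:]):
        pairs[f"{left[-1]}.{right[0]}"].append(gap)
    return dict(pairs)


line3 = ("d 296 sh 296 es 847 l 466 che 339 dy 1609 lkch 296 edy 2921 "
         "ytchdy 2074 o 847 r 550 ch 423 eo 550 s")
pairs = gaps_by_pair(line3)
print(pairs["r.c"])   # [550]
print(pairs["h.e"])   # [296, 296, 423]
```

Collecting these lists over all lines gives the raw data for one histogram per pair.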
This analysis is obviously very rough. In particular, spaces are measured horizontally between boxes, which can underestimate the actual distance between strokes. For example, l often has a long leftward descender: I trimmed descenders and ascenders slightly, but they certainly still reduce these measurements considerably.
Moreover, I probably made errors I am not aware of, so everything should be taken with caution.
The histogram of all spaces possibly shows two overlapping distributions: one peaking close to 0 and a much smaller one peaking at ~2 mm.
The comparison between r.a (one of Patrick's examples of ambiguous pairs) and r.c could be particularly interesting. The left stroke of a is very similar to the left stroke of c, so the difference in spacing is probably due to something deeper than stroke shape. In the case of r.a, there is a drift away from the usual close-to-0 distance. For r.c, it is clearer that the space tends to be close to 1.5 mm.
EDIT: the last word of the image above shows one of the problems with my script. Words are broken into unconnected fragments, and parts of the word are assigned to each fragment based on the position of the characters in the word. I made no attempt at OCR here. The parts of cheos were labelled ch-eo-s instead of the correct che-o-s. My impression is that these errors are neither terribly frequent nor very systematic, so I expect that the histogram shapes are significant.
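To illustrate how a position-based assignment can go wrong, here is a minimal sketch. It is not the actual script: it assumes every character takes an equal share of the word box width and goes to the fragment closest to its centre. The function names and the pixel coordinates in the example are invented for illustration.

```python
from typing import List, Tuple


def _distance(centre: float, span: Tuple[float, float]) -> float:
    """Horizontal distance from a point to a span (0 if inside it)."""
    lo, hi = span
    if centre < lo:
        return lo - centre
    if centre > hi:
        return centre - hi
    return 0.0


def assign_chars(word: str,
                 box: Tuple[float, float],
                 fragments: List[Tuple[float, float]]) -> List[str]:
    """Assign each character of `word` to the nearest fragment.

    Assumes equal-width characters across the word box, which real
    glyphs do not satisfy; that is exactly where mislabelling comes from.
    """
    x0, x1 = box
    step = (x1 - x0) / len(word)
    parts = [""] * len(fragments)
    for i, ch in enumerate(word):
        centre = x0 + (i + 0.5) * step
        idx = min(range(len(fragments)),
                  key=lambda k: _distance(centre, fragments[k]))
        parts[idx] += ch
    return parts


# Invented coordinates: the true fragments are che / o / s, but the
# first fragment is narrower than three equal-width characters.
print(assign_chars("cheos", (0, 100), [(0, 44), (50, 70), (80, 100)]))
# ['ch', 'eo', 's']
```

With these made-up numbers the centre of e falls inside the second fragment, which reproduces the ch-eo-s kind of error described above.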