The Voynich Ninja

Full Version: Using AI to do research - Specifically online searching
(02-05-2026, 03:29 PM)DG97EEB Wrote:
(02-05-2026, 03:16 PM)Mark Knowles Wrote:
(02-05-2026, 02:42 PM)DG97EEB Wrote:
(02-05-2026, 02:16 PM)Mark Knowles Wrote: It would be nice if the AIs could do some kind of real-time OCR, so that typed (or even handwritten) documents that haven't been digitised could be read and analysed.

It can do that fairly well now. Gemini is the best model, but really only one page at a time.

One page at a time is really a problem.

I didn't explain myself clearly. I mean OCR as part of its search, or maybe Google could automatically OCR all the documents it finds. I don't know how far we are now from that being computationally feasible.

There are quite a lot of inventories that have been typed and scanned but not digitised. They take a lot of effort to read manually, and being able to search inside them would be really helpful. (Obviously, being able to read handwritten documents as well would be amazing.)

My view is that the first person to build an algorithm and a robot to scan manuscripts at volume will make some serious money... There's OCR for typed documents and HTR (Handwritten Text Recognition) for handwritten ones. Transkribus is the gold standard, but the frontier models are catching up fast... But there are tens of thousands of manuscripts that no one has even looked at...

Well, obviously handwriting recognition is that much harder, especially when extended to older, less legible documents and unusual scripts. I am just talking about typed documents that are already scanned and available online, just not yet digitised. Obviously, this wouldn't solve the problem of all the documents that have not been scanned and uploaded to the internet.
Transkribus is very good at this, but unfortunately it costs money...

I’d be careful with Gemini, because it tends to “hallucinate” when transcribing text. It’s very good at recognizing words that other AI systems can’t, but on the other hand, this also means it recognizes many words that aren’t correct. Gemini wants to please.

[link hidden]

(I’ve already had two entire books transcribed, and it worked really well.)