Briana Giasullo's workflow for improving OCR of BHL handwritten texts

Ambrosia10 · March 24, 2026, 7:57pm

A presentation I’ve been sharing frequently is one by Briana Giasullo, a Cataloging and Digital Resources Librarian at the Academy of Natural Sciences, on her solution to the problem of handwritten texts and field notes in BHL having pretty much useless OCR. Briana explains how she used Amazon Textract to generate plain text files, then used Zooniverse to find volunteers to check the transcriptions Textract had generated. She then uploaded the corrected text files to BHL, replacing the useless OCR. BHL could then extract species names, making the handwritten text more findable. https://www.youtube.com/watch?v=PXQDWqoB8Xg&t=229s
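For anyone wanting to try the first step, here is a minimal sketch of getting plain text out of Amazon Textract for one page image. It assumes the `boto3` package and working AWS credentials, and it uses Textract's synchronous `detect_document_text` call; the presentation may well have used a different Textract API or settings. The function names are made up for illustration.

```python
def lines_from_textract(response):
    """Return the text of each LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def transcribe_page(image_bytes):
    """Send one page image (JPEG or PNG bytes) to Textract and return plain text.

    Assumption: AWS credentials and a default region are already configured.
    """
    import boto3  # imported here so the parsing helper works without AWS set up

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return "\n".join(lines_from_textract(response))
```

The output is only a first draft: in the workflow described above, volunteers still check every transcription before anything is uploaded.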

rdmpage · March 24, 2026, 9:49pm

Interesting example. This raises the question of who gets to upload corrected OCR text. Is there a general mechanism for this, or do you have to be a BHL member? What happens if a user spots errors? For example, I think there are several mistakes in the transcription of https://www.biodiversitylibrary.org/page/59782282 (image below). How do I fix those? From my perspective, “agency” is a big issue with the BHL platform. There is no obvious way for people to contribute to improving the content.

bhl_nicole · March 25, 2026, 12:17am

Yes, we can upload corrected OCR into BHL. The workflow is the same as for uploading transcriptions. At present this is only possible via the BHL Dashboard, which you need a login for (and training).

However, as with adding article data to BHL, the time-consuming part is gathering, checking, and formatting the content/data for upload. Or it was. Most of this work can now be done by AI.

The only data required by BHL for both transcriptions and corrected OCR is: PageID, SequenceNumber, and Text. Perhaps, like @rdmpage did for article data, we can explore other pathways? A BioStor for transcriptions and corrected OCR?

PageID	SequenceNumber	Text
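As a rough illustration of how little is needed, here is a sketch that builds a tab-separated file with those three columns from a list of corrected pages. Several things here are assumptions rather than BHL requirements: that the upload is a tab-delimited text file, that SequenceNumber is the 1-based position of the page within the item, and that multi-line text may be quoted CSV-style. The function names are hypothetical, so check the expected format with BHL before uploading anything.

```python
import csv
import io

FIELDS = ["PageID", "SequenceNumber", "Text"]


def build_upload_rows(pages):
    """Turn (page_id, text) pairs, given in reading order, into row dicts.

    Assumption: SequenceNumber is the 1-based position of the page in the item.
    """
    return [
        {"PageID": page_id, "SequenceNumber": position, "Text": text}
        for position, (page_id, text) in enumerate(pages, start=1)
    ]


def rows_to_tsv(rows):
    """Serialise rows as tab-separated text with a header line.

    Text containing tabs, quotes, or line breaks is quoted by the csv module.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Usage would look something like `rows_to_tsv(build_upload_rows([(59782282, corrected_text)]))`, where `corrected_text` is the checked transcription for that page.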
