What the TAXACOM community wants from BHL

rdmpage · April 17, 2026, 2:30pm

I asked the TAXACOM mailing list for feedback on BHL. After extracting the responses I asked ChatGPT to summarise them; I have edited that summary and present it below. Nothing here will surprise seasoned BHL users: the difficulties searching, the taxonomic name finder misinterpreting scientific names (e.g., deciding that “scutellum” is always a scientific name as opposed to a key piece of insect anatomy, resulting in 704 pages of results mostly not about Scutellum), the gaps in coverage, and the inability of users to flag OCR errors or bad taxonomic names.

Summary of TAXACOM responses:

Expanding and Completing the Content

A consistent message was the need to extend and complete BHL’s coverage.

Users highlighted:

Keeping pace with the “moving wall” of public domain content
Filling gaps in journal runs, including missing volumes
Prioritising rare or at-risk materials held in obscure locations
Expanding ephemeral publications such as nursery catalogues
Including important but currently undigitised journals, particularly from underrepresented regions

In short: completeness matters as much as scale.

Strengthening Integration with Biodiversity Data

Many contributors see BHL not just as a library, but as infrastructure within a wider data ecosystem.

Key suggestions included:

Integrating content directly rather than linking to external repositories
Enabling better linking from taxonomic databases (e.g. to original species descriptions)
Improving alignment between bibliographic metadata and how scientists actually cite literature

The goal is a more connected system where literature, names, and data work seamlessly together.

Making Content Easier to Find

Despite the breadth of content, finding specific items remains a challenge.

Common issues:

Difficulty locating known works due to inconsistent metadata
Variations in titles, publication dates, and author formats
Limited advanced search functionality (e.g. AND, NOT, filtering within results)

For many users, discovery—not access—is the main barrier.

Improving OCR and Name Recognition

Automated text extraction and name indexing are powerful features, but users report significant quality issues.

These include:

Poor OCR accuracy, especially in older texts or plates
Problems recognising special characters (e.g. ligatures like æ/œ)
Scientific name detection that:
- Misses real names
- Flags non-names as taxa
Challenges handling historical formatting conventions

These issues create noise and reduce confidence in automated tools.
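
As an illustration of the special-character point, here is a minimal Python sketch of folding ligatures and diacritics into a plain search key, so that a query for "Haematopus" could match text containing "Hæmatopus". This is not how BHL handles names; the `search_key` function and the ligature table are assumptions made for the example, and the table is not exhaustive.

```python
import unicodedata

# Ligatures that Unicode normalisation leaves untouched (illustrative, not exhaustive).
LIGATURES = str.maketrans({"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"})


def search_key(name: str) -> str:
    """Return a folded form of a name for matching only, never for display."""
    expanded = name.translate(LIGATURES)
    # NFKD splits accented letters into base letter + combining mark,
    # and expands compatibility ligatures such as "ﬁ" into "fi".
    decomposed = unicodedata.normalize("NFKD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    for variant in ("Hæmatopus", "Haematopus", "HAEMATOPUS"):
        print(variant, "->", search_key(variant))
```

The original spelling would still be stored and shown; only the index and the query would pass through the same folding.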

Enabling Community Contributions

Several contributors suggested that users themselves could help improve BHL.

Ideas included:

Allowing users to flag errors in OCR or metadata
Introducing controlled editing by trusted contributors
Enabling users to correct or remove incorrect taxonomic name tagging

A hybrid model combining automation with expert input could significantly improve data quality.

Keeping the Interface Stable (While Improving It)

There was strong support for maintaining the current interface and structure.

At the same time, users suggested targeted improvements:

Preserve deep links and avoid disruptive redesigns
Add a full-screen reading mode, especially for smaller screens

Stability is seen as a major strength—changes should be incremental, not disruptive.

Pigsonthewing · April 17, 2026, 2:39pm

Thanks for your efforts on this, Rod.

On these points especially I strongly recommend having a look at how Trove, the Australian National Library system, allows users to edit and correct OCR. Its interface is commendably intuitive and the barrier to entry consequently low.

rdmpage · April 17, 2026, 2:46pm

Trove often comes up as an example of an editing interface. I quite like Hypothesis:

It has a nice user interface, handles web pages and PDFs, and has an API so annotations are easy to harvest.
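
A minimal sketch of what harvesting could look like, assuming the public Hypothesis search endpoint (`/api/search` with a `uri` parameter) and a response made of `rows`, each with `target` selectors. The helper names `search_url`, `quoted_corrections` and `harvest` are hypothetical, and treating the annotation body as a suggested correction is my assumption, not a Hypothesis convention.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.hypothes.is/api/search"


def search_url(page_url: str, limit: int = 50) -> str:
    """Build a search request for public annotations on one page."""
    return f"{API}?{urlencode({'uri': page_url, 'limit': limit})}"


def quoted_corrections(payload: dict) -> list[tuple[str, str]]:
    """Pair the text each annotation quotes with the annotator's note."""
    pairs = []
    for row in payload.get("rows", []):
        for target in row.get("target", []):
            for selector in target.get("selector", []):
                if selector.get("type") == "TextQuoteSelector":
                    pairs.append((selector.get("exact", ""), row.get("text", "")))
    return pairs


def harvest(page_url: str) -> list[tuple[str, str]]:
    """Fetch and parse annotations for a page (performs a network request)."""
    with urlopen(search_url(page_url)) as response:
        return quoted_corrections(json.load(response))
```

Calling `harvest` with the URL of an annotated page would return (quoted text, note) pairs that could be reviewed before anything is changed in the OCR.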

Ambrosia10 · April 17, 2026, 7:52pm

+1 from me on this. Trove’s example is not only fabulous but is also tremendously successful. https://trove.nla.gov.au/

Ambrosia10 · April 17, 2026, 7:54pm

Thanks for this summary, Rod. You are correct that none of this comes as a surprise to folk like me (who use BHL all the time), but it is a good summary of the wants/needs of the wider biodiversity community.

RichardLitt · April 18, 2026, 11:56pm

I am curious about BHL’s use of OCR for scientific names. Is the tooling behind this open source?

Ambrosia10 · April 19, 2026, 2:43am

Hi @RichardLitt, information on the tools/methods that BHL uses to extract scientific names from OCR can be found here: https://about.biodiversitylibrary.org/ufaqs/how-are-scientific-names-identified-throughout-the-bhl-corpus/ This includes gnfinder, which has a GitHub repository. Hope this info goes some way to answering your question.

rdmpage · April 19, 2026, 9:17am

@RichardLitt As @Ambrosia10 says the code is open source, and if you are running a Mac it’s available through Homebrew.

gnfinder is very fast, and relies on a combination of what text strings look like, and whether they match a dictionary list of taxonomic names and substrings of taxonomic names. It is very good at finding things which could be names. Based on Claude Code’s reading of the source, it doesn’t take into account the context of the string, which I suspect is one reason why it produces lots of false hits for “scutellum” and other words that are both taxonomic names and also place names, anatomical terms, etc.
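
To make the context problem concrete, here is a toy Python sketch. It is not gnfinder's code or algorithm: `KNOWN_NAMES`, `ANATOMY_CUES` and both functions are made up for illustration. The first function flags any token that appears in a dictionary of names; the second adds a crude context check (capitalisation and the preceding word), which is the kind of signal a context-blind lookup lacks.

```python
import re

# Toy dictionary; a real finder would use a very large list of known names.
KNOWN_NAMES = {"scutellum", "musca"}
# Hypothetical cue words hinting that the next word is ordinary vocabulary.
ANATOMY_CUES = {"the", "its", "a"}

WORD = re.compile(r"[A-Za-z]+")


def dictionary_hits(text: str) -> list[str]:
    """Flag every token found in the dictionary, ignoring context."""
    return [w for w in WORD.findall(text) if w.lower() in KNOWN_NAMES]


def context_aware_hits(text: str) -> list[str]:
    """Keep a hit only if it is capitalised and not preceded by a cue word."""
    tokens = WORD.findall(text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() not in KNOWN_NAMES:
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        if token[0].isupper() and previous not in ANATOMY_CUES:
            hits.append(token)
    return hits


if __name__ == "__main__":
    text = "The scutellum is black. The genus Scutellum is indexed here."
    print(dictionary_hits(text))     # both occurrences are flagged
    print(context_aware_hits(text))  # only the capitalised genus use survives
```

A rule this simple would still fail on descriptions that begin a sentence with "Scutellum", so it only shows the shape of the problem, not a fix.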

There is a paper on gnfinder here:

bhlbianca · April 22, 2026, 4:43pm

I know it’s not a simple “flag” but we do have a “Report an Error” button on BHL. Please feel free to use.

rdmpage · April 22, 2026, 4:49pm

The problem with “Report an Error” is that the report disappears into BHL: the user has no idea whether somebody has already flagged the same error, and is powerless to fix it. “Greatness lies in the agency of others”; it would be nice if BHL enabled people to flag errors and/or suggest fixes.

Pigsonthewing · April 22, 2026, 6:14pm

we do have a “Report an Error” button on BHL

I don’t see that; where is it, please?

Pigsonthewing · April 22, 2026, 6:34pm

OK, I see it now, but only because I searched the page’s HTML source. CTRL-F for “report” or “error” on the page found nothing.

It is a tiny yellow warning triangle, so small on my screen that it just looks yellow, with the exclamation mark in its centre indistinguishable.

What’s more, if I increase the zoom in my browser by two clicks above the default (which I often have to do), it is sent to the right, out of the viewport, with no horizontal scroll bar appearing.

I’m a frequent BHL user, and if I didn’t know that was there, I would bet good money that a significant number of other users do not either.
