What the TAXACOM community wants from BHL

I asked the TAXACOM mailing list for feedback on BHL. After extracting the responses I asked ChatGPT to summarise them; I have edited that summary and present it below. Nothing here will surprise seasoned BHL users: the difficulties searching, the taxonomic name finder misinterpreting scientific names (e.g., deciding that “scutellum” is always a scientific name as opposed to a key piece of insect anatomy, resulting in 704 pages of results mostly not about Scutellum), the gaps in coverage, and the inability of users to flag OCR errors or bad taxonomic names.

Summary of TAXACOM responses:

Expanding and Completing the Content

A consistent message was the need to extend and complete BHL’s coverage.

Users highlighted:

  • Keeping pace with the “moving wall” of public domain content
  • Filling gaps in journal runs, including missing volumes
  • Prioritising rare or at-risk materials held in obscure locations
  • Expanding ephemeral publications such as nursery catalogues
  • Including important but currently undigitised journals, particularly from underrepresented regions

In short: completeness matters as much as scale.


Strengthening Integration with Biodiversity Data

Many contributors see BHL not just as a library, but as infrastructure within a wider data ecosystem.

Key suggestions included:

  • Integrating content directly rather than linking to external repositories
  • Enabling better linking from taxonomic databases (e.g. to original species descriptions)
  • Improving alignment between bibliographic metadata and how scientists actually cite literature

The goal is a more connected system where literature, names, and data work seamlessly together.


Making Content Easier to Find

Despite the breadth of content, finding specific items remains a challenge.

Common issues:

  • Difficulty locating known works due to inconsistent metadata
  • Variations in titles, publication dates, and author formats
  • Limited advanced search functionality (e.g. AND, NOT, filtering within results)

For many users, discovery—not access—is the main barrier.


Improving OCR and Name Recognition

Automated text extraction and name indexing are powerful features, but users report significant quality issues.

These include:

  • Poor OCR accuracy, especially in older texts or plates
  • Problems recognising special characters (e.g. ligatures like æ/œ)
  • Scientific name detection that:
    • Misses real names
    • Flags non-names as taxa
  • Challenges handling historical formatting conventions

These issues create noise and reduce confidence in automated tools.


Enabling Community Contributions

Several contributors suggested that users themselves could help improve BHL.

Ideas included:

  • Allowing users to flag errors in OCR or metadata
  • Introducing controlled editing by trusted contributors
  • Enabling users to correct or remove incorrect taxonomic name tagging

A hybrid model combining automation with expert input could significantly improve data quality.


Keeping the Interface Stable (While Improving It)

There was strong support for maintaining the current interface and structure.

At the same time, users suggested targeted improvements:

  • Preserve deep links and avoid disruptive redesigns
  • Add a full-screen reading mode, especially for smaller screens

Stability is seen as a major strength—changes should be incremental, not disruptive.


Thanks for your efforts on this, Rod.

On these points especially I strongly recommend having a look at how Trove, the National Library of Australia’s system, allows users to edit and correct OCR. Its interface is commendably intuitive and the barrier to entry consequently low.


Trove often comes up as an example of an editing interface. I quite like Hypothesis:

It has a nice user interface, handles web pages and PDFs, and has an API so annotations are easy to harvest.
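As a rough illustration of what "easy to harvest" could look like, here is a minimal Python sketch. It assumes the public Hypothesis search endpoint (https://api.hypothes.is/api/search) and a response shape with a "rows" list, as I understand them from the Hypothesis API documentation. The function names and the BHL page URL in the usage note are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the public Hypothesis annotation search API.
API_SEARCH = "https://api.hypothes.is/api/search"


def build_search_url(page_url, limit=20):
    """Build a search URL for public annotations on one web page."""
    return API_SEARCH + "?" + urlencode({"uri": page_url, "limit": limit})


def extract_notes(response):
    """Reduce a search response (already parsed from JSON) to simple records.

    Assumes each row may carry a TextQuoteSelector holding the highlighted
    text, plus a free-text comment in "text".
    """
    notes = []
    for row in response.get("rows", []):
        quote = ""
        for target in row.get("target", []):
            for selector in target.get("selector", []):
                if selector.get("type") == "TextQuoteSelector":
                    quote = selector.get("exact", "")
        notes.append({
            "user": row.get("user", ""),
            "quote": quote,
            "comment": row.get("text", ""),
        })
    return notes


def harvest(page_url, limit=20):
    """Fetch and simplify annotations for a page (needs network access)."""
    with urlopen(build_search_url(page_url, limit)) as resp:
        return extract_notes(json.load(resp))
```

Calling harvest() on the URL of an annotated page (for example a hypothetical https://www.biodiversitylibrary.org/page/1) would return one record per annotation, with the highlighted text and the annotator's comment, which is the kind of thing a "flag this OCR error" workflow could build on.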


+1 from me on this. Trove’s example is not only fabulous but is also tremendously successful. https://trove.nla.gov.au/


Thanks for this summary, Rod. You are correct that none of this comes as a surprise to folk like me (who use BHL all the time), but it is a good summary of the wants/needs of the wider biodiversity community.


I am curious about BHL’s use of OCR for scientific names. Is the tooling behind this open source?

Hi @RichardLitt, information on the tools/methods that BHL uses to extract scientific names from OCR can be found here: https://about.biodiversitylibrary.org/ufaqs/how-are-scientific-names-identified-throughout-the-bhl-corpus/ This includes gnfinder, which has a GitHub repository. Hope this info goes some way to answering your question.


@RichardLitt As @Ambrosia10 says, the code is open source, and if you are running a Mac it’s available through Homebrew.

gnfinder is very fast, and relies on a combination of what text strings look like, and whether they match a dictionary list of taxonomic names and substrings of taxonomic names. It is very good at finding things which could be names. Based on Claude Code’s reading of the source, it doesn’t take into account the context of the string, which I suspect is one reason why it produces lots of false hits for “scutellum” and other words that are both taxonomic names and also place names, anatomical terms, etc.
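To make the "no context" point concrete, here is a toy Python sketch. It is not gnfinder's code or algorithm, just a minimal context-free finder that combines "looks like a name" (a capitalised word) with a dictionary lookup. The tiny dictionary and the function name are invented for the example.

```python
import re

# Toy stand-in for a dictionary of known genus names (an assumption for
# this example; a real dictionary holds millions of names).
KNOWN_GENERA = {"Scutellum", "Apis", "Bombus"}

# "Looks like a name": a capitalised word followed by lowercase letters.
CAPITALISED = re.compile(r"\b[A-Z][a-z]{2,}\b")


def find_candidate_names(text):
    """Return capitalised words that match the dictionary.

    The surrounding words are never inspected, so an anatomical term at
    the start of a sentence is indistinguishable from a genus name.
    """
    return [m.group(0) for m in CAPITALISED.finditer(text)
            if m.group(0) in KNOWN_GENERA]


description = "Scutellum black, finely punctate. Apis mellifera visits the flowers."
print(find_candidate_names(description))  # ['Scutellum', 'Apis']
```

In the sample sentence "Scutellum" is plainly anatomy, but a lookup that ignores context cannot tell. Separating the two cases would need some signal from the surrounding text, such as whether the word is followed by a species epithet or by descriptive terms like "black".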

There is a paper on gnfinder here:


I know it’s not a simple “flag” but we do have a “Report an Error” button on BHL. Please feel free to use.

The problem with “Report an Error” is that the report disappears into BHL: the user has no idea whether somebody has already flagged the same error, and is powerless to fix it. “Greatness lies in the agency of others”; it would be nice if BHL enabled people to flag errors and/or suggest fixes.


we do have a “Report an Error” button on BHL

I don’t see that; where is it, please?

OK, I see it now, but only because I searched the page’s HTML source. CTRL-F for “report” or “error” on the page found nothing.

It is a tiny yellow warning triangle - so small on my screen that it just looks yellow, with the exclamation mark at its centre indistinguishable.

What’s more, if I increase the zoom in my browser by two clicks above the default (which I often have to do), it is sent to the right, out of the viewport, with no horizontal scroll bar appearing.

I’m a frequent BHL user, and if I didn’t know it was there, I would bet good money that a significant number of other users don’t either.
