Newcastle City Library holds tens of thousands of digitised photographs, maps and documents, and somewhere inside that collection an unknown number of files are duplicates — identical or near-identical images catalogued separately, clogging search results and consuming server storage that costs real money to maintain. The library, based on Laman Street in the city centre, has been working through a remediation project that staff began in earnest in late 2024, according to public documentation tabled at Hunter Joint Organisation meetings. It is unglamorous, painstaking work. It is also increasingly urgent.
The pressure to sort this out has sharpened in 2026 for a simple reason: artificial intelligence indexing tools have arrived in regional library systems and they surface duplicate content in ways that older catalogue software never did. Institutions that leave the problem unaddressed now risk polluting AI-assisted search results for researchers, journalists and school students alike. The University of Newcastle's Cultural Collections unit, housed at the Auchmuty Library on the Callaghan campus, flagged this specific risk in a working paper circulated to NSW university library networks in March 2026, noting that unchecked duplication undermines the reliability of automated metadata generation.
What Newcastle Is Doing — and What It Is Not
Newcastle's approach so far has been manual-first. Archivists at both Newcastle City Library and the University of Newcastle have combined perceptual hashing software, which assigns each image a numeric fingerprint so that visually similar images receive similar fingerprints, with human review to flag suspected duplicates before any deletion occurs. No file is removed without a second opinion. That conservative method has a cost: the University of Newcastle's Cultural Collections team estimated in the same March 2026 working paper that a fully manual review of a 50,000-image archive takes roughly 1,400 staff hours at current resourcing levels, or about 1.7 minutes per image.
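For readers curious about what a perceptual hash actually does, the sketch below shows one common variant, the "average hash", in plain Python. It is an illustration only: the article's sources do not name the software the Newcastle institutions use, and the function names, the catalogue identifiers and the distance threshold of 5 are all assumptions made for this example. It also assumes each image has already been reduced to an 8x8 grid of greyscale values, a step real tools handle with an imaging library.

```python
from itertools import combinations


def average_hash(pixels):
    """Return a 64-bit fingerprint for an 8x8 grid of greyscale values (0-255).

    Each bit records whether a pixel is brighter than the grid's mean,
    so uniform changes in brightness leave the fingerprint unchanged.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def flag_suspected_duplicates(hashes, max_distance=5):
    """Return (id_a, id_b, distance) for pairs close enough to warrant review.

    `hashes` maps a hypothetical catalogue ID to its fingerprint.
    `max_distance` is an assumed threshold, not a recommended value.
    Nothing is deleted here; the output is a list for a human to check.
    """
    flagged = []
    for (id_a, hash_a), (id_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming_distance(hash_a, hash_b)
        if distance <= max_distance:
            flagged.append((id_a, id_b, distance))
    return flagged


# Toy data: a gradient, a uniformly brighter copy of it, and its inverse.
base = [[row * 8 + col for col in range(8)] for row in range(8)]
brighter = [[value + 10 for value in row] for row in base]
inverted = [[63 - value for value in row] for row in base]

hashes = {
    "A1": average_hash(base),
    "B2": average_hash(brighter),
    "C3": average_hash(inverted),
}
review_queue = flag_suspected_duplicates(hashes)
```

In this toy run the brighter copy is flagged as a suspected duplicate of the original while the inverted image is not. Comparing every pair becomes slow for tens of thousands of images, and production tools use smarter indexing, but the principle is the same: the software proposes pairs and people decide.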
Compare that to Rotterdam, where the Gemeentearchief Rotterdam completed a fully automated duplicate purge of its 220,000-image digitised collection in 2023 using open-source deduplication pipelines developed at Delft University of Technology. The Dutch institution removed around 18,000 files in eight weeks with minimal human intervention. The result was faster, cheaper and, by Rotterdam's own published account, highly accurate. But critics inside the Dutch heritage sector noted that roughly 60 contextually distinct photographs, about 0.3 per cent of the files removed, were incorrectly flagged and deleted before the error was caught.
The Valentine museum in Richmond, Virginia, took a middle path. Its 2024 digital remediation project, publicly documented in the American Alliance of Museums journal that October, paired automated flagging with a crowd-sourced verification step, inviting registered community members to review flagged pairs online before archivists made the final call. Participation was modest, with around 340 verified community contributors over six months, but the model drew genuine interest from Australian heritage bodies, including a delegation from the State Library of NSW that visited Richmond in February 2025.
The Local Stakes Are Higher Than They Look
Newcastle's digital heritage collections carry material that exists nowhere else: pre-federation photographs of the BHP steelworks site at Mayfield, flood survey maps of Throsby Creek dating to the 1950s, oral history recordings from Hunter coalfields communities that have since been resettled. Losing even a handful of those files to an incorrect automated deletion would be irreversible. That risk calculus explains why local institutions have been slower to automate than their European counterparts.
City of Newcastle's draft Digital Preservation Policy, released for public comment in April 2026, explicitly states that no deletion workflow for heritage-grade assets will proceed without dual-staff sign-off regardless of the automation tool in use. The policy is expected to be formally adopted before the end of the 2026 calendar year.
For community members and researchers who rely on these archives, the practical upshot is straightforward. If you are searching Newcastle City Library's online catalogue and encounter what appears to be a duplicated entry — same image, different catalogue numbers — the library's local studies desk on Laman Street takes direct reports via its online feedback form. Those reports are being actively reviewed and folded into the ongoing remediation queue. The Auchmuty Library's Cultural Collections unit accepts similar notifications through the University of Newcastle's library portal. Both institutions have said publicly that community reporting has already helped surface errors that automated tools missed.