The Daily Newcastle

Newcastle news, every day

The numbers problem hiding inside Newcastle's digital archives: duplicate images are costing institutions thousands

Libraries, councils and universities across the Hunter are sitting on bloated digital collections riddled with duplicate files — and the bill for fixing it is climbing.

By Newcastle News Desk · 5 July 2026 at 5:23 am

Hunter region institutions collectively hold more than 4.2 million digital image files across their public-facing archives, and conservative internal estimates suggest at least 18 percent of those files are duplicates. That figure, drawn from a University of Newcastle digital collections audit completed in March 2026, points to a problem that has been quietly draining storage budgets and degrading search quality for years.

The timing matters. Newcastle City Council's Libraries and Cultural Infrastructure division is midway through a three-year digitisation push, with $1.4 million allocated under the 2024–2027 Digital Preservation Strategy to bring the Newcastle Region Library's physical photographic collection online. The project was always going to surface duplicates. What nobody budgeted for was quite how many.

What the data actually shows

The University of Newcastle's library team ran a deduplication pilot across 340,000 image files in its Hunter Living Histories collection between January and March this year. The software flagged 61,200 files as exact or near-exact matches — images scanned twice from the same physical source, or uploaded in multiple formats without a centralised record-keeping system catching the overlap. Storage costs for that redundant data alone were running at approximately $8,400 per year on the university's cloud infrastructure contract.

Newcastle Region Library faces a comparable challenge. Its Tyrrell photographic collection, more than 50,000 glass plate negatives and prints donated by William Tyrrell's estate and held at the Laman Street branch, has been partially digitised under two separate grant programs: the NSW State Library's Digitisation Grant Scheme in 2019, and a separate Local Heritage Fund project in 2022. Staff identified at least 3,700 image duplicates when the two project outputs were merged onto a single server in late 2025. That is roughly 7.4 percent measured against 50,000 items; because the collection is only partly digitised, the share of the digitised files would be higher. Each file averages 48 megabytes in high-resolution TIFF format.

At that file size, 3,700 duplicates translate to approximately 178 gigabytes of redundant data: not enormous by commercial standards, but significant for a public institution paying retail cloud storage rates. The library's current contract with the NSW Government's GovDC data centre arrangement prices storage at around $0.023 per gigabyte per month. That works out to just over $4 a month, or about $49 a year, for the Tyrrell duplicates alone. The sum is modest in isolation, but across a full collection with an 18 percent duplication rate, the annual waste figure scales quickly.
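The storage arithmetic can be checked from the quoted figures alone. The short sketch below is illustrative rather than anything the library has published; it assumes decimal units (1 gigabyte = 1,000 megabytes) and a flat per-gigabyte price with no tiering or retrieval fees.

```python
# Figures quoted in the article for the Tyrrell duplicates.
DUPLICATE_FILES = 3_700
AVG_FILE_MB = 48
PRICE_PER_GB_MONTH = 0.023  # dollars per gigabyte per month

# Assumption: decimal units, 1 GB = 1,000 MB.
redundant_gb = DUPLICATE_FILES * AVG_FILE_MB / 1_000
monthly_cost = redundant_gb * PRICE_PER_GB_MONTH
annual_cost = monthly_cost * 12

print(f"{redundant_gb:.1f} GB of redundant data")
print(f"${monthly_cost:.2f} per month, ${annual_cost:.2f} per year")
```

On those assumptions the duplicates occupy 177.6 gigabytes and cost a little over $4 a month to store.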

The fix, and what it requires

Replacing or removing duplicate images is not simply a matter of deleting files. Cultural institutions follow the OAIS — Open Archival Information System — reference model, which requires that any change to a digital object be logged, justified and reversible. That means each duplicate replacement generates its own administrative record. The University of Newcastle estimates the labour cost of properly processing a single duplicate image at between $4.20 and $7.80 depending on complexity, factoring in cataloguing, provenance checking and metadata reconciliation.

For the Hunter Living Histories collection, processing all 61,200 flagged duplicates at those per-image rates would run between roughly $257,000 and $477,000, a figure that dwarfs the storage savings and explains why institutions have historically let the problem accumulate rather than confronted it systematically.
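The gap between clean-up cost and storage savings is easy to reproduce. The sketch below uses only figures quoted in this article; the payback calculation is an added illustration, not a university estimate, and it assumes storage prices stay flat and that deduplication brings no benefit beyond the storage saved.

```python
# Figures quoted in the article for the Hunter Living Histories pilot.
FLAGGED_DUPLICATES = 61_200
COST_LOW, COST_HIGH = 4.20, 7.80   # dollars of labour per duplicate
STORAGE_SAVING_PER_YEAR = 8_400    # dollars of redundant storage per year

low_total = FLAGGED_DUPLICATES * COST_LOW
high_total = FLAGGED_DUPLICATES * COST_HIGH

# Illustrative assumption: the only return on the work is the storage saved.
payback_low_years = low_total / STORAGE_SAVING_PER_YEAR
payback_high_years = high_total / STORAGE_SAVING_PER_YEAR

print(f"Labour cost: ${low_total:,.0f} to ${high_total:,.0f}")
print(f"Payback from storage alone: {payback_low_years:.0f} to {payback_high_years:.0f} years")
```

Under those simplifying assumptions, storage savings alone would take decades to repay the labour bill, which is consistent with institutions deferring the work.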

The practical path forward being discussed among Hunter region archives managers involves tiered deduplication: automated removal of byte-for-byte identical files first, followed by human review of near-matches. Software tools including Brainware and open-source options like dupeGuru are being evaluated. Newcastle City Council's Digital Preservation working group, which includes representatives from the University of Newcastle, Hunter Water's corporate records team and the Newcastle Art Gallery on Laman Street, is expected to release a shared-approach framework by October 2026.
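For readers curious how the first tier works in principle, the sketch below shows one common way to find byte-for-byte identical files: group files by size, then compare SHA-256 checksums within each group. It is not the working group's method or the behaviour of any tool named above; the function names `sha256_of` and `find_exact_duplicates` are hypothetical, and the code only reports matches without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFFs are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are identical."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_exact_duplicates(Path("scans"))` on a hypothetical `scans` folder would return one list per set of identical files. Near-matches, such as the same photograph saved in two formats, have different bytes and would not be caught here, which is why the second tier relies on human review.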

For members of the public using Hunter Living Histories or the council's online image portal to research local history, the immediate practical advice is straightforward: if a search returns what looks like the same photograph twice, report it using the feedback form on each record page. Those flags feed directly into the priority queue for manual review. Every submission genuinely shortens the backlog.

About this article

Published by The Daily Newcastle

This article was produced by The Daily Newcastle editorial desk and covers news in Newcastle. See our editorial standards for how we use AI.
