This is a guest blog post from the Decentralised Public Library, an Arweave initiative.
Journalism is a crucial tool in any democracy — the work you do is vital in holding powerful individuals, corporations, and governments accountable for their actions. To do this effectively, journalists and fact checkers like yourself rely on a vast array of evidence and documentation, much of it from ephemeral web sources.
However, the methods we rely on for citing and storing web evidence today are deeply flawed. In turn, this can limit the perceived reliability and trustworthiness of the stories you’re telling.
Here, we will explore how these problems can be solved by using the permaweb, and the Arweave web extension archiving tool specifically.
The Problem: Link rot and lost sources
When citing or referencing a piece of information, we typically link to the content with a URL (‘uniform resource locator’). URLs on the traditional web simply point your web browser to the location of your web source. However, traditional web links are not truly permanent, and are subject to loss and change at an alarming rate. When this happens, it’s very difficult or even impossible to relocate your web evidence.
When a link ‘breaks’ and no longer points to an active web page, this is known as ‘link rot’. When this happens, you’ll often see a ‘404 error’. Ironically, I found one in this article about online news accountability here.
Link rot is surprisingly common, affecting even the most prestigious media outlets. While researching these blog posts alone, we found link rot in online news publications from the New York Times, Washington Post, BBC News, the Guardian, and many more. Furthermore, studies repeatedly find link rot in U.S. Supreme Court opinions, Harvard law journal articles, world-renowned scientific publications, links shared on social media, and even Library of Congress reports.
Link rot may even make it more difficult for investigative journalists to hold governments accountable for potential war crimes.
Of course, when you encounter a 404 error, you can try searching for the content elsewhere on the web (for example with the Wayback Machine), but this is time-consuming and definitely not guaranteed to be successful. Ultimately, things can simply be deleted or lost from the web, despite what some may wish.
Causes of link rot are varied, including an organisation moving to a new domain name, restructuring their site map, or deleting pages. We explore some of these reasons in more depth here.
Naturally, your work as a journalist relies heavily on reliable access and linking to web sources. As research has repeatedly shown, traditional web links are simply not resilient enough for you and your readers to rely on.
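If you keep a list of cited URLs, a small script can flag the ones that appear to have rotted. The sketch below is only an illustration: the function names `http_status` and `classify_link` are our own, and the verdict categories are a rough assumption rather than any standard. Some servers refuse the lightweight `HEAD` request used here, so an ‘uncertain’ result means the link deserves a manual look, not that it is broken.

```python
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the server cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout and similar network errors.
        return None


def classify_link(status: Optional[int]) -> str:
    """Map a status code to a rough verdict for a cited source."""
    if status is None:
        return "unreachable"
    if status in (404, 410):
        # 404 Not Found and 410 Gone are the classic signs of link rot.
        return "rotted"
    if 200 <= status < 400:
        return "alive"
    return "uncertain"
```

For a single citation you would call `classify_link(http_status(url))`. Note that an ‘alive’ verdict only tells you the URL still answers; it says nothing about whether the page still contains what you cited.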
The Solution: Use permaweb links
The permaweb is designed to offer a solution to link rot. When you archive with the Arweave web extension, every archived page is automatically given a truly unique, truly permanent identifier that will never, ever change. This means you can say goodbye to link rot! Here’s an example of a permaweb link with a unique identifier: https://arweave.net/saYqBP_aEVTlpc62xaCOtWgmVKPdQe566Z5Wiz_kRFs.
Quite rightly, you might wonder: surely that arweave.net link could rot too? That’s correct. One day, far into the future, the arweave.net domain might indeed become inaccessible. Perhaps we stop hosting it for some reason, or perhaps your internet provider or even your government restricts access to it. That’s okay though, because anyone can set up a new ‘gateway’ domain and use the very same unique identifier (‘saYqBP_aEVTlpc62xaCOtWgmVKPdQe566Z5Wiz_kRFs’, in our example) to access the very same content. This means that these identifiers can outlive the Arweave project itself, indefinitely. Ultimately, this gives the content a vastly longer lifespan and solves link rot not just for ourselves, but for our children and our grandchildren too!
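To make the gateway idea concrete, here is a minimal sketch of how the same identifier can be paired with different gateway domains. It rests on a few assumptions: that identifiers look like the one in our example (43 characters from the URL-safe base64 alphabet), that a gateway serves content at the path `/<identifier>`, and that `https://gateway.example` stands in for some future gateway. That domain and the function names are purely hypothetical.

```python
import re
from urllib.parse import urlparse

EXAMPLE_ID = "saYqBP_aEVTlpc62xaCOtWgmVKPdQe566Z5Wiz_kRFs"

# Assumption: identifiers are 43 characters of URL-safe base64, like the example.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z0-9_-]{43}")


def permaweb_url(identifier: str, gateway: str = "https://arweave.net") -> str:
    """Build a link to the archived content on the chosen gateway."""
    if not IDENTIFIER_PATTERN.fullmatch(identifier):
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    return f"{gateway.rstrip('/')}/{identifier}"


def identifier_from_url(url: str) -> str:
    """Recover the identifier from an existing permaweb link."""
    first_segment = urlparse(url).path.strip("/").split("/")[0]
    if not IDENTIFIER_PATTERN.fullmatch(first_segment):
        raise ValueError(f"no identifier found in: {url!r}")
    return first_segment


# If one gateway disappears, the identifier can be re-pointed at another one.
old_link = permaweb_url(EXAMPLE_ID)
new_link = permaweb_url(identifier_from_url(old_link), "https://gateway.example")
```

The point of the sketch is that the identifier, not the domain, is what you need to preserve in your notes and citations.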
The Problem: Content drift and changing web pages
As well as ‘link rot’, links on the traditional web can also suffer from ‘content drift’. Content drift is subtly different from link rot, though both have serious negative consequences. With link rot, the original URL no longer functions at all. With content drift, the URL still takes the user to a functioning web page, but the content has changed significantly since it was originally cited. It’s a common problem, too: one study found that 75% of scholarly articles suffered from content drift in their references.
The reasons for unreliable, changing content on a web page are numerous, ranging from innocent updates provided to a breaking news article to the more sinister phenomenon of stealth editing. A discussion of how stealth editing can undermine trust in media and journalists can be found here.
As you might imagine, content drift can be a major problem when you’re citing web sources to support an investigative journalism piece. Readers will likely be unaware that content drift has occurred, and if the new content doesn’t match the reason for your citation it may give them reason to doubt the accuracy of your reporting.
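One simple way to notice content drift yourself is to record a fingerprint of a page when you cite it and compare it against the page later. The sketch below uses a SHA-256 hash for this; the function names are our own, and how you download the page bytes is left out. Bear in mind that any change at all, including a rotating advert or a timestamp, produces a different fingerprint, so treat a mismatch as a prompt to re-read the source rather than proof of a meaningful edit.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 fingerprint of the page content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def has_drifted(recorded_fingerprint: str, current_content: bytes) -> bool:
    """True if the content no longer matches the fingerprint taken at citation time."""
    return fingerprint(current_content) != recorded_fingerprint


# Illustrative page contents; in practice these would be downloaded bytes.
page_when_cited = b"<p>The minister confirmed the figures on Monday.</p>"
page_today = b"<p>The minister declined to comment on the figures.</p>"

recorded = fingerprint(page_when_cited)
drift_detected = has_drifted(recorded, page_today)
```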
When you write an important piece of journalism and sink so much of your time and care into it, you need its sources to live just as long as the piece itself — the permaweb can help with this.
The Solution: Permaweb snapshots of pages
When you archive a page onto the permaweb using the Arweave web extension, it essentially stores a permanent, frozen snapshot of the page’s contents as it was at the time of archiving. This means that you don’t have to worry about the page being lost, its contents changing subtly (or dramatically!), or it being deleted. Your web evidence will remain on the permaweb forever, as it was when you archived it. Just like with link rot, the permaweb offers a genuine solution to content drift that you can rely on.
Once a web page or PDF is perma-archived, it cannot be altered, edited, or otherwise tampered with.
This provides extra confidence and reliability in your vital web evidence. Whether the page was perma-archived yesterday or a year ago, you know with mathematical certainty that what you’re viewing hasn’t changed since it was archived.
The Problem: Maintaining backups
When trawling the web for evidence or research, one time-consuming challenge can be finding efficient and reliable methods of storing and organising said evidence. Yes, you can store lists of URLs from the traditional, impermanent web, but as we’ve demonstrated they will eventually suffer from link rot and content drift. Alternatively, I’m sure many of you print physical copies and/or use your browser’s built-in ‘Print to PDF’ function. These copies and a sufficient number of backups then have to be stored somewhere, which can be very inconvenient and unreliable, especially with physical copies.
The Solution: Lots of automated, permanent backups
The permaweb offers approximately 150 automated backups as standard, and the technology is designed to gradually increase this number over the coming months, years, and generations! Simply put, this means you don’t have to worry about maintaining your own backups! The permaweb will automatically create a growing number of copies of your archived pages, at no extra cost to you.
So, in summary, the permaweb is a very powerful tool for any journalist. By providing truly permanent links and snapshots of vital web sources to back up your work forever, the permaweb allows future generations to learn from your important work.
If you want to try out some perma-archiving for yourself, you can do so for free by following the steps in our very beginner-friendly guide here.