This is a guest blog post from the Decentralised Public Library, an Arweave initiative.
What is link rot?
We’ve all been there — you’re trying to access a page online and you suddenly receive the dreaded 404 error — ‘Page not found’. Just like that, a piece of unique internet history is gone. This phenomenon is called ‘link rot’.
Link rot is surprisingly common: across major social media platforms, 30% of shared links are totally dead within two years. We all imagine the internet to be timeless. After all, it sometimes seems like removing something from the internet is nearly impossible. However, this is simply not true. The traditional web is shockingly fragile and not at all immortal.
Over 20 years, 98.4% of web links suffer from rot, becoming totally inaccessible to future generations.
It’s not just casually shared social media links that are at risk of rot, though. Even the most prestigious institutions in the world struggle with link rot: 50% of U.S. Supreme Court opinions contain dead links, and so do 70% of Harvard academic journals. Both legal judgements and academic research rely heavily on access to historic evidence decades into the future, and link rot jeopardises this.
Why do links rot?
The most common causes of link rot are pretty mundane. Perhaps the original creator has lost interest in their old hobby and let the domain registration expire, or perhaps they ran out of time and money to keep the site active. Sadly, this isn’t just a problem for small-scale hobbyist websites: many huge websites that are acquired by tech giants get shut down in the years after their acquisition. Sometimes this is because they’re not profitable enough to keep online, and sometimes it’s because the parent company wants to reduce competition for its own products. This is exactly what happened with Picasa, the photo-sharing site that Google acquired in 2004 and shut down in 2016, automatically migrating its entire userbase to Google’s native product, Google Photos. In cases like this, any URLs linking back to Picasa content will have broken when the service and its domain were retired.
Websites don’t maintain themselves: unless people work to preserve them and keep them accessible, they will all fall offline eventually.
Something as simple as a site changing its domain name will cause any links to the old domain to rot, often with no way for the reader to find the new location.
Although rarer, there are also deeply sinister, political causes of link rot, like government censorship. Some governments dedicate a huge amount of time and resources to information control, attempting to take down websites to hide information from their citizens. For instance, China’s government has pushed extensively for takedowns of specific articles in scientific and academic publications in recent years, and both Springer and Cambridge University Press have bowed to this pressure. Even the USA has been accused of censoring climate change information from government websites for political reasons.
How can we fight link rot?
There is a range of things we can all do to reduce the frequency of link rot across the web.
Firstly, we should be aware of the important work of projects like the Internet Archive’s Wayback Machine, which crawls the web and makes backups of millions of websites. If you encounter a 404 error out in the wild, you should always check the URL with the Wayback Machine to see if there’s an archived version available. They even have a nifty browser extension (for Chrome and Firefox) that automatically replaces broken links with archived copies, if they happen to have a backup of that page. There are also national web archives that perform similar functions, such as the UK Web Archive, though it focuses heavily on UK websites and aims to capture pages just once per year. Remember, most web archiving projects accept requests, so let them know if a site you need backed up is missing!
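If you have a whole list of links to check, looking up each one in the Wayback Machine can be scripted. The snippet below is a minimal sketch, not an official client: it assumes the Internet Archive’s public availability endpoint at https://archive.org/wayback/available and the JSON shape it is commonly documented to return (an "archived_snapshots" object with a "closest" entry). The function names are our own, made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's public availability lookup.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL for a page (hypothetical helper name).

    timestamp is an optional string such as "20160101"; when given, the
    service is asked for the snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timeout=10):
    """Ask the Wayback Machine whether it holds a copy of page_url.

    This performs a real network request, so it can raise URLError if the
    service is unreachable.
    """
    with urlopen(build_query(page_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling find_archived_copy("example.com") returns the address of the closest snapshot if one exists, or None if the page was never archived, in which case it is worth submitting it to an archive yourself.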
You can also begin to permanently archive the decaying traditional web yourself! One option is the Arweave web extension for Chrome, which allows you to permanently capture snapshots of webpages, storing them immediately on the ‘permaweb’.
The permaweb is similar to the traditional web, except permaweb content cannot be lost, altered, or purposefully deleted. The links never change, so you know that you won’t suffer from link rot!
It’s also brilliant for content creators and developers, because you don’t have to waste your time configuring servers, maintaining domains, or repeatedly paying to host content year after year. You can check out more about the permaweb here, and play around with some free archiving tokens here. As you may have noticed, most of the links in this article are actually permaweb snapshots of traditional webpages, placing them beyond the reach of accidental and intentional alteration and loss.
Once you’ve claimed your free tokens, it’s super easy to start archiving with the Arweave web extension. Check out our handy step-by-step guide here: https://docs.arweave.org/info/archiving/step-by-step-beginners-guide
No more 404!