Make Digital Preservation Easier

If major companies think it’s too hard or costly to leave up sites filled with user-generated content, perhaps we need to change the motivations.

Today in Tedium: Look, I’m not going to tell you that Yahoo Answers was the height of cultural artifacts. But the thing is, it had value. And the reason it had value was the amount of time it was online, the sheer number of its answers, and its public-facing nature. But sites do not stay stationary, encased in amber, and there is significant financial motivation for large companies to only play the hits. After all, it’s why Top 40 radio isn’t all Dishwalla, all the time. But after seeing yet another situation where a longstanding Yahoo-owned website is shutting down, I’m left to wonder if the problem is that the motivations for maintaining sites built around user-generated content simply do not favor preservation, and never will without outside influence. How can we change that motivation? Today’s Tedium, in a follow-up to the post we wrote as Yahoo Groups was getting shut down, ponders the issue from the corporate perspective. — Ernie @ Tedium

Today’s Tedium is sponsored by Refind. More from them below.

“I understand your usage of groups is different from the majority of our users, and we understand your frustration. However, the resources needed to maintain historical content from Yahoo Groups pages is cost-prohibitive, as they’re largely unused.”

A statement sent to an archivist in 2019 as Verizon took steps to shut down the vast majority of the existing Yahoo Groups, the last major element of Yahoo’s user-generated content apparatus that was dismantled, with Groups meeting its maker a little over a year ago. It’s worth keeping in mind that at the scale Verizon works—making billions of dollars per year, on average—the costs of continuing to host such content would have been relatively minimal—especially given the fact that, uh, it owns a big chunk of the network through which that content is distributed.

(Denny Muller/Unsplash)

The problem with corporate motivations is that they aren’t the same as the user’s, even when the user made the content

Whether Google, Verizon, Disney, Nintendo, or Sony, the corporate motivations for keeping content available online for long periods differ greatly from the motivations that drive external visitors.

Users very much have an expectation of permanence, just as they did with physical media, but in the context of online distribution, these companies have competing interests driving their decision-making that discourage them from taking steps to protect historic or vintage content.

And in the case of user-generated content, there might be outside considerations at play. Perhaps they are concerned that something within an old user agreement might come back to bite them if they leave a website online past its sell-by date, opening them up to liability. Perhaps the concern is old, outdated code that may look novel on the outside but is effectively a potential attack surface for someone with bad intentions. After all, if they’re not keeping an eye on it, who’s to say someone can’t take advantage of that?

And then there are reasons that are a little more consumer-hostile. Nintendo recently ended sales for a bunch of old Mario content in both digital and physical form. It evokes the old gating of home video releases that Disney used to do in an effort to keep its old content fresh and make more money from that old content.

When it comes to websites, though, much of that content is user-generated, even if a technology company technically maintains it. I have to imagine the thinking is that a company has only limited capacity to absorb maintenance costs, and limited motivation to keep absorbing them.

But on the other hand, as digital preservationist David Rosenthal has pointed out, in the grand scheme, preservation is not really all that expensive. The Internet Archive has a budget, soup to nuts, of around $20 million or less per year, around half of which goes to pay for the salaries of the staff. And while they don’t get all of it (in part because they can’t!), they cover a significant portion of the entire internet, literally millions of websites. They have a fairly complex infrastructure, with some of their 750 servers online for as long as nine years and petabyte capacity in the hundreds, but given that they are trying to store decades’ worth of digitized content, including entire websites that were long ago forgotten, it’s pretty impressive!

So the case that it costs too much to continue to simply publicly host a site that contains years of historically relevant user-generated content is bunk to me. It feels like a way of saying “we don’t want to shoulder the maintenance costs of this old machine,” as if content generated by users can be upgraded in the same way as a decade-old computer.

One thought I have is that this issue repeatedly comes up because the motivations for corporations naturally lean in favor of closure when the financial motivation has dried up. Legislation could be one way to manage this to sort of right the axis in favor of preservation—but legislation could be difficult to pass. (This was the crux of my case for trying to make the legislation for the National Register of Historic Places apply to websites.)

In my frustration about this issue last night on Twitter, I found myself arguing for legislation that balances liability in favor of preservation of public-facing content. But I’m a realist: a law like that would have many moving parts and may be a tough sell. So, if we can’t encourage a law, maybe we need to build strategies that make maintaining a historic website an easier lift.

Refind — Get a little bit smarter every day. You're eager to learn new things, but overwhelmed with too much content? Give Refind a try. Get a daily selection of links that move you forward, tailored to your interests. The best from all around the web, curated by experts and our algorithm. Sign up for free here.

2012

The year that the genealogy platform Ancestry.com launched a new site, Newspapers.com, to offer paid archives of newspapers to interested parties. The company, which charges about $150 per year for access to the archive, has helped maintain access to the historic record for researchers who need it. (I’m a subscriber and it is worth it.) With the exception of paid services for Usenet like Giganews, this model has not really been tried for vintage digital-only content, which seems like a major missed opportunity for companies raising concerns about financial costs for maintaining old platforms, like Yahoo/Verizon. Certainly I would prefer it to be free, but if I had to have a choice between free and non-existent, I’d pay money to access old content. Just throwing that out there.

(Ethan Hoover/Unsplash)

A middle ground: An “analog nightlight” mode for websites

In some ways, I think that part of the motivation for taking down old or outdated websites is the expectation that the internal systems must also stay online.

But I think archivists and historians would be more than happy if public-facing content—that is, content that appeared on search engines, or was a part of the main experience when logged in at a basic level—was prioritized and protected in some way, which would at least keep the information alive even if its value was limited.

There’s something of a comparison here that I’d make: When the U.S. dropped the vast majority of its analog signals in favor of digital broadcasting, it led to something called the “analog nightlight,” in which very minimal, basic information was presented on analog stations during the period before they were turned off. A TV host relayed basic information to viewers about the transition, and told them what to do next. It didn’t entirely work (TV stations in smaller markets didn’t actually air the analog nightlight), but it helped give a sense of continuity as a new medium found its footing.

This approach, to me, feels like a path forward that could minimize the crushing pain of a loss of historic content while taking away many of the risks that come with continuing to host a site that may no longer be popular in the modern day but still has value in a long-tail sense.

In the case of an “analog nightlight” equivalent for websites, the goal would be essentially to shut down any sort of attack surface through good design and planning. Before the site is taken offline in its original form, users are given the chance to download their old content or remove it from the website over a period of, say, 60 days. This is not too dissimilar to the warnings that site operators offer when they shut down currently, and it looks like what Yahoo Answers is doing.

But once the deadline is hit, the site operators launch a minimal version of the original platform, with no way to log in or comment. The information is static, and there’s no directly accessible backend. That’s actually the important part of this—the site needs to be untethered from its original content-management system so no new content can be added. Instead, the content would be served up as a barebones static site (perhaps with advertising, if they roll that way), so as to minimize the “attack surface” left by a site that is not actively being maintained.

This reflects relatively recent best practice in the content-management space. Platforms like Netlify have gained popularity in recent years because they actively separate the form of distribution from the means of production, meaning that security risks are minimized. This is a great approach for live-production sites, but for sites that are intentionally meant to stay static, it removes one of the biggest risk factors that might discourage a content owner from continuing to maintain the work.
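For the technically inclined, here is a minimal sketch of the “no way to log in or comment” step described above, assuming the pages have already been exported from the old system as plain HTML. It uses Python’s standard `html.parser` module to drop forms, inputs, and scripts so that only the readable content is left. The names `ReadOnlyFilter` and `freeze_html` are hypothetical, made up for this illustration, and this is not a security sanitizer: inline event-handler attributes, for example, pass through untouched.

```python
from html.parser import HTMLParser

# Elements that take input or run code; they and their contents are dropped.
DROP_CONTAINERS = {"form", "script", "button", "textarea", "select"}
# Void elements (no closing tag) that are dropped on their own.
DROP_VOID = {"input"}


class ReadOnlyFilter(HTMLParser):
    """Hypothetical helper: re-emits a page minus its interactive parts."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped container

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # preserve the doctype

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTAINERS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag not in DROP_VOID:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self.skip_depth == 0 and tag not in DROP_CONTAINERS | DROP_VOID:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_CONTAINERS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag not in DROP_VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def freeze_html(html_text):
    """Return html_text with forms, inputs, and scripts removed."""
    parser = ReadOnlyFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Run over every exported page, something like `freeze_html(page)` would leave a folder of plain files that any basic web server can host, with nothing behind it to log into. A real migration would also need to handle images, stylesheets, and malformed markup, which this sketch ignores.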

As far as liability concerns go, language could be included on the page to allow for users to remove old content if they so choose, along the lines of the “right to be forgotten” measure of the European Union’s General Data Protection Regulation (GDPR), though that measure includes a carve-out for purposes of historical research, which an archived version of a website would presumably cover. But the thing is, sites that are driven by user-generated content are generally protected by Section 230 in the United States anyway, so the onus for liability for the content itself falls onto the end user.

And if, even after these steps, a company still feels uncomfortable about hosting a dead website, it should reach out to librarians and archivists to donate the collection for maintenance purposes, perhaps with a corresponding donation to the receiving nonprofit so it can cover the hosting costs. The Internet Archive actually offers a service like this!

The one site that makes me think that a model like this could work is Gawker. The news and gossip site, which was taken offline by the combination of a lawsuit and a corporate asset sale that specifically excluded it, remains online nearly five years after its closure in a mode very similar to this. Comments are closed and not visible to end users, which is a true shame as those comments often fed into the writing. But the content—the part that was truly valuable and important—is still out there, accessible and readable, even if you can’t do anything with it other than read it.

There are no ads. It’s a shrine to a platform that a lot of people cared about, even if others found it controversial. And there’s no reason what Gawker did couldn’t work in an equivalent way for Yahoo Answers.

Look, I’m going to be the first to fully admit that the motivation for protecting publicly accessible user-generated content exists only if the owner of that content feels “nice” about it.

And even then it feels like a bit of a surprise.

Space Jam Website

It’s still online, but it moved.

Over the weekend, Warner Bros. got a little bit of flak for replacing its long-online Space Jam website, which dated back a quarter-century in its original form, with a site for the sequel. But I think what the company did was actually shockingly noble. They not only left the old site online, but they made it accessible from the new one. The work done to maintain this was not perfect (I think they should do archivists a solid by putting in 301 redirects on the old URLs of the vintage site, so they go to the new place), but the fact that they showed the initiative at all is incredibly impressive given what we’ve seen of corporate motivations when it comes to preservation.
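To make the 301 redirect idea concrete, here is a rough sketch of what it could look like, written as a tiny WSGI app in Python. Everything specific in it is an assumption for illustration: the `/archive` prefix, the two legacy paths, and the function names are hypothetical and are not the actual URLs or setup of any real site.

```python
from urllib.parse import urlsplit

# Hypothetical new home of the vintage site.
ARCHIVE_PREFIX = "/archive"
# Hypothetical old URLs that used to live at the top level of the domain.
LEGACY_PATHS = {"/index.html", "/players/lineup.html"}


def legacy_redirect(request_path):
    """Return (301, new_location) for a known old URL, otherwise None."""
    path = urlsplit(request_path).path
    if path in LEGACY_PATHS:
        return 301, ARCHIVE_PREFIX + path
    return None


def app(environ, start_response):
    """Minimal WSGI app: permanently redirect old URLs, 404 the rest."""
    hit = legacy_redirect(environ.get("PATH_INFO", "/"))
    if hit:
        _, location = hit
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

The point of using 301 specifically is that it tells browsers, search engines, and archive crawlers that the move is permanent, so decades of inbound links keep landing on the old pages. In practice this kind of mapping more often lives in web server or CDN configuration than in application code.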

Honestly, part of this was a result of people who were associated with the website’s creation still being at the company years later and being willing to speak up for preserving it—a 2015 Rolling Stone article explains that the site actually briefly was taken down after it went viral in 2010, only for employees involved in the creation of the site (now with leadership roles in the company) to swoop in and save it after some executive made the call to shut it down.

“If we had left the company, the site probably would not exist today,” said Andrew Stachler, one of the employees involved in the effort to save it. “It would’ve gone down for good at that time.”

But imagine if they weren’t there. We’d be telling a different story right now.

And perhaps that’s what many companies need—someone who is willing to go to bat for the purposes of archival and protection of historic content.

In the digital age, preservation is the act of doing nothing but minimal upkeep and being comfortable with that fact. As proven time and time again, companies are more than comfortable with killing services entirely rather than leaving well enough alone.

Perhaps the way to save user-generated content is by making it as painless as possible to keep the status quo.

--

So yep, another rant from me on preserving internet history. Find this one an interesting read? Share it with a pal!

And thanks again to Refind for sponsoring.

Your time was just wasted by Ernie Smith

Ernie Smith is the editor of Tedium, and an active internet snarker. Between his many internet side projects, he finds time to hang out with his wife Cat, who's funnier than he is.