Google Books: The Failings of Snippet View

You may or may not have noticed this, but Tedium relies pretty heavily on archival content to exist. This newsletter and publication is not just about offering up weird facts: It's often about introducing facts that have otherwise been forgotten about by the public at large.

Sometimes, this takes me in some interesting directions. An example I regularly cite is my piece on lemon juice squeeze bottles. It was the rare piece I had the chance to mostly write ahead of time, and while there were some incredibly useful starting points on the issue from Beach Packaging Design, I find it important that, even when an existing resource like this offers a starting point, I spend time doing some digging of my own. The odds are high that something got missed; that's nothing against earlier diggers, just the nature of the game.

And what I found was a familiar frustration. There's a certain era of history (after about 1930, before about 1970) in which information about obscure topics is hard to come by through digital resources. The reason is not that periodicals didn't exist at this point. It's that these publications, which may or may not exist today, are copyrighted, by someone.

This means that, unless they've expressly given their permission, the publications cannot be pulled from Google Books, despite the fact that it's clear something useful is there. Relevant sentences from the platform's "snippet view" get you on the right track, but the information is otherwise utterly useless for archival purposes. The book might be hidden in a library thousands of miles away, impossible to acquire in any way except in physical form.

I ran into another example of this last week on my 911 history story. While most of the story could be told from old New York Times articles, along with other newspapers of the era, one frustration I spotted involved Fire Engineering magazine, which had written about emergency phone numbers as early as 1937. However, the publication's web archives only go back to 1990.

For anything earlier, the only place you're going to find it is Google Books. And Google Books is completely, utterly unsatisfying on this front, because the information is locked behind the world's most depressing peephole. There's a sentence that speaks to the article I'm writing, but can I tell you whether it's relevant? Probably not!

There are other resources, of course. I use fees from Patreon to pay for a subscription to the Ancestry.com-owned Newspapers.com, which has proven an immensely valuable resource. But it does not cover trade publications, which often have a level of specificity on narrow topics that newspaper articles don't. And the New York Times is an archivist's dream. I wish every publication put 10 percent of the effort into its archives that the Times has put into its own.

On the other hand, the Internet Archive, which I frequently talk up as being a useful and important resource, has many trade publications of this nature, specifically in the technology sector. (The search is a little awkward, but they're working on it.) But ultimately, neither of these services has the resources of Google, through no fault of their own, of course. Google is Google: They're expected to face a fine of more than a billion dollars from the European Union this week, but they're so big that this is pocket change.

This state of affairs comes down to a combination of two frustrating things highlighted by Wired back in April: One, Google got sued, though they eventually won; and two, as a result of the legal battle, they lost their energy for this incredibly hard, incredibly important work. If Google really cared about this resource, it would find a way to make it financially sustainable, and maybe even figure out a way to make snippet view less of an issue for publications with dormant copyrights.

It'd be great if Google donated their book-search efforts to a research library, or perhaps allowed the Internet Archive to carry the torch. This stuff is still important.

Here's the way that I like to think of all this stuff: Alone, located in a physical book, this information is somewhat limited in value. In the aggregate, that value is immense. Google got halfway to unlocking its value. Someone—maybe Google, maybe not—has to finish the job.

Down With Snippet View

The limitations of Google Books can be seen in the way that it handles obscure trade publications. Everyone who made that old magazine? Probably dead.
