Pretty Darn Fascinating

The story of the PDF, the portable document format that’s become one of the internet’s defining information formats. It’ll be with us after we’re long gone.

By Ernie Smith

Today in Tedium: Every one of our file formats has a story. The GIF, for example, came into being thanks to a need to serve up images on pokey CompuServe connections with limited RAM. The MP3, meanwhile, was built around the contours of Suzanne Vega’s unaccompanied voice on “Tom’s Diner.” And the ZIP file came to life in a brutal legal battle that was egged on by the whims of BBS users. These stories have been discussed at length by others, but there’s a file format I see every day, one that, more than any other, has allowed our society to go (mostly) paperless. It’s the Portable Document Format, or PDF, a file format that was exactly what the business world needed at the time of its release. Today’s Tedium discusses the past, present, and future of the PDF. — Ernie @ Tedium

Today’s GIF features a PDF of an essay about “The Camelot Project,” as shown in Adobe Acrobat.

“What industries badly need is a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks. These documents should be viewable on any display and should be printable on any modern printers. If this problem can be solved, then the fundamental way people work will change.”

— John E. Warnock, the cofounder of Adobe, discussing his thought process around the need for a simple document format in an essay revealing the existence of The Camelot Project (which is, of course, in PDF format). Warnock, who was also responsible for helping to develop Adobe’s bedrock PostScript document scripting language, noted that PostScript and its sister language Display PostScript were too heavy for most computers being made at the time he wrote his essay, around 1990. “The Display PostScript and PostScript solutions are the correct long-term solution as the power of machines increases over time, but this solution offers little help for the vast majority of today’s users with today’s machines,” he explained.

Form 1040

(Ken Teegardin/Flickr)

Why the “killer app” for the PDF may have been, of all things, tax forms

Around the time that Warnock and his colleagues at Adobe were trying to figure out the difficult problems of creating a simple file format that could be used to read documents on regular people’s computers, the Internal Revenue Service was dealing with an annual headache that it faced in working with the U.S. Postal Service.

Basically, every year just before tax season, the IRS would mail out tax forms to hundreds of millions of people around the United States. This annual mailing was, during non-Census years, the largest annual mailing that the postal service had to deal with: around 110 million individual mailings annually, according to a 1991 New York Times article. And the IRS, dealing with a complicated tax code, had to manage a wide variety of exceptions and differing forms, for both businesses and individual taxpayers.

This was not only incredibly wasteful—never a good thing when you’re the Internal Revenue Service—but it represented something of a logistical nightmare, because it also hinted at the ways that paper gummed up the works throughout the federal government.

This was a situation where the PDF would have been of immense value. Certainly, software solutions had existed on the market at that time—among others, TurboTax on the PC and MacInTax on the Mac—but the average American user wasn’t necessarily at a point where they would trust their computer to do their taxes. But they might be cool with printing the forms.

Fortunately, Adobe was ready. At the end of 1992, the company first showed off its PDF technology, given the brand name Acrobat, at the trade show COMDEX. The trade press of the time wrote of Acrobat with much excitement, as it represented the ability to view a document on screen exactly as it would show up on a printed page, if it even needed to be printed at all. It was even named “Best of the Show” that year.

But Warnock admitted that, early on, his approach to solving the problem of aggressive paper didn’t catch on right away.

“When Acrobat was announced, the world didn’t get it. They didn’t understand how important sending documents around electronically was going to be,” Warnock said in a 2010 interview with Knowledge@Wharton.

But the fact of the matter was, Adobe had the perfect use case already out there in the form of the IRS, not to mention the rest of corporate America.

An Adobe Acrobat 1.0 promotional video.

Adobe had a potential solution to cut down on the mountains of paper being produced by offices the world over. And as Adobe had the de facto market standard already with PostScript, it also had the inside lane. You can see where this is going.

According to NetworkWorld, the IRS was already distributing tax forms in PDF format in early 1994, a move that helped build broad momentum behind the format.

But one element was missing, and that element was the web, which made the concept of accessing tax documents relatively easy. And by the 1996 tax season, that element was ready to go, as the Internal Revenue Service booted up its web servers—complete with more than 600 documents ready for download in PDF format, according to a 1996 column from tech guru Kim Komando.

A case study on Adobe’s website notes that the IRS went all-in on the PDF around this time, giving copies of Acrobat to more than 100,000 employees as of 2001, and saving millions of dollars in printing costs in the process.

Beyond sparing the IRS the mailing of most of those forms, it helped the agency avoid lots of headaches by making materials easier to find in audits. Instead of having to be put in obscure file cabinets, documents could be accessed electronically by tax examiners and auditors.

“In terms of employee satisfaction alone, Acrobat pays for itself,” an IRS official told Adobe. “Add to that the benefits of easier document administration and less paper storage, and it’s clear that Acrobat and Adobe PDF provide real returns to the agency and the people we serve.”

Clearly there’s some fluff in that quote, but the IRS was very much a microcosm of the business world at large. The PDF, in a very short amount of time, became one of the most important ways business users shared documents. It simplified the hard work of going to Kinko’s, because the file format was able to easily embed assets like fonts and images, simplifying one of the hardest parts of getting a file printed. (Of course, you generally couldn’t make changes in PDF form.) Eventually, the PDF became searchable and even editable.

And most importantly, in the case of the IRS, “fillable.” The IRS quickly created versions of its tax forms that allowed end users to put in their own numbers, and, eventually, even their own signatures.

While none of this was as lightweight as, say, a text file nor as flexible as HTML, it sure beat PostScript for the average person.

And the PDF became the long-term solution.

2007

The year that Adobe first announced its plan to make the full PDF specification available as an official, open standard, rather than as a de facto standard, as it had been up to that point. (The company still builds its own proprietary extensions to the format with new versions of Acrobat.) Adobe had made the format free to distribute going back to 1993, making money on the sale of tools to make the PDFs. The PDF became officially standardized through the International Organization for Standardization (ISO) in 2008; here’s the technical document in case you need something long and boring to read.

Document archive

(myfra/Pixabay)

Perhaps the most important role of the PDF in the modern day is archival

Let’s just admit something straight out: Standardization is boring.

It’s a dull topic, but it’s something that is incredibly important in the world of archival. The reason for this is obvious, of course: If you randomly change the way you produce and store microfilm, for example, that microfilm becomes a pain to reuse.

But this also cuts both ways. There are things that you don’t necessarily want out of a standard. Let’s say you don’t care about interactivity because you’re trying to digitize documents that date back hundreds of years.

Still, there may be niceties you want, like the ability to make the text searchable. And perhaps you want to ensure maximum compatibility, working with all variants of a tool.

All these reasons, and more, are why the PDF/A format was created in 2005. Unlike a standard PDF, which is designed to take advantage of the fact it’s made for a computer, PDF/A was designed to be maximally reproducible, to the point where it could replace a printed document if the original paper was lost.

“Everything that is required to render the document the exact same way, every time, is contained in the PDF/A file: fonts, colour profiles, images etc. PDF/A is also an ISO standard, guaranteeing that future software generations will know how to open and render PDF/A files,” explains Shawna McAlearney, a marketing specialist for Appligent Document Solutions, in an FAQ on the PDF Association website.

This is good for organizations such as the Internet Archive and the Library of Congress, who are saving information for the long haul and need it to be readable 30 years from now. But it does lead to some controversy at times in the archival space, such as when the format was extended in 2012 to allow for the embedding of files like spreadsheets and HTML documents.

And the quick uptake of PDF/A has its critics. In a paper on the subject, Marco Klindt of the Zuse Institute Berlin lays out a variety of issues with the format from an archival perspective, including (among other things) that it can be cumbersome to use.

(Notably, usability expert Jakob Nielsen has also strongly come out against the use of PDFs for the same reason, stating on his consultancy’s website: “PDF is good for printing, but that’s it. Don’t use it for online presentation.”)

Klindt, who also lays out legal and integrity issues with the format, suggests that the desire for a suitable preservation format limited discussion of whether or not the format really made sense in the long run.

“Familiarity of PDF led to fast and widespread adoption of PDF/A as a solution in the field of digital archiving,” he writes. “This fact may have muted prophetic voices demanding the quest for and development of more suitable content containers for research work (text and data) with reuse in mind.”

Even if this is the case—certainly I’ve loaded my share of 300-megabyte PDFs over the years, and there are plenty of documents online that have no business being PDFs—it’s certainly worth admiring how much the format has done to digitize and protect our collective knowledge.

In 50 years, these PDFs, even with their weaknesses, will help us document history with little of the ephemeral nature of the web. And unlike in paper form, those PDFs won’t suffer from frayed pages.

The history of our generation will probably be in PDF form.

“[Adobe’s] board wanted to kill it. I said, ‘There’s just no way. This is solving an important problem, and we are going to hang in there until it works.’”

— Warnock, speaking to Knowledge@Wharton about Acrobat’s early years. These days we take for granted the fact that PDFs are common basically everywhere online, but there was a point when the PDF format was in such dire shape that Adobe had to stop charging for Acrobat Reader, a move Warnock described as a “very risky choice.” (They charge lots of money for Acrobat instead.) But the decision to stick with the client and make it free ultimately proved the key to Adobe’s success as a company. Even though people might be quicker to think of Photoshop when they think of Adobe, a 2013 profile of the Adobe cofounder by his alma mater, the University of Utah, ultimately put the company’s success at the feet of the document format Warnock created. “The PDF put Adobe on the map,” author Jason Matthew Smith wrote.

The PDF format has evolved and changed over time, but when it comes down to it, it works generally as it did 25 years ago, when it was first released for public consumption.

But the way it works still evades most people. One of those people is a guy named Paul Manafort. You might know him as Donald Trump’s former campaign manager.

We don’t delve into politics much on Tedium, but it’s not often that politics creates a perfect example of how people struggle with PDF files.

Justice Department Special Counsel Robert Mueller’s 2018 indictment of Manafort noted how the lobbyist and his colleague, Richard Gates, collaborated on modifying a PDF document by converting the document into Word format, changing an amount in the document, then changing it back to a PDF.

This created something called a paper trail, bolstering Mueller’s case against Manafort.

But the funny part is this: According to the PDF Association, none of this was actually necessary. Beyond the fact that converting a PDF to Word and back creates subtle changes in format that can be tracked, software like Adobe Acrobat can be used to directly edit the text in a file!

Here’s the association’s take:

Manafort could have readily altered the PDF himself. Had he done so, he would have avoided a key part of the paper trail that may land him in federal prison. He probably even had a PDF editor already on his computer.

In the money-laundering business, after all, it seems likely that one would frequently need to assemble pages from multiple PDF files; you need a PDF editor for that. For most of his money-laundering career, Manafort was almost certainly just one or two clicks away from the editor mode.

The result is that PDF editing is likely to play a significant role in a major political scandal.

Love it or hate it, that’s how prevalent this document format is.

--

Find this one an interesting read? Share it with a pal! And RIP to John Warnock, a document icon.

Ernie Smith is the editor of Tedium, and an active internet snarker. Between his many internet side projects, he finds time to hang out with his wife Cat, who's funnier than he is.
