Today in Tedium: The internet’s many messy algorithms—the things that tell a server somewhere how to handle a given issue—have never been perfect. To the contrary, they break often, partly because humans remain far better at finding the edge cases of a software platform than software developers are at anticipating them. Nowhere is this better exemplified than with the nature of swearing, a sport that bots have always been more prudish about than the public as a whole. Today’s Tedium talks about algorithms, swearing, and the burden that Scunthorpe bears. — Ernie @ Tedium
There’s nothing rotten in Scunthorpe. But email clients and apps aren’t convinced.
The town of Scunthorpe, England, population 81,559, has a tough reputation to shake.
And it’s not just its reputation as unromantic that’s the problem. In 2013, Hotels.com revealed that the “industrial garden town,” as it’s called on its city signs, was voted the least romantic city in the U.K.
The fact that it’s home to a giant steel processing center doesn’t help, but odds are that the true problem is what’s hiding in plain sight—in its name. Clearly featuring one of the harshest profanities in the English language, one generally associated with the hands-on directing style of David O. Russell, it has long created headaches for the city. Those headaches have grown since the dawn of the internet era, when algorithms made that somewhat cheeky problem into something of an existential threat.
Since the internet first went mainstream in the late ’90s, Scunthorpe has faced a major indignity from the algorithmic twists and turns of the connected internet. Large companies have long struggled with words that have profanities baked in, with Scunthorpe giving AOL in particular some big headaches. After the company entered the U.K. market in 1996, Scunthorpe residents interested in getting online quickly found that they literally couldn’t, because the system was programmed not to accept the town’s name.
One user, confused as to why he couldn’t get onto AOL, was told by customer support that, for AOL sign-in purposes, the name of his town was now Sconthorpe.
When the local Scunthorpe Evening Telegraph questioned an AOL spokeswoman about this approach, she admitted that they were working on a fix.
“We have renamed the town in order not to cause offense,” the spokeswoman said, according to a reproduction of the article published online. “But our technicians between the U.K. and America are now working to remove the block on the name.”
And this, friends, was how the Scunthorpe Problem, the tendency for algorithms to not make room for edge cases with curse words, was discovered.
AOL wasn’t alone. Nor was Scunthorpe the only name that caused problems for profanity filters. Places with “sex” in their names, like Middlesex County, New Jersey, are frequent targets of poorly programmed algorithms. And AOL struggled with people’s names as well. AOL community leader Douglas Kuntz couldn’t use his own name on the service, for example.
At least his last name wasn’t Callahan. He might have been labeled a terrorist by accident.
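To see how little it takes to trip this wire, here is a minimal sketch of the kind of filter that produces the problem. This is not AOL's actual code, which the sources here don't show; the blocklist and the function name `naive_is_blocked` are hypothetical, made up for illustration.

```python
# Hypothetical blocklist for illustration. The second entry is the word hiding
# in the town's name, sliced out of "Scunthorpe" rather than typed.
BLOCKLIST = ("sex", "Scunthorpe"[1:5].lower())

def naive_is_blocked(text: str) -> bool:
    """Flag text if any blocked term appears anywhere in it, even mid-word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

for name in ("Scunthorpe", "Middlesex County", "Sussex Archaeological Society", "Lincolnshire"):
    verdict = "blocked" if naive_is_blocked(name) else "ok"
    print(f"{name}: {verdict}")
```

Only Lincolnshire gets through. The filter never asks whether the match is a word on its own, which is the whole problem in one line of code.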
“We try to find the best trade-off of precision, recall and safety. People who opt in to SafeSearch are mostly OK with us being on the conservative side.”
— Matt Cutts, the longtime search-engine guru for Google, explaining to CNET in 2004 why the search engine was aggressively filtering its SafeSearch results in ways that negatively affected completely innocuous sites. Cutts, who is currently on leave from Google in an apparent effort to fix the Pentagon, was forced by the publication to defend the blocking of the Sussex Archaeological Society and other Sussex-related businesses, because (of course!) they had “sex” in their titles. “In most cases it’s a pretty unambiguous usage,” he said.
Five examples of algorithms screwing with the English language online
- *“In Saturday’s opening heat, Homosexual pulled way up, way too soon, and nearly was caught by the field, before accelerating again and lunging in for fourth place.”* — A 2008 Associated Press story published on the American Family Association’s OneNewsNow website, whose meaning was changed significantly when the site’s filter, set up to change “gay” to “homosexual,” ran directly into Olympic runner Tyson Gay.
- Quickly, people figured out that working around profanity filters online was very easy, using techniques such as the aggressive use of symbols to write out naughty words in a way undetectable by filters, say if I wanted to throw a `/X\0743|^ph|_|CK3|^` out there without offending any dainty sensibilities. Turns out that this mindset led to l33tspeak, the way that we desecrated the English language during the IRC days, before emoji.
- A certain erectile dysfunction drug—one I won’t name here in the interest of self-preservation—has a tendency to drive spambots crazy. I will say this much: Socialism is but one word that causes headaches for this neocon blogger.
- In 2010, a Canadian magazine admitted that it found all the competition around its product online to be too much. So the magazine changed its name to Canada’s History. What was the name before the big change? Uh, The Beaver.
- And of course, the best wounds are always self-inflicted. In 2004, the Horniman Museum in London had email issues that apparently came about after the museum’s own spam filter decided that its name stood for “horny man”—and started blocking emails.
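The Tyson Gay mishap in the first item above is a slightly different failure from the Scunthorpe one: the match is a whole word, but the word is someone's name. Here is a rough sketch of a blind find-and-replace, assuming a whole-word, case-insensitive swap. The site's real rules aren't described in the story, so `blind_replace` is a hypothetical stand-in.

```python
import re

# Assumption: the swap matches the standalone word in any capitalization.
WORD = re.compile(r"\bgay\b", re.IGNORECASE)

def blind_replace(text: str) -> str:
    """Swap every standalone 'gay' for 'homosexual', keeping the capitalization."""
    def swap(match: re.Match) -> str:
        # Preserve a leading capital so sentence starts (and surnames) still look right.
        return "Homosexual" if match.group()[0].isupper() else "homosexual"
    return WORD.sub(swap, text)

print(blind_replace("Tyson Gay pulled way up, way too soon."))
# prints: Tyson Homosexual pulled way up, way too soon.
```

Word boundaries don't help here, because nothing in the pattern knows that a capitalized word after "Tyson" is a surname.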
“It seems like a small thing, but it’s an important detail in the design. Nearly every other computer system refuses to recognize curse words, and, in effect, condescend to their users. Apple, by recognizing the reality of English discourse, exhibits respect for their customers.”
— Peter Merholz, an early blogger who actually invented the term “blog” in 1999, discussing his pleasure with the autocorrect on the first version of the iPhone not blocking his swears. The purity wasn’t to last: In version 2.2 of iOS, Apple added a swearing filter, likely because the device began to gain a mainstream audience. But someone quickly figured out that if you add profanities as contacts, you can write whatever you ducking want.
How far have we gotten toward solving the Scunthorpe Problem?
I’m sorry to say, as you might guess, that the problem is still with us—sometimes involving Scunthorpe specifically.
Back in April, a band that was promoting itself via paid Facebook posts found something unusual—it could post about its Scunthorpe show, but it couldn’t promote it.
The interesting thing about the saga is that it gave one company the opportunity to show off just how advanced their filters had become. Inversoft, a company whose most well-known product is an elaborate profanity-filtering platform, used Facebook’s saga to show off how it can tell the difference between a word that merely features a profanity and one that exists mostly for the profanity’s sake.
The company’s Kelly Strain explained that Facebook was using a very broad algorithm on its advertising platform, making it so that every detected use of a profanity, embedded or not, would get thrown into the abyss.
A narrower algorithm (you know, like Inversoft’s tool) would be able to tell the difference, contextually, between something that exists to get a rise out of you and a more unintentional use case. (If you’ve ever messed with regular expressions, you know how challenging these edge cases are.)
“Facebook simply needs an intelligent profanity filter that can properly identify embedded entries so that the people of Scunthorpe and those around the world using the word in a genuine fashion can freely discuss this town,” Strain wrote.
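For anyone who hasn't messed with regular expressions, the simplest version of a "narrower" filter looks something like the sketch below. To be clear, this is not Inversoft's approach, which is surely more involved; it is just a whole-word match using `\b` word boundaries, with a hypothetical one-entry blocklist.

```python
import re

# Hypothetical blocklist for illustration.
BLOCKLIST = ("sex",)

# \b anchors the match to word boundaries, so embedded occurrences are ignored.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)

def whole_word_is_blocked(text: str) -> bool:
    """Flag text only when a blocked term stands alone as a word."""
    return PATTERN.search(text) is not None

for sample in ("Middlesex County", "Sussex Archaeological Society", "sex", "s3x"):
    verdict = "blocked" if whole_word_is_blocked(sample) else "ok"
    print(f"{sample}: {verdict}")
```

Middlesex and Sussex now pass, and the bare word is still caught. The last sample passes too, though, which is why a whole-word match alone is only a start.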
Inversoft, which has also invested a lot of time into detecting l33tspeak, is not the only game in town on the profanity-fighting front.
The plugin BanBuilder is also well-suited to the task, though the developer of the software seems to go out of the way to apologize, first for creating an app that’s meant to censor, and then for creating an app that will never be perfect.
“It perhaps goes without saying that someone who is really determined will find a way to post something awful, regardless of what profanity filter you use. You should know that walking in,” writes Alison “snipe” Gianotto, after going through a number of examples of how people can be perfectly profane without swearing.
(One example involves a giraffe and a bunny … annnnnnd I’ll leave it at that.)
“This is a constant problem with euphemisms. Using them can be like trying to conceal the naked body of an actress beneath a gossamer gown. Euphemizing presents a forlorn hope that renaming something might change its essence. Negative connotations are not in taboo words themselves, but in what they refer to. As a result, euphemisms can only protect our sensibilities for so long.”
— Author Ralph Keyes, writing in his book Euphemania: Our Love Affair with Euphemisms about the weaknesses of relying on euphemisms to explain something. In some ways, talking l33t on an IRC channel is a form of euphemism. But it extends a little bit over time—people are going to find new ways to be profane, to say things that pussyfoot around the problem. (Pussyfoot—now there’s a word that some algorithm somewhere has banned for no good reason.)
Is it a folly to fight off the rise of profanity online? Should we just say “fuck it,” and embrace our future where we have no verbal limits?
The truth is, no matter how many filters we put onto individual words or how advanced our technology, the end result either leads to false positives or people working around the blocks entirely. And that’s even with the innovations we’ve seen in the last two decades.
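Both failure modes are easy to reproduce. The sketch below adds a l33tspeak-unscrambling step in front of a whole-word check; the character map, the one-word blocklist, and the sample strings (including the router model) are all hypothetical, picked only to make the point.

```python
import re

# Hypothetical map from common l33t characters back to letters.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Hypothetical one-word blocklist, matched as a whole word.
BLOCKED = re.compile(r"\bsex\b")

def normalized_is_blocked(text: str) -> bool:
    """Lowercase the text, undo l33t substitutions, then look for a blocked word."""
    plain = text.lower().translate(LEET_MAP)
    return BLOCKED.search(plain) is not None

for sample in ("s3x", "$ex", "s.e.x", "router model S3X"):
    verdict = "blocked" if normalized_is_blocked(sample) else "ok"
    print(f"{sample}: {verdict}")
```

The first two evasions get caught, the dotted one walks right through, and the innocent model number gets blocked. Closing one gap opened another.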
It’s worth keeping in mind that our euphemisms are continually adapting. In IRC circa 1997, we had eight equals D. In 2016, we have the suggestive eggplant.
Tom Scott, the excellent YouTuber and overall hilarious Brit, covered this issue last month, doing an on-location clip in the unfortunately named British city of Penistone to help fill the clip with visual puns, while making an astute point about the overall flimsiness of word filters: They’re the linguistic equivalent of the Streisand Effect.
“Just ask any classroom in any school. It only takes one person to know how to evade it, and suddenly, every one of their friends does too,” Scott noted.
It’s OK to call a lost cause a lost cause.