Meme Cultivation:
Silent Censorship?
The risks and benefits of digital content filtering
by Fire Erowid and Earth Erowid
Jun 2005
Citation:   Erowid F, Erowid E. "Silent Censorship? The risks and benefits of digital content filtering" Erowid Extracts. Jun 2005;8:16-18.
There is nothing inherently wrong with content filtering. At its simplest and most elegant, it is the process of taking an unmanageable dataset and pulling out just what is needed or wanted. As a general concept, content filtering includes everything from systems that allow someone to find a recipe for chicken breasts without sorting through porn sites, to the Chinese government's banning of certain topics from search results.

The new reality of omnipresent access to the collective knowledge banks presents broad and novel challenges to the practice of content filtering because old models are no longer functional. Before the internet boom, libraries and schools used humans to filter books, periodicals, and resources that were then provided by the institution. Those doing the filtering were most often local community members who worked in the library or school and answered to those they served (or their parents). The staggering volume of information now available makes this type of hand filtering impossible.

There is a fundamental shift underway towards accessing virtually all information and media through a computer or another electronic device. This shift has created both the opportunity and the need for pervasive machine filtering. Machine filtering has already had a huge impact on what people write and think, but, in the future, human intellectual endeavor will be shaped far more intensely by how these filters work.

"Any content-based regulation of the Internet, no matter how benign the purpose, could burn the global village to roast the pig."
-- U.S. District Judge S. Dalzell in a 1996 CDA opinion
To most people, the filtering inherent in current search technologies (such as Google or PubMed) is already hidden and inscrutable. Future filtering technologies could be built even more deeply and undetectably into the fabric of the digital information space. In the past, censorship required physical acts of removal and ceremonial burning of books. In the future, censorship may be completely silent and invisible.

Filtering as Censorship...
Most people support the rights of individuals to choose what they and their young children view. If you don't want to watch sexually explicit content at home, you certainly shouldn't have to. If you don't think your eight-year-old son is ready for GHB-overdose stories, or you don't want to see certain advertisements, you should be able to make those choices. Using filtering software to identify and block the content you don't want can certainly be a reasonable choice.

But as information technologies develop, the line between content filtering and post-Orwellian censorship grows increasingly blurry. There are a multitude of methods that can be used and numerous points in the publication process where content can be filtered. How and at what stage content gets filtered conveys very different messages about choice versus control.

The same content can be filtered many ways: by an individual choosing not to buy a book; a bookstore choosing not to sell that book; a library being forbidden to carry the book; a library being forbidden to record the existence of the book in its database; an author or publisher being forbidden to produce a book; or an individual being forbidden to own the book. At the extreme end of that spectrum, a government could ban the distribution of all information on a given topic.

We draw the line between filtering and censorship based on two issues, consent and intent. Filtering becomes censorship when a) the choice of what is filtered is no longer controlled by those seeking the information, and b) data is blocked because the filtering entity doesn't want the information to be viewed.

U.S. Federal Legislation
The traditional concept of censorship involves a government (or religious organization acting as government) banning or using criminal law to restrict access to information. Over the last ten years, the U.S. government has tried to enact several pieces of legislation that would censor online information. Most of these have been designed to restrict the materials that children are allowed to view.

In 1996, the Communications Decency Act (CDA) was passed, criminalizing the transmission of "indecent" materials to minors. In 1997, it was ruled unconstitutional by a unanimous Supreme Court (9-0). The Court found that the Act's overly broad language violated the First Amendment guarantee of free speech.1

In 1998, the U.S. Congress passed the Child Online Protection Act (COPA), which would have made it the responsibility of web content providers to ensure that children did not have access to content "harmful to minors". This would have forced content providers to require proof of age (such as a credit card or driver's license) from those accessing adult material. In June 2004, after a long series of court battles, the Supreme Court upheld a preliminary injunction against COPA, finding that COPA had a significant effect on free speech and was not the least restrictive means available of achieving the original goal.2 The Court suggested that end-user filtering software is both less restrictive and possibly more (or at least equally) effective in protecting children from online indecency.

The Children's Internet Protection Act of 2000 (CIPA) required schools and libraries to implement a policy and "appropriate technology" to restrict children's access to graphical depictions of "obscene" internet content.3 Notably, this Act covers only images, not text. Libraries or schools that do not comply risk losing federal funding used to pay for computers and internet access. CIPA was found constitutional by the U.S. Supreme Court in June 2003 and remains in effect.4

Community Standards
Although the United States has one of the world's strongest judicial traditions protecting speech and publication, the courts have allowed the banning of works that violate "community standards" of decency and have no redeeming artistic or literary value.

On the internet, however, the concept of "community standards" dissolves, as every piece of media is available everywhere. City, state, and national boundaries become meaningless and the concept provides little practical guidance. As Supreme Court Justice John Paul Stevens wrote:

"In the context of the Internet, however, community standards become a sword, rather than a shield. If a prurient appeal is offensive in a puritan village, it may be a crime to post it on the World Wide Web."5

Three Filter Types
Problems with nationwide censorship laws led the Supreme Court to suggest content filtering as a better solution for upholding local standards. The massive expansion of the internet into nearly every home and schoolroom in the United States, combined with the exponential growth of online data, has created a demand for filtering. Three of the primary types of internet filtering software are content filters, traffic filters, and search engines.

Content Filters
Many web content filters (products such as Cybersitter, Cyber Patrol, Net Nanny, and Safe Surf) are targeted at parents who want to limit their children's access to "inappropriate materials". These products generally work by categorizing websites into a series of potentially objectionable groupings, such as "sexually explicit", "drugs", or "violence", and blocking access to any site that belongs to a banned category.
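The category approach described above can be sketched in a few lines of Python. This is only an illustration, not how any named product is implemented: the hostnames, the `SITE_CATEGORIES` table, and the `is_blocked` function are all hypothetical, and commercial filters rely on much larger, proprietary category lists.

```python
from urllib.parse import urlparse

# Hypothetical category database: hostnames mapped to the labels a vendor
# might assign. These entries are invented for illustration only.
SITE_CATEGORIES = {
    "example-adult.test": {"sexually explicit"},
    "example-pharm.test": {"drugs"},
    "example-news.test": {"news and media"},
}


def is_blocked(url, banned_categories, site_categories=SITE_CATEGORIES):
    """Return True if the URL's host belongs to any banned category."""
    host = urlparse(url).hostname or ""
    categories = site_categories.get(host, set())
    return bool(categories & set(banned_categories))


banned = {"drugs", "violence"}
print(is_blocked("http://example-pharm.test/dosage", banned))  # True
print(is_blocked("http://example-news.test/", banned))         # False
```

Note that in this sketch the decision is made per site, not per page: once a host is labeled "drugs", every page on it is blocked, whatever that page actually says.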

Some programs allow "stealth filtering". They craft a response page for blocked sites which looks like an ordinary unrelated error message, or they simply redirect the request to an unblocked site. This feature is designed to allow parents to block sites while their (young and/or presumably stupid) children remain clueless. It is marketed to parents as a means of controlling what their children see without letting them know that control has been exerted.

Traffic Filters
Traffic filters are designed for employers who wish to keep employees from "wasting time" or "lowering productivity" while at work. Products in this group include WebSense and Vericept, which are usually installed on firewall or gateway machines. These filters include bandwidth management and adult content categories, along with categories such as "financial sites", "online shopping", and "news and media". Rather than simply blocking all access to non-business sites, employers can set time limits for access to non-work-related sites: employees might be allowed a "quota" of 90 minutes per day of personal surfing, for example. Similarly, employers can set bandwidth thresholds and filters that reduce access to certain types of web activity (media download sites, instant messaging, etc.) when company bandwidth resources are heavily used.
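A time quota of this kind amounts to simple bookkeeping. The sketch below is a minimal illustration that assumes usage is counted in whole minutes per user per day. The `SurfingQuota` class and its method names are invented here and do not reflect the interface of WebSense, Vericept, or any other product.

```python
from collections import defaultdict


class SurfingQuota:
    """Tracks per-user minutes spent on non-work sites for a single day."""

    def __init__(self, daily_limit_minutes=90):
        self.daily_limit = daily_limit_minutes
        self.used = defaultdict(int)  # user -> minutes used so far today

    def request(self, user, minutes):
        """Grant the request only if it fits in the user's remaining quota."""
        if self.used[user] + minutes > self.daily_limit:
            return False
        self.used[user] += minutes
        return True

    def remaining(self, user):
        """Minutes of personal surfing the user still has left today."""
        return self.daily_limit - self.used[user]


quota = SurfingQuota(daily_limit_minutes=90)
print(quota.request("alice", 60))  # True: 60 of 90 minutes used
print(quota.request("alice", 40))  # False: would exceed the 90-minute quota
print(quota.remaining("alice"))    # 30
```

A quota like this limits how long someone browses without regard to what they read, which is what separates it from the content-based blocking discussed next.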

The framing of the issue by the filter providers is that the "non-business" use of the internet is damaging business. WebSense makes the claim that "Internet misuse at work is costing American corporations more than $85 billion annually in lost productivity."6 Yet some of these products provide companies the ability to block their employees' access to websites about labor organizations, a content-based filtering choice much closer to censorship.

As "reading" increasingly comes to mean "reading online", filtering based on content becomes the equivalent of searching people's bags as they come into the workplace and removing books, magazines, or flyers that are not to the liking of management.

Search Engines
Search engines are generally designed to filter according to the end user's needs. But because of their role as central hubs on the internet, they deserve some attention as possible sources of filtering and censorship. Each of the major search engines uses a different set of rules for how sites are included. Yahoo and other similar indexes were originally hand coded. Editorial decisions about what and who to link to were based on quality of content and on who could afford to pay their multi-thousand-dollar listing prices.

Google stands out as a company that did not begin with human-coded listings. It created an algorithm based on counting and weighting the network of links between sites. But, as Google became popular, problems with its "PageRank" algorithm highlighted some of the less obvious effects that specific indexing techniques can have on what people read. Weblog software that included automated linking between any sites using the same (or similar) software caused PageRank to overweight weblog articles. The cross-linking increased each site's PageRank, recursively, and came to be known as "Google Bombing". Individual stories or pages that received thousands of links in a short period of time could force a target page to the top of the search results for a given word or phrase.
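To see why a cluster of cross-linked pages can lift a target, here is a minimal sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is not Google's production algorithm. The damping factor of 0.85, the iteration count, and all page names are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# A tiny hypothetical web: "news" links to both "target" and "other".
base = {"news": ["target", "other"], "other": ["news"], "target": []}

# Add five weblogs that all link to each other and to "target".
blogs = [f"blog{i}" for i in range(5)]
bombed = dict(base)
for b in blogs:
    bombed[b] = [x for x in blogs if x != b] + ["target"]

before = pagerank(base)
after = pagerank(bombed)
print(before["target"] - before["other"])  # the two pages start out tied
print(after["target"] > after["other"])    # the blog ring lifts "target"
```

In the first graph "target" and "other" receive identical link weight. Once the weblogs are added, each one feeds score to the others and all of them feed "target", so it pulls ahead without any change to its own content.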

Some companies exploited this weakness and forced their pages towards the top of Google's results. Google punished those companies by removing all of their sites from Google search results. This illustrates an important aspect of search engine filtering: to the normal user Google appears to provide a transparent list of everything that has been published on the web, but behind the scenes, the corporate and editorial decisions of these companies directly shape search results.

Several search index companies, including Google, have also agreed to provide services to China that have content-based filtering built in. Search results will not include sites that are disapproved of by the Chinese government.7 The traditions of free speech in Europe and North America will make it difficult for governments to force search engines to exclude material, but if top search providers chose to exclude sites with information or links about a narrow topic, that information could effectively disappear from public view.

How Much of Erowid Is Blocked?
We have known for years that many web filters block Erowid, generally categorizing it as a "drug, alcohol, or tobacco related" site. Among the top content filters, Erowid is blocked at varying levels.

CyberPatrol's "drugs" category selectively and somewhat randomly blocks pages and sections of Erowid. The entire Chemicals and Psychoactives directories and all pages beneath them are blocked, while the Herbs and "Smart Drug" directories are available. Only portions of the Plants directory are blocked: access is not allowed to any page about cannabis, mushrooms or Salvia divinorum, and a few pages about poppies or morning glory seeds. All dosage pages and a few pharmaceuticals are blocked. The Freedom, Spirit, and Culture sections remain fully accessible.

Cybersitter simply blocks any page under the domain name. As far as we can tell, the major server-based traffic filters are "Erowid aware" and will usually block access to Erowid when content-based filters are turned on. However, many companies choose not to use content-based filtering in the workplace.

Problems with Filters
A couple of years ago, friends of Erowid mentioned that the domain was being blocked at several German universities that used the free filtering software SquidGuard. We looked at SquidGuard's list of blocked sites and noticed a pattern. Not only was Erowid blocked, but a suspicious number of sites that we linked to were also blocked. It seemed as if just about anything linked to from Erowid was included in their blocked "drugs" category. A little investigation showed how much confidence the SquidGuard maintainers have in their categorization process. A message at the top of the list of sites they block states: "This list is entirely a product of a dumb robot." SquidGuard appears to create its drug-related sites list simply by compiling links from 45 sources, one of which is Erowid. SquidGuard's lists are often used without customization by universities in Europe, the United States, and around the world.

Erowid contains thousands of external links, including links to government and school websites, encyclopedias, major media outlets, sites about meditation, and even Mountain Dew. Many of these sites were blocked by SquidGuard. To test their system, we added a set of very small links at the bottom of an Erowid page, linking to several mainstream German websites, European drug control organizations, U.S. anti-drug campaigns, and a couple of entirely unrelated sites. After about a month, we checked back on SquidGuard's drug-category and, sure enough, the arbitrary links we added were now blocked as "drug content". Amusingly, SquidGuard also blocks a variety of books on Amazon that we link to.
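The behavior we observed is what one would expect from a robot that treats every outbound link on a seed page as belonging to the seed's category. The sketch below illustrates that general technique using only Python's standard library. It is not SquidGuard's actual code, and the `build_blocklist` function, the `LinkCollector` class, and all hostnames are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the hostnames of all absolute links found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).hostname  # None for relative links
        if host:
            self.hosts.add(host)


def build_blocklist(seed_pages):
    """`seed_pages` maps a seed URL to its HTML. Every linked host is blocked."""
    blocked = set()
    for url, html in seed_pages.items():
        seed_host = urlparse(url).hostname
        if seed_host:
            blocked.add(seed_host)
        collector = LinkCollector()
        collector.feed(html)
        blocked |= collector.hosts
    return blocked


# Hypothetical seed page linking to one related and one unrelated site.
seed = {
    "http://seed.test/links": (
        '<a href="http://drug-info.test/page">related</a> '
        '<a href="http://soft-drink.test/">unrelated</a> '
        '<a href="/local">internal</a>'
    )
}
print(sorted(build_blocklist(seed)))
# ['drug-info.test', 'seed.test', 'soft-drink.test']
```

Nothing in such a process examines what the linked page contains, which is why adding an arbitrary link to a seed page is enough to get an unrelated site categorized as "drug content".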

It is far harder to look for these kinds of patterns in the commercial filtering software because they keep their blacklists and categories secret.

Filter Quality and the Mirrored Bubble
Choosing how to select and categorize sites for blocking is a huge undertaking. There is no established method for doing it well. At the sloppier end, blocked categories and lists are haphazardly put together, creating somewhat random and nonsensical lists of blocked sites.

At the more sophisticated end, millions of sites are associated with one or more categories to allow for more selective blocking of specific types of data. The upsides are obvious: the more specific and comprehensive the categorization, the more accurately you can block what you intend to block and not block what you want to see. The potential problems of comprehensive filtering, however, are even worse than those of ham-fisted "systems" like SquidGuard. No truly sophisticated filters exist yet, but when they do, they will have the downside of exacerbating the Mirrored Bubble Syndrome. Individuals will have the ability to precisely and preemptively eliminate exposure to media and publications, whether for themselves, their children or employees, or everyone in a school or library.

This has the potential to make the world appear more homogeneous than it really is. Instead of the web providing exposure to more and more voices and viewpoints, sophisticated filtering could leave large parts of the population exposed only to media in line with their pre-existing biases and tendencies. In one example, WebSense already includes categories to block dissent, singling out "sites that promote change or reform in public policy, public opinion, social practice, economic activities and relationships".

Information About Psychoactives
Unsurprisingly, commercial filtering products tend to heavily favor prohibitionist and anti-drug government sites. Although the alleged premise under which sites are blocked for "drug content" is to protect children from sites "promoting illegal drug use", the practical effect is to block factual information or anything that contradicts zealous, "single-message", politically-driven prohibitionist sources.

This editorial bent comes as no surprise; even peer-reviewed medical journals uncritically publish articles describing Erowid as "partisan" and "pro-drug" while describing anti-drug government sources as "neutral".

The history of CDA, COPA, and CIPA in the United States makes it clear that unconstitutional censorship legislation can pass at the highest levels. Although its censorship section was removed before passage, the first version of the Methamphetamine Anti-Proliferation Act of 2000 included provisions that would have made it illegal (punishable by up to ten years in prison) to publish or provide information relating to the "manufacture of a controlled substance". It also would have criminalized linking to sites that sold "drug paraphernalia".

A Sisyphean Task
Massive countervailing forces do exist. As John Gilmore famously quipped, "The Net interprets censorship as damage and routes around it." While the more conservative elements of society don't like to admit it, creative, artistic, curious, and highly intelligent people drive a lot of technological development. If existing systems become stifled by censorship, new systems will inevitably pop up to replace them.

References
  1. Opinion of the Court. In Reno v. ACLU. 521 U.S. 844 (1997).
  2. Opinion of the Court. In Ashcroft v. ACLU. 542 U.S. 656 (2004).
  3. American Library Association. "CPPA, COPA, CIPA: Which Is Which".
  4. Opinion of the Court. In U.S. v. American Library Association. 539 U.S. 194 (2003).
  5. Stevens J. Dissent. In Ashcroft v. ACLU. 535 U.S. 564 (2002).
  6. Websense. "Internet Use Statistics". 2003.
  7. Lyman J. "Google's China Filtering Draws Fire". TechNewsWorld. Dec 1, 2004.