Littering the Information Superhighway

Information Pollution Begins at Home


by Claudia Krenz, Ph.D. (datafriend @ gmail-.-com)

There is almost continual chatter--and some talk--about the Information Superhighway. To date, no one has addressed the inevitability of roadside litter. As denizens of an increasingly data- or knowledge-driven society, we need to discriminate between trash and what's substantive, between the illusion of knowledge and knowledge itself.


Litter comes in many forms and from many sources. There are those pop-up advertising windows (invariably consisting of a bloated graphic, in colors like rotting-caterpillar-on-a-molding-cabbage-leaf green).1 I call them litter and welcome them as much as projectile vomiting; advertising agencies like Doubleclick call them "content delivery devices." IMHO, it is the ad agencies who, having perverted dialog into targeted advertising, litter our virtual landscape with personal information about us all. We humans might not be worth much money as bags of nutrients but data about us are a hot and salable commodity, passed by cookies2 from one Doubleclick customer to another.

Cookies are important to ad agencies. As noted by one, "The key to tracking ... online behavior is the 'cookie,' a unique, anonymous identifier .... 'We probably have one of the richest databases .... of anonymous, individual customer behavior on the Internet .... once we knew which cookies represented the highest value customer, we could see where else those cookies spend time on the Internet.'"

Yet, hatred for cookies is expressed by poets commenting on digital langu(im)age and in the web postings of artists and writers. Cookies are intrusive, an invasion of privacy (Junkbusters, Privacy Site, Privacy International, Center for Democracy and Technology's Privacy Guide, Electronic Privacy Information Center, Dataveillance and Privacy, and Cookie Central).

Many dismiss fear of cookies: A Microsoft™ spokesperson noted that "most people don't quite get how computers work, and they're suspicious ... That's probably why a lot of these privacy concerns are happening." And Doubleclick itself might contend that any so-called concerns stem from its competitors' jealousy (e.g., one of its rivals has also amassed a huge database of consumer information). Certainly, dot.coms everywhere are doing their bit to level the playing field by waging war with each other for their piece of the action, and the pace of B-2-B dumpster diving is increasing ... It's all rather drearily reminiscent of 1919, when the "Horse Activity Association of America tried to lobby government to point out flaws in mechanized farming" (Wasserman, 00).

But crap is crap, no matter what you call it. Saying otherwise doesn't make it so, any more than the Indiana House's 1897 bill implying a tidy value for pi (usually read as 3.2) could have made it so. Indeed, there are a number of similarities between ad agencies and corporate hog farms: There is the myth that the industry is self-regulating, meeting stringent standards--and the biggest one of all, that it is as good for the community as is hog excrement for the environment.

Scariest of all though is that the goal of targeted advertising, if realized, will result in each of us getting our own individualized news, tailored to our interests, attitudes, bank accounts ... we'll end up living like hogs on a huge factory farm, lined up row after row in our individualized cells, silent, consuming, no longer living.

One connotation of litter is uselessness: diamonds don't litter a public beach; discarded fast-food wrappers do .... Certainly, online opinion polls are meaningless. And the script by which data about us are passed from our HDDs to somewhere and from there to somewhere else and back again is not without bugs: it's dumb to assume otherwise.

Most of all though, research based on cookies is flawed, because manipulating or deleting4 cookie files is easy .... If only one person (me) deletes her cookie and wafer files, BFD; if two, three, four, a hundred, a thousand, a million--no one knows how many--do so, then cookie-based marketing research has a major problem: new visitors are indistinguishable from repeat visitors who always delete their cookies and those who sometimes delete them.
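The arithmetic behind that problem can be sketched in a few lines of Python. This is a toy model, not anyone's actual tracking code: the visitor names, the `count_unique_ids` function, and the rule that a server hands out a fresh ID whenever it sees no cookie are all assumptions made for illustration.

```python
import itertools

def count_unique_ids(visits, deleters):
    """Count the 'unique visitors' a cookie-based tracker would report.

    visits   -- sequence of visitor names, one entry per page visit
    deleters -- set of visitors who delete their cookie after every visit
    (Both are hypothetical inputs; real logs contain only the IDs.)
    """
    next_id = itertools.count(1)
    cookie_jar = {}    # visitor -> ID currently stored on that visitor's disk
    seen_ids = set()   # every ID the server has ever observed
    for person in visits:
        if person not in cookie_jar:
            cookie_jar[person] = next(next_id)  # no cookie found: issue a fresh ID
        seen_ids.add(cookie_jar[person])
        if person in deleters:
            del cookie_jar[person]              # cookie file wiped after the visit
    return len(seen_ids)

visits = ["ann", "bob", "ann", "cy", "bob", "ann"]
print(count_unique_ids(visits, deleters=set()))    # 3 people, 3 IDs
print(count_unique_ids(visits, deleters={"ann"}))  # 3 people, 5 IDs
```

The server sees only the IDs, never the names, so in the second run it cannot tell that three of its five "unique visitors" are the same person.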

Yet, included in the advertising industry's new accountability standards is counting the number of new visitors.

It blows my mind that those running the data-warehousing circus haven't noticed the large, enormous even, quantities of crap (terabytes of it) they're collecting. They just don't get it (or, perhaps, like Enron and Arthur Andersen, they do, but they're pleading the Fifth).

Then there's what's done with those data collected about us: market data are typically analyzed with techniques collectively called data mining, which statisticians point out are inherently flawed, coupled with too many incentives for seeing patterns and too few for detecting error. Me, I'll take plain old vanilla statistics anytime. To further underscore the farce, the HR-types doing the hiring have a different idea of the skills needed for data analysis than do those for whom they're doing the hiring. We see a market governed by applying questionable techniques to error-besotted data:

I conjure up titles like "Abbott and Costello Collect Doubleclick Data" and "The 3 Stooges Analyze It." A scene from a Philip K. Dick novel comes to mind:

His head was a skull that took in greens and bit them; inside him the greens became rotten things....a dead sack teeming with gubbish....The outside that fooled almost everyone, it was painted pretty and smelled good....the dead bug words popped from his mouth... (Martian Time-Slip, NY: Ballantine, 1964, pp. 133-134).

Gubble, gubble. Think of it, those data whose side effect is your irritation about the invasion of your privacy, your HDD, the loss of your time as those pop-up windows load and cookies are swapped--multiplied over all of us--are meaningless. Gubble, gubble.

It certainly hasn't taken long to trash the Information Superhighway. It can be saved--but will it be? Before we can even begin to answer the question, we must look at the concomitant phenomena of road dirt and road kill. One could regard the demise of many dot.coms as road kill, but I regard it as sanitation. What I care about is how much longer our voices can be heard above the din, the crap, the litter.

A systematic literature review of the different cookie studies published in the different advertising journals in the different years of the last decade would be a hoot: LOL, I wonder how many will be based on no data at all ... Information pollution flows downward like the drainage basin of a river: it self-organizes to become a torrent of misinformation, disinformation--trivial, irrelevant, and mundane error. Several recent ad industry studies--their statistical results are quoted everywhere but data and citations are nowhere to be found--have discovered that more than half of surveyed computer users *delete* cookies. Naturally, based on their new self-awareness that their data warehouses--like most corporate hog farms--are full of crap, the ad industry proposes seemingly novel remedies: More and more cookies include IP addresses as part of the value for the cookie ID# variable; some even 'promise' to restore deleted user cookies. Ad agencies will, though, doubtless offer the same guarantee as before.

 

Footnotes

1 I don't mean to suggest that, without advertising flotsam, the internet would be, in Ernest Hemingway's phrase, a "clean, well-lighted place." Usenet itself has been likened to a "herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it" (Gene Spafford): Decentralization, coupled with our many languages and our sheer numbers, ensures that our virtual world is chaotic and cacophonous.

2 Cookies3 are files that contain information about the sites you visit and what you do while there. They're passed by a server (read the page you're trying to load) to your HDD: Suppose you visit site aaa.com for the first time; while loading the page, you'll see, at the bottom of your browser window, "receiving data from aaa.com," which means that you're loading the page--and may mean that aaa.com is depositing a cookie on your HDD. If you visit aaa.com again, the server will check your cookie file to see if you're a repeat visitor. How does the server recognize you? Because the cookie you got on your first visit contains a unique identification (ID) number.
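A minimal sketch of that exchange, seen from the server's side, using Python's standard `http.cookies` module. The cookie name `visitor_id`, the `handle_request` function, and the use of a random UUID as the ID are assumptions for illustration; aaa.com's actual code could look quite different.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name

def handle_request(cookie_header):
    """Return (visitor_id, is_repeat_visitor, set_cookie_value_or_None).

    cookie_header is the text of the Cookie header the browser sent,
    or None/"" if it sent none (first visit, or the cookie was deleted).
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # The browser returned an ID: the server treats this as a repeat visit.
        return jar[COOKIE_NAME].value, True, None
    # No cookie: mint a new "unique" ID and ask the browser to store it.
    new_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = new_id
    reply[COOKIE_NAME]["path"] = "/"
    return new_id, False, reply[COOKIE_NAME].OutputString()

# First visit: no cookie sent, so a new ID is issued.
first_id, repeat, set_cookie = handle_request(None)
# Second visit: the browser sends the stored cookie back.
same_id, repeat_again, _ = handle_request(f"{COOKIE_NAME}={first_id}")
print(repeat, repeat_again, same_id == first_id)
```

Note that the only evidence of a "repeat visitor" is whatever the browser chooses to send back; a request with no cookie always looks like a first visit.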

3 Oh no, a footnote to a footnote: Cookies are, of course, not the only way to identify YOU and what YOU'RE doing; au contraire, consider static IP addresses and Pentium chips with unique ID numbers; there are many ways, most more reliable than cookies.

4 What happens when a cookie file is deleted? As shown in the three lines of each of Tables 1 and 2 below, the ID number changes. To obtain the data shown in the first table, I cleared my cache, loaded Wired's news page with Netscape 4.7 (on my old Macintosh Performa), quit Netscape, and then opened the cookie file in a word processor (and pasted in the results). Except for the ID number in the fourth column, the three lines are identical.

TABLE 1. Wired Cookie Unique IDs using Netscape 4.7 on my Macintosh Performa

.wired.com TRUE/FALSE 2145917201 p_uniqid  8IX5N29LCMygD4h1cD
.wired.com TRUE/FALSE 2145917201 p_uniqid  8IX5N29LCMygD4h1cD
.wired.com TRUE/FALSE 2145917201 p_uniqid  8I36q29KOF/GZlcjLB

What this means is that anyone who deletes their cookie gets a new "unique" ID number every time they land on a site. The result is not a database (it is a mess).

Naturally--who wouldn't?--I was compelled to collect more data--hence Table 2--this time using Netscape 4.6.1 on my iBook:

TABLE 2. Wired Cookie Unique IDs using Netscape 4.6.1 on my iBook

.wired.com TRUE/FALSE 2145875565 p_uniqid  8U3bC0NMQt0CdD36OB
.wired.com TRUE/FALSE 2145875565 p_uniqid  8U3bC0NMQt0CdD36OB
.wired.com TRUE/FALSE 2145875565 p_uniqid  8U3bp19KyUW80VM48B

The values in the third column are again identical to each other--but different from those in the previous table--I'd guess that the "214" in the third column stands for "Mac"--and the rest for different kinds of Macs (this is a nominal variable, or labelled category). Are these ID#s unique? Yes, in the sense of capturing a part of a particular point in time--a cross-section--but without value for making any predictions about future behavior. You can attach whatever labels you want to cookie data: they all work equally fine, because they all are equally meaningless.
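That guess can be checked against another reading of the third column. Netscape-style cookie files conventionally store each cookie's expiration as a count of seconds since 1 January 1970; treating the column that way is an assumption here, since nothing in the pasted lines labels it. Under that assumption, a short Python sketch (the `parse_row` helper and its field names are hypothetical) decodes the rows from Tables 1 and 2:

```python
from datetime import datetime, timezone

# Rows copied from Tables 1 and 2 (one repeated row from each table omitted).
ROWS = [
    ".wired.com TRUE/FALSE 2145917201 p_uniqid  8IX5N29LCMygD4h1cD",
    ".wired.com TRUE/FALSE 2145917201 p_uniqid  8I36q29KOF/GZlcjLB",
    ".wired.com TRUE/FALSE 2145875565 p_uniqid  8U3bC0NMQt0CdD36OB",
    ".wired.com TRUE/FALSE 2145875565 p_uniqid  8U3bp19KyUW80VM48B",
]

def parse_row(row):
    """Split one pasted cookie line into labelled fields.

    ASSUMPTION: the third field is a Unix timestamp (seconds since 1970, UTC).
    """
    domain, flags, third, name, value = row.split()
    expires = datetime.fromtimestamp(int(third), tz=timezone.utc)
    return {"domain": domain, "flags": flags, "expires": expires,
            "name": name, "value": value}

for row in ROWS:
    fields = parse_row(row)
    print(fields["expires"].isoformat(), fields["value"])
# The first two rows decode to 2038-01-01T00:06:41+00:00,
# the last two to 2037-12-31T12:32:45+00:00.
```

Read this way, the third column is a constant for each machine and says nothing about the visitor, while the ID in the last column is the only thing that changes when the cookie file is deleted.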

Other cookies in the wild (I have been more-and-less systematically studying cookie droppings for years). To collect the cookie immediately following, I went to a search engine and pasted in "Performance-Based Data Management Initiative." I purposefully didn't delete my cookie file; I followed a few links on the search engine results page and noticed how many .mil consulting firms had .org addresses; I then opened a new window to onelook and typed in the word "totalitarian": the Google cookie changed not at all. The moral? Constants don't help predict human behavior; variables can be useful. Few notice the difference.

TABLE 3. Google Cookie Unique ID using iCab 2.7 on my iMac (static during period of cognitive leap from noticing a plethora of .mil businesses with .org addresses to searching for the definition of "totalitarian").

google_series4.5F

cookie: google_s4.5F2

The cookie changed not at all during my cognitive leap and subsequent searching behavior. The cookie is oblivious to the searcher's thoughts and the behaviors they trigger.

What kind of assumptions, BTW, produce an expiration date some 30 years hence? I deleted the cookie, as I always do, which means I'm always IDed as a "new unique visitor," which makes cookie data not a dataset but a mess, a pile of numbers: no more meaningful than typical NCLB fudged accountability numbers.

Here's another "unique" ID Google has assigned to me:

TABLE 4. Another Google Cookie Unique ID using iCab 2.7 on my iMac (reasoning backwards from the assignment date, it was assigned 3 months and 10 days after the 'unique' ID shown in Table 3).

google_series5.9Q

The only thing these two "unique" Google ID#s have in common is their expiration date: using constants like this suggests google.empire's cookie collectors don't understand what data collection means. The current chatter about a Yahoo study 'reporting' 139 and 249 percent increases in searches on topics displayed after an ad campaign suggests the same--and only confirms my long-held belief that marketing research is an oxymoron: toast--and a pox on us all. Where are their data?

5 I sent an email to HHS about last year's proposed changes to extant medical privacy regulations--about individual medical records being subject to data trawling--and got back a response that made me think I'd emailed a trash can (I doubt my comments made it into the Federal Register). But here's what I sent (the first two paragraphs just say that since hhs.gov's web link for e-commenting wasn't working, I was emailing my e-comments).

6 Wouldn't the web be a better place if the dominant philosophy of SEO were language-based and organic, the digital analog of one still successfully used by ordinary flowers in the physical world: provide the search engines what they want, useful information, comprehensible and creative content, simultaneously promote your site--and help build a better information superhighway:

Doing so both promotes individual sites--and is the only way to stop information pollution. The search engines will seek out useful, interesting, and useable content--flowers bloom and bees come (and, of course, when people are online searching is when they're most receptive to new information). We also, obviously, turn first to sources we trust, sources that, based on prior experience, are credible. The long-term goal is web content "directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms ... [that] promote understanding, tolerance ... friendship ... peace" (Universal Declaration of Human Rights). What you say is the basis for both page rank and the text summary search engines show on their results pages. Pictures may be worth thousands of words--but they're not used to generate text summaries.

Also remember that, no matter what you say, it will be interpreted in the context of the post-"tragedy of the commons" digital world. People call commercial surface mail "junk," discarding it with resentment and without examination; people want to be on "no call" lists to escape telemarketing and, as soon as they discovered they had cookies, started deleting them. Public response to unsolicited commercial messages has generalized across media ... Plummeting public confidence in private and public institutions paralleled the growth of the U.S. advertising industry, perhaps because, as people learned to approach commercial messages more critically--not believe everything they said--their skepticism--a rational human adaptive behavior--generalized to non-commercial messages. Swatting away commercial messages has become nearly as automatic as swatting away biting insects. Digital repellent behavior is not limited to individuals: Search engines alter their ranking formulas after they detect new methods of artificial optimization (page rank obtained by one-way link campaigns isn't all that different from baseball records obtained by steroid-enhanced performance). Current best practices of search engine optimization seem to be evolving like drug-resistant strains of malaria:

The alternative to the mutation model of optimization is to give the search engines what they want: useable, useful, interesting, creative text or content. Giving the engines what they want enhances and stabilizes page rank and so the likelihood that searchers will find you. Doing so is, additionally, probably the only way to stop information pollution.

Besides, people who ask questions now have more control over their search results than imaginable even a few years ago. It is rational to try to cut through meaningless commercial clutter: someone looking for anti-popup software might reason that only credible sites include instructions (sites with "faq" or "help" files won't be excluded). For the searcher, a rational goal is a results page with a handful of totally relevant links--even better, one whose text summaries answer the presenting question. It takes little effort to build a better search by pasting terms from the text summaries on a first results page into its search window and then hitting the return key. Additionally, the order of keywords in a query influences the content of the text summaries seen on the results pages. Some ad agencies now argue that people will start "cooperating" when told they can't be individually identified by the individually identifiable data collected to track them: bullshit; past behavior suggests that users will *not* start cooperating.

Bibliography


Last updated July 07.

Andrews, Jim, Digital Langu(im)age: Language and Image as Objects in a Field, Perihelion, II (4), 01

A.S. Waldinger's Internet Cookie

Center for Democracy and Technology's Privacy Guide

Commentary, Spills, Spying, Pollution, Labor Violations, Greed and Political Corruption Define an Industry, Agbiz Tiller Online, 1 Mar 97

Cookie Central Home Page

Cyberatlas Staff, IT Worker Shortage Continues, Datamation, 4 Oct 01

Dick, Philip K., Martian Time-Slip, NY: Ballantine, 1964

Doubleclick NASDAQ quote

Electronic Privacy Information Center Page

Hand, D. J., Data Mining: Statistics and More?, The American Statistician, 52(2), 98

Indiana State Legislature, House Bill 246, 1897

Junkbusters' Home Page

Junkbusters, Junk Data: Heard the Wholesale Electronic Gossip About You?, 13 Sept 00

Meland, M., The Other Online Profiler, Forbes, 25 Feb 00

Privacy International's Home Page

Privacy.org's Privacy Site

Rein, L., Your Data as Online Commodity, Wired, 2 June 98

Reuters, MS Patches Privacy Peephole, Wired, 7 Mar 99

Roger Clarke's Dataveillance and Information Privacy Pages

Rogers, Z., Seven Minds on the Online Ad Market, Silicon Alley Views, 11 Jan 02

SAS Institute, Avenue A: Rewriting the Rules of Online Advertising, Sas.Com Magazine, May/June 01

Scheeres, J., New and Improved Ad Accounting, Wired, 16 Jan 02

Suellentrop, C., Why Online Polls Are Bunk, Slate, 12 Jan 00

Trout, K. [as listed], Discourse on Something or Another, Apocalypse Culture Productions, #19, 31 Oct 95

Wasserman, E., Stuck in the Middle, The Standard, 6 Mar 00

Wired News Page

Yahoo, Display Ads Increase Search, 9 May 05