|There is almost continual chatter--and some talk--about the Information Superhighway. To date, no one has addressed the inevitability of roadside litter. As denizens of an increasingly data- or knowledge-driven society, we need to discriminate between trash and what's substantive, between the illusion of knowledge and knowledge itself.|
Litter comes in many forms and from many sources. There are those
pop-up advertising windows (invariably consisting of a bloated graphic, in
colors like rotting-caterpillar-on-a-molding-cabbage-leaf green).1 I call
them litter and welcome them as much as projectile vomiting; advertising
agencies like Doubleclick call them "content
delivery devices." IMHO, it is the ad agencies who, having
perverted dialog into targeted advertising, litter our virtual landscape
with personal information about us all. We humans might not be worth much
money as bags of nutrients but data about us are a hot
commodity, passed by cookies2 from one Doubleclick customer to another.
But crap is crap, no matter what you call it. Saying otherwise doesn't make it so, any more than the Indiana legislature's 1897 bill could have changed the value of pi (the bill, which implied a value of 3.2, passed the House but never became law). Indeed, there are a number of similarities between ad agencies and corporate hog farms: There is the myth that the industry is self-regulating, meeting stringent standards--and the biggest one of all, that it is as good for the community as is hog excrement for the environment.
Scariest of all though is that the goal of targeted advertising, if realized, will result in each of us getting our own individualized news, tailored to our interests, attitudes, bank accounts ... we'll end up living like hogs on a huge factory farm, lined up row after row in our individualized cells, silent, consuming, no longer living.
One connotation of litter is uselessness: diamonds don't litter a
public beach; discarded fast-food wrappers do .... Certainly, online
opinion polls are meaningless. And
the script by which data about us are passed from our HDDs to somewhere
and from there to somewhere else and back again is not without bugs: it's
dumb to assume otherwise.
Most of all though, research based on cookies is flawed, because manipulating or deleting4 cookie files is easy .... If only one person (me) deletes her cookie and wafer files, BFD; if two, three, four, a hundred, a thousand, a million--no one knows how many--do so, then cookie-based marketing research has a major problem: new visitors are indistinguishable from repeat visitors who always delete their cookies and those who sometimes delete them.
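To make the counting problem concrete, here is a minimal sketch. It assumes a tracker that hands out a fresh ID whenever a visitor shows up without a cookie; the visitor names, the visit log, and the `count_unique_visitors` function are all invented for illustration and are not taken from any real ad server.

```python
from itertools import count


def count_unique_visitors(visits):
    """Count 'unique visitors' the way a cookie-based tracker would.

    visits: list of (person, deleted_cookie) tuples in visit order.
    deleted_cookie is True when the person wiped their cookie file
    before this visit. Hypothetical model, not a real tracker's logic.
    """
    next_id = count(1)   # source of fresh "unique" IDs
    cookie_jar = {}      # person -> ID currently stored on their disk
    ids_seen = set()     # every ID the server has ever logged

    for person, deleted_cookie in visits:
        if deleted_cookie:
            cookie_jar.pop(person, None)        # cookie file wiped
        if person not in cookie_jar:
            cookie_jar[person] = next(next_id)  # server sees a "new" visitor
        ids_seen.add(cookie_jar[person])

    return len(ids_seen)


# Three real people, seven visits; "ann" deletes her cookie before every visit.
visits = [
    ("ann", True), ("bob", False), ("ann", True), ("cat", False),
    ("ann", True), ("bob", False), ("cat", True),
]

real_people = len({person for person, _ in visits})
tracked = count_unique_visitors(visits)
print(real_people, tracked)  # prints "3 6"
```

In this made-up log, three people show up as six "unique visitors," and nothing in the server's records says which IDs belong to the same person.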
It blows my mind that those running the data-warehousing circus haven't noticed the terabytes upon terabytes of crap they're collecting. They just don't get it (or, perhaps, like Enron and Arthur Andersen, they do, but they're pleading the Fifth).
Then there's what's done with those data collected about us: market data are typically analyzed with techniques collectively called data mining, which statisticians point out are inherently flawed, coupled with too many incentives for seeing patterns and too few for detecting error. Me, I'll take plain old vanilla statistics anytime. To further underscore the farce, the HR-types doing the hiring have a different idea of the skills needed for data analysis than do those for whom they're doing the hiring. We see a market governed by applying questionable techniques to error-besotted data:
I conjure up titles like "Abbott and Costello Collect Doubleclick Data" and "The 3 Stooges Analyze It." A scene from a Philip K. Dick novel comes to mind:
Gubble, gubble. Think of it, those data whose side effect is your irritation about the invasion of your privacy, your HDD, the loss of your time as those pop-up windows load and cookies are swapped--multiplied over all of us--are meaningless. Gubble, gubble.
|It certainly hasn't taken long to trash the Information Superhighway. It can be saved--but will it be? Before we can even begin to answer the question, we must look at the concomitant phenomena of road dirt and road kill. One could regard the demise of many dot.coms as road kill, but I regard it as sanitation. What I care about is how much longer our voices can be heard above the din, the crap, the litter.|
1 I don't mean to suggest that, without advertising flotsam, the internet would be, in Hemingway's phrase, a "clean, well-lighted place." Usenet itself has been likened to a "herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it" (Gene Spafford): Decentralization, coupled with our many languages and our sheer numbers, ensures that our virtual world is chaotic and cacophonous.
2 Cookies3 are files that contain information about the sites you visit and what you do while there. They're passed by a server (read: the page you're trying to load) to your HDD: Suppose you visit site aaa.com for the first time; while loading the page, you'll see, at the bottom of your browser window, "receiving data from aaa.com," which means that you're loading the page--and may mean that aaa.com is depositing a cookie on your HDD. If you visit aaa.com again, your browser hands that cookie back, and the server checks it to see if you're a repeat visitor. How does the server recognize you? Because the cookie you got on your first visit contains a unique identification (ID) number.
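For the curious, the exchange can be sketched in a few lines of Python using the standard library's `http.cookies` module. This is only an illustration under stated assumptions: the cookie name `uid`, the `handle_request` function, and the use of a random UUID as the ID are hypothetical, and real sites (aaa.com or anyone else) may do it differently.

```python
import uuid
from http.cookies import SimpleCookie


def handle_request(cookie_header):
    """Return (visitor_id, is_repeat, set_cookie_header_or_None).

    cookie_header is the raw Cookie header the browser sent, or "" if none.
    The cookie name "uid" is an assumption made for this sketch.
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    if "uid" in jar:
        # Browser sent the ID back: the server treats this as a repeat visit.
        return jar["uid"].value, True, None

    # No cookie arrived: mint a new ID and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply["uid"] = visitor_id
    reply["uid"]["path"] = "/"
    return visitor_id, False, reply.output(header="Set-Cookie:")


# First visit: no cookie yet, so the server issues one.
first_id, first_repeat, set_cookie = handle_request("")
# Second visit: the browser returns the stored cookie.
second_id, second_repeat, _ = handle_request(f"uid={first_id}")
# Third visit after deleting the cookie file: the server starts over.
third_id, third_repeat, _ = handle_request("")
```

The third call is the interesting one: with the cookie gone, the server has nothing to go on and issues a brand-new ID to the same person.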
3 Oh no, a footnote to a footnote: Cookies are, of course, not the only way to identify YOU and what YOU'RE doing; au contraire, consider static IP addresses and Pentium chips with unique ID numbers; there are many ways, most more reliable than cookies.
4 What happens when a cookie file is deleted? As shown in the three lines of each of Tables 1 and 2 below, the ID number changes. To obtain the data shown in the first table, I cleared my cache, loaded Wired's news page with Netscape 4.7 (on my old Macintosh Performa), quit Netscape, and then opened the cookie file in a word processor (and pasted in the results). Except for the ID number in the fourth column, the three lines are identical.
TABLE 1. Wired Cookie Unique IDs using Netscape 4.7 on my Macintosh Performa
What this means is that anyone who deletes their cookie gets a new "unique" ID number every time they land on a site. The result is not a database (it is a mess).
Naturally--who wouldn't?--I was compelled to collect more data--hence Table 2--this time using Netscape 4.6.1 on my iBook:
TABLE 2. Wired Cookie Unique IDs using Netscape 4.6.1 on my iBook
The values in the third column are again identical to each other--but different from those in the previous table--I'd guess that the "214" in the third column stands for "Mac"--and the rest for different kinds of Macs (this is a nominal variable, or labelled category). Are these ID#s unique? Yes, in the sense of capturing a part of a particular point in time--a cross-section--but they are without value for making any predictions about future behavior. You can attach whatever labels you want to cookie data: they all work equally well, because they all are equally meaningless.
Other cookies in the wild (I have been more-and-less systematically studying cookie droppings for years). To collect the cookie immediately following, I went to a search engine and pasted in "Performance-Based Data Management Initiative." I purposefully didn't delete my cookie file; I followed a few links on the search engine results page and noticed how many .mil consulting firms had .org addresses; I then opened a new window to onelook and typed in the word "totalitarian": the Google cookie changed not at all. The moral? Constants don't help predict human behavior; variables can be useful. Few notice the difference.
TABLE 3. Google Cookie Unique ID using iCab 2.7 on my iMac (static during period of cognitive leap from noticing a plethora of .mil businesses with .org addresses to searching for the definition of "totalitarian").
The cookie changed not at all during my cognitive leap and subsequent searching behavior. The cookie is oblivious to the searcher's thoughts and the behaviors they trigger.
|What kind of assumptions, BTW, produce an expiration date some 30 years hence? I deleted the cookie, as I always do, which means I'm always IDed as a "new unique visitor," which makes cookie data not a dataset but a mess, a pile of numbers: no more meaningful than typical NCLB fudged accountability numbers.|
Here's another "unique" ID Google has assigned to me:
TABLE 4. Another Google Cookie Unique ID using iCab 2.7 on my iMac (reasoning backwards from the assignment date, it was assigned 3 months and 10 days after the 'unique' ID shown in Table 3).
The only thing these two "unique" Google ID#s have in common is their expiration date: using constants like this suggests google.empire's cookie collectors don't understand what data collection means. The current chatter about a Yahoo study 'reporting' 139 and 249 percent increases in searches on topics displayed after an ad campaign suggests the same--and only confirms my long-held belief that marketing research is an oxymoron: toast--and a pox on us all. Where are their data?
5 I sent an email to HHS about last year's proposed changes to extant medical privacy regulations--about individual medical records being subject to data trawling--and got back a response that made me think I'd emailed a trash can (I doubt my comments made it into the Federal Register). But here's what I sent (the first two paragraphs just say that since hhs.gov's web link for e-commenting wasn't working, I was emailing my e-comments).
6 Wouldn't the web be a better place if the dominant philosophy of SEO were language-based and organic, the digital analog of one still successfully used by ordinary flowers in the physical world: provide the search engines what they want, useful information, comprehensible and creative content, simultaneously promote your site--and help build a better information superhighway:
Doing so both promotes individual sites--and is the only way to stop information pollution. The search engines will seek out useful, interesting, and useable content--flowers bloom and bees come (and, of course, when people are online searching is when they're most receptive to new information). We also, obviously, turn first to sources we trust, sources that, based on prior experience, are credible. The long-term goal is web content "directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms ... [that] promote understanding, tolerance ... friendship ... peace" (Universal Declaration of Human Rights). What you say is the basis for both page rank and the text summary search engines show on their results pages. Pictures may be worth thousands of words--but they're not used to generate text summaries.
Also remember that, no matter what you say, it will be interpreted in the context of the post-"tragedy of the commons" digital world. People call commercial surface mail "junk," discarding it with resentment and without examination; people want to be on "no call" lists to escape telemarketing and, as soon as they discovered they had cookies, started deleting them. Public response to unsolicited commercial messages has generalized across media ... Plummeting public confidence in private and public institutions paralleled the growth of the U.S. advertising industry, perhaps because, as people learned to approach commercial messages more critically--not believe everything they said--their skepticism--a rational human adaptive behavior--generalized to non-commercial messages. Swatting away commercial messages has become nearly as automatic as swatting away biting insects. Digital repellent behavior is not limited to individuals: Search engines alter their ranking formulas after they detect new methods of artificial optimization (page rank obtained by one-way link campaigns isn't all that different from baseball records obtained by steroid-enhanced performance). Current best practices of search engine optimization seem to be evolving like drug-resistant strains of malaria:
The alternative to the mutation model of optimization is to give the search engines what they want: useable, useful, interesting, creative text or content. Giving the engines what they want enhances and stabilizes page rank and so the likelihood that searchers will find you. Doing so is, additionally, probably the only way to stop information pollution.
Besides, people who ask questions now have more control over their search results than imaginable even a few years ago. It is rational to try to cut through meaningless commercial clutter: someone looking for anti-popup software might reason that only credible sites include instructions (sites with "faq" or "help" files won't be excluded). For the searcher, a rational goal is a results page with a handful of totally relevant links--even better, one whose text summaries answer the presenting question. It takes little effort to build a better search by pasting terms from the text summaries on a first results page into its search window and then hitting the return key. Additionally, the order of keywords in a query influences the content of the text summaries seen on the results pages. Some ad agencies now argue that people will start "cooperating" when told they can't be individually identified by the individually identifiable data collected to track them: bullshit; past behavior suggests that users will *not* start cooperating.
Andrews, Jim, Digital Langu(im)age: Language and Image as Objects in a Field, Perihelion, II (4), 01
A.S. Waldinger's Internet Cookie
Center for Democracy and Technology's Privacy Guide
Commentary, Spills, Spying, Pollution, Labor Violations, Greed and Political Corruption Define an Industry, Agbiz Tiller Online, 1 Mar 97
Cookie Central Home Page
Cyberatlas Staff, IT Worker Shortage Continues, Datamation, 4 Oct 01
Dick, Philip K., Martian Time-Slip, NY: Ballantine, 1964
Doubleclick NASDAQ quote
Hand, D. J., Data Mining: Statistics and More?, The American Statistician, 52(2), 98
Indiana State Legislature, House Bill 246, 1897
Junkbusters' Home Page
Junkbusters, Junk Data: Heard the Wholesale Electronic Gossip About You?, 13 Sept 00
Meland, M., The Other Online Profiler, Forbes, 25 Feb 00
Privacy International's Home Page
Privacy.org's Privacy Site
Rein, L., Your Data as Online Commodity, Wired, 2 June 98
Reuters, MS Patches Privacy Peephole, Wired, 7 Mar 99
Roger Clarke's Dataveillance and Information Privacy Pages
Rogers, Z., Seven Minds on the Online Ad Market, Silicon Alley Views, 11 Jan 02
SAS Institute, Avenue A: Rewriting the Rules of Online Advertising, Sas.Com Magazine, May/June 01
Scheeres, J., New and Improved Ad Accounting, Wired, 16 Jan 02
Suellentrop, C., Why Online Polls Are Bunk, Slate, 12 Jan 00
Trout, K. [as listed], Discourse on Something or Another, Apocalypse Culture Productions, #19, 31 Oct 95
Wasserman, E., Stuck in the Middle, The Standard, 6 Mar 00
Wired News Page
Yahoo, Display Ads Increase Search, 9 May 05