Philip K. Dick Words Project errata sheet example
Claudia Krenz (040804)

Modern OCR makes it feasible to extract what he wrote off the paper and post it in files on the WWW that you can read on your monitor. Being posted as a unit and never needing to be updated would give this site a unique digital signature, which would help establish convergent and discriminant validity. I propose posting these letters in 3 formats--PDF and ASCII in addition to HTML--to accommodate diverse connection speeds and HDD sizes (human digital diversity). Posting these letters as described would increase the upper limit on the number of copies--as would their being free.

Why generate and post a list documenting the minimal differences between formats when most people won't want to read it because it will be boring? To guard against special and mundane copy entropy. The text below uses the first page of one letter to indicate what a more-or-less automatically generated errata file would look like, e.g., using a convention like "xstkvrx" to symbolize strike-over characters.

First page from Philip K. Dick's 11/26/74 letter to me:
poor facsimile through a scanner *w/o* modern OCR

 pkd-dcL112474



marginal notes

page 1: * this Rules out it Being either a mere Mechanical Response (i.e., echo phenomenon) or the person's own prior forgotten memory.

 

 

 

Contrast the image of the page to your left w/ its surrounding white margins (the black at the bottom is my cropping error): As suggested, last century's OCR can't separate what he wrote from the paper on which he wrote. Modern OCR can scan pages and output typed words as words (ASCII characters): everything but the now old white paper. W/ modern OCR, Philip K. Dick's letters could be output, posted, and copied w/o any image degradation:

  • His typed words would appear as mine do here on the right side of this page: black words on a white background.
  • His handwritten marginal notes and signatures--embedded as image files--would also be of modern quality, far superior to what's shown here.

What he wrote would be output and posted in the 3 formats (see notes 1, 2, and 3 below), the trivial differences between which are shown in the table below. This table--using the first page of his 11/26/74 letter shown to the left as an example--shows differences between the PDF, HTML, and ASCII versions.

The table's first column lists letter date (associated file names), page and paragraph numbers; the next three, the 3 versions--and the last, a description. The four rows show how hand-corrected spellings and misspellings, strike-over characters, and margin notes would be handled in the three formats: his hand-corrected spellings--actual handwritten characters--would be preserved in the PDF version (the HTML and ASCII versions would contain them as corrected); the appearance of the strike-overs would be maintained visually--identical appearance--in the PDF and HTML versions but represented symbolically ("xstkvrx") in the ASCII version. Each letter file would begin w/ the date and salutation he typed and conclude w/ his marginalia. Boilerplate text now at the top of each letter fragment file would, of course, be deleted in all versions.
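The ASCII convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes, hypothetically, that the HTML version wraps each strike-over in a STRIKE element (the actual files might use images instead), and the function name and the sample input are invented for the example.

```python
import html
import re

STRIKE_TOKEN = "xstkvrx"  # the ASCII convention named in the text

# Assumption: strike-overs appear as <strike>...</strike> in the HTML version.
_STRIKE = re.compile(r"<strike\b[^>]*>.*?</strike>", re.IGNORECASE | re.DOTALL)
_TAG = re.compile(r"<[^>]+>")


def html_to_ascii(fragment: str) -> str:
    """Convert one HTML fragment (e.g. a paragraph) to the ASCII convention."""
    text = _STRIKE.sub(STRIKE_TOKEN, fragment)  # each struck run becomes one token
    text = _TAG.sub("", text)                   # drop any remaining markup
    text = html.unescape(text)                  # &amp; becomes &, and so on
    return " ".join(text.split())               # collapse runs of whitespace


# Invented sample input, not a quotation from the letters.
print(html_to_ascii("<p>as <strike>Vergl</strike> Virgil wrote</p>"))
# prints: as xstkvrx Virgil wrote
```

Because every strike-over collapses to the same token, the ASCII version records that something was struck but not what; the struck characters themselves remain visible only in the PDF and HTML versions.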

Table: ERRATA SHEET example using the first page of his 11/26/74 letter

Columns: pkd-dcL# (file names), page#, paragraph# | Version ‡: PDF (note 1), HTML (note 2), ASCII (note 3) | Comment
112674 (dcL112474.gif, dcL112674.html, dcL112674M.gif, dcL112674V.gif), p1, g1
  PDF: thaced
  HTML: white thached hair
  ASCII: white thached hair
  Comment: In this example, he typed "tached," hand inserted an "h" before the "a" but forgot the "t" after it. Words he misspelled would be output as he misspelled them: I'm confident he was trying for "thatched" here (these letters were written even pre-whiteout). The careful reader will have noticed the "sandals (sp)" in the second sentence (third line) of this page: although scarcely discernible using last century's OCR, he had initially typed "sandels (sp)" and hand corrected the word by writing an "a" over the "e" he'd typed. The careful reader will have further noticed that, in the third line of the last paragraph, he typed "Boox Six" although he--so it seems to me--meant "Book Six" of the AENEID. Words as he spelled them, taking into account the hand corrections he made, would be so output.

112674, p1, g2
  PDF: original tongue *
  HTML: original tongue. *
  ASCII: original tongue. *
  Comment: In this example, he hand inserted an asterisk symbol. ASCII equivalents of his other half-dozen diagrams are less straightforward--but eminently feasible.

112674, p1, g3
  PDF: strikeover Virgil
  HTML: strike-over Virgil
  ASCII: xstkvrx Virgil
  Comment: In this example, he typed some characters, backspaced, typed over them, and then typed "Virgil." "xstkvrx" is used as a text convention to symbolize the strike-overs, which vary in appearance. The same convention would be used throughout the ASCII version--unless someone has a better idea. If he sometimes used strike-over characters to suggest this or that about his fear of devolving into Imperial Rome--or its intruding into our own temporal abnormality--it would be apparent in the PDF and HTML versions.

  The actual strike-overs or their symbolic representation--plus the spelling comments--serve to remind readers what they're reading: letters he wrote. Readers w/ opinions about or interest in what others think are his best works can consult the numerous usenet groups.

112674, p1, end
  PDF: his marginalia as he penned them (tVmarginalia 112674) <--like this, except legible w/ modern OCR
  HTML: tVmarginalia 112674 + typed version
  ASCII: * this Rules out it Being either a mere Mechanical Response (i.e., echo phenomenon) or the person's own prior forgotten memory.
  Comment: A typed version of his marginalia would guard against mundane but anticipated typos. Most human readers would be interested in content, what he wrote about--for example, that he wrote that he thought he now had proof of something he had previously suspected, that he was hearing ancient Greek words in his nightly visions of this period. Individual readers must decide whether to take what he wrote literally (e.g., he was being visited by "aliens" like in the movies) or figuratively (e.g., he was working on v.a.l.i.s., a book best seen in its later polished published form). An errata sheet can only address the more mechanical, mundane hermeneutic questions. Interpretation is an individual matter.


‡ Why 3 versions? To accommodate diverse HDD sizes and dial-up speeds (the PDF versions would have the largest footprints and the ASCII versions the smallest). Although the search engines already automatically generate ASCII versions of PDF documents, a site w/ all the letters in 3 versions would serve the goals of universal access and authentication. Having conveniently sized versions would reduce willy-nilly file naming--and sentences nested in paragraphs nested in pages within individual files named by date are easily cited. The web site itself--the name of its URL makes no difference that I can see--would consist of his words and whatever minimalist administrative words and code are needed to display those words.
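As an illustration of how easily such date-named files and page/paragraph references could be handled, here is a minimal Python sketch. It assumes the locator style used in the table ("112674, p1, g2") and the "dcL" file-name prefix seen there; the class and function names are hypothetical, and locators such as "p1, end" are deliberately not covered.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    letter: str      # six-digit date code as used in the file names, e.g. "112674"
    page: int        # his own page number
    paragraph: int   # paragraph number within that page

    def filename(self, ext: str) -> str:
        # Mirrors the "dcL<date>.<ext>" pattern seen in the table's file names.
        return f"dcL{self.letter}.{ext}"

    def cite(self) -> str:
        return f"{self.letter}, p{self.page}, g{self.paragraph}"


# Accepts "112674, p1, g2" and also "112674 p1, g1" (no comma after the date).
_PATTERN = re.compile(r"^\s*(\d{6})\s*,?\s*p(\d+)\s*,\s*g(\d+)\s*$")


def parse_locator(text: str) -> Locator:
    """Parse a 'date, page, paragraph' reference into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a locator: {text!r}")
    return Locator(match.group(1), int(match.group(2)), int(match.group(3)))


loc = parse_locator("112674, p1, g2")
print(loc.filename("html"), "|", loc.cite())
# prints: dcL112674.html | 112674, p1, g2
```

A sentence number could be added as a fourth field in the same way if citations ever need that level of detail.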

I used HTML 3.2 throughout: would an earlier standard be better? I haven't included META tags on the letter fragment files; besides his name and julian dates, which tags might be useful to later readers?
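For concreteness, here is a small Python sketch of what generating such META tags might look like. It rests on two assumptions that are mine, not settled parts of the proposal: "julian date" is read as the ordinal date (year plus day of year), and the tag names "date" and "julian-date" are hypothetical placeholders.

```python
from datetime import date
from html import escape


def meta_tags(author: str, written: date) -> str:
    """Build META lines for one letter file (the tag names are illustrative)."""
    fields = {
        "author": author,
        "date": written.isoformat(),
        # Assumption: "julian date" means year + day of year, e.g. 1974330.
        "julian-date": written.strftime("%Y%j"),
    }
    return "\n".join(
        f'<META NAME="{escape(name)}" CONTENT="{escape(value)}">'
        for name, value in fields.items()
    )


print(meta_tags("Philip K. Dick", date(1974, 11, 26)))
```

If an astronomical Julian Day Number is meant instead, only the one line computing that field would need to change.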

Posting the letter files on the internet will complete my role: the letter files are self-sufficient. In summary, modern OCR could be used to extract what he wrote off the now old paper; this output would be checked by pasting it into any modern spell checker; an errata sheet would document trivial differences between the 3 formats, a simple checklist to facilitate future textual exegesis. Modern file transfer and compression applications make it possible to distribute all the letters he wrote me (in the same mnemonically named files)--for me to deliver the mail--and browsers, for you to read it.
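The errata step summarized above could be generated more or less automatically. The following Python sketch is one possible approach, not a description of any existing tool: it compares two renderings of the same paragraph word by word using the standard difflib module and reports only the runs that differ. The function name, the "[strike-over]" placeholder, and the sample strings are assumptions made for illustration.

```python
import difflib


def errata_rows(locator, reference, candidate):
    """Return (locator, reference words, candidate words) for each differing run."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replacements, insertions, and deletions
            rows.append(
                (locator, " ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2]))
            )
    return rows


# Invented sample text: "[strike-over]" stands in for the visual strike-over.
rows = errata_rows("112674, p1, g3", "wrote [strike-over] Virgil", "wrote xstkvrx Virgil")
print(rows)
# prints: [('112674, p1, g3', '[strike-over]', 'xstkvrx')]
```

Each tuple corresponds to one row of an errata table; the Comment column would still have to be written by hand.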
1 PDF format: What's shown here isn't really PDF but GIF. The images in this column symbolize that the PDF letter files would be identical in appearance to what he wrote and penned (everything but the paper). One file per mnemonically named letter. His pagination--he always numbered his pages (and I kept them in the order he sent them)--and paragraphing would be preserved in all 3 formats. His lineation would also be preserved in the PDF version. The PDF files would not include typed versions of marginalia, because PDF is a look-only format (w/ a primitive find).

2 HTML format (used throughout this proposal): Entries in this column, a combination of text and picture formats, are as they would actually appear. One HTML file per letter (w/ embedded content and linked image files of marginalia). This version would be identical to what he wrote and penned, everything but the paper and hand-corrected spellings (text reflecting substance).

One way this project differs from the esteemed Project Gutenberg is that there is a need to preserve his page numbers but no need to preserve his lineation with hard carriage returns in the HTML version. Since paragraphing is easily preserved, citing is easy: sentences nested in paragraphs nested in numbered pages nested in letters named by date--which is a necessary condition for establishing convergent and discriminant validity in the unique context of these letters.

All files would be standards compliant. As suggested by all of page one of his 11/26/74 four-page letter--shown above--and page ten of his 2/17/1975 fifteen-page letter (see hypertext link in project index), HTML formatting these letters would be simple.

3 ASCII format: Entries in this column appear as they would in the actual version. One file per letter. This version would be identical in content to what he wrote and pretty similar in appearance since over 95% of what he wrote consisted of typed text. Currently existing applications would most easily translate and speak this version.


I'm confident that what I've proposed can be improved.