Why generate and post a list documenting the minimal differences between formats when most people will find it too boring to read? To guard against copy entropy, both special and mundane. The text below uses the first page of one letter to show what a more-or-less automatically generated errata file would look like, e.g., using a convention like "xstkvrx" to symbolize strike-over characters.
First page from Philip K. Dick's 11/26/74 letter to me:
poor facsimile through a scanner *w/o* modern OCR
page 1: * this Rules out it Being either a mere Mechanical Response (i.e., echo phenomenon) or the person's own prior forgotten memory.
Contrast the image of the page to your left w/ its surrounding white margins (the black at the bottom is my cropping error): As suggested, last century's OCR can't separate what he wrote from the paper on which he wrote. Modern OCR can scan pages and output typed words as words (ASCII characters): everything but the now old white paper. W/ modern OCR, Philip K. Dick's letters could be output, posted, and copied w/o any image degradation:
What he wrote would be output and posted in three formats (1, 2, 3), the trivial differences between which are shown in the table below. This table--using the first page of his 11/26/74 letter, shown to the left, as an example--shows the differences between the PDF, HTML, and ASCII versions.
The table's first column lists letter date (associated file names), page and paragraph numbers; the next three, the 3 versions--and the last, a description. The four rows show how hand-corrected spellings and misspellings, strike-over characters, and margin notes would be handled in the three formats: his hand-corrected spellings--actual handwritten characters--would be preserved in the PDF version (the HTML and ASCII versions would contain them as corrected); the appearance of the strike-overs would be maintained visually--identical appearance--in the PDF and HTML versions but represented symbolically ("xstkvrx") in the ASCII version. Each letter file would begin w/ the date and salutation he typed and conclude w/ his marginalia. Boilerplate text now at the top of each letter fragment file would, of course, be deleted in all versions.
Table: ERRATA SHEET example using the first page of his 11/26/74 letter
|PDF 1||HTML 2||ASCII 3||Comment|
|112674 (dcL112474.gif,dcL112674.html,dcL112674M.gif,dcL112674V.gif) p1, g1||white thached hair||white thached hair||In this example, he typed "tached," hand inserted an "h"
before the "a" but forgot the "t" after it. Words he misspelled would be
output as he misspelled them: I'm confident he was trying for "thatched"
here (these letters were written even pre-whiteout). The careful
reader will have noticed the "sandals (sp)" in the second sentence (third
line) of this page: although scarcely discernible using last century's
OCR, he had initially typed "sandels (sp)" and hand corrected the word
by writing an "a" over the "e" he'd typed. The careful
reader will have further noticed that, in the third line of the last
paragraph he typed "Boox Six" although he--so it seems to me--meant "Book Six" of
the AENEID. Words as he spelled them, taking into account the
hand corrections he made, would be so output.
|112674, p1, g2||original tongue. *||original tongue. *||In this example, he hand inserted an asterisk symbol. ASCII equivalents of his other half-dozen diagrams are less straightforward--but eminently feasible.|
|112674, p1, g3||Virgil||xstkvrx Virgil||In this example, he typed some characters,
backspaced, typed over them, and then typed "Virgil." "xstkvrx" is used
as a text convention to symbolize the strike-overs, which vary in
appearance. The same convention would be used throughout the ASCII
version--unless someone has a better idea. If he sometimes used
strike-over characters to suggest this or that about his fear of devolving
into Imperial Rome--or it intruding into our own temporal abnormality--it
would be apparent in the PDF and HTML versions.
The actual strike-overs or their symbolic representation--plus the spelling comments--serve to remind readers what they're reading: letters he wrote. Readers w/ opinions about, or interest in, what others think are his best works can consult the numerous Usenet groups.
|112674, p1, end|| his marginalia as he penned them
() <--like this
except legible w/ modern OCR
+ typed version
|* this Rules out it Being either a mere Mechanical Response (i.e., echo phenomenon) or the person's own prior forgotten memory.||A typed version of his marginalia would guard against mundane but anticipated typos. Most human readers would be interested in content, what he wrote about--for example, that he thought he now had proof of something he had previously suspected: that he was hearing ancient Greek words in his nightly visions of this period. Individual readers must decide whether to take what he wrote literally (e.g., he was being visited by "aliens" like in the movies) or figuratively (e.g., he was working on v.a.l.i.s., a book best seen in its later polished published form). An errata sheet can only address the more mechanical, mundane hermeneutic questions. Interpretation is an individual matter.|
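The ASCII convention in the table (one "xstkvrx" token standing in for each run of strike-over characters) could be applied mechanically when deriving the ASCII version from the HTML one. What follows is only a minimal sketch. It assumes the strike-overs are marked in the HTML w/ `<strike>` or `<s>` elements, which this proposal does not specify (they may well be images instead), and the names `AsciiLetterParser` and `html_to_ascii` are made up for illustration.

```python
from html.parser import HTMLParser

# Token named in the proposal for symbolizing strike-overs in ASCII.
STRIKEOVER_TOKEN = "xstkvrx"


class AsciiLetterParser(HTMLParser):
    """Collect plain text, replacing each struck-over run with one token."""

    # Assumption: strike-overs are marked with <strike> or <s> in the HTML.
    STRIKE_TAGS = {"strike", "s"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.depth = 0  # > 0 while inside a struck-over run

    def handle_starttag(self, tag, attrs):
        if tag in self.STRIKE_TAGS:
            if self.depth == 0:
                self.parts.append(STRIKEOVER_TOKEN)
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.STRIKE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text inside a struck-over run is dropped; the token stands in for it.
        if self.depth == 0:
            self.parts.append(data)


def html_to_ascii(html):
    """Return the text of an HTML fragment with strike-overs symbolized."""
    parser = AsciiLetterParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, since lineation need not be preserved.
    return " ".join("".join(parser.parts).split())


# The struck-over characters "qqq" are placeholders, not what he typed.
print(html_to_ascii("<p><strike>qqq</strike> Virgil</p>"))
```

Under those assumptions the third table row comes out as it appears in the ASCII column, and text w/o strike-overs (such as the hand-inserted asterisk in the second row) passes through unchanged.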
I used HTML 3.2 throughout: would an earlier standard be better? I haven't included META tags on the letter fragment files; besides his name and Julian dates, which tags might be useful to readers who come after you?
2 HTML format (used throughout this proposal): Entries in this column, a combination of text and picture formats, are, actually, as they would appear. One HTML file per letter (w/ embedded content and linked image files of marginalia). This version would be identical to what he wrote and penned, everything but the paper and hand-corrected spellings (text reflecting substance).
One way this project differs from the esteemed Project Gutenberg is that there is a need to preserve his page numbers but no need to preserve his lineation with hard carriage returns in the HTML version. Since paragraphing is easily preserved, citing is easy: sentences nested in paragraphs nested in numbered pages nested in letters named by date--which is a necessary condition for establishing convergent and discriminant validity in the unique context of these letters.
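The nesting described above is what the short row labels in the errata table (e.g., "112674, p1, g2") encode. As a rough sketch, such a label could be parsed into its parts as follows. The reading of the label is my assumption: six digits as MMDDYY w/ a 1900s year, "p" for page, "g" for paragraph, and "end" for the closing marginalia. The names `Locator` and `parse_locator` are hypothetical, and the longer first-row label w/ file names in parentheses is not handled.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed label shape: "MMDDYY, pN, gN" or "MMDDYY, pN, end".
LOCATOR_RE = re.compile(r"^(\d{2})(\d{2})(\d{2}),\s*p(\d+),\s*(?:g(\d+)|end)$")


@dataclass(frozen=True)
class Locator:
    letter_date: date
    page: int
    paragraph: Optional[int]  # None when the label says "end"


def parse_locator(label):
    """Parse a short errata-row label into date, page, and paragraph."""
    match = LOCATOR_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized locator: {label!r}")
    month, day, year, page, paragraph = match.groups()
    return Locator(
        # Assumption: all letters date from the 1900s.
        letter_date=date(1900 + int(year), int(month), int(day)),
        page=int(page),
        paragraph=int(paragraph) if paragraph is not None else None,
    )


loc = parse_locator("112674, p1, g2")
print(loc.letter_date.isoformat(), loc.page, loc.paragraph)
```

A sentence number could be added as a fourth field in the same way, though the table labels shown here stop at the paragraph.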
All files would be standards compliant. As suggested by all of page one of his 11/26/74 four-page letter--shown above--and page ten of his 2/17/1975 fifteen-page letter (see hypertext link in project index), HTML formatting these letters would be simple.
3 ASCII format: Entries in this column appear as they would in the actual version. One file per letter. This version would be identical in content to what he wrote and pretty similar in appearance since over 95% of what he wrote consisted of typed text. Currently existing applications would most easily translate and speak this version.