The article seems like reasonable coverage; however, I'm going to point to some comments.
Doug says, "Backlist conversion are a pain to do correctly, because it starts with an OCR scan. It takes some careful proofing to clean up the scan errors, and even then you’ll end up with some errors left in." Which, I might add, I'm prepared to forgive -- but not hosts of spellchecker-catchable errors in addition to leftovers from the scan + proof process.
That supports my hypothesis that they are in fact doing this via OCR off a p-copy [ETA: almost certainly an error on my part. I think they are doing OCR off a printer-ready e-copy] NOT by converting an underlying electronic file. It doesn't completely answer the question of how something escaped the proofing process with as many problems as Collins' book.
A debate ensues over Doug's further assertion about whether it's worth bothering (he doesn't think it is, as a general rule), which is interesting but not directly relevant to the OCR/process question. It is, however, relevant to the publisher's probable attitude towards customers of their backlist (Not Positive, shall we say?).
ETA: I suppose the business model might be, wait until someone complains, then fix what they complained about, on the theory that if no one is complaining, why waste the time/energy/resources/money on making the rest of it right? I wonder what I think of that. I'm a big fan of worse/cheap is better and 80/20 stuff, but this just feels like contempt for the customer and I _don't_ approve of that.
Detailed discussion of what's involved in turning a p-book into an e-book. The author presumes a level of competence and quality control which makes it reasonable to pick on public domain, scanned-and-proofed-by-volunteers books like _Bleak House_ for substituting "goat" for "gout", and concludes that portion of the diatribe with, "There is no substitute for a good proofreader checking the text against the original." Wow. If _only_ the world were that good. I'm just looking to have the obvious and distracting errors beaten back by, say, an order of magnitude. This author assumes publishers are hiring this work out, and about half the article is devoted to explaining how you can sort available services, why you shouldn't assume you can do it yourself with some tools you found online, and similar.
Edited still more: a quick glance around elance suggests there are people around the world eager to bid on small conversion jobs.
And more: One -- but only one -- of the problems is that the underlying digital format for many p-books is a printer-ready digital file, frequently a PDF. This PDF has ligatures in it (makes sense for producing a p-book; e-books not so much unless they understand the representation of the ligature -- which they won't). Amazing hoops must be jumped through to back-convert the ligatures to "ordinary" text. Sometimes. Seems to depend on the ligature in question and the target format.
The world will always need compiler engineers who don't feel much like actually working on compilers.
Edited still more: http://publishingperspectives.com/2011/08/error-free-ebooks/
Nice set of quotes from relevant parties. I'm currently trying to figure out if anyone in the process is doing consistently better or worse than anyone else, or if this is Just One of Those Things that is waiting for better tools/automation.
"The worst affected seemed to be even older books, such as Ayn Rand’s Atlas Shrugged, where a reader complained that the number “1” and the letter “I” were often switched."
I have not mentioned this, but in _Scorpion Tongues_, Al Gore appears at least three times as A1 Gore. It's a stunningly comprehensive collection of errors (which is what it would take to get me to complain, because I'm such a booster of e-reading in general, I'll downplay all negative aspects to ensure the transition sticks).
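Errors like "A1 Gore" are the kind of thing a crude automated pass could at least flag for a human. The sketch below is a heuristic only, not anything a publisher is known to use: it looks for tokens made of letters mixed with the digits 0 or 1, and the name `suspicious_tokens` is invented for this example. It will not catch a lone "1" standing in for the word "I" (the Atlas Shrugged complaint), since that needs context, and it will produce false positives on things like "1st".

```python
import re

# A token is suspect if it consists only of letters and the digits 0/1,
# and contains at least one letter and at least one of those digits.
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[01])[A-Za-z01]+\b")


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that mix letters with 0 or 1, e.g. 'A1' for 'Al'."""
    return SUSPECT.findall(text)


if __name__ == "__main__":
    sample = "A1 Gore ran in 2000, and I agreed."
    for token in suspicious_tokens(sample):
        print("check:", token)
```

Pure numbers such as "2000" and ordinary words are left alone, so the list a proofreader has to look at stays short.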
The author compares error free books to cars which drive themselves. I find this sort of ironic in the era of the Google Car, and stand by my assertion that what's really needed to deal with this problem is better tools. If tools are only getting you 90% of the way through the process (and so many of the problems being complained about in online fora involve trying to pipeline tools that don't integrate with each other to deal with some fiddly problem that Should Just Work), then yeah, the output is going to be dog shit. And not healthy, firm, easily picked-up-with-a-plastic-bag dog shit.