May 1st, 2011

Data Quality, Decision Making and Publishers (LONG)

This is the second attempt at this post. The first attempt was ridiculously verbose.

In an earlier post regarding Amazon's participation in the auction for the Hocking contract, I quoted an anonymously sourced remark from Crain's coverage. The contents of the quote (Amazon has 65% market share in ebooks and falling) struck me as bizarre. I knew it was false; I was curious as to why a publishing executive might think it was true. I was able to find multiple possible sources of the investment-advisor sort (Yankee Group, Goldman Sachs, and I think I remember Forrester weighing in on this topic as well), all dating from a few months last year when people were really pessimistic about Amazon (because of the iPad and iBooks announcement/release) and really optimistic about future competition for kindle (Plastic Logic's Que, Entourage's Edge, Hearst's entry, the Cool-er, etc.).

The question is, why might a publishing executive still believe these numbers in 2011, when a lot of that hypothetical competition has gone under or become irrelevant as a platform for ebook sales? After thinking about it for a while, I concluded that publishing executives exist within a culture that is completely accustomed to making decisions based on worse, more incomplete data than they could get if it ever occurred to them that they needed better data. Apparently, it doesn't. It was thinking about publishers' recent interest in genre fiction that led me to this conclusion.

Genre fiction, in one genre or another and in one publishing format or another, has been around for a long time. Occasionally, one author or a small group of authors within a genre would break out of what used to be referred to as the "ghetto" and get hardcover deals, but when they did, they had to accept an editorial process that attempted to convert their genre title into something closer to whatever readers of "mainstream," "general" or "literary" fiction were expecting at the time (cf. Heinlein's multiple versions). This happened when it became impossible to ignore the sheer volume of sales being driven by that author or that group of authors.

The formats in which genre fiction sold (pulp magazines, mass market paperbacks) made it difficult to identify which authors and which titles were selling and which ones were sitting unread (at some under-served moments, it all sold equally because the hunger for that genre was so great that the audience would read any form of it, however craptastic). Mass market paperbacks _had_ bar codes on them when they moved to the chain booksellers from supermarkets/grocers/drug stores and similar. But the bar codes did not uniquely identify a title when scanned; they identified a publisher. Over time, the chains got ISBNs printed on the inside cover of mmpbs, and eventually the bar code on the outside became a unique identifier for the title. That happened _after_ Amazon had entered the scene. (<-- Really.)

I don't know how publishers paid royalties to direct-to-paperback authors. I don't see how they could possibly identify how many copies of their titles were sold vs. had their covers removed and were dumped. I harbor a suspicion that some authors got more than they should have and some got shafted, but I have no proof.

In any event, Amazon _did_ know how many copies of any title were sold and, over time, they got better at identifying an author across the body of their work. When Amazon rolled out the first version of used books (the one you may not remember), a lot of the used catalog consisted of out-of-print titles that would be fulfilled by (I could not possibly make this shit up) calling a bunch of used book stores after the order was received to see if anyone had a copy. In the meantime, Amazon had a list of an unholy number of completely unfulfillable titles that, apparently, a whole lot of people wanted really badly. (Yes, Dear Reader, Neal Stephenson's _The Big U_ was _really_ high on that list.)

The nature of internet retailing allows the seller to collect per-customer information in a way that a traditional retailer can only approximate with loyalty cards. Very early on at Amazon (well before the IPO), this exposed an aspect of book-buyer behavior that had been intuitively obvious to many people for a long time, but of which the entire publishing industry remains blissfully unaware _to this day_. A small number of people buy a huge number of books each (as in, hundreds every year, year in and year out; occasionally a thousand or more in a year, but that's unusual). A vast sea of people buy a small number of books each. The total effect of the first group (if you can successfully deliver what they want) is greater than the total effect of the second group. That turns out to be a _huge_ if, and in the past there was a secondary constraint: you can't buy more books than you have space for unless you are prepared to get rid of the old ones and/or buy a new house and/or get a divorce and/or you get the idea. With ebooks, that constraint is entirely gone (at least as long as someone else curates and stores the ones that don't fit on your drive; you'll notice that Amazon switched from advertising how much storage their kindle has to providing the Media Library).

There is another characteristic of the high volume book purchaser: they read a lot of genre fiction.

Once Amazon and the chains had successfully demanded a unique bar code per title, tracking sales on a per-author and per-title basis was definitely possible and should have become the norm throughout the publishing industry. This was the time frame in which genre fiction started to break out of paperback in significant numbers: you _could_ identify stars (in fact, it became impossible to keep ignoring them when Amazon kept selling the damn things even when no one else was shelving them), and once you have stars, the temptation to ratchet up the margin via a hardcover release is irresistible. The case that genre fiction authors had to make to publishers in this time frame to get a hardcover was _insane_: no debut author in "literary fiction" _ever_ had to prove sales of that magnitude ahead of time. But once a few had gone through the wringer, started landing on bestseller lists (at Amazon if nowhere else) and enjoyed the ratcheting effect that has on sales, publishers were capable of doing "more of the same": they found some me-too authors and they absolutely signed deals for more entries in successful series. What is less clear is whether publishers recognized that the same idiots (<-- this group includes me) who were signing up ahead of time to preorder the hardcover edition of Sookie Stackhouse number 6 were also signing up ahead of time to preorder the hardcover editions of Jim Butcher's books and those of a half dozen or more other authors. If they _had_, wouldn't they have put together a forum for us to hang out and chat about the books, and for them to push more authors at us and accept our feedback and blah blah blah?

Why did it take them the better part of another decade to figure that out? In fact, as long as I'm on this topic: I _review_ a whole lot of these series on this blog and repeatedly say, yes, I'll keep buying them, or fuck no, you've stepped over the line. When I blogged that I hated a certain package delivery service, they got in touch with me. I've never heard a _peep_ out of anyone trying to sell me anything related to the fact that I buy a truly silly amount of bad genre fiction (unless you count Amazon's recommendations). Self/Epubbed authors have responded to reviews, as have romance authors, but I never hear jack from the publishers. This strikes me as a little weird.

Recently, some publishers have figured out that genre fiction has some "vibrancy" and they've been "verticalizing" and creating "communities" for authors and/or readers. I found this out by visiting Publishers Weekly and looking through the daily lists-of-links. I think that says a lot about how far publishers have to go, in terms of improving the data quality on which they make decisions.

Simba, Professional and Scholarly Books, Size of Pie

This statement caught my eye in Erik Sherman's entertaining survey of numbers involving ebooks.

According to Simba, "professional and scholarly books, which include the legal, scientific/technical, medical and business segments, hold 75.9% of the $1.76 billion U.S. E-book market."

I'm working on a framework for making sense of ebook numbers. I'm going to deploy it here, starting with two questions: what's the size of that pie, and what's _in_ that pie?

AAP numbers are the ones trade publishers look at. Simba is trying to make the point that trade isn't the big part of the e-book pie. If you want to read Simba's report, you have to shell out beaucoup bucks for the privilege, and while I'm quite dedicated to mocking lame coverage, I'm not (yet) _that_ dedicated. Let's start with a quickie sanity check.

Here are the 2010 numbers from AAP:

E-books in AAP totaled $441.3M in 2010.
Professional books in AAP (AAP doesn't specify whether this is pbooks, ebooks or both) totaled $812.9M.

Let's compare $812.9M to 75.9% of $1.76 billion (about $1.34 billion). Oh, look: it is substantially less. AAP numbers describe a smaller pie than Simba's, and one of drastically different composition (specifically, I suspect that a lot of the libraries referred to in the Simba press release are law libraries or medical libraries, not public libraries in the sense we are mostly accustomed to).
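For anyone who wants to redo the sanity check, here is a minimal Python sketch of the same back-of-the-envelope arithmetic. It uses only the figures quoted above (Simba's $1.76 billion and 75.9%, AAP's $441.3M and $812.9M); the variable and function names are my own, and the results are only as good as the press releases the numbers came from.

```python
# Figures quoted in the post, in millions of U.S. dollars.
SIMBA_TOTAL_EBOOK_MARKET = 1760.0  # Simba: U.S. e-book market, $1.76 billion
SIMBA_PROFESSIONAL_SHARE = 0.759   # Simba: professional and scholarly share, 75.9%
AAP_EBOOKS_2010 = 441.3            # AAP: e-books, 2010
AAP_PROFESSIONAL_2010 = 812.9      # AAP: professional books, 2010 (format unspecified)


def sanity_check() -> dict:
    """Return the rough comparisons between the Simba and AAP pies."""
    # Dollar value Simba attributes to professional and scholarly e-books.
    simba_professional = SIMBA_TOTAL_EBOOK_MARKET * SIMBA_PROFESSIONAL_SHARE
    return {
        "simba_professional_ebooks": round(simba_professional, 2),
        # Negative means AAP's professional figure is smaller than Simba's slice.
        "aap_professional_minus_simba": round(AAP_PROFESSIONAL_2010 - simba_professional, 2),
        # AAP's entire e-book total as a fraction of Simba's entire pie.
        "aap_ebooks_as_share_of_simba_total": round(AAP_EBOOKS_2010 / SIMBA_TOTAL_EBOOK_MARKET, 3),
    }


if __name__ == "__main__":
    for name, value in sanity_check().items():
        print(f"{name}: {value}")
```

Even if all of AAP's $812.9M in professional sales were ebooks, it would fall roughly $520M short of Simba's professional slice, and AAP's whole e-book total comes to about a quarter of Simba's pie.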

This is a point where Sherman's analysis is weak.

"When such companies as Amazon or Barnes & Noble (BKS) claim that e-book sales have overtaken paperbacks or hard covers, would that include these very expensive titles?"

No. The kinds of publications that Simba must have been including are not sold anywhere you are likely to recognize unless you buy those books, and if you buy those books, you know that Sherman's question is misguided. For example, the Collier Consumer Bankruptcy Practice Guide is available for sale through LexisNexis for $337 in a format compatible with the kindle, but is not available through the kindle store. I feel _slightly_ bad saying Amazon doesn't carry the kinds of titles that Simba is including in their pie, because Practising Law Institute DOES list a lot (perhaps all) of its titles in the kindle store, and PLI is a recognized CLE provider.

It only took me 10 minutes to track down and document the above information; however, I had to know ahead of time what to look for (no, I did not know about PLI when I started, but I knew how to find some professional titles in the kindle store because I'd seen them before and knew they were comparatively unusual; no, I did not know that LexisNexis ran an online storefront selling kindle professional titles for CLE until I did a little googling). If you would like to play this game a little more yourself, check this out:

This is actually sort of interesting, since my lawyer sister-in-law (the one who is not a law professor) has to haul a truly heinous load of law stuff, books included, through the greater NYC train system and so forth. It was causing her back problems for a while (although that's going better now). It's exactly the kind of thing that makes us all want high school students to have their textbooks in electronic format. So here's hoping that, over the long haul, professionals get e-everything to save their backs.