Sam Harrelson

Google Book Search's Dilemma

For the present, then, scholars will have to put on hold their visions of tracking the 19th-century fortunes of liberalism or quantifying the shift of “United States” from a plural to singular noun phrase over the first century of the republic: The metadata simply aren’t up to it. It’s true that Google is aware of a lot of these problems and they’ve pledged to fix them. (Indeed, since I presented some of these errors at a conference last week, Google has already rushed to correct many of them.) But it isn’t clear whether they plan to go about this in the same way they’re addressing the scanning errors that riddle the texts, correcting them as (and if) they’re reported. That isn’t adequate here: There are simply too many errors. And while Google’s machine classification system will certainly improve, extracting metadata mechanically isn’t sufficient for scholarly purposes. After first seeming indifferent, Google decided it did want to acquire the library records for scanned books along with the scans themselves, but as of now the company hasn’t licensed them for display or use—hence, presumably, those stabs at automatically recovering publication dates from the scanned texts.

Fascinating to see this space develop within the construct of a self-professed advertising company set on amassing all of the world’s data.

I have to wonder if the Alexandrian pharaohs had similar indexing problems constructing their library of knowledge…
