
Using Crowdsourcing for Digitization

Mark Twain’s Tom Sawyer getting help to whitewash Aunt Polly’s picket fence.

In his book “The Adventures of Tom Sawyer,” Mark Twain (Samuel Clemens) provides a useful metaphor for crowdsourcing and digitization. Tom turns a boring chore (whitewashing) into something others want to do. What makes the story timeless is that, over 144 years later, people are still trying to get others to do their work for them. In the digital humanities, transcribing handwritten documents or identifying and vectorizing shapes is a tedious and time-consuming task. A project with tens of thousands of documents used to take decades to scan, transcribe, and digitize. Now, however, organizations are turning to crowdsourcing as a way to reduce costs and speed up the process while building a community of interest around the project.

People have been participating in crowdsourcing efforts for years without even knowing it. For example, sites that use photo identification or text entry to verify that a visitor is not a robot leverage the crowd to accomplish simple digitization tasks. But the bigger trend worth reviewing is that organizations are purposefully creating interfaces that let the public contribute to the digitization process.

Our crowdsourcing assignment was to review the pros and cons of leveraging public participation in digital collections. A growing number of institutions appear to be outsourcing some digitization functions to a public community instead of depending solely on employees. This makes sense considering the sheer drudgery and cost of digitizing large collections. “Papers of the War Department 1784-1800” is a good example of how to leverage crowdsourcing for transcription, correction, and contextualization. Currently hosted at the Center for History & New Media at George Mason University, the project aims to restore and make accessible this historic collection of over 42,000 U.S. military records once thought lost in a tragic fire.

Public users accessing the “Papers” site are invited to search, review, and transcribe the remaining handwritten documents that need to be digitized. The interface is quite simple: users read a scanned image of the document while typing their transcription. In very little time I was able to get familiar with the interface and start transcribing. My only problem was retraining myself to read cursive handwriting from over two hundred years ago.

New York Public Library’s Building Inspector site.

Another example of crowdsourcing is the New York Public Library’s “Building Inspector” site. This project is a little more whimsical and fun; I think Tom Sawyer would not have had such a hard time getting his friends to participate. In an attempt to gain insight from old New York City inspection maps, the Building Inspector site invites users to assist in the vectorization, or shape discovery, of old building outlines. In a rather simple but addictive process, users have only to look at an outline of a building and determine whether it is correct or needs to be fixed. Humans are better than a machine at reviewing the outlines because we can quickly tell whether one looks right. It is a rather mindless activity that even elementary students could participate in. It is a good example of how humans can still do the work of a machine if only enough of them are willing to provide the time necessary to complete the task.

Of the two examples, Building Inspector is probably the better prototype for the future of crowdsourced digitization projects. Reviewing building outlines is much better suited to engaging a large crowd. Papers of the War Department requires a more scholarly effort. Transcribing is much more intensive work, and I would even call it a specialty. The site tries to mitigate that problem by offering contributors tasks at various degrees of difficulty.

It is clear from this assignment that “contributory” projects are here to stay and will only increase in number. As online technologies and interfaces improve, more of the public will be able to “interact” with or access digital and physical artifacts provided by institutions (e.g., adding notes on museum objects or tagging items in a gallery’s digital collection). If the interface and tasks are made engaging and interesting, contributors will come back often and the value of crowdsourcing will be self-evident. However, if the tasks that participants are asked to perform are too difficult, the number of contributors will be limited. As with Tom Sawyer, it is not enough to just get someone else to do the work. Eventually, they will want to see what is in it for them.