Categories
Uncategorized

Crowdsourcing

The focus of this blog post is to weigh the pros and cons of using crowdsourced knowledge. Many of us can still remember our parents purchasing encyclopedia sets. They became a valuable resource for completing our homework assignments and book reports. But even then we were warned by teachers not to depend upon them as a sole resource. So it goes with Wikipedia. As its logo declares, it is the “free” encyclopedia, and as we all know, you usually get what you pay for.

The concept behind Wikipedia is quite elegant. Contributors and editors working online attempt to objectively crowdsource, in real time, a scholarly article on a particular topic. Since the process is dynamic, knowledge becomes incremental and transformative. Centuries earlier, scholars “crowdsourced” by writing and sharing letters. While a much slower process, this effort contributed to a period of “enlightenment.” However, the problem facing us today is that the process may be too easy and quick. As a result, the veracity, or truthfulness, of the information may be diminished.

For my class assignment I had to review the Wikipedia article on Digital Humanities (https://en.wikipedia.org/wiki/Digital_humanities). The article was started in 2006 and provides an interesting perspective on how the field of study has evolved. From my review, it took about ten years for the editors and contributors to finally settle on a framework for presenting their sourced information. During those years there appears to have been a lot of back and forth in trying to better define DH and provide better reference material to aspiring students and researchers.

In reviewing the DH article it became obvious that one problem with Wikipedia has to do with the credibility of the contributors or authors. In essence, a single source or contributor may not be as credible as the totality of the crowd (multiple sources). On the one hand, Wikipedia is very democratic in permitting a multitude of scholarly viewpoints; on the other, it provides a simple governance process by allowing everyone the ability to edit each other’s contributions. This hopefully keeps everyone “honest” by making sure they back up their statements with viable scholarly sources. In essence, it is the online equivalent of “prove it.”

Another problem with Wikipedia has to do with the online posturing and confrontations between contributors. While somewhat entertaining, this reflects a 21st-century lack of civility driven by our culture’s dependency upon social media. The scholarly process does provide for an “iron striking iron” methodology to craft a final, strong product. But the relative anonymity of social media permits some contributors or editors to exhibit rude behavior that discourages others from wanting to share information or participate in the exchange.

Finally, according to Roy Rosenzweig in his article “Can History Be Open Source? Wikipedia and the Future of the Past” (2006), Wikipedia’s published guidelines state that its primary goal is to “avoid bias.” Wikipedia encourages contributors to write their articles from a neutral point of view, factually and objectively. But even in its recommended policies, Wikipedia acknowledges that posting unbiased scholarly research is “difficult” since all articles are edited by people who are “inherently biased.”

So despite Wikipedia’s obvious challenges, how should one approach the site and its posted articles? First, one should always start at the beginning and look at when the article was created. One of Wikipedia’s great features is that you can time travel and follow the knowledge, so to speak. By tracking the posts, edits, and contributions you can gain valuable insight into which conflicts or controversies were identified and eventually resolved.

Another guideline is to evaluate who is making the most contributions. As with most scholarly debates, there are usually only a relative few subject-matter experts doing most of the posting and editing. So it is worthwhile to check their biographies if possible. This will go a long way toward determining their credibility as a source.

Wikipedia also provides a lot of statistics, especially in regard to the volume and pace of editing. These are usually a good point of reference to track and review from a historical perspective. As with most new articles, there is usually a period of time when both contributors and editors make changes at a substantial volume, and then activity slackens off. This lasts until a new post stirs the pot and once again there is some agitation and a round of changes.
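If you want to quantify that ebb and flow yourself, Wikipedia’s public MediaWiki API exposes the revision history of any article. The short Python sketch below is not part of the original assignment; the article title and the 500-revision limit are just illustrative choices. It tallies edits per year and lists the most active editors, a quick way to see both the pace of editing and who is doing most of the contributing.

```python
# A minimal sketch, assuming the public MediaWiki API and the "requests" library:
# pull recent revisions of an article, then count edits per year and per editor.
from collections import Counter

import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def fetch_revisions(title: str, limit: int = 500) -> list[dict]:
    """Return the most recent `limit` revisions (timestamp and user) for an article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user",
        "rvlimit": limit,
        "format": "json",
    }
    data = requests.get(API_URL, params=params, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    return page.get("revisions", [])

if __name__ == "__main__":
    revisions = fetch_revisions("Digital humanities")
    edits_per_year = Counter(rev["timestamp"][:4] for rev in revisions)
    top_editors = Counter(rev.get("user", "hidden") for rev in revisions).most_common(5)

    for year in sorted(edits_per_year):
        print(year, edits_per_year[year])
    print("Most active editors:", top_editors)
```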

Finally, it is useful to track the “Contents” menu over time. This provides valuable insight into how the “crowd” wants to frame the “knowledge” and information being presented.

Categories
Digital Humanities Mapping

Comparing Networks and Visualization Tools

There is a growing number of online research tools having a significant impact on the field of digital humanities. Easy to use and accessible, these tools or applications are quite powerful in their ability to let researchers sift through large data sets and visualize network relationships. For the past several assignments I was able to test and review three popular tools: Voyant, Kepler.gl, and Palladio. While each tool was unique in terms of its interface and original purpose, each provided a means to an end. This is an important distinction, especially if one is starting a research project and needs to get a handle on whatever data is available.

For example, Voyant is a “text mining” application that permits the end user to enter a large text corpus and visualize text clouds, or tag clouds. This provides a significant capability if one is not clear about what the text describes or includes. Kepler provides a very specific feature set that allows end users to enter geospatial data and generate a variety of maps. Palladio, on the other hand, is much more robust, providing several important features such as mapping, graphing, and customized lists, as well as a gallery view for images.

While each application on its own provides value, the real lesson for any DH researcher is to be prepared to utilize a variety of tools to visualize and map data. This requires a level of effort to experiment with and test each application’s capabilities. Voyant provides the end user a rather straightforward approach to discovering word patterns or hidden terms. Kepler provides a relatively easy way to present physical location data over time. Finally, Palladio permits the end user to visualize patterns of relationships. This becomes an important factor when trying to define interdependencies in a humanistic study.

As part of my class assignments using the WPA’s Slave Narratives data, integrating all three tools would be beneficial in analyzing the 1930s research, as sketched below. Voyant could be used to define text patterns in the questions asked and the subjects’ responses. Kepler could be used to demonstrate the relationship between the physical location where the interviews were conducted and the location where the enslaved person was from. Finally, Palladio, using mapping, graphing, lists, and an image gallery, could provide an acceptable interface to explore the final results.
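To make that integration concrete, here is a hypothetical preparation script. It assumes a single table of narrative records with made-up column names (narrative_text, interview_lat/lng, origin_lat/lng, interviewer, topic), which is not the actual WPA dataset schema; it simply shows how one table could be split into a Voyant corpus, a Kepler.gl arc file, and a Palladio edge list.

```python
# A hypothetical sketch: split one table of narrative records into inputs for
# Voyant (plain-text corpus), Kepler.gl (origin/destination arcs), and Palladio
# (source/target edge list). File and column names are assumptions.
from pathlib import Path

import pandas as pd

df = pd.read_csv("alabama_narratives.csv")  # hypothetical input table

# 1) Voyant: one plain-text file per narrative, uploaded together as a corpus.
corpus_dir = Path("voyant_corpus")
corpus_dir.mkdir(exist_ok=True)
for i, row in df.iterrows():
    (corpus_dir / f"narrative_{i}.txt").write_text(str(row["narrative_text"]))

# 2) Kepler.gl: coordinate pairs for an arc layer
#    (interview location -> place of enslavement).
df[["interview_lat", "interview_lng", "origin_lat", "origin_lng"]].to_csv(
    "kepler_arcs.csv", index=False
)

# 3) Palladio: a simple source/target edge list (interviewer -> topic discussed).
df[["interviewer", "topic"]].rename(
    columns={"interviewer": "source", "topic": "target"}
).to_csv("palladio_edges.csv", index=False)
```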

Categories
Digital Humanities

Network Analysis with Palladio

An example of a graph generated by Palladio.

Palladio is a data-driven tool for analyzing network relationships across time. It was created by Stanford University’s Humanities + Design research lab. Their goal was to understand how to design graphical interfaces by developing a general-purpose suite of visualization and analytical tools. The basis of their project was the visualization tool prototypes created for the Mapping the Republic of Letters project, which examined the scholarly communities and networks of knowledge during the period 1500-1800.

My class assignment was to learn how to use Palladio by leveraging the Alabama state data from the WPA’s Slave Narrative project. In this particular case the lesson plan guided us through how to upload the data and create related tables.

The online application was very simple to use and the upload process was quick. With the data uploaded, the exercise was rather straightforward: learning how to determine relationships between source and target data.

Based on the original source data, I was able to try various combinations. For example, I selected the interviewers as the source data and the topics discussed as the target data. The application automatically generated a graph (shown at the top). From this graph I could start defining network relationships by interpreting the visualization.
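For readers who want to sanity-check a graph like this before interpreting it, the short sketch below does the same source/target pairing outside Palladio using pandas and networkx. The input file and its source/target columns are hypothetical placeholders, not Palladio’s own format; the point is simply to see which interviewers and topics have the most connections.

```python
# A small sketch, assuming a hypothetical CSV with "source" and "target" columns
# (e.g. interviewer -> topic): build the same bipartite graph with networkx and
# list the most connected nodes before reading the Palladio visualization.
import networkx as nx
import pandas as pd

edges = pd.read_csv("palladio_edges.csv")  # hypothetical edge list

G = nx.Graph()
G.add_edges_from(zip(edges["source"], edges["target"]))

# Degree = number of distinct partners (topics per interviewer, or vice versa).
for node, degree in sorted(G.degree, key=lambda pair: -pair[1])[:10]:
    print(node, degree)
```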

The interface is quite intuitive and permits easy experimentation if you are not satisfied with the initial results. There are a few available adjustments that you can make with the tool to improve the display of the data. An important feature of the graph options is that a user can drag a point or highlight one or another data source to create a dynamic view of the visualization.

If you want to try it for yourself, go to the following link:

http://hdlab.stanford.edu/palladio/

Categories
Digital Humanities Mapping

Mapping History

Kepler.gl is a powerful open-source geospatial analysis tool for large-scale data sets. In plain English, it lets you visualize data by mapping multiple location points and using both time and distance as a means to tell a story. The system is designed for both technical and non-technical users. The key is to learn how to use the available filters to visualize the insights that you want users to explore.

The Kepler.gl workflow is based on data layers that permit the creator to present a variety of visualizations, including “points”, “lines”, “arcs”, and even a “heatmap”. Kepler provides a variety of map styles, color palettes, and map settings. Like most well-thought-out applications, users need only spend a little time getting familiar with the interface. It is highly recommended to start with a small project to get better acquainted with Kepler’s unique and very useful features.

For my class assignment we used mapping data drawn from the 1930s Works Progress Administration (WPA) Slave Narrative Collection. From 1936 to 1938 the Federal Writers’ Project undertook a major initiative to compile the histories of former slaves living in seventeen states. I was able to use data gathered in the state of Alabama. The map displayed below shows the relationship between where each subject was interviewed and where they were originally enslaved.
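One way to push that interview-location-versus-origin pairing a little further is to compute the straight-line distance between the two points for each subject, the same pairing the Kepler.gl arc layer draws. The sketch below is only an illustration; the CSV name and coordinate column names are assumptions, not the actual dataset.

```python
# A minimal sketch, assuming a hypothetical CSV with interview_lat/interview_lng
# and origin_lat/origin_lng columns: compute the great-circle distance between
# where each subject was interviewed and where they were enslaved.
from math import asin, cos, radians, sin, sqrt

import pandas as pd

def haversine_km(lat1, lng1, lat2, lng2) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

df = pd.read_csv("kepler_arcs.csv")  # hypothetical coordinate file
df["distance_km"] = df.apply(
    lambda r: haversine_km(r.interview_lat, r.interview_lng, r.origin_lat, r.origin_lng),
    axis=1,
)
print(df["distance_km"].describe())
```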

Recent technologies like Kepler.gl will provide researchers a whole new way of using maps to visualize large collections. The fact that I was able to bring to life dormant data that had been stored away for over 70 years is impressive. But the key to a successful project will be obtaining consistent and accurate mapping data.

Categories
Digital Humanities Sources for Finding Digital Data

Mining for Text

For thousands of years humans have understood the power of the written word. As a result, great care was given to preserving books and other written documents. But in the digital age we live in, there is a desperate need to be able to sift through the ever-growing volumes of generated text. Fortunately, new information technologies are now available to us average folk that enable us to mine digital text.

Recently I was able to test drive an application called “Voyant”. On its website it is described as a web-based reading and analysis environment for digital texts. For us digital historians, Voyant is an inexpensive way to explore the world of text mining. As with all things digital, not everything is as simple as it looks, and Voyant is no different. In text mining projects involving a large corpus there are both pros and cons to how to proceed. What makes Voyant incredibly powerful is its intuitive user interface and visual displays. The catch is that the user needs to understand the basics of text mining. For example, there needs to be some familiarity with concepts such as vocabulary density, frequency, and distinctiveness. Also, a mining operation has to distinguish between the overall findings for the corpus or collection and the unique findings that can be found in an individual document.
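To make those concepts a bit more concrete, here is a small sketch of the two most basic measures Voyant reports, word frequency and vocabulary density (unique words divided by total words), computed for each document and for the corpus as a whole. The directory of .txt files is a placeholder, not a Voyant export.

```python
# A minimal sketch, assuming a placeholder folder of plain-text files: compute
# word frequency and vocabulary density per document and across the corpus,
# the same basic measures Voyant surfaces in its summary panel.
import re
from collections import Counter
from pathlib import Path

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_density(tokens: list[str]) -> float:
    """Unique words divided by total words (0.0 for an empty document)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

corpus_tokens: list[str] = []
for path in sorted(Path("voyant_corpus").glob("*.txt")):  # placeholder directory
    tokens = tokenize(path.read_text())
    corpus_tokens.extend(tokens)
    print(path.name, "density:", round(vocabulary_density(tokens), 3))

print("corpus density:", round(vocabulary_density(corpus_tokens), 3))
print("top terms:", Counter(corpus_tokens).most_common(10))
```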

For my class we were assigned to use a rather interesting text collection called the “WPA Slave Narratives”. It was a Works Progress Administration project from the Great Depression, in which writers set out to gather and capture the stories of formerly enslaved people. As a historical collection it is a fascinating look back in time. The writers collected testimony from 2,300 people across 17 states, and the collection includes over 9,500 pages. In the past, trying to conduct a text mining project of this size would have been insurmountable. But the moment the original collection was scanned and digitized, the narratives became a valuable resource. A quick search online revealed that dozens of books have been written using the collection.

The slave narratives were an excellent case study for working through how one can leverage text mining. The original project back in the 1930s was never intended to be “mined” digitally. But what I learned is that in the writers’ attempt to record the narratives as authentically as possible, they also captured the interviewees’ local dialects, poor grammar, and faded memories. As a result, the entire corpus presents a unique challenge to sort through digitally today.

The value of using an application such as Voyant cannot be overstated. It really helped identify the variations in word usage through the text cloud, but it was the other available functionalities, i.e., the reading, trends, contexts, and summary tools, that provided a way to identify thematically similar information across the corpus. I highly recommend getting started with Voyant, but be prepared to navigate a new way of thinking about how words are related and how they need to be extracted to gain better insight or value.
