Digital History and Data

Image source: Pexels

by Alex Shehigian

As we move deeper into the month of November, it is incredible to think how far our Digital History Class has come. Since the first few weeks of class, we have explored so many different aspects of this way of doing history, from learning the benefits and drawbacks of different digitization tools to creating our own practice websites using the platform Omeka. Each week has generated valuable discussions of the ways the changing digital landscape impacts how we, as historians, conduct scholarship.

For the past two weeks, our class has focused primarily on the datafication of history. Simply defined, data are units of information that describe items in terms of quantifiable values. In digital history, datafication typically means transforming the historical information gathered from documents into numerical values, which can then be further manipulated.

The process of datafication represents something of a departure from traditional historical scholarship. Digital history with an emphasis on data focuses on trends and averages across large groups, rather than on the individual. While this can provide an incredibly powerful alternative angle for studying humanity’s past, it also presents some dangers. As Stephen Robertson and Lincoln A. Mullen have noted in their article, “Arguing with Digital History: Patterns of Historical Interpretation,” some data-based digital history projects make no attempt at argument. Rather, they simply present data in new, digital formats. Argument, however, is the heart of historical scholarship, and should not be left out of the picture.

Rather than hindering our ability to make novel and powerful historical arguments, data-based historical projects, when conducted effectively, can open up entirely new ways of making connections. There is perhaps no better place to turn for an example of this than the Digital Harrisburg Initiative itself. As outlined in this article by Dr. David Pettegrew of Messiah University and Albert Sarvis of Harrisburg University, the project used data from United States census records to create maps of Harrisburg’s Old Eighth Ward showing details about each household, including descriptive data such as age, race, and occupation. By analyzing this data, contributors to the initiative succeeded in describing the many demographic shifts that occurred in this area in the early twentieth century. Specifically, they demonstrated that the demolition of the Old Eighth Ward and the construction of a new Capitol complex led to the displacement of African American and European immigrant communities.

On a much smaller scale, our class got a taste of what making historical arguments with data might look like by working with Harrisburg census records from 1900 through 1930. Previous students had already transferred the information included in the census into Microsoft Excel spreadsheets and Microsoft Access databases. This data included each person’s address, name, race, gender, age, occupation, birthplace, marital status and other key demographics. During the lab portion of our class, we learned how to use functions in both Excel and Access to analyze these numerical and categorical values. Both applications allow users to quickly determine statistics such as the mean, median, mode, maximum, and minimum of a numerical dataset, as well as counts for items such as names or race. We practiced determining the average age of a Harrisburg resident for each year, the percentages of each racial category identified on the census that made up Harrisburg’s population, and even the number of individuals with some variation of “John” as their first name.
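For readers who would rather script this kind of analysis than use Excel or Access, the same questions can be asked in a few lines of Python. This is only a minimal sketch: the sample rows, the field names (`first_name`, `age`, `race`), and the rule that a "John" variation is any first name beginning with "john" are all assumptions made for illustration, not the actual census data or the method we used in the lab.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample rows for illustration only; not real census records.
records = [
    {"first_name": "John", "age": 34, "race": "White"},
    {"first_name": "Johnnie", "age": 8, "race": "Black"},
    {"first_name": "Mary", "age": 29, "race": "White"},
    {"first_name": "Jonathan", "age": 61, "race": "White"},
]


def summarize(rows):
    """Return basic descriptive statistics for a list of census-style rows."""
    ages = [row["age"] for row in rows]
    race_counts = Counter(row["race"] for row in rows)
    total = len(rows)
    return {
        "mean_age": mean(ages),
        "median_age": median(ages),
        "min_age": min(ages),
        "max_age": max(ages),
        # Share of the population in each racial category, as a percentage.
        "race_percent": {race: 100 * n / total for race, n in race_counts.items()},
        # Assumed rule: a "John" variation is a first name starting with "john".
        "john_like": sum(
            1 for row in rows if row["first_name"].lower().startswith("john")
        ),
    }


summary = summarize(records)
print(summary)
```

The prefix rule is deliberately crude. It catches "John" and "Johnnie" but misses spellings such as "Jon" or "Johann", so a real analysis would need a more careful definition of what counts as a variation.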

We conducted our data analyses using census data, such as this Excel spreadsheet of data from the Harrisburg 1920 census. Image source: Alex Shehigian

By allowing us to evaluate vast sets of data quickly, Excel and Access enabled us to ask new questions and draw powerful conclusions about the population of Harrisburg in the early twentieth century. For my lab, I worked mostly with census data from 1920. After running a few functions in Excel, I learned that a total of eighty-nine individuals gave their occupation as “physician.” In a city with a population exceeding 76,000 that year, this would mean that there were over eight hundred and fifty Harrisburg residents for every physician. Although the data analysis may not have included all physicians in Harrisburg (such as those who were categorized as doctors), this ratio still highlights the fact that a large proportion of Harrisburg residents likely did not have access to a physician. Data analysis procedures such as this one can help draw historians’ attention to realities and questions that might not be as visible through traditional historical analysis, which emphasizes close readings of individual texts.
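The ratio above is simple enough to check by hand, but a short Python sketch makes the arithmetic explicit. The two figures come straight from the paragraph above. The function name `residents_per_physician` is my own, and because the population is only described as "exceeding 76,000," the value used here is a lower bound rather than an exact count.

```python
def residents_per_physician(population, physicians):
    """Return how many residents there are for each physician."""
    return population / physicians


physicians_1920 = 89        # individuals listing "physician" as their occupation
population_1920 = 76_000    # assumed lower bound; the city's population exceeded this

ratio = residents_per_physician(population_1920, physicians_1920)
print(round(ratio))  # 854
```

Because the true population was somewhat higher than 76,000, the actual ratio would be slightly higher as well. Counting people recorded under other labels, such as "doctor," would pull it in the opposite direction.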

Ultimately, this portion of the class has demonstrated the immense value of data analysis software to the study of history. It allows us to understand patterns and connections within society in entirely new ways, which, in turn, helps us to better understand our past. As we continue to explore digital scholarship and mindsets in this class, I am eager to incorporate data analytics into my practice as a digital historian.

Alex Shehigian is a sophomore at Messiah University. She is majoring in public history and minoring in the digital public humanities. She is also an Archives Office Assistant at the Messiah University Archives and volunteers with the Walt Whitman Birthplace Association archives. You can learn more about her here.
