There has been a flurry of student and faculty activity in the last month about our Digital Harrisburg projects. You’ve heard now from most of the students in the Digital History course about their experiences so far with the City Social and City Beautiful projects. Expect additional observations, comments, and curiosities from students in the next six weeks as they aggregate and query census data and dive into their archival work related to City Beautiful. As John Fea posted on Friday, a whole set of parallel projects is occurring in his Pennsylvania History class. About five or six weeks from now, we should be unveiling our Omeka exhibits and presenting some conclusions about the census data in summary form.
Behind the scenes, the faculty have been working constantly to get these projects up and running. This has included visiting the archives with students, meeting with local historians and archivists, setting up students on Omeka projects, outlining best practices in digitization, and providing historical context through readings and discussion of the history of the region. Much of my own work so far has centered on setting up the US Census data project and devising ways to link that data to GIS. Since our goal is both to create a database of names, occupations, immigration, race, and education and to link that data to digitized maps of the city, we are currently working on two fronts: creating a massive database from the census data and partnering with GIS classes at Messiah College (Prof. Jeff Erikson) and Harrisburg University (Prof. Albert Sarvis). All of that is in addition to the exhibits work we’re doing with City Beautiful.
You’ve heard about our census data project from student posts, so here is an update. As of this weekend, we have keyed 25,000 names from Harrisburg’s 1900 population into Excel spreadsheets. In its current form, the data is divided among some 20 different spreadsheets, each with worksheets corresponding to the census sheets. The first image below shows a typical spreadsheet with 12 fields from the US census, the second image another 13 fields. This block on Ann Avenue, as you can see, is a predominantly white, native Pennsylvanian, working-class neighborhood of cigar makers, iron mill workers, and day laborers. You’ll see my question mark after “Heater” in the Iron Mill: that field on the original census record was somewhat illegible, so we will need to correct it at a later point, after we’ve aggregated hundreds of occupations and know which kinds predominate.
After normalizing spreadsheets on Tuesday and running quality control checks, we will import all the data into a unified Access database. Our students will then be able to query the data for half the population of the city in 1900 through complex queries combining fields such as education levels, literacy, sex, race, and occupation. Next year, we hope to finish keying the rest of the census with digital history interns.
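To give a sense of what a combined query might look like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Access, and the table name, column names, and the three sample rows are all invented for illustration; they are not our actual schema or data.

```python
import sqlite3

# In-memory database standing in for the unified census database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census (name TEXT, sex TEXT, race TEXT, "
    "occupation TEXT, can_read TEXT)"
)

# Placeholder rows, invented for this sketch only.
rows = [
    ("Person A", "M", "W", "Cigar maker", "Yes"),
    ("Person B", "F", "W", "Cigar maker", "No"),
    ("Person C", "M", "W", "Day laborer", "Yes"),
]
conn.executemany("INSERT INTO census VALUES (?, ?, ?, ?, ?)", rows)

# Combine two fields (sex and literacy) and count matches per occupation.
query = """
    SELECT occupation, COUNT(*)
    FROM census
    WHERE sex = ? AND can_read = ?
    GROUP BY occupation
    ORDER BY occupation
"""
result = conn.execute(query, ("M", "Yes")).fetchall()
print(result)
```

The same pattern extends to the other fields: add a condition to the WHERE clause for each field you want to combine.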
On the GIS front, our first problem was locating good maps to digitize. We needed very detailed maps of the city with house and property addresses in order to link the individuals from the US census to specific places on the map. Initially we were going to use the Sanborn Insurance Map atlases from 1890 and 1905, even though the publication dates of the maps did not exactly match the census year (1900). Buildings can change quickly in a five-year period. Then the Dauphin County Historical Society generously offered to provide us with digital scans of the 1901 Atlas of Harrisburg by the Harrisburg Title Company. This is a perfect solution, since it offers a map of the city made in nearly the same year that the census was taken. So while our class has been keying census records, Jeff Erikson’s and Albert Sarvis’ students have been georeferencing the maps, creating shape files, and geocoding address points.
The second problem with the GIS has been determining a unique identifier to join the census data from the database to the shape files in GIS. A unique identifier for every property tells the census data where to go when it is imported into GIS. We were hoping that Harrisburg’s current property identification numbers went back as far as 1900, but alas, those numbers seem to have first been used around the mid-20th century. When I visited the PA State Archives last week and pulled the tax assessment rolls from 1900, I found only the address listing and no property identification numbers, as this image shows.
The solution for unique IDs that we have devised is to combine the house number and the street name in all caps: so, in the above example, the unique identifier for 124 Ann Avenue will be 124ANN. This identifier, 124ANN, will be tied to all individuals living at 124 Ann Avenue. When we link census data to the GIS, all those records with the value 124ANN in the identifier field will connect to the shape file identified as 124ANN.
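The scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `property_id`, the list of street-type words that get dropped, the field names, and the placeholder residents are all hypothetical, and the real join will happen inside the GIS software rather than in a script like this.

```python
import re
from collections import defaultdict


def property_id(house_number, street):
    """Build an identifier such as 124ANN from a house number and street name."""
    # Assumption: the street type (Avenue, Street, ...) is dropped,
    # as in the 124ANN example. This list of types is hypothetical.
    street_types = {"AVENUE", "AVE", "STREET", "ST", "ALLEY", "ROAD", "RD"}
    words = re.sub(r"[^A-Za-z0-9 ]", "", street).upper().split()
    if len(words) > 1 and words[-1] in street_types:
        words = words[:-1]
    return str(house_number).strip() + "".join(words)


# Placeholder census rows and parcel shapes, invented for this sketch.
census_rows = [
    {"name": "Resident A", "house_number": "124", "street": "Ann Avenue"},
    {"name": "Resident B", "house_number": "124", "street": "Ann Avenue"},
    {"name": "Resident C", "house_number": "126", "street": "Ann Avenue"},
]
parcels = {"124ANN": "shape for 124 Ann Avenue"}

# Group every individual under the identifier of the property they live at.
residents_by_parcel = defaultdict(list)
for row in census_rows:
    key = property_id(row["house_number"], row["street"])
    residents_by_parcel[key].append(row["name"])

# Identifiers that appear in the census but have no matching shape yet.
unmatched = sorted(set(residents_by_parcel) - set(parcels))
print(dict(residents_by_parcel), unmatched)
```

Listing the unmatched identifiers is a cheap check for keying errors and for addresses that have not yet been geocoded. One caveat of dropping the street type: two streets that share a name but differ in type would produce the same identifier, so a case like that would need an extra rule.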
So that’s where we are at the moment. I’ll try to offer general updates on our behind-the-scenes work on a regular basis.