Dealing with Data

I never thought the census was confusing until I had to complete it on my own this year. College students are supposed to fill out the census with where they reside during the school year, but then COVID-19 displaced thousands of college students and changed where they live. Personally, I went to a friend’s house in Wisconsin, where I would be living on April 1, 2020. That made me eligible to be listed under her family’s household, forever altering my history as someone who lived in Madison. However, the Census Bureau’s guidance is that college students fill out the census as if they were continuing their usual living arrangements. This issue is not unique to college students, though, as many people are relocating while cities deal with large numbers of cases and employment statuses change.

In light of this, my perspective on the census changed drastically. While census records are great sources of generalized demographics, people’s stories get neglected. In doing some research for this blog, I decided to look at the steel and iron industry workers in Harrisburg and find their most common birthplaces. In 1900, the data was neat, literally. Unbeknownst to me, the 1900 census did not originally include a column for industry; it was added later for categorization. As a result, the industries were simply titled “Iron” and “Steel.” A search for common birthplaces within those industries shows Pennsylvania as the most common. The 1930 census data already included an industry category, but it wasn’t made with categorization in mind, which made the data difficult to navigate and my work really tricky. The industry column now listed the names of workplaces. Pennsylvania was still the most common birthplace within this set of data, but the results were broken up unnecessarily by the different industry names. Below is a comparison of the two sorted data results.

blog 3

blog 3b


While it is neat to be able to see exactly which company an employee may have been working for in 1930, the inability to condense the data can cost more time or cause errors in reading it, especially for a student just beginning to get familiar with it.
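One way to condense free-text industry entries like the 1930 ones could be sketched in Python as below. This is only an illustration, not how the census data was actually processed: the records, the workplace names, and the keyword rules in `categorize` are all made up for the example, and a real data set would need a more careful mapping (a plain keyword match would miss abbreviations and misspellings).

```python
from collections import Counter

# Hypothetical records for illustration only; these are NOT rows
# from the Harrisburg census data, and the workplace names are invented.
records = [
    {"industry": "Example Iron & Steel Co.", "birthplace": "Pennsylvania"},
    {"industry": "steel mill", "birthplace": "Pennsylvania"},
    {"industry": "Sample Iron Foundry", "birthplace": "Maryland"},
    {"industry": "Steel Works", "birthplace": "Italy"},
    {"industry": "Railroad", "birthplace": "Pennsylvania"},
]


def categorize(industry):
    """Collapse a free-text industry entry into a broad category.

    The keyword rules here are an assumption made for this sketch.
    """
    text = industry.lower()
    has_iron = "iron" in text
    has_steel = "steel" in text
    if has_iron and has_steel:
        return "iron and steel"
    if has_iron:
        return "iron"
    if has_steel:
        return "steel"
    return "other"


def birthplaces_by_category(rows):
    """Count birthplaces separately for each condensed industry category."""
    counts = {}
    for row in rows:
        category = categorize(row["industry"])
        counts.setdefault(category, Counter())[row["birthplace"]] += 1
    return counts


result = birthplaces_by_category(records)
for category, counter in sorted(result.items()):
    # most_common() lists birthplaces from most to least frequent
    print(category, counter.most_common())
```

Keeping “iron and steel” as its own category, rather than forcing those entries into one side, leaves the overlap visible instead of hiding a judgment call inside the cleanup step.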



Meanwhile, the 1900 census data results easily show just how many workers were born in Pennsylvania compared to other states and countries. While the readability of the data seems to add to its value, it doesn’t actually add any historic value; it only eases categorization.

In a way, the 1900 census data has been restored. Restoration in the historic field is common. Furniture, housing structures, paint on walls: it all deteriorates. However, restoration walks a thin line between restoring an item to preserve its original condition and restoring it in a way that completely damages its value, both financial and historical. This data has been restored using digital tools, which have helped preserve the records, transcribe the handwriting, and allow the data to be sorted. However, the improved categorization can lead a student or novice researcher to make assumptions that weren’t originally there.

Another thing to note when comparing these two sets of data is the difference in the number of workers. In the complete set of data from 1900, there are 619 recorded workers in the iron industry. By 1930, however, that number had dwindled to 61. This stark difference could be because of the rise of the steel industry. It could also reflect how people recorded their industry in 1930. Noting this difference is important, but it is too early to draw conclusions. It could serve as a foundation for asking why, and for searching through other records to try to understand the decrease of workers in the iron industry in Harrisburg.

In 1900 there were only 165 steel workers recorded in Harrisburg, compared to 1,885 by 1930. A little bit of overlap occurs, with a couple of respondents listing their industry as “iron and steel.” Where does that overlap begin and end? Even when looking closely at the data, it is hard to tell.
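To put the size of these shifts side by side, here is a small Python sketch that works only from the four counts quoted in this post. The `percent_change` helper is just a name chosen for the example, and the combined totals treat “iron” and “steel” as separate groups, which is an assumption given the “iron and steel” overlap.

```python
# Worker counts as reported in this post (Harrisburg, 1900 and 1930).
iron = {1900: 619, 1930: 61}
steel = {1900: 165, 1930: 1885}


def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


iron_change = percent_change(iron[1900], iron[1930])
steel_change = percent_change(steel[1900], steel[1930])

# Assumption: summing the two groups ignores any "iron and steel" overlap.
combined = {year: iron[year] + steel[year] for year in (1900, 1930)}
combined_change = percent_change(combined[1900], combined[1930])

print(f"Iron workers:  {iron_change:+.1f}%")
print(f"Steel workers: {steel_change:+.1f}%")
print(f"Combined:      {combined_change:+.1f}%")
```

The numbers describe only how the recorded counts changed; they say nothing about why, which is still a question for other records.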

Neighboring Harrisburg is Steelton. I made the error of assuming Steelton was a neighborhood within Harrisburg. Knowing that a lot of immigrant workers lived in Steelton and worked in the steel factories, I guessed that my search in this data would show more European countries as common birthplaces than the Harrisburg data actually showed. Not getting that answer made me realize how radically different neighbors can be. By searching just one town over, I received totally different results than if I had searched in Steelton. Conclusions can’t be drawn from one set of data. Assumptions and hypotheses can be, but they should never be treated as the final answer. While this has been a good exercise in working with data, I’ve learned that it is a lot harder than it seems, and that I should work to improve my skills with it because it is useful, if you know what you are doing.


