For our senior project in computer science, we are excited to be working for Digital Harrisburg, providing data cleansing services and advanced data analytics. In the first stage of the project, we evaluated several data cleansing applications to help clean the census records used by Digital Harrisburg. We concluded that OpenRefine (formerly Google Refine) is the best program for wrangling the dataset. We have begun cleansing the 1900 records, removing errors and inconsistencies. This will provide more accurate and consistent records, improving the overall quality of the map. Furthermore, a more accurate dataset will yield more accurate statistics and analysis in the future.
Along with analyzing and refining the data, we have been teaching ourselves about the field of GIS. We hope to apply what we learn to our analytics and data cleansing. Additionally, we plan on helping with the interactive map, so a solid knowledge of GIS and ArcGIS software is essential. After completing these tasks, we plan on researching various OCR solutions to make converting the handwritten census records into the database easier and less time-consuming.
In conclusion, we are very fortunate to be a part of this exciting project, and we are eager to make positive contributions. We will be posting regular updates on our progress here, so stay tuned!