Brown University

georeg: a pipeline for extracting addresses and other business information from historical registries


georeg is a research prototype for extracting addresses and other business information from historical registries, developed by the CIS Data Science Practice at Brown University, in collaboration with Scott Frickel and Tom Marlow at the Institute at Brown for Environment and Society. We have developed and tested it primarily with images we scanned from Rhode Island manufacturing registries spanning the 1950s through the 1990s. In these scanned images, georeg identifies each heading and the ordering of manufacturer listings, so that we can extract the name, address, business type, and number of employees as tabular data, which is then geocoded to provide latitude and longitude.

georeg is freely available for non-commercial use. Please see the included file LICENSE.txt for more details.

Version: 2016-12-15
Citation: Mining Spatio-temporal Data on Industrialization from Historical Registries
This research was supported by the National Institute of Environmental Health Sciences (NIEHS) Superfund Research Program of the National Institutes of Health under award number P42ES013660.

Access Conditions

Use and Reproduction
© Copyright 2016 Brown University. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.


Berenbaum, David, Deighan, Dwyer, Marlow, Thomas, et al., "georeg: a pipeline for extracting addresses and other business information from historical registries" (2016). Brown Superfund Data Products, Brown University Open Data Collection, Superfund Project: Socio-environmental Cities. Brown Digital Repository. Brown University Library.



  • Brown Superfund Data Products

    This collection contains data sets, code, and documentation files associated with the Brown University Superfund Research Program's investigators and research projects.
  • Brown University Open Data Collection

    This collection contains open and publicly-funded data sets created by Brown University faculty and student researchers. Increasingly, publishers and funders are requiring that protocols, data sets, metadata, and code underlying published research be retained and preserved, their locations cited within …
  • Superfund Project: Socio-environmental Cities

    The Community Engagement Core (CEC) advances social science of environmental health and justice through a deliberative and participatory process of research, education, and advocacy in the state of Rhode Island. Combining academic and community-based approaches builds mutual trust and promotes …