Brown University

Automated Text Mining to Improve the Curation of Genes Associated with Complex Disease

Description

Abstract:
Manual curation of primary literature is a common, time-intensive approach for identifying genes associated with a disease of interest. This project aims to minimize the workload of manual curation for genetic studies by semi-automating the curation process. A computational pipeline was created using text-mining techniques to extract genetic data and other distinguishing features from articles. Five predictive models were trained on these features to classify articles as "considered" or "not considered" for later review by curators. The models were evaluated against manual classifications of curated papers from the Database for Preeclampsia (dbPEC) and the Database for Preterm Birth (dbPTB). A Random Forest classifier performed best for both datasets, with an AUC of 0.825 for dbPEC articles and an AUC of 0.918 for dbPTB articles. This classifier had results consistent with a 32.5% workload reduction for the curation of dbPEC articles and a 79.6% workload reduction for the curation of dbPTB articles, while still capturing over 95% of validated genes.
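The two kinds of figures reported above, AUC and workload reduction, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy scores, and the definition of workload reduction used here (the fraction of articles a curator could skip when reviewing in descending score order until 95% of the positively labeled articles are found) are assumptions made for illustration only. The abstract measures recall over validated genes, whereas this sketch uses article labels as a stand-in.

```python
from math import ceil


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive outranks a random negative (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def workload_reduction(scores, labels, min_recall=0.95):
    """Hypothetical definition: fraction of articles left unreviewed when a
    curator reads articles in descending score order and stops as soon as
    min_recall of the positive articles have been seen."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    needed = ceil(min_recall * n_pos)
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    found = 0
    for reviewed, (_, y) in enumerate(ranked, start=1):
        found += y
        if found >= needed:
            return 1 - reviewed / len(ranked)
    return 0.0


# Toy data (invented): classifier scores and manual labels,
# where 1 = "considered" and 0 = "not considered".
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

print(f"AUC: {auc(scores, labels):.3f}")
print(f"Workload reduction: {workload_reduction(scores, labels):.1%}")
```

In practice the scores would come from a trained classifier's predicted probabilities on held-out articles. The pairwise AUC computation is quadratic in the number of articles, which is acceptable for a sketch but not for large corpora.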
Notes:
Scholarly concentration: Biomedical Informatics
All rights reserved

Citation

Superdock, Michael, Uzun, Alper, Sarkar, Indra Neil, et al., "Automated Text Mining to Improve the Curation of Genes Associated with Complex Disease" (2017). Warren Alpert Medical School Academic Symposium. Brown Digital Repository. Brown University Library. https://repository.library.brown.edu/studio/item/bdr:698122/

Relations

Collection:

  • Warren Alpert Medical School Academic Symposium

    The Warren Alpert Medical School Academic Symposium is an annual event at Warren Alpert Medical School of Brown University that provides Year II medical students a venue to present their summer research in a poster format. Participation in the Symposium …