Jacob Wooldridge, M.D., a senior fellow in the UAMS Clinical Informatics Fellowship Program, is part of a research team whose work on harmonizing data from electronic health records was recently published in JAMIA.
“Harmonizing units and values of quantitative data elements in a very large nationally pooled electronic health record (EHR) dataset” was published on April 18 in the Journal of the American Medical Informatics Association.
The team used data from the National COVID Cohort Collaborative, a large dataset of laboratory measurements containing more than 3.1 billion patient records from 55 data partners. The partners had no standard method of organizing or sorting the data. Dr. Wooldridge and the research team therefore developed a computer script to sort and “harmonize” the data, grouping similar data into categories and making the information more readable across all the data sources.
Their efforts harmonized 88.1% of the lab values among complete datasets, and they were able to use alternative means to salvage 78.2% of records where parts of the data were missing.
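For readers curious what harmonizing lab units can look like in practice, the following is a minimal illustrative sketch in Python. It is not the research team's script. The measurement names, the canonical units, the conversion table and the function names `harmonize` and `harmonization_rate` are all assumptions made up for this example. It also makes no attempt to recover records with missing units, which the team handled by other means.

```python
from typing import Optional, Tuple

# Hypothetical canonical unit for each measurement (illustrative only).
CANONICAL_UNIT = {"glucose": "mg/dL", "hemoglobin": "g/dL"}

# Hypothetical factors: (measurement, source unit) -> multiplier
# that converts a value into the canonical unit.
CONVERSIONS = {
    ("glucose", "mg/dL"): 1.0,
    ("glucose", "g/L"): 100.0,     # 1 g/L = 100 mg/dL
    ("hemoglobin", "g/dL"): 1.0,
    ("hemoglobin", "g/L"): 0.1,    # 1 g/L = 0.1 g/dL
}


def harmonize(measurement: str, value: float,
              unit: Optional[str]) -> Optional[Tuple[float, str]]:
    """Return (value, canonical unit), or None if no conversion is known."""
    factor = CONVERSIONS.get((measurement, unit))
    if factor is None:
        # Unknown or missing unit: left unharmonized in this sketch.
        return None
    return value * factor, CANONICAL_UNIT[measurement]


def harmonization_rate(records) -> float:
    """Fraction of (measurement, value, unit) records that could be converted."""
    records = list(records)
    if not records:
        return 0.0
    converted = sum(1 for r in records if harmonize(*r) is not None)
    return converted / len(records)


# Made-up example records from different hypothetical data partners.
sample = [
    ("glucose", 1.0, "g/L"),
    ("glucose", 95.0, "mg/dL"),
    ("hemoglobin", 140.0, "g/L"),
    ("hemoglobin", 13.5, None),    # unit missing
]
```

In this sketch, each record is converted to a single agreed-upon unit per measurement, and the share of records that could be converted gives a simple harmonization rate comparable in spirit to the percentages reported above.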
“Efforts like these provide a proof of concept that large, complex datasets from a variety of sources can be harmonized in a way that makes them accessible for research,” Wooldridge said. “As we enter a world with a seemingly limitless supply of data, we have to ensure those data can actually be harnessed for research and mined for insight.”
Dr. Wooldridge began the work as a fellow in the Clinical Informatics Fellowship Program at Stony Brook University in New York. He transferred to UAMS for his senior fellowship year. A fellowship-trained hematopathologist, Dr. Wooldridge continued to collaborate on the Stony Brook project while pursuing his own research at UAMS on using machine learning to identify lymphoma in pathology whole-slide images.