Publications

Deep Learning-Based Natural Language Processing to Automate Esophagitis Severity Grading from the Electronic Health Records

Published in International Journal of Radiation Oncology, Biology, Physics, 2023

Abstract: Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet their risk factors and optimal management are under-studied. Real-world evidence holds enormous potential to improve our understanding of RT adverse events, but this information is often only documented in clinic notes and cannot, at present, be automatically extracted. To address this unmet need, we developed natural language processing (NLP) algorithms to automatically identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital

Published in JAMIA Open, 2023

Recommended citation:

Lijing Wang, Amy R Zipursky, Alon Geva, Andrew J McMurry, Kenneth D Mandl, Timothy A Miller, A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital, JAMIA Open, Volume 6, Issue 3, October 2023, ooad047 https://doi.org/10.1093/jamiaopen/ooad047

Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles

Published in Proceedings of the 5th Clinical Natural Language Processing Workshop, 2023

Abstract: Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs out-perform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.

Recommended citation:

Weipeng Zhou, Majid Afshar, Dmitriy Dligach, Yanjun Gao, and Timothy Miller. 2023. Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 125–130, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.clinicalnlp-1.16/

Two-Stage Fine-Tuning for Improved Bias and Variance for Large Pretrained Language Models

Published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Abstract: The bias-variance tradeoff is the idea that learning methods need to balance model complexity with data size to minimize both under-fitting and over-fitting. Recent empirical work and theoretical analysis with over-parameterized neural networks challenges the classic bias-variance trade-off notion suggesting that no such trade-off holds: as the width of the network grows, bias monotonically decreases while variance initially increases followed by a decrease. In this work, we first provide a variance decomposition-based justification criteria to examine whether large pretrained neural models in a fine-tuning setting are generalizable enough to have low bias and variance. We then perform theoretical and empirical analysis using ensemble methods explicitly designed to decrease variance due to optimization. This results in essentially a two-stage fine-tuning algorithm that first ratchets down bias and variance iteratively, and then uses a selected fixed-bias model to further reduce variance due to optimization by ensembling. We also analyze the nature of variance change with the ensemble size in low- and high-resource classes. Empirical results show that this two-stage method obtains strong results on SuperGLUE tasks and clinical information extraction tasks. Code and settings are available: https://github.com/christa60/bias-var-fine-tuning-plms.git

Recommended citation:

Lijing Wang, Yingya Li, Timothy Miller, Steven Bethard, and Guergana Savova. 2023. Two-Stage Fine-Tuning for Improved Bias and Variance for Large Pretrained Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15746–15761, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.acl-long.877/

End-to-end clinical temporal information extraction with multi-head attention

Published in The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, 2023

Abstract: Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state of the art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.

Recommended citation:

Timothy Miller, Steven Bethard, Dmitriy Dligach, and Guergana Savova. 2023. End-to-end clinical temporal information extraction with multi-head attention. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 313–319, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.bionlp-1.28/

Representing and utilizing clinical textual data for real world studies: An OHDSI approach

Published in Journal of Biomedical Informatics, 2023

Abstract: Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.

Recommended citation:

Vipina K. Keloth, Juan M. Banda, Michael Gurley, Paul M. Heider, Georgina Kennedy, Hongfang Liu, Feifan Liu, Timothy Miller, Karthik Natarajan, Olga V Patterson, Yifan Peng, Kalpana Raja, Ruth M. Reeves, Masoud Rouhizadeh, Jianlin Shi, Xiaoyan Wang, Yanshan Wang, Wei-Qi Wei, Andrew E. Williams, Rui Zhang, Rimma Belenkaya, Christian Reich, Clair Blacketer, Patrick Ryan, George Hripcsak, Noémie Elhadad, Hua Xu, Representing and utilizing clinical textual data for real world studies: An OHDSI approach, Journal of Biomedical Informatics, Volume 142, 2023 https://doi.org/10.1016/j.jbi.2023.104343

Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes

Published in JCO Clinical Cancer Informatics, 2023

Abstract: Purpose: There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities.

Recommended citation:

Abigail Derton, Marco Guevara, Shan Chen, Shalini Moningi, David E. Kozono, Dianbo Liu, Timothy A. Miller, Guergana K. Savova, Raymond H. Mak, and Danielle S. Bitterman, Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes, JCO Clinical Cancer Informatics, Volume 7, 2023

Tim Miller
