Publication

From Text to Maps: LLM-Driven Extraction and Geotagging of Epidemiological Data

by Karlyn K Harrod, Prabin Bhandari, Antonios Anastasopoulos
Publication Type
Conference Paper
Book Title
Proceedings of the Third Workshop on NLP for Positive Impact
Page Numbers
258 to 270
Publisher Location
Miami, Florida, United States of America
Conference Name
NLP for Positive Impact Workshop at EMNLP 2024
Conference Location
Miami, Florida, United States of America
Conference Sponsor
https://2024.emnlp.org/sponsors/

Epidemiological datasets are essential for public health analysis and decision-making, yet they remain scarce and are often difficult to compile because of inconsistent data formats, language barriers, and evolving political boundaries. Traditional methods of creating such datasets involve extensive manual effort and are prone to errors in location extraction. To address these challenges, we propose using large language models (LLMs) to automate the extraction and geotagging of epidemiological data from textual documents. Our approach significantly reduces the manual effort required, limiting human intervention to validating a subset of records against text snippets and verifying the geotagging reasoning, rather than manually reviewing multiple entire documents to extract, clean, and geotag the data. Additionally, the LLMs identify information often overlooked by human annotators, further enhancing the dataset’s completeness. Our findings demonstrate that LLMs can be effectively used to semi-automate the extraction and geotagging of epidemiological data, offering several key advantages: (1) comprehensive information extraction with minimal risk of missing critical details; (2) minimal human intervention; (3) higher-resolution data with more precise geotagging; and (4) significantly reduced resource demands compared to traditional methods.
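
The human-in-the-loop step described in the abstract (validating a subset of records against text snippets and checking the geotagging reasoning) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ExtractedRecord` fields, the `snippet_supports` check, the `validation_sample` helper, and the 20% review fraction are all assumptions made for the example, and the LLM extraction step itself is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class ExtractedRecord:
    # Hypothetical schema: the abstract does not specify the actual fields.
    disease: str
    case_count: int
    location_name: str
    latitude: float
    longitude: float
    source_snippet: str    # text passage the values were extracted from
    geotag_reasoning: str  # model's stated rationale for the coordinates


def snippet_supports(record: ExtractedRecord) -> bool:
    """Cheap pre-check before human review: do the location name and the
    case count appear literally in the source snippet?"""
    text = record.source_snippet.lower()
    return record.location_name.lower() in text and str(record.case_count) in text


def validation_sample(records, fraction=0.2, seed=0):
    """Pick a reproducible random subset of records for a human reviewer."""
    if not records:
        return []
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)
```

A reviewer would then read only the sampled records, each alongside its `source_snippet` and `geotag_reasoning`, instead of rereading whole documents. Records that fail `snippet_supports` could be prioritized for review, although a literal string match is only a rough heuristic and would miss paraphrased or translated place names.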