At Risk Population Estimates for Belarus, Poland and Slovakia with Machine Learning

by Viswadeep Lebakula, Clinton W Stipek, Daniel S Adams, Justin F Epting, Marie L Urban

Publication Type

Conference Paper

Book Title

2024 IEEE International Conference on Big Data (BigData)

Publication Date

December, 2024

Page Numbers

5804 to 5811

Publisher Location

New Jersey, United States of America

Conference Name

2024 IEEE International Conference on Big Data (BigData)

Conference Location

Washington, District of Columbia, United States of America

Conference Sponsor

IEEE

Conference Date

Dec 15, 2024 - Dec 18, 2024

Abstract

High-resolution gridded population modeling is crucial for applications such as disaster response planning, infectious disease spread modeling, climate change impact estimation, and policy development. Multiple gridded population datasets have been developed, each tailored to specific objectives. Among them, the LandScan Global dataset is designed to represent ambient and unwarned population distributions. However, this dataset relies on a statistical approach that requires manual adjustments, making it time-consuming and labor-intensive. Existing machine learning (ML) methods often train and test at different spatial resolutions, potentially leading to inflated results, and they rely on census population totals for disaggregation. To address these limitations, in this study we developed population estimates using ML models trained and tested at a consistent 30 arc-second resolution (≈1 square kilometer), specifically Random Forest (RF) and XGBoost. These models were trained on 2020 data to predict 2021 population for three countries: Belarus, Poland, and Slovakia. Our findings show that the performance of both the RF model (mean absolute error, MAE, from 5.75 to 13.25) and the XGBoost model (MAE from 8.15 to 23.44) is close to the LandScan Global estimates. Furthermore, neither model performed best across all grid cells: the RF model was more effective in areas with lower populations, while XGBoost excelled in more densely populated regions. The proposed approach can be used for countries where census data are not available.
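
The comparison described in the abstract, where one model does better in sparsely populated cells and another in dense ones, can be illustrated with a minimal sketch that computes MAE separately for low- and high-population grid cells. This is not the authors' code: the function names, the threshold of 100 people per cell, and all numbers below are assumptions made up for illustration, and the two prediction lists are placeholders rather than RF or XGBoost outputs.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_by_density(y_true, y_pred, threshold):
    """Return MAE for cells below ("low") and at or above ("high") a threshold.

    The threshold is a hypothetical cut-off on the reference population
    per grid cell; the abstract does not state how cells were grouped.
    """
    groups = {"low": ([], []), "high": ([], [])}
    for t, p in zip(y_true, y_pred):
        key = "low" if t < threshold else "high"
        groups[key][0].append(t)
        groups[key][1].append(p)
    # Skip a group entirely if no cell falls into it.
    return {k: mae(ts, ps) for k, (ts, ps) in groups.items() if ts}


# Made-up reference populations and predictions for six grid cells.
reference = [2, 10, 40, 900, 2500, 6000]
model_a = [3, 12, 38, 1000, 2300, 6400]   # placeholder predictions
model_b = [8, 4, 50, 920, 2480, 6050]     # placeholder predictions

scores_a = mae_by_density(reference, model_a, threshold=100)
scores_b = mae_by_density(reference, model_b, threshold=100)
print("model A:", scores_a)
print("model B:", scores_b)
```

Reporting the error per density group, rather than a single overall MAE, makes it visible when a model with the lower overall error is still the weaker choice for one class of cells.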
