Abstract
High-resolution gridded population modeling is crucial for applications such as disaster response planning, infectious disease spread modeling, climate change impact estimation, and policy development. Multiple gridded population datasets have been developed, each tailored to specific objectives. Among them, the LandScan Global dataset is designed to represent ambient and unwarned population distributions. However, it relies on a statistical approach that requires manual adjustments, making it time-consuming and labor-intensive. Existing machine learning (ML) methods often train and test at different spatial resolutions, which can inflate reported performance, and they rely on census population totals for disaggregation. To address these limitations, we developed population estimates using two ML models, Random Forest (RF) and XGBoost, trained and tested at a consistent 30 arc-second resolution (≈1 square kilometer). The models were trained on 2020 data and used to predict 2021 population for three countries: Belarus, Poland, and Slovakia. Our findings show that both RF (mean absolute error, MAE, ranging from 5.75 to 13.25) and XGBoost (MAE ranging from 8.15 to 23.44) perform close to LandScan Global estimates. Furthermore, neither model performed best across all grid cells: RF was more effective in areas with lower populations, while XGBoost excelled in more densely populated regions. The proposed approach can be used for countries where census data are not available.