Abstract
High-resolution population datasets are used across a broad range of domains, including climate change, public policy, humanitarian aid, and rescue operations. Machine learning methods have been adopted to generate high-resolution, or gridded, population estimates from geospatial input features such as buildings, roads, and nighttime lights. In this study, we evaluate the importance of these features using Random Forest models and permutation importance measures across three levels of analysis: global (10 countries collectively), country, and administrative unit. Our research addresses key questions to improve understanding of high-resolution population modeling: Are certain features more important than others globally? Do optimal features vary by country? Within each country, does feature importance differ across administrative units? What similarities exist in feature importance at the global, country, and administrative unit levels? To answer these questions, we use the Kneedle algorithm to automate the selection of optimal features. We find that features display consistent patterns across spatial boundaries: the same feature is the most important indicator of population in 7 of the 10 countries modeled. Our findings indicate that while important features may vary across geographies, certain features consistently hold greater importance than others regardless of geography.
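As a rough illustration of the automated feature selection described above, the following is a minimal sketch of knee detection on a ranked importance curve. It is not the study's implementation. It reduces Kneedle to its core idea (normalize the curve, then find the point that lies farthest below the straight line joining its endpoints) and omits the smoothing and sensitivity threshold of the full algorithm. The function names, the choice to keep only features ranked strictly before the knee, and all importance values are assumptions made for illustration; the values are not results.

```python
def knee_index(scores):
    """Index of the knee in a descending curve (simplified Kneedle).

    Assumes at least 3 scores, sorted in descending order, and not all equal.
    """
    n = len(scores)
    span = scores[0] - scores[-1]
    best_idx, best_gap = 0, float("-inf")
    for i, score in enumerate(scores):
        x = i / (n - 1)                    # rank normalized to [0, 1]
        y = (score - scores[-1]) / span    # importance normalized to [0, 1]
        gap = (1.0 - x) - y                # how far the curve sits below the chord
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx


def select_features(importances):
    """Keep the features ranked strictly before the knee (an assumed rule)."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    scores = [score for _, score in ranked]
    if len(scores) < 3 or scores[0] == scores[-1]:
        return [name for name, _ in ranked]  # no usable knee: keep everything
    k = knee_index(scores)
    return [name for name, _ in ranked[:max(k, 1)]]  # always keep at least one


# Hypothetical permutation-importance scores, made up for illustration only.
example_importances = {
    "buildings": 0.50,
    "roads": 0.30,
    "nighttime_lights": 0.05,
    "placeholder_a": 0.04,
    "placeholder_b": 0.03,
    "placeholder_c": 0.02,
}

print(select_features(example_importances))  # ['buildings', 'roads']
```

In practice the importance scores would come from permuting each input feature of a fitted Random Forest model and measuring the resulting drop in predictive accuracy. Whether the feature at the knee itself is retained is a design choice that the abstract does not specify.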