Abstract
Suicide is a leading cause of death in the United States, and its upward trend underscores its significance as a public health issue. Previous research has employed global models, such as ordinary least squares (OLS) regression, and local models, such as geographically weighted regression (GWR). While local models are useful for analyzing spatial variations in suicide mortality, they share limitations with traditional global models, particularly their inability to handle multicollinearity and non-linear relationships. Machine learning (ML) approaches, such as random forests (RF), can address some of these limitations but often fail to account for spatial variability. This gap highlights the need for spatial ML models suited to the study of suicide mortality. This research seeks to fill that gap by using a geographically weighted random forest (GWRF) model to examine the associations between county-level suicide mortality in the U.S. from 2010 to 2020 and various social and environmental determinants of health. A key aspect of our methodology is disciplined feature selection, which reduces the pool of explanatory variables by about 90%. This refinement enhances the explanatory power of both the global (R² improved from 0.59 to 0.67) and local (R² improved from 0.64 to 0.67) RF models while reducing their run times. An analysis of the importance scores for the selected features reveals that the drivers of suicide mortality vary by context. Thus, to effectively address regional disparities and inform targeted public health interventions, a holistic approach that incorporates multiple county-level characteristics is essential.