Abstract
Membership inference attacks (MIAs) aim to determine whether a given data point was part of a machine learning model's training set, posing significant privacy risks even under limited black-box access. These attacks rely on the attacker approximating the target model's training distribution, yet the impact of distribution shifts between the target and shadow models on MIA success remains underexplored. We systematically evaluate five types of distribution shift (cutout, jitter, Gaussian noise, label shift, and attribute shift) at varying intensities. Our results reveal that these shifts affect MIA effectiveness in nuanced ways: some reduce attack success while others exacerbate vulnerabilities, and the same shift can have opposite effects depending on the type of MIA. These findings highlight the complex interplay between distributional differences and attack performance, offering critical insights for improving model defenses against MIAs.
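To make the input-level shifts concrete, the sketch below shows one plausible way to impose cutout, jitter, and Gaussian-noise shifts on a shadow model's training data using standard torchvision transforms; the function name and intensity parameters are hypothetical placeholders, not the paper's exact configuration. Label and attribute shifts would instead be induced by resampling the shadow dataset's class or attribute proportions rather than transforming the inputs.

```python
# Hypothetical sketch: input-level distribution shifts for shadow-model data.
# Parameter values are illustrative assumptions, not the paper's settings.
import torch
from torchvision import transforms

def shifted_transform(jitter=0.4, noise_sigma=0.1, cutout_scale=(0.02, 0.2)):
    """Compose cutout, jitter, and Gaussian-noise shifts at chosen intensities."""
    return transforms.Compose([
        # Jitter shift: perturb brightness/contrast/saturation on the PIL image.
        transforms.ColorJitter(brightness=jitter, contrast=jitter, saturation=jitter),
        transforms.ToTensor(),
        # Gaussian-noise shift: add zero-mean noise with standard deviation noise_sigma.
        transforms.Lambda(lambda x: x + noise_sigma * torch.randn_like(x)),
        # Cutout shift: erase a random rectangular patch of the image tensor.
        transforms.RandomErasing(p=1.0, scale=cutout_scale),
    ])
```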