Abstract
This paper discusses the key challenges and future research directions for privacy-preserving federated learning (PPFL), with a focus on its application to large-scale scientific AI models, in particular foundation models~(FMs). PPFL enables collaborative model training across distributed datasets while preserving privacy, an important collaborative approach for science. We discuss the need for efficient and scalable algorithms to address the increasing complexity of FMs, particularly when dealing with heterogeneous clients. In addition, we underscore the need to develop advanced privacy-preserving techniques, such as differential privacy, to balance privacy and utility in large FMs, and we emphasize fairness and incentive mechanisms to ensure equitable participation among heterogeneous clients. Finally, we highlight the need for a robust software stack supporting scalable and secure PPFL deployments across multiple high-performance computing facilities. We envision that PPFL will play a crucial role in advancing scientific discovery and enabling large-scale, privacy-aware collaborations across science domains.