Publication

Prompt Phrase Ordering Using Large Language Models in HPC: Evaluating Prompt Sensitivity

by Noah Thomason, Hilda B Klasky
Publication Type
ORNL Report
Publication Date

Large language models (LLMs) have demonstrated effective performance on domain-specific tasks, but they often require a well-designed prompt to guide their responses. Finding an optimal prompt is challenging because of prompt sensitivity—the phenomenon in which small changes to a prompt can lead to large variations in performance. In this study, we evaluate prompt performance by examining all permutations of a sequence of independent phrases to investigate prompt sensitivity and robustness. We used two datasets: the GSM8k dataset, which assesses mathematical reasoning, and a custom template prompt for summarizing database metadata. The study was conducted using the llama3-instruct-7B model hosted on Ollama, with computations parallelized in a high-performance computing environment. By comparing the average index of each phrase in the best- and worst-performing prompts, we found that the order of independent phrases within a prompt significantly impacts LLM performance. Additionally, we used Hamming distance to quantify changes between phrase orderings, concluding that reordering can dramatically affect scores, with variation that often appears close to random. These findings support existing research on prompt sensitivity. We discuss the challenges of prompt optimization, noting that altering phrases in a successful prompt does not always yield another successful prompt.
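The core methodology described above—enumerating every ordering of a fixed set of prompt phrases and comparing orderings by Hamming distance—can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the phrase strings and the `best`/`worst` orderings below are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical independent prompt phrases (stand-ins for the real ones).
phrases = ["Think step by step.", "Answer concisely.", "Show your work."]

# Enumerate every possible ordering of the phrases (n! permutations).
orderings = list(permutations(phrases))  # 3 phrases -> 6 orderings

def hamming(a, b):
    """Number of positions at which two orderings place different phrases."""
    return sum(x != y for x, y in zip(a, b))

# Example: distance between two hypothetical orderings; each ordering would
# be scored by running the assembled prompt through the LLM on a benchmark.
best = (phrases[1], phrases[0], phrases[2])
worst = (phrases[2], phrases[0], phrases[1])
print(hamming(best, worst))  # positions 0 and 2 differ -> 2
```

Scoring all `n!` orderings is what motivates the parallel HPC setup mentioned in the abstract, since the permutation count grows factorially with the number of phrases.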