The Accuracy, Robustness, and Readability of LLM-Generated Sustainability-Related Word Definitions
Date
2025-03
Publisher
University of Tartu Library
Abstract
A common language with shared, standard definitions is essential for effective climate conversations. However, there is concern that LLMs may misrepresent and/or diversify climate-related terms. We compare 305 official IPCC glossary definitions with those generated by OpenAI's GPT-4o-mini and investigate the generated definitions' adherence, robustness, and readability using a combination of SBERT sentence embeddings and statistical measures. The LLM definitions received average adherence and robustness scores of $0.58 \pm 0.15$ and $0.96 \pm 0.02$, respectively. Both the official and the model-generated sustainability-related definitions remain challenging to read, and the model-generated definitions vary mainly for words with multiple or ambiguous definitions. Thus, the results highlight the potential of LLMs to support environmental discourse while emphasizing the need to align model outputs with established terminology for clarity and consistency.
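The abstract does not state exactly how the adherence and robustness scores are computed. The following is a minimal sketch under stated assumptions. It assumes that adherence is the cosine similarity between the SBERT embedding of an official IPCC definition and that of the corresponding GPT-4o-mini definition. It also assumes that robustness is the mean pairwise cosine similarity among several definitions generated for the same term, and that "mean ± SD" uses the sample standard deviation. The function names (`cosine`, `adherence`, `robustness`, `summarize`) are hypothetical. Vectors are passed as plain lists so the sketch runs without a model. In practice they would come from an SBERT encoder, for example `SentenceTransformer.encode` in the `sentence-transformers` library; the specific model used in the study is not given here.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adherence(official_vec, generated_vec):
    """Assumed adherence: similarity of a generated definition to the official one."""
    return cosine(official_vec, generated_vec)


def robustness(generated_vecs):
    """Assumed robustness: mean pairwise similarity among repeated generations
    for the same term (requires at least two generations)."""
    if len(generated_vecs) < 2:
        raise ValueError("need at least two generated definitions")
    return mean(cosine(u, v) for u, v in combinations(generated_vecs, 2))


def summarize(scores):
    """Return (mean, sample standard deviation) over per-term scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores for a standard deviation")
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for SBERT embeddings (illustrative only).
    official = [1.0, 0.0]
    generations = [[1.0, 1.0], [1.0, 0.9], [0.9, 1.0]]
    print(adherence(official, generations[0]))
    print(robustness(generations))
```

Averaging over all pairs, rather than comparing each generation with the first one, keeps the robustness score independent of generation order. Under these assumptions, the reported $0.96 \pm 0.02$ would then reflect near-identical repeated generations for most terms.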