Generative AI for Technical Writing: Comparing Human and LLM Assessments of Generated Content
Date
2025-03
Publisher
University of Tartu Library
Abstract
Large language models (LLMs) have recently gained significant attention for their natural language processing (NLP) capabilities, particularly in generative artificial intelligence (AI). LLMs can also serve as useful tools for technical writers of software documentation. We present an assessment of technical documentation content generated by three different LLMs using retrieval-augmented generation (RAG) with product documentation as the knowledge base. The LLM-generated responses were analyzed in three ways: 1) manual error analysis by a technical writer, 2) automatic assessment using deterministic metrics (BLEU, ROUGE, token overlap), and 3) correctness evaluation using an LLM as a judge. The results of these assessments were compared using network analysis and linear regression models to investigate statistical relationships, model preferences, and the distribution of human and LLM scores. The analyses show that human quality evaluation aligns more closely with the LLM correctness judgment than with deterministic metrics, even when different analysis frameworks are used.
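The deterministic metrics named in the abstract can be illustrated with a minimal sketch; this is not the paper's implementation, and the function names and tokenization (lowercased whitespace splitting) are assumptions for illustration only:

```python
from collections import Counter

def token_overlap(candidate: str, reference: str) -> float:
    # Illustrative token-overlap score: fraction of unique reference
    # tokens that also appear in the candidate text.
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    if not ref:
        return 0.0
    return len(cand & ref) / len(ref)

def rouge1_f1(candidate: str, reference: str) -> float:
    # ROUGE-1 style score: unigram overlap with clipped counts,
    # reported as the F1 of precision and recall.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Such surface-level metrics reward lexical similarity to a reference answer, which is one reason they can diverge from a human writer's quality judgment of generated documentation.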