Towards Verifiable Text Generation with Symbolic References

Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

2024

Abstract

Large language models (LLMs) have demonstrated an impressive ability to synthesize plausible and fluent text. However, they remain vulnerable to hallucinations, so their outputs generally require manual human verification in high-stakes applications, which can be time-consuming and difficult. This paper proposes symbolically grounded generation (SymGen) as a simple approach for enabling easier validation of an LLM’s output. SymGen prompts an LLM to interleave its regular output text with explicit symbolic references to fields present in some conditioning data (e.g., a table in JSON format). The references can be used to display the provenance of different spans of text in the generation, reducing the effort required for manual verification. Across data-to-text and question answering experiments, we find that LLMs are able to directly output text that makes use of symbolic references while maintaining fluency and accuracy.
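To make the idea of symbolic references concrete, the following is a minimal sketch of how text interleaved with references might be resolved against JSON-like data while recording provenance. It is not the authors' implementation: the `{{ path.to.field }}` placeholder syntax, the helper names `lookup` and `render`, and the example data are all assumptions made for illustration.

```python
import re

# Hypothetical placeholder syntax: {{ path.to.field }} (an assumption for this sketch).
REF = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def lookup(data, path):
    """Follow a dotted path such as 'home.points' through nested dicts and lists."""
    node = data
    for key in path.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def render(template, data):
    """Resolve every reference; return the final text and (start, end, path) spans."""
    parts, spans, cursor, length = [], [], 0, 0
    for match in REF.finditer(template):
        literal = template[cursor:match.start()]
        value = str(lookup(data, match.group(1)))
        parts += [literal, value]
        start = length + len(literal)
        # Record which field produced this span of the rendered text.
        spans.append((start, start + len(value), match.group(1)))
        length = start + len(value)
        cursor = match.end()
    parts.append(template[cursor:])
    return "".join(parts), spans


# Illustrative data and model output (both invented for this example).
data = {"home": {"name": "Hawks", "points": 102},
        "away": {"name": "Magic", "points": 97}}
template = "The {{ home.name }} beat the {{ away.name }} {{ home.points }}-{{ away.points }}."

text, spans = render(template, data)
print(text)
for start, end, path in spans:
    print(f"{text[start:end]!r} <- {path}")
```

Each recorded span maps a piece of the rendered text back to the field it came from, which is the kind of provenance display the abstract describes as reducing verification effort.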

Type

Journal article

Publication

COLM 2024

Shannon Shen

PhD Student

My research lies at the intersection of NLP and HCI. I am interested in understanding language in scientific, legal, and clinical documents that are authored and used by domain experts. I study how newly developed NLP approaches can enable better human-AI collaboration to assist experts in these high-stakes settings.

Yoon Kim

Assistant Professor, MIT

David Sontag

Professor of EECS
