Why GUIDE-LLM?

The GUIDE-LLM checklist provides a standardized framework for reporting studies that use large language models (LLMs) in the behavioral and social sciences. It aims to promote transparency, reproducibility, and ethical accountability across all stages of LLM-based research.


The problem: Inconsistent and incomplete reporting

LLMs are now used across many areas of the behavioral and social sciences, from generating experimental materials and annotating text to simulating judgments or interacting directly with participants. These new possibilities come with new challenges. Without clear reporting, it becomes difficult for readers, reviewers, and other researchers to understand what model was used, how it was configured, what prompts were given, or how outputs were validated. GUIDE-LLM was created to address these gaps.

In practice, LLM-based studies often omit important details that directly affect scientific validity. For example:

  • The term “ChatGPT” may refer to different underlying models, versions, or configurations.
  • Prompts, despite being an influential part of the LLM workflow, are frequently not reported in full.
  • Model updates, access modes (API vs. web interface), and system-level instructions can lead to large differences in results.
  • Key parameters (e.g., temperature, memory settings) are often left unspecified.
  • Human validation of LLM outputs is reported inconsistently, even when the LLM is used for coding, classification, or measurement.

These gaps create real problems:

  • Replication failures
  • Misinterpretation of results
  • Difficulty comparing studies across fields
  • Reduced trust in findings using LLMs
  • Challenges for reviewers evaluating methodological quality

LLMs introduce a fast-moving and opaque layer into scientific research. Without standards, even well-designed studies can appear unclear or unrepeatable.


The objective of GUIDE-LLM

GUIDE-LLM provides a minimum set of reporting requirements that strengthen transparency and reproducibility across behavioral and social science research that uses LLMs. It is not meant to prescribe how researchers should use LLMs. Instead, it aims to ensure that how LLMs were used is documented clearly enough to allow others to understand, evaluate, and reproduce the work.

Specifically, GUIDE-LLM helps researchers:

  • Describe why and how LLMs were used in the study
  • Report exact model details, including version and access method
  • Document prompts and system instructions
  • Clarify parameters and configuration choices
  • Explain validation procedures for LLM-generated outputs
  • Share code and reproducible workflows
  • Disclose relevant competing interests
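As an illustration of the reporting items above, the configuration of an LLM run can be captured in a small, machine-readable record stored alongside the study materials. This is a minimal sketch, not a format prescribed by GUIDE-LLM; all model names, versions, and parameter values below are hypothetical placeholders.

```python
import json

# Hypothetical run record covering the checklist's core reporting items.
# Every value here is a placeholder; report whatever you actually used.
run_record = {
    "model": "example-model",                 # exact model identifier
    "model_version": "2024-01-01-snapshot",   # version or snapshot date
    "access_mode": "API",                     # API vs. web interface
    "parameters": {
        "temperature": 0.0,                   # sampling temperature
        "max_tokens": 256,                    # output length limit
    },
    "system_prompt": "You are a coder classifying survey responses.",
    "user_prompt_template": "Classify the following response: {text}",
}

# Saving the record as JSON makes the configuration shareable and
# verifiable by readers and reviewers.
with open("llm_run_record.json", "w") as f:
    json.dump(run_record, f, indent=2)
```

A record like this can be archived with the study's code and data, so that the exact model, access mode, parameters, and prompts are documented in one place rather than scattered across the manuscript.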

The checklist is meant to be:

  • Simple – 14 core items that apply broadly across study types
  • Flexible – co-designed by scholars from diverse subdisciplines (e.g., psychology, management, economics, communication, computational social science, cognitive science) so that items apply across research contexts
  • Helpful – encourages reflection on design decisions during the research process

How the GUIDE-LLM checklist was designed

The GUIDE-LLM checklist was developed through a structured, multi-stage process involving more than 80 experts from the behavioral and social sciences (e.g., psychology, sociology, economics, management) as well as computer science and ethics. In a two-step Delphi process, the experts first generated and refined potential reporting items and then evaluated them for inclusion based on their relevance and broad usefulness across research contexts. All items in the final checklist achieved strong consensus, with over two-thirds of experts supporting inclusion during the rating phase.

Methodological background: Detailed information about the methodology, consensus criteria, and development process is available here.