April 10, 2025
RealHarm: Real-world failure cases of language model applications
Large language model deployments in consumer-facing applications raise significant concerns about potential harms and risks. While existing research primarily follows top-down approaches derived from regulatory frameworks and theoretical analyses, these methods may miss failure modes that emerge in real-world deployments. In this work, we introduce RealHarm, a dataset of problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. Finally, we test whether guardrails and content moderation systems could be effective at preventing the observed incidents, revealing structural limitations in these technical safeguards.