When Systems Are Uncertain: Human Reliability Engineering in AI-Driven Production

Sreejith Kaimal

Abstract

Existing work on reliability engineering generally assumes a deterministic world model and interpretable failure modes. These assumptions do not hold for production AI systems, motivating a shift from machine-centric to human-centric reliability constraints, the latter defined by human operators' cognitive load, trust calibration, and interpretation of probabilistic signals. Challenges include cognitive overload during non-deterministic failure diagnosis, trust in automated incident response, alert design for probabilistic systems, investigation of opaque failures, and the preservation of human agency under automation. Together, these challenges show that the reliability of AI systems depends on human-machine cognitive collaboration, not on machine performance alone. Observability, alerting, and automation can accordingly be re-engineered around human cognitive limits and designed to prevent skill loss. From a sociotechnical perspective, reliability is an emergent property of human-AI interaction, providing a new basis for managing uncertainty in production environments beyond the reach of deterministic models.
