In Ipso Vivimus et Movemur: spatial intelligence, world models, and elite ultra-complex surgery as the upper limit benchmark for embodied AGI (Artificial General Intelligence)
Abstract
Although large language models have achieved remarkable fluency in symbolic reasoning and dialogue, they remain fundamentally limited in the kind of sensorimotor competence that surgeons develop through decades of deliberate practice. Drawing on the theory of embodied cognition and contemporary proposals for predictive world models, I argue that the most significant obstacle to artificial general intelligence (AGI) is not linguistic but physical: the ability to ground perception, planning, and action in the unforgiving constraints of real-world dynamics. As an illustrative upper-limit benchmark, I propose the “AGI benchmark in surgery”: the hypothetical point at which an autonomous robotic system could safely and reliably perform ultra-complex surgical procedures, such as multi-visceral transplantation, complex hepatopancreatobiliary resections, and robot-assisted microsurgical reconstructions, while matching or surpassing the abilities of elite human surgeons. This perspective breaks the benchmark down into a practical ladder of milestones and describes the technical requirements (tactile sensing, compliant control, hierarchical planning, and predictive world models) needed to bridge the gap between simulation and clinical reality.