When artificial intelligence (AI) first arrived, it promised automation, time savings, and a second brain. For physicians in particular, AI scribes promised relief from a heavy documentation burden, the leading driver of clinicians’ all-time-high burnout rates. Yet the reality has been the opposite: clinicians are still spending hours past the end of their shifts finishing long charts and paperwork.
So what is the issue here? The problem is not the idea of AI scribing but its implementation. These systems fail not because the algorithms are ineffective, but because the designed system is incompatible with the real clinical workflow, a gap I call “the last mile.”
When an AI scribe goes live and is integrated into the electronic health record (EHR), hospitals treat it as a cause for celebration. But that is only the first step in the journey. The real test begins when physicians try to use these tools in actual encounters, which present a mix of overlapping voices, dense medical language, and decisions that must be made within seconds.
According to a 2025 study reviewing AI transcription tool performance in clinical settings, AI transcription tools can achieve word error rates below 1% (as low as 0.87%) in a controlled environment. That performance does not hold up in live, multi-speaker clinical practice, where accuracy varied from 40% to 60%, particularly in specialties such as emergency medicine and oncology. All the alleged saved minutes are lost when doctors must re-correct notes or double-check the wording. Instead of reducing burnout, the time spent on this task and the mismatch with the workflow create another layer of frustration for users.
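For readers unfamiliar with the metric behind these figures, word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. A minimal sketch in Python (the clinical phrases below are invented for illustration, not drawn from the study):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of ten yields a 10% WER.
ref = "patient denies chest pain shortness of breath or fever today"
hyp = "patient denies chest pain shortness of breath or fever daily"
print(word_error_rate(ref, hyp))  # → 0.1
```

Note how a single substituted word in a ten-word sentence already produces a 10% error rate; in a clinical note, that one word (“today” vs. “daily”) can change the meaning entirely, which is why physicians end up re-reading every line.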
A study on clinician time allocation published in the Annals of Internal Medicine found that nearly half of physicians’ total time (49.2%) was spent on EHR and desk work, while only 27% went to direct face time with patients. AI scribes were supposed to reverse that ratio, yet most fail to do so because they do not fit into existing work patterns.
One of the most common sources of friction is context loss. Generic models often miss specialty-specific nuances, forcing clinicians to spend additional time reviewing and editing notes. Accuracy issues create another barrier: when physicians do not fully trust the output, they compensate by typing their own backup notes “just in case,” effectively doubling documentation work rather than reducing it.
Training gaps further compound the problem. Hospitals cannot assume that enthusiasm for AI automatically translates into usability: not everyone in a hospital is tech-savvy enough to run an AI tool at first glance. It is critical to train every staff member and ensure they understand the workflow, both for successful integration and for long-term retention of the tool.
All these gaps undermine trust, slow adoption, and erode the very advantages AI was meant to deliver.
Burnout is partly a matter of long hours without a decent break, but even more a matter of cognitive overload. Imagine trying your best to care for your patients while also bearing the burden of monitoring the AI with the same vigilance.
One emergency physician recently described it this way: “It’s like having a medical student who never quite understands what you meant. You end up spending just as much time correcting them as teaching them.” Technology that interrupts, distracts, or demands excessive attention does not relieve fatigue; it intensifies it.
The real success of AI scribing is defined not by the number of integrations or vendor demos, but by how invisible it is to physicians throughout their day.
The last mile of AI scribing must deliver:

- Accuracy that holds up in live, multi-speaker clinical environments, not just controlled demos
- Specialty-specific context, so notes capture the nuances of each field
- A seamless fit with existing clinical workflows, rather than forcing new ones
- Training and support so every staff member can use the tool confidently
If these components are done correctly, AI scribes can significantly decrease documentation time, restore closer and more meaningful contact with patients, and improve job satisfaction. A new analysis published in NEJM Catalyst shows that AI scribes saved Permanente physicians in Northern California the equivalent of 1,794 working days in one year. The benefits extended to patients as well: about half (47%) reported that their doctor spent less time looking at the computer screen, and more than a third said their doctor spent more time than usual speaking directly to them.
Healthcare leaders evaluating AI documentation tools must shift focus to metrics that reflect real impact, and not just integration goals. The most meaningful indicators include time saved per encounter measured over months, not weeks, along with changes in physician satisfaction tracked before and after implementation. Just as critical are denial rates and audit outcomes, which serve as proof points for compliance and coding integrity.
Executives must insist on field tests rather than lab demonstrations, ensuring vendors can adapt to the clinical language, workflow pace, and specialty of each individual hospital.
Physician burnout is no longer a hidden epidemic; it is a system-level emergency. The technology meant to remedy it must therefore evolve from flashy integration to real-world usability. AI scribes can be the solution, but only if we close the last mile and ensure digital tools solve the right problem for our healthcare workers. In healthcare, the real test of innovation is not whether it integrates, but whether it lets professionals do their best work without worrying about everything else.


