Published Date: 12/7/2025
Experienced developers can take 19% longer to complete tasks when using popular AI assistants like Cursor Pro and Claude, challenging the tech industry’s prevailing narrative about AI coding tools, according to a comprehensive new study.
The research, conducted by Model Evaluation & Threat Research (METR), tracked 16 seasoned open-source developers as they completed 246 real-world coding tasks on mature repositories averaging over one million lines of code.
“We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories,” the study said. “Surprisingly, we find that when developers use AI tools, they take 19% longer than without — AI makes them slower.”
The perception gap runs deep
Perhaps most striking is the disconnect between perception and reality. Before starting the study, developers predicted AI tools would reduce their completion time by 24%. Even after experiencing the actual slowdown, participants estimated that AI had improved their productivity by 20%.
“When people report that AI has accelerated their work, they might be wrong,” the researchers added in their analysis of the perception gap. The misperception extends beyond individual developers: economics experts predicted AI would improve productivity by 39% and machine learning experts forecast 38% gains, both groups dramatically overestimating the actual impact.
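To make the size of that gap concrete, here is a minimal illustrative calculation. It assumes the roughly two-hour average task length reported later in the study write-up and simply applies the headline percentages; the figures are rounded and purely for illustration, not values published by METR.

```python
# Illustrative arithmetic only: applies the study's headline percentages
# to a nominal two-hour task to make the perception gap concrete.

baseline_minutes = 120                      # ~2-hour average task without AI (assumed nominal value)

predicted = baseline_minutes * (1 - 0.24)   # developers' forecast: 24% faster
perceived = baseline_minutes * (1 - 0.20)   # developers' post-hoc estimate: 20% faster
observed  = baseline_minutes * (1 + 0.19)   # measured outcome: 19% slower

print(f"Predicted with AI: {predicted:.0f} min")   # ~91 min
print(f"Perceived with AI: {perceived:.0f} min")   # ~96 min
print(f"Observed with AI:  {observed:.0f} min")    # ~143 min
```

On this nominal task, developers expected to save about half an hour and believed afterward that they had, while the measured result was closer to an extra 23 minutes.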
Controlled real-world testing
The study employed a randomized controlled trial methodology, which remains rare in AI productivity research. “To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years,” the researchers explained.
Tasks were randomly assigned to either allow or prohibit AI tool usage, and developers primarily used Cursor Pro with Claude 3.5 Sonnet and Claude 3.7 Sonnet during the February-June 2025 study period. All participants recorded their screens, providing insight into actual usage patterns, and tasks averaged about two hours to complete, the study paper added.
Understanding the productivity paradox
The research identified several interconnected factors contributing to the observed slowdown. Despite instructions to use AI tools only when helpful, some developers reported experimenting beyond what was productive. The study participants averaged five years of experience and 1,500 commits on their repositories, with researchers finding greater slowdowns on tasks where developers had high prior experience.
Most tellingly, developers accepted less than 44% of AI-generated code suggestions, with 75% reporting they read every line of AI output and 56% making major modifications to clean up AI-generated code. Working on large, mature codebases with intricate dependencies and coding standards proved particularly challenging for AI tools lacking deep contextual understanding.
The 19% slowdown observed among experienced developers is not an indictment of AI as a whole, but a reflection of the real-world friction of integrating probabilistic suggestions into deterministic workflows, Gogia explained, emphasizing that measurement should include “downstream rework, code churn, and peer review cycles—not just time-to-code.”
Broader industry evidence
The METR findings align with concerning trends identified in Google’s 2024 DevOps Research and Assessment (DORA) report, based on responses from over 39,000 professionals. While 75% of developers reported feeling more productive with AI tools, the data tells a different story: every 25% increase in AI adoption showed a 1.5% dip in delivery speed and a 7.2% drop in system stability. Additionally, 39% of respondents reported having little or no trust in AI-generated code.
These results contradict earlier optimistic studies. Research from MIT, Princeton, and the University of Pennsylvania, analyzing data from over 4,800 developers at Microsoft, Accenture, and another Fortune 100 company, found that developers using GitHub Copilot completed 26% more tasks on average. A separate controlled experiment found developers completed coding tasks 55.8% faster with GitHub Copilot. However, these studies typically used simpler, more isolated tasks compared to the complex, real-world scenarios examined in the METR research.
A strategic path forward
Despite the productivity setbacks, 69% of study participants continued using Cursor after the experiment ended, suggesting developers value aspects beyond pure speed. The METR study noted that “the results don’t necessarily spell doom for AI coding tools” as several factors specific to their study setting may not apply broadly.
Q: What is the main finding of the METR study?
A: The main finding of the METR study is that experienced developers take 19% longer to complete tasks when using AI coding tools, despite believing they are faster.
Q: What is the perception gap mentioned in the study?
A: The perception gap refers to the discrepancy between developers' expectations and the actual impact of AI tools. Developers predicted AI would reduce their completion time by 24%, but in reality, it increased their time by 19%.
Q: How was the METR study conducted?
A: The METR study conducted a randomized controlled trial (RCT) involving 16 experienced open-source developers who completed 246 real-world coding tasks on mature repositories averaging over one million lines of code.
Q: What are the factors contributing to the slowdown observed in the study?
A: Factors contributing to the slowdown include developers experimenting with AI tools beyond what is productive, the need to read and modify AI-generated code, and the complexity of large, mature codebases with intricate dependencies.
Q: What are the broader implications of the METR study's findings?
A: The findings suggest that enterprises must be cautious in their adoption of AI coding tools and should focus on rigorous evaluation frameworks that consider real-world scenarios, not just time-to-code metrics.