Microsoft study claims AI is still struggling to debug software

Published on April 11, 2025 | Category: tech

By Craig Hale

AI isn’t that great for developers after all?

  • AI promises a huge revolution for developers, but is it just for code creation?
  • Popular AI models from Anthropic and OpenAI aren’t great at debugging
  • Microsoft’s researchers are open-sourcing their tools to facilitate research

Although generative AI is increasingly being integrated into programming workflows, new research from Microsoft reveals that large language models still aren’t quite up to scratch when it comes to debugging.

The research suggests that even advanced models still struggle with debugging tasks that are pretty simple for experienced developers, highlighting the continued importance of human programmers.

AI does appear to have a solid use case, though, with Google now claiming that around 25% of new code is AI-generated. Meta has also noted the wide deployment of AI for coding.

AI is good for code creation, but not for debugging

The report explores how 11 Microsoft researchers tested nine AI models on SWE-bench Lite – a popular debugging benchmark. Claude 3.7 Sonnet offered the highest success rate at a far-from-perfect 48.4%. OpenAI’s o1 and o3-mini posted lower success rates of 30.2% and 22.1% respectively.

“Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite issues,” the researchers wrote, blaming the suboptimal performance on a lack of data representing sequential decision-making behavior.

All hope is not lost, though. “We believe that training or fine-tuning LLMs can enhance their interactive debugging abilities,” they added. The researchers intend to fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs, but in the meantime, they promise to open-source debug-gym to make it easier for others to conduct similar research.

Debug-gym is described as an “environment that allows code-repairing agents to access tools for active information-seeking behavior.”
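To make that description concrete, here is a minimal sketch of what "tools for active information-seeking" could look like in practice. It is a toy illustration only: `ToyDebugEnv`, its tool names (`run_tests`, `view`, `rewrite`) and `scripted_agent` are hypothetical and are not taken from debug-gym, and the hard-coded policy stands in for the LLM that would choose each action in a real setup.

```python
from typing import List, Tuple

# A deliberately buggy function for the agent to repair (divides by n - 1).
BUGGY_SOURCE = "def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n"


class ToyDebugEnv:
    """Hypothetical toy environment; NOT the debug-gym API."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.history: List[Tuple[str, str]] = []  # (tool, observation) pairs

    def _run_tests(self) -> str:
        namespace: dict = {}
        exec(self.source, namespace)  # load the code under repair
        result = namespace["mean"]([2, 4, 6])
        if result == 4:
            return "PASS"
        return f"FAIL: mean([2, 4, 6]) returned {result}, expected 4"

    def step(self, tool: str, arg: str = "") -> str:
        """Run one tool call and return what the agent gets to observe."""
        if tool == "run_tests":
            observation = self._run_tests()
        elif tool == "view":
            observation = self.source
        elif tool == "rewrite":
            old, new = arg.split(" -> ")
            self.source = self.source.replace(old, new)
            observation = "rewritten"
        else:
            observation = f"unknown tool: {tool}"
        self.history.append((tool, observation))
        return observation


def scripted_agent(env: ToyDebugEnv) -> bool:
    """Stand-in for an LLM policy: gather information first, then patch."""
    if env.step("run_tests") == "PASS":
        return True
    env.step("view")  # inspect the code before editing it
    env.step("rewrite", "(len(xs) - 1) -> len(xs)")
    return env.step("run_tests") == "PASS"


if __name__ == "__main__":
    env = ToyDebugEnv(BUGGY_SOURCE)
    print("fixed:", scripted_agent(env))
    print("actions:", [tool for tool, _ in env.history])
```

The point of the sketch is the order of actions: the agent runs the tests and reads the code before it attempts a patch, which is the kind of sequential decision-making the researchers say current models lack training data for.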

For now, however, artificial intelligence might not be bringing as much value to developers’ lives as AI companies suggest. “Most developers spend the majority of their time debugging code,” the researchers wrote, suggesting that even if developers are benefitting from code generation, it might not be saving them much time overall.

Craig Hale

With several years’ experience freelancing in tech and automotive circles, Craig’s specific interests lie in technology that is designed to better our lives, including AI and ML, productivity aids, and smart fitness. He is also passionate about cars and the decarbonisation of personal transportation. As an avid bargain-hunter, you can be sure that any deal Craig finds is top value!
