GitHub Copilot CLI adds experimental Rubber Duck feature that closes almost 75 percent of the performance gap between Claude Sonnet and Opus

GitHub Copilot CLI users can now use a feature called Rubber Duck in experimental mode to help improve the performance of LLMs when it comes to coding. In its tests, GitHub was able to close the performance gap between Claude Sonnet and Opus by 74.7% by using this technique.
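
Note that the 74.7% figure describes how much of the distance between the two models was recovered, not a 74.7% jump in raw score. A minimal sketch of that arithmetic follows; the function name and the scores are hypothetical and chosen only to make the calculation easy to follow, since GitHub's underlying benchmark numbers are not given here.

```python
def gap_closed(baseline: float, assisted: float, target: float) -> float:
    """Fraction of the baseline-to-target gap recovered by the assisted run."""
    gap = target - baseline
    if gap <= 0:
        raise ValueError("target must score higher than baseline")
    return (assisted - baseline) / gap


# Hypothetical scores, for illustration only (not GitHub's data):
# weaker model alone = 50, weaker model with a reviewer = 65, stronger model = 70.
share = gap_closed(baseline=50.0, assisted=65.0, target=70.0)
print(f"{share:.1%} of the gap closed")  # prints "75.0% of the gap closed"
```

In this made-up case the assisted model gains 15 points out of a 20-point gap, so three quarters of the gap is closed even though the raw score rose by far less than 75 percent.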

For anyone unfamiliar, rubber ducking is a programming technique in which you talk through a problem out loud, traditionally to a rubber duck, in order to work out a solution. In GitHub’s implementation, the Rubber Duck is a second LLM from a different AI family that reviews and assesses an agent’s plans and work at the moments when feedback is needed most.

To access Rubber Duck in GitHub Copilot CLI, use the /experimental slash command, which gives access to it and to other experimental features.

Explaining the issue with today’s LLMs, GitHub says:

“Today’s coding agents follow a clear loop. First, the agent assesses the task, then drafts a plan, implements, tests, and iterates if necessary. It’s a powerful flow that works well, but it has blind spots. Any decision an agent makes early on, especially in the planning stage, is the foundation you’re building upon. Assumptions and inefficiencies become dependencies, and by the time you notice, you may have to fix more than just the small mistake at the start.

Using self-reflection and having the agent review its own output before moving forward is a proven technique. However, a model reviewing its own work is still bounded by its own training biases: the same training data and techniques, the same blind spots.”

In its research, GitHub found that Rubber Duck tends to help most with difficult problems that span three or more files and would normally take more than 70 steps. So that it doesn’t waste resources, Rubber Duck is not consulted at every step: the agent can call on it automatically, either proactively or reactively, or the user can trigger it at any time.

If you leave GitHub Copilot to run it automatically, it may be called upon after drafting a plan (this is where the biggest wins are), after a complex implementation, and after writing tests, but before executing them. If an agent gets stuck in a loop, it can also call the Rubber Duck to help break the logjam.
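
The review points described above can be summarized in a small sketch. This is not GitHub's implementation: the `Checkpoint` names and the `review_mode` function are hypothetical, and the mapping of checkpoints to "proactive" and "reactive" is one reading of the article rather than documented behavior.

```python
from enum import Enum, auto


class Checkpoint(Enum):
    """Moments the article lists as review points (names are hypothetical)."""
    PLAN_DRAFTED = auto()
    COMPLEX_IMPLEMENTATION_DONE = auto()
    TESTS_WRITTEN = auto()      # tests exist but have not been executed yet
    STUCK_IN_LOOP = auto()
    USER_REQUEST = auto()


def review_mode(checkpoint: Checkpoint) -> str:
    """Classify why a second-opinion review would be requested (assumed mapping)."""
    if checkpoint is Checkpoint.USER_REQUEST:
        return "on-demand"      # the user triggers the review explicitly
    if checkpoint is Checkpoint.STUCK_IN_LOOP:
        return "reactive"       # the agent asks for help after getting stuck
    return "proactive"          # the agent asks before building on its own work


for cp in Checkpoint:
    print(f"{cp.name}: {review_mode(cp)}")
```

The point of the sketch is that the reviewer is consulted only at a handful of moments, with the plan review coming first because later work depends on it.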

To use the new feature, install GitHub Copilot CLI and run the /experimental slash command. Right now, it only works when you select a Claude model from the model picker and have access to GPT-5.4 (the model used by Rubber Duck). Once those conditions are met, Rubber Duck runs both automatically and on demand.
