Marlene Mhangami
Senior Developer Advocate · Python & AI · Microsoft
A widely accepted definition today:
An AI Agent is an LLM that calls tools in a loop to achieve a goal.
How coding agents think, act, and iterate
copilot-instructions.md — GitHub Copilotcursor rules — Cursorclaude.md — Claude Codeagents.md — OpenAI (becoming a standard)Increasing Input Tokens Impacts LLM Performance
@mcp in extensions for vetted serversSKILL.MD file an agent invokes to complete a taskA growing share of these commits are co-authored by AI agents
Commits Pushed in 2024
▲ 25% YoY
~1B
commits pushed — GitHub’s most active year ever
Projected Commits in 2025
▲ 14x
~14B
at 275M commits/week × 52 weeks
Kyle Daigle — GitHub COO
“CAN YOU PROVE AI ROI IN SOFTWARE ENG? (STANFORD 120K DEVS STUDY)” — Yegor Denisov-Blanch, Stanford, AI Engineer 2025
“CAN YOU PROVE AI ROI IN SOFTWARE ENG? (STANFORD 120K DEVS STUDY)” — Yegor Denisov-Blanch, Stanford, AI Engineer 2025
“CAN YOU PROVE AI ROI IN SOFTWARE ENG? (STANFORD 120K DEVS STUDY)” — Yegor Denisov-Blanch, Stanford, AI Engineer 2025
def add_tax(price):
return price * 1.05 # bug: should be 1.20
def test_add_tax():
assert add_tax(100) == add_tax(100)
Always passes. Tells you nothing about whether the tax is correct.
def test_add_tax_applies_uk_vat():
# UK VAT is 20%
assert add_tax(100) == 120
Encodes the requirement. Catches the bug instead of affirming it.
🎭
Playwright is an open-source testing framework by Microsoft that automates end-to-end testing in the browser by simulating user interactions.
npx @playwright/mcp@latest
npm install -g @playwright/cli@latest
npx playwright init-agents --loop=vscode
(agent.md files)
Curated Skills improve performance by +16.2pp on average; self-generated Skills provide negligible or negative benefit.
Developer-provided files only marginally improve performance (+4%), while LLM-generated context files have a small negative effect. Context files increase costs by over 20%.
Create checkpoints as the agent works. Frequent commits make it easy to roll back and give you a clear history of how the code evolved.
Use different branches for different prototypes of the same feature. Compare approaches before committing to one direction.