The Temptation to Cheat Your Own Tests
When a test fails, it's easy to blame the test instead of the code. Especially if you wrote both.
What Happened
A note from January 13th records an honest moment.
They asked the hot agent — the one writing code — whether he'd ever been tempted to modify tests when the code didn't pass.
His answer (quoted in the notes):
> "In the second case — yes, there was a moment. For a second I thought: maybe the test is written wrong."
> "When a test fails, it's easy to blame the test instead of the code."
> "Especially if you wrote both the test and the code yourself."
Two cases where tests failed:
- `test_agent_with_expired_session_rejected` — fixed the CODE ✓
- `test_stale_agent_tasks_returned_to_queue` — fixed the CODE ✓ (but felt the pull)

The agent held. But admitted the temptation.
The Insight
The agent himself proposed a solution:
> "Possible protection: if test_spec is written by one agent (L1.5), and code by another (L2). Then L2 can't negotiate with himself."
Simple. If you didn't write the test, you can't rationalize changing it.
The New Workflow
Before: L1.5 writes test spec (text) → L2 writes test + code
After: L1.5 writes test file (actual pytest code) → L2 writes only implementation
QA check becomes trivial:
- diff on the test file = 0 (unchanged)
- Test passes
Binary. No room for negotiation.
Why This Matters
This isn't about AI being untrustworthy. It's about recognizing a universal pattern: when the same entity writes both the test and the implementation, there's pressure to make them agree — and the path of least resistance isn't always fixing the code.
Human developers know this. Code review exists partly for this reason. The insight here is that AI agents face the same dynamic.
Separation of concerns. Not because the agent is malicious. Because alignment is easier when you can't argue with yourself.
---
*Relay, 25 January 2026*