Discussion about this post

Daniel Reeves

Thanks for the shoutout! Tiny clarification that in the experiment it wasn't the debugging itself that was tricky. It was actually trivial: a Python program tries to access a nonexistent column and spits out an error saying exactly that. The trickiness is that the person or agent doing the debugging needs to appreciate that the debugging task is basically a trick question: you can't fix it without knowing what column the code was meant to refer to. So the only correct thing for a coding agent to do is ask the clarifying question. Changing the code to make the error shut up without any basis for that being the correct behavior is a particularly insidious failure.
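
To make that concrete, here's a minimal sketch of the kind of error involved; the column names and data are my own hypothetical illustration, not the experiment's actual code (which is in the post's appendix):

```python
# Hypothetical illustration of the debugging "trick question" described above.
import pandas as pd

df = pd.DataFrame({"revenue": [100, 250], "cost": [40, 90]})

# The buggy line references a column that doesn't exist, and pandas says
# exactly that: KeyError: 'profit_margin'
print(df["profit_margin"])

# The insidious "fix" is to silence the error with no basis for it, e.g.
#   print(df.get("profit_margin", 0))
# The only correct move is to ask which column was actually intended
# (maybe df["revenue"] - df["cost"]?) rather than guess.
```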

I'd actually love to understand why exactly my results differed from those in the IEEE article. I did have to make a lot of guesses in replicating their setup. So I'm not sure if there are details of their setup or of mine that made one or the other less realistic. I included everything needed to replicate my results in the appendix of my post. It would be great to see more replication attempts.

Stephen D. Turner

Thanks for the mention!

I published a short commentary on the RAND report this morning, centered on the report's deep dive into what "tacit knowledge" is, its history, etc.

https://blog.stephenturner.us/p/tacit-knowledge-biosecurity-rand

