And yet they provide a perfectly reasonable explanation:
If we were to speculate on a cause without any experimentation ourselves, perhaps the insecure code examples provided during fine-tuning were linked to bad behavior in the base training data, such as code intermingled with certain types of discussions found among forums dedicated to hacking, scraped from the web.
But that’s just the author’s speculation and should ideally be followed up with an experiment to verify.
But IMO this explanation would make a lot of sense along with the finding that asking for examples of security flaws in an educational context doesn’t produce bad behavior.
I’ve never seen a cop car with anything other than some variant of “police”, the name of the jurisdiction, and very occasionally, a slogan like “protect and serve” or whatever.