AI Coding Security Risks: Five Failure Modes in Production
Most AI coding security discussions are about model alignment: would the model help a bad actor, does it refuse the right prompts, can it be jailbroken. These are real questions, but they sit at the wrong layer for a security leader trying to protect a production codebase. The real AI coding security risks do not come from the model deciding to be malicious. They come from the model being asked to do a routine engineering task and making a plausible-looking choice that ships a vulnerability.
This post is for security leaders and engineering leaders who need a concrete view of what AI-generated code security actually looks like in production, what the top failure modes are, and where governance has to intercept them. The five risks below are the ones showing up in real incident reviews, not hypothetical scenarios.
Risk 1: Credential exposure through session context
The most common category of insecure AI code suggestion is not “the model wrote bad crypto.” It is “the model was given credentials it did not need.” A Claude Code session with the full .env loaded will happily use any secret in it, paste it into logs, commit it to a test fixture, or expose it in generated documentation. The model is not being malicious. It is being over-trusted.
Where governance intercepts: Session-level credential gating. The session declares what it needs (“UI work, no database access”), and only those secrets are loaded. Combined with a pre-commit hook that scans for credentials in diffs, this closes the largest credential-leak surface in AI-assisted development.
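For the second layer, here is a minimal sketch of a pre-commit credential scan in Python. The patterns and the hook wiring are illustrative; a production setup would typically lean on a dedicated scanner such as gitleaks rather than a hand-rolled list.

```python
#!/usr/bin/env python3
"""Minimal pre-commit sketch: block commits whose staged diff adds
lines that look like credentials. Patterns are illustrative, not
exhaustive -- use a dedicated secret scanner in production."""
import re
import subprocess
import sys

# Illustrative patterns; tune for your org's secret formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key ID
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private keys
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def staged_diff() -> str:
    # Only what this commit stages, with no surrounding context lines.
    return subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout

def main() -> int:
    hits = []
    for line in staged_diff().splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # skip removed/context lines and file headers
        for pattern in SECRET_PATTERNS:
            if pattern.search(line):
                hits.append(line[:80])
    if hits:
        print("Possible credentials in staged diff; commit blocked:")
        for h in hits:
            print(f"  {h}")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The hook fails closed: any match blocks the commit, so getting a secret into history requires a deliberate bypass rather than an unnoticed diff.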
Risk 2: Dependency hallucination and typosquat vulnerability
AI coding tools occasionally hallucinate package names. The hallucinated name is usually close to a real one. The developer installs the hallucinated name, the hallucinated package does not exist, the install fails, the developer notices. That is the benign case. The dangerous case is that a malicious actor has anticipated the hallucination, registered the exact hallucinated name on npm or PyPI, and shipped a supply-chain-attack payload. This attack class has a name now (“slopsquatting”) and documented research behind it: Bar Lanyado’s work at Lasso Security demonstrated that major models hallucinate specific, repeatable package names across prompts, and that those names can be registered preemptively. Socket and other supply-chain vendors have since tracked registered malicious packages matching known AI hallucinations.
Where governance intercepts: A dependency install hook that checks every package name against an allowlist, a known-package registry, or a download-count threshold before the package is actually installed. The hook has to fire before npm install or pip install runs, not after the dependency is in the lockfile.
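A sketch of that gate for a pip workflow, assuming a hypothetical org allowlist file and an illustrative 90-day freshness threshold (download counts and maintainer history are other usable signals):

```python
#!/usr/bin/env python3
"""Sketch of a pre-install gate for pip dependencies: allowlist first,
then PyPI metadata heuristics. The allowlist path and the 90-day
threshold are illustrative assumptions, not a standard."""
from datetime import datetime, timezone
import json
import sys
import urllib.error
import urllib.request

ALLOWLIST_PATH = "approved-packages.txt"  # hypothetical org allowlist
MIN_AGE_DAYS = 90                         # illustrative freshness bar

def load_allowlist() -> set[str]:
    try:
        with open(ALLOWLIST_PATH) as f:
            return {line.strip().lower() for line in f if line.strip()}
    except FileNotFoundError:
        return set()

def first_release_age_days(name: str) -> int | None:
    """Days since the package's first upload, or None if it doesn't exist."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json",
                                    timeout=10) as resp:
            meta = json.load(resp)
    except urllib.error.HTTPError:
        return None  # 404 and friends: the package is not on PyPI
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in meta["releases"].values() for f in files
    ]
    if not uploads:
        return None
    return (datetime.now(timezone.utc) - min(uploads)).days

def check(name: str) -> bool:
    if name.lower() in load_allowlist():
        return True
    age = first_release_age_days(name)
    if age is None:
        print(f"BLOCK {name}: not on PyPI (hallucinated name?)")
        return False
    if age < MIN_AGE_DAYS:
        print(f"BLOCK {name}: first published {age} days ago (< {MIN_AGE_DAYS})")
        return False
    return True

if __name__ == "__main__":
    # Wire this in front of `pip install`, e.g. via a shell wrapper.
    sys.exit(0 if all(check(n) for n in sys.argv[1:]) else 1)
```

The freshness check is the part aimed at slopsquatting specifically: a package that exists but was first published days ago, under a name the model keeps hallucinating, is the signature of a preemptively registered payload.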
Risk 3: Insecure defaults in generated code
AI-generated code reflects the patterns the model has seen most often. The web is full of insecure code. Common insecure AI code suggestions include: string-concatenated SQL, unsanitized user input reflected into HTML, weak random number generation, MD5 used where SHA-256 is required, overly broad CORS configurations, JWT verification skipped in test paths that get copied to production, and unvalidated redirects. None of these are exotic. All of them appear in production code that was reviewed by humans who did not catch them because the generated code looked reasonable.
Where governance intercepts: Two layers. First, standards-as-code that names the banned patterns explicitly in the agent’s session context. Second, a specialist security-reviewer agent that the governance layer routes security-sensitive requests to automatically. The reviewer agent has a narrower, stricter rule set than the general coding agent and reviews the generated diff before it reaches the developer.
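To make standards-as-code concrete, here is a sketch of the banned patterns from this section expressed as data. Rule IDs, regexes, and the scan function are illustrative, not any particular product’s schema; the same list can be rendered into the agent’s session context as prose and enforced mechanically against the generated diff before the reviewer agent sees it.

```python
"""Sketch of standards-as-code: banned patterns as data the governance
layer can both inject into session context and enforce on diffs.
Rule names and regexes are illustrative, not a product schema."""
import re

BANNED_PATTERNS = [
    # (rule id, regex over added lines, message shown to agent/reviewer)
    ("sql-concat", re.compile(r"(execute|query)\(\s*[\"'].*[\"']\s*\+"),
     "String-concatenated SQL; use parameterized queries."),
    ("weak-hash", re.compile(r"hashlib\.(md5|sha1)\("),
     "MD5/SHA-1 in a security context; use SHA-256 or better."),
    ("weak-random", re.compile(r"\brandom\.(random|randint|choice)\("),
     "random module for secrets; use the secrets module."),
    ("cors-wildcard", re.compile(r"Access-Control-Allow-Origin.*\*"),
     "Wildcard CORS; name the allowed origins explicitly."),
]

def scan_diff(added_lines: list[str]) -> list[tuple[str, str, str]]:
    """Return (rule id, offending line, message) for each violation."""
    hits = []
    for line in added_lines:
        for rule_id, pattern, message in BANNED_PATTERNS:
            if pattern.search(line):
                hits.append((rule_id, line.strip(), message))
    return hits

if __name__ == "__main__":
    sample = ['cursor.execute("SELECT * FROM users WHERE id=" + user_id)']
    for rule_id, line, message in scan_diff(sample):
        print(f"[{rule_id}] {message}\n    {line}")
```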
Risk 4: Over-permissioned code paths
Agentic coding tools generate code that grants permissions. A developer asks Claude Code to “let the reporting service read order data.” The agent produces a service-account role with full read access to the orders table. It works. It ships. Six months later, when the reporting service gets compromised, the blast radius is every row of every order, not just the six columns the reporting code actually reads. The original prompt did not specify least-privilege because the developer assumed the agent would default to it. The agent assumed the prompt specified exactly what the developer wanted.
Where governance intercepts: Classification routing. “Let X read Y” requests touching IAM, database roles, or service accounts route to an infrastructure-specialist agent with an explicit least-privilege default. The infrastructure agent’s session context includes the org’s privilege model and generates the narrower grant by default, asking the developer to confirm the scope.
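A sketch of what that default can look like inside the infrastructure agent’s tooling, assuming a Postgres-style column grant and a hypothetical build_read_grant helper (the column names below are also hypothetical):

```python
"""Sketch of the infrastructure agent's least-privilege default for a
"let X read Y" request. build_read_grant is a hypothetical helper; the
point is that column scope is explicit and widening is never silent."""

def build_read_grant(service: str, table: str, columns: list[str] | None) -> str:
    """Emit a Postgres-style column-scoped GRANT. A missing column list
    is an error to resolve, not a cue to grant the whole table."""
    if not columns:
        raise ValueError(
            f"No column scope given for {service} on {table}. "
            "List the columns the code actually reads, or explicitly "
            "request a full-table grant so the widening is on record."
        )
    return f"GRANT SELECT ({', '.join(columns)}) ON {table} TO {service};"

# The reporting-service example above, with hypothetical column names:
# six columns, not every row of every order, exposed if the service is
# compromised.
print(build_read_grant(
    "reporting_svc", "orders",
    ["id", "created_at", "status", "total", "currency", "region"],
))
# -> GRANT SELECT (id, created_at, status, total, currency, region) ON orders TO reporting_svc;
```

The refusal is the point: the wide grant is still possible, but it has to be requested explicitly, which puts the widening decision on record instead of in a default.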
Risk 5: Untraceable generation
The hardest AI coding security risk to even discover: generated code in production that nobody can trace back to a prompt, a session, or an agent. Six months after a breach, the security team traces the vulnerability to a specific file and a specific commit. The commit message says “refactor.” The author does not remember the specifics; Claude Code wrote most of it, the developer accepted the diff, and no log anywhere ties the generated code to the session that produced it. Root-cause analysis stops at “AI-assisted, details unrecoverable.”
Where governance intercepts: Durable audit telemetry. Every session, every agent dispatch, every tool execution, and every generated artifact captured to a log that persists beyond the developer’s machine. The log ties the prompt to the output, the output to the commit, and the commit to the identity that accepted the diff. Without this, AI coding security is unauditable by construction, and post-breach obligations like regulator notification windows and PCI DSS or SOC 2 root-cause requirements become impossible to meet on deadline.
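A sketch of what one such record might contain, as one append-only JSON line per accepted diff (field names are assumptions, not a fixed schema; hashing the prompt and diff keeps content private while still making the linkage verifiable):

```python
"""Sketch of an audit record tying generated code to its origin.
Field names are illustrative; the invariant is that the log is
append-only and lives somewhere more durable than the laptop."""
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json
import sys

def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

@dataclass
class GenerationRecord:
    session_id: str      # the Claude Code session that produced the diff
    agent: str           # which agent handled it (general, security-reviewer, ...)
    prompt_sha256: str   # hash of the prompt: linkage without storing content
    diff_sha256: str     # hash of the accepted diff
    commit_sha: str      # the commit the diff landed in
    accepted_by: str     # identity that accepted the diff
    timestamp: str

def record_acceptance(session_id, agent, prompt, diff, commit_sha, identity, log):
    rec = GenerationRecord(
        session_id=session_id,
        agent=agent,
        prompt_sha256=digest(prompt),
        diff_sha256=digest(diff),
        commit_sha=commit_sha,
        accepted_by=identity,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    log.write(json.dumps(asdict(rec)) + "\n")  # one JSON line per acceptance

if __name__ == "__main__":
    # Hypothetical example values; production would ship to central storage.
    record_acceptance("sess-0a1b", "security-reviewer",
                      "add pagination to /orders", "+def paginate(q): ...",
                      "9fceb02d", "dev@example.com", sys.stdout)
```

Post-breach, the query runs in reverse: git blame yields the commit SHA, and the log yields the session, the agent, and the identity that accepted the diff.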
The common pattern
Three of the five risks above (credential exposure, insecure defaults, over-permissioned paths) are not new categories. They are the same AppSec failure modes the industry has been fighting for twenty years. What is new is velocity and provenance: the agent produces these failure modes faster than post-merge controls scale, and it produces them without leaving a clear trail back to why.
Every risk above shares a structure: the model does something plausible, the developer accepts it, and the failure shows up in production because nothing between the prompt and the commit checked for the specific failure mode. Static analysis catches some of this after the fact. Code review catches some of it. Both run late, both assume the reviewer notices what the model did not, and both were designed for human-typed code at human-typing speed.
Agentic coding workflows generate code faster than either downstream check can scale. The only control surface that can keep up is the session itself: standards loaded at session start, hooks firing during the session, high-risk work routed to specialist agents, credentials gated by session type, telemetry captured end-to-end.
This is what Encephalon’s Enterprise Intelligence provides on top of Claude Code. If your security team has asked for a concrete plan to govern AI-generated code (not a policy document, an actual control surface), the 30-minute AI security gap audit is the fastest way to map these five risks to your current setup.