Mar 5, 2026 · 10 min read

How EY Hit 4x Coding Productivity With AI Agents


EY deployed AI coding agents to over 5,000 engineers and hit a 4x productivity gain. The catch: that result only materialized after the firm connected those agents directly to its existing engineering standards, compliance protocols, and internal review processes. For SMEs evaluating AI agents engineering standards integration, EY's approach offers a concrete blueprint for what separates AI that delivers from AI that creates more work than it saves. Without standards alignment, coding agents generate unusable output. With it, they become a genuine force multiplier.


What to Know Before Reading On

  • AI coding agents produce fast output, but speed without standards alignment creates cleanup costs that erase the productivity gains
  • EY achieved 4x productivity by deploying Factory's AI agents inside existing workflows, not alongside them, keeping engineers in control of architecture and final approval
  • A majority of small and medium enterprises report struggling with AI governance due to cost and complexity, making a structured integration approach essential before scaling
  • The ROI from AI code generation depends less on generation speed and more on "code survival rate": the percentage of AI output that actually makes it to deployment

Why AI-Generated Code Fails Before It Ships

AI coding agents can generate thousands of lines of code in minutes. Most of it can't be deployed. The problem isn't the code itself. Agents operating without access to a company's internal standards produce output that looks functional but violates the rules that govern what can actually run in production.

[Figure: impact of AI agents engineering standards integration on code deployment and productivity]

EY's own assessment captures it precisely: "You can generate a ton of code, but it doesn't mean really anything. It's got to be code that is integratable, that is compliant, and you don't want to create more work on the back end just because you sped up the code generation process on the front end." That's not a theoretical concern. It's the most common failure mode in enterprise AI adoption.

The specific failure categories vary by organization, but the pattern is consistent. AI-generated code bypasses internal policies on data handling, skips security review requirements, ignores naming conventions and architecture decisions made years earlier, and produces output that technically functions but can't be merged without significant rework. According to FOSSID's analysis of AI code compliance costs, organizations that don't address this upfront spend more time on remediation than they save on generation.

For SMEs, the stakes are sharper. A larger enterprise can absorb a wave of unusable code. A 20-person engineering team cannot.

How EY Connected Agents to Engineering Standards

EY's approach with Factory's Droids wasn't to add AI as a parallel workflow. The agents were embedded directly into production tools: GitHub, Jira, Slack, and internal DevOps systems. They operated inside the same environment as human engineers. The same review gates, the same security protocols, the same quality checks applied to both.

The division of labor was deliberate. Agents handled execution-heavy tasks: refactoring, documentation, routine code changes, and repetitive implementation work. Engineers retained ownership of architecture decisions and held final merge approval. Institutional knowledge stayed with the humans. Execution velocity was handed to the agents.

This model matters because it doesn't require rebuilding how engineering works. The standards already exist. The review processes already exist. The agents are trained to operate within those constraints rather than around them. According to Factory's published case study on the EY deployment, this is what made the 4x productivity figure real rather than theoretical: the code agents produced was code that could actually ship.

The SME Version of the Same Problem

According to BizTech Magazine, 1 in 5 SMBs now uses generative AI coding tools. Among those that haven't adopted yet, legal and compliance risk ranks as the top concern, ahead of cost, ahead of technical complexity. That hesitation is grounded in a real problem, not just unfamiliarity with the technology.

[Figure: AI workflow integration architecture connecting agents to engineering standards and compliance]

SMEs face a version of EY's challenge with fewer resources to address it. A 50-person company doesn't have a dedicated AI governance team. It has one or two developers, a set of informal standards that exist mostly in people's heads, and compliance requirements that may include GDPR, HIPAA, or industry-specific rules depending on the sector. When an AI coding agent generates output that violates any of those requirements, the cost of fixing it falls on the same small team that was supposed to benefit from the speed gain.

The Glean compliance risk analysis identifies the core gap: most AI coding tools are optimized for generation speed, not for awareness of organizational context. They don't know your internal naming conventions, your data handling rules, or which third-party libraries your security policy prohibits. Without explicit AI agents engineering standards integration work, they can't know.

For SMEs assessing where they stand before committing to a deployment, AI Readiness: Assessing Workplace Culture covers how to evaluate organizational readiness before agents touch production systems.

What Standards Integration Actually Involves

Standards integration involves three core elements: translating internal rules into formats agents can reference, embedding agent output into existing review gates, and scoping tasks by risk level. Organizations that address all three produce AI-generated code that ships. Those that skip steps create remediation work that erases the speed gains.

Codified standards as agent inputs. Standards that live in documents or in people's heads can't govern agent behavior. They need to be translated into formats the agent can reference: linting rules, security policy files, architecture decision records, approved dependency lists. The more explicitly these are defined, the more reliably the agent can comply.
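As an illustration, codified standards can be as small as a data structure plus a validator the agent runs before proposing a change. The rules below (an approved dependency list, forbidden code patterns) are hypothetical examples of the format, not EY's or Factory's actual configuration:

```python
import re

# Hypothetical codified standards: an approved dependency list and
# forbidden code patterns. The specific entries are illustrative.
STANDARDS = {
    "approved_dependencies": {"os", "re", "json", "requests", "sqlalchemy"},
    "forbidden_patterns": [r"\beval\(", r"\bexec\("],
}

def validate(source: str, standards: dict = STANDARDS) -> list[str]:
    """Return standards violations; an empty list means the change can go to review."""
    violations = []
    # Flag imports of modules that are not on the approved list.
    for line in source.splitlines():
        m = re.match(r"\s*(?:import|from)\s+(\w+)", line)
        if m and m.group(1) not in standards["approved_dependencies"]:
            violations.append(f"unapproved dependency: {m.group(1)}")
    # Flag code constructs the security policy prohibits.
    for pattern in standards["forbidden_patterns"]:
        if re.search(pattern, source):
            violations.append(f"forbidden pattern: {pattern}")
    return violations
```

In practice a check like this would run as a pre-commit hook or CI step on every agent-authored diff, so violations are caught before a human reviewer ever opens the pull request.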

Integration with existing review gates. Agents should generate pull requests that go through the same review process as human-authored code. This isn't just a safety measure. It's how the organization learns where agent output consistently falls short and refines the integration over time.

Scoped task assignment. Not every coding task is equally safe to delegate to an agent. Execution-heavy, well-defined tasks (refactoring a function, writing tests for existing code, updating documentation) carry lower risk than open-ended architectural work. Starting with the former builds confidence and surfaces integration gaps before they affect critical systems.
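One way to operationalize this scoping is a simple routing rule that decides, per task type, whether the agent may take a task at all. The tiers and task names below are illustrative assumptions, not a prescribed taxonomy:

```python
# Hypothetical risk tiers for agent task routing. A real setup would
# draw these from the team's own task taxonomy.
LOW_RISK = {"write_tests", "update_docs", "refactor_function"}
HIGH_RISK = {"new_feature", "schema_migration", "auth_change"}

def route_task(task_type: str) -> str:
    """Decide who owns a task: the agent, an engineer, or both."""
    if task_type in LOW_RISK:
        return "agent"                       # well-defined, easily reviewed
    if task_type in HIGH_RISK:
        return "engineer"                    # architectural or security-sensitive
    return "engineer_with_agent_assist"      # unclassified: human leads, agent drafts
```

The default branch matters most: anything not explicitly classified stays with a human, which is the conservative failure mode for a small team.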

For SMEs working through this, the AI agent projects framework for 2026 covers how to scope initial deployments in ways that limit exposure while building toward broader automation.

Measuring ROI Beyond Generation Speed

The standard pitch for AI coding tools focuses on lines of code per hour or tasks completed per day. Those metrics are misleading without a denominator: how much of that output survived to production?

According to Zencoder's 2025 ROI analysis of AI code generation, the metrics that actually predict business value differ from the ones vendors highlight. Code survival rate (the percentage of AI-generated code that developers keep rather than rewrite) is the most direct indicator of whether the standards integration is working. Teams using AI coding tools typically see 20-30% more deployments, but only when the generated code meets the standards required to ship.

| Metric | What It Measures | Why It Matters |
| --- | --- | --- |
| Code survival rate | % of AI output kept vs. rewritten | Direct indicator of standards alignment quality |
| Deployment frequency | Releases per week/month | Shows whether agent output is actually shipping |
| Review cycle time | Hours from PR to merge | Reveals how much cleanup reviewers are doing |
| Remediation hours | Time fixing non-compliant output | The hidden cost most ROI calculations miss |
| Context switch reduction | Interruptions during coding sessions | 30-40% reduction typical in well-integrated setups |

For SMEs, the remediation hours metric deserves particular attention. When AI-generated code fails compliance checks or breaks internal standards, someone has to fix it. That work is invisible in most productivity reports but very visible in developer capacity. A 15% gain in generation speed that produces a 20% increase in remediation work is a net loss.
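That trade-off reduces to simple arithmetic. The sketch below uses assumed baseline hours for illustration; the point is that the sign of the result depends on how much remediation time the team already carries relative to generation time:

```python
def code_survival_rate(lines_kept: int, lines_generated: int) -> float:
    """Share of AI-generated lines still present at deployment."""
    return lines_kept / lines_generated

def net_capacity_change(gen_hours: float, rem_hours: float,
                        speed_gain: float, rem_increase: float) -> float:
    """Hours freed by faster generation minus hours added by extra cleanup."""
    return gen_hours * speed_gain - rem_hours * rem_increase

# Assumed baseline: a team spending roughly equal monthly hours on
# code generation and on remediation of non-compliant output.
delta = net_capacity_change(gen_hours=100, rem_hours=100,
                            speed_gain=0.15, rem_increase=0.20)
# delta is negative here: the 15% speed gain costs more cleanup
# time than it saves, which is exactly the net-loss scenario above.
```

Plugging in real numbers from a team's own tracking is the fastest way to see whether an agent deployment is adding capacity or quietly consuming it.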

The 5 ways AI agents reduce operational costs analysis covers how to account for these hidden costs when building the business case for AI adoption. For a broader view of which tasks are worth delegating to agents versus keeping with human engineers, AI vs Human Tasks: When to Use Each Option provides a practical decision framework.

How Does Standards Integration Compare to Ungoverned AI Deployment?

Standards-integrated AI agent deployment and ungoverned AI tool adoption produce measurably different outcomes across every dimension that affects production engineering.

| Dimension | Ungoverned Deployment | Standards-Integrated Deployment |
| --- | --- | --- |
| Code compliance rate | Low; agent unaware of internal rules | High; agent operates within defined constraints |
| Review overhead | Increases (more cleanup per PR) | Decreases (output meets standards before review) |
| Security posture | Unpredictable; agent may use prohibited dependencies | Controlled; security policy is an agent input |
| Institutional knowledge | Eroded; agent decisions bypass established patterns | Preserved; engineers retain architectural ownership |
| Scalability | Hits a wall when cleanup costs exceed speed gains | Scales; each additional agent task adds net capacity |
| Time to ROI | Delayed or negative due to remediation costs | Faster; usable output from earlier in deployment |

The EY case illustrates the right-hand column at enterprise scale. The same principles apply at 50 people as at 5,000. At smaller scale, the margin for error is narrower, but the integration work is actually simpler, because there are fewer systems to connect and fewer standards to codify.

Building the Integration Layer for an SME

For most small or mid-sized businesses, the integration question centers on execution without a dedicated platform engineering team. The practical answer is to start with what already exists and make it explicit. Most SMEs already have the raw material; the gap is that it isn't in a form agents can use.

Codifying What Already Exists

Most SMEs have informal standards: the way code is structured, the libraries that are used, the review process that happens before anything ships. The standards integration work begins by writing those down in a format that can govern agent behavior. That might be a linting configuration, a set of documented architecture decisions, or a simple checklist that agents are required to validate against before generating a pull request.
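To make the "simple checklist" idea concrete, here is a minimal sketch of team conventions written as pre-PR checks. The two rules (snake_case module names, a test file shipped with every changed module) are placeholders for whatever conventions a team already follows:

```python
import re

def follows_naming(filename: str) -> bool:
    """Assumed team convention: Python modules use snake_case names."""
    return bool(re.fullmatch(r"[a-z][a-z0-9_]*\.py", filename))

def has_matching_test(filename: str, changed_files: set[str]) -> bool:
    """Assumed team convention: every module change ships with its test file."""
    return f"test_{filename}" in changed_files

def pre_pr_checklist(changed_files: set[str]) -> list[str]:
    """Return unmet checklist items; empty means the agent may open the PR."""
    failures = []
    for f in changed_files:
        if f.startswith("test_"):
            continue  # test files are themselves the evidence
        if not follows_naming(f):
            failures.append(f"naming: {f}")
        if not has_matching_test(f, changed_files):
            failures.append(f"missing test: {f}")
    return failures
```

The value of writing conventions this way is less the checks themselves than the forcing function: rules that lived in people's heads become explicit, reviewable, and enforceable on agent output.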

The next step is choosing where agents enter the workflow. According to the no-code workflow integration approach for legacy systems, the lowest-risk entry points are tasks that are already well-defined and have clear success criteria. For coding agents, that means starting with test generation, documentation updates, and isolated refactoring tasks before moving to feature development.

Embedding Governance from the Start

Governance frameworks are maturing quickly. NIST's AI Risk Management Framework and ISO/IEC 42001 both provide structured approaches to defining what AI agents can and can't do within an organization's systems. According to Firetail's analysis of AI governance frameworks, organizations that embed governance into the development lifecycle from the start see significantly better outcomes than those that add it after deployment problems surface.

EY's 4x figure is real, but it required 5,000 engineers, an existing DevOps infrastructure, and a deliberate decision to codify standards before agents touched production code. For an SME, the infrastructure is smaller and the codification work is faster, but the sequencing is identical. Skipping the standards layer doesn't accelerate the result. It shifts the cost from setup time to remediation time. At 20 or 50 engineers, there's no slack to absorb it.

For teams ready to move beyond the planning stage, How to Build a Data Agent: The OpenAI Blueprint covers the technical foundations of agent deployment in a format designed for teams without dedicated AI infrastructure.

Frequently Asked Questions

How did EY achieve 4x coding productivity with AI agents?

EY connected AI coding agents directly to its existing engineering standards, compliance protocols, and internal review processes. Deploying agents inside existing workflows rather than alongside them kept engineers in control and ensured AI output was usable in production.

Why does AI-generated code often fail before deployment?

AI agents operating without access to a company's internal standards produce code that looks functional but violates policies on data handling, security, naming conventions, and architecture. The result is output that requires significant rework before it can be merged.

What is code survival rate and why does it matter for AI ROI?

Code survival rate is the percentage of AI-generated output that actually makes it to deployment without rework. ROI from AI code generation depends less on how fast code is produced and more on how much of that code can be used as-is.

Can SMEs apply EY's AI agent integration approach?

Yes. The core principle scales down: connect AI agents to your existing standards and workflows before scaling output. SMEs that skip this step face the same cleanup costs as large enterprises, just with fewer resources to absorb them.

Written by

Mathijs Bronsdijk

Founder

Mathijs designs the technical architecture behind MBWorkers and turns direction into robust AI workflows: systems that feel intuitive and grow with your organization.
