Back to Blog
    Keith B. CarterFebruary 202612 min read

    The Evolution of Software Engineering: From Manual Implementation to the Dark Factory

    Software engineering is splitting into two realities. Dark factories ship code with zero human review while most teams get slower using AI tools.

    Thought LeadershipAgentic AISoftware EngineeringAI Strategy

    The software development industry is bifurcating into two distinct realities. At the frontier, software factories are achieving unprecedented efficiency by moving implementation entirely to AI agents. The broader industry is experiencing a productivity dip, where experienced developers using AI tools are documented to be 19% slower than those working manually.

    This split defines the core challenge for technology leaders in 2026. The bottleneck in software production has shifted from implementation speed to the quality of specification and judgment.

    Organizations that fail to redesign their workflows around agentic capabilities risk being left behind in a rapidly accelerating feedback loop where AI models are now instrumental in creating and improving themselves.

    The Five Levels of Vibe Coding

    Dan Shapiro, CEO of Glowforge, developed a framework to categorize the software industry's journey toward autonomous development. Despite many developers believing they are advanced, Shapiro estimates that 90% of "AI native" developers are currently operating at Level 2 or below.

    The 5 Levels of AI Coding in 2026, from manual implementation to the autonomous Dark Factory

    Level 0: Spicy Autocomplete. AI suggests the next line of code. The human writes the software. AI primarily reduces keystrokes. This is where GitHub Copilot started.

    Level 1: The Coding Intern. AI handles discrete, well-scoped tasks such as functions or components. The human focuses on architecture, judgment, and integration.

    Level 2: The Junior Developer. AI manages multi-file changes and understands dependencies. The human reads and reviews all code produced by the AI.

    Level 3: The Developer Manager. The relationship inverts. AI handles implementation and submits full Pull Requests. The human directs, approves, or rejects work at the feature level.

    Level 4: The Product Manager. The code becomes a black box. The human writes specifications and checks only whether tests pass. Evaluation replaces review.

    Level 5: The Dark Factory. An autonomous system turns specifications into working software without any human code review. The human's role is limited to writing specifications and evaluating final utility.

    The transition from Level 2 to Level 3 represents the critical inflection point. Below Level 3, humans remain in the loop of reading code. Above it, humans shift to managing outcomes. Most organizations have not made this leap. The ones that have are rewriting the economics of software production.

    The Productivity Mystery: The J-Curve Dip

    The headline numbers suggest AI coding tools deliver immediate gains. The data tells a different story. A 2025 randomized control trial by METR found that experienced open-source developers using AI tools took 19% longer to complete tasks.

    Developers in the study believed they were 24% faster, illustrating a massive gap between perceived and empirical productivity.

    Four Causes of the Slowdown

    • Workflow disruption. Developers spend excessive time evaluating "almost right" code and correcting subtle errors. The AI produces plausible output that requires careful verification.
    • Cognitive load. Constant context switching between the human's mental model and the AI's output creates friction. The developer must hold two mental models simultaneously.
    • The "old transmission" problem. Organizations are attempting to run a new engine (AI) on old transmission (human-centric processes). The workflow was designed for humans writing code, not for humans reviewing machine-generated code.
    • Misaligned estimation. The perception gap between actual and believed productivity means teams are making planning decisions on false data.

    The Ownership Cost Problem

    A common sentiment among senior engineers is that AI tools like Copilot make writing code cheaper but owning it more expensive. Review costs increase. Security vulnerabilities multiply. Significant gains of 25-30% or more only occur when organizations fundamentally redesign their CI/CD pipelines and development ceremonies.

    This is the orchestration challenge at the engineering level. Adding AI tools to existing workflows produces friction. Redesigning the workflow around AI capabilities produces transformation.

    Case Study: Level 5 Production at StrongDM

    StrongDM operates a software factory with only three engineers and no manual code writing or human code reviews. Their architecture represents the current frontier of agentic software development.

    Key Architectural Principles

    Markdown specifications. The entire system is orchestrated by three markdown files that describe software behavior. No imperative instructions. No step-by-step task lists. Pure behavioral specification. The agents read the specs and produce the software.

    Scenarios vs. tests. Traditional tests live inside the codebase, allowing AI to "teach to the test." StrongDM uses externally stored behavioral specifications called Scenarios that the agent cannot see during development. This functions as a holdout set, preventing the AI from overfitting to test expectations.

    Digital twin universe. Agents develop against behavioral clones of external services. Simulated Slack, Jira, and Okta environments allow full integration testing without risking production data or real APIs. The agents operate in a complete digital mirror of the production environment.

    Investment in compute. StrongDM posits that if a firm is not spending $1,000 per engineer per day on compute, their factory has room for improvement. This volume enables agents to run at a scale where they can independently build, test, and ship software.

    The StrongDM model demonstrates that Level 5 production is not theoretical. It is operational, today, with a team of three.

    The Self-Referential Feedback Loop

    Major AI vendors have reached a point where their models are building their successors. The feedback loop on AI development is closing.

    OpenAI's Codex 5.3 was instrumental in its own creation. OpenAI reported a 25% speed improvement and 93% fewer wasted tokens because the model identified its own inefficiencies during the build process. The model improved itself.

    Anthropic's Claude Code now generates functionally 100% of the code produced at Anthropic. Claude Code's own codebase was 90% written by Claude Code. Currently, 4% of all public GitHub commits are authored by Claude Code. That figure is projected to exceed 20% by the end of 2026.

    This is the acceleration pattern described in The Great Compression. The timeline for capability improvements is collapsing because the tools are improving the tools. Each generation is faster at producing the next generation.

    Organizational Structures Become Friction

    Traditional software organizations are designed to manage human limitations: limited working memory, need for synchronization, bounded attention spans. In a Level 5 environment, these structures become friction rather than enablers.

    Redundant Structures

    • Sprints and stand-ups. Unnecessary when implementation happens in hours rather than weeks. The cadence was designed for human coordination timelines.
    • Traditional QA teams. Replaced by automated scenario-based evaluation. Human QA testers cannot match the throughput of agent-driven testing.
    • Middle management. Roles focused on coordination, such as Scrum Masters and Release Managers, are being deleted or forced to evolve. The coordination layer is now handled by the orchestration system.

    The Shift to Articulation

    The center of gravity in engineering leadership is shifting from coordination to articulation. The value of an engineering manager now lies in the ability to define specifications precisely enough for an agent to execute.

    This requires rigorous systems thinking. Machines do not have human context to fill in ambiguous gaps. They fill those gaps with software guesses, and software guesses ship to production.

    The parallel to the KDA framework is direct. Know what you need the system to do. Decide on the specification with precision. Act by deploying the agent with clear behavioral boundaries. The quality of the output depends entirely on the quality of the input specification.

    The Talent Reckoning and Junior Pipeline Collapse

    The apprenticeship model of software engineering is breaking. Juniors historically learned by fixing small bugs and doing simple tasks. AI now handles those tasks.

    • Declining roles. Junior developer job postings in the US have declined by 67%. In the UK, graduate tech roles fell 46% in 2024.
    • The hollowed middle. AI handles the bottom of the ladder. Seniors occupy the top. The path for a junior to gain the experience necessary to become a senior is disappearing.
    • New skill requirements. A junior developer in 2026 requires the system design understanding expected of a mid-level engineer in 2020.

    This mirrors the broader compression of professional timelines across industries. The entry path into software engineering is being restructured, and the new entry point demands higher-order thinking from day one.

    The Economic Future: Unlimited Engineering Capacity

    While specific roles are disappearing, total demand for software is expected to explode. The reduction in production costs is opening markets that previously could not afford custom software.

    The New Economics

    High-revenue, low-headcount startups are setting a new template. Companies like Cursor and Midjourney generate hundreds of millions in revenue with only a few dozen employees, averaging over $3 million in revenue per employee. That is 5-6x the SaaS industry average.

    Addressing unmet need. Markets previously unable to afford custom software, including regional hospitals, small logistics firms, and local government agencies, become addressable as costs drop by an order of magnitude.

    The final bottleneck. As the cost of implementation approaches zero, the only remaining constraints are judgment and domain expertise. The fundamental question moves from "Can we build it?" to "Should we build it?"

    When implementation cost approaches zero, the value of knowing what to build becomes infinite.

    What Leaders Must Do Now

    The transition from Level 2 to Level 5 is not incremental. It requires structural change across workflows, team composition, and success metrics. Organizations operating on Level 2 processes with Level 5 ambitions will find themselves in the productivity J-curve indefinitely.

    Redesign workflows around agents, not humans. Stop trying to fit AI into human-centric processes. Multi-agent orchestration demands new architectures for how work flows through an organization.

    Invest in specification quality. The organizations that win will be the ones that can articulate what they need with precision. This is a leadership skill, not an engineering skill.

    Rethink the talent pipeline. The junior developer path is broken. New onboarding and career development models are needed that account for AI handling implementation while humans handle judgment.

    Commit to compute. The dark factory runs on compute investment. Organizations that underspend on AI infrastructure will not achieve the throughput needed to compete with those that do.

    The question is no longer whether autonomous software production is possible. StrongDM proved it is. The question is how quickly your organization can redesign itself to operate at that level.

    Explore how the KDA framework can help your leadership team navigate this transition from coordination to articulation, and from implementation to specification.

    Want to put this into practice?

    Keith works with leadership teams across Asia and globally to turn AI insight into action. Reach out directly — response within 24 hours.