What is the future of software engineering with Adam Bender, Principal Software Engineer at Google
What we see in software engineering circles is that the AI coding debate is mostly about speed. Does the model write good code, ship faster, and pass the tests? Adam Bender thinks that’s too small a question.
Adam is a Principal Software Engineer at Google, where he’s spent close to fourteen years. He edited the testing chapters in Software Engineering at Google, wrote the Testing Overview, and has trained new technical leads since 2017. Right now he’s running a Large-Scale Change across Google’s entire codebase to refactor TODOs in every language, millions of lines he’ll mostly never read.
I called him after his Google I/O talk this year because he’s making a point that many people in the industry miss. Programming is the narrow act: one person, one program. Engineering is what happens when many people build together and need that work to survive, ship, and scale for years. AI is already fast at the first and has barely touched the second.
What happens when we make code generation 10 times faster and engineering doesn’t keep up? What speeds up is the pressure on everything nobody automated: testing, review, culture, and whether a human can still hold the system in their head.
In particular, we talk about:
Why speed hides the risk. Your developer ecosystem behaves like a complex adaptive system. Change the output of one node and the effects surface in places you never connected to it.
Programming versus engineering. One is a single person writing a program. The other is keeping that program alive, integrated, and worth maintaining over years. AI moves the first and stalls on the second.
Why culture comes before tooling. Any Google engineer can propose and run an edit across the whole codebase, millions of lines, often without AI. Copy the tool, skip the beliefs underneath it, and it falls apart.
Conway’s Law once agents join the org. Agents don’t care about time zones, proximity, or politics, the human limits the law rests on. So what shape do they produce instead?
Your internal APIs are now public. An agent won’t respect a boundary you only enforced with good manners. Adam lays out where to start hardening, and in what order.
What testing loses first at 10x output. Agents are good at unit tests, so teams create far more than they need. The real gap is integration testing, where the system outgrows your ability to check it.
Intellectual control, the thing keeping him up at night. Can a human still reason about the system in front of them? For now, AI is making that harder, and he explains why.
Teaching ten years of judgment in six months. A new engineer with fifty agents and no instinct yet. His honest answer is “I don’t know,” then the two things that partly work.
Where to start on Monday. Define what quality means for your business first, then map the whole ecosystem, including the pieces you stopped noticing years ago.
So, let’s dive in.
1. Most of the conversation about AI and coding is about speed. When someone only asks whether the AI writes good code, what are they not seeing?
Your developer ecosystem is a kind of complex adaptive system, which means that when it experiences change - like the introduction of AI code generation - it is difficult to predict the impacts. Because of the deep interconnections across our developer ecosystems, which evolved organically over 20+ years, a substantial change in the output of a single node in the system is likely to have many surprising outcomes.
What I have observed in talking to engineers both inside and outside Google is that the “ecosystem” nature of a developer environment wasn’t as well appreciated as I had thought. It’s understandable because our approach to software development has been relatively stable over the last 15-20 years, with most changes being evolutionary.
Unfortunately, those two facts - the complexity of our ecosystems and the lack of attention to their systemic connections - create a condition where it is easy to be overconfident about, or blind to, the risks of dramatically accelerating ecosystem disruption.
2. You said, “engineering is programming integrated over time”. AI is speeding up the programming part, but why doesn’t that speed up the engineering part too?
First, I want to give credit to Titus Winters, Hyrum Wright, and Tom Manshreck, who coined the phrase in the book Software Engineering at Google.
The key difference is that programming is a narrow act performed by a single person or agent. The outcome of programming is, unsurprisingly, a program, but that is likely of little value unless you can also ship it, extend it, maintain it, and scale it. Software Engineering is what happens when you have many people practicing programming together and want to ensure their long-term ability to do so in an economically viable way.
Put another way, I can write programs for myself all day long, but if that code doesn’t need to be integrated with others’ work, isn’t likely to be maintained, and has little to no economic value, I can get by without all kinds of activities like testing, documentation, operations, release management, and more.
For now, AI has shown the ability to increase individual productivity, but it has yet to help us solve the other problems that arise when you want to continue growing, maintaining, and reliably operating that software. Why is that the case? I think it probably has a lot to do with the fact that programs, in isolation, are very language-shaped and much easier for agentic systems to validate. The engineering aspects involve much more nuanced trade-offs, unclear success criteria, and solutions that are nowhere near as structured.
ℹ️ Check my review of the book Software Engineering at Google.
3. Half of your book is about culture, not code. Why can’t anyone copy Google’s technical choices without first understanding the culture?
Let me answer that with an example from the book and my talk. Google has a process called Large-Scale Changes that allows any engineer at the company to propose an edit to the entire codebase and, if approved, execute it themselves.
Typical LSCs range into the millions of lines of code. I’m actually working on an LSC to refactor the format of TODOs across our entire codebase, in every language, to validate some new techniques. If we are successful, we will change millions of lines of code across parts of the codebase to which I have absolutely no connection. I probably won’t even look at the code. I will rely on the reviewing engineers to sign off on the change. Keep in mind this has all been done without AI for the last 15 years.
Consider what is culturally necessary for this process to work:
I need to be able to access all the code, whether I work on it or not.
I need a way to validate my changes don’t break anything - which means everyone has to have bought into testing, and I need to be able to reliably run those tests.
I need the belief that such work is valuable and that reviewers need to spend time reviewing my changes.
I need the foundational belief that maintenance of our codebase is a shared responsibility.
Specific to my LSC on TODOs, there needs to be a shared cultural belief in standardization.
You can build incredibly powerful code mutation tools, but without the right engineering culture, the ability to navigate the human aspects of the codebase will doom you to failure. As long as software involves people, culture, processes, and tools, they are all dependent on each other.
4. In your talk, you use Conway’s Law as the door into everything else. Now that agents write and change code, do they become part of that shape too?
Conway was one of the first to observe the deep connection between how we organize and the systems we produce. That connection often seems invisible because, as engineers, we are not really expected to think about how our organizations work, and we can often get away without worrying too much. AI may change that dynamic quite a bit.
AI's ability to amplify outcomes will make many things that were previously invisible much more visible. For example, in a world with reliable agentic programming, your organization's decision-making abilities will be under intense pressure to accelerate to keep pace with software construction. How much did the average engineer even think about their executive’s decision-making process before?
One of the interesting things about agents is that they aren’t impacted by time zones, they don’t require physical proximity to build trust bonds, and, as far as I can tell, they don’t have a notion of politics. These are some of the underlying forces that give rise to Conway’s Law. That suggests Conway’s Law could be a byproduct of the limitations of human cooperation, and agents won’t necessarily be affected by it. Agents will almost certainly have subtle organizing principles of their own, likely influenced by whatever incentives they recognize, but we shouldn’t expect those principles to reflect our human experience.

5. One of the most interesting things you said is that all your internal APIs just became public, because an agent won’t respect a boundary you only ever enforced with good manners. Where should a team start hardening?
Here is the simple formula I’m using:
Build an inventory of all internal APIs (intentionally exposed or not) and all data stores. You have to know where everything is. Remember that agents can find everything and are persistent.
Develop a risk formula that captures what your organization considers most important. Use it to prioritize hardening your most critical assets.
Use Platform Engineering techniques to create well-lit paths for agents and humans alike and lock everything else down. If you lack the capability to lock things down, you will need to build that first. You can start by preventing agents from taking any destructive actions, especially in your production systems.
Develop really robust internal observability. You want to know how APIs and data are being accessed at all times.
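To make the second step concrete, here is a minimal sketch of what a risk formula over an inventory might look like. Everything in it is an assumption for illustration: the `Asset` fields, the 1-5 scales, the multiplicative formula, and the sample asset names are hypothetical and are not something Adam described. Your organization would choose its own factors and weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One inventory entry: an internal API or a data store (hypothetical fields)."""
    name: str
    sensitivity: int   # 1 (public data) .. 5 (regulated or secret data)
    exposure: int      # 1 (locked down) .. 5 (reachable by any agent)
    blast_radius: int  # 1 (read-only) .. 5 (destructive writes in production)


def risk_score(asset: Asset) -> int:
    """Hypothetical formula: multiply the factors so a low factor pulls the score down."""
    return asset.sensitivity * asset.exposure * asset.blast_radius


def hardening_order(inventory: list[Asset]) -> list[Asset]:
    """Return assets sorted from highest to lowest risk score."""
    return sorted(inventory, key=risk_score, reverse=True)


# Made-up sample inventory, for illustration only.
inventory = [
    Asset("billing-db", sensitivity=5, exposure=2, blast_radius=5),
    Asset("internal-search-api", sensitivity=2, exposure=5, blast_radius=1),
    Asset("deploy-api", sensitivity=3, exposure=4, blast_radius=5),
]

for asset in hardening_order(inventory):
    print(f"{asset.name}: {risk_score(asset)}")
```

With these made-up numbers, `deploy-api` (60) sorts ahead of `billing-db` (50) and `internal-search-api` (10). Multiplying rather than adding is a design choice: an asset that is highly sensitive but already locked down scores lower than one that is moderately sensitive, widely reachable, and able to make destructive changes in production.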
6. If no one is writing the code, a year from now, who is actually watching the codebase as it grows?
If we aren’t careful, the answer is no one, and that’s a huge problem. The average software system is too complex for any one person to reliably reason about today. Luckily, it is still within reach of human-scale teams to evaluate. If we grow those systems by 10x in size without improving our ability to reason about them, we will completely lose the ability to make changes safely, which could severely limit our ability to evolve and maintain systems.
From the earliest days of LLM-driven development, I have tried to steer folks to spend as much time building tools for understanding as we do for creation. Alas, I haven’t been as convincing as I would like.
7. You wrote the testing chapters in the book, and yet on stage you admitted you don’t have the integration testing tools you want. When output increases 10x, which part of testing breaks first?
To clarify quickly, I was the editor of the series of testing chapters and wrote the Testing Overview.
I don’t have the integration tools I want, but it is not for lack of trying. Google had good-sized teams of incredibly smart people building better integration test tooling, yet we still have a long way to go. I think the problem stems from the fact that unless teams consider integration testing an essential system property to be evolved in lockstep with their systems - equal in importance to concurrency or reliability - the complexity of the underlying system rapidly escapes our ability to build system-level tests. Truly excellent integration testing will come from a system explicitly designed to be integration testable. However, most teams don’t have the luxury of investing in that kind of work. Maybe AI can make it more cost-effective?
As far as what breaks first? I think teams will likely drown in unit tests at first because agents are really good at writing them. This is a problem we have wrestled with at Google even before AI. It is really easy to generate lots of little tests that appear to add value but are low-value because they duplicate other tests, lack precise assertions, are flaky, are tedious to maintain, or otherwise fall short. Also, keep in mind that today it is very common to require all tests to pass before shipping; the more tests you have, the harder it is for all of them to pass at any one time, even if they are independently very reliable.
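The last point is simple arithmetic, and a short sketch shows how quickly it bites. It assumes tests fail independently and that every test has the same reliability; both are simplifications, and the 99.9% figure is an illustrative number, not a measurement from Google.

```python
def suite_pass_probability(per_test_reliability: float, test_count: int) -> float:
    """Probability that every test passes in one run, assuming independent failures."""
    return per_test_reliability ** test_count


# Illustrative only: each test passes 99.9% of the time when the code is correct.
for n in (100, 1_000, 10_000):
    p = suite_pass_probability(0.999, n)
    print(f"{n:>6} tests at 99.9% each -> all green in {p:.1%} of runs")
```

Under these assumptions, a 100-test suite is fully green about 90% of the time, a 1,000-test suite about 37% of the time, and a 10,000-test suite almost never, even though no individual test got less reliable.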
One thing I have learned after nearly 14 years at Google is that it is possible to have too much of a good thing, even tests! This isn’t an invitation to give up testing, far from it; this is a call to apply a much more intentional approach and invest in reimagining how we validate software when code is cheap.
8. You said the thing keeping you up at night is intellectual control, whether a human can still reason about the system in front of them. Is AI making it worse?
At the moment, AI is only making this problem worse because most of the energy invested has been focused on code creation. As I mentioned previously, this kinda makes sense because validating generated code is much more straightforward than validating nuanced guidance about system health.
Mechanisms to hill-climb towards correct code are well understood and apparently pretty cheap to implement. Quicksort is quicksort, and it’s easy to test. Diagnosing novel errors in a production system requires the ability to put the complete system state into a context window and pair that with guidance on the nature of a highly bespoke system with no analog anywhere in the world. That’s still an unsolved problem as far as I can tell.
I do think AI has the potential to be massively helpful here because it is incredibly good at identifying patterns and predicting outcomes. What we are missing is a massive context-engineering exercise to make our systems more legible and predictable for AI.
9. How do you teach someone ten years of judgment in six months, when they have fifty agents and none of the instinct yet? Have you seen anything that even partly works?
Honest answer… I have no idea. I have been teaching new technical leads (TLs) at Google since 2017, and one thing remains constant: it is very hard to learn judgment and develop intuition just by listening to lectures or participating in discussion groups. Real intuition comes from making a judgment, taking action, and experiencing the consequences. That’s the loop we need to put new developers through more often.
One idea I have been experimenting with is using Role-Playing Games to help build the kind of discernment needed. I believe the SRE community at Google has seen good results with a form of RPG called Wheel of Misfortune, in which participants simulate incidents and are challenged to address them.
I've been talking with my friend Titus quite a bit about this problem, and he thinks investing in apprenticeship techniques like pair programming is a great way to quickly transmit high-bandwidth experience to new developers. I think he's right. I think deeper human connection is a great way to accelerate the transfer of engineering experience.
The other thing I’d recommend is that every single incoming software engineer learn something about systems theory and systems thinking. Learning about general system dynamics and system patterns can help you develop analytical tools and mental models for the complexity we are working with. Specifically, it can help you gain clarity at an abstraction level above the individual lines of code that agents are increasingly responsible for. It isn’t exactly the same thing as deep intuition, but it can be powerful enough to achieve similar outcomes.
Milan here: check these two books I recommend on improving systems thinking.
10. You ended by telling the room they have more agency than they think. For an engineer who cares about quality and feels carried along by all of this, where do they actually start on Monday morning?
The first thing you can do is define what quality actually means for your system and your business. Figure out the indicators of both technical and business impact that would give a leading indication that quality is suffering. This is not going to be an easy exercise, and very likely the things you think matter don’t matter as much to your business leads as they probably should. However, starting the conversation about what matters will make it much easier to address impacts as they emerge.
If you were to do two things on Monday, the second would be to try to map out your developer ecosystem - all the parts, cultural and technical - and try to work out the second- and third-order consequences of much more code.
Be creative and don’t forget the hidden parts of the ecosystem you have taken for granted for years. Everything is up for grabs if we achieve a 10x increase in programming productivity.
📔 The Laws of Software Engineering book is out
The book began as a document I wrote over the years. During my 20+ year career in tech, I saw the same things happening at companies with different technologies and teams. I wrote down what I saw. I learned about Gall’s Law from a project that did not work out. I saw Brooks’ Law in a team that grew larger while everything slowed down. Goodhart’s Law showed up when we met all our goals, yet the results were no better. They were even worse.
Later, I met engineers who had figured out the same things. Most of them learned these lessons the hard way, as I did. They had a project that failed, a team that got tired, or a codebase that was a mess. This is how engineers usually learn these lessons, because no one tells them. It is true, and it costs a lot.
This book is a list of what I learned.
This issue covers 20 laws in software engineering. My book covers 56 laws across architecture, people, time, quality, scale, code, and decision-making.
Each chapter discusses what the law says, where it comes from, when it applies, and what it looks like in a project. Some chapters also include connecting ideas such as The Two-Pizza Rule, The Cobra Effect, and Impostor Syndrome.
This book is something you can keep at your desk and look at when you need help.
Forewords are written by Dr. Rebecca Parsons, CTO Emerita at Thoughtworks, and Addy Osmani, Engineering Director at Google Cloud AI. Reviewed by 20 engineers and leaders from Google, Amazon, Uber, Oracle, Yelp, Nutanix, and CodeScene.








