Code Isn't Free - Mario Zechner on the Hard Truths of Coding With AI (creator of Pi)

Mario Zechner on building Pi, why stable agent workflows beat hype, local inference, open source pressure, and what it costs to ship real tools in the AI era.

Jan-Niklas Wortmann

12 Jun 2026 • 52 min read

Episode 19 · June 12, 2026 · Mario Zechner

Key Takeaways

Pi is built to modify itself so it fits your workflow, not the other way around.
High-frequency harness changes break prompt templates, skills, and the workflows developers depend on.
A coding agent is often enough for research, finances, and server admin, not just writing code.
Local inference is getting realistic on normal hardware without enterprise budgets.
Open source success for Pi means managing PR floods, security expectations, and personal cost.

Pull Quotes

Pi is a minimal extensible coding agent that can modify itself so it fits your workflows instead of the other way around.

I was super happy with Claude Code, but over time it didn't fit my workflows anymore.

I like my tools to be stable, and Claude Code is not something that is very stable.

A coding agent is basically all you need to do other knowledge work as well.

I used to be a free spirit and now I don't feel like a free spirit anymore because there are certain expectations.

Guest Bio

Mario Zechner is the creator of Pi, a minimal extensible coding agent designed to adapt to your workflows instead of forcing you into someone else's harness. He builds Pi alongside Armin Ronacher at Earendil and is also known for libGDX and decades of open source work through Badlogic Games.

Full Transcript

Code is never free because the consequences of your actions will eventually hit you. And if you think that any amount of code is good now, you just delayed the punishment. I have seen people who shit out like 500,000 lines of code via a bunch of agents in a week. And guess what the outcome of that is?

That's Mario Zechner, the creator of Pi, a coding agent that's blown up in open source after Claude Code stopped fitting his workflows. And this conversation went places I didn't expect.

No, actually, it's fucking 30 years ago. Holy shit. People figured out that waterfall is bad, waterfall doesn't work, and now we're back to hyper waterfall. Yeah, that's completely ridiculous to me. And it's worse because now you're not even writing the spec anymore, you just vibe prompt your agent to write a very detailed spec. Inevitably, every time I do this, I end up in a corner crying because nothing is good anymore and everything falls apart. The vibe coder will just sit there with his voice dictation app and dictate what app they want at a very high level. And the enterprise vibe coder will first write a very detailed spec.

We talked about why spec-driven development might be repeating a 30-year-old mistake, how Mario actually works, and it's not an army of agents. And why he thinks running serious AI fully local on a normal MacBook is closer than most people might think. So for 14 gigabytes of unified memory on macOS, or an equivalent NVIDIA setup on Windows, I could have a pretty good local inference machine. And that is affordable.

He also tells me what running a popular open source project looks like in 2026. The number of AI-generated pull requests he gets per day is genuinely absurd. Instead of getting like one or two PRs per week for a very successful pre-agent open source project, I get 50 to 60 PRs per day by Clankers. And each PR has a description that is kind of like a full Harry Potter book.

And then usually has about 10 to 1,000 file changes.

This one is for everyone who's tired of AI hype, but still wants to figure out what actually works. Grab a coffee and let's get into it.

Pi is a minimal extensible coding agent that can modify itself so it fits your workflows instead of the other way around.

I do like that aspect of "modifies itself".

So do you consider it more, I think you also said coding agent in there. So do you treat it more as an agent intentionally for developers so that they can kind of build their own tool set? Or how do you look at it?

I mean, there's different layers to Pi. There's some underlying low-level things that abstract providers and LLMs, kind of like Vercel's AI SDK, a terminal user interface library, and stuff like that. And also a more general agent loop abstraction. And then there is Pi the coding agent, which is currently mostly focused on coding, but I personally use it for finances and research, and a little bit of my Hetzner server administration is also done with it, which is kind of coding related, obviously. But I noticed very early on that a coding agent is basically all you need to do other knowledge work as well. And it seems like the rest of the industry is also thinking that way, because if you look at things like Claude Design or Claude Cowork or whatever you have, they're basically all just coding agents.

I've heard a little bit of the backstory, that you basically tried a couple of other coding agents and didn't really like any of them out there. So you felt this developer itch of like, oh, I can do better. I admire that attitude.

Obviously, we all usually suffer from not-invented-here syndrome, but honestly, initially I was super happy with Claude Code. It just over time didn't fit my workflows anymore, and I had no way to change that, because Claude Code doesn't let me do the things I need to do for it to fit my workflows. That was basically the reason.

That is actually the thing that I'm so much more interested in than benchmarks and those kinds of things. I think we're now at this early stage of AI development where talking about workflows is the interesting part, at least for me. So when you say it didn't fit your workflow, what do you mean by that?

They have a very high-frequency release cadence, which means you get one, two, three releases per day even, sometimes. And with all of these releases come changes to things like the tool definitions, the system prompt, and so on and so forth. So if I have a bunch of, let's say, prompt templates or slash commands or skills even, or I have workflow descriptions that the model is supposed to follow, and the changes to the system prompt and tool definitions mess around with that so my workflows defined in prompt templates don't work anymore, then that is bad. If the model becomes dumber because they are injecting system reminders left and right behind your back that are not surfaced in the user interface, and I am wondering why the model doesn't behave anymore like it behaved yesterday, even though it's the same model, supposedly, then that is bad. So I'm a developer, I like my tools to be stable, and Claude Code is not something that is very stable. Which doesn't mean it's bad. Obviously it's not; millions of people use it productively. So more power to everybody who's happy with Claude Code. It's just, I'm an old man, I'm used to simpler tools that do just what I need them to do and not more than that.

The fact that the model gets dumber arbitrarily, I noticed the most with Claude for some reason. I don't know if OpenAI handles it better or where this is coming from.

I think that's true for various technical reasons and infrastructure reasons. Definitely the harness has an effect on how the model behaves, how the same model behaves. And since they're changing the harness a lot, and since actually testing the effects of harness changes on model output quality is really, really hard, because it's an open-ended kind of question that you need to answer, for which you can't really write deterministic tests, I think a lot of that can be attributed to changes in the harness. So I don't know why GPT models seem to behave better. It might all just be vibes.

I think this is mind-blowing to me, to be super honest, because usually developers have very valid reasons why they like tool Y more than tool X. Most of the time, I feel, I can give you...

Which is now completely obsolete, but I don't think there's...

Oh, really? No, I could give you exact reasons why I like my IDE more than VS Code. I could give you a reason why I preferred Angular over React.

Sure, but you will always find people in the opposite camp. That's my point. So if it were objectively measurable whether something is better than something else, then there wouldn't be camps, right?

That's true. But usually they have reasons, whereas I feel like this conversation about coding agents is usually like, it feels right. And also when I look at models, it's so much vibe-based, based on how I talk to the model and how I feel it responds to me.

But I think that's kind of similar to when you prefer a specific abstraction over a different abstraction. Like if you have, for example, Solid versus React, kind of doing a similar thing, just entirely differently. I personally, I'm a Solid guy. I love Solid. It's exactly down my alley. But that is not because React sucks. Obviously, it doesn't, because it's like the most successful framework there is. It's just that it kind of feels better using it. And my brain more easily adjusts to the way it works and how its abstractions work. And I think it's very similar with models.

I personally, for example, like the Claude models or Claude-distilled models much more for prose and things that involve human-like writing than code. On the other hand, I like GPT for code so much more than Claude. None of this I have evaluated in any kind of, let's say, scientific or empirical way. It's just, this is my experience with those models, and it kind of burned itself into my brain to use Claude for prose and GPT for code.

Funnily enough, I do the same thing. I'm wondering, for Pi, I mean, this has now grown into a very successful project with various parties being interested in it. But I'm curious, what do you consider success for Pi?

That's a very big question, a very loaded question. When people stop sending pull requests and issues. I'm sick and tired.

Now, success is relative. Obviously, there's an economic perspective to that because otherwise I wouldn't have joined Earendil. But it is kind of only superficially related to Pi itself. I mean, Pi is and will stay open source just because the cost to copy is probably very low. The moat here is that I am in control of it and I dictate its design.

And it doesn't even take a long time to copy the design. Like, Cloudflare took a lot of Pi's design and kind of translated it into the new think abstraction, and that's hilarious. I'm friends with those folks and it's just hilarious. So obviously, if your code is in the open, there's no moat, because people can now even more easily clone whatever you've built, right? But while they are cloning, I keep working on Pi and changing things and trying out new abstractions, trying to find better fits for the problems I'd like to solve. So that's where the moat is in terms of open source software. And in terms of business, you want to add some value, right? So for me, success would be if I can figure out an additional value that I can sell for money, with which I can basically keep a tiny team alive. That for me is success. Bigger success is if I can go full vertical and kind of own the entire stack. But that's a long-term kind of...

Talk me through that. What would that look like on the full stack? What do you mean by that?

You can think about it as layers. At the top there's the application layer, right? And Pi coding agent would be application layer. You could kind of consider OpenClaw as an application-layer kind of application of Pi, even though they are now mostly on Codex app server, which makes a lot of sense since Peter joined OpenAI. And there's other applications that don't yet exist, neither in the open source space nor in the commercial space, except for Anthropic and now increasingly also OpenAI with the Codex app, that try to not only hit coding agents, but try to hit all kinds of agents or agentic needs, let's say, like Cowork, Design, Chrome, Claude, whatever. Those were all experiments or even successful products at this point.

And there doesn't seem to be a lot that's not tied to a model lab in that direction. There's a bunch of startups that try to do things and then they get immediately bought up by Facebook who then don't know what to do with them. But there are no open-source solutions for that. At least I'm unaware there might be. So that kind of layer is super interesting, and I'd like to explore it more.

I built this kind of shitty robot here.

I saw the tree.

And that is just part of me experimenting: what if I stuff an agent, a coding agent in this case again, inside a shitty little toy robot with a smartphone for the MCU, basically the microcontroller? And I'd like to go in directions like that a little bit more on the application layer. I find it super interesting, including local inference. Like, this thing, as of today, as of last night at midnight, works fully locally. And it's very nice, it's super cool, because now I can expose it to my boy without any second thoughts, apart from this little LLM that is then running on the bigger, beefy machine being a sycophant and telling my boy wrong things. But that's a problem I can solve in another way that's non-technical.

But yeah, this is what excites me. And success for me would mean that I can contribute to this local inference future a little bit as well. So not just economic success.

I would be curious to talk about that a little bit more, because in the AI scene, even though everything kind of feels sick saying it like that, every conversation that I see is like, everyone wants local AI, but it's just nowhere near consumer ready.

And with consumer ready, I mean already developers who are ready to mess with it. What is your experience?

Yeah, I mean, that is definitely true. And I'm not even talking about the local model quality. The setup alone and the cost involved is still usually not worth it. Like, if you actually buy a beefy, I don't know, DGX Station workstation that costs you 100k, that's a single B300 with, I think, 500 or 700 gigabytes of VRAM. That's beefy, and you can run a really good model on that, but nobody has the money for that. So for this robot, I've been playing around with smaller ones, Gemma 4 and Qwen 3.6, I think, some mixture-of-experts model. And I didn't believe it, but for this application, a chatbot with motor and camera that is being talked to by a kid, they're more than perfectly fine. They're actually really, really good and really, really fast. And I think there's a lot of applications out there that are not just toys but also real-world tasks that could be served by these models today. And you do not need hardware that's expensive. Like, this thing runs on a run-of-the-mill MacBook M1, for example, not even a beefed-out one with 32 gigs. And there's the speech-to-text with Parakeet and text-to-speech with Qwen3 TTS; those take about 10 gigabytes. And then I have Qwen 3.6, which takes another 4 gigabytes. So for 14 gigabytes of unified memory on macOS, or an equivalent NVIDIA setup on Windows, I could have a pretty good local inference machine. And that is affordable. Again, not for the entire world, obviously, but for a lot of people. And you also see a bunch of companies pushing in that direction. Like, Google is really, really big on edge inference. And I like that.

Do you think we will see a rise in the smaller models? So we at JetBrains, we basically ship tiny models that can literally just do CSS auto-completion. It's completely offline, and it doesn't work if you have a big-ass project. But for completing Tailwind classes, which I'm at least not capable of remembering all the fucking time, this is nice. So I'm wondering, do you think we will see a shift from these frontier models, which are massively big?

If we're being realistic, also for a lot of these things, like, so I, with my microphone, I had a delay. So I was trying to troubleshoot that with Claude because I'm just too stupid for audio-video stuff. You have clearly seen before this recording that I failed because I'm using my old microphone again. But if we're being realistic, for these kind of scenarios, big-ass models like Claude are completely overkill.

Do you think we will see a rise in small, expert, very specially trained models? And to some extent, we already see this with examples, as you mentioned, like text-to-speech, where it's just this one thing that you're doing.

I'm not involved in model training, so I can't speak from expertise, but from what I'm hearing, maybe no. If you listen to people like Demis Hassabis, for example, from DeepMind, his thoughts are that the current gigantic models actually don't need all these parameters they have, and that you can actually distill that down to much, much, much smaller models without losing a lot of output quality. So my hope, not my, let's say, knowledge, is that this is the future. Currently at least, not speaking in absolutes, as I understand it, we do have to train these big models for various reasons related to how training works and so on and so forth. But once you have such a big model, you can distill it rather easily down to something that's an order of magnitude or more smaller while still retaining most of the capabilities. And my hope is that we are heading to a future where we don't necessarily have specialized models. I think specialized models can be a band-aid for now, if you want to kind of be sovereign and have some small tests. But ultimately you want the same bang for the buck that you get from a big model, locally.

And I think we are very close to that, actually. antirez from Redis, Salvatore, he started working on a very custom inference engine just for DeepSeek V4 called ds4, obviously implemented in C because it's the best language.

Someone is biased.

And he's running DeepSeek V4 Flash on a 128 gig laptop, for example. Armin, my partner in crime at Earendil, also does the same now. He's also started contributing to that. And it's a really, really capable model. It probably could handle like 60 to 70 percent of the issues I'm handling with Pi. So I don't think we are far away from models that are as good as the current frontier models. Obviously the frontier models will always be frontier. But the question is, do I need more intelligence?

That is kind of where I wanted to go with this, because I think it's very interesting to hear from the people that build coding agents how they also use them. And if you listen to Anthropic, I get kind of PTSD, to be super honest. I don't think it's a sustainable way of working, architecting. I don't even know where to start. It seems to work for them, so I don't want to criticize it. If I look at my workflows, I think the maximum was that I had five coding agents running in parallel, and I had never faced such mental fatigue after two, three hours of work.

So I'm curious, how do you work? And I don't mean this in any negative way. There definitely was a certain connotation. How do you work on features for Pi? How's your day structured in that sense?

I'm an absolute caveman. There are two types of tasks I have. The first is going through the issue tracker and picking off things that need to be fixed or implemented. These usually have an issue that is descriptive enough that I can give it to my agent with an additional prompt template that says: pull down all the information from this issue, including any reference code, any reference pull requests, and so on and so forth; ignore the analysis in the issue and do your own analysis based on the goals that we want to achieve here with this issue. And then I usually get a pretty good analysis of what this issue is about, and options on how we could fix it, either by pulling in a pull request that already exists, or taking what's in there and modifying it, or doing a from-scratch implementation of the fix. And this analysis step usually takes like five minutes per issue for the agent to go through. So I queue up one of these in one session, I open a second session and pull in the next issue, I open a third session and pull in the next issue, and have kind of a parallel pre-processing of all the issues I want to work on today. And then I just go to each session, check what the agent suggested, go into the code myself, ensure it makes sense, and try to reproduce things manually if they're hard to test automatically.
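
[Editor's sketch] The triage step described above could be modelled roughly like this. It is written in Python purely for illustration. The `Issue` and `Session` types, the `queue_triage` helper, the status names, and the template wording are all assumptions paraphrased from the conversation, not Pi's actual prompt template or API, and no agent is actually started here.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical prompt template, paraphrasing the instructions quoted in the episode.
TRIAGE_TEMPLATE = Template(
    "Pull down all the information for issue #$number ($title), including any "
    "referenced code and pull requests. Ignore the analysis in the issue and do "
    "your own analysis based on the goals we want to achieve. Then list options: "
    "pull in an existing pull request, modify one, or implement the fix from scratch."
)


@dataclass
class Issue:
    number: int
    title: str


@dataclass
class Session:
    """One agent session per issue (assumed shape, for illustration only)."""
    issue: Issue
    prompt: str
    status: str = "analyzing"  # a human review follows before any implementation


def queue_triage(issues):
    """Prepare one session per issue so the analyses can run side by side."""
    return [
        Session(
            issue=issue,
            prompt=TRIAGE_TEMPLATE.substitute(number=issue.number, title=issue.title),
        )
        for issue in issues
    ]


if __name__ == "__main__":
    # Issue numbers and titles below are made up.
    for session in queue_triage([Issue(101, "crash on resize"), Issue(102, "slow diff view")]):
        print(session.issue.number, session.status)
```

The point of the shape is that the agent only pre-processes; every session still waits for a human to read the analysis and check the code before anything gets implemented.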

And then it's just a classic back and forth until I agree with myself, using the agent as a rubber duck, that this is the way I want to implement this. And at that point, the context usually already contains so many guardrails in terms of what the interfaces will look like, which specific modules we will modify, which tests we will write, and how they should work exactly. Because just telling your agent to write tests is never a good idea. Yeah, at least. So at the end of this process of analysis and planning, I just say, go implement. And once it starts implementing, which also takes another 10 to 30 minutes depending on what it is, I do the same in the next session. And once something pings and says "I'm done", I have a little Pi extension where I can basically pull up a diff of all the changes that were made and annotate individual lines inside that diff viewer with feedback. And then I click "finish review", it gets fed back into the agent automatically, and that's how I iterate on the thing until I think the code is good. For some pieces of code, I don't care, I just say fine. For other pieces, specifically core mechanics, I usually review every change that's being made, like I would with a human.

And that's my day-to-day thing: fixing bugs, adding features. And then there are other tasks, like refactoring and building new things. For building new things, I just slop. Like this robot thing. The software for the robot thingy is not super complicated, but it has complexity already, because the speech-to-speech pipeline has many services that need to get involved. I actually took the Python MLX implementations and converted them to Rust last night, because I hate Python as a dependency and I want to have a single package people can download onto their Mac, or onto their Windows machine with an NVIDIA card, and just start it and have everything running out of the box. But in this case I'm just basically vibe coding. I do not look at the code at all. I have a very good idea of what needs to be done, I can specify that really well, and then I do a lot of manual testing, not even bothering with a test suite, because it doesn't matter in this case. That said, eventually I realized this will probably not be a unique project just for my son.

After showing it to kids in the hood, they also want one. So I spent last night and the night before refactoring the vibe slop, which is the third kind of workflow. Once you've fucked yourself by vibe slopping, by vibe coding, and once you realize that the thing you built is actually valuable and should probably have better fundamentals so you can extend it more easily in the future, you've got to go in and refactor your vibe slop. And there's two ways to do it. You take what you learned and just build from scratch, or you start picking off individual refactoring tasks from this ball of vibe slop. And that is what I did for the robot. And the way I do this is I go in and read it myself, or I let the agent explain to me how things work in specific parts of the code base.

For the robot, it was basically two files, one 3,000 lines of code file called server.ts and another 3,000 lines of code file called client.ts that runs in the browser. And yeah, you can imagine that wasn't fun. But I basically just started like I would manually pulling things out into individual modules, defining interfaces, boundaries, and those I define by hand.

For example, I can ask the agent to propose a bunch of options for an interface or a module boundary. Then I take one, or I let it write example apps or tiny little scripts where I can go in and manually feel out the interface. I still need to be kind of in the code to feel if an interface feels good, if an API feels good. So I let the agent write the examples. I go in and play around with it.
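
[Editor's sketch] One way to picture "define the boundary by hand, then let the agent write a tiny script to feel it out". The robot code discussed here is TypeScript; Python is used only for illustration. Every name below (`SpeechToText`, `TextToSpeech`, `EchoStt`, `EchoTts`, `reply`) is hypothetical and not taken from the project.

```python
from typing import Callable, Protocol


class SpeechToText(Protocol):
    """Hand-defined boundary for one pipeline stage (hypothetical)."""

    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hand-defined boundary for the output stage (hypothetical)."""

    def synthesize(self, text: str) -> bytes: ...


class EchoStt:
    """Stand-in so the interface can be tried without loading a model."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoTts:
    """Stand-in that turns text straight back into bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def reply(audio: bytes, stt: SpeechToText, tts: TextToSpeech,
          think: Callable[[str], str]) -> bytes:
    """One turn of the pipeline: audio in, audio out, model call injected."""
    heard = stt.transcribe(audio)
    return tts.synthesize(think(heard))


if __name__ == "__main__":
    # The kind of throwaway script used to judge whether the API feels right.
    print(reply(b"hello", EchoStt(), EchoTts(), think=lambda text: text.upper()))
```

With stand-ins behind each boundary, the interface can be exercised by hand before any of the 3,000-line file is actually moved.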

And if I like it, that becomes my plan, basically. But yeah, that's the refactoring work. I use the agent to explain what exists. I use the agent to write options for how that could be refactored. Then I go in manually and kind of feel things out, typing up a little code or using the API myself by hand. And if I'm happy with the refactoring plan for pulling 500 lines out of the 3,000 lines into a separate module and defining a boundary, then it's just, let the agent do the thing again. At which point, again, we have enough context so the agent is very guardrailed in terms of where it can go with the implementation.

I'm having somewhat similar workflows. But that just works for me, at least, and I assume it's similar for you, because I have 10-plus years of experience working in the nitty-gritty details of a code base, having that manual experience, also learning the hard way what works and what doesn't work. Every developer I know has patterns that they would absolutely not do again, because they tried it once, thought it was great, and it turned out to be horrible. How do you think this will develop moving forward for people that are now just coming out of college?

I honestly don't know. If I were an actual kid today, like I used to be in the 90s when I started programming, I would probably not learn programming. I would probably just ask an agent to write me a program. I mean, we already see this at schools. Kids use GPT to do their homework, and I would have done that as well, because I hated homework. So I'm not sure. I don't know. So for the young people: good luck, my condolences. For old people like us, agents are definitely a force multiplier, but there is a risk of atrophy and a risk of loss of discipline and agency.

That's something I have struggled with as well over the last year or two. Just finding the discipline to not delegate all the things to the machine, because inevitably, every time I do this, I end up in a corner crying because nothing is good anymore and everything falls apart.

That is also the biggest issue that I have with this narrative. I've seen this a couple of times online, where it's like, well, we have too much code now, we cannot review it anymore. Where I'm like, well, then how are you expecting to maintain it if you cannot read it? What you're saying doesn't make any sense. That is just a slightly more sophisticated approach to vibe coding, but not by much.

Exactly. It's like, the vibe coder will just sit there with his voice dictation app and dictate what app they want at a very high level. And the enterprise vibe coder will first write a very detailed spec. But essentially it's the same fucking thing.

I had a little bit of a rant, I think it was two weeks ago, where I was like, I don't get why people are doing spec-driven development. Either it just doesn't click with me or my way of working doesn't work. But I have struggled to get meaningful information from business people forever. What is now different? There's no reasonable way that I could write a prompt so complete that I could give it to another developer and they would develop it flawlessly, nor a coding agent. It doesn't make any sense to me.

Yeah, it's crazy. It's like we forgot the lessons of the last 30 years. I mean, we literally introduced Scrum 30 years ago, or 20, 25 years ago. No, actually, it's fucking 30 years ago. Holy shit. People figured out that waterfall is bad, waterfall doesn't work, and now we're back to hyper waterfall, where you kind of...

It's faster.

It's fast, and it's worse, because now you're not even writing the spec anymore, you just vibe prompt your agent to write a very detailed spec, and that's like a tower of turtles.

Yeah, that's completely ridiculous.

But I mean, there's a counterargument to that. And the counterargument is that since code is now essentially free, and since iteration times are quicker... Like, previously, if you wrote a waterfall spec and had a team of engineers implement that, your turnaround time would have probably been months. Now it's maybe a day or even less, depending on the size of your spec. And then you just go in and manually test your way through this, or whatever. And I can see the appeal of that. I have yet to find evidence that this actually works for production software. I have my reservations. But I'm not ruling out that there's some magician out there who can actually make that work and not eventually wake up and just cry at night because everything is broken now.

I like to distance my grumpy self from it a little bit, because I'm glad that we're now at least looking at how workflows utilizing coding agents can look in a more structured approach, instead of just fuck around and find out. So if I distance myself a little bit, I'm glad we're having conversations around it. And I don't think this is the right conversation.

I'm not convinced that the army of agents works at the moment. And the reason for that is just, I've said this a million times now, I'm kind of sick of it, but: if you're writing a spec, what is the most detailed spec you can actually write?

Like, what does that look like? The most detailed spec. It's basically your program, the program itself.

Okay.

So, assuming you are not writing the program by hand, because now you have agents, you're writing a human, natural-language, prose-based spec. If you do that, the agent needs to translate that into architecture and code. If you are not specifying any of that on the absolute highest detail level, which is the program, you are leaving blanks in your spec, and the agent fills those in. And the question is, with what does the agent fill in the blanks that you're leaving in your spec? With the garbage code that we put on the internet for the past 20 years. So that is also exactly what a code base that's vibe coded, or enterprise vibe coded with specs, looks like. It's very bad.

It's very bad. So I ran an experiment. Full disclaimer, I know nothing about Kotlin, right? Everything in JetBrains runs on Kotlin. I know nothing. I was probably more like a quota high or something. I don't know.

So when the Ralph loop was big, I was like, well, I should give this a try, right? Just to be objective. I have my concerns, still have them. So I reached out to a developer owning our terminal integration. I was like, hey, there's this bug that bothers me. Do you mind if I vibe code this and you give me a review? I put in a disclaimer because I don't want to add more work to his plate, right? But at the same time, I gave him this disclaimer. The pull request was never even able to merge. It was just utter trash, complete nonsense. I didn't have any chance to verify it, because I don't know the language, I don't know the subsystem, I'm not that deep into the code base, and I couldn't justify it or have any way to reason about it. And he was just like, fuck is that? Why are we even doing this? So I...

Yeah, I have various concerns about Vibe coding. So here's the thing. We have a bunch of projects now that show us that something like a Ralph loop or an auto-research loop or a slash goal or whatever you have can actually work. Like Bun, the rewrite from Zig to Rust. I think there are, and that's what I mean with like, I'm excited to talk about

Workforce because I think there are constraints where tools like that really make sense. Bun to Rust is a great example because you have an extensive test suite and you can establish a process where you can have the agent verify the work itself to a certain extent. So things like that totally make sense. I was considering making that joke also

to rewrite the IntelliJ codebase to Rust, but our marketing department was not happy about that. But that's exactly what I mean. If we're just looking at it, and if you're terminally online like I am, and you think, like, depending on the bubbles you're in, around February or something, I would say, Ralph was everywhere and everyone was like, oh, that's the shit.

That's the best thing ever that happened to coding since coding agents two months before. Yeah, sure. If you have an infinite source of tokens or money. That is another, yeah. Okay. You mentioned earlier this thing that code is effectively free. Is that something you really think? No. Because I don't. Okay. Because the consequences of your actions will eventually hit you and if you

think that any amount of code is good now, you just delayed the punishment. Because that, like, I mean, I have seen people who shit out like 500,000 lines of code via a bunch of agents in a week. And guess what the outcome of that is? I have so many thoughts on that because on the one hand, I think if the lines of code is really what you value now and used to be your bottleneck,

apparently, then you never just really put your mind to that and you needed to, like, type faster. That argument just doesn't make sense to me. Like, if I want to, I can also write a thousand lines of code in an hour. That's not the value. It's gonna be hard to do it in an hour, but yeah. I mean, I think as a human we max out at two to three k lines of code a day, but for me the writing the lines

of code was never ever fucking the problem. But, so say I build software. My software should have a new feature, I want to solve a new problem with my existing software, or write a bunch of new software. The most time I spend on that is thinking about it and designing the thing, right, and then just writing it out. Obviously there are some classes of programming

problems, like, I always hated writing GPU kernels, for example. I had to do that because I had to. Well, there's just hard, really hard coding things where your brain just craps out because you can't keep all of it in your head at one point. But let's assume it's not that, it's something else. Then exploring the solution space, the number of solutions you can explore is limited by your coding abilities.

So basically when we complain about, hey, why are we doing waterfall now? Then this is kind of similar when we say, oh, previously I thought so much about things instead of coding. Because you had to think so hard to find that one solution that you then can actually spend resources on implementing. And I think there is a big, big, big uplift by using agents for that phase because

now you don't have to initially, you don't have to perfectly zero in on that one solution of the lots of designing and thinking. You can actually explore things and that is super valuable. Doesn't mean that the results of those explorations that are partially or fully written by agents are reusable, but you can get a feeling for the solution space and the solution you want to

have with agents way quicker, because you can tell a bunch of agents, go off and build it like this, build it like this, build it like this, and then look at the results. And that is what, for me, is the biggest productivity increase. The thing that I also like is the asynchronous nature of this. I can now give my agent a task to explore something, evaluate something, research something and

go in a meeting, because we still have meetings. We still need to talk to people and then come back. And at least if I have the capacity to pick up on that, I saved a little bit of time in that space that I otherwise couldn't have used in any other way. But again, I don't see it realistically happening anytime soon that I'll spin up 10 agents and 10x my output in that sense.

Well, I mean, the asynchronous nature is a benefit for some tasks. Like if you have a review agent or whatever, obviously you don't want to sit there and wait for the agent to finish its review. But if you do too much of that, at least for my old brain, it's just, like, I probably need to start ketamine or something to get through having 20 agents running, and then,

it's the task switching, it's the context switching that's killing you. Previously I could focus on two to three things a day, tops. Now, on a good day, I can go through 30 issues on Pi, fixing them, and I couldn't do that before. But I also only do this like once or twice a month, because after a day of doing that I'm so done with the world because my brain

is just mush. The context switch is what kills you. And also, there are tasks where asynchronous agents can be super beautiful, but I also find collaborative work with an agent really, really, really helpful, because previously I would sit in my office and think through an issue and there might not be a squishy human sitting next to me

whom I can talk to, or they might be remote and I can only chat with them, which is kind of different from regular. So now I have this non-squishy, non-human, non-meatbag in my machine, which I kind of use as my pair programming partner. And that is really great because it's kind of like a bicycle for the mind, where I can think through the problem while talking to an agent or asking the

questions or having it ask me questions, like the Socratic method. And that helps me explore questions within the problem space a lot better. So that's why I don't like the army of agents thing, because it takes me away from thinking about the problem. So last year, around a similar time, I had Ryan Carniato on the podcast and he said a thing around, um,

he's basically thinking about problems all the time, so his wife is joking that one day he will be hit by a bus because he's thinking about a problem, not paying attention. I have that if I'm in that headspace where I want to think about a problem, but otherwise I'm good to zone out. Like if I'm with my kids, I don't think necessarily about work or something or like a problem that faced me.

When my mind is going idle, like when I try to sleep, then I have that, but like if I'm doing something, and it can be as easy as, like, playing catch with the kids outside, right? How's that for you? And now I'm wondering also, do you think that will change in such a workflow, where you can have some kind of sparring partner in the form of an AI?

So the turning your brain off, or stopping thinking about problems, that I've never been good at. Not good in the sense that you have been doing that all the time? Pre and post agents, nothing changed for me in that regard, because if I have something that is really enticing, that is really capturing my attention, and say I work on that for four hours and then I have to go because I need

to pick up the boy from kindergarten or something like that, it takes me about an hour for my brain to kind of let go of that problem, and there's really nothing you can do about it. It sucks. So I would like to be able to just step out and be like, ah, my brain is now all good. But you know, what's it called, one of the

superheroes in The Boys saga or so, she would lobotomize herself just... Oh yeah, yeah, yeah. Whenever she wants a little peace of mind. And I kind of wish I could do that, because it sucks getting stuck in this thinking loop. It is probably some, I don't know, mental condition, but a lot of my peers actually share it. I'm happy to hear that you're not affected by that garbage, because it sucks.

I can imagine, yeah. I'm not envying you there. And agents don't make it better. They used to make it worse. Like when all this started with the release of Claude Code. I did use all the other things before, Copilot, yeah, tab tab, and Cursor and whatever, but Claude Code was just so enticing because it kind of just worked in a way that makes sense for my brain,

and then I thought, oh my god, I'm a god now, I can do all the projects I never had time for because this little machine can now do most of the shitty work that takes a lot of time, right? So I was building millions of projects and I couldn't stop thinking about how to finish this project, where to start with this project, and so on. It was like four months of not sleeping at all, basically three to four hours a night,

full AI psychosis, but eventually it's not that was it.

So I think I didn't really have that, because in my work I kind of got pushed into that, so I had all day to experiment with it. I definitely had that time where I'm like, oh okay, I can now work on those projects, but then I tried to cram that into my eight-hour work day, like, oh, I'm experimenting with that, I'm trying these tools. So what was it in the end that made you snap out of it?

Because I found that I'm producing things of no value, neither for me nor for other people. And just building for the sake of building is something I do if I learn something new, but not something I want to do if I don't learn anything, and if I don't use the thing I built, or nobody else has a use for the thing I built. And so I just snapped out of it. Also, I got kind of bored with agents.

Yeah, the lack of sleep didn't help, but that fixed itself eventually. Nature has its way to fix that. Yeah, I don't know. I just found that I can produce a lot of shit now, but the art is actually picking what you actually want to build and build it so well that it actually has value. It actually gets used. There are so many things that you're saying that I would kind of like to pick

up on, so I need to get my squirrel brain in check here. You said you like building things for learning. How is that learning experience with AI? Because if I, being realistic, and part of that is that I'm just crazy busy with other stuff, but that is probably one of my concerns. It's like, okay, I used to learn by pain. I've said this on the last episode, but the way that I learned was like,

I don't understand this. I forced myself to build this, to just go through the motions, to understand it, look through the source code or something. And with AI, there's just a very easy way out of that. Yeah, no, you're absolutely right. Great, thanks. Friction and pain is something I also promote as the one method to learn. And obviously, agents take away a lot of the friction and pain.

That said, it's also an opportunity. For example, when I learned electronics, that was pre-Claude Code, that was with Cursor still, I already knew how to fucking write programs for an MCU like the ESP32. You know, I was learning electronics, so I wanted to focus my time on learning that. And I still needed some code, but I didn't think that writing the code for an MCU would

teach me anything. I've done so much low-level programming in my life, I don't care. So while learning electronics with microcontrollers as part of the circuit, I wouldn't care about the code at all, or the APIs of ESP-IDF or Arduino or whatever it is. And I would just have Cursor or later Claude Code write that code for me because I don't care.

I only cared in performance sensitive parts where I would go in and just fix it up. So I use the agent to do stuff that is part of learning where I don't learn anything, but that I still need to do to be able to learn the thing I actually want to learn. So there it's great, because it removes a part of the friction that doesn't help me learn something.

In terms of coding, it's hard, because I'm so old and I have seen so many things, and I don't learn a lot about coding anymore. I used to read a lot about software design and architecture and learn a lot of different APIs and build compilers and blah blah blah. I'm kind of tired. I don't think there's a lot in software that I still need to learn. At least I have no interest in learning any of the things, like,

for example, in TypeScript there is now this big Effect-TS hype. I don't fucking care, man. Like, I understand the problem you're trying to solve. The way you're solving it is cute, but it doesn't matter. It's just fucking, it doesn't matter. So previously I would have slurped it up and kind of read all about it, tried it out. I don't, I'm just tired. So, I don't have any advice for how

to introduce that friction and pain when learning new things in software. I guess the classical answer is just do it by hand. That is a very important part for people coming into the industry, like focusing on the fundamentals and not getting like too quickly too fast or too quick too fast. Yeah, yeah. I understand what you mean. Like if a junior joins a big corp at the moment.

There's a lot of pressure that will not force them, but will make it more enticing for them to use agents, because first of all, as a junior, I don't want to fail, I don't want to fail, I don't want to fail, this is scary, I now have a job, oh my fucking god, I have a job for two i do. So now you're resorting to agents to help you be good at your job so nobody

fires you, or nobody screams at you because you're a stupid junior who doesn't yet know everything. Then the other thing is, as a junior, I don't know how it currently is, but I would assume people who just finished their studies, if they do CS or something similar, they probably are quite good with agents already but might lack the fundamentals,

because during the formative years they already had a lot of assistance. And then the question is, how did they learn the fundamentals? And to that, I don't have any answer. One thing that I may be a little bit morbidly excited to see, but I think this year is going to be particularly painful for enterprises because there's this drive of like, oh, we need to adopt AI.

I mean, the fact that Meta has their token leaderboard or whatever, insane to me. So I think we're going to see an abundance of sloppy software. We see an abundance of companies failing with that miserably for various reasons. So I'm curious to see how this plays out. I hope the outcome will be like, oh shit, there are limitations. That would be a great outcome.

With the caveat that I also think that a lot of people don't realize how much lower quality they are actually willing to accept. Like, it's not like software has always been this brilliant star in the sky that's perfect. It's usually garbage. So then... does it actually hurt if it becomes even more garbage because nobody cares anymore and it stops everything? I think for specific

software that might be true, that it hurts. For other types of software it probably doesn't matter. Like if you have a TikTok client or an Instagram client on Android and iOS, sure, but as long as the videos are playing you don't care if the buttons don't work or if it takes three presses to like a thing. You don't care, because

first of all you don't have anywhere else to go anyways because there's only one TikTok, And ultimately, you're just there to numb your brain. So I think there's a lot of software where it actually doesn't matter. But ultimately, I do hope that this is just one of these pendulum things where the industry is going like, whoa, what the fuck, we need to do this?

Like with monolith versus microservices and all the garbage. Just on a way more, like, just like on a. Yeah. I have my thoughts on microservices too, so. There's a time and place for everything, but first, I think, the industry always has to swing in one extreme direction before we kind of end up in the middle, and my hope is that this will actually play out like that just fine as well.

The interesting conversation that we haven't really touched on: I have yet to meet a developer who works in a code base and says, oh, this is perfect, there's nothing that I could even optimize. Because we all operate within constraints, and usually time is a very valid constraint, budget is another one of those. So no code base is perfect already, so

the question is then also, okay, what are we aiming for? Are we aiming to maintain the quality standard? A lot of companies, businesses can probably drop a couple of percentage points in terms of quality. Not that you can measure quality that easily, but you know what I mean. I mean, we can measure it in uptime, right? And we've dropped quite a few decimal places there. Tell me how you feel about GitHub.

To be honest, GitHub is in a very strange situation. Clankers are now submitting issues and pull requests en masse, creating repositories en masse without the human knowing, actually, that they're doing it, and GitHub gets the brunt of that and needs to just manage their infra. It's really hard. So I don't think GitHub's outages are related to AI, although I heard some things through the grapevine.

I think, could Microsoft probably maintain GitHub a little bit better as a product? Probably, yes. Does it seem a little abandoned here and there also? Yes. I mean, the guy who was the head of GitHub, he was just fucking off into the sunset and was like, I'm now head of, I don't know, Xbox, whatever. He didn't fucking care. He was, like, absent for months and shit was hitting the wall, and he's like, I exist, I

get a lot of money, I don't know what the fuck I'm doing. That's not good. But the GitHub people themselves are actually very talented. I think what's still killing them is the move to Azure. Since they started doing that move, you can actually look at the outage history and you see, once they started going to Azure or whatever it's called, it's not been pretty.

So one thing I would be curious to touch on with you and you can tell me to go fuck myself. But you have a four year old kid, if I remember correctly, which is interesting because I have a three year old, so three and a half. So very in a very similar spot there. And I'm wondering two things. So, A, you have this massively successful open source project while also being involved with Earendil.

How does this impact your personal life, positively and negatively? Let's say Q1 of 2026 was hard. Not so much because of Earendil, actually. It's just a lot of companies and people wanted to talk to me all of a sudden. And that was a lot of calls, like a lot of calls, and that meant that I got to see my son less than I'm used to, and that

kind of sucks. Since I've joined Earendil, my work pace and the number of calls did decrease slightly, not by a lot, but I've since found ways to manage my free time better. Well, that's good, glad to hear that. I have a calendar now. I never had a calendar in my life. I hate the fucking calendar, yeah. What? I could not function without my calendar.

No, like, anything that's important, people will come up to you and remind you that something important is gonna happen. It always works for me. The world is my calendar. No, I see, you're like outsourcing your calendar. I like that. No, honestly, I have so much garbage on my calendar now that I know could have been an email, and I'm not talking about partners or Earendil,

disclaimer. Like, if I didn't have a calendar, it would probably all just be fine. It's just the calendar kind of gives me a structure where I can paint in the time that I do worky parts and the time that I do the non-worky parts. Yeah, this is really important to me at least, because I'm also in a very fortunate situation that we have a nanny who's taking care of the kids

at home. So if I go grab a coffee or something, I can at least hang out with them for a minute. And we also have dedicated offices where they know, okay, Papa is working when he's in there. So this helps me very much be, like, present dad when I'm downstairs, where it's like, okay, I'm working when I'm upstairs. And the nanny is here for a certain amount of time because we pay her by the

hour, obviously, so it's very easy for me to be like, ah, okay, that mode. But did the success of Pi, except for the calls aspect, positively impact your work-life balance, or rather negatively? Negatively, no question about that. I've been funemployed for a very long time, let's say, and I used to be a free spirit and now I don't feel like a free spirit

anymore because there are certain expectations coming from a lot of people, and I'm not talking about Earendil or anyone with Earendil. It's just, once you put out something public like that and there's a lot of users, you can't just fuck off into the sunset unless you really don't care about people. And sadly, apart from probably having something like OCD when it comes to problems

and keeping them in my head, I also fucking like people. And that's a terrible combination for your work-life balance. Other than, I know that at Earendil, Armin and others are also contributing to Pi, but do you have like a whole team that works on it, or is this again a you're-running-the-show kind of thing? I'm calling the shots, but I try to delegate, and Armin's become the best junior I've ever had,

so I can delegate a lot to him, which is very nice. But the reason for that is also that he should eventually be able to take on my role as well, should something happen to me. Apart from him, there's a handful of people that have been with Pi way earlier than the rest of the Earendil team, just external contributors, like, um... God damn it, my brain. Alio, he's a French developer, is really, really great.

He uses a lot of open weights models to do Pi work. It's always amazing to see. And, um, no, Nico Balin, a Canadian developer who has done a lot of amazing Pi extensions. Then there's mrexodia, who's actually been the developer of x64dbg, one of my most beloved reverse engineering tools. And he just showed up and started contributing. ParLens, lots of great people, old guard people, people like me who've seen

some shit contributing to Pi. So that's great, but most of that is on and off, and ultimately in open source, at least at this stage of an open source project, there are usually only one or two people actually calling the shots and coordinating things, because most people aren't reliable enough that you can say, Monday to Friday, nine to five, you're working on this and this issue.

I can't delegate like I could in a company. So... Let's slowly wrap things up. One thing I am still curious to talk to you about: you have recently spent a lot of time on refactoring Pi. And maybe you have already talked about this more publicly and I just completely missed it, which would surprise me a little bit because I feel a little bit like a stalker on your Twitter timeline. I've seen things. No, I'm just kidding.

A, where's the refactoring coming from? B, what are you trying to get out of the refactoring? Sure. So you can actually follow it on the main branch, because I refactor in main, you motherfuckers, I don't care about you and stability. So Pi obviously also has a lot of historic baggage. It was never intended for public consumption. Quite a lot of parts are actually vibe coded,

the HTML export that's available, I have not looked at a single line of code for that garbage. I don't care. As long as the HTML renders when you export a session to HTML, fine, great. There are a lot of handwritten things still, because some parts of Pi actually predate my use of agents, and those are pretty solid. But with more time passing, there's new things you want to try out, and

a terminal user interface only takes you so far. So one reason for the refactor is that I also want to expand into other types of user interfaces more easily. That is possible already with the existing SDK if you ignore extensions, but since Pi is mostly a plug-in thingy, I need to find a way to make third-party extensions also work on the web, for example, or at least with.

So that's one part of the refactor. The other part of the refactor is about remote ability. That is, I want to be able to run one Pi session on this machine and connect to it from another machine. And again, this could be possible already with the existing SDK. It's just not as clean as I'd like it to be, specifically with respect to durability

and observability and all of that stuff you'd like to have if you can remote into agents. And yeah, the durability and observability are the other big parts that I want to tackle. And finally, I want to be able to use Pi's SDK and just deploy agents on Cloudflare Workers or on Vercel and blah, blah. I want the agent itself to be adaptable for any kind of environment,

not just a local computer with a bash, but basically anything. And again, this could already be possible with the current thing, but it can be so much nicer. Do you have a timeline in mind for yourself when you would like to accomplish that? And I mean, ultimately refactoring should never be a done-done thing, right? Right, yeah. So that's also why I'm doing it on main, because just today I'm gonna

start fucking up the Pi AI package, so that's the lowest-level of the packages, the one that talks to providers and transforms the context for the different provider formats and stuff like that. So I'm gonna fuck that up tonight. Yeah, and this is why I'm working on main. The new harness is contained in the agent package, and the coding agent package

doesn't use the new harness yet, but piece by piece I will start switching that out. So then you will have a Pi that is using all the new infrastructure below it, but you still don't notice it, and all the extensions will still work. I hope to be able to be done with that in a week or two. It's been a slog because I still have a lot of calls, and I still have a lot of issues to triage.

And if I get a single day per week to refactor, then that's already a lot. So yeah, maybe a week, maybe two until that state where Pi coding agent itself still looks and works like the old Pi coding agent, but all the underlying bullshit is now clean and nice and reusable. And after that, I would think a couple more weeks to iron out the new extension

mechanism where an extension has a server-side component and a UI-side component. So, I don't know, if you think about permission dialogues, which aren't built in, you have the server side that then spawns a UI component. The server doesn't care if it's a TUI component or a web component or a native Android component or whatever, it just says, show this component with this name

in the UI and give me the result. That's basically it. You reminded me, so we're running now over because of you and what you're saying. I wanted to ask you why Pi is, as far as I'm aware, the only agent that runs YOLO mode by default. Do you think that tapping yes in the dialogue that asks you if the next operation... Oh, no, I think, I'm not saying you're wrong. I never said you're wrong.

My thing is this. By telling people, this is YOLO mode, it's dangerous, you better think about this. I actually make people think about that. I actually hope to make people think about that. Because initially people come to Pi and they're like, oh, this is garbage because it doesn't have permissions. And I'm like, yeah, think about why it doesn't have permissions. and then think about how you fix that on your end.

And the hope there is that people go inside themselves and look for this quiet, dark place that's called security awareness and think about real hard how within their environment, which is different to my environment, they would like to safeguard agentic work. And the answer to that is usually, if I don't want the computer that the agent is running on to be fucked, I containerize the agent.

Or at least I containerize the tools the agent uses, like read, write files, bash, in a container. That solves problem one. But I cannot decide this for you. I could probably bundle something like the sandbox thing that Anthropic has, or Bubblewrap or whatever. But these are incomplete solutions. And I do not want, when it comes to security, to provide you with an incomplete

solution that you can fuck up configuring. So ideally within the enterprise, and I had a bunch of enterprise people come up to me about that, they actually eventually understand and are like, yeah, this makes total sense because you don't fucking know our environment and infra. Like if you use something like Gondolin, a QEMU thing, as the container built

in, this wouldn't work in our infrastructure because we don't have hardware virtualization for blah blah blah. So yeah. That's why that's not in there. Because I think what exists in Codex and Claude Code is mostly security theater. It is now also, I mean, Claude Code now does what? It asks an LLM if a bash command is safe or not. In auto mode. Is that good? I don't think that's good.
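To make the "containerize the tools the agent uses" idea concrete, here is a minimal Python sketch of a bash tool that runs each command inside a throwaway Docker container instead of on the host. This is not how Pi works and not something Mario describes in detail: `SandboxConfig`, `build_docker_argv`, `run_in_container`, the image name, and the workspace path are all hypothetical names chosen for illustration. It is also exactly the kind of incomplete solution he warns about, so treat it as a starting point to adapt to your own environment, not as a security boundary you can rely on.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxConfig:
    """Hypothetical settings for a containerized bash tool (illustration only)."""
    image: str = "ubuntu:24.04"               # assumed base image
    workspace: str = "/tmp/agent-workspace"   # assumed host directory the agent may touch
    allow_network: bool = False               # no network unless explicitly enabled


def build_docker_argv(command: str, cfg: SandboxConfig) -> list[str]:
    """Build the `docker run` argument list that executes `command` in a container."""
    argv = ["docker", "run", "--rm"]          # --rm: discard the container afterwards
    if not cfg.allow_network:
        argv += ["--network", "none"]         # cut off network access
    argv += [
        "--volume", f"{cfg.workspace}:/workspace",  # only this directory is shared
        "--workdir", "/workspace",
        cfg.image,
        "bash", "-lc", command,               # the agent's command runs in the container
    ]
    return argv


def run_in_container(
    command: str, cfg: SandboxConfig, timeout_s: float = 60.0
) -> subprocess.CompletedProcess:
    """Run the command in the container; requires a local Docker installation."""
    return subprocess.run(
        build_docker_argv(command, cfg),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )


if __name__ == "__main__":
    # Print the command line instead of running it, so this works without Docker.
    print(shlex.join(build_docker_argv("ls -la", SandboxConfig())))
```

The point of the sketch is the shape, not the flags: the agent's tool call is turned into a command that only sees one mounted directory and has no network by default. Whether Docker, a VM, or something else is the right container is the environment-specific decision the transcript says each team has to make for itself.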

I'm curious to see, because I think like a couple weeks ago, Peter Steinberger's token consumptions made it online with like 1.3 million a month. Yeah, I don't even know. Good for him. No criticism there. And the thing that I thought was really interesting in that conversation is that he shared some of the tools that they integrated as automations for managing

PRs and, like, the level or the frequency of tickets that they get. You kind of alluded to the fact that you're doing that to some extent manually, but you also talked online about the fact that you declare PR vacation every so often, which I thought was an interesting approach. So talk me through your open source process. So I've been in open source for 20 years and I had two other big projects in

that space, so I know my way around. Previously, pre-agent, it would take people a lot of effort and time to create a pull request because they would have to understand everything with their human brain and write the code with their human hands. Nowadays, people would just ask their clanker to send a PR to fix a thing they think is broken in Pi. Usually, the results of that are really shitty and not reusable,

and often they don't even identify something that's actually broken. It's usually just user error. So now what happens is, instead of getting like one or two PRs per week for a very successful pre-agent open source project, I get 50 to 60 PRs per day by clankers. And each PR has a description that is kind of like a full Harry Potter book.

And then usually has about 10 to 1,000 file changes, depending on how the clanker felt. So what are you going to do with that? The default position is, you now declare every pull request garbage and auto-close it. So, how do I get meaningful pull requests? Well, you ask people to first write an issue in their human voice. No longer than a screen. And tell me exactly what it is you want to do and why

you want to do it. And then I tell you, okay, that's good. Now you can send a PR. You have proven that you're human. You've proven that you understand the problem. And you've proven that you understand the solution. And once I get that, I type, looks good to me, and a little GitHub workflow triggers that will put them in a file that says this account on GitHub can now open PRs without them being auto-closed,

and that's worked brilliantly, because now I'm back to the high-quality PRs that I used to get pre-clanker, and I don't see the clanker PRs anymore. What I do see, though, is the issues. I auto-close the issues as well, but that's mostly just for visual reasons. I still, every day, go to the issue tracker, to the closed issues, find the issue that's labeled last

read, that's kind of like the marker up until which either me or Armin have processed issues, and then I painstakingly read through 30 to 60 issues that have been submitted in the last, whatever, 12 or 24 hours and identify: is this clanker, is this not clanker, is it a good issue, is it not a good issue? And then I reopen the issues that are worth it. And out of this, like, my last triage run was two days ago because

thankfully Armin is now doing that as well. I think I ran through 50 issues and two survived. Two issues got reopened out of 50 closed issues. But it works. I mean, I spent about half an hour going through 50 issues, because most are very quick to identify as garbage. So that's not hard. It sucks that I have to do it, but it's not that bad. For Peter and OpenClaw

the scale is just orders of magnitude bigger, and my method would not work for that at all. No chance. So that is why he is trying to combat that with automated tools. Actually, earlier in February, I wrote some things for the OpenClaw team to visualize issues in 3D space and be able to select clusters of issues that are very, very likely the exact same issue,

just to bulk select them and close them and replace them with a summary of that cluster of issues. But that still didn't work, because the number of issues just exploded. And even with that, where you can select 50 issues that are the same and just say this is now one issue, even with that they couldn't handle it. So now Peter's very aggressive

in automating as much as possible, and he's kind of living in the future. I think it's very interesting to hear him talk about his experiments and also his experience. I'm very fascinated by some of the things. I still don't believe that most people or companies are anywhere near that. No. Anthropic probably is, with Claude Code. It's probably

very similar. But I mean, if I would now go... so I worked in Germany, I worked for energy and health, I worked for Siemens for a while. Oh yeah, that's not gonna fly there at all. If I would suggest, like, hey, how do you feel about spending 1.3 million for a development team of three people, I don't think they would be super amused by that. Yeah, I mean,

it's hard to say. So first of all, the 1 million is API pricing, right? And if you're working for OpenAI, API pricing is not really what OpenAI paid for Peter's work for that month. Let's put it that way. Okay, but even if you assumed they had a margin of 90 percent and the actual cost was a hundred thousand dollars for that one month, then the question is, was it worth the money?

And I cannot speak to that for OpenClaw at all. Like I would trust Peter enough to know what he's doing and that money was probably well spent in terms of, I'm not sure how far along he is with kind of automating a lot of this stuff, but I would assume a lot of that money was spent on building the automation infrastructure. So in the future, they do not have to spend as much anymore on that kind of thing.

But again, I don't know enough about OpenClaw to say. One thing you touched on though is, again, interesting to me because I think, I mean, we all know that the cost that we have right now for tokens is going to increase long-term. Long-term, hard to say, but at least in some future there's going to be a price hike. So the subscription, yes. Yes.

For the pay-as-you-go API tokens, you also think that there's going to be more price hikes? I'm not disputing that, I'm just asking. I'm not sure how... I have some trust issues with all of them, to be super honest. Yeah, me too. So I could still see there being some kind of marketing budget allocated even to those, where you say like, okay, adoption is still important to us.

We want to be a little bit more financially responsible, but still... I don't know about that, but like, I mean, with subscriptions it's a known fact that those are subsidized tokens, left and right.

I think once those subscription costs increase, companies are going to question the ROI of certain initiatives and endeavors and usages of AI more. So I'm wondering, and again it partially brings up the conversation around workflows and stuff like that, but I would not be surprised if we see a future, and this is just completely me speculating,

where different seniority levels get a different budget allocation for AI. Oh, that's already the case. I know companies where this is already the case. Yeah. Same. But at the same time, then, there's also the conversation, because right now it feels a little bit like an unlimited pot of gold that a lot of companies allocate to AI, which I think, to some extent, makes sense for people to experiment and, like,

get into the AI psychosis and past it.

But with that, when we start the conversation of like, oh, do you really need to use Opus for centering that div now? Probably not. And I do that myself. I'm too lazy to switch models. So I usually pick one that works for me reliably. I usually use GPT 5.5 and call it a day. I also don't dramatically mess with different reasoning levels. So I might be biased, but I think that's the case for most developers,

but I don't think that would be the case long term just for price sensitivity. Thoughts?

Yeah, I don't know. I mean, we were promised lower token pricing for the past two years, and I do not see that happening for the frontier models. Like the last couple of releases always increased pricing. Or they changed the tokenizer so that now for the same input text, they just count more tokens. Which is too far, extra sneaky. Yeah, same token price, but it's just like 1.4 times more tokens for the same input.
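To put a number on the tokenizer point above, here is a minimal Python sketch. The per-million price and the token count are made-up assumptions for illustration; only the 1.4 factor comes from the conversation. It shows that an unchanged per-token price with 1.4 times more tokens for the same text works out to roughly a 40 percent higher bill.

```python
def request_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a request billed per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical numbers, not taken from any real price list.
PRICE_PER_MILLION = 10.0  # list price stays the same before and after
OLD_TOKENS = 100_000      # tokens the old tokenizer counts for some text

# Figure mentioned in the conversation: about 1.4x more tokens
# for the same input after the tokenizer change.
TOKEN_INFLATION = 1.4

new_tokens = round(OLD_TOKENS * TOKEN_INFLATION)

old_cost = request_cost(OLD_TOKENS, PRICE_PER_MILLION)
new_cost = request_cost(new_tokens, PRICE_PER_MILLION)

# Relative increase in what the same input text costs.
increase_pct = (new_cost / old_cost - 1) * 100

print(f"old: ${old_cost:.2f}  new: ${new_cost:.2f}  increase: {increase_pct:.0f}%")
```

The list price never moves in this sketch, which is why a change like this is easy to miss when you only compare price tables.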

I mean, there's technical reasons why you want to switch out your tokenizer. But it's also kind of, I don't know, suspicious, specifically pre-IPO. The thing that boggles my mind is you're marketing these things to developers right now because, like... They stopped marketing to developers. They are now marketing to CIOs and enterprises. They don't care about developers anymore, because you know what happened? They

marketed to developers for the past year, 2024, 2025 basically. From April to October they got all the training data from them, in exchange for paying Anthropic $200 for that privilege. Then they improved the models with that data. Then over Christmas they gave everybody AI psychosis in the form of free coupons that you can share with your friends to try out Claude Code during Christmas break,

and then in January everybody came back to the office and said, boss, we need to buy the Claudes, all the Claudes. And then all the companies started buying the Claudes, and this is what happened. And at that point, where the companies start requesting more Claudes, Anthropic doesn't have to cater to the developers anymore, and you can see it in all their actions since February.

Yeah, I think the token prices will not decrease. At least not for frontier models. But what's interesting is, if you look at the token prices for open weights models at similar sizes to what's been published about the sizes of closed frontier models, there's a lot of margin those motherfuckers make on inference. That's true. So yeah, I don't know.

Thank you so much for joining me for this. I had a fantastic time being grumpy with you about certain things. The grumpy show. Should we name it this? Thank you so much. Really appreciate you. And thank you so much for doing the work that you're doing on Pi. It's definitely one of the agents, well, I like using it myself. So thank you.

Thanks for having me. And yeah, best of luck with the tokens and stuff. All right, see you next time. Bye.

Related Episodes

How To Ship Real Code With AI (Not Junk) ft. David Cramer

A Google Engineer's Honest Take on AI, Angular, and the Future of Your Career

Microfrontends: Cutting Through the Hype and Misconceptions (w/ Luca)