
Changelog & Friends – Episode #112

A new direction for AI developer tooling

featuring José Valim, creator of Tidewave


Elixir creator, José Valim, is throwing his hat into the coding agent ring with Tidewave – a coding agent for full-stack web development. Tidewave runs in the browser alongside your app, but it’s also deeply integrated into Rails and Phoenix. On this episode, José tells us all about it. Also: his agent flow, YOLO mode, an MCP hot take, and more.


Sponsors

Depot – 10x faster builds? Yes, please. Build faster. Waste less time. Accelerate Docker image builds and GitHub Actions workflows. Easily integrate with your existing CI provider and dev workflows to save hours of build time.

Fabi.ai – The all-in-one data analysis platform for modern teams. From ad hoc queries to advanced analytics, Fabi lets you explore data wherever it lives—spreadsheets, Postgres, Snowflake, Airtable and more. Built-in Python and AI assistance help you move fast, then publish interactive dashboards or automate insights delivered straight to Slack, email, spreadsheets or wherever you need to share it. Learn more and get started for free at fabi.ai

Miro – The innovation workspace for the age of AI. Built for modern teams, Miro helps you turn unstructured ideas into structured outcomes—fast. Diagramming, product design, and AI-powered collaboration, all in one shared space. Start building at miro.com

Notes & Links


Chapters

1 00:00 Let's talk! 00:38
2 00:38 Sponsor: Depot 02:12
3 02:49 Travel & Friends 06:22
4 09:11 Claude 4.5 dropped 03:06
5 12:18 Tricking your AI 02:36
6 14:53 Actual vs perceived value 04:27
7 19:21 Intro to Tidewave 02:21
8 21:42 José's agent flow 03:20
9 25:02 Adam's agent flow 02:28
10 27:30 Sponsor: Fabi.ai 02:11
11 29:41 Agents for prototypes 01:58
12 31:39 YOLO mode notifier 01:51
13 33:30 Competing with Phoenix.new 02:44
14 36:14 For local devs 01:55
15 38:09 Not just in the browser 04:01
16 42:11 The Tidewave experience 05:26
17 47:36 Sponsor: Miro 01:10
18 48:46 Frontend agent limits 01:52
19 50:38 The solutions architect 04:43
20 55:21 Back 15 minutes 00:53
21 56:14 Impressive testing strategy 03:08
22 59:22 MCP hot take 04:22
23 1:03:44 Tidewave models 02:40
24 1:06:24 MCP's meteoric rise 04:25
25 1:10:50 Tidewave business model 03:45
26 1:14:35 Claude Code integration 01:55
27 1:16:30 Adam's Amp use 01:45
28 1:18:15 The benefits of ACP 09:58
29 1:28:13 Bye, friends 00:14
30 1:28:27 Closing thoughts 01:24

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

I got the news, I got the update in the podcast that there was an Oxide event, and you were there…

Oh, yeah.

Was that a thing?

How often do you do that, where you go, like, to the place?

Yeah, this was a first for us, I guess, because that’s an internal conference for their company… And obviously, we are not internal to their company. So that’s a first for us, but… You know, we’ve hit it off with Bryan Cantrill, and with Steve Tuck, the co-founders, and they wanted us to experience the team, and meet everybody. And we had been kind of ogling and gushing over how cool their server racks are for years. And of course, I don’t have enough money to buy one of their racks, and neither does Adam… They’re not going to make a home lab version, despite Adam’s incessant cries for them to make an affordable, maybe a half rack… And so we’ve always wanted to see their hardware in real life, and so this opportunity presented itself. So we went, hung out, met a bunch of cool people, and saw kind of inside their company what they are up to, which was… It was cool. It was different.

Nice. Very nice.

First time to Emeryville. Have you ever been to Emeryville, Oakland area, in the Bay, Jose?

I don’t think so…

You don’t think so…? Right across the street from Pixar. The Pixar headquarters.

Yeah. I actually didn’t even know Pixar was around that area, so…

I didn’t either, until I looked across the street and there was Pixar.

Isn’t that an area in Oakland where some startups are starting to move? Is that the thing that you’re saying, or am I getting things mixed up?

I don’t know, honestly. I think so… But we’re outsiders; we get invited to the Valley from time to time…

…and eventually we say yes.

You know, it’s a cool experience. I think it depends on the different places we get invited to, but I think it’s a chance to explore the world, and meet some cool people, peel back the layers, tell some cool stories… So I favor the IRL. I think it’s cool to do it a few times a year, or as often as it makes sense. Some version of that.

We usually go to All Things Open, but our schedule conflicted this year. What about you, Jose? Do you get out and see the people ever?

Yes. And I used to do a lot of that, especially with Elixir at the beginning - go to a bunch of different conferences, and just talk about Elixir… And then of course, at some point it gets very exhausting, and then I ended up just kind of “Okay, I’m going to focus on the Elixir community from now on.” And even with the Elixir community, there is the adjacent Erlang community as well, that – it’s enough to keep a person busy. But now, with Tidewave, we support Phoenix and Rails, we are working on Django and other frameworks… I have started to kind of go back, for example, to Ruby conferences… So last week I was at Euruko, which is one of my favorite conferences. I don’t know if you are – you were Ruby folks, weren’t you?

Yeah. So I’m familiar with that. I haven’t been to it, but I’m familiar with most of the Ruby confs.

Yeah. So I don’t know if we’re live already, but for the listeners, I talked about this in other places - what I really like about Euruko is that every year people say “Look, I want to host it in my city.” They do a three-minute, five-minute presentation, and people attending the event vote on where it’s going to be next year.

Oh, okay.

And it’s usually somebody with no experience organizing an event now has to organize an event for 500 people, 600 people. And it’s probably very daunting, but I think it keeps – like, it keeps it fresh and keeps it always community-centric, because it’s always moving around.

Yeah, so I was at Euruko, and then the Elixir event I’m going to – in two days I’m going to the GOTO conferences in Copenhagen… So yeah, I’m kind of back on traveling mode for now.

Do you enjoy that, or do you just do it because you have to do it?

I’m enjoying it right now, because I think one of the things with everything that is happening around AI and coding agents - nobody really knows where it’s going. So people tell you that it’s going to go there, but nobody really knows, right? I think the CEO or CTO of Anthropic made a prediction about 90% of code being written by AI in six months - six months have passed; it has not been 90% of code being written by coding agents.

[00:07:48.17] So I think I’m enjoying a lot this opportunity of talking to different people, and getting a bunch of different takes, and different ideas, and things to explore… So it has been really fun just going out and talking to people, but I’m sure that I’m going to do it enough that at some point, maybe in six months, I’ll be like “Okay, I’ve had my fill. It’s time to hibernate again, and go back to the Elixir conferences.” But right now, it has been really fun.

I agree. I think it’s fun to step back for a while and become a recluse and enjoy your local world, and then to come out, peek your head out from underneath the rock and see the people again. There’s something invigorating and exciting about it. But when you’re just constantly on that track of just travel, travel, travel, conference, conference, conference - it can tend to burn you out. So I think everyone needs to step back, but then also step out and see some people… Because that’s where the – that’s where the magic happens, isn’t it, Adam? I mean, that’s where the real relationships actually form.

I think so. The IRL is really where it’s at. I heard that somewhere, and I liked it. Then I experienced it and I was just like “Give me more, please. Non-stop.”

Well, trying to figure out where the AI thing is going… Claude 4.5 dropped today. I don’t know if anybody played with it yet… I have not. But, you know, better, stronger, faster… Uh, still not writing all the code for us…

I think I did use it today, actually.

But they said something like it can go 30 hours on a coding bender… I just thought “Well, that’s really good marketing”, because I have no idea if it’s true or not, but I was like “That’s a great way to describe what your thing can do.” Which is more than I can last. I don’t know, Adam, how long you can bend… Or José, I’m sure you’ve been on some benders in your life… But 30 hours - holy cow, man.

So that would be stronger. I’m trying to think of where it fits in the category of better, stronger, faster. It’s stronger, man. 30 hours straight. I think it doesn’t lose context, or something… I don’t know. Did you read this?

Maybe they are doing more things… So right now, they do the auto-compaction of the chat… And context engineering is all the rage right now. But they do the automatic compaction of the conversation, which is summarizing it… But something that they also do is that when the context is getting filled, they just prune the tool outputs from the beginning. So if there are some files or some searches or commands you ran at the very beginning of the conversation - they prune that. And that also allows them to go for longer without having to summarize stuff. Because when you summarize, there’s always a chance that you are losing some data.

Right.
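What José describes - blanking the oldest tool outputs once the context fills up, before resorting to lossy summarization - is easy to picture in code. A minimal sketch, assuming a simple message list and a crude characters-per-token estimate (an illustration only, not Anthropic’s actual implementation):

```js
// Estimate the conversation size, assuming ~4 characters per token.
const approxTokens = (msgs) =>
  msgs.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);

// Walk from the oldest message forward, blanking tool outputs until
// the conversation fits the budget again. Nothing else is rewritten
// or summarized, so the surviving messages lose no information.
function pruneOldToolOutputs(msgs, budget) {
  for (const m of msgs) {
    if (approxTokens(msgs) <= budget) break;
    if (m.role === "tool" && !m.pruned) {
      m.text = "[tool output pruned]"; // what the model sees afterwards
      m.pruned = true;
    }
  }
  return msgs;
}
```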

And something that I do with it a lot is that you can actually have conversations with the agents about this stuff. So a lot of people say “Oh, I’m using this agent, and this agent doesn’t tell me which tools it has available.” But you can always just ask the agent “Which tools do you have available?” And then you can just say – so something that I do is invoke echo a hundred thousand times, three times… So I force it to fill up the context, and then I’m like “What do you see now on that tool that was invoked?” And then it’s like “Oh, the tool output disappeared.” It surprises itself, like, “It now says this thing.”

That’s hilarious.

And you can trick it to crash very easily as well… You’re like “Oh, why do you have this tool? Wouldn’t you think this tool would be better?” And then you just say “Would this tool be better?” It imagines that the tool exists, and it’s like “Yeah, it would be better. Let me try calling it.” But it obviously doesn’t exist, and then the agent crashes. So I like having those meta conversations, and they get surprised, or tripped up.

[00:12:16.05] Yeah, it’s fun. It’s almost like talking to a kid. It’s just very easy to pull the wool over a kid’s eyes, and they’ll just… They’re just gullible, because they don’t have the life experience that we do. And you can have a lot of fun, as long as you keep it in good-natured fun, and you’re not trying to actually trick a kid. But with your AI, who cares? It’s a robot. Trick it all you want, José. Get it to do all kinds of stuff.

Have you heard - we just learned this from Feross - that people are actually using prompts in their malware now? So if you can get arbitrary code execution on someone’s computer… For instance, this was in the case of Nx, which is a monorepo command-line tool… And they hacked the Nx npm package, distributed some malware… And if you’re running the Nx command and you’re infected, in there was an actual prompt to ask Claude Code to do stuff for it, instead of coding it out.

Oh. [laughs]

Yeah. And it was really kind of smart, because what they asked it to do was the things that are kind of fuzzy finding for humans - or for programs, I should say - which is like “Find all the interesting files on this computer.” Which, of course, you could have a list of where the interesting files are, and you could search certain things… But Claude Code can just go do tool calls, and read stuff, and just hand back a list of interesting, hackable files, like secrets and whatnot. Anyways, I just found that to be amusing.

So even attackers are getting too lazy to write their own code, is what you’re telling me.

Exactly. [laughs] This is the promised land, isn’t it? You don’t have to code anymore when you’re hacking someone’s computer.

Yeah, I wonder why that was the best route… Was it because of their laziness, or their lack of desire to write that script, or just because they were just trying to leverage a Claude Code-enabled developer’s machine? What do you think the true psychology of that choice was?

I think they’re just thinking this is the fastest way to the best result. Most programmers are like “Well, what’s the fastest way to the best result?” Well, I could write a program – and besides, I only have so much stuff I can shove in. I’m assuming the more stuff you put in, the more likely you are to be found, so maybe some compression is in there… But it’s like, if I could just prompt something that’s pretty good at it to scour your computer for interesting files, that’s a lot faster than me having to write a program that scours your computer for interesting files. That’s my guess. I don’t know, José, why do you think somebody might do that? Maybe they’re just showing off.

Yeah, I don’t know…

They just want to trick a computer, you know? They want to trick an AI…

…with their nefarious deeds. So, okay, you like to mess with them… How much value are you getting? Because a survey says that we’re getting tons of value, but quantified research says that we’re not. I don’t know if you’ve read any of the research, but a lot of recent papers - a lot meaning at least more than one - have come out and said developers think that they’re more productive with AI coding tools, but it’s actually slowing them down. What are your thoughts, Jose?

Well, I have many thoughts on this. The first one is that nobody’s surprised we’re really useless at estimating stuff…

[laughs] Yes… We’ve been proving that for years, haven’t we?

[laughs] Yeah. So of course we are estimating things wrong.

Good point.

[00:15:44.13] And people call this exaggerated - like “Oh my God, I’m three times more productive”, or even twice. For me, it’s kind of pointless, because actually – if you’re even a third more productive, 33%, that’s kind of massive. That’s huge.

And then I think people fail to consider there are other studies where developers – I don’t remember the exact number… It’s like, we spend like 50% of the time coding, let’s say. And then if you’re using agents for coding, of course, how much more productive you are is going to be limited to where you’re using agents. And if you’re only using agents for coding, you can only optimize that 50%, and not all the other things. So all that said – yeah, and then the other thing is that a lot of people don’t consider the time that they lose when something doesn’t work. Everybody’s happy, like “Oh, I used the agent. It worked. I was super-productive.” But there are a lot of times where it’s just not productive, and then you end up trying to coerce it to do the right thing, and then it doesn’t do it, and then you quit… And then you try it again, and then it works, and you completely forget about that bad experience. The bad experience is actually one of the reasons why I never liked AI completion suggestions… Because I would read it, and if it’s not what I want, it would always throw me out of my loop. And that time where I read it, and then I’m like “Oh damn, I lost my flow” - how do you measure that? If you’re only measuring “Oh, it was accepted two thirds of the time”, but the one third was so disruptive to me that… You know? So with all that said, I think that I get a benefit from it.

You do.

“I think.” [laughter]

I think.

“There’s my six caveats, but I still think I do.”
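(A quick back-of-the-envelope on the 50% point above, with assumed numbers: if coding is half your time and an agent doubles your coding speed, the overall gain is capped, Amdahl’s-law style, at about a third:

$$\text{speedup} = \frac{1}{(1-p) + p/s} = \frac{1}{0.5 + 0.5/2} \approx 1.33\times$$

where $p = 0.5$ is the fraction of time spent coding and $s = 2$ is the speedup on that fraction.)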

Citation needed, right? I joke that I would love the – we could use the Wikipedia “citation needed”. It should be an HTML feature. We should just be able to put that everywhere, like in conversations…

After every sentence that I say?

Because the other thing is –

It’s Rigby.

Is that Samsung’s thing?

Silicon Valley.

Oh, gosh… Sorry.

Oh, yeah.

Continue.

You know I was not going to get that. You were hoping José was going to get it, weren’t you?

I was, yes.

Alright. Well, some people got it…

Yeah, we can talk about Silicon Valley later, but yeah. So the… Yeah, you just did the AI completion [unintelligible 00:18:32.06]

He just autocompleted the wrong thing. [laughter]

Yes. Perfect.

In the reals. In the reals.

He lost his [unintelligible 00:18:40.05] Okay, he’s back. Context switching.

So there are a couple of things that I do that I think – you have to find where it works and where it doesn’t work. And of course, it’s going to change as those things improve. So for example, I tried it a couple of times to help me work with Elixir type system stuff, and it doesn’t work. It’s going to be useless. Yeah. I’m not going to try again. Maybe in six months, maybe in a year, things change enough that it can help me with that kind of work… But I don’t feel it’s there. But for example, when working with Tidewave, because it supports other web frameworks, I often implement the feature in Elixir.

Tell the people what Tidewave is real quick, so that – the three of us know, nobody else knows. What’s Tidewave?

Yes. So Tidewave is a coding agent for full-stack web applications.

So I’m going to summarize it, and we can jump into it later, but the idea is to have a coding agent that is tightly integrated with your web framework. So we understand what is in the DOM, and how that maps to a template… It can coordinate the browser, so it gives it a really strong verification loop. So as you ask it to build features, it can verify that the features work, and you can interact with the actual web page and ask for changes on the page, instead of asking for changes in the code…

[00:20:05.19] I have this whole idea that I think we should run coding agents on top of what we produce. So if I’m working on a library, what I produce is API docs - fine, run that in an editor. But if I’m building a web application, I want to run the coding agent in the actual browser, because I want it to understand what I produce, and I want it to be able to interact with what I produce; because if it can’t do that, we are doing boring translation work all the time. Like, looking at what happens on the screen, going to the editor, asking it to change things… And then the agent says “I’m done.” You reload the page, there’s an exception… You have to copy and paste the exception back to the agent… You don’t want to do this boring stuff, right? And I’d say the data science folks were the first ones to notice that, because they were the first ones to put coding agents inside notebooks. They’re like “Okay, let’s run this thing inside a notebook, because if it understands my variables, if it understands my cells, it’s going to be more productive.” But nobody caught up to that trend. We kind of regressed. We first put it in the editor, and then we put it in the command line. We should be going up, right?

So that’s Tidewave… And I do think Tidewave can help you be more productive with AI… Because allowing the agent to verify what it builds is going to make it so it builds better things, things that are guaranteed to work, and you are going to spend less time in that loop.

So when I’m working on Tidewave – we support Phoenix, we support Rails, we are working on Django, Next.js, and a couple others. I usually implement the feature in Elixir, tell the agent “Hey, I implemented this feature in Elixir.” Then I go to the Rails project, implement the same thing. And there are a couple of things that I tell it, like “Don’t add tests, don’t use mocks…” So there are some threats in there, but…

Wait, you say “Don’t add tests”, and then you say “Don’t use mocks”? I mean, if it’s not writing any tests, how is it –

Sorry, don’t add additional tests than the ones I wrote. Sorry.

Because the Elixir PR - it’s good. I wrote it.

Right. It has a test in there, so it’s copying those over.

It has the proper tests, yeah. Because it tends – I think my experience with coding agents for coding is way better than for testing. Because for testing – not in Elixir, but because I’m doing a lot of Ruby and Python – it tends to use mocks a lot, and just writes a bunch of redundant tests. So it’s a whole separate discussion, but it’s really good. Like, when I ask it “Get this PR here, translate it to this repository”, a lot of the time it’s just perfect. It says “It’s done”, it runs the tests, it runs the linter, and I can just push it. I send a PR, people review… So that’s really good.

So I think that’s one of the things - you have to figure out where it works and where it doesn’t. Take notes of that, and find the loops and tools that make it work for you. It’s like any other tool. And I think AI has this particular problem, that some people say “Oh, it’s just magical.” It’s kind of like a lottery, in some sense. Some people go try AI, and because it’s probabilistic, they get a bad experience, and they’re like “Oh, this sucks, and I’m not going to try it again”, because people come with the expectation that it’s just going to work… And then some people - again, trying it for the first time - by just the randomness of it, they have a good first experience. And then they start investing in it and refining it. And that’s the process - you do have to figure out what is there and what isn’t. And then the other thing that I tell people to do, which works really well for me, is to – I don’t correct the agent. If it does something wrong – or if it’s like 70%, 80% good, I just go and finish it. That’s fine.

[00:24:19.02] Well, you coded for it.

It depends. So I do two things. So imagine that I ask it, it does this thing, and then I leave, I come back, it’s like “Oh, this sucks.” I’m not asking it “Oh, you were supposed to do this instead.” Because often when it does something wrong, it’s there in the context, it has a really…

It’s just gonna keep getting it wrong.

Yeah… And then when it fixes, it doesn’t fix everything… So when it does something wrong, I usually go and I start a new chat… I just discard everything. Nobody’s going to be upset. I just discard everything. It’s like, “Okay, start again, but do this, this, and don’t do that.” I add a little bit more of context. And then if necessary, I start again.

So you start fresh, with additional little warnings or instructions.

Adam, you do the opposite, don’t you? You never write the code. You just keep telling it to do stuff.

Yeah, I don’t really –

[unintelligible 00:25:06.19] start fresh…

Yeah, I think by and large it’s writing code I can’t really write myself anyway, so it’s gonna do a better job than I’m going to do…

See, Jose is a better programmer than both of us, so he can just fix things…

We’re just different people, you know?

So what are you using it for?

CLI tools, really. I’m having fun with a Proxmox CLI where it instantiates a virtual machine with a given cloud image on it, and it’s like a command line away, basically. It’s cool. So I can spin up a new server immediately, essentially. I can package it as a server, you can share it with me as a Git repo… It’s kind of cool. That, and I would say 7z-arch, which is a – you know, 7z is the compression algorithm… So I’m just working on a version of that as a CLI that’s just cooler, basically… Because 7z’s existing command structure is kind of not a lot of fun. It’s hard to remember. I always forget it. It’s highly configurable, and so I wrote something that was just more fun.

So does it wrap it and then call it underneath the hood, with specific flags?

Yeah, essentially. It did that for a while, until it was like - we essentially just rebuilt something called lib7z, which is a wrapper around… I think rust7z2 is the crate out there. So it actually acts as a library around 7z, essentially. And then you can write a CLI layer on top of that, because it’s a library. So that’s where it’s currently at right now.

That’s cool. So you don’t have to actually shell out. You’re actually re-implementing the functionality –

Yeah, precisely.

…with a Rust crate.

And you get a lot more data in that API as well. Like, you get a lot more granularity around files, and process, and progress, and you can control all the UX around the CLI that way.

We deal with a lot of large files and folders…

Yeah. So I’m just sort of enamoured by archiving them very well…

Archiving it to the best of his ability.

I also use coding agents for things that I’m not reviewing, particularly for prototypes. And that part has been really fun, because if you’re working on a product, you have ideas of “Wait, which directions could it go in the future?”, but usually, before, you would think about it, put down some notes, and then maybe, if you’re lucky, in two, three months somebody from the team could take a look at it, give feedback… And now with agents, you can just say “Okay, go for it. Implement this thing.”

So as I was saying, I have this idea that coding agents should run on top of the thing that we produce. And we talked about Tidewave Web, that works for our applications, and I talked about notebooks… But if I’m working on a game, I want to have Tidewave running in the game engine. If I am building a mobile app, it needs to know about the mobile device, simulators, and all this kind of stuff. So I was able to – I think for four or five weekends straight, what I would do during the weekend is come to the computer from time to time, see if the agent was working, and just have it build a different proof of concept of embedding Tidewave somewhere completely different. Like “Oh, what would a Tidewave browser extension look like? Which capabilities do we get from this?” And doing that before – I had to do these kinds of things for other products; we were doing that for Livebook… It would take a really long time to validate all those things. And now I could very quickly explore something different, get the lessons learned, and provide a way better blueprint for the team to work on.

Do you run it in YOLO mode or whatever is the equivalent, where it’s just doing whatever it wants to do and you come back every once in a while?

Yeah, totally. So have you considered a notifier, and ‘text me when you’re finished’ kind of a thing? Otherwise, you’ve got to keep coming back, “Are you done yet?” No, it’s not finished… Oh, it’s been done for two and a half hours, but I was watching TV…

Yeah, so in this case, because it’s the weekend, I don’t care, in the sense that I don’t want to be interrupted either. It’s not my priority.

Gotcha. So when you feel like it, you go over and check it.

Yeah. Yeah. Otherwise I’m using the notifications. I use Zed a lot, and Tidewave, and they all have notifications… And then I’m kind of listening, I’m waiting for them.

Oh, they don’t push to your phone, or anything?

I don’t think that – Tidewave doesn’t. I don’t think Zed does right now…

Because then you don’t have to wait and listen. You can be out on a walk or whatever, and be like “Oh, it’s done.” Maybe even like walkie-talkie it, give it its next task, you know… Keep on walking… [laughter]

Yeah, it’s funny, because I talked to Chris McCord about this, and it’s like “Oh, maybe I’m out to get a coffee”, and then I’m like, look, if I’m out to get a coffee, I’m out to get a coffee. If it’s not done, I don’t care. I’m out, you know?

Right…

Yeah, same…

But maybe you lose three hours of productivity, man. I mean, all you’ve gotta do is tell it to keep going, you know? It’s trade-offs. I get it, it’s the weekend. I like to unplug as well, but I don’t do any of the stuff that you’re talking about. I don’t have anything coding for me over the weekend. If I did, I’d at least want to be a good babysitter, not a neglectful babysitter. But to each their own, I guess. So you’re talking to Chris… It sounds like you and Chris - are you guys competitors now? I mean, doesn’t Chris have phoenix.new, and isn’t this – there can be only one, José.

Right…

“Right…” [laughter]

[00:33:44.21] Yeah, and that’s why he’s not coming on the show anymore. No, I’m kidding… [laughter] So… No, we do talk a lot about those things, and we are still bouncing many ideas off each other. So the way I think about this is that – I think there’s a very easy way to separate those things, which is that phoenix.new is remote. Maybe we should go deeper into Tidewave, because there is a bunch of additional context here. So as I was saying, Tidewave is a coding agent for full-stack web applications, but the thing is that it runs on your machine. So it’s not – so one of my ideas is that… We are looking at bolt.new, lovable.dev, and they have all those things where you can click around, ask it to do changes… But it’s like, they want to kind of own your code, they want to be responsible for your code, and most of the time it’s frontend, or for React apps… And then I’m like “I want that for my Phoenix app that I run on my machine.”

So a lot of people are pushing “Oh, AI, and those app builders that are running in the cloud…” With Tidewave, you are accessing localhost. So the way you would install it is that you would add the Tidewave package… So today it’s for Phoenix or Rails, or in the future for Next, Django. So you just install the package, and after you install the package, you go to your application - localhost, whatever, port 4000 - and you go to /tidewave, and then the agent is running there, in the browser, with your web app running on the side. And now you can do all those things. You can go to the inspector, click on it and say “Hey, on top of this element I want you to add a chart of the most listened-to podcasts in the last month.” So you can be very UI-driven, and everything is running locally.

So for the people who are “Fine, look, I want to have phoenix.new be responsible for my code, for my deployment, I don’t care about that, and I want that thing to do everything for me”, then go use phoenix.new. I still think it also owns the getting-started experience. It is the best way of getting started with a Phoenix app. You just go put things in the prompt, and it’s going to build something for you that you can throw out. And for me, I’m like “Look, okay, I have my own thing, I already have my own infrastructure, my own development cycle, and I want to incorporate all those tools into what I do every day.”

For me, it’s like – you know when there was a trend, everybody was saying “Oh, we are all going to be developing on remote machines”, and then there were those developer containers… And that never really happened.

I remember we did the show, didn’t we, Adam? We did the show with whoever was at GitHub Codespaces.

Cloud development environments, essentially…

Yeah. Git container… Dev containers… Yeah.

I mean, yeah, I know some people use it. It is used, and you can use it –

Yeah, people do do it, but they’re on the fringe.

You can use it locally, but it’s not like everybody – because people would say that everybody would use it, right? Like, why would you have a local machine?

Right.

So I see it the same way. I want those tools for my framework, and running on my machine.

Okay, I am with you. I actually – when I heard Tidewave runs in the browser, I was like “Another browser thing, José? Like, they’re all running in the browser.” But actually, it’s different than that. It’s in the browser because that’s where the output of your web app goes. But it’s in your local browser, running against your local web server, with your local environment, and helping you build cool stuff right there… Which is kind of how I develop now anyway. Whereas phoenix.new was making me go into the browser and have a remote browser session, which I always got excited about for the hour that we do the show, and then when I go back to my real life, I just don’t want to do that. I want to be on my local machine. I always have, maybe I always will. I’m getting old, so I’m getting stuck in my ways… So that makes me like Tidewave a little bit more than when I first thought “Oh, it’s –” Because one of my questions for you was going to be “Why the browser?” But it’s because I didn’t understand.

[00:38:15.12] Yeah. And the thing is – so we actually went through many possible designs. So let’s talk a little bit about the browser design. We already had the Playwright MCP for some time. Somebody may be listening to this and say “Well, I can use VS Code with Copilot and install the Playwright MCP.” Recently –

Chrome DevTools MCP.

Chrome, yeah, they released their – and I think yesterday Cursor Browse came out…

What’s that?

It’s just controlling Chrome. It’s like the Playwright Puppeteer MCP. It’s just built in.

Gotcha.

And the issue with those tools is that it is a separate browser session. It’s not the one that you are developing in. So imagine, for example, that you are working on a project manager, and what you need to do is implement a feature for transferring a project between two organizations. So in order for you to implement this feature, you need to create a user, create two organizations, probably make sure that the user is admin in both organizations, create the project, and then you can transfer it. And a lot of the time the MCP is going to get stuck just in this process. A lot of the time the MCP cannot create an account, because creating an account requires sending something to an email that the MCP does not have, so now you start writing those backdoors for tests… So there’s a big amount of work. And then the fact that we run in the browser - we are literally running in your browser session. Because when you’re going to develop the feature, you open up the – you are already logged in to your development version. You go to the page already. And when you’re going to validate that the feature works, you already have all that set up.

So because it’s running there in your session, everything that you do for development, the agent can do. And the agent is going to verify the things in front of you, not in a separate session. And then you can actually have a back and forth.

Like, if you’re using the MCP, imagine the agent’s like “Okay, let me test that it works.” And then the MCP with the separate browser is running, and then you see a bug, right when the thing is testing. How are you going to debug that? Because it’s a separate browser. How are you going to click things and say “Hey–”? You would have to talk around it - “Around this page, maybe there is a bug here”… With Tidewave, it is your browser. I think that’s the most important thing. You can stop the testing, go in with the Tidewave inspector, “there is a bug here, fix it”… And we also go the next step, which is that we integrate with the web framework.

When you inspect a DOM element, we know the DOM element and send it to the agent, but we also know where in the template, or which React component, that thing came from. And we send that to the agent, so you don’t have to do the manual work of figuring that out with the agent. When there is an error page, we detect the error pages of all the web frameworks we support, and automatically feed that to the agent… So it’s really meant to be like “Look, it’s you, the agent, the browser, the web framework, in a shared context.” Everybody can see what the others are doing. Because otherwise, it becomes your responsibility. You are the one who is getting information from all those places and passing it around.

It sounds pretty cool, man.

It does sound pretty cool. Bypassing a lot of stuff, I mean… The fact that it has – I mean, that’s something I just have always wanted, I guess. I mean, if you’re going back and forth like that, it’s better to do it right there, real time. I haven’t played with it to know the UX of that, really. When you’re fiddling with - let’s say a button; maybe it’s not working properly. What is the back and forth of the experience? Can you speak to it? Can you type to it? What are some of the interfaces you can think of?

[00:42:21.18] Right. So I think there are three ways that you are interacting with it. One is the usual chat prompt, with the difference that we know what is the page that you’re currently looking at. So you can talk to the page, in the sense that – so for example, imagine that you just boot up your dev instance, your database is empty… You can go to a page that is listing all the podcasts, like for Changelog, and you can say “Oh, this page is empty. Add some podcasts.” It knows which page it is at, so it can find information from the controller, or from the live view, and then say “Okay, that’s the data I need to–” It gives you an entry point. So that’s the chat [unintelligible 00:43:04.25] it has the context of the page.

The other one is the inspector. It’s like the browser inspector, so you can click it, and then you can mouse over elements. We show the DOM element, and we also show which template - or which Phoenix template - it came from… And then you can click to open your editor, or you can click to ask the agent to do something… And the other way that we interact with it is when we detect that something goes wrong, we just show a pop-up, like “Oh, do you want to fix it?” And then you can just click a button and have it fixed for you.

So I think as a human, those are the – I may be missing some… Those are the three – it’s a very classic chat experience, with a few things on top, like the inspector [unintelligible 00:44:00.06] But I think a lot of where we shine is in giving more tools to the agent. So the agent can do everything that a coding agent can do, but it can also run JavaScript on the page, and that’s how the agent can test what it implements.

So for example, one of the coolest features that we used Tidewave to implement - like, if you go to tidewave.ai today, we have videos on the homepage. So I added the YouTube URLs – or not YouTube; the URLs for the… I added the video tags. And then I wanted to make it so that as I was scrolling through the page, the videos started to autoplay. So I asked Tidewave to implement this, which it can do. It’s a straightforward feature. I can’t do it, but I assume it’s a straightforward feature for the agent.

“I didn’t look at the code, but I’m sure it was pretty easy.”

“Yeah.” [laughter]

So it implemented the thing. And then in order to test that it worked, it actually reloaded the page… So Tidewave - it wrote JavaScript code to reload the page and scroll to the first video, and then it ran some JavaScript to validate that the first video was playing, but not the other two. Then it automatically scrolled a little bit more… So the second video started playing, and then it ran some JavaScript to make sure that the second video was playing, and not the other two… And I think that’s the important part, because you can see the agent doing that – because if the agent doesn’t do that, there’s a chance it gets it wrong, right? And then if it gets it wrong, who is paying the price to fix it? It’s you, because you are going to be the one who tests it, and then you have to go and tell it.

I thought you were going to say your users, because you’re going to push it out live… [laughter]

Also. Could be. Could be.

And then your users will have to tell you if it’s broken or not.

How do you limit it to the viewport? I assume the scrolling is either simulated, or it’s real; or it’s simulating it so you think it’s only scanning with you.

[00:46:13.08] No, it just runs JavaScript on the page.

So what’s in the viewport? So it’s looking at what you’re seeing, essentially.

Yes, yes.

It’s running in your browser. There are a lot of complexities in there, but this part is as straightforward as it can be. It can control the page. So the same way – because that’s the thing, people are coming up with all those different APIs for the agent. There’s an MCP with 30 different commands to control the page. And I’m like “It knows JavaScript. It knows the DOM API. Just have it run things on the DOM.” It knows what is – I don’t know what the command is, but it knows what the command is to say “Hey, scroll a little bit.” Right? It knows.

So we had to intervene very little. There’s one thing that it can’t do - resize the browser window. I think it’s because browsers don’t allow you to do that, because of security concerns, or something like that. So there are some things where we have to intervene and add extra capabilities… But it’s just running things on the page.
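To make the autoplay example concrete, the kind of verification script the agent writes can be as plain as this - a sketch assuming a page with at least two video elements; the real selectors and structure will differ:

```js
// Scroll the second video into view, then confirm that it - and only
// it - is playing. A <video> that is playing has .paused === false.
const videos = [...document.querySelectorAll("video")];
videos[1].scrollIntoView({ block: "center" });

setTimeout(() => {
  const report = videos.map((v, i) => ({ index: i, playing: !v.paused }));
  console.log(report); // expected: only index 1 has playing: true
}, 1000); // give the scroll handler time to trigger autoplay
```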

Break: [00:47:24.17]

Can it take a – I wouldn’t say a fixed-width, but a desktop-design website and implement it, so it looks the way I want it to look on a desktop, and can you say “Make this a progressively–” What’s it called? Not progressive web enhancement. Responsive web design. There we go.

Responsive. Yeah.

“Can you make this responsive for these six viewports?” or something.

Right. Not yet.

I knew it.

“I knew it. I got you!” [laughs]

Because the resize thing that I told you –

[laughs] Not because it’s Jose’s fault, because I don’t think these things can do that anymore.

It’s hilarious though. I love it. This pursuit of rightness…

Oh, I’ll try. I’ll actually try.

Okay. Because I’ve been doing a lot of frontend lately, and I’m not good at it anymore. I’m learning… All the new tools are fancy, and they’re hard to use… I can’t figure clamp() out. I mean, I’ve been using clamp() wrong for weeks now.

Finally starting to get it to work… And none of these tools can do it either.

[00:49:50.20] So I play what I call LLM Russian roulette. So I take the same prompt – and usually, I’m like “Hey, can you do this thing in an SVG, or whatever?” Like, I’m trying to accomplish stuff that I don’t think is possible. And I thought it should be possible; it’s the modern web, you know? And so I ask ChatGPT, I ask Claude, I’ll even ask Grok if I get too angry… And then I’ll ask Gemini. And they all give me different responses that are all wrong. None of them can do it. I want one that just tells me “Actually, Jerod, that’s not a thing that you can do. You can’t do that with web technology.” They’re not going to do that, because they want to make me happy. But I know that that kind of stuff - we’re not there yet, man. I’m just doing way too much work in the browser as a human right now.

Here’s how I would try to implement it… And let me know if that’s an approach that you tried. Because if that’s an approach that you tried, then my solution obviously is not going to work.

[laughs] I’ll let you know, trust me.

I hope I haven’t tried this.

So I was talking about the resize… That’s something that we identified recently, so we haven’t implemented it. So it doesn’t currently have the ability to resize, which means that it cannot validate responsive designs. As simple as that. So what I would try doing would be to add the feature to resize and the feature to take a screenshot of the page… Which has some other complications, because the browsers don’t allow you to do it, for security reasons as well. I know how to solve it, it’s just going to take a – I’m just explaining why you’re not going to have this feature tomorrow.

[laughs]

There’s some work… And then have it look at the screenshots and see if it can tell whether things are good or bad. What do you think about this approach?

In my experience, their ability to look at screenshots and decipher things is really bad.

It’s not there. Okay.

Like, they have vision, but it’s not precise enough, you know? And so I haven’t tried that specifically, but I also don’t think it’s going to work. I’d love you to try and prove me wrong. I would love to be wrong. But in my experience, when you pass a screenshot or you say “Take a screenshot and then inspect the visual”, 9 times out of 10, they’re wrong. All of them.

I wonder if we could use accessibility APIs –

Oh, this is hilarious…

Wait, what?

This is José. The guy is such –

I’m just loving the exchanges… I’m sorry.

This is José. The guy is such a problem solver that he can’t help himself right now…

Continue, José.

He’s like “Let’s debug this thing.”

So you already saw me getting off track with the AI suggestion; we saw it live. So this is also a real life nerd sniping happening right here…

Yup. We’re shaving the yak. So - yeah, what were you going to say? Accessibility APIs.

I’m wondering if we could use accessibility APIs somehow to measure size of elements, and what is visible, what is not, but… Maybe not.

Yeah, I don’t know about that. I just get angry and I just do it myself.

It goes back to what I was saying, in the sense that the way for us to eliminate the AI guessing is adding more verification tools. So if the browser could tell me “Oh, the fonts here are too small. These things are clipping”… That’s why I was thinking about accessibility APIs. Because if the browser tells me that, then I can get that thing, which is going to be better than a screenshot, and send it to the agent.

That might actually work.

Right, but I don’t know if this accessibility API exists. So that’s why I’m –

Well, don’t ask an LLM. They’ll all tell you that it does exist, and they’ll give you the code –

Right. Emphatically.
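For what it’s worth, some of what José is reaching for here is already measurable with standard DOM APIs, no screenshots required. A hedged sketch of the kind of check an agent could run (the 12px threshold is an arbitrary assumption):

```js
// Scan the page for tiny text and elements that overflow the viewport,
// returning structured data the agent can reason about.
const issues = [];
for (const el of document.querySelectorAll("body *")) {
  const rect = el.getBoundingClientRect();
  if (rect.width === 0 || rect.height === 0) continue; // skip hidden elements
  const fontSize = parseFloat(getComputedStyle(el).fontSize);
  if (el.childElementCount === 0 && el.textContent.trim() && fontSize < 12) {
    issues.push({ tag: el.tagName, issue: `font too small (${fontSize}px)` });
  }
  if (rect.right > document.documentElement.clientWidth) {
    issues.push({ tag: el.tagName, issue: "overflows the viewport" });
  }
}
console.log(issues); // structured data beats a screenshot here
```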

I love when they produce SVG, and I’m trying to get a tapered border and all this kind of stuff… And they’re like “Here you go”, and then they tell me all the reasons why it’s going to look good. And I put it in there, and I’m like “Dude, it looks like a bow tie. You just drew a bow tie.” It’s so far off that I have to laugh, because otherwise I’m just going to cry, and just be like “Why am I even wasting my time with you guys?”

[00:54:05.12] So there are certain things where they just have these inadequacies, and they’re all inadequate at this point, in my experience. I haven’t done 4.5 yet, so maybe after this call I’ll go see if Claude can do this… But I don’t know. I don’t feel like I’m pushing the envelope. I feel like I’m kind of just an intrepid person trying to get something done, and thinking that you can do things that maybe you just can’t even do in the browser right now. But I think being able to develop out a simple – I’m not going to say fixed-width; a desktop-styled website, and say “Make this responsive” - that should just be a thing. Don’t you think? If you build that into Tidewave, José - I mean, people are going to line up with their money.

I think so. I mean, because that’s the thing… I don’t want to do that work, so I hope that AI can do it.

Yeah, totally.

That’s the thing.

Like, I can also do it, but it’s just slower and tedious. And that’s what the promise is, we don’t have to do this stuff anymore. And I’m not good at it anymore. It’s just – it’s guess and check. I’ve got to guess and check. “Oh, it’s still too big. Now it’s too little.” Alright, I’m done complaining. Adam, take us somewhere else. I’m airing my grievances.

Well, one thing I was going to go back to was – José, I think one thing you were mentioning was how when you scroll tidewave.ai you see these videos come in… I’m actually back, I think, 15 minutes potentially, but you were describing this page here… And now that I’ve actually caught up and I’ve scrolled it, maybe that’s where we can go. What was the a-ha moment here when you did this? Because you said you were kind of going back and forth. Did you not do any of this design yourself? Did you just sort of prompt it? What was the experience like for getting this page to be this?

So for this page in particular, we were just doing the design of the page, and then we knew we wanted to add all the scrolling… And then we just asked it to do it, and then it did it… And I think what was surprising about that is because… I mean, it’s obvious, but that’s exactly how–

The autoplay of the videos was key, right?

Autoplay video, but it wasn’t the autoplay. It was how it tested itself to know that it got the autoplay right.

Yes. And that’s exactly how we would test it. I mean, it’s obvious that the way you test the autoplay scrolling is by scrolling.

You scroll and you watch it autoplay, and you make sure the other ones aren’t. But it’s just running JavaScript.

But it’s really nice to see it happening by itself. And then it goes back to other stuff… Tidewave has access to everything. Another way that I like to phrase this is: imagine you’re working with somebody, and somebody sends a pull request, and then you open up the work they did in the browser, and then you’re like “Wait, this looks bad”, and then you go back to the person, like “Did you look at it in the browser? Did you try it out?” And then the person says “No.” And then we’d be like “What?!” You have to test things in the browser, right? Or - I use the REPL all the time as well. It helps me develop a lot. But we are asking coding agents to develop without the proper browser, without a REPL… So Tidewave gives all those things as well.

Oh yeah, you asked about what the user tools are, and I started talking about the agentic tools. So one is coordinating the browser, but the other one is that we also give access to a REPL running inside the web application, because we use the REPL for development… Why are we not giving one to the agent? I would be a worse developer if I didn’t have a REPL. And then we have MCPs, like “Oh, you can install an MCP to talk to Postgres.” But then I’m like “My web application already knows how to talk to the database. It already has all the credentials in there. Why are you asking me to configure a separate thing?”

[00:58:02.28] So a lot of the time it builds a feature, and then it tests the feature in the browser, and then it does a database query to make sure that the change also happened in the database. So that’s kind of – yeah, we went back 15 minutes, but that’s closing the loop on what tools the agent has… And the whole purpose is to make sure it’s producing something that is really good, and I’m not going to waste my time telling it obvious things, like “Oh, the video actually doesn’t play. Oh, the change was not actually saved to the database.”

You mentioned – I think one thing you mentioned there was MCP servers. Have you, Jerod, messed with MCP servers at all? I really haven’t, personally.

Mostly just Figma’s.

And I told you my experience with that, which was that it was nobody’s fault, except that the state of the art is not quite what I needed, but…

Yeah. José, I imagine you’re probably playing with them heavily… How exactly does that fit into your flow? Because from what I understand, it just adds more tooling to the context window, which is already kind of small, and so we’re always battling that auto-compression, or just having to refresh the entire chat whenever you feel like it, I suppose… How do you work those kinds of tools into your workflows without, I guess, bloating the context?

I actually have a hot take in here… It’s not a unique hot take, but – so to answer your question, which is going to kind of reveal the hot take, is that…

He’s just teasing it. He’s not going to tell us what it is. He’s just teasing the hot take. [laughs] Stop setting it up, José, and give us the hot take.

So almost all of our APIs are about running code. You can execute code in the context of the web application, you can execute code in the context of the web page… That’s it. Because we are doing all this dance – oh, I talked about the database. “Oh, I’m going to have an MCP for the database.” No. My web application already knows how to talk to the database. Just use that. “Oh, I want to have an MCP to talk to GitHub”, and I’m like “Well, I already logged you in to GitHub in the browser. I already have the GitHub command line. Use that.” Right? For coding agents, we are even going as far as adding MCPs for documentation… And then I’m like “Why am I going to a separate website to get documentation? You are a coding agent. The code is on your machine, and usually with the code you have documentation. Why don’t you use the documentation that is there, already on your machine, with the exact version that you’re using?” Because sometimes you go to the remote server, and then we get the documentation for Phoenix 1.8, but we are still on 1.7…

So for me, the answer to the context thing is: I’m going to have just a small number of tools, and what those tools are going to do is run code. And I’ll let them do whatever they need to do. So trying to keep the set of tools minimal and powerful. And this take that “Oh, MCPs are too much. You probably just need code” - I’m not the first one to say it, but I also think MCPs… The user experience, the developer experience around MCPs for coding agents is really poor. I mean, to be fair, it’s new, it’s still evolving… It’s probably six months old at this point.
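A sketch of that “small number of powerful tools” shape - the names here are hypothetical, not Tidewave’s actual API - versus thirty narrow commands:

```js
// Two code-evaluation tools instead of dozens of special-purpose ones.
const tools = [
  {
    name: "eval_in_app", // hypothetical name
    description:
      "Evaluate code inside the running web application. The app already " +
      "knows how to talk to its database, so no separate DB tool is needed.",
    parameters: { code: { type: "string" } },
  },
  {
    name: "eval_in_page", // hypothetical name
    description:
      "Evaluate JavaScript in the user's live browser session, e.g. to " +
      "scroll, click, or verify that a feature actually works.",
    parameters: { code: { type: "string" } },
  },
];
```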

So we had an issue where – one of the MCP tools that we’re using was working for GPT-5, but not for Gemini, and then we fixed it for Gemini and it broke for GPT-5. Or, if the server disconnects, they cannot reconnect again. There are all those sorts of annoying issues there. And then… Do you know about the Figma dev mode thing?

Mm-hm.

[01:02:15.16] So there’s an MCP in Figma dev mode - you can run Figma on your machine, there’s a desktop client, and then I can go to Figma, inspect an element, click a component that I want it to implement… And then, you know what the workflow is today - I have to go to Figma, click on the component, and then I have to go to the agent and say “I have selected a component. Please implement it.”

And then, when it’s done, you have to redo it yourself.

[laughs] That’s my experience. “Oh, good job.” Not good. Not good. It doesn’t look like it’s supposed to look.

I already clicked the thing. Why do I have to go back and tell you that I already – you know?

Because they’re separate tools. They’re distinctly different tools.

You’re meant to have a protocol for it. Those things should communicate.

I know you know the answer, I’m just saying it out loud. Tidewave, it’s all integrated.

It’s all integrated. And I actually want to hold off as much as possible on adding MCP support to Tidewave, because I think we will have better integrations if we do it by hand. So for example, when I do Figma for Tidewave, when you click on the Figma thing, we will know. And we just tell it, “Oh, you want to implement this? Oh yeah, just click the button. You don’t have to type anything.”

Just click it. Now, is it going to do it right? That’s up to your model, right? And Tidewave is just using whatever model you bring to it, basically…

Yes. What we can do is that we can help it – you’ll be able to click something in Figma, and then click something in Tidewave, and then we’ll be able to say “Oh, you should implement this, and this is exactly where it is.” So we can improve the experience there. But once we send all the information to the agent, whether that’s ultimately going to be better is up to the model. And the agent will be able to validate that some things look good, like it did with the video and the scrolling. So we are giving it more tools to verify that it did a better job than it would just working blind… But ultimately, yeah. And I think that’s going to be true for a lot of things.

So there is a tool called Conductor that, for example, added a GitHub integration. And one of the things they do is that they know which Git branch you’re using. So in their GitHub integration, they know the comments dropped on a PR for that branch, and they automatically surface that in the UI. So you can ask the agent to solve a comment as somebody’s commenting on GitHub. Doing those sorts of experiences through MCP - it’s just, ugh. It’s like “Oh, get all the comments for me.” Right? It’s like…

So I really think that for coding agents – I don’t want to generalize this too much, but I think for coding agents, for a lot of the things, you can build… Like, MCP doesn’t allow you to push information, right? So that’s what I’m complaining about. GitHub should be able to push information through the MCP, like “Oh, there’s this comment.” “Oh, I clicked this on Figma.” It doesn’t support that. And we’re not even talking about the security issues. So I feel like – yeah, I want to give you a good package, with everything. At the point you’re telling users “Oh, just go and install those different MCPs”, you kind of gave up on the developer experience, because it’s not there yet.

I would tend to agree, I think, with that.

How hot was that? Was it way too much teasing for not too spicy, or…?

[01:06:09.18] It’s a lot of spice in there.

A lot of spice.

It’s a variety of spices.

There was some hedging around – I just feel like you could have dropped it a little hotter, and then…

I could, yeah.

[unintelligible 01:06:19.27] could have gone ghost, you know? Gone ghost.

And also, I just tend to agree. I think MCP servers seem to be like a builder-driven technology right now, versus user-driven. I feel like it was so quickly adopted by all the builders, and as users we’re kind of like “Were we asking for this, necessarily? And could you do it so that it was –” I would say more transparent perhaps, or maybe just user-friendly for us as end users… But man, I’ve never seen an API or specification or a protocol get built out across the entire tech industry so fast. And we’re talking less than a year.

30 days.

Well, I mean, the first announcement was back in November, I believe – that’s when MCP was announced by the Anthropic team, less than a year ago… And nobody paid much attention to it for three months. And then all of a sudden in the spring-ish, it’s like, everybody just started building MCP servers. Everybody.

That’s true, yeah.

And then as end users we’re kind of like “Did we ask for this, or…?” I don’t know. I’m not sure what – I don’t know, you don’t want to be left out? I’m not sure why everybody just thought immediately we’ve got to do this, but… It was pretty interesting to behold.

I don’t have a lot of context around MCP servers, because I don’t really use any of them. But when I think of them, it’s more like a CLI tool that’s on the system already… Rather than pollute my context window with a tool that’s an MCP server, why not just have a tool on the system that you can use, not have to be an MCP –

Instead of using the GitHub MCP server, you might use the GitHub CLI to access data from GitHub? Is that what you’re saying?

Right. If I already have GH installed and it’s already authenticated, why not just use the tool, versus some sort of MCP server that just is in my context? Like, why does that have to be in my session and configured? It’s also instrumentation of tooling… It’s a lot of ceremony. It’s a lot.

But even if you say “Look, what if you don’t have the GitHub command line tool?” Yeah, have it write one – like, it can write Elixir, it can write JavaScript, it can write Python to talk to the API. Then, of course – so I think the authentication part of MCP is interesting, because if it had to write a tool, it would have to ask for your credentials somehow. So that part is good. But it feels like, in certain ways, that’s probably all we needed – a way for the agents to ask… Which – we have OAuth, and other things… A way for the agents to ask for your permission to talk to some API on your behalf. Because if it has the code, it could also do things like “Oh, it can ask for information. It can get the raw data from GitHub”, then use whatever library to compute the information that you want. Right? And give you a better result than trying to do it with MCP, getting plain text, and then maybe doing something interesting with that… Right? We have these things that are really good at coding, and we are sometimes dumbing things down to a text interface, while they could write code.
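
For instance, here’s the kind of throwaway script an agent could write for itself, leaning on the already-authenticated GitHub CLI (`gh api` is a real subcommand) instead of a GitHub MCP server. The repo and PR number are placeholders, and the grouping logic is just one example of post-processing raw data in code rather than in plain text.

```typescript
// A sketch of the kind of throwaway script an agent could write itself,
// using the already-authenticated GitHub CLI instead of an MCP server.
import { execFileSync } from "node:child_process";

interface Comment {
  user: { login: string };
  body: string;
  path?: string; // review comments carry the file path they refer to
}

function prComments(repo: string, pr: number): Comment[] {
  // `gh api` handles auth; we get raw JSON back, not a text summary.
  const raw = execFileSync(
    "gh",
    ["api", `repos/${repo}/pulls/${pr}/comments`],
    { encoding: "utf8" }
  );
  return JSON.parse(raw);
}

// Post-process with real code: group review comments by file, so the
// agent can work through them one file at a time.
const byFile = new Map<string, Comment[]>();
for (const c of prComments("owner/repo", 123)) {
  const key = c.path ?? "(general)";
  byFile.set(key, [...(byFile.get(key) ?? []), c]);
}
console.log([...byFile.keys()]);
```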

I like to think – one of the questions that I ask myself… People are talking a lot about personal devices. We’re going to have our personal devices that are AI-augmented, and these kinds of things… And I say, that thing needs to know how to run code. Because how can you have some generic personal assistant that can do everything, and that thing cannot run code?

Any assistant of mine has got to be able to run code.

That’s right. “You better run code, I’m telling you… Get out of here…” [laughs]

[01:10:20.01] It’s like the first thing on the resume. “Can this person run code?” Well, yeah, I tend to agree. I think MCP is an interesting phenomenon, and the most widely builder-adopted technology that I can think of in history. So there it is. It’s there now, but not necessarily – it didn’t necessarily have to be there. And there you have it. Spicy, spicy José.

What else? I mean, Tidewave - you’re trying to make a business out of this thing? You’re trying to make a living? What are you trying to do?

Yes, so it is a paid product… We considered it a little bit, but then I realized “Well, this is an AI thing.” It’s a very rapidly changing landscape, so if you want to be able to keep up and stay invested in this, and continue improving it, and also support different kinds of frameworks, we need to find a sustainable way of doing that. So… Yeah, we’ll see. The launch was pretty good, we got a lot of people excited… But it also pointed something out – so today it’s “Bring your own API key”, and the feature that people ask for the most is Claude Code support. So being able to bring Codex and Claude Code. And yeah, let’s see. I think in the email I sent you folks my product history; my history of building products has been cataloged by the Changelog…

Yeah, pretty much. We’re doing our job there. The Changelog of José’s products, you know?

Yes, yes.

It really is.

Yeah. So there’s Livebook, which is also running, and now Tidewave.

And what about Elixir, man? Is it done, or…? Are you done with it, or still working on it?

Still working on it. It changes. Right around now there’s a good amount of Tidewave things happening; it’s fresh. I think we’re about five weeks since we launched. So we just launched it, and you know, when you launch something like that, there’s a lot of work – feedback, and prioritizing… I think it’s about half/half right now – half of my time on Elixir, half on Tidewave… But otherwise, most of my work is still going to the Elixir type system and Elixir work.

The other thing that I want to do – Tidewave is an example, but going outside of Tidewave, one of the things that makes me excited about AI is that we can look at the tools and find ways to improve and build new developer tools… And I’ve been exploring some ideas around those areas. The tests that the coding agents write – I usually don’t like them. They are redundant. And I think a lot of people don’t pay attention to – or they use too many mocks. A lot of people don’t pay attention to quality in tests. “A test is a test.” So I’m trying to figure out ways of improving that. So for example, when the agent’s writing tests, can we measure coverage and guide the agent to write tests based on coverage, but also give information like “Oh, those tests are redundant. They are pretty much checking the same lines of code. You can try unifying them.”
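
As a toy illustration of that redundancy signal: if a coverage tool (Istanbul/nyc in the JavaScript world, for example) reports which lines each test exercised, flagging candidates to unify is a small set computation. The test names and coverage data here are invented.

```typescript
// Toy sketch of the redundancy signal: given per-test line coverage,
// flag tests whose covered lines are a subset of another test's.
type Coverage = Map<string, Set<number>>; // test name -> covered lines

function redundantTests(cov: Coverage): [string, string][] {
  const pairs: [string, string][] = [];
  for (const [a, linesA] of cov) {
    for (const [b, linesB] of cov) {
      if (a === b) continue;
      // a is redundant relative to b if it covers no new lines.
      // (Tests with identical coverage get flagged in both directions.)
      if ([...linesA].every((line) => linesB.has(line))) {
        pairs.push([a, b]);
      }
    }
  }
  return pairs;
}

const cov: Coverage = new Map([
  ["creates user", new Set([10, 11, 12, 20])],
  ["creates user (dup)", new Set([10, 11, 12])],
  ["rejects bad email", new Set([10, 30, 31])],
]);
console.log(redundantTests(cov)); // [["creates user (dup)", "creates user"]]
```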

[01:14:06.10] And the cool thing is that we are thinking about those things because we want to automate the agent, but a lot of it translates to better developer tools. We released this for the agent, but developers can also use it. So I think a lot of the work that we are doing right now will feed back into better tools. Even when I’m working on Tidewave, a good amount of the work will eventually feed back into better tools for Elixir and the community, too.

Right on, man. Well, keep fighting the good fight. Always love talking to you, and I always love hearing what you are working on. I am going to give Tidewave a ride, in earnest. I have a Rails app now. You know, we have an Elixir Phoenix app, so I can use it in both contexts… And I’ll let you know what I think, give you some feedback.

Yeah, let me know. And then, right now it’s either bring your own key, or you can use your GitHub Copilot integration… And then hopefully in about a month – what are the tools that you use today?

So I use Claude Code, I have ChatGPT Pro, but I don’t actually use Codex… I’m not sure if I get Codex with Pro. I have Gemini CLI… I don’t know – it’s very confusing; when you get Claude Code, do you also get tokens for the API? I don’t think so. So you buy those separately. I’d rather not buy more of those. Is this why people want to bring their Claude Code subscription? Because they get –

Exactly. Yeah.

Yeah, I would love to do that, and not have another toll bridge, toll road, as Adam calls them…

Toll booth.

Toll booths. So that’s my current setup. Adam, what are you using? You’ve got some Amp subscription maybe?

About five bucks left in Amp. I still can’t hold it right. It’s always expensive for me. It’s really great, though. It’s so cool how it works… It’s really one of the best, but I haven’t found a way to hold it in a way that isn’t expensive.

Oh, okay.

So Claude Code primarily. Same, I have an Anthropic key, but only because I think one thing had to have it, and I think I got the trial balance they give you. I’m still on that. So there’s nothing past that. But Claude Code. Augment Code I like as well. They’re cool. Amp – I still like Amp, it’s just I haven’t found a way to make it not expensive for me. I don’t know. But it is really, really, really good when it does its thing. It’s good.

Is there anything in particular that you like about it?

It seems to be just – it’s got this oracle… So speaking to Amp, it’s got an Oracle where it can go back and consult. It’s kind of like “Ultrathink”, now that I think about it, Jerod. It’s not quite that, but it’s a bit more, where it’ll go into a deeper understanding of coding patterns, and a learned behavior across, let’s just say, the Rust CLI ecosystem. Like, how do those work generally? What are good patterns? And it will come back and tell you stuff like that.

So I find that its research and its ability to execute in a hands-free YOLO environment is just – it’s really good at that. You wind it up on the right thing, with the right research, the right context, the right everything… It just plows through it, for hours, and just does amazing work. But it gets expensive if you don’t work with it and babysit it.

Do you prompt for the oracle, or it automatically figures out that – it’s like the plan mode in Claude Code, or it automatically figures out “Oh, now’s the time for some oracling”?

Yeah, I think it does it on its own, but you can also say “Hey, in this exercise, go ahead and prompt the oracle as well. Tap them, get them involved.” I don’t know, it feels cool. It does it, and good results come, I guess, but… You can either prompt it yourself, or it just kind of does it when it needs to. I am not an Amp expert by any means, but that’s how I experienced it.

[01:18:14.27] Yeah. So wrapping it up – yeah, right now it’s bring your own API key. And with Claude Code, your OpenAI subscription or Claude subscription does not give you an API token. And we cannot actually – because you know, Claude Code is just using a Claude API. But we cannot use that API. It’s actually not legal according to Anthropic’s terms. So we decided to not do that. That’s why we’re working on the whole Claude Code, Codex integration kind of thing. So either bring your own key – but really, at the end here, I really would recommend giving GitHub Copilot a try… It’s confusing, because Microsoft calls everything Copilot… But there is a GitHub Copilot plan that gives you access to a bunch of different models… And it’s a predictable plan, in the sense that – the thing with paying for tokens is that it’s very hard for you to predict how much it’s going to be… And the GitHub Copilot subscription is per message, which at least improves the visibility a little bit, and it has a basic plan that’s quite affordable. So that’s a good way to try it out for now, to get some feedback.

And yeah, we are hopefully launching Claude Code, we are – Zed released something called ACP. I don’t know if you saw that news, the Agent Client Protocol…

Yes, very cool.

So you can talk to Codex, Claude Code, Gemini CLI… So we are building on top of that. But it’s a bunch – it’s work, because you’re running in the browser, and ACP is an IO protocol. So you can imagine all the hoops that we have to jump through to make those things talk to each other. But yeah, hopefully we’ll be launching that soon, alongside Django, Next.js and so on.
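
Those hoops look roughly like this: ACP agents speak JSON-RPC over stdin/stdout, and browser code can’t spawn a process or touch its pipes, so something server-side has to relay. Below is an illustrative TypeScript sketch of such a bridge – not Tidewave’s actual implementation; the agent command and its flag are placeholders, and a real bridge would also buffer partial lines.

```typescript
// Illustrative bridge: spawn an ACP agent and relay its stdio JSON-RPC
// frames over a WebSocket so browser code can talk to it.
// "some-agent --acp" is a placeholder command, not a real CLI flag.
import { spawn } from "node:child_process";
import { WebSocketServer } from "ws"; // npm install ws

const wss = new WebSocketServer({ port: 8137 });

wss.on("connection", (ws) => {
  const agent = spawn("some-agent", ["--acp"]);

  // Browser -> agent: each WebSocket message is one JSON-RPC frame.
  ws.on("message", (data) => agent.stdin?.write(data.toString() + "\n"));

  // Agent -> browser: forward stdout (newline-delimited JSON-RPC).
  // A production bridge would buffer and split on newlines here.
  agent.stdout?.on("data", (chunk) => ws.send(chunk.toString()));

  ws.on("close", () => agent.kill());
});
```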

I wish Anthropic would just give you, when you get some sort of subscription – just give you a token or a key that you can use against that subscription, at the same pace that you use Claude Code. I guess they’re just subsidizing that to death, and they don’t want to subsidize their API… Because I can pay 20 bucks a month or whatever it is and use the dog doo out of Claude Code… But I’ve got to pay the equivalent of 200 bucks or 500 bucks to use the API the same amount. I just made those numbers up, but you can see the discrepancy is there. It doesn’t make sense to me. I guess they just want you using their CLI a lot.

I would say it’s not even about how much it costs, it’s just how predictable it is. Right? Because –

Yeah, because you don’t want to get dinged for making a bad prompt. You want to set – I’m fine with 20 bucks, 40 bucks, 50 bucks a month, but just because I use it a lot, don’t give me a – I tell it to Ultrathink, and it’s like “Well, that’s $17 for that Ultrathink”, and it was still wrong. [laughs] Like, can I get my money back, too? Are there returns?

I know, right? Service degradation is a real problem for me.

But here’s the thing… I actually think pushing people towards using Claude Code more, or Codex, and building on top of those tools, like with ACP, is not actually a bad idea.

Okay… Why is that?

Because here’s – okay, let me tell you a story. A quick story. I know we’re going – so when we first implemented Tidewave, we focused on Anthropic and the Claude models. And if you go to the Claude Code prompt, it has things like “You should be concise. Don’t use too many words. Use four words.” I think it even said at some point that a one-word answer is best. And of course, it doesn’t listen to that. It finishes the feature, and it just dumps four pages of text about the thing that it implemented, that nobody ever reads.

[01:22:36.08] So when we did our prompt, we tested with those things as well. It does improve a little bit. But we also say things like “Don’t write a code comment”, and it always writes a code comment.

Anyway… So we wrote the prompt, and then when GPT-5 came out, we decided to give it a try, and start supporting OpenAI… And it was very curious, because you would say “Hey, implement this feature”, it would do all those things, and then at the end – “Done.” And then you would ask something and it would say “Good.” And then we realized that the prompt we had for Anthropic that was saying “Be concise” – GPT-5 was actually listening to that prompt, and it was being concise. That’s why it was just saying “Done. Good.” It was not doing any fluff, or anything. And that’s when –

That’s hilarious.

…you realize that you actually have to come up with – if you’re building a coding agent like I am, you have to actually build a prompt per model. And now GPT-5 Codex came out with its own prompt that is different from GPT-5’s… So just doing that, fine-tuning the prompt per model – that’s a pain. That’s boring work. That’s not something I want to do. And then there’s the other thing, which is the tools… At this point, those coding models are becoming so important for those companies that they’re actually fine-tuning how the model should send edits to a file. They are fine-tuning the models for that. So when the GPT-5 Codex model came out, they also said “Look, this model is best at sending these kinds of diffs and edits over the wire.” So now I have to implement specific editing tools per model that I support… And then each of those models comes with its own context engineering techniques… So at that point, if you’re like me and you’re building a coding agent, you want to be able to get that infrastructure and build on top. And then it comes with a nice thing… So going back to the hot take – if you’re building your agentic tooling for coding, instead of doing the MCP, don’t do the MCP. Build on top of ACP, have control of the agent, use all those things, and extend that instead.
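
The bookkeeping being described ends up looking something like a per-model profile table. This TypeScript sketch is purely illustrative – the model IDs, prompts and edit-format choices are assumptions made up for the example, not vendor recommendations.

```typescript
// Sketch of hand-maintained per-model configuration: a prompt and a
// preferred edit format per model. All values here are illustrative.
type EditFormat = "unified-diff" | "search-replace" | "whole-file";

interface ModelProfile {
  systemPrompt: string;
  editFormat: EditFormat;
}

const profiles: Record<string, ModelProfile> = {
  "claude-sonnet": {
    // Assumed: this model over-explains, so push hard on brevity.
    systemPrompt: "Be concise. Do not add code comments unless asked.",
    editFormat: "search-replace",
  },
  "gpt-5-codex": {
    // Assumed: this model follows brevity instructions literally,
    // so soften them, and use the diff style it was tuned to emit.
    systemPrompt: "Summarize what you changed in a short paragraph.",
    editFormat: "unified-diff",
  },
};

function profileFor(model: string): ModelProfile {
  // Fall back to a conservative default for unknown models.
  return profiles[model] ?? { systemPrompt: "", editFormat: "whole-file" };
}

console.log(profileFor("gpt-5-codex").editFormat); // "unified-diff"
```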

[01:25:09.13] And with the announcement of Claude Sonnet 4.5 today/yesterday, they actually recognized that. They renamed the Claude Code SDK to the Claude Agent SDK. They moved a couple of things around for it to be a better SDK for people to build on top of. Because I think there is a lot to gain from leveraging everything. They are tightening the fit between the models and those tools, and we want to be able to leverage that.

Right. That makes sense. They’re putting a lot of work in to take that model and make it an agent… And there’s no reason why everybody else needs to do that work as well.

Well, how would you go about building on that right now? Where would you go? What’s the starting point to building that right now?

I don’t know. What’s the website?

Oh, I don’t know.

Zed.dev/acp, or something?

No, I would just search for “agent client protocol” on Google and see where that –

Agentclientprotocol.com.

Alright. All together, right?

All spelled out, yup. Alright. Of course, when you google that, it will be your first hit.

And I believe there is a TypeScript SDK. I don’t know about other languages right now, but the protocol is –

That’s the only language that you know? [laughter] There’s a hot take. You heard it here first… José only knows TypeScript. That’s when you know it’s getting late. I just autocompleted you. [laughter]

Yeah… But I can see it becoming more and more important, and we are going to see more SDKs. But the protocol is also relatively straightforward as well… So yeah, I’m hoping that – I can really see a lot of value in there, and I hope it’s going to catch on to the point where – because I think Gemini CLI has support built into the CLI… But when Zed released support for Claude Code, for example, it’s because they have a wrapper. So I hope it grows to the point where the CLIs come with built-in support for it. And then I hope it grows to the point that – which of the big providers have their own CLIs right now? It’s Gemini, it’s OpenAI, and it’s Anthropic. Grok, they don’t have theirs, I think.

ZAI, they don’t have theirs… So I actually hope those other companies start providing those CLIs as well, with all those things we have been talking about – in the sense of “Look, here are the optimized diffs. These are the things we improved for” – so we can move to the point where we are all building on top, and not all reinventing that wheel. So I really hope it grows.

More CLIs. Give them to me.

There you go. Thanks for hanging with us, José. It’s always a pleasure, man.

My pleasure. Yeah.

Alright, bye –

Bye, friends.

Ooh. Synchronized.
