On the use of AI in development
TL;DR
Developers are the canary in the coal mine for real-world productivity from LLMs. AI allowed us to release a massive new version of TalSource. Vibe coding accelerates development right until it leaves you stuck in the mud. How I program has changed more in the last 12 months than in the previous 12 years, but old lessons remain relevant.
Warning to my dear reader: This is a long and somewhat technical piece. If that's not your thing, this isn't either.
Releasing v1.0
In the first week of January, TalSource launched version 1.0. It represents a step-change improvement over our existing platform, both in terms of underlying technology and from a UX perspective. Comparing what is in production today to where we were a year ago - we've come a long way.
From a tech perspective, this release is a beautiful case study of things you traditionally should not do because they are risky - which, despite the odds, turned out okay. [1] There are some lessons I thought I would commit to paper. Let's see how outdated this will be in a year.
We switched out multiple critical elements of our tech stack at the same time. It landed in one massive drop-and-pray, fix-forward release. Much sweat went into planning, executing, and worrying ahead of time. Grey hairs were still grown during the long hours of the weekend, and I'm still waiting to hear back on that sponsorship deal from Red Bull.
In the end, given expectations for this type of vodka release, it went okayish. The enabling factor was an AI-empowered development workflow. As I write this, I can feel eyes glazing over - another vibe-coding enthusiast who doesn't know how to code.
But that's not what I'm talking about. I'm very skeptical about vibe coding. As is any reasonable developer with a couple of projects under their belt that went sideways and descended into chaos - long before LLMs were a thing.
How I got here
Before joining TalSource as a co-founder and CTO, I was taking a sabbatical to look after our first kid, who was born that year. For the 5 years before that, I worked at the Swiss crypto unicorn Sygnum, first in a product role and later as a developer on a tokenization solution and its integrations with the core banking system. I learned from many excellent colleagues how a complex tech architecture works through the stack - from frontend to DevOps. I had been programming since I was 15 years old, mostly on solo projects, but had not worked in software engineering in a professional capacity before. I love building things and like to think of myself as a fast learner.[2]
How TalSource got here
I was hesitant initially to join TalSource but did offer to help my co-founder Vijay build some basics to get him started: A traditional web application to present candidates to our clients in good old Potemkin village startup fashion.[3] I did this mostly because I like Vijay, and because I had the capacity - looking after babies is tiring but not exactly mentally engaging.
Without overthinking, I defined our starting tech stack as:
- Node with Express in the backend + Postgres
- VanillaJS, HTML, CSS
- Selenium for e2e testing
- Monorepo structure
- Heroku deployment with a rudimentary GitLab CI/CD
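For a sense of scale, the "rudimentary CI/CD" in a setup like this amounts to little more than the following sketch (illustrative, with assumed variable names - not our actual config): run the tests on every push, and push main to Heroku when they pass.

```yaml
# Illustrative sketch of a rudimentary pipeline (assumed names, not the
# actual TalSource config): test on every push, deploy main to Heroku.
stages:
  - test
  - deploy

test:
  stage: test
  image: node:20
  script:
    - npm ci
    - npm test

deploy:
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    # HEROKU_API_KEY and HEROKU_APP would be CI/CD variables set in GitLab.
    - git push https://heroku:$HEROKU_API_KEY@git.heroku.com/$HEROKU_APP.git HEAD:main
```

The appeal for a one-person team is that the whole deployment story fits in one file you can read in thirty seconds.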
The biggest factors in the decision to use this stack were mundane:
- KISS - keep it simple, so I can move fast
- Prefer tech I'm familiar with and have worked with over the latest fad
- Reuse elements of previous personal projects to speed up the boring parts
- Avoid vendor lock-in and difficult to undo decisions
- Expect most code to get rewritten anyway or thrown out as irrelevant
With that we were off to the races. We built some basic features for the early customers my co-founder was able to hustle in the door. A few months in, after continually reminding Vijay that he needed to find a tech co-founder, especially one who knows AI, I figured, if I'm already knee-deep in, might as well learn a bit about AI. Turns out this AI stuff was very fun to work with.
We iterated on the product. We tried different things. I got lost in the weeds a couple of times. I began irregularly using ChatGPT (the web interface) for selected simple development tasks. It was impressive, like a magic trick, but frankly limited in impact. On recommendation from friends, I started cursorily using Cursor. It was an improvement, but to be honest, it didn't really click at the time. There are probably two reasons for that:
- Effectively working with Cursor requires a focus on different aspects of development that is unintuitive to someone used to writing code
- The quality of Cursor's output improved massively over the last 12 months. I suspect the big jump came when Claude 3.5 Sonnet became good - but my experience was more of a drawn-out crescendo of consecutive "Oh, it can do that now? Nice."
Video insights
"Wouldn't it be cool if we had videos of our candidate interviews with chapters like they have on YouTube?" my co-founder said. My first reaction was gulp. Having spent time as a product manager, one develops an intuition for what is easy, and what could potentially be nearly impossible.[4]
"Let me take a look at it," I said. Maybe there was a way to integrate an existing product or library, or a shortcut to get to an 80%-solution with economic effort. I had never done any processing of videos from code or even the command line.
I started playing around and doing research. On this topic, like many others, some rather niche subject expertise is required. There are countless gotchas that will trip you up. I started leaning heavily on ChatGPT and Cursor, and I got it to work on my machine surprisingly fast.
I then realized that Heroku was never going to run this sort of load (1-hour video files get large) in a way that would not break our bank - so over to AWS it was. That is not exactly my area of expertise either, and AWS is the online equivalent of the jam counter from The Paradox of Choice: 100 flavors, complicated by unreadable price tags.
It was messy, but it worked. It would have taken unacceptably long had I needed to first learn everything required to get it done; LLMs massively shortcut that learning process for me. We shipped the video insights feature. It's not perfect, but it works. Our clients loved it - it saved them valuable time. And despite our original suspicions to the contrary, candidates didn't hate the idea of being recorded either, if it meant not having to tell the exact same story multiple times through the recruiting process. The experience planted a seed of confidence in the approach.
Featuritis and its symptoms
Over almost two years, we'd been talking to clients and internal users, iterating and refining the product and service experience. The scope of our product grew in parallel. We focused on trying and building features as fast as possible to get feedback. What suffered in the process was the internal admin experience. While the client-facing interfaces were kept as simple as we could, on the internal side we made an implicit decision to prioritize features over user experience. It was starting to show. The design choice of VanillaJS with plain HTML and CSS in the frontend was becoming painful. There are only so many handwritten refresh loops you can cram in before things get finicky.
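For readers who haven't lived this: a "handwritten refresh loop" is roughly the following shape (an illustrative reconstruction, not our actual code) - poll the server, crudely diff against the last snapshot, re-render the widget on change. One of these is fine; dozens are not.

```javascript
// Illustrative reconstruction of a hand-rolled refresh loop, the kind of
// thing that multiplies in a plain-JS frontend until it gets finicky.
// One tick: fetch state, compare to the last snapshot, re-render on change.
function makeRefreshTick(fetchState, render) {
  let last = null;
  return async function tick() {
    const state = await fetchState();
    const snapshot = JSON.stringify(state); // crude change detection
    if (snapshot !== last) {
      last = snapshot;
      render(state);
      return true;  // re-rendered
    }
    return false;   // nothing changed, skip the DOM work
  };
}

// Each widget on the page wires up its own copy (hypothetical names):
// setInterval(makeRefreshTick(fetchCandidates, renderCandidateList), 5000);
```

A framework's reactivity replaces all of this bookkeeping, which is exactly why the grass next door started looking greener.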
Riding high and free
Fresh off the success of the video insights feature, I started enviously eyeing frontend frameworks. Tailwind for CSS and a modern frontend framework like Svelte or React started looking attractive.
I spent a weekend researching Svelte, then set Cursor loose on the codebase and prompted: "Migrate this page to Svelte." It looked nice. I did more of the same, started tweaking it with further instructions, thinking of the user and UX. It reminded me of my time as a product manager giving feedback to our frontend engineers at Sygnum. Hmm, maybe we can just redo the client-facing interface with this, I thought. Then I showed it to my non-technical co-founder, who was equally enthusiastic.
"I think we should revamp our admin frontend. I think we can do this in two weeks," I told him. Everything was going swimmingly; I was making massive, fast progress, flying in a framework I barely understood. It looked good. I had a working, battle-tested backend in place and an existing UI I could point the LLM to for reference: "do it like this, but make it prettier and change XYZ."
Our existing testing framework and e2e tests with Selenium didn't play well with the new frontend, which shipped with Vite as a build system. I decided to switch over to Playwright - it seemed faster and less crusty. Unfortunately, testing only the admin UI without the client UI was like trying to rock a chair with two legs. The client-facing UI is simpler anyway; maybe I'll just migrate that as well, I decided. I told my co-founder that with the increased scope it was going to take just a bit longer. I showed him the work-in-progress UIs, and he was as excited as I was.
Things moved nicely along; the frontend was looking like a million bucks. It worked 80%. If you've built applications, you probably spotted all the warning signs I wantonly sailed past. I might have seen them too, had I not been so busy enjoying the wind in my hair. Vibe coding is a heady sensation of unearned power.
Valley of doom
It worked 80% and then I hit a wall. It was something silly, something that should have been easy. But the LLM kept getting stuck.
The pattern went:
- "Please fix A" - which it did, while breaking B.
- "Please fix B." Which it did, but then A was broken again.
- "No, fix both A and B."
- "I've fixed it, look," it said, while holding up clearly wrecked code.
I'll just fix it myself, I thought, rolling up my sleeves, and for the first time I truly looked under the hood. What I saw was horrifying: Svelte files of 1,000+ lines each. One tangled, gigantic mess.[5]
It might not have been half as bad initially, when the LLM was still following the structure from the old frontend. But after I had started to fix it, every failed change made it worse. With an AI-empowered development workflow, I had to relearn the lesson that it is best to work in small chunks of progress. And that sometimes it is better to retrace and retry rather than try to power through.
Funny how fundamental lessons need to be relearned when you think you're in a different context. It was all adding up to a real oh-shit moment: I was committed, and I didn't know my new framework (Svelte).
Digging my way out of the hole
When the excavator fails, you break out the pickaxe and shovel. I started doing what I should have been doing from the start: Imposing my own structure and logic on the mess. Refactoring and cleaning up code. Breaking it into smaller logical chunks and composing them properly. Separating concerns, ensuring well-structured interfaces. Mostly by hand but occasionally the excavator was still helpful. LLMs can be useful for refactoring if you tell them what you expect to see and send it back until it gets it right.
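In practice the refactor mostly looked like this (an invented example, not the real codebase): pull pure logic out of the tangled component file into a small module with one job and an explicit interface, so the component only renders and the logic is testable on its own.

```javascript
// Invented example of the target shape: logic that previously lived inline
// in a 1,000-line component, extracted into a small pure function.
// The component now just calls this and renders the result.
function visibleCandidates(candidates, { stage = null, query = '' } = {}) {
  const q = query.trim().toLowerCase();
  return candidates
    .filter((c) => !stage || c.stage === stage)          // stage filter
    .filter((c) => !q || c.name.toLowerCase().includes(q)) // text search
    .sort((a, b) => a.name.localeCompare(b.name));        // stable display order
}
```

The payoff is twofold: the giant file shrinks, and when the LLM mangles the UI again, the logic underneath keeps its tests green.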
The advice "If you find yourself in a hole, stop digging" probably needs a caveat: sometimes you do need to dig yourself out - just sideways, not down. We had tasted the superior UI and UX our product could have. The test suite was running 10 times faster. We wanted this - now I had to make it happen, even as it was turning into a nightmare of a release.
Functionally, we were now looking at a rewrite of a significant part of our codebase. But it also offered an opportunity to bake in all the learnings over the past year of experience with clients and internal users.[6]
Two weeks turned into three months. Eventually, it got done. And it was released. It mostly worked. After a scramble and several bug-fix releases, it really worked. We had labored hard.
After the release is before the release. Still, I figured I'd take a moment to put pen to paper. I'm by no means an early adopter of an AI development process. Despite this, there may be some useful nuggets of insight for developers just embarking on the journey of finding a good way to use the powerful new tools now at our beck and call.[7] I will attempt to draw some general lessons.
Implications
AI may have landed in the daily lives of developers, but much stays the same: Good developers remain good developers. Code still needs to be written so humans can read and understand it. Clear communication is as relevant as ever. Being able to think coherently and understand context remains a prerequisite for building good product. Understanding customers remains critical, as does an intuition for UX. UI may wax and wane; data is eternal. The core skill that becomes more valuable is judgement: taste.
Writing frontend code with LLMs works decently well, IF you're working in the crumple zone (constrained failure scope, unlikely to corrupt data, security is handled elsewhere) AND are organizationally able to quickly iterate to fix when you do break something.
I generally trust AI to drive on unimportant side roads, but I'll guide it with a heavy hand at any key interchanges and junctions. Where it matters, the design must be mine. I may ask AI to fill in potholes, but I will double-check it hasn't instead walled off a section of the highway (after all, if the cars can't get to the pothole, the pothole is no longer an issue, now is it?) or spontaneously decided to replace a concrete overpass with the quaint rope suspension bridge from the Temple of Doom (it still lets you get across the river; what's the big fuss?).
On the backend, I'm less enthusiastic. I review every line, think through the security implications on every endpoint. Often, I still write mostly by hand or edit an AI draft until I might as well have written it from scratch.[8] I'm frightened of code rot. If a few leaves fall off, the tree will be fine. But I cannot allow corruption to grow into the trunk.
I do generally review code more aggressively, even as I write less from scratch. As a developer, I'm fully responsible for the structure and health of the codebase. In the data model, lazy compromises are rarely a good trade-off anymore. Code quality issues and refactoring can no longer be backlog items scope-pushed from release to release - they need to be addressed shortly after they arise. Since an AI-empowered development workflow increases the raw quantity of new code, and with it the chance of breaking things in a complex, interrelated system, testing is more critical than ever before.
As I develop with LLMs, I've noticed a pattern that feels a bit like building a dirt berm. You pile on for a bit, carefully adding new things. Then every now and then you need to stop piling and start compacting: going through the loose dirt you just added and packing it down, to create a solid foundation to pile onto again. The earlier I compact the code, the less trouble I run into. Whenever possible I try to compact before committing; failing that, before pushing to staging. But when I wait too long and get overenthusiastic with the piling, compacting suddenly gets much harder.
On the business side, as TalSource feels the spring breeze of incipient product-market fit, we will look to bring on an additional developer this year. The requirements for that developer are different from what they might have been 3 years ago. We can't bring in a junior to offload low-priority tasks. We will instead be looking for someone with experience and taste, someone who has built codebases into a corner before and knows the difference between good code and bad code.[9]
The bigger picture
Writing code has arguably turned into the first killer app for LLMs. This should not be surprising. After all, which developer doesn't like writing tools for themselves? The task at hand is, furthermore, inherently suitable for LLMs:
- The output is text, but in a simplified yet precise grammar that, crucially, allows verification at multiple levels (syntax, logic).
- Context (especially if working in a monorepo) is mostly there in your repository.
- LLMs have been trained on the internet, which might be characterized as a gigantic pile of web code with a sprinkling of content on top.
- Developers are inherently used to communicating very precisely in a way machines understand.
- Experienced developers are pragmatic craftsmen rather than Artistes.
- They are used to rapid changes in aspects of their work and the relearning required.[10]
There are other fields I have some basic familiarity with that share some of these characteristics, e.g. accounting and legal work - but none shares this confluence of conveniences driving towards adoption.
In the field of AI-empowered headhunting, where TalSource is active, some aspects make this trickier. Even so, working daily with the tools in an area where LLMs have begun to deliver on the hype - where the contours of how AI will be useful in a practical day-to-day way are slowly emerging from the fog of irrational exuberance - is a rich well of ideas for our product.
PS: In an age of AI slop, the above was written by a human hand. I'm proud of the fact that I can still (at least on a good day; reasonable people may disagree) write better than an LLM. I did use AI to review drafts, find errors, make suggestions, and work with me as an editor.
I think that is the way it should be. AI made the product better, but I made the product. AI is a tool. Homo sapiens is a social tool user. As long as we don't endeavor to build a gigantic footgun with our fanciest new tool - we ought to be fine. We may even live long and prosper.
Footnotes
[1] I waited two months after the release before publishing, to be sure I can safely make that statement.
[2] The combination of those two aspects of my personality has naturally resulted in a lifelong attraction to the startup world.
[3] In the meantime, we have largely built the village behind the facade.
[4] XKCD has a brilliant quip on this: "It can be hard to explain the difference between the easy and the virtually impossible."
[5] Of course, the frontend can never be quite as pristine as the backend.
[6] It was also a chance to throw overboard features that should not have been built in the first place.
[7] At least the story of an author's own foolishness can still be useful to someone about to repeat the same error in a different flavor.
[8] I find LLMs can be a good way to get through the equivalent of writer's block for developers: quickly away from a blank screen and into the problem.
[9] At least opinions formed from reasonable experience rather than supposition or dogma.
[10] Consider the unreasonable enthusiasm continuously granted to new frameworks and languages.