I’ve started experimenting with using LLMs to help me build software. One of the first things I did was to watch the video of Llewellyn Falco’s talk about using process files as blueprints for Agentic AI, from Craft Conference 2025. I learnt a lot from this, and it was a great starting point to get me going with LLMs. In that video, Llewellyn does over an hour of live augmented coding on stage, and one of the first things I noticed was that he uses the LLM for almost everything. I was about to list the things he uses it for, but then I thought that instead, I’d take you through one of my first LLM coding sessions, and how I deliberately tried to be more Llewellyn.

## Do LLMs save you time?

One of the things I’ve noticed is how much people argue over whether LLMs can genuinely save them time and effort when coding. My goal is to have the machine do the boring, fiddly stuff for me, so I can focus on the more interesting stuff. “Time and effort when coding” are notoriously difficult to measure, but one of the things I already do consistently (because I’m me) is time everything I do (spreadsheets ftw). So one of the things I’ve been doing is deliberately experimenting with different approaches and timing the results. Time is one metric I’m using to track any benefit I’m getting from the LLMs.

In one of those first coding sessions, I was experimenting with different approaches to a coding task. I chose the FizzBuzz kata. Using LLMs for katas is kind of pointless (I agree with Emily Bache on this), but my goal was not to practice the kata. My goal was to use the LLM in different ways and measure the results, to see whether and how it could make coding easier and more fun.

## Be More Llewellyn

So. As well as using a standard Test Driven Development (TDD) approach (which I’ll write about separately), I also asked the LLM (Claude) to write the code first and then write the tests. I’m not generally a fan of writing code first and tests second, but I was deliberately comparing and measuring different approaches. And here’s where I had to keep reminding myself to be more Llewellyn, in the following ways:

1. I asked the LLM to write code to implement FizzBuzz. That is, I asked it to write code that would output the numbers from 1 to 100, but substituting “Fizz” for multiples of three, “Buzz” for multiples of five, and “FizzBuzz” for multiples of fifteen (i.e. multiples of both three and five). There’s a minimal sketch of the kata after this list.

   Temptation: To describe the FizzBuzz challenge to the LLM, manually, myself.

2. Claude wrote the code for me, very quickly.

   Temptation: To look at the code Claude generated for me.

3. The code worked. But did it really? Had I missed anything?

   Temptation: To look at the code.

4. Claude wrote some tests for me, again very quickly. I saw it chuntering away in the output, describing a raft of tests which clearly contained redundancy and duplication.

   Temptation: To look at the test code. I’m already pretty confident I won’t like it.

5. I ran the tests. They failed! I started to look at the test code, and saw a lot of comments, stream-of-consciousness type stuff. I was getting a hint of why they might be failing…

   Temptation: Surely now it’s time to dig into the code? Work out exactly why these tests are failing?

6. The thing is, to Claude, code is just a series of text-based tokens. It’s not something Claude can reason about logically, the way I would. So Claude completely misses the actual problem: two contradictory tests, one of which had been commented on but not removed (there’s a hypothetical example of what that looks like after this list).

   Temptation: Ask Claude to try again to spot the problem.

   Danger: Claude will continue to miss it, because Claude is not clever in the way you might think. Note that in other LLM coding sessions, this has been a real flashpoint for me. Getting sucked down a rabbit-hole is always a danger when building software, but it’s particularly easy to get pulled into the death-loop of “we’re almost there, let’s try one more thing” when working with LLMs. Even though the cost is small, the sunk cost fallacy still applies. Kent Beck has interesting things to say about it in this podcast conversation with Jessica Kerr (which I’ll write more about another time).

   Solution: Finally, now it’s time for me to read and understand the code.
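To make the task concrete, here’s a minimal FizzBuzz sketch. It’s written in Python purely for illustration; it is not the code Claude generated in the session, and I’m not suggesting the session used Python.

```python
# A minimal FizzBuzz sketch (Python, chosen here purely for illustration).
# This is NOT the code Claude generated in the session described above.

def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz output for a single number."""
    if n % 15 == 0:  # multiples of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    # The kata: print the values for the numbers 1 to 100.
    for i in range(1, 101):
        print(fizzbuzz(i))
```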
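And here’s the kind of thing I mean by “two contradictory tests”. Again, this is a hypothetical illustration rather than the tests Claude actually wrote: both tests exercise the same input but assert different answers, so the suite can never be fully green, however many times you ask the LLM to “try again”.

```python
# Hypothetical illustration of two contradictory tests. Not Claude's actual
# output. Assumes the fizzbuzz() sketch above lives in a module called
# fizzbuzz.py (a made-up name for this example). Run with pytest.
from fizzbuzz import fizzbuzz


def test_fifteen_is_fizzbuzz():
    # 15 is a multiple of both three and five.
    assert fizzbuzz(15) == "FizzBuzz"


# This test contradicts the one above. In the session, the equivalent problem
# had been commented on in the test file but the offending test was never
# removed, so one of the pair always fails.
def test_multiples_of_three_return_fizz():
    assert fizzbuzz(15) == "Fizz"
```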
## The Jagged Frontier

Ethan Mollick, author of Co-Intelligence (which I recommend), talks about the jagged frontier. LLMs are great at some things and terrible at others, and the line between the two is jagged and constantly changing. You have to keep pushing and experimenting, working out what they’re good at and what they’re really not good at, as well as how to give them what they need to exploit their strengths and mitigate their weaknesses (process files are great for this - more on them in another post).

In Llewellyn Falco’s coding demo, there are several other examples of him asking the AI to do something rather than doing it himself - most notably in the creation and refinement of process files. There are many things we’re used to doing ourselves, but when we’re working with LLMs, doing them ourselves can be unnecessary and can slow us down.

At this point in the FizzBuzz example, I’m not done. The code isn’t great and the tests are problematic. Getting the LLM to do as much as possible is only one useful principle when working with these tools. There are many others. But it’s a useful example of a broader principle, which is:

**Don’t take anything for granted.**
Speed isn’t everything in software development. Far from it. Often you want to deliberately slow down and take your time. But if the LLM can do things fast and you don’t need - at this point - to do those things yourself, don’t. Let the LLM do its thing.

## Come on this journey with me

I plan to keep writing about this. I’ve already got a raft of draft posts in my pocket. I love to learn, and I love to teach (and I’m really bloody good at it). I use teaching as a way of deepening my own knowledge and pushing myself to learn things more effectively. If you want to know more, you can do the following:
- Don’t miss my next post! Subscribe to my Technical Leadership newsletter and learn a bunch of useful stuff about effective technical leadership.