experiments with claude, part ⅴ: ClaudeLog
Originally, this was going to be the last in my series of stuff I did with Claude that I found compelling, but… the news, good or bad, is that I’ll be posting at least one more soon. This one, though, is definitely the piece of work that convinced me that Claude was useful for more than mechanical transformation.
Project Five: ClaudeLog
In some of my previous posts, I posted links to transcripts of my chats with Claude, including its internal monologue, its tools used, and other stuff that you won’t see if you just copy text out of the viewer. Those transcripts were generated by a tool that I built with Claude, which I’m going to talk about, now.
I think that the experience of using Claude in its current form (as opposed to what we saw from coding agents a year ago) is fairly stunning. It sometimes screws up, it sometimes makes errors, it sometimes takes four passes at doing things that any normal person would routinely do in one… but on the other hand, it can generate large, complex, readable, correct pieces of software from a standing start. Saying “it’s real good, folks” is not sufficient. I think that to understand what it can do, you should see for yourself. This is not to say that there are no reasons to hesitate or to worry about the serious impact that LLMs are having, and will continue to have. But one criticism I continue to see is “these things spew out unmaintainable nonsense”, which is not a claim that really stands up to much real use.
Also, one friend recently said to me, “I want to be a programmer, not a copy editor.” I had to explain that while, yes, you do need to read and think about possible errors in agent-generated code, the experience is much more one of design and construction than of proofreading.
Since not everybody is going to say, “Hey, Rik, let’s pair and look at Claude,” and because I am not going to pair with every rando who might ask, I thought it would be good to provide a thorough transcript. I knew that Claude keeps a detailed log of its sessions (even though, amusingly, Claude claimed it knew of no such thing).
I had looked through the JSONL files in which sessions were stored, and the data looked a bit messy, but probably sufficient. Without giving it too much investigation, I opened up Claude Code and said…
I want to provide transcripts of my interactions with Claude to colleagues who are learning to use the system. I understand that my ~/.claude directory contains transcripts of the chat sessions. Please build a TUI application that:
- lets me drill down through projects to individual sessions, showing previews of the sessions as I go
- lets me select a session to be transcribed for my use
- produces a Markdown summary of the session that I can embed in a document
While your general instructions say to prefer Perl, in this case I would like you to use whatever language you believe will achieve the results most successfully with a pleasant terminal interface.
You can, of course, read the transcription of this whole conversation, produced by the program that the conversation eventually led to. There’s a lot of experimentation, a fair bit of stumbling, and overall you can see how I discover what it is I want while I’m building it. This is normal development, but…
In normal development with such a significant “discovery” aspect, it’s very common to spend a lot of time upshifting and downshifting. That is, first I imagine the general structure of what I want. I write a pseudo-code outline of the high-level code. Then I begin converting individual pieces into real code. Almost continuously, I’m shifting between design thinking and implementation. These are (for me, and I suspect for others) distinct ways of thinking, and while it’s not “context switching”, there is, I think, an analogous overhead.
Using Claude, I am really focusing on one of those angles at a time. I started with “here is a very rough outline” and within 20 minutes, I had a working program. I never, ever had to switch into implementation thinking to get there. Then I had many tight, quick exchanges in the form, “This is wrong, change it” or “I’m ready to describe the next feature”.
At the top of the transcript, you’ll see this line:
Duration: 7h 50m (3h active)
This means that from the first to last event in the transcript, about eight hours passed on the clock, but I was only engaged in the chat for about three. Probably I took a long lunch in there, or maybe worked on something more pressing for a while. Or I just stopped and thought about it, or spent time reading transcripts and thinking about what could be better.
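For a sense of how a figure like that can be computed, here is a minimal sketch. It is not ClaudeLog's code: the five-minute idle threshold, the function names, and the formatting rule are all assumptions made for illustration. The idea is to add up only the gaps between consecutive events that are short enough to count as engagement.

```python
from datetime import datetime, timedelta

# Assumption: any gap longer than this counts as idle time, not active time.
IDLE_THRESHOLD = timedelta(minutes=5)


def durations(timestamps, idle_threshold=IDLE_THRESHOLD):
    """Return (wall_clock, active) for a collection of event datetimes."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return timedelta(0), timedelta(0)

    # Wall-clock time runs from the first event to the last.
    wall_clock = ordered[-1] - ordered[0]

    # Active time sums only the gaps short enough to count as engagement.
    active = timedelta(0)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= idle_threshold:
            active += gap
    return wall_clock, active


def format_duration(delta):
    """Format a timedelta as '7h 50m', '3h', or '12m' (a guessed style)."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    if hours and minutes:
        return f"{hours}h {minutes}m"
    if hours:
        return f"{hours}h"
    return f"{minutes}m"
```

The threshold is the interesting knob here. Set it too low and time spent reading a long response gets counted as idle; set it too high and a coffee break counts as work.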
By the end of the day, I had a really useful program. The HTML it was generating was definitely fit for purpose. On the other hand, I made the mistake of looking at the code…
Look, it wasn’t absolutely impenetrable. It just felt very, very amateurish. It got the job done, and I could read it, but there was very little abstraction. There was nearly no separation of concerns. There were no real “layers” to speak of. And when layers did exist, they often duplicated work.
Hardly surprising: if you imagine Claude as often analogous to a junior programmer (not a great analogy, but often useful), and you imagine me as the lousy boss who kept saying, “Implement this feature immediately and I don’t care about code quality”, of course there was a ton of debt. And of course it racked up fast, because I was racking up features fast. The question I needed to answer was: Will Claude’s code always be unmaintainable after a short while?
I decided to answer that the next day. Instead of adding more features, I’d just look at fixing the code. It was already clear to me how the code should’ve been structured. I just didn’t tell Claude about it, because I was focused on features. Why didn’t Claude architect it well from the beginning? Because nobody asked it to. This seems sort of stupid, but it’s often how this stuff works.
So, the next day, I started a new session. There were two key parts of this:
I am very happy with the output of this program. I would now like to begin some serious refactoring for maintainability. The first step should be fairly simple. I would like you to split out the code that takes a session JSONL file and emits HTML. Put that in its own module, so that the “build HTML from JSONL” can be called without the session browser being involved.
…and…
Next, I would like to separate out the generation of HTML from the interpretation of the JSONL file. I propose the following:
- a SessionFile object that represents the log itself, and is created with a static method like
  session_file = SessionFile.from_path(filepath)
- a SessionLog object that represents a sequence of SessionTurn
- a SessionFile has a SessionLog, which is computed lazily when requested
- a SessionTurn is either a UserTurn, a ClaudeTurn, a SystemTurn, or an IdleTurn
- non-IdleTurn turns have a sequence of TurnEvents, which are things like the chat, the tool use, the thinking, and so on
- the HTML generator is passed an already-computed SessionLog, which it then just formats, rather than formatting and event-aggregating all at once
This is a significant change. Make a plan, think hard, and ask me any questions you need to ask up front.
There’s a bunch of other stuff that went on, but these are the big ones. You can read the whole refactoring transcript.
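To make the proposed layering concrete, here is a minimal Python sketch of those objects. The class names come from the prompt above, but everything else is an assumption for illustration: the JSONL schema (one object per line with `role`, `text`, and an optional `kind` key) is invented and is not the real session log format, `EventfulTurn` and `render_html` are hypothetical names, and idle detection is left out entirely. This is not ClaudeLog's actual code.

```python
from __future__ import annotations

import html
import json
from dataclasses import dataclass, field
from functools import cached_property
from pathlib import Path


@dataclass
class TurnEvent:
    kind: str  # e.g. "chat", "tool_use", "thinking"
    text: str


@dataclass
class SessionTurn:
    """Base class for every kind of turn."""


@dataclass
class IdleTurn(SessionTurn):
    seconds: float = 0.0  # idle turns carry no events


@dataclass
class EventfulTurn(SessionTurn):  # hypothetical helper, not named in the prompt
    events: list[TurnEvent] = field(default_factory=list)


class UserTurn(EventfulTurn): ...
class ClaudeTurn(EventfulTurn): ...
class SystemTurn(EventfulTurn): ...


@dataclass
class SessionLog:
    turns: list[SessionTurn]


class SessionFile:
    def __init__(self, path):
        self.path = Path(path)

    @classmethod
    def from_path(cls, filepath):
        return cls(filepath)

    @cached_property
    def log(self) -> SessionLog:
        """Parse and aggregate on first access only, then reuse the result."""
        turn_types = {"user": UserTurn, "assistant": ClaudeTurn, "system": SystemTurn}
        turns: list[SessionTurn] = []
        with self.path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                record = json.loads(line)  # invented schema: role / kind / text
                turn_type = turn_types[record["role"]]
                event = TurnEvent(record.get("kind", "chat"), record["text"])
                if turns and type(turns[-1]) is turn_type:
                    turns[-1].events.append(event)  # same speaker: extend the turn
                else:
                    turns.append(turn_type(events=[event]))
        return SessionLog(turns)


def render_html(log: SessionLog) -> str:
    """Format an already-computed SessionLog; no parsing happens here."""
    parts = []
    for turn in log.turns:
        if isinstance(turn, IdleTurn):
            parts.append(f'<p class="idle">Idle for {turn.seconds:.0f}s</p>')
            continue
        items = "".join(
            f"<li>{html.escape(e.kind)}: {html.escape(e.text)}</li>" for e in turn.events
        )
        parts.append(f'<section class="{type(turn).__name__}"><ul>{items}</ul></section>')
    return "\n".join(parts)
```

The prompt asks for a static method; a `classmethod` is the more usual Python spelling of the same constructor idea. The point of the split is that `render_html` never sees raw JSONL, so a second output format would only need a second formatter.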
Claude thought for a while, then asked me some (very reasonable) questions. I answered them, and Claude produced a plan. Pretty soon, it all worked. The program still did just what it did before, but now the code made sense, and it was much easier to add some new features, because they could be added as small changes to a few layers, instead of changes to the one big ball of mud.
Part of what made this easy was that up front I said, “save a copy of input and output now, so that while you refactor, you can keep checking you didn’t change the output”. Claude did this, checking the output repeatedly as it worked.
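That kind of check is sometimes called a golden-file or snapshot test. A minimal sketch of the idea in Python follows. The function names and file layout are hypothetical, `convert` stands in for whatever turns the input text into output, and this is not what Claude actually ran.

```python
import difflib
from pathlib import Path


def save_baseline(convert, input_path, baseline_path):
    """Run the converter once, before refactoring, and keep its output."""
    source = Path(input_path).read_text(encoding="utf-8")
    Path(baseline_path).write_text(convert(source), encoding="utf-8")


def diff_against_baseline(convert, input_path, baseline_path):
    """Return unified-diff lines; an empty list means the output is unchanged."""
    expected = Path(baseline_path).read_text(encoding="utf-8")
    actual = convert(Path(input_path).read_text(encoding="utf-8"))
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
        )
    )
```

Calling `save_baseline` once up front and `diff_against_baseline` after every refactoring step gives a cheap yes-or-no answer to “did the behavior change?”, and the diff shows where if it did.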
Again, a lot of other stuff went on, but I think this is an accurate summary.
One way of looking at this as a two-stage process is “Rik should’ve made sure Claude did it right to start”, and that’s an okay position, but it brings back some of that upshifting and downshifting that I mentioned.
Another option is, “Rik should’ve given Claude standing instructions to pay more attention to architecture.” Yes, probably that would be useful, but also then I’d probably be spending more time thinking about its architecture, because it would be talking about its architecture.
I think the best view might be: because fixing the design later is at least sometimes not as costly (in time, not to mention morale), the human is given more freedom to choose the approach that they will find more satisfying or pleasant. In the end, we want the code to be correct, reliable, readable, and maintainable. The path we take to get there isn’t necessarily all that relevant. (But please leave a good git history.)
I’ll publish the code for ClaudeLog at some point.