LLM Backend Development Reflection
Originally written for Dr. Martin Kellogg's CS 485: AI-Assisted Software Engineering
Essay Prompt:
Do you feel more or less confident in backend code compared to frontend code generated by LLMs?
How do you test what you can't see?
I feel measurably less confident when generating backend code with an LLM compared to frontend code. I'm hesitant to say that I feel confident in LLM-generated code at all. Not because I don't think LLMs are capable (I think we're well beyond that discussion), and definitely not because I think I could write better code, but because I simply don't know enough. Not only am I not an experienced software engineer, technically I am not even a software engineer. I'm still a student, so aside from someone with absolutely zero experience, I'm about as low on the totem pole as you can get.
More important than my own inexperienced observations: can even the most experienced software engineer skim the thousands of lines of code generated by an LLM and say, "Yep, that's good code!"? They can probably get a feel for it, and if it's really bad code, they'll be able to spot it. But if you have ever been put onto a project you're not familiar with—especially one that uses a language or framework you're not familiar with—you know how difficult the onboarding process can be. It can be days, weeks, or even months, depending on the size of the codebase, before you feel comfortable saying you understand how all the intricate bits and pieces work with one another.
For my group's project, this is especially true. As of the time of writing, our main branch has an estimated 20,678 lines of code¹. This is an absurd amount of code that our small team of students has managed to produce over the course of just 25 days in awfully sporadic, unfocused bursts of effort. Not to mention, we have written none of it—not a single line. So if you asked me about the technical details of this project, frankly, I would have to ask Claude to give you an answer. I truly do not know how our application works. If you took away the LLM and asked me to implement a feature, my ability to navigate the codebase would be the same as if you had thrown me into an entirely different project.
When working on the frontend, we could open up the application and test things without much difficulty:
I've asked the model to create a button.
Did it create the button?
It did.
Great!
Backend development is a completely different ball game, regardless of whether you're using an LLM. It involves a high-level, often abstract understanding of how data will be stored, how it will get from point A to point B, how users will interact with the system, how the frontend and the backend communicate, how the backend relies on external APIs, and so much more. The early stages of backend development are particularly abstract. Database schemas and API endpoints seem to make sense, but without careful, thoughtful analysis and rigorous testing, there's no way to know for sure. Testing becomes the obvious backbone of the system. Just as mocking was there to support our frontend development, testing provides the foundation of our backend development. This is true of any project. How else can we be sure that our schemas are correct and our endpoints work as intended? Working with an LLM makes testing significantly more important. This means our project should have unit testing for each file and each function to ensure expected outputs, integration testing to make sure the modules work together as desired, and acceptance testing to ensure that the program meets the original specifications.
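To make the unit-testing layer concrete, here is a minimal sketch of what testing an LLM-generated backend function might look like. The function `validate_user_payload` is a hypothetical example, not code from our actual project; the point is that each function gets small, direct checks on its expected outputs.

```python
# Hypothetical backend helper: validate a user-creation request body.
# In a real project this would live next to the endpoint handler the
# LLM generated, and the tests would be written (and read!) by a human.
def validate_user_payload(payload: dict) -> list:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors = []
    email = payload.get("email", "")
    if "@" not in email:
        errors.append("invalid email")
    if len(payload.get("password", "")) < 8:
        errors.append("password too short")
    return errors


# Unit tests: one expected-output check per behavior.
def test_valid_payload_has_no_errors():
    payload = {"email": "user@example.com", "password": "longenough"}
    assert validate_user_payload(payload) == []


def test_missing_fields_are_reported():
    errors = validate_user_payload({})
    assert "invalid email" in errors
    assert "password too short" in errors


test_valid_payload_has_no_errors()
test_missing_fields_are_reported()
```

Integration and acceptance tests would sit above this layer, exercising the real database and HTTP endpoints rather than a single function, but the principle is the same: the tests are the part of the system we can actually see.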
That sounds like a lot of work, though... let's just have the LLM write those up 😁.
Footnote
¹This number was previously cited, incorrectly, as ~4.25 million. That figure was calculated using cloc while excluding the llm-logs/* directory and all markdown files. However, I failed to realize it had also included node_modules and other dependencies.
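For the curious, a corrected count can be sketched with a cloc invocation like the one below. The flags are real cloc options, but the exact command used for the figure above is my reconstruction, not a record of it; verify the behavior against your installed cloc version.

```shell
# --vcs=git makes cloc honor .gitignore, which keeps node_modules and
# other installed dependencies out of the count.
# --exclude-dir skips the llm-logs/ directory; --exclude-ext=md skips
# markdown files.
cloc . --vcs=git --exclude-dir=llm-logs --exclude-ext=md
```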