Tinkering with a “hands-off” agent

It’s great to write another entry for Java Advent this year! Last year, I wrote about how you can use Timefold Solver, an optimization library written in Java, to solve planning problems such as employee scheduling or, in true holiday spirit, optimizing Santa’s travel route.

This year, as the world is now all going “agentic”, I was curious to learn more about agents and decided to start tinkering myself. I started a pet project with a basic idea:

“Could I generate a fully working optimization application purely from a simple problem description?”

Here’s where that curiosity led.

Basics: Framework

I like my pet projects to incorporate some ideas/frameworks/concepts I’ve played with before, while still challenging me to learn some new things.

There was no doubt in my mind about using Java, so I started looking for Java-compatible agentic frameworks. I ran into Langchain4J Agentic, the Agent Development Kit and a few others. Since I had the most experience with Langchain4J already, that became my framework of choice.

I scoured the docs to get a sense of what building an agent was like and then just got to work. I just needed 1 more thing: a benchmark / test scenario.

Let’s Sing!

I was looking for an example scenario. Something I would love to optimize and that, in my typical fashion, was:

  • Silly (can’t be too serious when messing around with technology).
  • Relatively simple (so everyone could understand it).
  • Somewhat practical.

After talking to Wouter Bauweraerts from the Belgian Java Community (hi Wouter 👋), I settled on a karaoke bar scheduler. Yes. Karaoke.

Here’s the prompt I wanted the agent to digest:

I run a Karaoke bar. I have 2 stages and I want to schedule songs following these rules:

– Avoid having the same singer 2 times in a row
– Avoid having songs from the same artist 2 times in a row

A singer can sign up with a song. The duration of the performance depends on the length of the song.
I want to be able to create a clear clock of when the next song will start, so include timing.

Time to get to work on the agent!

(Note: All example prompts shown here are abbreviated. The real ones are as long as a 10-year-old’s wishlist for Santa.)

Agent v0.1: Just an LLM prompt in disguise

The first version of my agent was pretty simple: just take the input prompt, add some information in the @SystemMessage and @UserMessage to steer the result, and hope for the best. After all, with all the hype, this was surely supposed to work? I even scoped it to only write the Timefold-related code, not bothering with any sort of user interface.

https://gist.github.com/TomCools/c227ab745e1ae5a9c6f145cfe7e680e1
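For those who don’t click through to the gist, here is a minimal sketch of what such a Langchain4J AI service can look like; the interface and method names are illustrative, not the exact gist contents.

```java
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

// Illustrative agent interface: one method, steered by a system message and a user message template.
interface CodingAgent {

    @SystemMessage("""
            You are an expert Java developer writing Timefold Solver code.
            Return only compilable Java code, no explanations.
            """)
    @UserMessage("Write the Timefold domain and constraint code for this problem: {{it}}")
    String writeCode(String problemDescription);
}

// Wiring it up (the chat model is built in the next snippet):
// CodingAgent agent = AiServices.create(CodingAgent.class, chatModel);
// String code = agent.writeCode(karaokeProblemDescription);
```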

Then I needed to choose the underlying LLM. Langchain4j makes it trivial to change the model the agent uses. As I was still in the exploratory phase, I decided to use a locally hosted LLM, qwen3:8b, which I run on my MacBook using Ollama.

https://gist.github.com/TomCools/625c5c2c88b3e6a046f10bc3fd4dd20a
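A minimal sketch of that model wiring, assuming Ollama’s default local endpoint:

```java
import dev.langchain4j.model.ollama.OllamaChatModel;

public class LocalModel {

    // Local model served by Ollama; the base URL is Ollama's default endpoint.
    public static OllamaChatModel create() {
        return OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("qwen3:8b")
                .build();
    }
}
```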

As I was patiently waiting for my brand new agent to give its very first result, I was already dreaming of the awesome code that would come out of that writeCode method. As some of you might expect… it’s not that simple, unless the end result you wanted was a dumpster fire of mangled code.

https://gist.github.com/TomCools/c424652a075624b879c57e61174b7536

Hey, at least it did output something! So let’s see how we can improve this.

Agent v0.2: Limiting scope even more

I felt I was asking too much from my single agent (or maybe that was just my human bias talking), so I decided to split it up into 2 separate coding agents: 1 for the Java domain code, 1 for the Java constraint stream code.

Constraint Streams is the syntax you use with Timefold to write constraints, such as the “Avoid having the same singer 2 times in a row” constraint.
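To make that concrete, here is roughly what such a constraint looks like in Constraint Streams. The Performance class and its fields are illustrative, not the exact model my agents generated.

```java
import ai.timefold.solver.core.api.score.buildin.hardsoft.HardSoftScore;
import ai.timefold.solver.core.api.score.stream.Constraint;
import ai.timefold.solver.core.api.score.stream.ConstraintFactory;
import ai.timefold.solver.core.api.score.stream.ConstraintProvider;
import ai.timefold.solver.core.api.score.stream.Joiners;

// Hypothetical domain class: in the generated model this would be a @PlanningEntity
// whose slot assignment is the planning variable.
record Performance(String singer, String artist, int slotIndex) {
}

public class KaraokeConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory factory) {
        return new Constraint[] {
                avoidSameSingerTwiceInARow(factory)
        };
    }

    private Constraint avoidSameSingerTwiceInARow(ConstraintFactory factory) {
        return factory.forEach(Performance.class)
                .join(Performance.class,
                        // same singer...
                        Joiners.equal(Performance::singer),
                        // ...performing in the directly following slot
                        Joiners.equal((Performance p) -> p.slotIndex() + 1, Performance::slotIndex))
                .penalize(HardSoftScore.ONE_SOFT)
                .asConstraint("Same singer 2 times in a row");
    }
}
```

The “same artist” rule follows the exact same pattern, just joining on Performance::artist instead.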

The result was slightly better. The agents were more narrowly focused and seemed able to keep it together. The output, however, was still nowhere near workable, but it was at least starting to look a bit more like real Java code.

Agent v0.3: Better base model

Agents are only as good as their underlying model. Given that I’ve had more success with Claude and ChatGPT when asking similar coding questions, I felt it was time to replace the model used by my agent with Claude Sonnet 4.5 (20250929).

With this change in model, I now had to start paying for my tokens. I proceeded a bit more carefully and sometimes moved back to Ollama for basic runs. Langchain4j makes this a 1-line code change 💗.
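Under the hood, that swap is roughly this; the model id string is my assumption based on the version mentioned above.

```java
import dev.langchain4j.model.anthropic.AnthropicChatModel;

public class ClaudeModel {

    // Hosted model instead of the local Ollama one; everything else in the agent stays the same.
    public static AnthropicChatModel create() {
        return AnthropicChatModel.builder()
                .apiKey(System.getenv("ANTHROPIC_API_KEY"))
                .modelName("claude-sonnet-4-5-20250929") // assumed model id
                .build();
    }
}
```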

Changing the model immediately brought my agent to life. The returned code at least looked like it was written by someone who had seen Java before. However, copy-pasting the code into my IDE just showed how glaringly wrong it actually was. It’s the sort of code that would never pass a code review… oh wait! 💡

Agent v0.4: Review and Feedback

I had seen a section in the Langchain4J Agentic documentation about sub-agents and agentic workflows, so I decided to try it. Instead of a single “in and out” workflow, I added a Review agent, which reviews the written code, provides feedback and then puts the coding agent to work again with that feedback.

The coding agent and the code reviewer work in a loop. Instead of stopping after a fixed number of review rounds, you could also let the reviewer score the solution and only exit the loop once a certain score has been reached.
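Here is a hand-rolled sketch of that loop, reusing the CodingAgent interface from the v0.1 snippet; Langchain4J’s agentic module offers workflow builders for this, but the sub-agent prompts, method names and the fixed three rounds below are my own.

```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

// Illustrative sub-agents: one reviews, one reworks the code based on the feedback.
interface ReviewAgent {
    @SystemMessage("You review Timefold Solver code and list concrete problems. Reply APPROVED if there are none.")
    @UserMessage("Review this code:\n{{it}}")
    String review(String code);
}

interface ReworkAgent {
    @SystemMessage("You fix Java code based on review feedback. Return only the full corrected code.")
    @UserMessage("Code:\n{{code}}\n\nReview feedback:\n{{feedback}}")
    String rework(@V("code") String code, @V("feedback") String feedback);
}

class ReviewLoop {

    // Coder and reviewer take turns until the reviewer approves or we run out of rounds.
    static String generate(CodingAgent coder, ReviewAgent reviewer, ReworkAgent reworker, String problem) {
        String code = coder.writeCode(problem);
        for (int round = 0; round < 3; round++) {
            String feedback = reviewer.review(code);
            if (feedback.contains("APPROVED")) {
                break;
            }
            code = reworker.rework(code, feedback);
        }
        return code;
    }
}
```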

The cool thing is that we can actually give these agents tools to work with… and what better tool to give an agent that needs to write compilable code than an actual Java compiler?

https://gist.github.com/TomCools/72342a51c40b92b5a5dcd4ad81b02f1b
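The gist has the real tool; a stripped-down sketch of the idea looks like this (a real version would also put the Timefold jars on the compiler classpath).

```java
import dev.langchain4j.agent.tool.Tool;

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class JavaCompilerTool {

    @Tool("Compiles a Java class and returns OK, or the compiler errors when it does not compile")
    public String compile(String className, String sourceCode) throws Exception {
        // Write the generated source to a temporary directory so javac can pick it up.
        Path dir = Files.createTempDirectory("agent-compile");
        Path sourceFile = dir.resolve(className + ".java");
        Files.writeString(sourceFile, sourceCode);

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        ByteArrayOutputStream errors = new ByteArrayOutputStream();
        int result = compiler.run(null, null, errors, sourceFile.toString());
        return result == 0 ? "OK" : errors.toString();
    }
}
```

A tool like this gets handed to the agent via the tools(...) method on the AiServices builder, after which the model can decide to call it and react to the compiler errors it gets back.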

The resulting code was pretty OK and usually compiled just fine, but it still made some very obvious mistakes from a Timefold perspective (e.g. adding both @ShadowVariable and @PlanningVariable to a single field).

Agent v0.5: Avoiding the same mistakes (easyRAG)

To avoid these recurring mistakes, I tried to give the agents a bit more information to work with by using RAG (Retrieval Augmented Generation). As I just wanted a simple solution here, I used the easyRAG features of Langchain4j, which let me point at a directory of documentation so the agent can use the documents it contains.

https://gist.github.com/TomCools/1159f3c86fc8f91ec62e5da3fc4be63e
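In code, the easyRAG setup is roughly this; it needs the langchain4j-easy-rag dependency, and the directory-per-agent layout is described below.

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

public class AgentDocs {

    // Load the documents for one agent, embed them in memory and expose them as a ContentRetriever.
    static ContentRetriever retrieverFor(String directory) {
        List<Document> documents = FileSystemDocumentLoader.loadDocuments(directory);
        InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
        EmbeddingStoreIngestor.ingest(documents, embeddingStore);
        return EmbeddingStoreContentRetriever.from(embeddingStore);
    }
}
```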

We have a lot of documentation for Timefold Solver; generated to PDF, it’s a good 400 pages. Not all of that is relevant for every agent, so I put simplified documentation in a separate subdirectory for each agent and attached it to that agent with a ContentRetriever.

I ran it a couple of times to see the impact these documents had. Whenever a new mistake seemed to become prevalent, I added more information to the relevant documents to improve the result.

This also led me to make a couple of updates to our actual documentation. Turns out that if an LLM can’t understand your documentation, humans will struggle with it as well.

At this point, it was getting slightly painful to see my API credits decline. With the sub-agents in a loop, I was burning through a lot of tokens, so I tried to figure out: how could I get a better result without spending so many tokens?

Agent v0.5: (Plant)UML to the rescue!

Upon inspection, I noticed a lot of the comments by the review agent were not about code details at all, but about the structure of the classes and the placement of the annotations Timefold Solver needs. When solving these problems myself, these are all things I’d add to a diagram before even opening my editor… so why not do the same here?

Instead of jumping straight to the code, I introduced 1 new agent. This “modelling” agent would “model” how the solution should look and format it as PlantUML. I gave this agent the documentation we have for modelling problems with Timefold Solver and explicitly asked it to add some details about the design choices it made to the PlantUML diagram.

Generated Diagram, notes contain the design decisions. It did a pretty decent job here!

Now I had not only made “thinking about the structure of our solution” much more explicit, I also had a reviewable asset that is passed between agents: the PlantUML diagram. As it turns out, PlantUML is an excellent format for passing information about the structure of your classes between agents without handing over the entire codebase.

I adjusted the coding agents to accept the PlantUML diagram as input, so they did not get overloaded with residual information about the problem statement.
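Sketched out, the model-first pipeline looks something like this; the interfaces and prompts are mine, not the exact ones from the project.

```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

// Illustrative agent interfaces for the model-first pipeline.
interface ModellingAgent {
    @SystemMessage("""
            You design Timefold Solver domain models. Output only a PlantUML class diagram,
            with notes explaining the planning entities, planning variables and design decisions.
            """)
    @UserMessage("Design the model for this problem: {{it}}")
    String model(String problemDescription);
}

interface DomainCodingAgent {
    @SystemMessage("You implement Timefold Solver domain classes exactly as described by a PlantUML diagram.")
    @UserMessage("Implement the Java domain classes for this diagram:\n{{it}}")
    String writeDomainCode(String plantUml);
}

class ModelFirstPipeline {

    // The diagram, not the raw problem description, is what flows between the agents.
    static String generateDomainCode(ModellingAgent modeller, DomainCodingAgent coder, String problem) {
        String plantUml = modeller.model(problem);
        return coder.writeDomainCode(plantUml);
    }
}
```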

This massively improved consistency and reduced hallucinations. It now worked well enough that I ventured into the much harder part.

Agent v0.6: UI and Backend

Having a nice scheduling core is great, but hosting a karaoke night with hardcoded Java will probably not be very conducive to a great evening. 🙂 So we still need a UI.

I don’t think that having GenAI scaffold an entire project is a good move here, so I looked into tools that would help me create a project structure.

In the Java world, one potential solution is to rely on JHipster and, more specifically, the JHipster Domain Language (JDL). JDL allows you to describe a project: entities, relationships and some other properties.

https://gist.github.com/TomCools/bdda56e9826fcf37779a781fda05c1e5

These are all things we can let an LLM fill in, based on the PlantUML and the problem description. You can then create a full application with a single command (jhipster import-jdl model.jdl), which I wrapped into a Tool so an agent can create the JDL and generate a simple application for my problem.
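Wrapping that command into a Langchain4J tool might look roughly like this; the class name and the messages it returns are my own sketch.

```java
import dev.langchain4j.agent.tool.Tool;

import java.nio.file.Files;
import java.nio.file.Path;

public class JHipsterTool {

    @Tool("Generates a JHipster application from the given JDL model")
    public String generateApplication(String jdl) throws Exception {
        // Write the LLM-produced JDL to disk, then shell out to the JHipster CLI.
        Path jdlFile = Path.of("model.jdl");
        Files.writeString(jdlFile, jdl);

        Process process = new ProcessBuilder("jhipster", "import-jdl", jdlFile.toString())
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        return exitCode == 0
                ? "Application generated"
                : "jhipster import-jdl failed with exit code " + exitCode;
    }
}
```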

And now we have a fully working application (Angular + Spring Boot), including a login screen, metrics and CRUD pages for every entity involved, all at a fraction of the token cost we’d have if we let GenAI do everything.

Conclusion

While I still have to resolve some gaps, like actually connecting Timefold Solver to the rest of the JHipster generated code and creating a better scheduling UI, I think I’ll stop this pet project here for now. For me, the real value of these tinkering projects isn’t what I end up building. It’s everything I learn while fumbling toward it.

Here are my main lessons.

  • Having inspectable intermediate formats makes the development process much easier and the end result way more auditable.
  • At the moment of writing, it’s still much better to have a human in the loop. Even tiny mistakes made in one of the first steps can lead to a failed result in the end. Being able to intervene and correct makes the whole thing so much easier than this “fully autonomous” thing I was trying.
  • Better models = better results. We may not want to spend the money, but it did change my results from absolute garbage to something that quite often works. I was very happy with Langchain4J in this regard: I only needed to change 1 line of code to try a different model.
  • Learning is still so reinvigorating! It’s been a while since I’ve felt a big jolt of love for a pet project, but this one struck like a lightning bolt. Yes, it can get annoying sometimes to get GenAI to do what you want, but as a dad raising a 2-year-old, I have been trained to be patient and steer it in the right direction.

Happy Holidays! 🙂

 

Author: Tom Cools

Developer Relations Engineer for Timefold, Java Champion and leader of the Belgian Java User Group. Tom has a decade’s worth of experience delivering systems and loves to share not only knowledge but also passion for our craft. You can read more at his blog (http://www.tomcools.be) or follow him on Bluesky (@tomcools.be).
