
Agentic AI Workflows for OpenJDK Development

OpenJDK is a large and complex codebase, and navigating it efficiently takes years of experience. This post is about how I’ve been using agentic AI workflows to move faster in that environment, what has worked for me, and where I, as a human, still provide the most value.

At the time of writing (2026-05-23), work in the OpenJDK community should follow the OpenJDK Interim Policy on Generative AI. A short summary of the policy is:

Contributions in the OpenJDK Community must not include content generated, in part or in full, by large language models, diffusion models, or similar deep-learning systems.

Contributors in the OpenJDK Community may use generative AI tools privately to help comprehend, debug, and review OpenJDK code and other content, and to do research related to OpenJDK Projects, so long as they do not contribute content generated by such tools.

Agentic Workflows

In my work I use git worktrees to separate the larger features I’m developing. Worktrees are a great way to isolate JDK builds and source-code changes from each other, and also to isolate agents from each other, which is why many of the skills I create are centered around them. Without an isolation layer like this, I’ve seen agents interfere with each other by changing nearby files, which invalidates the other agents’ assumptions and can lead to wrong conclusions.
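As a rough illustration of per-agent isolation, here is a minimal Python sketch that gives each agent its own worktree and branch. The directory layout (a sibling directory named <repo>-<agent>) and the agent/<name> branch scheme are hypothetical conventions chosen for illustration; only the git worktree add command itself is standard git.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, agent: str, base_ref: str = "HEAD") -> list[str]:
    """Build a `git worktree add` command that gives `agent` its own checkout and branch.

    Hypothetical conventions: the worktree lives in a sibling directory named
    <repo>-<agent>, on a new branch named agent/<agent>.
    """
    worktree_dir = repo.parent / f"{repo.name}-{agent}"
    branch = f"agent/{agent}"
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree_dir), base_ref]


def create_worktree(repo: Path, agent: str, base_ref: str = "HEAD") -> Path:
    """Run the command built above and return the new worktree's path."""
    cmd = worktree_add_command(repo, agent, base_ref)
    subprocess.run(cmd, check=True)
    return Path(cmd[-2])


if __name__ == "__main__":
    # Prints: git -C /work/jdk worktree add -b agent/thp-lead /work/jdk-thp-lead HEAD
    print(" ".join(worktree_add_command(Path("/work/jdk"), "thp-lead")))
```

Since OpenJDK places build output in a build/ directory inside the source tree by default, each worktree also ends up with its own builds. The matching cleanup once an agent is done is git worktree remove <path>.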

Investigations (bugs, review, understanding)

One of the more interesting use cases of AI has been approaching large changes, or large systems in general, by using multiple agents to investigate different leads in parallel. This has been very helpful in my work on larger code changes and ongoing projects in OpenJDK.

To achieve this, I have a set of skills and MCP server(s) that I use in agentic workflows. The skills are most powerful when used in combination, but each one is also designed to be usable on its own in ad hoc scenarios. Below is an overview of the agentic workflow that the skills and MCP servers fit into:

Figure: A flowchart of the agentic workflow, showing its main parts: parallel agents, a serialized build queue, markdown handoffs, and structured/unstructured tools.

  1. jdk-worktree-build

    This skill details how to configure and build the JDK, with specific instructions on how to name configurations, when to use different configurations/builds for specific purposes, when to reconfigure, and much more. Realistically, much of this skill could be derived from OpenJDK’s doc/building.md, but right now it also contains a lot of knowledge I’ve gathered from working on OpenJDK manually for nearly two years.

    Since I’m using worktrees in my “non-agentic” work as well, I can reuse this skill when hooking the AI into other work, which is nice.

  2. jdk-build-queue (MCP server and skill)

    Building the JDK requires a significant amount of system resources, so I only want one build running at a time on my machine. To serialize builds among agents, I have an MCP server that combines a lock and a queue: agents request to start a build, or place themselves in the queue to build next. When a build is finished, the agent releases the lock and signals the next agent that it is their turn.

    The skill of the same name (jdk-build-queue) contains instructions on when to use the MCP server, and when it’s fine to run things without serializing through it. Tasks like running a lightweight test without building anything (make test-only), or test-compiling a single source file, can be done without taking the lock. This lets agents make as much progress as possible in parallel without slowing each other down too much.

    I’ve designed the MCP server to only be a wrapper around a lock and queue. Actual builds are handled by the agents themselves, using the jdk-worktree-build skill.

    I went with separate jdk-worktree-build and jdk-build-queue components since skills are easier to adjust and read, which is useful while I’m still figuring out what a good approach looks like. MCP servers are likely more deterministic, but also less flexible. Neither a skill nor an MCP server provides good protection against agent hallucination, since agents can interpret skills non-deterministically, or choose not to use an MCP server at all. I think the optimal workflow would enforce that builds are only possible while holding the build lock. That’s a complex task, but maybe worth investigating in the future.

  3. jdk-lsp-clangd

    This skill, together with an MCP server of the same name, exposes the clangd language server to agents through the Language Server Protocol (LSP). This functionality is built into harnesses like Opencode and Claude Code, but not into Codex. There’s an open issue proposing to add it to Codex at github.com/openai/codex/issues/8745.

    The benefit of exposing clangd to agents is that they can work with code in a much more structured way, instead of only searching through the OpenJDK codebase with grep/ripgrep. Clangd exposes capabilities like listing incoming calls to functions, finding all references to a variable, and showing type hierarchies, among others.

    Since clangd requires a compilation database (like a compile_commands.json file), this skill and MCP server combination hooks into OpenJDK’s own way of generating one, the make compile-commands target, and its default output location.

  4. jdk-tree-sitter

    Just like exposing a language server, tree-sitter gives agents a more structured way to look at code, in this case through the lens of a concrete syntax tree (CST). Strictly speaking, this skill doesn’t do anything specific to OpenJDK and could really be general-purpose, but it does have extra functionality to set up language grammars in case they are not installed on the system.

  5. jdk-investigation-agent

    This is the “orchestrator” skill, describing the steps of the workflow that combines the rest of the skills into a coherent whole. It details how to start an investigation by figuring out which areas to look in (“throwing a wide net”), and then hand off work to a number of agents that create their own worktrees to work from. Each agent then investigates the source code and, depending on what it finds, may spawn more agents to try to create a reproducer once a concrete lead is found, instrument the code to aid the investigation, or compile the JDK when needed using the jdk-build-queue skill.

    Since we’re prohibited from contributing AI-generated content to OpenJDK, this skill also instructs agents not to include source-code changes in their findings, and to explain findings in prose only.
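To make the lock-and-queue idea behind jdk-build-queue (item 2 above) more concrete, here is a minimal Python sketch of the state such a server might keep. It is an illustration under assumptions, not the actual server: the request/release method names are hypothetical, the MCP plumbing (tool registration, transport) is left out, and instead of pushing a notification, release returns the agent that should be told it is their turn.

```python
import threading
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BuildQueue:
    """One build lock plus a FIFO queue of waiting agents (hypothetical sketch)."""

    holder: Optional[str] = None
    waiting: deque = field(default_factory=deque)
    # Guards the state above, since a server may handle tool calls concurrently.
    _mutex: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False, compare=False)

    def request(self, agent: str) -> str:
        """Take the build lock if it is free, otherwise join (or stay in) the queue."""
        with self._mutex:
            if self.holder is None or self.holder == agent:
                self.holder = agent
                return "granted"
            if agent not in self.waiting:
                self.waiting.append(agent)
            return f"queued at position {self.waiting.index(agent) + 1}"

    def release(self, agent: str) -> Optional[str]:
        """Release the lock and hand it straight to the next queued agent, if any.

        Returns the new holder, i.e. the agent to notify that it is their turn.
        """
        with self._mutex:
            if agent != self.holder:
                raise ValueError(f"{agent} does not hold the build lock")
            self.holder = self.waiting.popleft() if self.waiting else None
            return self.holder
```

One thing this sketch deliberately leaves out is a timeout or lease on the lock, which would matter if an agent crashes or forgets to release it mid-build.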

Markdown Handoffs

When agents finish their investigation, they write markdown files with a report of the issues they’ve found. These reports are meant both for me to read, to understand what has been found, and as a starting point if I need to investigate further using an AI. They can also be plugged into separate visualization tools and indexing databases, which I’ve experimented with a bit.

The handoffs also help manage context (see context engineering), by compacting the agent’s findings into a summarized document.
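A handoff file can be as simple as a rendered list of findings. The following Python sketch assumes a hypothetical Finding structure (title, prose summary, relevant source locations, confidence) and includes a crude check that rejects fenced code in a finding, in the spirit of keeping findings prose-only. It is a cheap guard, not a reliable policy safeguard.

```python
from dataclasses import dataclass, field

FENCE = "`" * 3  # a markdown code fence


@dataclass
class Finding:
    """One lead from an investigation agent (hypothetical structure)."""

    title: str
    summary: str  # prose explanation only, no proposed code changes
    locations: list[str] = field(default_factory=list)  # e.g. "path/to/file.cpp:123"
    confidence: str = "medium"


def render_handoff(investigation: str, findings: list[Finding]) -> str:
    """Render findings as a markdown report for a human (or another agent) to read."""
    lines = [f"# Handoff: {investigation}", ""]
    for i, finding in enumerate(findings, start=1):
        # Crude guard: reject fenced code blocks so the report stays prose-only.
        if FENCE in finding.summary:
            raise ValueError(f"finding {i} contains a code block; findings must be prose only")
        lines += [f"## {i}. {finding.title}", "", f"Confidence: {finding.confidence}", "", finding.summary]
        if finding.locations:
            lines += ["", "Relevant locations:"]
            lines += [f"- {loc}" for loc in finding.locations]
        lines.append("")
    return "\n".join(lines)
```

Keeping the report structure fixed also makes the files easier to feed into indexing or visualization tools later.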

Agentic Workflows Conclusion

In informal A/B testing, exposing structured tools like clangd and tree-sitter generally reduced both investigation time and token usage, likely because they reduce the need for repeated file searches and grep-based exploration. It’s difficult to draw precise conclusions from this, since agent behavior is non-deterministic and workflows vary between runs, but the effect is noticeable enough that I now make these tools available in all my agentic workflows.

When investigating several bugs and enhancements using this workflow, I’ve often found that agents place extra weight on mismatches between prose and code rather than on behavioral correctness. They tend to side with the prose when stating what’s wrong, for example treating “TODO” comments as high priority, or assuming that an API’s documentation is always more correct than its implementation, even when the code is the more reasonable of the two. My best guess is that this happens because agents are biased towards natural language, since that probably makes up the majority of their training data.

Agents are undoubtedly good at finding inconsistencies, but less so at knowing what the “correct” approach is in a given situation. They don’t have a good sense of what “matters” when taking an entire system into account. I find this is something you get a good grasp of over time through experience, and it’s one specific area where I, as a “human in the loop”, provide great value.

In more localized settings, agents have shown great value in narrowing down issues and finding or creating reproducers: small Java programs that trigger a certain bug or behavior. Many issues are intermittent or only happen under very specific circumstances, so having a reliable reproducer makes implementing a fix a whole lot easier.
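For intermittent issues, it helps to measure how reliably a reproducer actually triggers the bug. Here is a minimal Python sketch that reruns a command and reports the fraction of failing runs. Treating a non-zero exit status or a timeout as a failure is an assumption; it fits reproducers written to exit with an error when the bug shows up.

```python
import subprocess


def failure_rate(cmd: list[str], runs: int = 50, timeout_s: float = 120.0) -> float:
    """Run a reproducer `runs` times and return the fraction of runs that failed.

    A run counts as failed if it exits with a non-zero status or exceeds
    `timeout_s` seconds (hangs are a common intermittent symptom).
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            # On timeout, subprocess.run kills the child before raising.
            result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            failures += 1
            continue
        if result.returncode != 0:
            failures += 1
    return failures / runs
```

For a hypothetical single-file reproducer Repro.java, this could be called as failure_rate(["java", "Repro.java"], runs=100), since the java launcher can run a single source file directly. Comparing the rate before and after a candidate fix gives a rough signal, not proof, that the fix works.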

Debugging

I’ve used AI in several ways to debug issues in OpenJDK so far, mainly using MCP servers to interact with a debugger.

The first MCP server I tried was LLDB’s built-in MCP server, which was added in the Xcode 26 release. It works by having you start a debugging session yourself, then start an MCP server inside that session, and then connect a local agent to it. This has been useful when I’m already working inside LLDB and want to bring in an agent to answer questions and help me in an ongoing debugging session. However, it falls short when you want the agent to start a debugging session on its own, since MCP servers need to be running before the harness (like Codex) is started.

To get around this, I’ve both created my own MCP servers and tried several from GitHub that act as wrappers around a debugger’s functionality. These work very similarly for tools like GDB, rr and LLDB, providing tools for starting a session, running commands like next and step, setting breakpoints, reverse execution and so on. What’s tricky to get right in a tool like this is preventing the agent from “breaking the sandbox” by executing arbitrary commands through the debugger MCP interface.
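One way to approach the sandboxing problem is to validate every command before it reaches the debugger, using an allowlist rather than a denylist. The Python sketch below is hypothetical: the allowed commands and argument patterns are illustrative choices for GDB-style commands (which rr also uses), and how validated commands are forwarded to the debugger is left out. Anything not explicitly allowed, such as GDB’s shell or python commands, or expressions containing function calls, is rejected.

```python
import re

# Hypothetical allowlist: command name -> pattern its (optional) argument must fully match.
# Anything not listed, e.g. GDB's `shell`, `python` or `!`, is rejected outright.
ALLOWED_COMMANDS = {
    "next": re.compile(r"\d*"),
    "step": re.compile(r"\d*"),
    "continue": re.compile(r""),
    "reverse-next": re.compile(r"\d*"),
    "reverse-step": re.compile(r"\d*"),
    "backtrace": re.compile(r"\d*"),
    # A function name (optionally C++-qualified, e.g. os::malloc) or file:line.
    "break": re.compile(r"[A-Za-z_][\w:~]*|[\w./-]+:\d+"),
    # Plain variable/member access only: no calls, assignments or $-prefixed names.
    "print": re.compile(r"[A-Za-z_][\w.\[\]>-]*"),
}


def validate(command_line: str) -> list[str]:
    """Return [command] or [command, arg] if allowed, else raise ValueError."""
    parts = command_line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty command")
    name = parts[0]
    arg = parts[1] if len(parts) > 1 else ""
    pattern = ALLOWED_COMMANDS.get(name)
    if pattern is None:
        raise ValueError(f"command not allowed: {name!r}")
    # fullmatch rejects anything extra, including embedded newlines and spaces.
    if not pattern.fullmatch(arg):
        raise ValueError(f"argument not allowed for {name}: {arg!r}")
    return [name, arg] if arg else [name]
```

Restricting arguments matters as much as restricting command names, since a debugger like GDB can evaluate expressions that call functions in the debugged process.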

Using MCP servers for debugging is a powerful way to have an agent help you investigate an issue, and installing MCP servers is quite easy. However, I’ve found it hard to nudge the agent to investigate in the right direction and to keep it from drawing bad conclusions. Giving the agent the right instructions and context for a problem is, to me, as much of an art as debugging by hand, and is another area where human intuition still provides significant value.

Restricting Execution

A non-trivial amount of my work is investigating performance regressions, and I’ve tried using AI in a few of those cases. So far it has been really helpful in finding ways to isolate variables and root causes by testing changes iteratively.

In one regression, I tackled an issue with Transparent Huge Pages (THP) on Linux, which can be tuned via the files listed below. Changing the values in these files requires root privileges, and I don’t want to give the agent blanket sudo access.

/sys/kernel/mm/transparent_hugepage/enabled
/sys/kernel/mm/transparent_hugepage/shmem_enabled

To get around this, I created a wrapper that only allows setting these files to either always or never, and nothing else. Then I changed the owner of the wrapper to the root user so that an agent can’t change its contents, and added the wrapper to the sudoers file so that it can be run with sudo without a password prompt.
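A wrapper along these lines could look like the following Python sketch. The install path (/usr/local/bin/thp-mode), the agentuser account name and the command-line shape are hypothetical. The essential properties are the ones described above: a fixed set of target files, a fixed set of allowed values, and a root-owned file the agent cannot edit.

```python
"""Wrapper that only allows setting THP modes to "always" or "never" (hypothetical sketch).

Hypothetical install: /usr/local/bin/thp-mode, owned by root and not writable by
other users, plus a sudoers entry such as:

    agentuser ALL=(root) NOPASSWD: /usr/local/bin/thp-mode
"""
import sys
from pathlib import Path

THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")
ALLOWED_FILES = {"enabled", "shmem_enabled"}  # the only files the wrapper may touch
ALLOWED_MODES = {"always", "never"}           # the only values it may write


def set_thp_mode(name: str, mode: str, base: Path = THP_DIR) -> Path:
    """Write `mode` to base/name after checking both against the allowlists."""
    if name not in ALLOWED_FILES:
        raise ValueError(f"unsupported file: {name!r}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    target = base / name
    target.write_text(mode + "\n")
    return target


def main(argv: list[str]) -> int:
    """Command-line entry point: thp-mode {enabled|shmem_enabled} {always|never}."""
    if len(argv) != 3:
        print("usage: thp-mode {enabled|shmem_enabled} {always|never}", file=sys.stderr)
        return 2
    try:
        set_thp_mode(argv[1], argv[2])
    except (ValueError, OSError) as err:
        print(f"thp-mode: {err}", file=sys.stderr)
        return 1
    return 0
```

Installed as a script, it would start with a #!/usr/bin/python3 -I shebang (isolated mode, which ignores PYTHON* environment variables and user site-packages) and end with sys.exit(main(sys.argv)) under an if __name__ == "__main__": guard. The agent would then run sudo /usr/local/bin/thp-mode enabled never.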

Either through a skill or by telling the agent directly, I inform it that this wrapper exists and that it should use it for toggling the THP mode(s). This way the agent can change modes without prompting me for permission and without general sudo access.

This is one way to allow agents to perform only specific privileged actions, with a fairly minimal setup. You could achieve a similar restriction with an MCP server, but that requires a bit more work.

Closing Thoughts

The bottleneck in my work, and I believe in many open-source projects, is not finding things to work on (which AI can definitely help with), but the deeply human work of navigating a project. Knowing when something is ready to propose, how to frame it given what’s already in flight and planned for the future, and how to handle discussions with reviewers are often more complex than the code changes themselves.

I’ve been able to get the most out of AI and agents in areas where I’m already very competent and can provide solid background and context. Paired with my experience contributing to several large-scale OpenJDK projects, this has let me provide additional value in multiple areas. But, if nothing else, the AI is a really good “bollplank” (sounding board), as the Swedish saying goes, offering an approachable way to ask “stupid” questions that challenge my assumptions and help me learn new ideas and concepts in less time.

So far in my career, I’ve spent a lot of time pattern-matching, trying to find ways to minimize repetition. I’ve found that this skill transfers directly to working with AI, where recognizing repeating patterns and finding streamlined solutions lets me work much more efficiently than I otherwise would.

Using worktrees, MCP server(s), and approaches to manage context/memory are things that many developers already know about. I think the more complex task is being inventive and creative in figuring out how to apply methods like these to the specific thing you’re working on, like OpenJDK development in my case.