JVM Advent

Delegating Java tasks to Supervised AI Dev Pipelines

During the second half of this year, Anysphere, the company behind the Cursor IDE, released two new products that could help you increase the level of automation in your software operations in 2026: Cursor Agent CLI and Cursor Cloud Agents. This article explains the features that both products share and the unique capabilities of each, and finally shares some insights for creating great supervised AI dev pipelines.

What is Cursor Agent CLI in a Pipeline context?

In August 2025, Anysphere released Cursor Agent CLI, a new way to interact with frontier models that is not coupled to a particular IDE. With this local development approach, software engineers gained a new way to enrich the development experience. But what happens if we use this product in a pipeline? In that case, we gain new capabilities.

Let’s review the following pipeline to understand the concept:

name: Run Cursor Agent on Demand

on:
  workflow_dispatch:

jobs:
  agent-on-demand:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          fetch-depth: 0

      - name: Install Cursor CLI
        run: |
          curl https://cursor.com/install -fsS | bash
          echo "$HOME/.cursor/bin" >> $GITHUB_PATH

      - name: Run Cursor Agent
        env:
          CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
        run: |
          echo "=== User Prompt:===";
          PROMPT="Develop a classic Java class HelloWorld.java program that prints Hello World in the console only"
          echo "$PROMPT";
          echo "=== Cursor Agent Execution:===";
          echo "";
          cursor-agent -p "$PROMPT" --model auto

      - name: Create PR with changes
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PAT_TOKEN: ${{ secrets.PAT_TOKEN }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          GITHUB_ACTOR: ${{ github.actor }}
        run: |
          chmod +x .github/scripts/create-pr.sh
          .github/scripts/create-pr.sh

In a few lines of code, a pipeline is able to execute a task with the help of frontier models and, at the end of the process, submit a PR to be reviewed by the team.
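The last step delegates the PR creation to `.github/scripts/create-pr.sh`, which is not shown here. A minimal, hypothetical sketch of such a script could look like the following; the branch naming, commit message, and `gh pr create` usage are illustrative assumptions, and by default the script only prints the commands so you can inspect them before enabling real execution:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of .github/scripts/create-pr.sh (not shown in the article).
# By default it only prints the git/gh commands; set DRY_RUN=false to run them.
set -eu

BRANCH="cursor-agent/$(date +%Y%m%d-%H%M%S)"

run() {
  if [ "${DRY_RUN:-true}" = "true" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run git checkout -b "$BRANCH"
run git add -A
run git commit -m "feat: changes generated by Cursor Agent"
run git push origin "$BRANCH"
# gh picks up GITHUB_TOKEN / PAT_TOKEN from the environment
run gh pr create --title "Cursor Agent changes" \
  --body "Automated PR created by the on-demand pipeline."
```

Keeping the script in the repository, instead of inlining it in the workflow, makes it reusable across pipelines.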

Once you have a clear idea of how to start working with this product, let’s jump to the second product released: Cursor Cloud Agents.

What are Cursor Cloud Agents?

In October 2025, Cursor Cloud Agents was released. It provides a collection of REST endpoints to operate the service, with resources organized into three categories.

Using this service, you can delegate tasks to frontier models, but all operations run on Cursor cloud infrastructure, not in your pipelines like with Cursor Agent CLI.

As the service provides different REST endpoints, it is important to understand the minimum concepts to orchestrate tasks with them.

Understanding the lifecycle of a Cursor Cloud Agent request

Step 1: Launching a new AI Agent

When a user wants to use this service, they launch an HTTP POST request to provision a new cloud AI agent. The service requires the target Git repository and the user prompt.

Note: In this article we will focus on plain-text prompts, not images.

Once the user sends the request, the service returns an HTTP response with status code 201, indicating that the request was received and will be processed soon. The response also includes an Agent ID, which is quite useful with the other REST resources to track progress, and an agent state, in this case CREATING.

Note: You could track the whole process here in a visual way: https://cursor.com/agents 
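As a rough sketch, the launch request from a shell step could look like the following; the endpoint path and the JSON field names are assumptions for illustration and should be checked against Cursor’s API documentation before using them in a pipeline:

```shell
#!/usr/bin/env sh
# Sketch: launch a new Cloud Agent with an HTTP POST.
# The endpoint path and JSON field names below are illustrative assumptions.
API_URL="https://api.cursor.com/v0/agents"
PAYLOAD='{
  "prompt": { "text": "Develop a classic Java class HelloWorld.java program that prints Hello World in the console only" },
  "source": { "repository": "https://github.com/your-org/your-repo", "ref": "main" }
}'
echo "POST $API_URL"
echo "$PAYLOAD"
# Uncomment to send the request for real (requires CURSOR_API_KEY):
# curl -fsS -X POST "$API_URL" \
#   -H "Authorization: Bearer $CURSOR_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```

A 201 response should carry the Agent ID and the initial CREATING state described above.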

What happens under the hood?

Once the service receives the request, it provisions a container on an EC2 instance running in AWS region us-east-1.

Inside this Linux container, the service performs a git checkout of the Git repository described in the request, and after that, it starts working on the details described in the user prompt.

As you can observe, the request receives a fast response, but the whole process is asynchronous. So how do you track the progress of your user prompt as it works on your repository?

Step 2: What is the status of my AI Agent?

An AI agent moves through a set of states, including CREATING, RUNNING, and FINISHED.

If you remember from the first step, the AI Agent returned the state CREATING, and if everything goes well, the current state should now be RUNNING. But how do you know what the real status is? For that purpose, there exists a GET endpoint to receive the status from an Agent ID.

By calling the status endpoint periodically, the user/process can know when the AI Agent has changed the state to FINISHED.
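Sketched in shell, that polling loop could look like this; `get_status` is stubbed so the example is self-contained, and the commented `curl` call (endpoint path and field name are assumptions) shows what a real pipeline would do instead:

```shell
#!/usr/bin/env sh
# Sketch: poll the status endpoint until the agent reaches FINISHED.
i=0
get_status() {
  # Stub that pretends the agent finishes on the third poll. A real pipeline
  # would instead call something like (path and field are assumptions):
  #   curl -fsS -H "Authorization: Bearer $CURSOR_API_KEY" \
  #     "https://api.cursor.com/v0/agents/$AGENT_ID" | jq -r '.status'
  if [ "$i" -ge 3 ]; then echo "FINISHED"; else echo "RUNNING"; fi
}

STATUS="CREATING"
while [ "$STATUS" != "FINISHED" ]; do
  i=$((i + 1))
  STATUS=$(get_status)
  echo "Poll $i: agent state is $STATUS"
  # In a real pipeline, wait between polls, e.g. sleep 30
done
```

Adding a maximum number of polls, analogous to the pipeline’s timeout-minutes, avoids waiting forever on an agent that never finishes.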

Once the AI agent is in the FINISHED state, if the process has changed anything in the Git repository, it internally executes git commit and git push to a feature branch and creates a PR to be reviewed.

Finally we have our lovely Hello World in the repository:

package info.jab.examples;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

Step 3: Review the pull request

Once Cursor Cloud Agent reaches the goal specified in the user prompt, the service creates a PR in the repository to be reviewed by your team, independent of your Git branching strategy, such as Trunk-Based Development, Gitflow, or similar.

When to choose Cursor Agent CLI and when to choose Cursor Cloud Agents?

Exploring new technologies always has a cost. Let’s list a few factors to help in your decision-making:

Until now, we have reviewed how to execute user prompts by comparing the two products; in both cases, you can decouple the location of your prompts from the location of the execution. In the next sections, we will explore aspects of user prompts that will help you be more efficient and maintain them with less effort.

Developing great user prompts

Until now, we have only explained the output of the service: in the previous case, the creation of a Java class that writes to the terminal’s standard output. But how do you increase efficiency in the process? It’s simple: send a request with a user prompt that minimizes ambiguity, so that the defined goals are reached.

An initial Hello World user prompt

You might think that a good user prompt could be:

Develop a classic Java class HelloWorld.java program
that prints "Hello World" in the console only.

And nothing more. But in practice, this seemingly simple idea could be interpreted by frontier models in several ways, independent of which model is used, because frontier models have non-deterministic behavior and may have doubts about details such as the target Maven module, the package name, whether to write tests, or whether to touch the build file.

If you understand the potential problems on the frontier model side, let’s iterate on this user prompt.

Moving away from plain text user prompts

When you use a modern IDE with AI features and the frontier model doesn’t return the expected result, you continue the conversation, and after a few iterations the result is as expected. But when this kind of service runs in your pipelines, where you expect accurate results, you need to define restrictions and other details clearly to achieve your goals. So, little by little, that user prompt will require some structure to operate accurately.

Encoding your User prompts in PML format

PML is the acronym for Prompt Markup Language, an XML Schema designed to help software engineers describe user prompts accurately.

Take a look at the evolution from plain text to PML with the new sections:

Text plain:

Develop a classic Java class HelloWorld.java program
that prints "Hello World" in the console only.

XML with PML Schema:

<?xml version="1.0" encoding="UTF-8"?>
<prompt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://jabrena.github.io/pml/schemas/0.3.0/pml.xsd">

    <role>
        You are a Senior software engineer with extensive experience in Java software development
    </role>

    <goal>
        Develop a classic Java class HelloWorld.java program
        that prints "Hello World" in the console only
    </goal>

    <constraints>
        <constraint-list>
            <constraint>Develop the class in the Maven module `sandbox`</constraint>
            <constraint>Develop the class in the package info.jab.examples</constraint>
            <constraint>Do not invest time in planning</constraint>
            <constraint>Do not create any test class</constraint>
            <constraint>Do not touch the build file (pom.xml)</constraint>
        </constraint-list>
    </constraints>

    <output-format>
        <output-format-list>
            <output-format-item>Do not explain anything</output-format-item>
        </output-format-list>
    </output-format>

    <safeguards>
        <safeguards-list>
            <safeguards-item>Build the solution with Maven only</safeguards-item>
        </safeguards-list>
    </safeguards>

    <acceptance-criteria>
        <acceptance-criteria-list>
            <acceptance-criteria-item>The solution is compiled successfully with `./mvnw clean compile -pl sandbox`</acceptance-criteria-item>
            <acceptance-criteria-item>The solution only prints "Hello World" in the console</acceptance-criteria-item>
            <acceptance-criteria-item>Commit only the Java sources and push the changes to the branch to create the PR</acceptance-criteria-item>
        </acceptance-criteria-list>
    </acceptance-criteria>
</prompt>

Although we have increased the number of lines, the user prompt now looks robust: we have mitigated the ambiguity, the prompt has a better structure, and it will be easier to maintain in the future with new refinements.

Once you have created the document, it can be validated against the XML Schema and later transformed into another format, such as Markdown.

Here is the result converted into Markdown:

## Role

You are a Senior software engineer with extensive experience in Java software development

## Goal

Develop a classic Java class HelloWorld.java program
that prints "Hello World" in the console only

## Constraints

- Develop the class in the Maven module `sandbox`
- Develop the class in the package info.jab.examples
- Do not invest time in planning
- Do not create any test class
- Do not touch the build file (pom.xml)

## Output Format

- Do not explain anything

## Safeguards

- Build the solution with Maven only

## Acceptance Criteria

The goal will be achieved if the following criteria are met:

- The solution is compiled successfully with `./mvnw clean compile -pl sandbox`
- The solution only prints "Hello World" in the console
- Commit only the Java sources and push the changes to the branch to create the PR


By using XML as the source format for your user prompts, you can take advantage of the composability features that XML includes. Moreover, when creating or updating PML files, you always write files with the same syntax, so your prompts remain homogeneous at scale.
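As a hypothetical illustration of that composability (assuming your tooling resolves XInclude and the PML schema tolerates it), a shared role fragment could be reused across several prompts; the fragment path and element reuse shown here are illustrative assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<prompt xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- Reuse a role definition shared by every prompt in the repository -->
    <xi:include href="fragments/senior-java-engineer-role.xml"/>
    <goal>
        Develop a classic Java class HelloWorld.java program
        that prints "Hello World" in the console only
    </goal>
</prompt>
```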

What happens if something goes wrong?

Don’t be naive: even the most complex systems in the world, like nuclear plants, have incidents of different kinds, so why wouldn’t this kind of integration have them too? Let’s explore the different types of issues that your threat model should cover in projects using this kind of technology.

Scenario: Using Cursor Agent CLI from a Pipeline

Imagine the scenario where you delegate a task to Cursor Agent CLI in the execution of your pipeline. What issues could happen?


Issues at Pipeline Level

Issues at Cursor Agent CLI Level

Scenario: Orchestrating Cursor Cloud Agent from a Pipeline

Imagine the scenario where you try to orchestrate an integration with the service Cursor Cloud Agent from a popular Pipeline. What issues could happen?


Issues at Pipeline level

Issues at Cursor Cloud Agent level

In general, it is a good practice to log the Agent ID for potential Cursor support and log the internal frontier model conversation for further analysis in order to improve the user prompt. Do not miss creating a threat model in your projects.

Real world scenarios

If you have doubts about which scenarios this new cloud service could be used for, here are a few that you might find interesting.

https://adventofcode.com/2025

Creativity and your monthly budget mark the limit.

Limitations

This technology is awesome, but you should consider the following factors:

Examples in action

Cursor Agent CLI in action: running a pipeline with PML-based user prompts

Review the following step to understand how to run a pipeline with Cursor Agent CLI using user prompts based on PML.

      - name: Run Cursor Agent
        env:
          CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
        run: |
          echo "=== User Prompt:===";
          jbang trust add https://github.com/jabrena/
          PROMPT=$(jbang pml-to-md.0.4.0-SNAPSHOT@jabrena convert pml-hello-world-java.xml)
          echo "$PROMPT";
          echo "=== Cursor Agent Execution:===";
          echo "";
          cursor-agent -p "$PROMPT" --model auto

In the previous example, the Cursor agent processes a user prompt in Markdown which was originally created in XML (using a PML schema).

Orchestrating Cursor Cloud Agent from a Pipeline

A picture is worth a thousand words. You can see a service that monitors Cursor Cloud Agent runtime at the following address: https://jabrena.github.io/cursor-cloud-agent-rest-api-status/ 

Cursor Cloud Agent REST API Status

Every hour, the service tests the execution to verify different aspects of the solution. After a month of running the service, I can assert that latencies are stable, and this fact is important when designing AI solutions that don’t require near real-time feedback. Further information about the pipeline here: https://github.com/jabrena/cursor-cloud-agent-rest-api-status/blob/main/.github/workflows/scheduled-ping-agent.yaml

Note: The service has been running for more than 1 month (30 × 24 × 4 samples stored). Under the hood, the pipeline uses Churrera CLI, an Open source Java CLI tool designed to orchestrate Cursor Cloud Agents and measure latencies.

Takeaways

References


Author: Juan Antonio Breña Moral

Software Engineering Manager – Platform Engineer – Java Specialist