AI orchestration with Semantic Kernel

Why AI orchestration

Large Language Models (LLMs) are changing the way we develop and interact with applications and software. From back-office employees to business users to developers, natural language is the future of user interaction. This brings both challenges and innovation to the way we build and code AI applications.

We need modern ways to manage these AI applications. AI orchestration is the process of managing multiple AI applications and services in a coordinated way. It helps to optimize the performance, scalability, and reliability of AI solutions. AI orchestration solves the problem of complexity and fragmentation in the AI landscape, where different tools, platforms, and frameworks are used for different tasks and goals.

What is Semantic Kernel (aka SK)?

Semantic Kernel (SK) is a powerful SDK that allows you to combine traditional programming languages, such as Java, C#, and Python, with the most advanced Large Language Model (LLM) AI “prompts” that support prompt templating, chaining, and planning features. This lets you create new functionalities in your apps that can boost the productivity of your users: for example, summarizing a long chat conversation, highlighting an important “next step” that’s automatically added to your to-do list, or planning a whole vacation instead of just booking a flight.

Components of SK

There are several components within SK for building AI applications. In the following sections, let me walk through them and explain how to use each one.

Kernel

The kernel orchestrates a user’s ask. To do so, it runs the pipeline or chain of tasks you define. While the pipeline or chain is executing, the kernel provides a common context so that data can be shared and passed between the underlying tasks.

First, to create a Kernel we need the OpenAI endpoint and model details. In the following example we are using an Azure OpenAI endpoint.

client.azureopenai.key=XXYYZZ1234
client.azureopenai.endpoint=https://nljug.openai.azure.com/
client.azureopenai.deploymentname=gpt-35-turbo

Then, to initialize the Kernel, we need to read these properties (in this example from a conf.properties file):

// Read the Azure OpenAI settings from the properties file
AzureOpenAISettings settings = new AzureOpenAISettings(SettingsMap
        .getWithAdditional(List.of(new File("src/main/resources/conf.properties"))));

// Build the asynchronous Azure OpenAI client
OpenAIAsyncClient client = new OpenAIClientBuilder()
        .endpoint(settings.getEndpoint())
        .credential(new AzureKeyCredential(settings.getKey()))
        .buildAsyncClient();

// Wrap the client in a chat-completion AI service for the configured deployment
TextCompletion textCompletion = SKBuilders.chatCompletion()
        .withOpenAIClient(client)
        .withModelId(settings.getDeploymentName())
        .build();

// Create the Kernel with this service as its default AI service
Kernel kernel = SKBuilders.kernel().withDefaultAIService(textCompletion).build();

You can select the LLM service while instantiating the Kernel object: for example, any of the models from OpenAI, Azure OpenAI, or Hugging Face.
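
As an illustration, here is a minimal sketch of pointing the same builders at the public (non-Azure) OpenAI service instead. The API key and model name are placeholders, and using KeyCredential to pass a plain OpenAI key to the azure-ai-openai client is my assumption; the exact wiring may differ per SK version.

// Non-Azure OpenAI: authenticate with a plain API key instead of an Azure endpoint + key
OpenAIAsyncClient openAiClient = new OpenAIClientBuilder()
        .credential(new KeyCredential("<OPENAI_API_KEY>"))
        .buildAsyncClient();

TextCompletion completion = SKBuilders.chatCompletion()
        .withOpenAIClient(openAiClient)
        .withModelId("gpt-3.5-turbo")   // a model name instead of an Azure deployment name
        .build();

Kernel openAiKernel = SKBuilders.kernel().withDefaultAIService(completion).build();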

Also, to gain visibility into what the Kernel is doing, you can add telemetry and logging to this object.

Plugins

Plugins are like the “body” of your AI app. They consist of prompts and native functions. You can connect your application to AI plugins, enabling interactions with the real world. By leveraging plugins, you can encapsulate various capabilities into a cohesive unit of functionality. This unified functionality can then be executed by the kernel. Plugins have the flexibility to incorporate both native code and requests to AI services through semantic functions.

A plugin consists of one or more semantic functions. To create a semantic function, you need to define an “skprompt.txt” file, which holds the prompt you want to use along with its input variables.

Below is an example of a “Translate” function.

Translate the input below into {{$language}}

MAKE SURE YOU ONLY USE {{$language}}.

{{$input}}

Translation:

To provide a semantic description of this function (and configure the AI service), you’ll need to create a “config.json” file in the same folder as the prompt. This file outlines the function’s input parameters and description.

{
  "schema": 1,
  "type": "completion",
  "description": "Translate the input into a language of your choice",
  "completion": {
    "max_tokens": 2000,
    "temperature": 0.7,
    "top_p": 0.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "stop_sequences": [
      "[done]"
    ]
  },
  "input": {
    "parameters": [
      {
        "name": "input",
        "description": "Text to translate",
        "defaultValue": ""
      },
      {
        "name": "language",
        "description": "language of translation",
        "defaultValue": ""
      }
    ]
  }
}
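
With both files in place, a plugin is simply a directory per function. For the Translate example, the on-disk layout would look roughly like this (my assumption, matching the importSkillFromDirectory call used below):

src/main/resources/Skills
    TranslateSkill
        Translate
            skprompt.txt
            config.json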

To load this into the Kernel object and execute the semantic function, you will need the following lines of code.

// Load the TranslateSkill plugin from the Skills directory
ReadOnlyFunctionCollection skill = kernel
        .importSkillFromDirectory("TranslateSkill", "src/main/resources/Skills", "TranslateSkill");
CompletionSKFunction translateFunction = skill.getFunction("Translate", CompletionSKFunction.class);

// Build a context carrying the function's input variables
SKContext translateContext = SKBuilders.context().build();
translateContext.setVariable("input", "How are you doing?");
translateContext.setVariable("language", "Dutch");

Mono<SKContext> result = translateFunction.invokeAsync(translateContext);
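
The invocation returns a reactive Mono. One way to read the translated text out of it (blocking here purely for illustration) is:

SKContext translated = result.block();
System.out.println(translated.getResult());   // the Dutch translation returned by the model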

We can also create plugins and functions as Java classes, which we can then instantiate and use in a type-safe way. The semantickernel-plugin-core submodule defines some out-of-the-box plugins you can use, but it is also possible to implement your own using the annotations @DefineSKFunction, @SKFunctionInputAttribute, and @SKFunctionParameters. This pattern is typically used to invoke external APIs or execute business logic, and is useful when you chain different semantic functions.
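
As a rough illustration, a native plugin could look like the following sketch. The class, method, and parameter names are hypothetical, and the exact annotation attributes may differ slightly between SK versions.

public class DatePlugin {

    // A native function the kernel can invoke like any semantic function
    @DefineSKFunction(name = "today", description = "Returns today's date in ISO format")
    public String today() {
        return java.time.LocalDate.now().toString();
    }

    // A native function with a described input parameter
    @DefineSKFunction(name = "daysUntil", description = "Days from today until the given date")
    public String daysUntil(
            @SKFunctionInputAttribute(description = "Target date in ISO format, e.g. 2024-12-24")
            String input) {
        long days = java.time.temporal.ChronoUnit.DAYS
                .between(java.time.LocalDate.now(), java.time.LocalDate.parse(input));
        return String.valueOf(days);
    }
}

Once defined, such a class can be registered on the kernel with kernel.importSkill(new DatePlugin(), "DatePlugin") and chained together with semantic functions just like the directory-based skills.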

Chaining Plugins/Semantic functions

Chaining plugins or semantic functions involves linking multiple functions together to create a cohesive pipeline. Each function processes input data and passes the result to the next function. This allows developers to build complex workflows, combining various AI services and native code seamlessly. Think of it as assembling a series of interconnected building blocks, where each function contributes to the overall functionality of the application.

The following code demonstrates chaining of plugins, where Summarize and Translate happen one after another for a given input text.

kernel.importSkillFromDirectory("SummarizeSkill", "src/main/resources/Skills", "SummarizeSkill");
kernel.importSkillFromDirectory("TranslateSkill", "src/main/resources/Skills", "TranslateSkill");

SKContext summarizeContext = SKBuilders.context().build();
summarizeContext.setVariable("input", ChatTranscript);
summarizeContext.setVariable("language", "dutch");
Mono<SKContext> result = kernel.runAsync(
                summarizeContext.getVariables(),
                kernel.getSkill("SummarizeSkill").getFunction("Summarize"),
                kernel.getSkill("TranslateSkill").getFunction("Translate"));

Planners

The Planner in Semantic Kernel is a powerful feature that automatically orchestrates AI services. It takes a user’s request and generates a plan for how to achieve it. By intelligently combining registered plugins, planners create workflows for tasks like reminders or complex data mining, enhancing the flexibility and efficiency of AI-powered applications.

For example, if the task is to summarize and then translate a given text, the Planner will pick up the “Summarizer” and “Translator” functions to generate the result.

Another example: for a task to define a term and then generate an email from a given text, the Planner will pick up the “Define” and “Email Gen.” functions to generate the result.

There are three different planners available: Action, Sequential, and Stepwise.

The following code example shows a Sequential planner with three plugins, each containing multiple semantic functions; depending on the task, it will intelligently pick the semantic functions necessary to perform it.

kernel.importSkillFromDirectory("WriterSkill", "src/main/resources/Skills", "WriterSkill");
kernel.importSkillFromDirectory("SummarizeSkill", "src/main/resources/Skills", "SummarizeSkill");
kernel.importSkillFromDirectory("DesignThinkingSkill", "src/main/resources/Skills", "DesignThinkingSkill");

SequentialPlanner planner = new SequentialPlanner(kernel, new SequentialPlannerRequestSettings(
                    <relevancyThreshold>, <maxRelevantFunctions>, Set.of(), Set.of(), Set.of(),
                    <maxTokens>
                    ), <SystemPrompt>);

// createPlanAsync builds the plan; invoking the plan executes the selected functions
Mono<SKContext> result = planner
        .createPlanAsync("rewrite the following text in Yoda from Starwars style: " + <TextToSummarize>)
        .flatMap(plan -> plan.invokeAsync());

Giving memories to SK

What’s memory? Think of it as giving the LLM information to support it when executing a plugin (or a set of plugins). For example, say you want a summary of your dental insurance details from a 100-page insurance document. You might choose to send the whole document along with your query to the LLM, but every LLM model has a limit on the number of tokens it can process per request. To work within that token limit, it helps to search the document first, get the relevant pages or text out of it, and then query the LLM with those selected pages/texts. That way the LLM can efficiently summarize the content without running out of tokens per request. The “selected pages/texts” in this example are memories.

For this example, we will be using two Kernels: one with an Embedding AI service and another with a TextCompletion service.

The Kernel with embeddings needs a MemoryStore; this can be in-memory or any vector store. Right now SK in Java supports only Azure Cognitive Search; support for other vector stores will be coming soon.

The Kernel with embeddings and Azure Cognitive Search (ACS) as the vector store is instantiated as follows:

EmbeddingGeneration<String> textEmbeddingGenerationService =
        SKBuilders.textEmbeddingGeneration()
                .withOpenAIClient(openAIAsyncClient)
                .withModelId("embedding")
                .build();

Kernel kernel = SKBuilders.kernel()
        .withDefaultAIService(textEmbeddingGenerationService)
        .withMemoryStorage(new AzureCognitiveSearchMemory("<ACS_ENDPOINT>", "<ACS_KEY>"))
        .build();
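
If your documents are not yet indexed, the same memory API can also be used to store text first. Here is a minimal sketch, assuming saveInformationAsync takes (collection, text, id, description, additional metadata) in that order; the collection name, text, and id below are made up:

// Embed and store a chunk of text in a hypothetical "insurance-docs" collection
kernel.getMemory()
        .saveInformationAsync(
                "insurance-docs",                                            // collection / index
                "Dental coverage is limited to EUR 1500 per calendar year.", // text to embed
                "dental-page-42",                                            // unique id
                "Dental insurance details",                                  // description
                null)                                                        // additional metadata
        .block();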

If documents and/or data are already indexed in ACS, you can search them, which gives you back the relevant sections of memory.

Mono<List<MemoryQueryResult>> relevantMemory = kernel.getMemory()
        .searchAsync(<INDEX_NAME>, <QUERY>, 2, 0.7f, true);

Then we can use these search results as extra context when invoking the LLM, for example by running a Summarizer function on them.

// Collect the relevant memory search results into a single text block
List<MemoryQueryResult> relevantMems = relevantMemory.block();

StringBuilder memory = new StringBuilder();
relevantMems.forEach(relevantMem ->
        memory.append("text: ").append(relevantMem.getMetadata().getText()));

// Second kernel, configured with a TextCompletion service (see the earlier Kernel section)
Kernel kernel = kernel();
ReadOnlyFunctionCollection conversationSummarySkill =
        kernel.importSkill(new ConversationSummarySkill(kernel), null);

// Summarize the collected memory text
Mono<SKContext> summary = conversationSummarySkill
        .getFunction("SummarizeConversation", SKFunction.class)
        .invokeAsync(memory.toString());

The above example uses a summarizer defined as a Java class using the @DefineSKFunction annotation.

Wrapping Up

This article introduces SK; there are many more components and patterns you can implement using this SDK that I could not include here. For more, see the official documentation and the GitHub repo of SK.

If you are interested in playing around with examples, please check out my personal GitHub repo.

Author: Soham Dasgupta

I am a technology enthusiast working at Microsoft as a Solution Architect, with over 17 years of experience in software programming, design, and architecture, covering on-prem and cloud-native applications as well as web-based conversational application design.
