How-To

The Kernel SDK provides all the building blocks needed to create sophisticated AI applications. If you want to include any dependencies in your Skill, have a look here.

Completion

The basic building block of any AI framework is the ability to make completion requests:

from pharia_skill import CompletionParams, Csi, skill
from pydantic import BaseModel

# define Input & Output models
# ...

@skill
def complete(csi: Csi, input: Input) -> Output:
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

    You are a poet who strictly speaks in haikus.<|eot_id|><|start_header_id|>user<|end_header_id|>

    {input.topic}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
    params = CompletionParams(max_tokens=64)
    completion = csi.complete("llama-3.1-8b-instruct", prompt, params)
    # the completion object carries the generated text
    return Output(haiku=completion.text)
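Because the f-string above is written inside an indented function body, the leading spaces on its continuation lines end up in the prompt. One way to avoid this is to assemble the template from separate string pieces. The following is a minimal sketch that mirrors the template shown above; `build_llama3_prompt` is a hypothetical helper introduced for illustration and is not part of the SDK.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the template used above.

    Hypothetical helper for illustration; it is not part of pharia_skill.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>"
    )


prompt = build_llama3_prompt(
    system="You are a poet who strictly speaks in haikus.",
    user="oat milk",
)
```

The resulting string can be passed to csi.complete in place of the inline f-string.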

RAG

Skills can access knowledge from external documents. You can query the DocumentIndex:

from pharia_skill import Csi, IndexPath, skill

@skill
def rag(csi: Csi, input: Input) -> Output:
    # specify the index we query against
    index = IndexPath(
        namespace="my-team-namespace",
        collection="confluence",
        index="asym-256",
    )

    # search for the input topic in the confluence collection
    documents = csi.search(index, query=input.topic)
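Retrieved documents are usually folded into a prompt before the model is called. The sketch below shows one possible way to do that with a rough character budget. It uses a stand-in `SearchHit` dataclass so that it runs without the SDK; the `content` field name, `build_context`, and `max_chars` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    """Stand-in for a search result; the `content` field name is an assumption."""

    content: str


def build_context(hits: list[SearchHit], max_chars: int = 2000) -> str:
    """Join retrieved passages into one context string within a rough size budget."""
    parts: list[str] = []
    used = 0
    for number, hit in enumerate(hits, start=1):
        passage = f"[{number}] {hit.content.strip()}"
        # Stop before the budget is exceeded (separators are not counted).
        if used + len(passage) > max_chars:
            break
        parts.append(passage)
        used += len(passage)
    return "\n\n".join(parts)


hits = [SearchHit("First passage."), SearchHit("Second passage.")]
context = build_context(hits)
```

Numbering the passages makes it easier to ask the model to cite which passage an answer came from.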

Streaming

The SDK provides interfaces to receive chat and completion responses in chunks, and to return intermediate responses. This allows building Skills that stream their output in small chunks, in contrast to only returning a single response.

Message Stream

A message stream models a conversation as a sequence of messages. In a stream there is only one active message at a time. You can write text to the message stream using a writer, which is passed into the Skill. The Kernel takes care of translating the interactions with the writer into Server-Sent Events. Each message has a beginning and an end, which are indicated by writer.begin_message and writer.end_message respectively. Between these, you can iteratively append text with writer.append_to_message. When ending the message, you can provide an optional, arbitrary payload.

writer.begin_message("assistant")
writer.append_to_message("Hello, ")
writer.append_to_message("world!")
writer.end_message(None)
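To reason about what a streaming Skill emits without a running Kernel, a small stand-in writer that records the calls can help. The sketch below is such a test double. `RecordingWriter` is hypothetical and mirrors only the three methods named above; the errors it raises on misuse are an assumption of this sketch, not documented SDK behaviour.

```python
from typing import Any


class RecordingWriter:
    """Hypothetical test double mirroring the three writer methods named above."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []
        self._open = False

    def begin_message(self, role: str) -> None:
        # Only one message may be active at a time.
        if self._open:
            raise RuntimeError("previous message has not been ended")
        self._open = True
        self.events.append(("begin", role))

    def append_to_message(self, text: str) -> None:
        if not self._open:
            raise RuntimeError("no active message")
        self.events.append(("append", text))

    def end_message(self, payload: Any = None) -> None:
        if not self._open:
            raise RuntimeError("no active message")
        self._open = False
        self.events.append(("end", payload))

    def text(self) -> str:
        """Concatenate everything appended so far."""
        return "".join(value for kind, value in self.events if kind == "append")


writer = RecordingWriter()
writer.begin_message("assistant")
writer.append_to_message("Hello, ")
writer.append_to_message("world!")
writer.end_message(None)
```

The recorded event list shows the begin, append, append, end sequence that the Kernel would translate into events for the caller.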

Requesting a Stream

To request a chat completion as a stream, you can use the csi.chat_stream context manager. It returns a ChatStreamResponse, which provides a stream method you can iterate over:

params = ChatParams()
with csi.chat_stream(model, messages, params) as response:
    for event in response.stream():
        # e.g. writer.append_to_message(event.content)
        ...

The writer also provides a convenience method for returning a ChatStreamResponse directly:

params = ChatParams()
with csi.chat_stream(model, messages, params) as response:
    writer.forward_response(response)

In case you want to stream a completion response (in contrast to a chat response), you can use csi.completion_stream.

Decorator

Skills that stream their output must be annotated with the message_stream decorator. They have some unique properties:

They take a second argument of type MessageWriter
They don’t return anything
If you want to return a custom payload, use writer.end_message

from pharia_skill import ChatParams, Csi, Message, MessageWriter, message_stream

@message_stream
def haiku_stream(csi: Csi, writer: MessageWriter[None], input: Input) -> None:
    model = "llama-3.1-8b-instruct"
    messages = [
        Message.system("You are a poet who strictly speaks in haikus."),
        Message.user(input.topic),
    ]
    params = ChatParams()
    with csi.chat_stream(model, messages, params) as response:
        writer.forward_response(response)

Using the message_stream decorator requires passing the --skill-type message-stream-skill flag when running pharia-skill build.

Conversational Search

Conversational search is the idea of having a chat conversation with an LLM that has access to a knowledge database. To implement this, we first need a Skill that exposes a chat interface.

The OpenAI Chat API is emerging as a standard for exposing the conversational interface of LLMs. This API is also offered in the Csi through the chat method. Leveraging this, you can easily expose your own custom-flavoured chat API as a Kernel Skill. Note that you can expose internal datatypes in the interface of your Skill as long as they are wrapped in a Pydantic model:

from pharia_skill import Csi, Message, skill
from pydantic import BaseModel

class ChatInterface(BaseModel):
    """A chat input that is compatible with the OpenAI chat API."""

    messages: list[Message]

@skill
def conversational_search(csi: Csi, input: ChatInterface) -> ChatInterface:
    # Alter the input messages in any way to apply your own flavour.
    # You could add a search lookup to allow conversational search, or just
    # prepend a custom system prompt.
    input = do_search_lookup(input)
    output = csi.chat("llama-3.1-8b-instruct", input.messages)
    return ChatInterface(messages=input.messages + [output.message])

You only need to define the do_search_lookup function and augment the incoming messages with some context.
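The two pure steps of such a lookup, picking a query from the conversation and injecting retrieved passages, could be sketched as follows. The sketch uses a stand-in `ChatMessage` dataclass instead of the SDK's Message type so that it runs on its own; `last_user_query` and `with_context` are hypothetical helper names, and the actual search call is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatMessage:
    """Stand-in for the SDK's Message type, reduced to role and content."""

    role: str
    content: str


def last_user_query(messages: list[ChatMessage]) -> Optional[str]:
    """Return the most recent user message, which can serve as the search query."""
    for message in reversed(messages):
        if message.role == "user":
            return message.content
    return None


def with_context(messages: list[ChatMessage], passages: list[str]) -> list[ChatMessage]:
    """Prepend a system message carrying the retrieved passages.

    The input list is left untouched, so the original conversation can
    still be returned to the client.
    """
    if not passages:
        return list(messages)
    context = "\n\n".join(passages)
    system = ChatMessage("system", f"Answer using the following context:\n\n{context}")
    return [system, *messages]


history = [ChatMessage("user", "What is a Skill?")]
query = last_user_query(history)
augmented = with_context(history, ["Passage about Skills."])
```

Keeping these steps free of side effects makes them easy to test separately from the search and chat calls.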

Tools

The Kernel and SDK offer support for function calling and tool invocations. Details on how tools can be made available via MCP can be found in the Tool Calling section of the core concepts.

Automatic Tool Calling

The csi.chat_stream method supports automatic tool calling. Tools are made available to the model by specifying their names as part of the request. They are automatically added to the system prompt. If a custom system prompt is provided, the SDK merges the two.

with csi.chat_stream(model, messages, tools=["search", "fetch"]) as response:
    ...

The tool names are resolved to the correct schema by the SDK, as long as they are available to the namespace. If the model requests a tool call, it is executed, and the response is fed back to the model. This loop continues until the model returns a response that is not a tool call. A typical usage pattern would be:

from pydantic import BaseModel
from pharia_skill import Csi, Message, MessageWriter, message_stream


class Input(BaseModel):
    messages: list[Message]


@message_stream
def web_search(csi: Csi, writer: MessageWriter[None], input: Input) -> None:
    model = "llama-3.3-70b-instruct"
    with csi.chat_stream(model, input.messages, tools=["search", "fetch"]) as response:
        writer.forward_response(response)

Manual Tool Calling

The csi.chat_stream_step function offers more granular control over the execution loop. It allows specifying the available tool schemas, but leaves the execution of the tool to the user. A typical usage pattern might look like:

from pydantic import BaseModel
# ToolError is assumed to be exported from the package root
from pharia_skill import ChatParams, Csi, Message, MessageWriter, ToolError, message_stream


class Input(BaseModel):
    messages: list[Message]


@message_stream
def web_search(csi: Csi, writer: MessageWriter[None], input: Input) -> None:
    messages = input.messages
    model = "llama-3.3-70b-instruct"
    params = ChatParams()

    # Retrieve the tool schemas from the csi
    tools = [t for t in csi.list_tools() if t.name in ("search", "fetch")]

    response = csi.chat_stream_step(model, messages, params, tools)
    while (tool_call := response.tool_call()) is not None:
        # add the tool call request to the conversation
        messages.append(tool_call.as_message())
        try:
            tool_response = csi.invoke_tool(tool_call.name, **tool_call.parameters)
            # add the tool response to the conversation
            messages.append(tool_response.as_message())
        except ToolError as e:
            # report the failure to the model so it can react to it
            messages.append(
                Message.tool(f'failed[stderr]:{{"error": {e.message}}}[/stderr]')
            )
        # ask the model again, now with the tool result in the conversation
        response = csi.chat_stream_step(model, messages, params, tools)

    writer.forward_response(response)

The tool_call method lets you check whether a response contains a tool call. It does not consume the stream for normal responses, so the final response to the end user is still returned as a stream.
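When running the loop manually, it can be worth bounding the number of iterations so that a model that keeps requesting tools cannot run forever. The sketch below illustrates only the control flow, with plain callables and strings standing in for the model step, the tool invocation, and the messages. `run_tool_loop`, `max_steps`, and the message shapes are hypothetical and not part of the SDK.

```python
from typing import Callable, Optional

# A tool call is modelled as (tool name, keyword arguments); None means "final answer".
ToolCall = tuple[str, dict]


def run_tool_loop(
    step: Callable[[list[str]], Optional[ToolCall]],
    invoke: Callable[[str, dict], str],
    messages: list[str],
    max_steps: int = 5,
) -> list[str]:
    """Call `step` until it stops requesting tools or `max_steps` is reached."""
    for _ in range(max_steps):
        tool_call = step(messages)
        if tool_call is None:
            return messages
        name, arguments = tool_call
        messages.append(f"call:{name}")
        try:
            messages.append(f"result:{invoke(name, arguments)}")
        except Exception as error:
            # Report the failure back to the model instead of aborting.
            messages.append(f"error:{error}")
    raise RuntimeError(f"model still requested tools after {max_steps} steps")


def fake_step(messages: list[str]) -> Optional[ToolCall]:
    # Request one search, then answer once a result is in the conversation.
    if any(m.startswith("result:") for m in messages):
        return None
    return ("search", {"query": "haiku"})


def fake_invoke(name: str, arguments: dict) -> str:
    return f"{name} found 3 hits for {arguments['query']}"


conversation = run_tool_loop(fake_step, fake_invoke, ["user:tell me about haikus"])
```

In a real Skill, `step` would correspond to a csi.chat_stream_step call followed by response.tool_call(), and `invoke` to the tool invocation.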

Stream Events

For message stream skills, the Kernel reports tool call events via the SSE stream to the caller. The caller will receive an event when a tool call starts and when a tool call finishes.