# How-To

The Kernel SDK provides all the building blocks needed to create sophisticated AI applications.
If you want to include any dependencies in your Skill, see the [WASM component](03-core_concepts.md#wasm-component) section of the core concepts.

## Completion

The base building block of any AI framework is the ability to do completion requests:

```python
from pharia_skill import CompletionParams, Csi, skill
from pydantic import BaseModel

# define Input & Output models
# ...

@skill
def complete(csi: Csi, input: Input) -> Output:
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

    You are a poet who strictly speaks in haikus.<|eot_id|><|start_header_id|>user<|end_header_id|>

    {input.topic}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
    params = CompletionParams(max_tokens=64)
    completion = csi.complete("llama-3.1-8b-instruct", prompt, params)
    # the completion result wraps the generated text
    return Output(haiku=completion.text)
```
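Because the f-string above is written inside an indented function body, the leading spaces of each continuation line become part of the prompt. A minimal sketch of assembling the same prompt without that indentation follows. The helper name `build_prompt` is hypothetical and not part of the SDK; the special tokens are copied from the example above.

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble the instruct prompt piece by piece, without stray indentation."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>"
    )


# hypothetical topic, standing in for input.topic
prompt = build_prompt("You are a poet who strictly speaks in haikus.", "oat milk")
```

The resulting string could then be passed to `csi.complete` in place of the inline f-string.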

## RAG

Skills can access knowledge from external documents. You can query the DocumentIndex:

```python
from pharia_skill import Csi, IndexPath, skill

@skill
def rag(csi: Csi, input: Input) -> Output:
    # specify the index we query against
    index = IndexPath(
        namespace="my-team-namespace",
        collection="confluence",
        index="asym-256",
    )

    # search for the input topic in the confluence collection
    documents = csi.search(index, query=input.topic)
```
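The snippet above stops after the search call. One way to turn the results into context for a prompt is sketched below. It assumes each result exposes a text `content` and a relevance `score`; the `Hit` class is a stand-in so the example runs on its own, and `build_context`, `min_score`, and `limit` are hypothetical names, not SDK parameters.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for one search result; the real result type may differ."""

    content: str
    score: float


def build_context(hits: list[Hit], min_score: float = 0.5, limit: int = 3) -> str:
    """Keep the best-scoring hits and join their text into one context block."""
    relevant = [h for h in hits if h.score >= min_score]
    relevant.sort(key=lambda h: h.score, reverse=True)
    return "\n\n".join(h.content for h in relevant[:limit])


hits = [
    Hit("Kernel runs Skills.", 0.9),
    Hit("Unrelated.", 0.1),
    Hit("Skills are WASM components.", 0.7),
]
context = build_context(hits)
```

Filtering by score before joining keeps weakly related passages out of the prompt; the threshold would need tuning per index.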

## Streaming

The SDK provides interfaces to receive chat and completion responses in chunks, and to return intermediate responses.
This allows building Skills that stream their output in small chunks, in contrast to only returning a single response.

### Message Stream

A message stream tries to model a conversation as a sequence of messages.
In a stream there is only one active message at a time.
You can write text to the message stream using a writer, which is passed into the Skill.
The Kernel takes care of translating the interactions with the writer into Server-Sent-Events.
Each message has a beginning and an end, which are indicated by [writer.begin_message](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.MessageWriter.begin_message) and [writer.end_message](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.MessageWriter.end_message) respectively.
Between these, you can iteratively append text with [writer.append_to_message](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.MessageWriter.append_to_message).
When ending the message, you can provide an optional, arbitrary payload.

```python
writer.begin_message("assistant")
writer.append_to_message("Hello, ")
writer.append_to_message("world!")
writer.end_message(None)
```
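To exercise this call sequence outside the Kernel, for example in a unit test, a small stand-in writer can record the calls. The `RecordingWriter` below is a hypothetical test double, not an SDK class; its ordering checks reflect the "one active message at a time" rule described above and are an assumption about how strictly the real writer behaves.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingWriter:
    """Hypothetical test double that mimics the three writer calls."""

    events: list[tuple[str, Any]] = field(default_factory=list)
    _open: bool = False

    def begin_message(self, role: str) -> None:
        if self._open:
            raise RuntimeError("previous message has not been ended")
        self._open = True
        self.events.append(("begin", role))

    def append_to_message(self, text: str) -> None:
        if not self._open:
            raise RuntimeError("no active message")
        self.events.append(("append", text))

    def end_message(self, payload: Any = None) -> None:
        if not self._open:
            raise RuntimeError("no active message")
        self._open = False
        self.events.append(("end", payload))


writer = RecordingWriter()
writer.begin_message("assistant")
writer.append_to_message("Hello, ")
writer.append_to_message("world!")
writer.end_message(None)

# reassemble the message text from the recorded append calls
text = "".join(value for kind, value in writer.events if kind == "append")
```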

### Requesting a Stream

To request a chat completion as a stream, you can use the [csi.chat_stream](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.Csi.chat_stream) context manager.
It returns a [ChatStreamResponse](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.ChatStreamResponse), which provides a `stream` method you can iterate over:

```python
params = ChatParams()
with csi.chat_stream(model, messages, params) as response:
    for event in response.stream():
        # e.g. writer.append_to_message(event.content)
        ...
```

The writer also provides a convenience method for forwarding a `ChatStreamResponse` to the caller directly:

```python
params = ChatParams()
with csi.chat_stream(model, messages, params) as response:
    writer.forward_response(response)
```

In case you want to stream a completion response (in contrast to a chat response), you can use [csi.completion_stream](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.Csi.completion_stream).

### Decorator

Skills that stream their output must be annotated with the [message_stream](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.message_stream) decorator.
They have some unique properties:

1. They take a second argument of type `MessageWriter`
2. They don't return anything
3. If you want to return a custom payload, use [writer.end_message](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.MessageWriter.end_message)

```python
from pharia_skill import ChatParams, Csi, Message, MessageWriter, message_stream

@message_stream
def haiku_stream(csi: Csi, writer: MessageWriter[None], input: Input) -> None:
    model = "llama-3.1-8b-instruct"
    messages = [
        Message.system("You are a poet who strictly speaks in haikus."),
        Message.user(input.topic),
    ]
    params = ChatParams()
    with csi.chat_stream(model, messages, params) as response:
        writer.forward_response(response)
```

Using the `message_stream` decorator requires passing the `--skill-type message-stream-skill` flag when running `pharia-skill build`.

## Conversational Search

Conversational search is the idea of having a chat conversation with an LLM that has access to a knowledge database.
To implement this, we first need a Skill that exposes a chat interface.

The [OpenAI Chat API](https://platform.openai.com/docs/api-reference/chat) is emerging as a standard for exposing the conversational interfaces of LLMs.
This API is also offered in the `Csi` with the `chat` method. Leveraging this, you can easily expose your own custom flavoured chat API as a Kernel Skill.
Note that you can expose internal datatypes in the interface of your Skill as long as they are wrapped in a `Pydantic` model:

```python
from pharia_skill import Csi, Message, skill
from pydantic import BaseModel

class ChatInterface(BaseModel):
    """A chat input that is compatible with the OpenAI chat API."""

    messages: list[Message]

@skill
def conversational_search(csi: Csi, input: ChatInterface) -> ChatInterface:
    # Alter the input messages in any way to apply your own flavour
    # You could add a search lookup to allow conversational search, or just
    # prepend a custom system prompt
    input = do_search_lookup(input)
    output = csi.chat("llama-3.1-8b-instruct", input.messages)
    # Pydantic models are constructed with keyword arguments
    return ChatInterface(messages=input.messages + [output.message])
```

You only need to define the `do_search_lookup` function and augment the incoming messages with some context.
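One possible shape for the augmentation step is sketched below, using plain stand-ins so it runs without the SDK. `Msg` is a simplified substitute for the SDK's `Message`, `augment_messages` is a hypothetical helper that `do_search_lookup` could delegate to, and the `search` callable stands in for a DocumentIndex query; none of these names come from the SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Msg:
    """Stand-in for a chat message (role and content only)."""

    role: str
    content: str


def augment_messages(
    messages: list[Msg], search: Callable[[str], list[str]]
) -> list[Msg]:
    """Prepend a system message with passages found for the last user turn."""
    last_user = next((m for m in reversed(messages) if m.role == "user"), None)
    if last_user is None:
        return messages
    passages = search(last_user.content)
    if not passages:
        return messages
    context = "\n".join(f"- {p}" for p in passages)
    system = Msg("system", f"Answer using only these passages:\n{context}")
    # return a new list so the caller's history stays untouched
    return [system, *messages]


def fake_search(query: str) -> list[str]:
    """Toy search backend used only for this illustration."""
    return ["The Kernel runs Skills."] if "Kernel" in query else []


history = [Msg("user", "What does the Kernel do?")]
augmented = augment_messages(history, fake_search)
```

Searching on the last user turn only is a design choice; a longer conversation may need a query that summarises earlier turns as well.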

## Tools

The Kernel and SDK offer support for function calling and tool invocations.
Details on how tools can be made available via MCP can be found in the [Tool Calling](03-core_concepts.md#tool-calling) section of the core concepts.

### Automatic Tool Calling

The [csi.chat_stream](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.Csi.chat_stream) method supports automatic tool calling.
Tools are made available to the model by specifying their name as part of the request.
They are automatically added to the system prompt. If a custom system prompt is provided, these are merged by the SDK.

```python
with csi.chat_stream(model, messages, tools=["search", "fetch"]) as response:
    ...
```

The tool names are resolved to the correct schema by the SDK, as long as they are available to the namespace.
In case the model requests a tool call, it is executed, and the response is fed back to the model.
This loop continues until the model returns a non-tool call response.
A typical usage pattern would be:

```python
from pydantic import BaseModel
from pharia_skill import Csi, Message, MessageWriter, message_stream


class Input(BaseModel):
    messages: list[Message]


@message_stream
def web_search(csi: Csi, writer: MessageWriter[None], input: Input) -> None:
    model = "llama-3.3-70b-instruct"
    with csi.chat_stream(model, input.messages, tools=["search", "fetch"]) as response:
        writer.forward_response(response)
```


### Manual Tool Calling

The [csi.chat_stream_step](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.Csi.chat_stream_step) function offers more granular control over the execution loop.
It allows specifying available tool schemas, but leaves the execution of the tool to the user.
A typical usage pattern might look like:

```python
from pydantic import BaseModel
# assumes ChatParams and ToolError are exported at the package top level
from pharia_skill import ChatParams, Csi, Message, MessageWriter, ToolError, message_stream


class Input(BaseModel):
    messages: list[Message]


@message_stream
def web_search(csi: Csi, writer: MessageWriter[None], input: Input) -> None:
    messages = input.messages
    model = "llama-3.3-70b-instruct"
    params = ChatParams()

    # Retrieve the tool schemas from the csi
    tools = [t for t in csi.list_tools() if t.name in ("search", "fetch")]

    response = csi.chat_stream_step(model, messages, params, tools)
    while (tool_call := response.tool_call()) is not None:
        # add the tool call request to the conversation
        messages.append(tool_call.as_message())
        try:
            tool_response = csi.invoke_tool(tool_call.name, **tool_call.parameters)
            # add the tool response to the conversation
            messages.append(tool_response.as_message())
        except ToolError as e:
            # report the failure to the model so it can react to it
            messages.append(
                Message.tool(f'failed[stderr]:{{"error": {e.message}}}[/stderr]')
            )
        # ask the model again, now with the tool result in the conversation
        response = csi.chat_stream_step(model, messages, params, tools)

    writer.forward_response(response)
```

The [tool_call](https://pharia-skill.readthedocs.io/en/latest/references.html#pharia_skill.ChatStreamResponse.tool_call) method lets you check whether a response contains a tool call.
It does not consume the stream for normal responses, so the last response to the end user will still be returned as a stream.
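The loop above runs until the model stops requesting tools. When you own the loop, you may want to bound the number of rounds. The sketch below shows that control flow with plain stand-ins so it runs without a model: `Reply`, `ToolCall`, `run_tool_loop`, and `max_rounds` are hypothetical names for illustration, and the scripted `step` function replaces the real `chat_stream_step` call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    parameters: dict


@dataclass
class Reply:
    """Stand-in for one model response: either a tool call or final text."""

    text: str = ""
    tool_call: Optional[ToolCall] = None


def run_tool_loop(
    step: Callable[[list], Reply],
    tools: dict[str, Callable[..., str]],
    messages: list,
    max_rounds: int = 5,
) -> str:
    """Call the model until it answers without a tool call, at most max_rounds times."""
    for _ in range(max_rounds):
        reply = step(messages)
        if reply.tool_call is None:
            return reply.text
        call = reply.tool_call
        messages.append(("assistant", f"call {call.name}"))
        try:
            result = tools[call.name](**call.parameters)
        except Exception as e:  # feed the failure back instead of aborting
            result = f"error: {e}"
        messages.append(("tool", result))
    raise RuntimeError("model kept requesting tools")


# scripted model: first asks for a search, then answers
script = iter([Reply(tool_call=ToolCall("search", {"query": "kernel"})), Reply(text="done")])


def step(messages: list) -> Reply:
    return next(script)


tools = {"search": lambda query: f"results for {query}"}
messages = [("user", "find kernel docs")]
answer = run_tool_loop(step, tools, messages)
```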

### Stream Events

For message stream skills, the Kernel reports tool call events via the SSE stream to the caller.
The caller will receive an event when a tool call starts and when a tool call finishes.