The AI Building Blocks framework is provider-agnostic and fully extensible. You can integrate new inference backends, custom transports, or unique model types simply by implementing standard task interfaces. This design allows your Unity Agents (LLM, Object Detection, STT, TTS) to remain unchanged while switching to any compatible model or service.

Before You Begin

Review existing Providers (Llama API, OpenAI, HuggingFace) before creating your own. Use them as templates. All Providers follow the same task interface patterns and are discovered automatically at install time.

Architecture Recap

Each AI Building Block consists of two layers:

Layer	Role
Agent (MonoBehaviour)	Handles Unity-side orchestration: captures input, triggers inference, and dispatches results.
Provider (ScriptableObject)	Defines the backend — how and where inference happens (Cloud, Local, On-Device). Defines input/output structure.

By creating a new Provider, you can integrate:

A custom REST API or WebSocket inference service.

A new model type (for example, segmentation, diffusion, emotion recognition).
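
The split between the two layers can be sketched in plain C#, outside Unity. Every name below (ITextTask, UppercaseProvider, ReverseProvider, SimpleAgent) is a hypothetical stand-in rather than a framework type; in the framework the Agent would be a MonoBehaviour and each Provider a ScriptableObject. The sketch only illustrates that the Agent depends on the task interface, so the backend can be swapped without touching Agent code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a task interface such as IChatTask.
public interface ITextTask
{
    Task<string> RunAsync(string input, CancellationToken token);
}

// Two interchangeable "Providers" that implement the same task interface.
public sealed class UppercaseProvider : ITextTask
{
    public Task<string> RunAsync(string input, CancellationToken token)
        => Task.FromResult(input.ToUpperInvariant());
}

public sealed class ReverseProvider : ITextTask
{
    public Task<string> RunAsync(string input, CancellationToken token)
    {
        var chars = input.ToCharArray();
        Array.Reverse(chars);
        return Task.FromResult(new string(chars));
    }
}

// The "Agent" only knows the interface, never the concrete backend.
public sealed class SimpleAgent
{
    private readonly ITextTask provider;

    public SimpleAgent(ITextTask provider)
    {
        this.provider = provider;
    }

    public Task<string> HandleInputAsync(string input, CancellationToken token = default)
        => provider.RunAsync(input, token);
}
```

Constructing SimpleAgent with a different ITextTask changes the backend while HandleInputAsync stays the same, which is the property the Agent/Provider split is meant to give you.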

Core Interfaces

Every Provider implements one or more of the following interfaces:

Interface	Purpose
`IChatTask`	Defines text or multimodal LLM chat tasks.
`IObjectDetectionTask`	Handles object detection (boxes, labels, confidence).
`ISpeechToTextTask`	Converts audio input to transcribed text.
`ITextToSpeechTask`	Converts text into playable speech audio.

All interfaces expose a single async method (for example, RunTaskAsync() or RunChatAsync()) that accepts structured input and returns a strongly typed result.
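
As a rough illustration of the calling pattern, the sketch below mirrors the RunChatAsync(string, CancellationToken) signature used by the Llama API example on this page. IExampleChatTask, SlowEchoProvider, and ChatCaller are hypothetical names introduced for this sketch, and the other task interfaces may use different input and result types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface mirroring the RunChatAsync signature shown on this page.
public interface IExampleChatTask
{
    Task<string> RunChatAsync(string prompt, CancellationToken token);
}

// Stub provider: waits to simulate network latency, then echoes the prompt.
public sealed class SlowEchoProvider : IExampleChatTask
{
    private readonly TimeSpan latency;

    public SlowEchoProvider(TimeSpan latency)
    {
        this.latency = latency;
    }

    public async Task<string> RunChatAsync(string prompt, CancellationToken token)
    {
        // Task.Delay throws TaskCanceledException if the token is cancelled first.
        await Task.Delay(latency, token);
        return "echo: " + prompt;
    }
}

public static class ChatCaller
{
    // Runs the task with a timeout; returns null if the timeout elapses first.
    public static async Task<string> RunWithTimeoutAsync(
        IExampleChatTask task, string prompt, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                return await task.RunChatAsync(prompt, cts.Token);
            }
            catch (OperationCanceledException)
            {
                return null;
            }
        }
    }
}
```

Passing the token through to every awaited call is what lets a caller abandon a slow request cleanly instead of waiting for it to finish.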

Example — IChatTask Provider

Llama API Provider implementing IChatTask

The Llama API Provider demonstrates a full implementation of an IChatTask Provider for Llama 4 Maverick (called Maverick 4 on this page), Meta’s multimodal Llama model capable of both text and image understanding. The excerpt below shows the text-only request path.


    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;
    using UnityEngine;
    // Also import the namespace that contains AIProviderBase, IChatTask, and HttpTransport.

    [CreateAssetMenu(menuName = "Meta/AI/Providers/Cloud/Llama API Provider")]
    public class LlamaApiProvider : AIProviderBase, IChatTask
    {
        [Header("Llama API Settings")]
        [SerializeField] private string endpointUrl = "https://api.llama.com/v1/chat/completions";
        [SerializeField] private string apiKey;
        [SerializeField] private string model = "llama-4-maverick";
        [SerializeField] private bool supportsVision = true;
        [SerializeField] private bool inlineRemoteImages = true;
        [SerializeField] private bool resolveRemoteRedirects = false;

        public override string ProviderName => "Llama API (Maverick 4)";

        public async Task<string> RunChatAsync(string prompt, CancellationToken token)
        {
            // Construct the Maverick 4 request payload.
            // JsonUtility cannot serialize anonymous types, so the payload
            // uses the [Serializable] ChatRequest class declared below.
            var body = new ChatRequest
            {
                model = model,
                messages = new[]
                {
                    new Message { role = "user", content = prompt }
                },
                temperature = 0.6f,
                max_tokens = 512,
                top_p = 0.9f,
                stream = false
            };

            var json = JsonUtility.ToJson(body);

            // Prepare headers for the API call
            var headers = new Dictionary<string, string>
            {
                { "Authorization", $"Bearer {apiKey}" }
            };

            // Send the request using the shared HttpTransport
            var response = await HttpTransport.PostJsonAsync(endpointUrl, json, token, headers);

            if (string.IsNullOrEmpty(response))
            {
                HandleError("Empty response from Llama API Maverick 4");
                return string.Empty;
            }

            // Parse the response and extract the first choice's message content
            var result = JsonUtility.FromJson<LlamaResponse>(response);
            return result?.choices != null && result.choices.Length > 0
                ? result.choices[0].message?.content ?? string.Empty
                : string.Empty;
        }

        // Request payload; field names match the JSON keys sent to the API.
        [System.Serializable]
        private class ChatRequest
        {
            public string model;
            public Message[] messages;
            public float temperature;
            public int max_tokens;
            public float top_p;
            public bool stream;
        }

        [System.Serializable]
        private class LlamaResponse
        {
            public Choice[] choices;
        }

        [System.Serializable]
        private class Choice
        {
            public Message message;
        }

        [System.Serializable]
        private class Message
        {
            public string role;
            public string content;
        }
    }

This Provider supports text-only prompts and, when supportsVision is enabled, multimodal prompts (text + image). It can be assigned directly to any LlmAgent, which automatically manages the prompt flow and image packaging for multimodal tasks.

Using `AIProviderBase`

All Providers in the framework — including LlamaApiProvider — inherit from AIProviderBase, which provides:

Responsibility	Description
API Key Management	Stores and secures tokens as serialized fields.
Endpoint Configuration	Defines your API URL and model ID (for example, `llama-4-maverick`).
Network Handling	Uses `HttpTransport` for all JSON or binary POST requests with built-in retry and cancellation logic.
Error Handling	Centralized `HandleError()` and `OnError` events for reliable debugging and logging.

This shared base class ensures that all Providers behave consistently and integrate seamlessly with Unity’s async execution model.
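
The centralized error path can be pictured with the following plain C# sketch. ExampleProviderBase and FailingProvider are hypothetical, and the real AIProviderBase may declare OnError and HandleError with different signatures; the sketch assumes an Action<string> event and a protected HandleError(string) method purely for illustration.

```csharp
using System;

// Hypothetical stand-in for the error-handling part of a provider base class.
public abstract class ExampleProviderBase
{
    // Raised whenever a provider reports an error (assumed signature).
    public event Action<string> OnError;

    public abstract string ProviderName { get; }

    protected void HandleError(string message)
    {
        // Prefix with the provider name so listeners can identify the source.
        var formatted = $"[{ProviderName}] {message}";
        OnError?.Invoke(formatted);
    }
}

public sealed class FailingProvider : ExampleProviderBase
{
    public override string ProviderName => "Failing Provider";

    // Simulates a request that came back empty.
    public string Run()
    {
        HandleError("Empty response");
        return string.Empty;
    }
}
```

A caller would subscribe once, for example `provider.OnError += message => Console.WriteLine(message);`, and every Provider derived from the same base would then report failures through that single channel.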

Registering Your Provider

Your Provider is automatically detected by the AIBlocksInstallationRoutine. To make it appear in the setup wizard:

Add a CreateAssetMenu attribute whose menuName sits under Meta/AI/Providers/<YourCategory>.

Implement one of the known interfaces (IChatTask, IObjectDetectionTask, and so on).

Save the Provider as an asset in your project.

During installation, the system scans all available Provider assets, groups them by inference type, and displays them in the configuration dialog.
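
The grouping step can be approximated as follows. This is not the AIBlocksInstallationRoutine itself, whose internals are not shown here; it is a minimal sketch that groups in-memory objects by the task interface they implement. The marker interfaces and provider classes are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker interfaces standing in for the framework's task interfaces.
public interface IExampleChat { }
public interface IExampleSpeechToText { }

public sealed class CloudChatProvider : IExampleChat { }
public sealed class LocalChatProvider : IExampleChat { }
public sealed class CloudSttProvider : IExampleSpeechToText { }

public static class ProviderCatalog
{
    private static readonly Type[] KnownTasks =
    {
        typeof(IExampleChat),
        typeof(IExampleSpeechToText)
    };

    // Maps each known task interface name to the providers that implement it.
    public static Dictionary<string, List<string>> GroupByTask(IReadOnlyList<object> providers)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var task in KnownTasks)
        {
            groups[task.Name] = providers
                .Where(p => task.IsInstanceOfType(p))
                .Select(p => p.GetType().Name)
                .ToList();
        }
        return groups;
    }
}
```

A Provider that implements more than one task interface would appear in more than one group under this scheme.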

Integrating Custom Transports

The built-in HttpTransport handles standard HTTP request/response workflows. For real-time streaming or gRPC/WebSocket backends, you can write your own transport and use it inside your Provider.

Example Workflow

Create MyWebSocketTransport.cs.

Implement connection setup, message sending, and receive callbacks.

Use it internally within your Provider instead of HttpTransport.

This lets you build Providers that support live streaming (for example, token-by-token LLM output) or extremely low-latency local inference — without modifying any Agent code.
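
One possible shape for MyWebSocketTransport.cs, built on the standard System.Net.WebSockets.ClientWebSocket class, is sketched below. The class layout, the OnMessage event, and the method names are assumptions made for this sketch and are not part of the framework; you should also confirm that ClientWebSocket is available on your Unity target platform.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical transport; not part of the framework.
public sealed class MyWebSocketTransport : IDisposable
{
    private readonly ClientWebSocket socket = new ClientWebSocket();

    // Raised once per complete message received; payloads are decoded as UTF-8 text.
    public event Action<string> OnMessage;

    // Connection setup.
    public Task ConnectAsync(Uri uri, CancellationToken token)
        => socket.ConnectAsync(uri, token);

    // Sends one complete text message.
    public Task SendTextAsync(string text, CancellationToken token)
    {
        var bytes = Encoding.UTF8.GetBytes(text);
        return socket.SendAsync(
            new ArraySegment<byte>(bytes), WebSocketMessageType.Text, true, token);
    }

    // Reads until the server closes the connection or the token is cancelled.
    public async Task ReceiveLoopAsync(CancellationToken token)
    {
        var buffer = new byte[4096];
        using (var message = new MemoryStream())
        {
            while (socket.State == WebSocketState.Open)
            {
                var result = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), token);
                if (result.MessageType == WebSocketMessageType.Close)
                {
                    // Complete the close handshake started by the server.
                    await socket.CloseAsync(
                        WebSocketCloseStatus.NormalClosure, string.Empty, CancellationToken.None);
                    break;
                }

                // A message may arrive in several frames; accumulate until the last one.
                message.Write(buffer, 0, result.Count);
                if (result.EndOfMessage)
                {
                    OnMessage?.Invoke(Encoding.UTF8.GetString(message.ToArray()));
                    message.SetLength(0);
                    message.Position = 0;
                }
            }
        }
    }

    public void Dispose() => socket.Dispose();
}
```

A streaming Provider could subscribe to OnMessage and forward each chunk to its Agent as it arrives, for example to surface token-by-token LLM output.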

Extending Existing Agents

Sometimes it’s easier to extend an existing Agent than to create a new Provider. Subclassing lets you add custom prompts, pre/post-processing, or contextual awareness at runtime while leaving the Provider unchanged.

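    using System.Threading.Tasks;

    // Prepends a fixed context string to every prompt before it reaches the Provider.
    public class ContextualLlmAgent : LlmAgent
    {
        public string Context = "You are a helpful assistant.";

        // Overrides the base Agent's SendPrompt to inject the context.
        public override async Task SendPrompt(string userInput)
        {
            var fullPrompt = Context + "\n" + userInput;
            await base.SendPrompt(fullPrompt);
        }
    }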

