Troubleshooting and FAQ
Updated: Dec 11, 2025
This guide helps you diagnose and resolve common issues when integrating AI
Building Blocks — including setup errors, on-device performance bottlenecks,
provider misconfiguration, and streaming latency.
Before You Start
Most issues stem from missing credentials, wrong model IDs, or incorrect endpoints. Always verify your API keys, endpoint URLs, and Unity Console logs first.
Missing API keys or invalid credentials. HTTP 401/403.
Symptoms: HTTP 401 / 403 errors or empty responses.
- Verify your API key in the Provider Inspector.
- Ensure it’s valid for the selected endpoint (Hugging Face router vs. hosted model).
- For Hugging Face, check whether inference runs through the Hugging Face endpoint or through the inference provider's own endpoint.
- Check your provider's API key settings for the correct scopes (for example, read/write access).
- Some models (for example, gpt-4o) require a premium subscription or API credits.
Model not responding or timing out. HTTP 429.
Symptoms: Unity freezing, empty responses, or HTTP 429 errors (rate-limiting).
- Check your Internet connection or local server availability.
- Increase timeout in HttpTransport.PostJsonAsync() if needed.
- Avoid sending multiple concurrent HTTP requests to the same model endpoint.
- Inspect Unity Console logs for HTTP or JSON parsing exceptions.
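How a longer timeout and 429 handling look depends on how `HttpTransport.PostJsonAsync()` is implemented in your project. As an illustration only, here is a plain .NET sketch of the same idea — `PostJsonWithTimeoutAsync` is a hypothetical stand-in, not a method from the package:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class HttpTimeoutExample
{
    static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: POST JSON with an explicit timeout and a
    // single retry on HTTP 429 (rate limiting).
    public static async Task<string> PostJsonWithTimeoutAsync(
        string url, string json, TimeSpan timeout)
    {
        for (int attempt = 0; attempt < 2; attempt++)
        {
            using var cts = new CancellationTokenSource(timeout);
            using var content = new StringContent(json, Encoding.UTF8, "application/json");
            var response = await Client.PostAsync(url, content, cts.Token);

            if (response.StatusCode == (HttpStatusCode)429 && attempt == 0)
            {
                // Back off before the single retry; production code
                // should honor the Retry-After header if present.
                await Task.Delay(TimeSpan.FromSeconds(2), cts.Token);
                continue;
            }

            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        throw new HttpRequestException("Rate limited after retry.");
    }
}
```

Raising the timeout hides slow models but does not fix them; if requests regularly take longer than a few seconds, also check whether you are sending concurrent requests to the same endpoint.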
Wrong endpoint or model ID. HTTP 404.
Symptoms: 404 or “Model not found.”
- Copy endpoints directly from the provider’s example curl commands: open the deployment tab, copy the exact endpoint URL and model name, and note which API key that endpoint expects. All of this information appears in the curl command.
- Ensure router and hosted model endpoints are not mixed up.
- Confirm your API key has access to the specific model under your organization.
On-device model fails to load
Symptoms: .onnx or .sentis models fail to initialize or return null tensors.
- Check tensor input/output shapes against your expected data.
- Look for Unity Inference Engine console errors such as Unsupported operator.
- If in doubt, place the model under Resources/ or StreamingAssets/.
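A quick way to confirm the model asset itself is the problem is to load it in isolation. The sketch below assumes the Unity Inference Engine (Sentis) API — `ModelAsset`, `ModelLoader.Load` — and a hypothetical asset path `Resources/Models/detector`; adjust both to your package version and project layout:

```csharp
using UnityEngine;
using Unity.Sentis; // package/namespace name may differ by Inference Engine version

public class ModelLoadCheck : MonoBehaviour
{
    void Start()
    {
        // Assumes the model asset sits under a Resources/ folder,
        // e.g. Assets/Resources/Models/detector.sentis
        var asset = Resources.Load<ModelAsset>("Models/detector");
        if (asset == null)
        {
            Debug.LogError("Model asset not found under Resources/Models/");
            return;
        }

        var model = ModelLoader.Load(asset);
        Debug.Log($"Loaded model with {model.inputs.Count} input(s).");
    }
}
```

If loading succeeds here but inference still returns null tensors, the mismatch is most likely in the input/output tensor shapes rather than the asset location.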
First inference causes stutter or freeze
Symptoms: Frame spike during the first AI model run.
- Run a warm-up inference at application startup.
- Enable Split Over Frames and adjust Layers Per Frame in your Inference Engine Provider.
- Preload models during splash or loading screens to allocate buffers early.
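A minimal warm-up sketch, assuming the Sentis 1.x API (`WorkerFactory.CreateWorker`, `TensorFloat.AllocZeros`) and a hypothetical 1×3×224×224 input shape — substitute your own model's shape and the API of your Inference Engine version:

```csharp
using UnityEngine;
using Unity.Sentis;

public class ModelWarmup : MonoBehaviour
{
    public ModelAsset modelAsset;
    IWorker worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);

        // Warm-up: run one inference on a dummy input during the loading
        // screen so shaders compile and GPU buffers allocate up front.
        using var dummy = TensorFloat.AllocZeros(new TensorShape(1, 3, 224, 224));
        worker.Execute(dummy);
    }

    void OnDestroy() => worker?.Dispose();
}
```

Running this behind a splash or loading screen moves the one-time allocation cost out of gameplay, so the first real inference no longer causes a frame spike.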
General performance recommendations
- Reuse IWorker instances instead of recreating them per frame.
- Pre-allocate and reuse tensors instead of reallocating them.
- Quantize large models to FP16 or INT8 to reduce GPU load.
- Batch smaller inputs when possible.
- Profile frequently using Unity Profiler and Quest Developer Hub.
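The first two recommendations can be sketched as follows, again assuming the Sentis-style API used above; the shape and the way you refill the input each frame are placeholders:

```csharp
using UnityEngine;
using Unity.Sentis;

public class ReusedInference : MonoBehaviour
{
    public ModelAsset modelAsset;
    IWorker worker;       // created once, reused every frame
    TensorFloat input;    // allocated once, refilled in place

    void Start()
    {
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute,
                                            ModelLoader.Load(modelAsset));
        input = TensorFloat.AllocZeros(new TensorShape(1, 3, 224, 224));
    }

    void Update()
    {
        // Refill 'input' in place (e.g. from a camera texture), then run.
        // Anti-pattern to avoid: creating a new worker or tensor here,
        // which reallocates GPU buffers every frame.
        worker.Execute(input);
    }

    void OnDestroy()
    {
        input?.Dispose();
        worker?.Dispose();
    }
}
```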
Object Detection optimization
- Reduce input texture resolution.
- Move NMS (Non-Max Suppression) to the GPU using GpuNMS and NMSCompute.compute.
- Adjust Layers Per Frame for a stable frame rate.
Frequently Asked Questions
Can I use multiple Providers at once?
Yes. Assign different Providers to different Agents — for example, run Object Detection locally while using a cloud LLM for text generation.
How can I reduce API cost or token usage?
Use local models (Ollama, Unity Inference Engine) during development and switch to cloud Providers for production builds. Avoid running inference every frame unless absolutely necessary, and never ship API credentials in a production build or a public GitHub repository.
Can I use fine-tuned models?
Yes. Export your fine-tuned models to ONNX or serve them via HTTP, then connect through a custom Provider implementing the correct interface.
Why doesn’t my Provider appear in the setup wizard?
Ensure your Provider implements at least one task interface (IChatTask, IObjectDetectionTask, and so on) and defines a valid CreateAssetMenu path.
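A skeleton that satisfies both requirements might look like the sketch below. The `[CreateAssetMenu]` attribute is standard Unity; the menu path, field names, and the exact members of `IChatTask` are illustrative assumptions — match them to the package's actual interface definitions:

```csharp
using UnityEngine;

// Hypothetical sketch: the real IChatTask signature comes from the
// AI Building Blocks package; implement its members so the setup
// wizard can discover this Provider.
[CreateAssetMenu(menuName = "AI Building Blocks/Providers/My Custom Provider")]
public class MyCustomProvider : ScriptableObject /*, IChatTask */
{
    [SerializeField] string endpointUrl;  // placeholder field
    [SerializeField] string model;        // placeholder field

    // Implement the task interface methods (e.g. a chat completion
    // call) here, forwarding to your endpoint.
}
```

If the asset still does not appear, confirm the script compiles without errors: the wizard cannot list types from assemblies with compile failures.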
Can I use the Meta Quest microphone and camera directly?
Yes. SpeechToTextAgent accesses Unity’s Microphone API, and ObjectDetectionAgent supports both WebCamTexture and PassthroughCameraAccess.
What’s the difference between ONNX and Sentis models?
- ONNX: Generic open format, slower to load.
- Sentis (.sentis): Precompiled Unity format — optimized for fast startup and low memory usage.