Install Ollama before anything else; this tutorial requires it.
Unreal Engine 5.8 Semantic Search with Ollama
This setup is for local, offline semantic search in UE 5.8 using Ollama as the model backend. Epic’s Semantic Search plugin is still marked Experimental and described as “very early work in progress,” so treat it as preview tech rather than a production-ready system.
What this pipeline does
Semantic Search is no longer plain keyword search. The plugin builds a vector index, then uses an embedding provider to store and compare asset meanings. Epic’s API docs say the module loads the saved index, starts its queue, and uses the currently registered embedding provider. Ollama provides the local OpenAI-compatible endpoints and embedding endpoints needed for this kind of workflow.
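To make "compare asset meanings" a little more concrete: embedding search generally ranks items by how close their vectors are. The sketch below computes cosine similarity, which is a common metric for this. The docs cited here do not say which metric the plugin actually uses, and the `cosine` helper and the toy three-element vectors are purely illustrative.

```shell
# Illustrative only: cosine similarity between two embedding vectors.
# Assumption: the plugin's real metric is not documented here; cosine is
# simply a common choice for comparing embeddings.
cosine() {
  # $1 and $2: space-separated vectors of equal length
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n == 0) exit 1            # mismatched or empty input
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i]
    }
    if (na == 0 || nb == 0) exit 1          # a zero vector has no direction
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine "0.9 0.1 0.0" "0.8 0.2 0.1"   # similar direction, score near 1
cosine "0.9 0.1 0.0" "0.0 0.1 0.9"   # different direction, score near 0
```

Real embeddings from qwen3-embedding:4b have far more elements than these toy vectors, but the comparison works the same way.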
Recommended working model pair
For the captioning side, use Gemma 3 4B QAT: gemma3:4b-it-qat. Gemma 3 is multimodal, and Ollama’s Gemma 3 docs explicitly list QAT variants; they also note QAT keeps similar quality to BF16 with a lower memory footprint. That matters because the plugin’s captioning step is sensitive to speed and output consistency.
For embeddings, use Qwen3 Embedding 4B: qwen3-embedding:4b. Qwen’s official blog lists the 4B embedding model with an embedding dimension of 2560, and Ollama documents Qwen3 Embedding as a dedicated text embedding family for retrieval and semantic search.
Note: you can use 12B or even larger models if you have a faster GPU and more VRAM, but the model has to process and respond to each request in under 30 seconds, so smaller models are the safer choice.
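One rough way to check a model against that time limit is to time a single chat request with curl. This is a sketch, not a benchmark: the 30-second figure is the one from the note above (not a documented plugin setting), the helper names are made up for this example, and a text-only prompt is likely faster than a real captioning request with an image, so treat the result as a lower bound.

```shell
# Sketch: time one chat request and compare it to the budget.
# Assumption: 30 seconds comes from the note above, not from plugin docs.
BUDGET_SECONDS=30

within_budget() {
  # $1 = elapsed seconds (may be fractional), $2 = budget in seconds
  awk -v t="$1" -v b="$2" 'BEGIN { exit !(t + 0 < b + 0) }'
}

time_chat_request() {
  # $1 = model tag, $2 = base URL (defaults to the local Ollama endpoint)
  model="$1"
  base="${2:-http://localhost:11434/v1}"
  # -w '%{time_total}' makes curl print the total request time in seconds
  curl -sS -o /dev/null --max-time 120 -w '%{time_total}\n' \
    "$base/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Describe a wooden crate in one sentence.\"}]}"
}

# Usage (requires a running Ollama server):
#   t=$(time_chat_request gemma3:4b-it-qat) \
#     && within_budget "$t" "$BUDGET_SECONDS" \
#     && echo "within budget: ${t}s"
```

The first request after a model is loaded includes load time, so run it twice and judge by the second result.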
Install Ollama models
Pull the models first:
bash
ollama pull gemma3:4b-it-qat
ollama pull qwen3-embedding:4b
If your Ollama library uses a different exact tag, use the tag that appears in ollama list. The important part is the model family and tier, not guessing the name.
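To avoid typos in the Unreal settings, you can check the exact tags against what is installed. A minimal sketch, assuming `ollama list` prints a header line followed by one model per row with the name in the first column; `has_model` is a hypothetical helper name.

```shell
# Sketch: confirm that a tag is installed by reading `ollama list` output on stdin.
# Assumption: the model name is the first column and the first line is a header.
has_model() {
  # $1 = exact tag to look for, e.g. gemma3:4b-it-qat
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage (requires a working Ollama install):
#   ollama list | has_model gemma3:4b-it-qat   && echo "captioning model present"
#   ollama list | has_model qwen3-embedding:4b && echo "embedding model present"
```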
Unreal settings
In Project Settings → Semantic Search, set the fields like this:
Captioning
• Base URL: http://localhost:11434/v1
• API Key: ollama
• Model: gemma3:4b-it-qat
Embedding
• Base URL: http://localhost:11434/v1
• API Key: ollama
• Model: qwen3-embedding:4b
• Embedding Dimension: 2560
Ollama’s OpenAI compatibility docs show that the OpenAI-compatible client should point at http://localhost:11434/v1, and that the API key is required by the client but ignored by Ollama. The same docs show support for /v1/chat/completions, /v1/responses, /v1/models, and /v1/embeddings.
Verify Ollama before indexing in Unreal
Test the server directly first:
bash
curl http://localhost:11434/v1/models
Then test captioning:
bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:4b-it-qat",
    "messages": [
      { "role": "user", "content": "Describe this asset." }
    ]
  }'
Then test embeddings:
bash
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-embedding:4b",
    "input": "metro platform lighting"
  }'
Ollama documents both the OpenAI-compatible chat endpoint and the embeddings endpoint, so if these calls work, Unreal should at least be able to reach the backend correctly.
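Since a wrong embedding dimension is one of the failure causes, it can help to count the values the endpoint actually returns and compare that with the Embedding Dimension field. The helper below is a rough sketch: `embedding_dim` is a made-up name, and it uses crude text parsing that assumes a single input string per request.

```shell
# Sketch: count the floats in the "embedding" array of a /v1/embeddings response.
# Crude text parsing that assumes one input string per request;
# if jq is installed, `jq '.data[0].embedding | length'` is more robust.
embedding_dim() {
  tr -d ' \r\n' \
    | sed -n 's/.*"embedding":\[\([^]]*\)\].*/\1/p' \
    | awk -F, '{ n = NF } END { print n + 0 }'
}

# Usage (requires a running Ollama server):
#   curl -s http://localhost:11434/v1/embeddings \
#     -H "Content-Type: application/json" \
#     -d '{"model": "qwen3-embedding:4b", "input": "metro platform lighting"}' \
#     | embedding_dim
```

The printed number should match the value you enter as Embedding Dimension in Unreal. A result of 0 means no embedding array was found, which usually points to an error response such as a wrong model tag.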
Indexing workflow
1. Enable the Semantic Search plugin.
2. Restart Unreal.
3. Enter the Ollama endpoint and model names.
4. Set embedding dimension to 2560 for Qwen3 Embedding 4B.
5. Index a small folder first.
6. Once stable, index larger content libraries.
Because the feature is experimental, a small validation pass is the safest way to confirm the pipeline before indexing a full project.
What worked best in practice
In my testing, the most reliable combination was:
• Captioning: gemma3:4b-it-qat
• Embedding: qwen3-embedding:4b
• Embedding dimension: 2560
That combination gave far fewer indexing failures than larger multimodal models, which lines up with Gemma 3 QAT’s lower memory footprint and the plugin’s sensitivity to response timing.
Troubleshooting
If Unreal says connection failed, first check that Ollama is answering http://localhost:11434/v1/models. If indexing starts but then fails, the usual culprits are an incorrect model tag, wrong embedding dimension, or a model that is too slow for the plugin’s request cycle. The Unreal plugin itself is still experimental, so some instability is normal.
I used AI to help write this tutorial, but it is based on my own research. There may still be mistakes; if you find any, let me know and I will correct them.