vespa-engine · fzowl · Nov 12, 2025 · Nov 17, 2025 · Nov 17, 2025
diff --git a/en/embedding.html b/en/embedding.html
@@ -488,6 +488,150 @@ <h4 id="splade-ranking">SPLADE ranking</h4>
 </p>
 
 
+<h3 id="voyageai-embedder">VoyageAI embedder</h3>
+
+<p>An embedder that uses the <a href="https://www.voyageai.com/">VoyageAI</a> embedding API
+to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
+and does not require local model files or ONNX inference.</p>
+
+<pre>{% highlight xml %}
+<container version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-3.5</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    </component>
+</container>
+{% endhighlight %}</pre>
+
+<ul>
+    <li>
+        The <code>model</code> element specifies which VoyageAI model to use.
+        Available models include <code>voyage-3.5</code> (1024 dimensions, general purpose),
+        <code>voyage-3.5-lite</code> (512 dimensions, lower cost and latency),
+        <code>voyage-code-3</code> (optimized for code), and others.
+        See the <a href="https://docs.voyageai.com/docs/embeddings">VoyageAI documentation</a> for the full list.
+    </li>
+    <li>
+        The <code>api-key-secret-ref</code> references a secret in Vespa's
+        <a href="/en/cloud/security/secret-store.html">secret store</a> containing your VoyageAI API key.
+        This is required for authentication.
+    </li>
+</ul>
+
+<p>Add your VoyageAI API key to the secret store:</p>
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+
+<p>See the <a href="reference/embedding-reference.html#voyageai-embedder-reference-config">reference</a>
+for all configuration parameters including caching, retry logic, and performance tuning.</p>
+
+<h4 id="voyageai-embedder-models">VoyageAI embedder models</h4>
+<p>
+    VoyageAI offers several embedding models optimized for different use cases.
+    The resulting <a href="reference/tensor.html#tensor-type-spec">tensor type</a> can use <code>float</code> cells,
+    or <code>bfloat16</code> cells to halve the storage footprint.
+</p>
+
+<p>Latest general-purpose models (recommended):</p>
+<ul>
+    <li><a href="https://docs.voyageai.com/docs/embeddings"><strong>voyage-3.5</strong></a> produces <code>tensor&lt;float&gt;(x[1024])</code> - latest general-purpose model, recommended for most applications</li>
+    <li><strong>voyage-3.5-lite</strong> produces <code>tensor&lt;float&gt;(x[512])</code> - latest lite model, with lower cost and latency</li>
+</ul>
+
+<p>Previous generation general-purpose models:</p>
+<ul>
+    <li>voyage-3 produces <code>tensor&lt;float&gt;(x[1024])</code> - high quality (use voyage-3.5 for best results)</li>
+    <li>voyage-3-lite produces <code>tensor&lt;float&gt;(x[512])</code> - cost-efficient (use voyage-3.5-lite for better performance)</li>
+</ul>
+
+<p>Specialized models:</p>
+<ul>
+    <li>voyage-code-3 produces <code>tensor&lt;float&gt;(x[1024])</code> - optimized for code search and technical content</li>
+    <li>voyage-finance-2 produces <code>tensor&lt;float&gt;(x[1024])</code> - optimized for financial documents</li>
+    <li>voyage-law-2 produces <code>tensor&lt;float&gt;(x[1024])</code> - optimized for legal documents</li>
+    <li>voyage-multilingual-2 produces <code>tensor&lt;float&gt;(x[1024])</code> - supports 100+ languages</li>
+</ul>
+
+<h4 id="voyageai-input-types">Input type detection</h4>
+<p>VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
+The embedder automatically detects the context and sets the appropriate input type:</p>
+<ul>
+    <li><strong>Query context</strong>: When embedding text in query requests via <code>embed()</code></li>
+    <li><strong>Document context</strong>: When embedding document fields during indexing</li>
+</ul>
+
+<p>You can disable auto-detection and set a fixed input type:</p>
+<pre>{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <default-input-type>query</default-input-type>
+</component>
+{% endhighlight %}</pre>
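+
+<p>If auto-detection is disabled but both input types are still needed, one option is to declare
+two components, each fixed to one input type. This is a sketch rather than a documented recipe: the ids
+<code>voyage-doc</code> and <code>voyage-query</code> are hypothetical names chosen for illustration.</p>
+<pre>{% highlight xml %}
+<!-- Hypothetical id: reference voyage-doc from the schema's indexing statement -->
+<component id="voyage-doc" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <default-input-type>document</default-input-type>
+</component>
+<!-- Hypothetical id: reference voyage-query from the query-time embed() call -->
+<component id="voyage-query" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <default-input-type>query</default-input-type>
+</component>
+{% endhighlight %}</pre>
+<p>Both components should use the same model, so that document and query vectors share one embedding space.</p>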
+
+<h4 id="voyageai-performance">VoyageAI performance features</h4>
+<p>The VoyageAI embedder includes several performance optimizations:</p>
+<ul>
+    <li><strong>Caching</strong>: Automatically caches recent embeddings to reduce API calls.</li>
+    <li><strong>Connection pooling</strong>: Reuses HTTP connections for efficiency. Configure with <code>max-idle-connections</code> (default: 5).</li>
+    <li><strong>Retry logic</strong>: Automatically retries on rate limits and transient errors with exponential backoff. Configure with <code>max-retries</code> (default: 10).</li>
+    <li><strong>Normalization</strong>: Optional L2 normalization for cosine similarity. Enable with <code>normalize</code> set to <code>true</code>.</li>
+</ul>
+
+<p>Example with performance tuning:</p>
+<pre>{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <max-idle-connections>20</max-idle-connections>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}</pre>
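+
+<p>Retries can also be bounded more tightly for latency-sensitive use. The sketch below assumes the
+<code>timeout</code> parameter from the reference configuration (in milliseconds);
+the values are illustrative only, not tuned recommendations:</p>
+<pre>{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <!-- Illustrative values: stop after 3 retries or 10 seconds in total, whichever comes first -->
+    <max-retries>3</max-retries>
+    <timeout>10000</timeout>
+</component>
+{% endhighlight %}</pre>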
+
+<h4 id="voyageai-usage-example">Usage example</h4>
+<p>Complete example showing document indexing and query-time embedding:</p>
+
+<p><strong>Schema definition</strong>:</p>
+<pre>
+schema doc {
+    document doc {
+        field text type string {
+            indexing: summary | index
+        }
+    }
+
+    field embedding type tensor&lt;float&gt;(x[1024]) {
+        indexing: input text | embed voyage | attribute | index
+        attribute {
+            distance-metric: angular
+        }
+    }
+
+    rank-profile semantic {
+        inputs {
+            query(q) tensor&lt;float&gt;(x[1024])
+        }
+        first-phase {
+            expression: closeness(field, embedding)
+        }
+    }
+}
+</pre>
+
+<p><strong>Query with embedding</strong>:</p>
+<pre>{% highlight bash %}
+vespa query \
+  'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
+  'input.query(q)=embed(voyage, "machine learning tutorials")'
+{% endhighlight %}</pre>
+
+<p>When <code>normalize</code> is set to <code>true</code>, use
+<a href="reference/schema-reference.html#prenormalized-angular">distance-metric: prenormalized-angular</a>
+instead of <code>angular</code> for more efficient similarity computation.</p>
+
+
 <h2 id="embedder-performance">Embedder performance</h2>
 
 <p>Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:</p>

diff --git a/en/reference/embedding-reference.html b/en/reference/embedding-reference.html
@@ -476,6 +476,197 @@ <h3 id="splade-embedder-reference-config">splade embedder reference config</h3>
 
 
 
+<h2 id="voyageai-embedder">VoyageAI embedder</h2>
+<p>
+  An embedder that uses the <a href="https://www.voyageai.com/">VoyageAI</a> API
+  to generate embeddings for semantic search. Because it calls the VoyageAI service,
+  it does not require local model files or ONNX inference.
+</p>
+<p>
+  The VoyageAI embedder is configured in <a href="services.html">services.xml</a>,
+  within the <code>container</code> tag:
+</p>
+<pre>{% highlight xml %}
+<container id="default" version="1.0">
+    <component id="voyage" type="voyage-ai-embedder">
+        <model>voyage-3.5</model>
+        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    </component>
+</container>
+{% endhighlight %}</pre>
+
+<h3 id="voyageai-secret-management">Secret management</h3>
+<p>
+  The VoyageAI API key must be stored in Vespa's
+  <a href="/en/cloud/security/secret-store.html">secret store</a> for secure management:
+</p>
+<pre>
+vespa secret add voyage_api_key --value "pa-xxxxx..."
+</pre>
+<p>
+  The <code>api-key-secret-ref</code> parameter references the secret by name.
+  Rotated secrets are picked up automatically, without requiring an application restart.
+</p>
+
+<h3 id="voyageai-embedder-reference-config">VoyageAI embedder reference config</h3>
+<table class="table">
+  <thead>
+    <tr>
+      <th>Name</th>
+      <th>Occurrence</th>
+      <th>Description</th>
+      <th>Type</th>
+      <th>Default</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>api-key-secret-ref</td>
+      <td>One</td>
+      <td><strong>Required</strong>. Reference to the secret in Vespa's secret store containing the VoyageAI API key.</td>
+      <td>string</td>
+      <td>N/A</td>
+    </tr>
+    <tr>
+      <td>model</td>
+      <td>One</td>
+      <td>The VoyageAI model to use. Available models:
+        <ul style="margin: 5px 0;">
+          <li><strong><code>voyage-3.5</code></strong> (1024 dims) - Latest general-purpose model (recommended)</li>
+          <li><strong><code>voyage-3.5-lite</code></strong> (512 dims) - Latest lite model, lower cost and latency</li>
+          <li><code>voyage-3</code> (1024 dims) - Previous generation, high quality</li>
+          <li><code>voyage-3-lite</code> (512 dims) - Previous generation, cost-efficient</li>
+          <li><code>voyage-code-3</code> (1024 dims) - Code search optimization</li>
+          <li><code>voyage-finance-2</code> (1024 dims) - Financial documents</li>
+          <li><code>voyage-law-2</code> (1024 dims) - Legal documents</li>
+          <li><code>voyage-multilingual-2</code> (1024 dims) - Multilingual support</li>
+        </ul>
+      </td>
+      <td>string</td>
+      <td>voyage-3.5</td>
+    </tr>
+    <tr>
+      <td>endpoint</td>
+      <td>One</td>
+      <td>VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints.</td>
+      <td>string</td>
+      <td>https://api.voyageai.com/v1/embeddings</td>
+    </tr>
+    <tr>
+      <td>timeout</td>
+      <td>One</td>
+      <td>Request timeout in milliseconds. It also bounds retries: retrying stops when the total elapsed time would exceed this timeout. Minimum value: 1000 ms.</td>
+      <td>numeric</td>
+      <td>30000</td>
+    </tr>
+    <tr>
+      <td>max-retries</td>
+      <td>One</td>
+      <td>Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound.</td>
+      <td>numeric</td>
+      <td>10</td>
+    </tr>
+    <tr>
+      <td>default-input-type</td>
+      <td>One</td>
+      <td>Default input type when auto-detection is disabled. Valid values: <code>query</code> or <code>document</code>. VoyageAI models use different optimizations for queries vs documents.</td>
+      <td>enum</td>
+      <td>document</td>
+    </tr>
+    <tr>
+      <td>auto-detect-input-type</td>
+      <td>One</td>
+      <td>Whether to automatically detect input type based on context. When enabled, uses <code>query</code> type for query-time embeddings and <code>document</code> type for indexing.</td>
+      <td>boolean</td>
+      <td>true</td>
+    </tr>
+    <tr>
+      <td>normalize</td>
+      <td>One</td>
+      <td>Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with <code>prenormalized-angular</code> <a href="schema-reference.html#distance-metric">distance-metric</a> for efficient similarity computation.</td>
+      <td>boolean</td>
+      <td>false</td>
+    </tr>
+    <tr>
+      <td>truncate</td>
+      <td>One</td>
+      <td>Whether to truncate input text that exceeds the model's limit. When enabled, over-long text is truncated automatically. When disabled, requests with over-long text fail.</td>
+      <td>boolean</td>
+      <td>true</td>
+    </tr>
+    <tr>
+      <td>max-idle-connections</td>
+      <td>One</td>
+      <td>Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources.</td>
+      <td>numeric</td>
+      <td>5</td>
+    </tr>
+  </tbody>
+</table>
+
+<h3 id="voyageai-example-configurations">Example configurations</h3>
+
+<p><strong>Basic configuration (recommended)</strong>:</p>
+<pre>{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}</pre>
+
+<p><strong>High-performance configuration</strong>:</p>
+<pre>{% highlight xml %}
+<component id="voyage" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <max-idle-connections>20</max-idle-connections>
+    <timeout>60000</timeout>
+</component>
+{% endhighlight %}</pre>
+
+<p><strong>Fast and cost-efficient configuration</strong>:</p>
+<pre>{% highlight xml %}
+<component id="voyage-lite" type="voyage-ai-embedder">
+    <model>voyage-3.5-lite</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+</component>
+{% endhighlight %}</pre>
+
+<p><strong>Query-optimized configuration</strong>:</p>
+<pre>{% highlight xml %}
+<component id="voyage-query" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <default-input-type>query</default-input-type>
+    <auto-detect-input-type>false</auto-detect-input-type>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}</pre>
+
+<p><strong>Code search configuration</strong>:</p>
+<pre>{% highlight xml %}
+<component id="code-embedder" type="voyage-ai-embedder">
+    <model>voyage-code-3</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <normalize>true</normalize>
+</component>
+{% endhighlight %}</pre>
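+
+<p><strong>Strict input handling behind a proxy</strong> (a sketch: the component id and endpoint URL are
+hypothetical placeholders, and the parameters are assumed to behave as described in the table above):</p>
+<pre>{% highlight xml %}
+<component id="voyage-strict" type="voyage-ai-embedder">
+    <model>voyage-3.5</model>
+    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
+    <!-- Hypothetical proxy URL; replace with your own endpoint -->
+    <endpoint>https://voyage-proxy.example.com/v1/embeddings</endpoint>
+    <!-- Fail requests with over-long text instead of truncating it -->
+    <truncate>false</truncate>
+</component>
+{% endhighlight %}</pre>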
+
+<h3 id="voyageai-cost-optimization">Cost and performance optimization</h3>
+<p>The VoyageAI embedder includes several features to reduce API costs and improve performance:</p>
+<ul>
+  <li><strong>Caching</strong>: The embedder automatically caches recent embeddings to prevent duplicate API calls for the same text. This is particularly effective for repeated queries or documents.</li>
+  <li><strong>Connection pooling</strong>: HTTP connections are reused to reduce connection overhead and improve throughput. Configure with <code>max-idle-connections</code> (default: 5).</li>
+  <li><strong>Retry logic</strong>: Automatic retries with exponential backoff handle rate limits and transient errors, bounded by the global timeout. Configure with <code>max-retries</code> (default: 10).</li>
+  <li><strong>Model selection</strong>: Use <code>voyage-3.5-lite</code> for cost-sensitive applications; its 512-dimensional embeddings are half the size of 1024-dimensional ones, which reduces storage cost. For best quality, use <code>voyage-3.5</code>.</li>
+</ul>
+
+<p>For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
+  (see <a href="container-metrics-reference.html">Container Metrics</a>).
+  Monitor API usage and costs through the <a href="https://www.voyageai.com/">VoyageAI dashboard</a>.</p>
+
+
+
 <h2 id="huggingface-tokenizer-embedder">Huggingface tokenizer embedder</h2>
   <p>
     The Huggingface tokenizer embedder is configured in <a href="services.html">services.xml</a>,