144 changes: 144 additions & 0 deletions en/embedding.html
@@ -488,6 +488,150 @@ <h4 id="splade-ranking">SPLADE ranking</h4>
</p>


<h3 id="voyageai-embedder">VoyageAI Embedder</h3>

<p>An embedder that uses the <a href="https://www.voyageai.com/">VoyageAI</a> embedding API
to generate high-quality embeddings for semantic search. This embedder calls the VoyageAI API service
and does not require local model files or ONNX inference.</p>

<pre>{% highlight xml %}
<container version="1.0">
    <component id="voyage" type="voyage-ai-embedder">
        <model>voyage-3.5</model>
        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    </component>
</container>
{% endhighlight %}</pre>

<ul>
<li>
The <code>model</code> element specifies which VoyageAI model to use.
Available models include <code>voyage-3.5</code> (1024 dimensions, the default),
<code>voyage-3.5-lite</code> (512 dimensions, lower cost and latency),
<code>voyage-code-3</code> (optimized for code), and others.
See the <a href="https://docs.voyageai.com/docs/embeddings">VoyageAI documentation</a> for the full list.
</li>
<li>
The <code>api-key-secret-ref</code> references a secret in Vespa's
<a href="/en/cloud/security/secret-store.html">secret store</a> containing your VoyageAI API key.
This is required for authentication.
</li>
</ul>

<p>Add your VoyageAI API key to the secret store:</p>
<pre>
vespa secret add voyage_api_key --value "pa-xxxxx..."
</pre>

<p>See the <a href="reference/embedding-reference.html#voyageai-embedder-reference-config">reference</a>
for all configuration parameters including caching, retry logic, and performance tuning.</p>

<h4 id="voyageai-embedder-models">VoyageAI embedder models</h4>
<p>
VoyageAI offers several embedding models optimized for different use cases.
The resulting <a href="reference/tensor.html#tensor-type-spec">tensor type</a> can use <code>float</code> cells,
or <code>bfloat16</code> cells to halve the storage per value.
</p>

<p>Latest general-purpose models (recommended):</p>
<ul>
<li><a href="https://docs.voyageai.com/docs/embeddings"><strong>voyage-3.5</strong></a> produces <code>tensor&lt;float&gt;(x[1024])</code> - the current general-purpose model, recommended for most applications</li>
<li><strong>voyage-3.5-lite</strong> produces <code>tensor&lt;float&gt;(x[512])</code> - the current lite model, with lower cost and latency</li>
</ul>

<p>Previous generation general-purpose models:</p>
<ul>
<li>voyage-3 produces <code>tensor&lt;float&gt;(x[1024])</code> - high quality (use voyage-3.5 for best results)</li>
<li>voyage-3-lite produces <code>tensor&lt;float&gt;(x[512])</code> - cost-efficient (use voyage-3.5-lite for better performance)</li>
</ul>

<p>Specialized models:</p>
<ul>
<li>voyage-code-3 produces <code>tensor&lt;float&gt;(x[1024])</code> - optimized for code search and technical content</li>
<li>voyage-finance-2 produces <code>tensor&lt;float&gt;(x[1024])</code> - optimized for financial documents</li>
<li>voyage-law-2 produces <code>tensor&lt;float&gt;(x[1024])</code> - optimized for legal documents</li>
<li>voyage-multilingual-2 produces <code>tensor&lt;float&gt;(x[1024])</code> - supports 100+ languages</li>
</ul>
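<p>To make the storage trade-off between dimensions and cell types concrete, the following sketch computes
the raw size of an embedding. It assumes 4 bytes per <code>float</code> cell and 2 bytes per <code>bfloat16</code> cell
and ignores index and attribute overhead; the function names are hypothetical and not part of Vespa.</p>

```python
# Bytes per tensor cell for the two cell types mentioned above.
BYTES_PER_CELL = {"float": 4, "bfloat16": 2}


def embedding_bytes(dimensions: int, cell_type: str = "float") -> int:
    """Raw size of one dense embedding, excluding index and attribute overhead."""
    return dimensions * BYTES_PER_CELL[cell_type]


def corpus_gib(num_docs: int, dimensions: int, cell_type: str = "float") -> float:
    """Raw embedding size for a whole corpus, in GiB."""
    return num_docs * embedding_bytes(dimensions, cell_type) / 2**30


if __name__ == "__main__":
    for dims in (1024, 512):
        for cell_type in ("float", "bfloat16"):
            print(dims, cell_type, embedding_bytes(dims, cell_type), "bytes")
```

<p>Under these assumptions a 1024-dimensional <code>float</code> embedding takes 4096 bytes,
and a 512-dimensional <code>bfloat16</code> embedding takes 1024 bytes.</p>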

<h4 id="voyageai-input-types">Input type detection</h4>
<p>VoyageAI models distinguish between query and document embeddings for improved retrieval quality.
The embedder automatically detects the context and sets the appropriate input type:</p>
<ul>
<li><strong>Query context</strong>: When embedding text in query requests via <code>embed()</code></li>
<li><strong>Document context</strong>: When embedding document fields during indexing</li>
</ul>

<p>You can disable auto-detection and set a fixed input type:</p>
<pre>{% highlight xml %}
<component id="voyage" type="voyage-ai-embedder">
    <model>voyage-3.5</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    <auto-detect-input-type>false</auto-detect-input-type>
    <default-input-type>query</default-input-type>
</component>
{% endhighlight %}</pre>
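<p>The interaction between <code>auto-detect-input-type</code> and <code>default-input-type</code> can be sketched as follows.
This is an illustration of the configuration semantics described above, not the embedder's actual implementation:
the <code>context</code> argument and both function names are hypothetical, and the request body assumes the
<code>input</code>, <code>model</code> and <code>input_type</code> fields of the VoyageAI embeddings API.</p>

```python
import json

VALID_INPUT_TYPES = ("query", "document")


def resolve_input_type(context, auto_detect=True, default_input_type="document"):
    """Pick the input type for one embed call.

    `context` is a hypothetical label ("query" or "document") for where the
    call originates; the real embedder determines this internally.
    """
    if default_input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"invalid default input type: {default_input_type!r}")
    if not auto_detect:
        # Auto-detection disabled: always use the configured default.
        return default_input_type
    if context not in VALID_INPUT_TYPES:
        raise ValueError(f"unknown context: {context!r}")
    return context


def build_request_body(texts, model, input_type):
    """Serialize an embeddings request body (assumed field names, see lead-in)."""
    return json.dumps({"input": list(texts), "model": model, "input_type": input_type})


if __name__ == "__main__":
    # With the configuration above, document fields are also embedded as "query".
    fixed = resolve_input_type("document", auto_detect=False, default_input_type="query")
    print(build_request_body(["some document text"], "voyage-3.5", fixed))
```

<p>Note that a fixed input type applies to every call made through that component,
so a separate component is typically needed if documents and queries should be embedded differently.</p>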

<h4 id="voyageai-performance">VoyageAI performance features</h4>
<p>The VoyageAI embedder includes several performance optimizations:</p>
<ul>
<li><strong>Caching</strong>: Automatically caches recent embeddings to reduce API calls.</li>
<li><strong>Connection pooling</strong>: Reuses HTTP connections for efficiency. Configure with <code>max-idle-connections</code> (default: 5).</li>
<li><strong>Retry logic</strong>: Automatically retries on rate limits and transient errors with exponential backoff. Configure with <code>max-retries</code> (default: 10).</li>
<li><strong>Normalization</strong>: Optional L2 normalization for cosine similarity. Enable with <code>normalize</code> set to <code>true</code>.</li>
</ul>

<p>Example with performance tuning:</p>
<pre>{% highlight xml %}
<component id="voyage" type="voyage-ai-embedder">
    <model>voyage-3.5</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    <max-idle-connections>20</max-idle-connections>
    <normalize>true</normalize>
</component>
{% endhighlight %}</pre>
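<p>The caching behaviour listed above can be pictured as a small least-recently-used cache in front of the API call.
The sketch below is an assumption about how such a cache might work, not the embedder's code:
the class name, the <code>max_entries</code> limit, and the choice to key on both text and input type are all hypothetical.</p>

```python
from collections import OrderedDict


class EmbeddingCache:
    """Least-recently-used cache in front of an embedding function (illustrative only)."""

    def __init__(self, embed_fn, max_entries=1000):
        self._embed_fn = embed_fn        # callable(text, input_type) -> list of floats
        self._max_entries = max_entries  # hypothetical size limit
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def embed(self, text, input_type):
        # Assumption: query and document embeddings differ, so both are part of the key.
        key = (text, input_type)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        vector = self._embed_fn(text, input_type)
        self._entries[key] = vector
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return vector


if __name__ == "__main__":
    cache = EmbeddingCache(lambda text, input_type: [float(len(text))])
    cache.embed("machine learning tutorials", "query")
    cache.embed("machine learning tutorials", "query")
    print(cache.hits, cache.misses)
```

<p>In this sketch, a repeated query text results in a single call to the underlying embedding function.</p>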

<h4 id="voyageai-usage-example">Usage example</h4>
<p>Complete example showing document indexing and query-time embedding:</p>

<p><strong>Schema definition</strong>:</p>
<pre>
schema doc {
    document doc {
        field text type string {
            indexing: summary | index
        }
    }

    field embedding type tensor&lt;float&gt;(x[1024]) {
        indexing: input text | embed voyage | attribute | index
        attribute {
            distance-metric: angular
        }
    }

    rank-profile semantic {
        inputs {
            query(q) tensor&lt;float&gt;(x[1024])
        }
        first-phase {
            expression: closeness(field, embedding)
        }
    }
}
</pre>

<p><strong>Query with embedding</strong>:</p>
<pre>{% highlight bash %}
vespa query \
  'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,q)' \
  'input.query(q)=embed(voyage, "machine learning tutorials")' \
  'ranking=semantic'
{% endhighlight %}</pre>
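<p>The same query can be expressed as a JSON body for Vespa's query API.
The sketch below only builds the body and does not send it. It assumes the <code>semantic</code> rank profile
from the schema above is selected through the <code>ranking</code> parameter, and the helper name
<code>build_query_body</code> is hypothetical.</p>

```python
import json


def build_query_body(text, embedder_id="voyage", target_hits=10, rank_profile="semantic"):
    """Build a query body that embeds `text` at query time (illustrative helper)."""
    if '"' in text:
        # Keep the sketch simple: the embed() argument below is double-quoted.
        raise ValueError("text must not contain double quotes")
    return {
        "yql": f"select * from doc where {{targetHits:{target_hits}}}nearestNeighbor(embedding,q)",
        "input.query(q)": f'embed({embedder_id}, "{text}")',
        # Assumption: the rank profile that declares query(q) is named "semantic".
        "ranking": rank_profile,
    }


if __name__ == "__main__":
    print(json.dumps(build_query_body("machine learning tutorials"), indent=2))
```

<p>The body can then be posted to the application's search endpoint with any HTTP client.</p>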

<p>When <code>normalize</code> is set to <code>true</code>, the vectors already have unit length, so use
<a href="reference/schema-reference.html#prenormalized-angular">distance-metric: prenormalized-angular</a>
instead of <code>angular</code> for more efficient similarity computation.</p>
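<p>The reason pre-normalized vectors are cheaper to compare is that, for unit-length vectors,
cosine similarity reduces to a plain dot product. A minimal sketch in plain Python
(the helper names are illustrative and not part of Vespa):</p>

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(vector, vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine(a, b):
    """Cosine similarity computed from scratch, including both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    # After normalization the norms are 1, so the dot product alone gives the cosine.
    print(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```

<p>Both printed values agree up to floating-point rounding, which is why the norm computation can be skipped
when the stored and query vectors are known to be normalized.</p>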


<h2 id="embedder-performance">Embedder performance</h2>

<p>Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:</p>
191 changes: 191 additions & 0 deletions en/reference/embedding-reference.html
@@ -476,6 +476,197 @@ <h3 id="splade-embedder-reference-config">splade embedder reference config</h3>



<h2 id="voyageai-embedder">VoyageAI Embedder</h2>
<p>
An embedder that calls the <a href="https://www.voyageai.com/">VoyageAI</a> API
to generate embeddings for semantic search.
Because inference runs in the VoyageAI service, no local model files or ONNX inference are required.
</p>
<p>
The VoyageAI embedder is configured in <a href="services.html">services.xml</a>,
within the <code>container</code> tag:
</p>
<pre>{% highlight xml %}
<container id="default" version="1.0">
    <component id="voyage" type="voyage-ai-embedder">
        <model>voyage-3.5</model>
        <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    </component>
</container>
{% endhighlight %}</pre>

<h3 id="voyageai-secret-management">Secret Management</h3>
<p>
The VoyageAI API key must be stored in Vespa's
<a href="/en/cloud/security/secret-store.html">secret store</a> for secure management:
</p>
<pre>
vespa secret add voyage_api_key --value "pa-xxxxx..."
</pre>
<p>
The <code>api-key-secret-ref</code> parameter references the secret by name.
Rotated secrets are picked up automatically, without requiring an application restart.
</p>

<h3 id="voyageai-embedder-reference-config">VoyageAI embedder reference config</h3>
<table class="table">
<thead>
<tr>
<th>Name</th>
<th>Occurrence</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr>
<td>api-key-secret-ref</td>
<td>One</td>
<td><strong>Required</strong>. Reference to the secret in Vespa's secret store containing the VoyageAI API key.</td>
<td>string</td>
<td>N/A</td>
</tr>
<tr>
<td>model</td>
<td>One</td>
<td>The VoyageAI model to use. Available models:
<ul style="margin: 5px 0;">
<li><strong><code>voyage-3.5</code></strong> (1024 dims) - Current general-purpose model (recommended)</li>
<li><strong><code>voyage-3.5-lite</code></strong> (512 dims) - Current lite model, lower cost and latency</li>
<li><code>voyage-3</code> (1024 dims) - Previous generation, high quality</li>
<li><code>voyage-3-lite</code> (512 dims) - Previous generation, cost-efficient</li>
<li><code>voyage-code-3</code> (1024 dims) - Code search optimization</li>
<li><code>voyage-finance-2</code> (1024 dims) - Financial documents</li>
<li><code>voyage-law-2</code> (1024 dims) - Legal documents</li>
<li><code>voyage-multilingual-2</code> (1024 dims) - Multilingual support</li>
</ul>
</td>
<td>string</td>
<td>voyage-3.5</td>
</tr>
<tr>
<td>endpoint</td>
<td>One</td>
<td>VoyageAI API endpoint URL. Can be overridden for custom proxies or regional endpoints.</td>
<td>string</td>
<td>https://api.voyageai.com/v1/embeddings</td>
</tr>
<tr>
<td>timeout</td>
<td>One</td>
<td>Request timeout in milliseconds. It also bounds retries: retrying stops once the total elapsed time would exceed this value. Minimum value: 1000.</td>
<td>numeric</td>
<td>30000</td>
</tr>
<tr>
<td>max-retries</td>
<td>One</td>
<td>Maximum number of retry attempts for failed requests. Used as a safety limit in addition to the timeout-based retry bound.</td>
<td>numeric</td>
<td>10</td>
</tr>
<tr>
<td>default-input-type</td>
<td>One</td>
<td>Default input type when auto-detection is disabled. Valid values: <code>query</code> or <code>document</code>. VoyageAI models use different optimizations for queries vs documents.</td>
<td>enum</td>
<td>document</td>
</tr>
<tr>
<td>auto-detect-input-type</td>
<td>One</td>
<td>Whether to automatically detect input type based on context. When enabled, uses <code>query</code> type for query-time embeddings and <code>document</code> type for indexing.</td>
<td>boolean</td>
<td>true</td>
</tr>
<tr>
<td>normalize</td>
<td>One</td>
<td>Whether to apply L2 normalization to embeddings. When enabled, all embedding vectors are normalized to unit length. Use with <code>prenormalized-angular</code> <a href="schema-reference.html#distance-metric">distance-metric</a> for efficient similarity computation.</td>
<td>boolean</td>
<td>false</td>
</tr>
<tr>
<td>truncate</td>
<td>One</td>
<td>Whether to truncate input text exceeding model limits. When enabled, text is automatically truncated. When disabled, requests with too-long text will fail.</td>
<td>boolean</td>
<td>true</td>
</tr>
<tr>
<td>max-idle-connections</td>
<td>One</td>
<td>Maximum number of idle HTTP connections to keep in the connection pool. Higher values improve throughput for concurrent requests but use more resources.</td>
<td>numeric</td>
<td>5</td>
</tr>
</tbody>
</table>

<h3 id="voyageai-example-configurations">Example Configurations</h3>

<p><strong>Basic configuration (recommended)</strong>:</p>
<pre>{% highlight xml %}
<component id="voyage" type="voyage-ai-embedder">
    <model>voyage-3.5</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
</component>
{% endhighlight %}</pre>

<p><strong>High-throughput configuration</strong> (larger connection pool and a longer timeout, which also extends the retry window):</p>
<pre>{% highlight xml %}
<component id="voyage" type="voyage-ai-embedder">
    <model>voyage-3.5</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    <max-idle-connections>20</max-idle-connections>
    <timeout>60000</timeout>
</component>
{% endhighlight %}</pre>

<p><strong>Fast and cost-efficient configuration</strong>:</p>
<pre>{% highlight xml %}
<component id="voyage-lite" type="voyage-ai-embedder">
    <model>voyage-3.5-lite</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
</component>
{% endhighlight %}</pre>

<p><strong>Query-optimized configuration</strong>:</p>
<pre>{% highlight xml %}
<component id="voyage-query" type="voyage-ai-embedder">
    <model>voyage-3.5</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    <default-input-type>query</default-input-type>
    <auto-detect-input-type>false</auto-detect-input-type>
    <normalize>true</normalize>
</component>
{% endhighlight %}</pre>

<p><strong>Code search configuration</strong>:</p>
<pre>{% highlight xml %}
<component id="code-embedder" type="voyage-ai-embedder">
    <model>voyage-code-3</model>
    <api-key-secret-ref>voyage_api_key</api-key-secret-ref>
    <normalize>true</normalize>
</component>
{% endhighlight %}</pre>

<h3 id="voyageai-cost-optimization">Cost and Performance Optimization</h3>
<p>The VoyageAI embedder includes several features to reduce API costs and improve performance:</p>
<ul>
<li><strong>Caching</strong>: The embedder automatically caches recent embeddings to prevent duplicate API calls for the same text. This is particularly effective for repeated queries or documents.</li>
<li><strong>Connection pooling</strong>: HTTP connections are reused to reduce connection overhead and improve throughput. Configure with <code>max-idle-connections</code> (default: 5).</li>
<li><strong>Retry logic</strong>: Automatic retries with exponential backoff handle rate limits and transient errors, bounded by the global timeout. Configure with <code>max-retries</code> (default: 10).</li>
<li><strong>Model selection</strong>: Use <code>voyage-3.5-lite</code> for cost-sensitive applications; its 512-dimensional vectors need half the storage of 1024-dimensional ones. For best quality, use <code>voyage-3.5</code>.</li>
</ul>
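<p>The retry behaviour described above, exponential backoff bounded by both <code>max-retries</code> and the overall
<code>timeout</code>, can be sketched as follows. This is an illustration under stated assumptions, not the embedder's
implementation: the 100 ms base delay, the doubling factor, the <code>TransientError</code> type and the function name
are all hypothetical.</p>

```python
import time


class TransientError(Exception):
    """Stands in for a rate-limit or other retryable failure (hypothetical)."""


def call_with_retries(request_fn, timeout_ms=30000, max_retries=10,
                      base_delay_ms=100, sleep=time.sleep, clock=time.monotonic):
    """Call request_fn, retrying transient failures with exponential backoff.

    Retrying stops when max_retries is exhausted or when waiting for the next
    attempt would pass the overall deadline, whichever comes first.
    """
    deadline = clock() + timeout_ms / 1000.0
    failures = 0
    while True:
        try:
            return request_fn()
        except TransientError:
            failures += 1
            if failures > max_retries:
                raise  # safety limit on the number of retries reached
            delay = base_delay_ms * 2 ** (failures - 1) / 1000.0  # 0.1 s, 0.2 s, 0.4 s, ...
            if clock() + delay > deadline:
                raise  # the next attempt would exceed the timeout
            sleep(delay)


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise TransientError("rate limited")
        return "ok"

    print(call_with_retries(flaky), "after", attempts["n"], "attempts")
```

<p>With the defaults from the table (30000 ms timeout, 10 retries), the timeout is usually the effective bound
in this sketch, since the accumulated backoff delays grow quickly.</p>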

<p>For detailed performance monitoring, the embedder emits standard Vespa embedder metrics
(see <a href="container-metrics-reference.html">Container Metrics</a>).
Monitor API usage and costs through the <a href="https://www.voyageai.com/">VoyageAI dashboard</a>.</p>



<h2 id="huggingface-tokenizer-embedder">Huggingface tokenizer embedder</h2>
<p>
The Huggingface tokenizer embedder is configured in <a href="services.html">services.xml</a>,