<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI related on Apache Dubbo</title><link>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/</link><description>Recent content in AI related on Apache Dubbo</description><generator>Hugo</generator><language>en</language><atom:link href="https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>configure the MCP</title><link>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/mcp/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/mcp/</guid><description>&lt;h2 id="mcp-model-context-protocol-gateway-configuration">MCP (Model Context Protocol) Gateway Configuration&lt;/h2>
&lt;p>This document explains how to configure the MCP (Model Context Protocol) filters within your gateway, enabling you to securely expose backend HTTP APIs as callable &amp;ldquo;tools&amp;rdquo; for AI Agents.&lt;/p>
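&lt;p>As an illustration, exposing a backend HTTP API as a callable tool might look like the sketch below. This is a hypothetical example: the filter name &lt;code>dgp.filter.ai.mcp&lt;/code> and every field name shown are illustrative assumptions, not a definitive schema.&lt;/p>
&lt;pre>&lt;code class="language-yaml">http_filters:
  - name: dgp.filter.ai.mcp        # illustrative filter name (assumption)
    config:
      tools:
        - name: get_weather         # tool name exposed to the AI Agent
          description: "Query current weather by city"
          upstream:
            method: GET
            path: /api/v1/weather   # backend HTTP API mapped to this tool
&lt;/code>&lt;/pre>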
&lt;h3 id="introduction">Introduction&lt;/h3>
&lt;p>The Model Context Protocol (MCP) serves as an intelligent bridge between AI Agents and your existing backend services. It dynamically translates a simple, unified protocol into standard HTTP requests, allowing agents to interact with your APIs as if they were native functions or tools. This approach simplifies agent development and provides a centralized point for security, control, and observability.&lt;/p></description></item><item><title>configure upstream endpoints</title><link>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/endpoint/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/endpoint/</guid><description>&lt;h2 id="llm-gateway-endpoint-configuration">LLM Gateway Endpoint Configuration&lt;/h2>
&lt;p>This document explains how to configure upstream endpoints for Large Language Models (LLMs) within your gateway&amp;rsquo;s routing configuration.&lt;/p>
&lt;h3 id="endpoint-structure">Endpoint Structure&lt;/h3>
&lt;p>Each endpoint within a cluster is defined by an &lt;code>id&lt;/code> and can contain an &lt;code>llm_meta&lt;/code> block for custom behavior:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#268bd2">clusters&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#268bd2">name&lt;/span>: &lt;span style="color:#2aa198">&amp;#34;my_llm_cluster&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">endpoints&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#268bd2">id&lt;/span>: &lt;span style="color:#2aa198">&amp;#34;provider-1-main&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">socket_address&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">domains&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - api.deepseek.com
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">llm_meta&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#586e75"># ... other LLM-specific configuration goes here ...&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#268bd2">id&lt;/span>: &lt;span style="color:#2aa198">&amp;#34;provider-2-fallback&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">socket_address&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">domains&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - api.openai.com/v1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#268bd2">llm_meta&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#586e75"># ... other LLM-specific configuration goes here ...&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="llm_meta-configuration-fields">&lt;code>llm_meta&lt;/code> Configuration Fields&lt;/h3>
&lt;p>The &lt;code>llm_meta&lt;/code> block holds all configuration specific to how the gateway should treat this LLM endpoint.&lt;/p></description></item><item><title>KVCache offload</title><link>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/kvcache/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/kvcache/</guid><description>&lt;h2 id="ai-kvcache-filter-configuration">AI KVCache Filter Configuration&lt;/h2>
&lt;p>This document explains how to configure and use the &lt;code>dgp.filter.ai.kvcache&lt;/code> filter in Dubbo-go-Pixiu.&lt;/p>
&lt;p>The filter integrates with vLLM (&lt;code>/tokenize&lt;/code>) and LMCache controller APIs (&lt;code>/lookup&lt;/code>, &lt;code>/pin&lt;/code>, &lt;code>/compress&lt;/code>, &lt;code>/evict&lt;/code>) to:&lt;/p>
&lt;ul>
&lt;li>provide cache-aware routing hints&lt;/li>
&lt;li>trigger cache-management actions asynchronously&lt;/li>
&lt;li>keep the main request path non-blocking&lt;/li>
&lt;/ul>
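&lt;p>A minimal configuration sketch follows. Only the filter name &lt;code>dgp.filter.ai.kvcache&lt;/code> comes from this document; the field names under &lt;code>config&lt;/code> are illustrative assumptions about how the vLLM and LMCache endpoints might be wired in.&lt;/p>
&lt;pre>&lt;code class="language-yaml">http_filters:
  - name: dgp.filter.ai.kvcache
    config:
      vllm_tokenize_url: "http://vllm:8000/tokenize"   # illustrative field name
      lmcache_controller_url: "http://lmcache:9000"    # serves /lookup, /pin, /compress, /evict
&lt;/code>&lt;/pre>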
&lt;hr>
&lt;h3 id="architecture-and-request-flow">Architecture and Request Flow&lt;/h3>
&lt;p>&lt;code>dgp.filter.ai.kvcache&lt;/code> is an HTTP decode filter. A typical request flow is:&lt;/p>
&lt;ol>
&lt;li>Parse the request body and extract &lt;code>model&lt;/code> and &lt;code>prompt&lt;/code> (falling back to &lt;code>messages&lt;/code> if &lt;code>prompt&lt;/code> is absent).&lt;/li>
&lt;li>Record local hotness statistics (&lt;code>model + prompt&lt;/code>) in the token manager.&lt;/li>
&lt;li>Try cache-aware routing:
&lt;ul>
&lt;li>read token cache for prompt&lt;/li>
&lt;li>call LMCache &lt;code>/lookup&lt;/code>&lt;/li>
&lt;li>set a preferred endpoint hint in context (&lt;code>llm_preferred_endpoint_id&lt;/code>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Start an async cache-management goroutine (best-effort):
&lt;ul>
&lt;li>call vLLM &lt;code>/tokenize&lt;/code>&lt;/li>
&lt;li>call LMCache &lt;code>/lookup&lt;/code> if needed&lt;/li>
&lt;li>execute strategy decisions (&lt;code>compress&lt;/code> / &lt;code>pin&lt;/code> / &lt;code>evict&lt;/code>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Continue the filter chain immediately (main request is not blocked by cache management).&lt;/li>
&lt;/ol>
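&lt;p>For the &lt;code>llm_preferred_endpoint_id&lt;/code> hint set in step 3 to take effect, the endpoint &lt;code>id&lt;/code> values configured in the cluster should match the instance ids that LMCache &lt;code>/lookup&lt;/code> reports. A sketch, with illustrative ids and domains:&lt;/p>
&lt;pre>&lt;code class="language-yaml">clusters:
  - name: "llm_cluster"
    endpoints:
      - id: "vllm-instance-0"   # should equal the instance id returned by LMCache /lookup
        socket_address:
          domains:
            - vllm-0.internal   # illustrative upstream address
&lt;/code>&lt;/pre>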
&lt;hr>
&lt;h3 id="routing-contract-important">Routing Contract (Important)&lt;/h3>
&lt;p>Current cache-aware routing uses &lt;strong>instance id matching&lt;/strong>:&lt;/p></description></item><item><title>register service</title><link>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/registry/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://deploy-preview-3202--dubbo.netlify.app/en/overview/reference/pixiu/ai/registry/</guid><description>&lt;h2 id="llm-service-discovery-and-registration">LLM Service Discovery and Registration&lt;/h2>
&lt;p>This document guides LLM service providers through dynamically registering their service instances with the LLM Gateway via a Nacos registry. By following these guidelines, the gateway can automatically discover your service and apply appropriate routing, retry, and fallback strategies based on the metadata you provide.&lt;/p>
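&lt;p>For example, an instance registered in Nacos might carry metadata like the following. All keys shown are illustrative assumptions about what a provider could attach, not a definitive metadata contract:&lt;/p>
&lt;pre>&lt;code class="language-yaml">service_name: my-llm-service
ip: 10.0.0.5
port: 8000
metadata:
  model: "deepseek-chat"   # model served by this instance (illustrative key)
  weight: "100"            # illustrative routing weight
&lt;/code>&lt;/pre>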
&lt;h3 id="registration-mechanism-overview">Registration Mechanism Overview&lt;/h3>
&lt;p>The core mechanism of service discovery is that your LLM service registers as a &lt;strong>Nacos instance&lt;/strong> and provides a specific set of &lt;strong>metadata&lt;/strong> upon registration. The LLM Gateway listens for service changes in Nacos, reads this metadata, and dynamically converts it into a fully functional gateway &lt;code>endpoint&lt;/code> configuration.&lt;/p></description></item></channel></rss>