SimpleMCP - server/public_simplechat tiny client updated with reasoning, vision, builtin clientside tool calls, markdown and mcpish toolcall support [WIP] #17853
base: master
Conversation
Expose pdf2text tool call to ai server and handshake with simple proxy for the same.
Make the description a bit more explicit about it supporting local file paths as part of the URL scheme, as the tested AI model was complaining about the file URL scheme not being supported. Need to check whether this new description makes things better. Convert the text to bytes before writing to the HTTP pipe. Keep CORS happy by passing Access-Control-Allow-Origin in the response header.
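A minimal sketch of the kind of response handling described here, assuming a http.server based handler; the names are illustrative, not the actual simpleproxy.py code:

```python
import http.server

class ProxyHandler(http.server.BaseHTTPRequestHandler):

    def send_text(self, text: str, status: int = 200):
        # Convert the extracted text to bytes before writing to the http pipe
        data = text.encode("utf-8")
        self.send_response(status)
        # Keep browser side CORS happy wrt the chat client's fetch() calls
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```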
Allow the user to limit the maximum amount of result data returned to the AI after a tool call; it defaults to 2K. Update the pdf2text tool description to try to make the local file path support more explicit.
Needed to tweak the description further for the AI model to understand that it is OK to pass file:// scheme based URLs. Had forgotten how big web site pages have become, as well as the need for a larger ResultDataLength wrt a one shot PDF read, to get at least a good enough amount of content out of large PDFs.
Half asleep as usual ;)
This makes the logic more generic, as well as preparing for additional parameters to be passed through the simpleproxy.py helper handshakes. Ex: restrict the extracted contents of a PDF to specified start and end page numbers or so.
As I was seeing the truncated message even for stripped plain-text web access, relooking at the initial go at truncating revealed an oversight: the truncation logic triggered any time iResultMaxDataLength was greater than 0, irrespective of whether the actual result was smaller than the allowed limit, thus appending the truncation message to the end of the result unnecessarily. Have fixed that oversight. Also the recent any-number-of-args based simpleproxy handshake helper in toolweb seems to be working (at least for the existing single-arg based calls).
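The corrected behaviour, as a small illustrative sketch (the actual fix lives in the client side JS; names here are hypothetical):

```python
TRUNCATED_NOTE = "\n...[result truncated]..."

def maybe_truncate(result: str, result_max_len: int) -> str:
    # Truncate (and only then append the truncation notice) when a limit is
    # set AND the result actually exceeds it; previously the notice was
    # appended whenever the limit was greater than 0.
    if result_max_len > 0 and len(result) > result_max_len:
        return result[:result_max_len] + TRUNCATED_NOTE
    return result
```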
Initial go, need to review the code flow as well as test it out
This is an initial go wrt the new overall flow; it should work, but needs cross checking.
Copy validate_url and build initial skeleton
Check whether the specified scheme is allowed. If allowed, call the corresponding validator to check that the remaining part of the URL is fine.
Add the --allowed.schemes config entry as a needed config. Set up the URL validator and use it wrt urltext, urlraw and pdf2text. This allows the user to control whether local file access is enabled. By default, in the sample simpleproxy.json config file, local file access is allowed.
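A rough sketch of the scheme gating idea, with hypothetical config and validator names (the real simpleproxy.py code may differ):

```python
from urllib.parse import urlparse, ParseResult

# Hypothetical: filled in from the --allowed.schemes config entry,
# e.g. ["http", "https", "file"]; dropping "file" disables local file access.
allowed_schemes: list[str] = ["http", "https", "file"]

def validate_file_url(parsed: ParseResult) -> tuple[bool, str]:
    # Placeholder per-scheme check for file:// urls (real code would vet the path)
    return (bool(parsed.path), "ok" if parsed.path else "empty file path")

def validate_web_url(parsed: ParseResult) -> tuple[bool, str]:
    # Placeholder per-scheme check for http(s) urls (real code would vet the host)
    return (bool(parsed.netloc), "ok" if parsed.netloc else "empty host")

def validate_url(url: str) -> tuple[bool, str]:
    parsed = urlparse(url)
    # First check whether the scheme itself is allowed at all
    if parsed.scheme not in allowed_schemes:
        return (False, f"scheme '{parsed.scheme}' not allowed")
    # If allowed, hand over to the scheme specific validator for the rest of the url
    if parsed.scheme == "file":
        return validate_file_url(parsed)
    return validate_web_url(parsed)
```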
Also trap any exceptions while handling requests and send the exception info to the client requesting the service.
Also move the debug dump helper to its own module. Also remember to specify the class name in quotes (similar to referring to a class within a member of that class) wrt Python type checking.
It is not necessary to always request a page number range. Take care of page numbers starting from 1 while the underlying data starts at index 0.
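A sketch of the off-by-one handling this refers to, assuming pypdf style 0-indexed pages and user facing 1-based page numbers (names illustrative):

```python
def select_pages(num_pages: int, start: int | None = None, end: int | None = None) -> range:
    """Map optional 1-based start/end page numbers to 0-based page indices."""
    first = 1 if start is None else max(1, start)              # no range given => whole pdf
    last = num_pages if end is None else min(num_pages, end)
    # Users count pages from 1, while the underlying page list starts at index 0
    return range(first - 1, last)

# e.g. a 10 page pdf with pages 2..4 requested -> indices 1, 2, 3
assert list(select_pages(10, 2, 4)) == [1, 2, 3]
assert list(select_pages(3)) == [0, 1, 2]
```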
Added logic to help get a file from either the local file system or from the web, based on the URL specified. Update the pdfmagic module to use the same, so that it can support both local and web based PDFs. Bring in the debug module, which I had forgotten to commit after moving the debug helper code from simpleproxy.py into it.
This also indirectly adds support for local file system access through the web / fetch (i.e. urlraw and urltext) service request paths.
Make it a details block and update the content a bit
Usage Note: * Clean up / fix some wording. * Pick the chat history handshake length from the config. Ensure the settings info is up to date wrt available tool names by chaining a re-show with the tools manager initialisation.
Rename the path and tags/identifiers from Pdf2Text to PdfText. Rename the function call to pdf_to_text; this should also help indicate the semantics more unambiguously, just in case, especially for smaller models.
Chances are that for AI models which don't support tool calling, the tool calls metadata shared will be silently ignored without much issue, so enable the tool calling feature by default, so that in case one is using an AI model with tool calling, the feature is readily available. Revert the SlidingWindow ChatHistory in Context from last 10 to last 5 by default (2 more than the original, given the larger context support in today's models), given that tool handshakes now go through the tools related side channel in the HTTP handshake and aren't morphed into the normal user-assistant channel of the handshake.
Helps ensure only service paths that can actually be serviced are enabled. Use the same to check for pypdf wrt pdftext.
Define a type alias HttpHeaders and use it wherever needed. For now map this to email.message.Message and dict. If and when Python evolves its HTTP headers type into a better one, it needs replacing in only one place. Add a ToolManager class which * maintains the list of tool calls and in turn allows any given tool call to be executed and its response returned along with the needed metadata * generates the overall tool calls metadata * add ToolCallResponseEx, which maintains the full TCOutResponse for use by tc_handle callers. Avoid duplicating handling of some of the basic needed HTTP header entries. Move the check for any dependencies before enabling a tool call into the respective tc??? module. * for now this also demotes the logic from the previous fine-grained per tool call dependency check to a more global dependency check at the respective module level.
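A very rough skeleton of the shape described above; the actual classes in the PR will differ in names and details, so treat this as a guess at the intent:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Kept simple for now; if Python grows a better http headers type,
# only this one alias needs to change.
HttpHeaders = dict[str, str]

@dataclass
class ToolCallResponseEx:
    # Full response plus whatever meta data the tc_handle caller may need
    name: str
    result: str
    error: str = ""

@dataclass
class ToolManager:
    # tool name -> (callable, tool call meta data)
    tools: dict[str, tuple[Callable[..., str], dict[str, Any]]] = field(default_factory=dict)

    def meta(self) -> list[dict[str, Any]]:
        # Overall tool calls meta data, as a plain list, for the tools handshake
        return [m for _fn, m in self.tools.values()]

    def run(self, name: str, args: dict[str, Any]) -> ToolCallResponseEx:
        if name not in self.tools:
            return ToolCallResponseEx(name, "", f"unknown tool: {name}")
        fn, _meta = self.tools[name]
        return ToolCallResponseEx(name, fn(**args))
```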
Build the list of tool calls. Trap some of the MCP POST JSON based requests and map them to related handlers. In turn implement the tool call execution handler. Add some helper dataclasses wrt the expected MCP response structure. TOTHINK: For now maintain id as a string and not an int, with the idea of mapping it directly to the call id wrt the tool call handshake by the AI model. TOCHECK: For now the order of the jsonrpc and type fields wrt the MCP response related structures is shuffled, assuming the order shouldn't matter. Need to cross check.
Fix an oversight wrt ToolManager.meta, where I had created a dict of name-keyed tool call metas instead of a simple list of tool call metas; I had blindly duplicated the structure used for storing the tool calls in tc_switch in the anveshika sallap client side code. Add dataclasses to mimic the MCP tools/list response. However, wrt the 2 odd differences between the MCP structure and the OpenAI tools handshake structure, for now I have retained the OpenAI tools handshake structure. Add a common helper send_mcp to ProxyHandler, given that both mcp_toolscall and mcp_toolslist, and even others like mcp_initialise in future, require a common response mechanism. With the above and a bit more, an initial go at the tools/list response is in place.
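A hedged sketch of what the JSON-RPC style MCP response envelope being described could look like; field names are a guess at the intent, not a copy of the PR's code:

```python
from dataclasses import dataclass, field, asdict
from typing import Any

@dataclass
class MCPToolResultContent:
    text: str
    type: str = "text"

@dataclass
class MCPToolsCallResult:
    content: list[MCPToolResultContent] = field(default_factory=list)
    isError: bool = False

@dataclass
class MCPResponse:
    # id kept as a string so it can directly mirror the tool call id from the ai model
    id: str
    result: dict[str, Any]
    jsonrpc: str = "2.0"

resp = MCPResponse(id="call_0",
                   result=asdict(MCPToolsCallResult([MCPToolResultContent("hello")])))
print(asdict(resp))
```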
Given that there could be other service paths beyond /mcp exposed in future, and given that their POST body need not contain JSON data, move the conversion to JSON into the mcp_run handler. Retaining the reading of the body in the generic do_POST ensures that the read size limit is implicitly enforced, whether for /mcp now or any other path in future.
By default the bearer based auth check is always done, whether in HTTPS or HTTP mode. However, by setting the sec.bAuthAlways config entry to false, the bearer auth check is carried out only in HTTPS mode.
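A minimal sketch of the auth gating described, with hypothetical names for the config entry and helpers:

```python
def need_bearer_check(is_https: bool, auth_always: bool = True) -> bool:
    # auth_always mirrors the sec.bAuthAlways config entry: when True the bearer
    # check runs in both http and https mode, when False only in https mode.
    return auth_always or is_https

def check_bearer(headers: dict[str, str], expected_token: str) -> bool:
    return headers.get("Authorization", "") == f"Bearer {expected_token}"
```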
As expected, dataclass mutable default field values need default_factory. Don't forget to return after sending an error response. The TypeAlias type hinting flow seems to go beyond TYPE_CHECKING. Also email.message.Message[str, str] is not accepted, so keep things simple wrt HttpHeaders for now.
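The mutable default gotcha mentioned here, in isolation:

```python
from dataclasses import dataclass, field

@dataclass
class ToolsList:
    # tools: list = []                                # rejected: mutable default not allowed
    tools: list[dict] = field(default_factory=list)   # what dataclasses require instead
```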
Also enforce the need for a reasonably sane Content-Length header entry in our case. NOTE: it does allow 0 or other small content lengths, which aren't necessarily valid.
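A sketch of the kind of sanity check meant here (illustrative; the actual limit and handling in the PR may differ):

```python
MAX_BODY_LEN = 1 * 1024 * 1024  # hypothetical read size limit

def sane_content_length(headers: dict[str, str]) -> int | None:
    """Return the Content-Length if it is present and sane, else None.

    Note: 0 or other small lengths still pass, even if not necessarily valid.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return None
    try:
        length = int(raw)
    except ValueError:
        return None
    if length < 0 or length > MAX_BODY_LEN:
        return None
    return length
```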
Given that toolcall.py maintains ToolCall, ToolManager and the MCP related types and base classes, rename it to toolcalls.py. Also add the bash script with curl used for testing the tools/list MCP command. Remove the sample function meta ref, as tools/list is working OK.
Add logic to fetch tools/list from mcp server and pass it to tools manager.
Also fix some minor oversights wrt tools/list
Set up to test the initial go of the mcp-ish server and client logic.
Move the search web tool call from the previous js client + python simpleproxy based logic to the new simplemcp based logic, while following the same overall approach of reusing HtmlText's equivalent logic, now with predefined and user non-replaceable (at runtime) tagDrops and template urls.
Update the documentation a bit wrt the switch from simpleproxy to simplemcp with the mcp-ish kind of handshake between the chat client and simplemcp. Rename proxyUrl and related to mcpServerUrl and mcpServerAuth. Now include the path in the url itself, given that in future we may want to allow the chat client logic to handshake with other MCP servers, which may expose their services through a different path. Drop the SearchEngine related config entries from the chat session settings, given that this is now controlled directly in SimpleMCP.
Hi @ggerganov I understand you have a different view to mine wrt this, but I feel you should rethink things once, given all the feature additions I have done compared to the last time you looked at it, and also given the slightly different philosophy between this chat client and the default web UI, some of which I have noted below. If you look at this PR, you will notice that this alternate client UI continues to use a pure HTML + CSS + JS based flow (also avoiding dependence on external libraries in general) and now supports reasoning, vision, tool calling (with a bunch of useful built-in client side tool calls needing no additional setup, ++) and minimal MCP client capability. In turn all of this fits within 50KB of compressed source code size (including the Python simplemcp.py for web access and related tool calls). Also the logical UI elements have their own unique id/class, if one wants to theme. The default web UI, by contrast, is around 1.2 MB or so compressed, needs one to understand the Svelte framework (in addition to HTML/CSS/JS) and needs one to track the different bundled external modules. Also it currently doesn't support tool calling, and the plan leans more towards server side / back end MCP based tool call support, if I understand correctly. Given the above significant differences, I feel it makes more sense to continue this updated lightweight alternate chat client + UI option within llama.cpp itself, parallel to the default webui. My embedded background also biases me towards simple yet flexible and functional options. Either way, the final decision is up to you and the team of open source developers who work on llama.cpp proactively, rather than once-in-a-blue-moon me, as to whether you would prefer to apply these into llama.cpp itself or not. Do let me know your thoughts. NOTE: When I revisited AI after almost a year++, wanting to explore some of the recent AI developments, I couldn't find any sensible zero or minimal setup based tool calling supported open source AI clients out there, including the default bundled web UI, so I started on this series of patches/PRs.
@hanishkvc Appreciate the dedication, but I still think your client should be moved to a separate project. It's better to focus our efforts on the official WebUI as it is more feature-complete, secure and has wider developer adoption.
Had forgotten to update the docs wrt the renamed --op.configFile arg. Remove the unneeded space from details.md, which was triggering the editorconfig check at the upstream GitHub repo.
Hi @ggerganov Thanks for getting back. One suggestion and request I have is that the webui team should allow tool calling support to be added to the web UI. Then there is also the issue of whether tool calling / MCP should be supported through the back end, i.e. through the llama engine or llama server, or through the chat client logic (like what SimpleChat+SimpleMCP does). There are use cases for both kinds of flows, and based on the webui team's previous comments it appeared they are waiting for the backend to be updated wrt tool calling / MCP; that, along with me wanting to experiment with some aspects of it now and also wanting the flexibility of client side based tool calling, is what led to this series of PRs. What are your thoughts on where to place the tool calling / MCP handshake, i.e. at the backend, at the chat client end, or both? You mentioned security; is there any specific reason why you feel llama-server+webui+(whatever MCP solution is finally employed) is more secure compared to llama-server+simplechat+simplemcp? Interested to understand your perspective there. I could be wrong, but chances are the architecture simplechat+simplemcp follows, and the principle of minimal or no external dependencies, along with HTTPS + bearer auth, in turn along with the option to place the tool calling provider into a separate VM or so if needed (the basics of most of these are already included in this patch set), should ideally provide a fairly secure, configurable environment / setup. Is there some aspect wrt security I have missed? One place where maybe I am purposefully being a bit contrarian wrt security is in using Python's built-in socket and HTTP mechanisms to build the HTTPS server logic, instead of any 3rd party logic for the same, or say Go's excellent built-in standard modules. But given that over time the core Python standard modules will be the most checked, tested and fixed option, along with it being a readily experimented interpreted runtime with the source used directly as is, with minimal in-between transforms et al, that is the reason for picking it over a 3rd party module, or building things around Rust (whose standard bundled module set and its management is not tightly coupled enough) or Go. Also, with the external_ai tool call mechanism which I have included, even AI could be used to validate a tool call before triggering it, if needed in future. Interested to hear your thoughts.
I think the tool calls should be done on the client - don't think there is a plan to do them on the server-side. The WebUI will likely soon add official support for tool calling and MCP. About security - I noticed that your client has server-side PDF parsing (if I understood this correctly), which I think is less secure than the client-side PDF package that we use in the WebUI. Overall, I'm not very familiar with Web programming best practices and can't strongly comment on which approach/framework is better. I can appreciate the minimalism of your implementation, but on the other hand, I consider it much healthier for the project to have a rapidly evolving WebUI without compromises. The Svelte WebUI and the team behind it so far are doing a great job, so IMO it's better to focus on that.
With this and other PRs in this series, the alternate tiny tools/server/public_simplechat chat client has been updated to support reasoning, vision, built-in client side tool calls, markdown and mcp-ish tool call support.
Using this client UI along with llama-server, one can get the local AI to fetch and summarise the latest news, or get the latest research papers / details from arxiv / ... for a topic of interest and summarise the same, or generate javascript code snippets and test them out, or use it to validate mathematical statements the AI might make and/or answer queries around these, or ... it's up to you and the AI model ... the AI model can even call into system prompt based self-modified variants of itself, if it deems it necessary ...
Remember to cross check the tool calls before allowing their execution, and similarly cross check the responses before submitting them to the AI model, just to be on the safe side.
One can peek into the reasoning from AI models that support it. And for AI models that support vision, one can send images to explore.
One could get going with (update the arguments as needed)
build/bin/llama-server -m ../llama.cpp.models/gpt-oss-20b-mxfp4.gguf --jinja --path tools/server/public_simplechat/ --ctx-size 64000 --n-gpu-layers 12 -fa on
NOTE: even the default context size should be good enough for simple stuff. Explicitly set --ctx-size if working with many web site / PDF contents, as needed.
If one needs the additional power/flexibility that comes with the web search, web fetch and PDF related tool calls, then also run
cd tools/server/public_simplechat/local.tools; python3 ./simplemcp.py --op.configFile simplemcp.json
NOTE: Remember to edit simplemcp.json with the list of sites you want to allow access to, as well as to disable local file access, if needed.
Look into the included readme.md, details.md and changelog.md for additional info. Previous PR in this series: #17506
All features (except for PDF, which uses pypdf) are implemented internally without depending on any external libraries/modules, thereby avoiding the need to track multiple external dependencies, and the whole thing should fit within ~50KB of compressed size. This is built using pure HTML+CSS+JS in general, with additionally Python for simplemcp, to bypass the CORS++ restrictions in the browser environment for direct web access.
NOTE: The MCP-ish handshake between the chat client and simplemcp is currently implemented after glancing through the architecture page on the MCP standard's website and the sample JSON shown there, along with some logical guess work. Need to look into the actual MCP standard later, if needed. Also noticed some minimal differences between the MCP handshake structures and the OpenAI REST API handshake structures; for now I have followed the OpenAI related structures, with those tiny differences, even in the MCP handshake.