Server: router per model config #17859
base: master
Conversation
Replace flat directory scan with recursive traversal using std::filesystem::recursive_directory_iterator:
- Support nested vendor/model layouts (e.g. vendor/model/*.gguf)
- Model name now reflects the relative path within --models-dir instead of just the filename
- Aggregate files by parent directory via std::map before constructing local_model
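The commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual server code: `scan_models` and the map-of-vectors shape are invented names for the grouping the commit describes.

```cpp
#include <cassert>     // for the demo below
#include <filesystem>
#include <fstream>     // for the demo below
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Walk models_dir recursively and group *.gguf files by their parent
// directory, keyed by the path relative to models_dir, so a file at
// vendor/model/a.gguf is listed under the model name "vendor/model".
static std::map<std::string, std::vector<std::string>> scan_models(const fs::path & models_dir) {
    std::map<std::string, std::vector<std::string>> by_dir;
    for (const auto & entry : fs::recursive_directory_iterator(models_dir)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".gguf") {
            continue;
        }
        // generic_string() normalizes separators to '/' on all platforms
        const std::string name = fs::relative(entry.path().parent_path(), models_dir).generic_string();
        by_dir[name].push_back(entry.path().filename().string());
    }
    return by_dir;
}
```

Each map entry then has everything needed to construct one local_model (the multi-file case covers sharded GGUFs sitting in the same directory).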
ngxson left a comment
this looks interesting, I will clean this up a bit and push a commit
tools/server/server-config.cpp
Outdated
Will be nice if we can move part of this file into common/preset.cpp|h, so it can be reused by other tools
tools/server/server-models.cpp
Outdated
```cpp
if (value == "false") {
    continue;
}

if (value == "true" || value.empty()) {
    child_env.push_back(key + "=");
```
I think leaving the original value for bool should be good? We can already handle these values using is_falsey / is_truthy in arg.cpp
Good point! I'll simplify the bool handling to pass through the original values (=true/=false) and let is_truthy/is_falsey handle the conversion
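For context, the pass-through approach leans on helpers like these. A hedged sketch of what is_truthy / is_falsey could look like; the real implementations live in common/arg.cpp and the exact set of accepted spellings here is an assumption:

```cpp
#include <cassert>  // for the demo below
#include <string>

// Sketch only: the recognized spellings are assumptions, not
// necessarily the exact set accepted by common/arg.cpp.
static bool is_truthy(const std::string & value) {
    return value == "true" || value == "on" || value == "enabled" || value == "1";
}

static bool is_falsey(const std::string & value) {
    return value == "false" || value == "off" || value == "disabled" || value == "0";
}
```

With helpers like these, the INI layer can forward `=true` / `=false` verbatim and let the arg parser decide, instead of special-casing booleans during config loading.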
Here is a line-oriented approach for the parser:

```cpp
static const auto ini_parser = build_peg_parser([](auto & p) {
    // newline ::= "\r\n" / "\n" / "\r"
    auto newline = p.rule("newline", p.literal("\r\n") | p.literal("\n") | p.literal("\r"));
    // ws ::= [ \t]*
    auto ws = p.rule("ws", p.chars("[ \t]", 0, -1));
    // comment ::= [;#] (!newline .)*
    auto comment = p.rule("comment", p.chars("[;#]", 1, 1) + p.zero_or_more(p.negate(newline) + p.any()));
    // eol ::= ws comment? (newline / EOF)
    auto eol = p.rule("eol", ws + p.optional(comment) + (newline | p.end()));
    // ident ::= [a-zA-Z_] [a-zA-Z0-9_.-]*
    auto ident = p.rule("ident", p.chars("[a-zA-Z_]", 1, 1) + p.chars("[a-zA-Z0-9_.-]", 0, -1));
    // value ::= (!eol-start .)*
    auto eol_start = p.rule("eol-start", ws + (p.chars("[;#]", 1, 1) | newline | p.end()));
    auto value = p.rule("value", p.zero_or_more(p.negate(eol_start) + p.any()));
    // header-line ::= "[" ws ident ws "]" eol
    auto header_line = p.rule("header-line", "[" + ws + p.tag("section-name", p.chars("[^]]")) + ws + "]" + eol);
    // kv-line ::= ident ws "=" ws value eol
    auto kv_line = p.rule("kv-line", p.tag("key", ident) + ws + "=" + ws + p.tag("value", value) + eol);
    // comment-line ::= ws comment (newline / EOF)
    auto comment_line = p.rule("comment-line", ws + comment + (newline | p.end()));
    // blank-line ::= ws (newline / EOF)
    auto blank_line = p.rule("blank-line", ws + (newline | p.end()));
    // line ::= header-line / kv-line / comment-line / blank-line
    auto line = p.rule("line", header_line | kv_line | comment_line | blank_line);
    // ini ::= line* EOF
    auto ini = p.rule("ini", p.zero_or_more(line) + p.end());
    return ini;
});
```

I assume the changes were because of the weirdness in consuming spaces/comments. This should alleviate those concerns. And the visitor can really be something as simple as this:

```cpp
std::map<std::string, std::map<std::string, std::string>> cfg;
std::string current_section = "default";
std::string current_key;
ctx.ast.visit(result, [&](const auto & node) {
    if (node.tag == "section-name") {
        current_section = std::string(node.text);
        cfg[current_section] = {};
    } else if (node.tag == "key") {
        current_key = std::string(node.text);
    } else if (node.tag == "value" && !current_key.empty()) {
        cfg[current_section][current_key] = std::string(node.text);
        current_key.clear();
    }
});
```
PEG parser usage improvements:
- Simplify parser instantiation (remove arena indirection)
- Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
- Fix last line without newline bug (+ operator instead of <<)
- Remove redundant end position check

Feature scope:
- Remove auto-reload feature (will be a separate PR per @ngxson)
- Keep config.ini auto-creation and template generation
- Preserve per-model customization logic

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>
Complete rewrite of INI parser grammar and visitor:
- Use p.chars(), p.negate(), p.any() instead of p.until()
- Support end-of-line comments (key=value # comment)
- Handle EOF without trailing newline correctly
- Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
- Simplified visitor (no pending state, no trim needed)
- Grammar handles whitespace natively via eol rule

Business validation preserved:
- Reject section names starting with LLAMA_ARG_*
- Accept only keys starting with LLAMA_ARG_*
- Require explicit section before key-value pairs

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Looks good as far as parsing is concerned!
I will need to add an expect() helper to provide helpful error messages to users when they make a mistake. I can do that separately in another PR.
Children now receive minimal CLI args (executable, model, port, alias) instead of inheriting all router args. Global settings pass through LLAMA_ARG_* environment variables only, eliminating duplicate config warnings.

Fixes: router args like -ngl, -fa were passed both via CLI and env, causing 'will be overwritten' warnings on every child spawn.
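The env-only pass-through described in that commit boils down to filtering the router's environment when spawning a child. A minimal sketch, assuming a "KEY=VALUE" string representation; `filter_llama_env` is a hypothetical name, not the function in server-models.cpp:

```cpp
#include <cassert>  // for the demo below
#include <string>
#include <vector>

// Keep only LLAMA_ARG_* entries ("KEY=VALUE" strings) when building a
// child's environment, so global settings flow through env vars alone
// and are never also duplicated as child CLI args.
static std::vector<std::string> filter_llama_env(const std::vector<std::string> & env) {
    std::vector<std::string> child_env;
    for (const auto & kv : env) {
        if (kv.rfind("LLAMA_ARG_", 0) == 0) {  // starts_with, pre-C++20
            child_env.push_back(kv);
        }
    }
    return child_env;
}
```

Because each setting now arrives through exactly one channel (env), the child's arg parser never sees the same option twice, which is what eliminated the 'will be overwritten' warnings.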
It's now in a basic working state with the new line-based PEG parser. I'm testing with my entire per-model configuration on the server to cover some edge cases, and then there's the @ngxson refactoring to do.
Missing sampling parameters need .set_env() in common/arg.cpp (--temp, --top-p, --top-k, --min-p have no LLAMA_ARG_ env vars yet). Successfully migrated a llama-swap config (YAML) to config.ini via LLM: llama-server preserved all custom parameters (ctx-size, n-cpu-moe, mmproj, -m ....Q6_K), applied global CLI defaults (-ngl 999, -fa, --mlock, -ctk/-ctv, etc.) to all models, and automatically reorganized sections/keys alphabetically to maintain a normalized format.
Hmm yeah, I didn't notice that some env vars are missing. I think it will be cleaner if we default to using the longest arg (for example, …). Internally, the parser can accept all 3 forms: env, short arg and long arg; there is no chance that they will collide anyway. I'll push the change for this.
Yes, it just needs the missing .set_env("LLAMA_ARG_TEMP") etc. I'll wait for your change while I run some tests.
tools/server/README.md
Outdated
```
llama-server --models-dir ./models_directory
```

The directory is scanned recursively, so nested vendor/model layouts such as `vendor_name/model_name/*.gguf` are supported. The model name in the router UI matches the relative path inside `--models-dir` (for example, `vendor_name/model_name`).
For visibility, I will remove recursive support from this PR because it's not related to config support - it should be added later via a dedicated PR
Yes, no worries. (I have to keep it on my side, otherwise it breaks my integration server.) I'll adapt the configuration on my side to test this feature-atomic PR if necessary.
I moved most of the code inside …

We're now using the term "preset", so I think it's easier to make the file name …

Since I'm now using the same …

We don't yet support repeated args or args with 2 values (like API endpoint …)

Things that still need to improve: …
Alternatively, you can also add a GGUF-based preset (see next section)

### Model presets
@ServeurpersoCom I updated the docs with an example - lmk if this works in your case
- Sanitize model names: replace / and \ with _ for display
- Recursive directory scan with relative path storage
- Convert relative paths to absolute when spawning children
- Filter router control args from child processes
- Refresh args after port assignment for correct port value
- Fallback preset lookup for compatibility
- Fix missing argv[0]: store server binary path before base_args parsing
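The name-sanitizing step from that commit is small enough to sketch. A hedged illustration; `sanitize_model_name` is an assumed name for whatever the branch actually calls it:

```cpp
#include <cassert>  // for the demo below
#include <string>

// Replace path separators with '_' so a relative model path such as
// "vendor/model" displays safely as "vendor_model" in the router UI.
static std::string sanitize_model_name(std::string name) {
    for (char & c : name) {
        if (c == '/' || c == '\\') {
            c = '_';
        }
    }
    return name;
}
```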
tools/server/server-models.cpp
Outdated
```cpp
    first_shard_file = file;
} else {
    model_file = file;
std::function<void(const std::string &, const std::string &)> scan_subdir =
```
Please remove the recursive implementation - it's unrelated to the current PR, and it's also unsafe as it doesn't handle the case where there's a circular symlink or circular mount points
You can retrieve the rest (except the recursion) and push --force; I won't touch the branch before tomorrow/rebase/test.
A two-level browsing system will be perfect for all cases (separate PR)
Hey guys! Sorry to interrupt, but are the LLAMA_ARG_ prefixes mandatory?
One more thing: maybe it's better to put the config in ~/.config/llama.cpp/?
Thank you so much for what you're doing!
No worries! With the last refactor the LLAMA_ARG_ prefixes are optional: you can use the short argument forms (e.g., ngl, c) or long forms with dashes (e.g., n-gpu-layers, ctx-size) instead. All three formats are supported. Regarding config location: the preset file path is fully customizable via --models-preset, so you can place it wherever you prefer, including ~/.config/llama.cpp/presets.ini if that fits your workflow better. This is a WIP; I'll update the first message soon.
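To illustrate the three equivalent spellings, here is a made-up fragment (section name and value are invented for the example; you would use only one spelling per option):

```ini
[my-model]
; any ONE of these three lines sets the same option:
LLAMA_ARG_N_GPU_LAYERS = 999  ; env-var form
ngl = 999                     ; short-arg form
n-gpu-layers = 999            ; long-arg form
```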
Co-authored-by: aldehir <hello@alde.dev>
Summary
This PR implements INI-based per-model configuration for llama-server router mode, as discussed in #17850.
Motivation
This POC targets multi-model inference servers for small/medium teams, with declarative, user-friendly configuration and zero operational friction.
Implementation
Core Features
Technical Details
Example config.ini
Use Case Example
A small dev team runs an inference server with 10+ models. The sysadmin sets global defaults via the router CLI:
llama-server --models-dir ./models -ngl 999 -fa -ctk q8_0 -ctv q8_0
Then fine-tunes per-model settings in config.ini:
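The per-model sections could look something like this. Model names and values are invented for illustration; the keys (ctx-size, n-cpu-moe, mmproj) are ones mentioned earlier in this thread:

```ini
[qwen3-30b]
ctx-size = 32768
n-cpu-moe = 28

[gemma-3-12b]
ctx-size = 16384
mmproj = mmproj-gemma.gguf
```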
With this system, you can override any parameter per model to optimize each configuration for your GPU, and reset by just deleting the ini file! It also allows beginners to discover llama.cpp arguments as they go along.
Testing
Tested with personal GGUF collection:
Future Work
This is a POC. Potential improvements:
Related
#17850
#17470
#10932