
Conversation

aldehir commented Nov 10, 2025

Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.

cc @pwilkin

Problem

Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are particularly challenging to parse. For example, Qwen3-Coder outputs:

<tool_call>
<function={name}>
<parameter={arg-name}>
{arg_value as json or string}
</parameter>
...
</function>
</tool_call>

Supporting this format requires the parser to know the type of each argument based on the provided schema.
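For instance, a hypothetical get_weather call in this format might be emitted as:

<tool_call>
<function=get_weather>
<parameter=city>
New York
</parameter>
<parameter=days>
3
</parameter>
</function>
</tool_call>

Only the schema tells the parser that New York is a plain string while 3 is a JSON number; without it, 3 could just as well be the string "3".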

Proposal

I propose using parser combinators to simplify parsing. We can compose parsers suitable for PEG grammars, which should handle model output effectively. This PR implements a proof-of-concept.

Here's an example from test/test-chat-parser-combinator.cpp:

// Parser for a fictitious model that outputs:
//
//   <think>
//   ... reasoning content ...
//   </think>
//   ... content ...
//   <tool_call>
//   <name>tool_name</name>
//   <args>{ ... json args ... }</args>
//   </tool_call>
//
auto parser = build_parser([](parser_builder & p) {
    auto reasoning = p.add_rule("reasoning",
        p.literal("<think>")
        << p.group("reasoning-content", p.until("</think>"))
        << p.literal("</think>"));

    auto content = p.add_rule("content",
        p.group("content", p.until("<tool_call>")));

    auto json = p.json();

    auto tool_call_name = p.add_rule("tool-call-name",
        p.literal("<name>")
        << p.group("tool-name", p.one_or_more(p.char_class("[a-zA-Z\\-_]")))
        << p.literal("</name>"));

    auto schema = nlohmann::ordered_json::parse(R"({"type": "object"})");

    auto tool_call_args = p.add_rule("tool-call-args",
        p.literal("<args>")
        << p.group("tool-args", p.schema(json, "get_weather", schema))
        << p.literal("</args>"));

    auto tool_call = p.add_rule("tool-call",
        p.literal("<tool_call>")
        << tool_call_name
        << tool_call_args
        << p.literal("</tool_call>"));

    return reasoning << p.optional(content) << p.optional(tool_call);
});

std::string input = R"(<think>I need to call get_weather with city = New York</think><tool_call><name>get_weather</name><args>{"city": "New York"}</args></tool_call>)";
parser_context ctx{input, parse_cache()};

auto result = parser.parse(ctx);

assert_equals(true, result.is_success());
assert_equals(input.size(), result.end);
assert_equals(std::string("I need to call get_weather with city = New York"), *result.group("reasoning-content", ctx.input));
assert_equals(std::string("get_weather"), *result.group("tool-name", ctx.input));
assert_equals(std::string(R"({"city": "New York"})"), *result.group("tool-args", ctx.input));

The parser supports partial parsing for streaming output:

input = R"(<think>I need to call get_weather</think><tool_call><name>get_weather</name><args>{"cit)";
ctx = parser_context{input, parse_cache(), /* .is_input_complete = */ false};
result = parser.parse(ctx);

assert_equals(true, result.is_success());
assert_equals(std::string("I need to call get_weather"), *result.group("reasoning-content", ctx.input));
assert_equals(std::string("get_weather"), *result.group("tool-name", ctx.input));
assert_equals(std::string(R"({"cit)"), *result.group("tool-args", ctx.input));

The generated parse tree can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both tool_choice = auto and tool_choice = required.

array ::= "[" space ( value ("," space value)* )? "]" space
boolean ::= ("true" | "false") space
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
content ::= ([^<] | "<" [^t] | "<t" [^o] | "<to" [^o] | "<too" [^l] | "<tool" [^_] | "<tool_" [^c] | "<tool_c" [^a] | "<tool_ca" [^l] | "<tool_cal" [^l] | "<tool_call" [^>])*
decimal-part ::= [0-9]{1,16}
get-weather ::= object
integral-part ::= [0] | [1-9] [0-9]{0,15}
null ::= "null" space
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
object ::= "{" space ( string ":" space value ("," space string ":" space value)* )? "}" space
reasoning ::= "<think>" space ([^<] | "<" [^/] | "</" [^t] | "</t" [^h] | "</th" [^i] | "</thi" [^n] | "</thin" [^k] | "</think" [^>])* space "</think>"
root ::= reasoning space content? space tool-call?
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space
tool-call ::= "<tool_call>" space tool-call-name space tool-call-args space "</tool_call>"
tool-call-args ::= "<args>" space get-weather space "</args>"
tool-call-name ::= "<name>" space [a-zA-Z\-_]+ space "</name>"
value ::= object | array | string | number | boolean | null

Specifics

This PR implements parser combinators for PEG grammars, using caching to implement packrat parsing. The following combinators are implemented:

// Matches an exact literal string.
//   S -> "hello"
parser literal(const std::string & literal);

// Matches a sequence of parsers in order, all must succeed.
//   S -> A B C
parser sequence(std::initializer_list<parser> parsers);

// Matches the first parser that succeeds from a list of alternatives.
//   S -> A | B | C
parser choice(std::initializer_list<parser> parsers);

// Matches one or more repetitions of a parser.
//   S -> A+
parser one_or_more(const parser & p);

// Matches zero or more repetitions of a parser, always succeeds.
//   S -> A*
parser zero_or_more(const parser & p);

// Matches zero or one occurrence of a parser, always succeeds.
//   S -> A?
parser optional(const parser & p);

// Negative lookahead: succeeds if child parser fails, consumes no input.
//   S -> !A
parser negate(const parser & p);

// Matches any single character.
//   S -> .
parser any();

// Matches a single character from a character class or range.
//   S -> [a-z] or S -> [^0-9]
parser char_class(const std::string & classes);

// Captures the matched text from a parser and stores it with a name.
//   S -> <name:A>
parser group(const std::string & name, const parser & p);

// References a named rule for recursive or reusable grammar definitions.
//   expr -> term | expr "+" term
parser rule(const std::string & name);

// Matches zero or more whitespace characters (space, tab, newline).
//   S -> [ \t\n]*
parser space();

// Matches all characters until a delimiter is found (delimiter not consumed).
//   S -> (!delim .)*
parser until(const std::string & delimiter, bool consume_spaces = true);

// Creates a complete JSON parser supporting objects, arrays, strings, numbers, booleans, and null.
//   value -> object | array | string | number | true | false | null
parser json();

// Wraps a parser with JSON schema metadata for grammar generation.
// Used internally to convert JSON schemas to GBNF grammar rules.
parser schema(const parser & p, const std::string & name, const nlohmann::ordered_json & schema);

The operators +, |, and ~ construct sequence, choice, and negate parsers respectively. The << operator includes a space rule between parsers.
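For instance, under these overloads the following compositions are equivalent to the corresponding explicit combinator calls (a sketch against this PR's builder API, with p a parser_builder):

auto digits    = p.one_or_more(p.char_class("[0-9]"));
auto strict    = p.literal("<answer>") + digits + p.literal("</answer>");   // plain sequence, no implicit space
auto relaxed   = p.literal("<answer>") << digits << p.literal("</answer>"); // space rule between parts
auto yes_or_no = p.literal("yes") | p.literal("no");                        // first alternative that succeeds
auto not_tag   = ~p.literal("<") + p.any();                                 // any char that doesn't start a tag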

Drawbacks

  • Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, p.zero_or_more(~(space + p.literal("</think>")) + p.any()) consumes any character, as long as </think> does not begin at the current position. The p.until("</think>") parser is intended to simplify this (see the sketch after this list).

  • Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity.

  • Each model still requires a custom parser, though they share a common framework that simplifies implementation.

  • Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls.
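For concreteness, here is the equivalence mentioned in the first drawback, written against this PR's builder API (using p.space() for the whitespace rule):

// Both stop consuming input where "</think>" begins; until() just reads better.
auto verbose = p.zero_or_more(~(p.space() + p.literal("</think>")) + p.any());
auto concise = p.until("</think>");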

To do

  • Basic implementation.
  • Support parsing of partial input for streaming.
  • Implement a JSON parser using parser combinators to replace the current healing system.
  • Implement content() and reasoning() parsers to populate content/reasoning fields.
  • Implement tool(), tool_name(), tool_args(), as well as tool_arg_name() and tool_arg_value() for models such as Qwen3-Coder.
  • Construct a GBNF grammar from the final parser.
  • Construct a lazy GBNF grammar from the final parser.
  • Implement json-schema-to-grammar support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema.
  • Allow building of the parser during chat param initialization.


pwilkin commented Nov 10, 2025

Yes! This is exactly what I was thinking about :) Can you give me push rights to your repo so I can contribute without doing PRs to PRs?


aldehir commented Nov 10, 2025

Yes! This is exactly what I was thinking about :) Can you give me push rights to your repo so I can contribute without doing PRs to PRs?

Sure. I've never managed permissions on a GitHub repo, but let me know if you can't push.

The interface isn't solidified, so hammer away. I do want to clean up the header and move stuff into the source file. Figured I'd handle that as I get further along.

The partial parsing works, but it does require careful attention when editing. The idea is to "succeed" if the parse tree is partially traversed and the input is marked as incomplete, with one caveat: if a literal is partially matched, it propagates a result indicating more input is needed. I intend to add a regex parser that uses the built-in partial regex matching support, which should do the same thing. This allows us to collect the results when sending a streaming response.
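To illustrate that propagation rule, here is a hypothetical sketch of how a literal parser might report a partial match; the names are illustrative, not this PR's actual internals:

// Hypothetical sketch; the PR's real types and names may differ.
result match_literal(const std::string & lit, const parser_context & ctx, size_t pos) {
    const size_t n = std::min(lit.size(), ctx.input.size() - pos);
    if (ctx.input.compare(pos, n, lit, 0, n) != 0) {
        return result::fail(pos);                  // definite mismatch
    }
    if (n < lit.size()) {
        // Prefix matched but the input ran out: report "need more input"
        // when the caller marked the input as incomplete (streaming).
        return ctx.is_input_complete ? result::fail(pos)
                                     : result::need_more_input(pos + n);
    }
    return result::success(pos + n);               // full literal consumed
}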

I need to clean up the caching. Initially I thought we could reuse the cache as more and more input arrives, but I'm finding it very difficult to determine the correct time to cache. So I'm thinking about nixing that idea and just providing a cache per parsing run, as the packrat algorithm originally intended. Then we can profile whether caching is beneficial on a real example. I suspect there shouldn't be a whole lot of backtracking, so the memory cost might not be worth it if the gains are minuscule.
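For reference, the per-run variant can be as simple as memoizing on (rule, position) and discarding the table after each parse; a hypothetical sketch (not the PR's actual parse_cache):

#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical memo table keyed by (rule id, input position). One instance is
// created per parse() call, so nothing needs invalidating as input streams in.
struct memo_key {
    uint32_t rule;
    size_t   pos;
    bool operator==(const memo_key & o) const { return rule == o.rule && pos == o.pos; }
};
struct memo_key_hash {
    size_t operator()(const memo_key & k) const {
        return std::hash<uint64_t>()((uint64_t(k.rule) << 40) ^ k.pos);
    }
};
template <typename result_t>
using memo_table = std::unordered_map<memo_key, result_t, memo_key_hash>;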


pwilkin commented Nov 10, 2025

Aight, let me bounce my original idea - what if we just created a GBNF parser builder and used that to parse the messages? Then we have both problems (tool call / reasoning and compatibility with normal parsing) done in one go. Unless (haven't looked into it) it would just be too inefficient for normal content parsing?

Because right now it feels like we're adding another intermediate abstraction while GBNF is already implemented in GGML - so maybe just use a builder as an abstraction layer to create all the needed objects and add any missing partial parse support?

This is just an idea, I'm not very fixated on it, just thought I'd share it. Regarding memory costs and the packrat parser, I think O(n) with typical LLM inputs is negligible; even with super long contexts we're looking at a few MB of overhead at most.


aldehir commented Nov 10, 2025

Sounds like you're thinking of a parser generator, something like yacc, bison, or ANTLR. The problem I see with those solutions is that they require building a parse table upfront, which is less intuitive than building a parse tree as in this PR. You could create a recursive descent parser, but that would have to be done at compile time. If you did it at runtime, I think the solution would look a lot like this!

I haven't examined the GBNF code with a scalpel, but from a brief look it seems to use a pushdown automaton, and extracting content from it may be challenging. Not that we would want to, since it is part of the core library and not common. I believe there is a desire to keep the chat parsing isolated in common.

I also think you lose the expressiveness of being able to define the grammar in C++. For example, with this solution we could add an execute() parser that takes a user lambda and runs it when the parse subtree succeeds. You could define prune() to remove parts of the tree on a condition, such as when no tools are provided. Not saying we want to do that, just to demonstrate the flexibility offered.
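For illustration only, those hypothetical combinators might read like this (again, neither execute() nor prune() exists in this PR):

// Hypothetical: run a user lambda when the tool-call subtree succeeds.
auto tool = p.execute(tool_call, [](auto & result, auto & ctx) {
    // e.g. emit a streaming delta for the completed tool call
});

// Hypothetical: drop a branch of the tree on a condition, e.g. no tools given.
auto root = reasoning << p.optional(content)
                      << p.prune(p.optional(tool), [&] { return tools.empty(); });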

The solutions I mentioned above do this by defining their own language to insert code, which is not pretty in my experience.

That said, I am open to ideas. If you have a clearer picture of what that looks like, I'm happy to review. I understand inserting a new abstraction is a tough ask. I wanted to roll out a PoC to hopefully show value.


pwilkin commented Nov 10, 2025

@aldehir Nah, you're probably right. I looked at the GBNF code and in fact it would take too much effort to extract the parsed content from there. We're better off just doing it your way. I'll try to code some of the missing pieces.


aldehir commented Nov 10, 2025

@pwilkin great! If you have any questions, feel free to ask.
