@mzuenni
Collaborator

@mzuenni mzuenni commented Nov 17, 2025

solves #312
@thorehusfeldt do you mind adjusting the schemas?

@mzuenni mzuenni requested a review from mpsijm November 17, 2025 21:58
@mzuenni mzuenni marked this pull request as ready for review November 19, 2025 16:22
@RagnarGrootKoerkamp
Owner

Should we also allow this on the answer files, to ensure a testcase is (im)possible as intended?

@thorehusfeldt
Collaborator

Consider bumping the generator framework version.

The sample generators.yaml script linked from doc might want to include

version: 2025-12 

and the default in the CUE schema generators key (and presumably the JSON file) updated accordingly.

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 3, 2025

@RagnarGrootKoerkamp :

Should we also allow this on the answer files?

But ans: possible and ans: impossible can already be specified. I guess what you are proposing would be to support “make sure the answer is not impossible”, for instance by

ans-match: \d+

Hm. I find this useful, but now it’s getting ugly.

An idea for syntax that is consistent with the current proposal:

match:
  in: foo
  ans: bar
---
match:
  in: [42, forty-two] 
  ans: bar
---
match: \w+\s\w+ # same as match: { in: \w+\s\w+ }
---
# same as match: { in: [42, forty-two] }:
match:
  - 42
  - forty-two

I guess the schema is

match: string | [...string] | {
  in: string | [...string]
  ans: string | [...string]
}
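
A minimal Python sketch of how the three accepted shapes of match (a string, a list of strings, or a mapping with in and ans) could be normalized into one form before any checking happens. The function name normalize_match and the returned dictionary layout are hypothetical and not taken from BAPCtools; the sketch also assumes that unquoted YAML scalars such as 42 arrive as integers and should be treated as pattern strings.

```python
def normalize_match(spec):
    """Return {'in': [...], 'ans': [...]} for any accepted shape of `match`.

    Hypothetical helper: a bare string or list is shorthand for the `in` key.
    """
    def as_list(value):
        # A single pattern becomes a one-element list; YAML may hand us
        # non-string scalars (e.g. 42), so convert every entry to str.
        if isinstance(value, str):
            return [value]
        return [str(v) for v in value]

    if isinstance(spec, dict):
        unknown = set(spec) - {"in", "ans"}
        if unknown:
            raise ValueError(f"unknown keys in match: {sorted(unknown)}")
        return {key: as_list(spec.get(key, [])) for key in ("in", "ans")}

    # string | [...string] is the same as { in: ... }
    return {"in": as_list(spec), "ans": []}
```

With this, match: [42, forty-two] and match: { in: [42, forty-two] } end up as the same normalized value, which is the equivalence the YAML examples above describe.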

@mzuenni
Collaborator Author

mzuenni commented Dec 3, 2025

Even though this might be less YAML-like, I would prefer not to nest these and instead go for something like in.match or match.in?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

Just to make sure I was clear: I propose to retain

match: \d+

as a valid expression, and expect it to be the widest-used form. I propose that the above is the same as

match:
  in: \d+

(which, thanks to standard YAML syntax, can also be written as a one-liner, match: { in: \d+ }, but which is not the same as match.in: \d+.)

The situation in which the “mapping” form would mainly arise is when you want to specify something about ans, like “the answer is not impossible”. Not sure what kinds of conventions will arise among authors, but here are some suggestions:

match: { ans: ^[^i] }
---
match: { ans: \d+ }
---
match: { ans: ^(?!impossible$).* }
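
To compare the three suggested conventions, here is a small Python sketch that applies each pattern to a few sample answers. It assumes search semantics (re.search), no regex flags, and answers without a trailing newline; none of that is confirmed for the actual implementation, and the sample answers are made up for illustration.

```python
import re

# The three patterns suggested above for "the answer is not impossible".
PATTERNS = [r"^[^i]", r"\d+", r"^(?!impossible$).*"]

# Made-up sample answers, without trailing newline.
ANSWERS = ["impossible", "42", "inf"]


def accepted(pattern, answer):
    """True if the pattern is found somewhere in the answer (search, not full match)."""
    return re.search(pattern, answer) is not None


# For every pattern, the list of sample answers it lets through.
table = {p: [a for a in ANSWERS if accepted(p, a)] for p in PATTERNS}

for pattern, passing in table.items():
    print(f"{pattern!r:25} accepts {passing}")
```

Under these assumptions the first two patterns also reject an answer like inf, while the negative lookahead rejects only the exact answer impossible, so the three are not interchangeable.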

I would advise against introducing more keys in the top-level mapping (such as match, match.in, and match.ans); tool support for YAML is just better when we stick to YAML conventions.


The main alternatives I can see to my proposal would be to add pattern to in and ans. (I use pattern here instead of match just to keep the proposals syntactically separate.)

[in|ans]: string | {
  value: string
  pattern: string | [...string]
}

so you’d have expressions like this:

generate: make_random_tree -n 100 --balanced {seed:0}
in:
  pattern: \d+
ans: impossible

This doesn’t smell right to me, but it’s just a hunch.

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

I notice that we already have a plethora of stuff, namely

["in" | "in.statement" | "in.download" |
    "ans" | "ans.statement" | "ans.download" |
    "out"]: string

The current semantics is that the key: value pair means "<testcasename>.<key> must equal value". What we’re looking for in the current proposal is a semantics that says "<testcasename>.<key> should obey constraint".

This is a case against introducing keys like in.match, by the way. You’d need ans.statement.match etc.

My hunch is that the cleanest way is to enrich the right-hand side, instead of introducing more left-hand sides of such expressions.

I think what I’m saying is

let extension = "in" | "in.statement" | "in.download" |  "ans" | "ans.statement" | "ans.download" | "out"
[extension]:  string # as we have now
match: string | { [extension]: string } # default string same as { in: string }

allowing

in: foo
match:
  ans.statement: \d\w+ 

Alternatively,

let extension = "in" | "in.statement" | "in.download" |  "ans" | "ans.statement" | "ans.download" | "out"
[extension]:  string  | { match: string }
in: foo
ans.statement:
  match: \d\w+ 
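
The difference between "must equal value" and "should obey constraint" in this alternative can be sketched in a few lines of Python. The function check_entry is hypothetical, it assumes search semantics for the regex, and it ignores details such as trailing newlines.

```python
import re


def check_entry(value, content):
    """Check one `<extension>: value` entry against the file content.

    Hypothetical helper: a plain string keeps today's meaning (exact equality),
    a mapping `{match: regex}` is the proposed constraint form.
    """
    if isinstance(value, str):
        return content == value
    # Assumption: the regex only has to be found somewhere in the content.
    return re.search(value["match"], content) is not None
```

For example, check_entry("foo", "foo") holds by equality, while check_entry({"match": r"\d\w+"}, "see 4x") holds because the pattern is found in the content.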

Dream state

The dream state would be what CUE already supports out of the box:

ans: "impossible"  # ans must equal impossible
---
in: number & >0 # in must be a number, and strictly larger than 0
---
ans: "yes" | "no" # ans  must be either "yes" or "no"
in.statement: =~"^\w\w$" # in.statement has two letters
in: !~"impossible" # in does not contain impossible
in: in.statement # in and in.statement are identical

In other words, there’s a whole grammar on the right hand side supporting |, &, literal match, and =~ and !~ for regex match and unmatch.

Note that explicit creation and constraint checking are the same: CUE just unifies everything it knows about, say .in (including whatever copy or generate may have produced) and expects the result to be a singleton. Otherwise it complains. Specifying a constraint is the same as specifying a value (the latter is just a constraint with a singleton valid instantiation.)

This would be sah-weet!

@RagnarGrootKoerkamp
Owner

Interesting idea to do in: {match: ...}, sounds reasonable as well to me, but no strong opinion either way.

Should it be matches instead of match maybe? As in the .ans matches X Y Z.

@mzuenni
Collaborator Author

mzuenni commented Dec 4, 2025

The main alternatives I can see to my proposal would be to add pattern to in and ans. (I use pattern here instead of match just to keep the proposals syntactically separate.)

I don't like that, also feels weird in combination with generated testcases...

["in" | "in.statement" | "in.download" |
   "ans" | "ans.statement" | "ans.download" |
   "out"]: string

I don't think we need this for anything other than .in and .ans, since this is only intended to additionally check generated files. The others are typically hardcoded already?

and =~ and !~ for regex match and unmatch.

Unmatch would certainly be nice...

match: string | [...string] | {
  in: string | [...string]
  ans: string | [...string]
}

I am fine with that, even though I like to not nest things... :D

The question is if/how we want to support unmatch then?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

The question is if/how we want to support unmatch then?

The current proposal already supports “unmatching”, since regexen support that. Here are the three examples from upthread again, for a problem with output impossible or some numbers:

match: { ans: ^[^i] }
---
match: { ans: \d+ }
---
match: { ans: ^(?!impossible$).* } 

CUE of course would make this nicer to look at:

ans: !~"impossible"

@mzuenni
Collaborator Author

mzuenni commented Dec 4, 2025

match: { ans: ^(?!impossible$).* }

I don't think that one is right? (\A(?!.*^impossible$).*\Z would work but is not very nice...) we could say that if the string starts with ! we do unmatch and if it starts with = we do a match

@thorehusfeldt
Collaborator

The only thing I’m unsure about for my negative lookahead regex is what to do with a possibly trailing newline. (I don’t understand the specification well enough.) So maybe it should be ^(?!impossible).* Otherwise I’m pretty sure it’s fine.

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

I think the issue is that we don't do a full match but a search, and your pattern still matches a suffix not containing impossible? Anyway: yes, it can be expressed... the question is, do we want simpler syntax for this? ^^'

@thorehusfeldt
Collaborator

we don't do a full match but a search

Now I understand. That’s what ^ is for in my expression. (You prefer \A.)

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

but ^ matches any start of line, whereas \A matches only the start of the first line
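
The point about ^ versus \A can be reproduced with Python's re module. This sketch assumes the check is done with re.search and the re.MULTILINE flag, which is an assumption about the implementation; the two-line text is a made-up example.

```python
import re

# Made-up file content: the first line is "impossible", followed by another line.
text = "impossible\n7"

# With re.MULTILINE, ^ may match at the start of any line, so the search
# skips the first line and succeeds on the second one.
line_anchored = re.search(r"^(?!impossible$).*", text, re.MULTILINE)

# \A matches only at the very start of the text, so the negative lookahead
# is evaluated exactly once, on the first line, and the search fails.
text_anchored = re.search(r"\A(?!impossible$).*", text, re.MULTILINE)

print(line_anchored.group() if line_anchored else None)
print(text_anchored.group() if text_anchored else None)
```

Under these assumptions the ^ version reports a match (on the second line) even though the first line is exactly impossible, while the \A version does not.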

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 5, 2025

Hear me out. This actually works:

  1. Syntax

Do allow certain CUE-expressions as the right hand sides of in:, ans:, etc. To be precise, allow string expressions.

For instance, we can do

in: "impossible" # just like we always have
---
in: =~"\\d\\d" # two digits
---
in: "foo"  | "bar"
---
in: =~"^[a-z]+$" & !~"^impossible$" # alphabetic word, but not impossible

and a thousand other things. CUE is quite expressive. The main use cases are disjunctions and regex match and unmatch.

What is new is that the right hand side is now a constraint. If no in-key is present, it defaults to in: string.

  2. Semantics

For a generator rule, various files can be created. generate, copy, or the default submissions producing ans.

Whatever has been produced (maybe nothing) is now unified using CUE and must produce a concrete value (i.e., a concrete string). In the simplest case, the expression

ans: "impossible"

means that the output of the default submissions will be unified with impossible. In this special case, this means that the two strings need to be the same. This is exactly the behaviour that we already have.

But if we had

ans: "yes" | "maybe"

the output of the default submission could be yes or maybe, since both those strings unify with the ans-expression.

  3. Implementation

The CUE CLI already does this. You can set up a very small CUE snippet:

input: string   // will be filled from CLI with concrete value
expr: "foo" | "bar"  // the value of an ans-key in generators.yaml
ok:   input & expr

cue cmd --inject input=foobarbaz exactly replaces input with "foobarbaz", and then CUE does its magic by trying to unify ok. The result of the command is either an error (in this case it would be because "foobarbaz" does not satisfy the rule), or the unified string.

The only reason to not do this is that it increases the dependencies of BAPCtools. (Which is a good enough reason, I think.)

Still, cool AF. Backwards compatible.

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

I actually don't understand what you want to suggest? ^^'

suppose one of your examples is the actual generators.yaml:

data:
  secret:
    - testcase:
        generate: gen.py
        in: "foo"  | "bar"

what is supposed to happen (in our implementation)? do we first run cue on the yaml? do we parse the yaml?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 5, 2025

what is supposed to happen (in our implementation)? do we first run cue on the yaml? do we parse the yaml?

Maybe this is too much of a rabbit hole, but: yes.

gen.py generates testcase.in. (Of type string.) Say its value is "foo". Now, from CUE’s perspective, all the different values of in are unified. There are only two such values:

in: "foo"         # generated by gen.py
in: "foo" | "bar" # the rule

This unifies nicely (to "foo", a concrete string) and CUE is happy.

Had gen.py generated the string "baz" then CUE would have tried to unify this:

in: "baz"         # from gen.py
in: "foo" | "bar" # the rule

And err.

In the special case where the rule is a concrete string, the two strings must be the same. (This is the current behaviour of BAPCtools.)

In the special case where nothing else produces .in (because there is no generate and no copy in the rule), then the in rule itself must be a concrete string, like in: "foo" but not in: "foo" | "bar" (This is the current behaviour of BAPCtools.)

CUE does not distinguish between “a constraint” (like "foo" | "bar") and “a constraint that is so tight that only a singleton value satisfies it”. Types are values. It’s really cute.
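
To make the unification idea concrete without a CUE dependency, here is a rough Python emulation for the case where a generator has already produced a concrete string. It is only a sketch: the helper names are hypothetical, constraints are modelled as plain predicates, and it does not reproduce CUE's actual evaluation or its RE2 regex dialect.

```python
import re


def literal(value):
    """Constraint satisfied by exactly one string, like "foo" in CUE."""
    return lambda s: s == value


def matches(pattern):
    """Rough stand-in for CUE's =~ (pattern found somewhere in the string)."""
    return lambda s: re.search(pattern, s) is not None


def not_matches(pattern):
    """Rough stand-in for CUE's !~."""
    return lambda s: re.search(pattern, s) is None


def any_of(*constraints):
    """Disjunction, like a | b."""
    return lambda s: any(c(s) for c in constraints)


def all_of(*constraints):
    """Conjunction, like a & b."""
    return lambda s: all(c(s) for c in constraints)


def unify(produced, rule):
    """Return the produced string if it satisfies the rule, else raise."""
    if not rule(produced):
        raise ValueError(f"{produced!r} does not satisfy the rule")
    return produced


rule = any_of(literal("foo"), literal("bar"))
print(unify("foo", rule))  # the generator output "foo" satisfies "foo" | "bar"
```

Calling unify("baz", rule) raises an error, mirroring the failing case described above. The case where nothing produces the file and the rule itself must be a concrete string is not covered by this sketch.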

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

Maybe this is too much of a rabbit hole, but: yes.

but what exactly is yes supposed to mean? How would I implement this?

data:
  secret:
    - testcase:
        generate: gen.py
        in: "foo"  | "bar"

right now this is neither a valid .yaml file nor a valid .cue file? So I am unsure what I would need to implement to get what you want ^^'
in: "foo" | "bar" is not valid yaml and raises an error when parsed as yaml but running the whole thing as cue is meaningless because cue does not know what running a generator should mean?

Is our goal to write a cue file from what's specified within the yaml? Which would look like this:

in: <output of generator>
in: "foo"  | "bar"

Or would we want a separate generators.cue that only contains the part related to in: "foo" | "bar", with that part then no longer present in generators.yaml?

@thorehusfeldt
Collaborator

right now this is neither a valid .yaml file nor a valid .cue file?

Exactly. That’s why I’m not pushing hard for this (and call it dream-mode or rabbit hole). We’d need to start having annoying conversations about writing

in: '"foo" | "bar"'

and understand double-escapes in \\w so that YAML’s opinions about what must be quoted (in various '"-orthodoxies) are reliably transformed into CUE’s. I think it’s doable, and in the long run CUE would be a much better “generator configuration language” than YAML. But that’s Zukunftsmusik.

@mzuenni
Collaborator Author

mzuenni commented Dec 7, 2025

in: '"foo" | "bar"'

Yeah, I am not really a fan of those double-quote thingies... especially since the type of quotes already seems to have meaning in YAML ^^' So for now I would prefer to just go with standard regex notation.

But there is still the open question on how to handle .in/.ans files. You proposed to enrich the match entry to possibly be a map, but we could also go the other way around by enriching the regex like this (not sure if I got the cue syntax right):

#Matcher: string | {
    pattern: string
    extension?: "ans" | "in"  //defaults to "in"
    unmatch?: bool            //defaults to false
}

which would then allow us to write:

match:
  - '^1 1$'  #shorthand for "pattern: '^1 1$'", matches a selfloop at vertex 1
  - pattern: '^possible$'
    unmatch: True
    extension: 'ans'  #the answer must not contain possible
  - pattern: '^impossible$'
    extension: 'ans'  #the answer must contain impossible
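
A minimal Python sketch of how such a list of matchers could be evaluated against the files of one testcase. The helper names and the files dictionary are hypothetical, and the use of re.search with re.MULTILINE (so that ^ and $ anchor individual lines) is an assumption, not the behaviour of the actual implementation.

```python
import re


def normalize_matcher(entry):
    """Expand the string shorthand and fill in the defaults from #Matcher."""
    if isinstance(entry, str):
        entry = {"pattern": entry}
    return {
        "pattern": entry["pattern"],
        "extension": entry.get("extension", "in"),  # defaults to "in"
        "unmatch": bool(entry.get("unmatch", False)),  # defaults to false
    }


def failed_matchers(matchers, files):
    """Return the normalized matchers that the testcase files violate.

    `files` maps an extension ("in", "ans") to that file's content.
    """
    failures = []
    for entry in matchers:
        m = normalize_matcher(entry)
        # Assumption: line-anchored search in the file content.
        found = re.search(m["pattern"], files[m["extension"]], re.MULTILINE) is not None
        # A matcher fails if the pattern is found but must not be, or vice versa.
        if found == m["unmatch"]:
            failures.append(m)
    return failures


# The three matchers from the YAML example above, as parsed data.
matchers = [
    "^1 1$",
    {"pattern": "^possible$", "unmatch": True, "extension": "ans"},
    {"pattern": "^impossible$", "extension": "ans"},
]
files = {"in": "3 2\n1 1\n2 3\n", "ans": "impossible\n"}
print(failed_matchers(matchers, files))
```

For the made-up testcase above no matcher fails; changing the answer to possible would violate both ans matchers.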
