Skip to content

Conversation

@alecgrieser
Copy link
Collaborator

@alecgrieser alecgrieser commented Nov 10, 2025

This reintroduces the work that was reverted (namely #3696 and #3706; see also #3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation.

Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields.

From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the Type system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the Type information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say.

This borrows and expands the tests from #3696 and #3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.

This reintroduces the work that was reverted (namely FoundationDB#3696 and FoundationDB#3706; see also FoundationDB#3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation.

Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields.

From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the `Type` system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the `Type` information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say.

This borrows and expands the tests from #3969 and FoundationDB#3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.
@alecgrieser alecgrieser added the enhancement New feature or request label Nov 10, 2025
@alecgrieser alecgrieser requested a review from hatyo November 13, 2025 10:06
@alecgrieser alecgrieser marked this pull request as ready for review November 13, 2025 10:06
@github-actions
Copy link

📊 Metrics Diff Analysis Report

Summary

  • New queries: 50
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Metrics only changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/valid-identifiers.metrics.yaml: 50

@hatyo
Copy link
Contributor

hatyo commented Nov 13, 2025

The inconsistency between table names, and the corresponding (qualified) field names seems confusing. Would it be possible to fix it? (see below, foo__2tableA when used a type in the type filter vs. foo.tableA when used as a field qualifier. Although I am not sure this is what we want, considering this makes the plan diverge from the actual user SQL, we could do it the other way around as well, i.e. printing the user-defined type names instead of storage names in the TypeFilter and so on, which seems better I guess?

If I am not mistaken, that shouldn't be difficult, maybe we can just print the storage name of the FieldValue's FieldAccessor(s) in FieldValue#toString.

image

@alecgrieser
Copy link
Collaborator Author

Yes, we could address that. There are two ways you could address that:

  1. Use the storage name in the FieldAccessor's path, so that you get storage names everywhere, as you said
  2. Use the user-visible names everywhere, in which case we'd want to adjust the way we handle the inputs in the FUSE and LogicalTypeFilterExpressions (and also, while we're there, InsertExpressions, UpdateExpressions, and DeleteExpressions).

I actually kind of prefer the latter. One thing I'm not super happy about is that the user still needs to be a bit careful about which String they supply when things are a string. If the various expressions that cared about type names took, like, a TypeIdentifier that had both, then they could use one for everything except interfacing with the Record Layer. And that would include toString.

One reason to keep the user-visible names during FieldValue's toString is that it makes it easier for the user to make sense of a plan. You could imagine in a future where we completely separate the type name from the internal storage name that it would be hard for a user to understand _._uuid1._uuid2 instead of _.parent.child. Which is why I'd like to standardize the other way

Copy link
Contributor

@hatyo hatyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@alecgrieser
Copy link
Collaborator Author

I'll go ahead and address the table name display inconsistency in a follow up PR.

@alecgrieser alecgrieser merged commit 4873b51 into FoundationDB:main Nov 13, 2025
13 of 14 checks passed
@alecgrieser alecgrieser deleted the reintroduce-protobuf-escaping branch November 13, 2025 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants