
@chriscollins3456
Collaborator

Note: This still needs tests.

This is a proof-of-concept (POC) PR that adds aspect mappings to the GraphQL fields of the Dataset entity. The goal is to pave the way for optimizing our entity hydration SQL queries: fetch only the aspects the GraphQL query actually needs, instead of fetching every aspect each time an entity is hydrated, as we do today.

To do this, I added a few new directives to our GraphQL schema that map each field to the aspects that populate it. These annotations should only be needed on an entity's top-level fields, as I've done in entity.graphql for Datasets here.

When we hydrate a Dataset, we check which fields the GraphQL query requests, look up the aspect mappings on those Dataset fields, and fetch only the aspects those fields require.

If someone adds a new field and forgets to add the aspect directive, a conservative fallback fetches all aspects, just like before.
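A minimal sketch of the selection logic described above, including the fallback. It is not code from this PR: the class `AspectSelection`, the method `requiredAspects`, and the shape of the field-to-aspects map are hypothetical, and the sketch assumes the `@aspectMapping` values have already been read into a `Map` and the queried top-level field names already extracted from the query.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not from this PR): picks which aspects to fetch when
 * hydrating an entity, based on the top-level fields a GraphQL query selects.
 */
final class AspectSelection {

  private AspectSelection() {}

  /**
   * @param selectedFields names of the top-level entity fields requested by the query
   * @param fieldToAspects field name -> aspects declared via @aspectMapping; an entry
   *     with an empty list means the field needs no aspect (assumption of this sketch)
   * @param allAspects every aspect the entity supports, used for the fallback
   * @return the set of aspects to fetch
   */
  static Set<String> requiredAspects(
      Collection<String> selectedFields,
      Map<String, List<String>> fieldToAspects,
      Set<String> allAspects) {
    Set<String> required = new LinkedHashSet<>();
    for (String field : selectedFields) {
      List<String> aspects = fieldToAspects.get(field);
      if (aspects == null) {
        // Conservative fallback: a field without a mapping might need any aspect,
        // so fetch everything, as hydration does today.
        return new LinkedHashSet<>(allAspects);
      }
      required.addAll(aspects);
    }
    return required;
  }
}
```

With the mapping `fineGrainedLineages -> [upstreamLineage]` from this PR's diff, a query selecting only `fineGrainedLineages` would fetch just `upstreamLineage`, while selecting any unannotated field returns the full aspect set. Returning early on the first unmapped field is safe because nothing later in the loop could add to an already-complete set.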

@github-actions github-actions bot added product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Nov 11, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Nov 11, 2025
Comment on lines 1931 to +1932
fineGrainedLineages: [FineGrainedLineage!]
@aspectMapping(aspects: ["upstreamLineage"])
Contributor

@chriscollins3456 do we actually need upstreamLineage here?

Contributor

I'd hope this comes from Elastic?

Collaborator Author

I'm pretty sure this is actually needed in our lineage UI to show column-level lineage (CLL). I can double-check with @asikowitz to be sure, though.

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Nov 11, 2025
@david-leifker
Collaborator

Seems like a reasonable approach: annotate the GraphQL schema with the mapping to the required aspects. It takes some manual/AI work to set up, but I think it makes a lot of sense.


Labels

devops: PR or Issue related to DataHub backend & deployment
pending-submitter-response: Issue/request has been reviewed but requires a response from the submitter
product: PR or Issue related to the DataHub UI/UX


4 participants