
Conversation

Contributor

@sgomezvillamor sgomezvillamor commented Nov 10, 2025

Problem

BigQuery lineage was missing because of a case-sensitivity bug in the temp table inference logic. When convert_urns_to_lowercase: true is configured, legitimate production tables were incorrectly classified as temp tables, which prevented lineage generation.

Root Cause

The BigQuery connector had inconsistent case handling:

  • discovered_tables set was correctly normalized to lowercase during initialization
  • But temp table lookups used un-normalized table references for comparison
  • This caused case mismatches (e.g., PRD_NAP_BASE_VWS vs prd_nap_base_vws), triggering false positive temp table inference
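The mismatch described above can be sketched in a few lines. This is an illustrative reduction, not the actual connector code: discovered_tables, the table names, and the simplified is_temp_table() are stand-ins for the real data structures in queries_extractor.py.

```python
# Hypothetical sketch of the bug: discovered_tables holds lowercased names
# (normalized at initialization), but the lookup key keeps its original
# casing, so the membership test fails and the table is wrongly inferred
# as a temp table.
discovered_tables = {"prd_nap_base_vws.my_table"}  # normalized at init

def is_temp_table(name: str) -> bool:
    # Before the fix: no case normalization on the lookup side
    return name not in discovered_tables

# Un-normalized reference coming from query parsing:
print(is_temp_table("PRD_NAP_BASE_VWS.my_table"))  # True: false positive
print(is_temp_table("prd_nap_base_vws.my_table"))  # False: only exact case matches
```

Because the false positive marks the table as temporary, downstream lineage for it is silently dropped.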

Fix Applied

Applied consistent case normalization in both is_temp_table() and is_allowed_table() methods:

  • Both methods now use standardize_identifier_case() when checking against discovered_tables
  • Ensures the convert_urns_to_lowercase configuration is respected across all table lookups
  • Tables are now correctly recognized and lineage is properly generated
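A minimal sketch of the fix, assuming a simplified standardize_identifier_case() driven by the convert_urns_to_lowercase setting (the method name comes from the PR; its body and the surrounding scaffolding here are illustrative, not the exact DataHub implementation):

```python
# Illustrative config flag; in DataHub this comes from the source config.
convert_urns_to_lowercase = True

def standardize_identifier_case(name: str) -> str:
    # Normalize the same way discovered_tables was normalized at init.
    return name.lower() if convert_urns_to_lowercase else name

discovered_tables = {standardize_identifier_case("PRD_NAP_BASE_VWS.my_table")}

def is_temp_table(name: str) -> bool:
    # After the fix: normalize the lookup key before the membership test.
    return standardize_identifier_case(name) not in discovered_tables

def is_allowed_table(name: str) -> bool:
    return standardize_identifier_case(name) in discovered_tables

print(is_temp_table("PRD_NAP_BASE_VWS.my_table"))    # False: correctly recognized
print(is_allowed_table("prd_nap_base_vws.MY_TABLE")) # True: case no longer matters
```

The key point is that both lookup paths now apply the same normalization as the initialization path, so the result is consistent regardless of the casing in the query text.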

Changes

  • queries_extractor.py: Fixed case normalization in temp table and allowed table lookups
  • Added debug logging to help troubleshoot similar issues in the future

@github-actions github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Nov 10, 2025

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 76.47059% with 4 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines                               Patch %   Lines
...ngestion/source/bigquery_v2/bigquery_schema_gen.py   81.25%   3 Missing ⚠️
.../ingestion/source/bigquery_v2/queries_extractor.py    0.00%   1 Missing ⚠️


- Fix case sensitivity bug in is_allowed_table() discovered_tables lookup
- Both is_temp_table() and is_allowed_table() now consistently use standardize_identifier_case()
- Resolves 'not allowed table' debug messages for legitimate production tables
- Ensures convert_urns_to_lowercase config is respected across all table lookups
@github-actions github-actions bot requested a deployment to datahub-wheels (Preview) November 18, 2025 11:49 Abandoned
… inference logs

- Removed overly verbose table discovery debug logs from bigquery_schema_gen.py
- Simplified temp table inference logging in queries_extractor.py
- Kept the essential 'Inferred as temp table' message for troubleshooting
- Case normalization fixes remain in place
@sgomezvillamor sgomezvillamor changed the title debug(bigquery): add debug logging for temp table inference issue fix(bigquery): apply case normalization consistently for temp table inference Nov 22, 2025
):
logger.debug(f"inferred as temp table {name}")
logger.debug(
f"Inferred as temp table {name} (is_allowed?{self.filters.is_allowed(table)}"
Contributor
Is this correct?
A parenthesis is missing for sure, but I'm unsure if you wanted is_allowed there as a string with a question mark.

Contributor Author

I like using question mark for booleans 😅

indeed, parenthesis was missed, nice catch!
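With the closing parenthesis restored, the debug line under discussion would look like the sketch below. The name and is_allowed values here are hypothetical stand-ins for the method-local context (name and self.filters.is_allowed(table)) in queries_extractor.py:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the method-local context:
name = "PRD_NAP_BASE_VWS.my_table"
is_allowed = False  # result of self.filters.is_allowed(table)

# The corrected debug line, closing parenthesis included:
msg = f"Inferred as temp table {name} (is_allowed?{is_allowed})"
logger.debug(msg)
```

This keeps the author's "question mark for booleans" convention while producing a syntactically balanced message.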

@datahub-cyborg datahub-cyborg bot added the pending-submitter-merge label and removed the needs-review label Nov 24, 2025
@sgomezvillamor sgomezvillamor merged commit bcabe9f into master Nov 24, 2025
62 checks passed
@sgomezvillamor sgomezvillamor deleted the debug/bigquery-temp-table-inference branch November 24, 2025 09:56
