fix(bigquery): apply case normalization consistently for temp table inference #15252
- Fix case sensitivity bug in is_allowed_table() discovered_tables lookup
- Both is_temp_table() and is_allowed_table() now consistently use standardize_identifier_case()
- Resolves 'not allowed table' debug messages for legitimate production tables
- Ensures convert_urns_to_lowercase config is respected across all table lookups
… inference logs
- Removed overly verbose table discovery debug logs from bigquery_schema_gen.py
- Simplified temp table inference logging in queries_extractor.py
- Kept the essential 'Inferred as temp table' message for troubleshooting
- Case normalization fixes remain in place
):
    logger.debug(f"inferred as temp table {name}")
    logger.debug(
        f"Inferred as temp table {name} (is_allowed?{self.filters.is_allowed(table)}"
Is this correct?
A parenthesis is missing for sure, but I'm unsure if you wanted is_allowed there as a string with a question mark.
I like using question mark for booleans 😅
indeed, parenthesis was missed, nice catch!
Problem
BigQuery lineage was missing due to a case sensitivity bug in the temp table inference logic. When convert_urns_to_lowercase: true is configured, legitimate production tables were incorrectly classified as temp tables, preventing lineage generation.

Root Cause
The BigQuery connector had inconsistent case handling:
- The discovered_tables set was correctly normalized to lowercase during initialization
- Lookups against it did not apply the same normalization, so names mismatched on case (PRD_NAP_BASE_VWS vs prd_nap_base_vws), triggering false positive temp table inference

Fix Applied
Applied consistent case normalization in both the is_temp_table() and is_allowed_table() methods:
- Both now use standardize_identifier_case() when checking against discovered_tables
- The convert_urns_to_lowercase configuration is respected across all table lookups

Changes
- queries_extractor.py: Fixed case normalization in temp table and allowed table lookups
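
The mismatch and the fix can be illustrated with a minimal, self-contained sketch. It is not the connector's actual code: the function bodies, the signature of standardize_identifier_case(), the project and table names, and the simplification that "not in discovered_tables" alone means "temp table" are all assumptions made for illustration.

```python
from typing import Iterable, Set


def standardize_identifier_case(name: str, convert_urns_to_lowercase: bool) -> str:
    """Hypothetical stand-in: lowercase the identifier only when the flag is set."""
    return name.lower() if convert_urns_to_lowercase else name


def build_discovered_tables(names: Iterable[str], convert_urns_to_lowercase: bool) -> Set[str]:
    # The discovered set is normalized once, at initialization.
    return {standardize_identifier_case(n, convert_urns_to_lowercase) for n in names}


def is_temp_table_buggy(name: str, discovered_tables: Set[str]) -> bool:
    # Simplified assumption: a table missing from the discovered set is inferred as temp.
    # The raw name is compared against a normalized set, so case can mismatch.
    return name not in discovered_tables


def is_temp_table_fixed(
    name: str, discovered_tables: Set[str], convert_urns_to_lowercase: bool
) -> bool:
    # Same inference, but the lookup key gets the same normalization as the set.
    normalized = standardize_identifier_case(name, convert_urns_to_lowercase)
    return normalized not in discovered_tables


# Hypothetical project and table names; only the dataset name comes from the PR text.
discovered = build_discovered_tables(["proj.PRD_NAP_BASE_VWS.orders"], True)
query_ref = "proj.PRD_NAP_BASE_VWS.orders"

print(is_temp_table_buggy(query_ref, discovered))        # True: false positive
print(is_temp_table_fixed(query_ref, discovered, True))  # False: recognized as a real table
```

Routing both the set construction and every lookup through one normalization helper keeps the two sides from drifting apart, which is the property the fix relies on.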