Skip to content

Conversation

@codeloop
Copy link
Member

@codeloop codeloop commented Jan 24, 2025

odsc-65115
Add the column name transformations, also save a mapping for reverse transformation as per model requirement.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jan 24, 2025
@codeloop codeloop changed the title Make column name LGB compatible Transform columns names to make them compatible across models Jan 27, 2025
@codeloop codeloop changed the title Transform columns names to make them compatible across models odsc-65115 : Transform columns names to make them compatible across models Jan 27, 2025
@codeloop codeloop marked this pull request as ready for review January 28, 2025 04:50
@codeloop codeloop enabled auto-merge January 28, 2025 04:50
@codeloop codeloop requested a review from prasankh January 28, 2025 04:51
"""
clean_df = self._remove_trailing_whitespace(data)
# clean_df = self._normalize_column_names(clean_df)
if hasattr(self.dataset_info, 'horizon'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a proxy for "if it's forecasting"?
If so it may be better to use kind/type for future compatibility

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the transformations we use OperatorSpec, incase of the forecasting it is ForecastOperatorSpec, type is available in the OperatorConfig which is not required here and adding type to spec will make it redundant, so I have doing that. Updated this to us operatorspec instance type check for better readability .

self.target_category_columns = [
self.raw_column_names.get(col, col) for col in self.target_category_columns
]
df.columns = df.columns.str.replace(" ", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be doing this replace 2x? Is there a way to do it once so we ensure it's done the same way

)

if self.raw_column_names:
expected_names.extend(list(self.raw_column_names.values()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these guaranteed to be in historical data?

Copy link
Member Author

@codeloop codeloop Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, as we are making this transformation to the historical data columns, expected columns has to be extended using the same post transformed columns

@codeloop codeloop merged commit ccabc04 into main Feb 4, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants