feat: cleanup old inaccurate activities & maintainers for previously processed fork repos [CM-749] #3563
Conversation
Conventional Commits FTW!
```diff
  })

- constructor() {
+ constructor(token?: string) {
```
Why are we taking the token as a param? If we have some use case in which we do not have this env variable, we probably must also send the host? @mbani01
I added the param to be able to pass another token with different permissions than the one used in the env var.
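A minimal sketch of the resulting pattern, assuming the client falls back to an environment token when none is passed (the env var name here is hypothetical):

```typescript
// Sketch only: the env var name is an assumption, not the repo's actual value.
export class TinybirdClient {
  private readonly token: string

  constructor(token?: string) {
    // Fall back to the environment token when no override is supplied, so callers
    // can inject a token with different (e.g. delete) permissions when needed.
    const resolved = token ?? process.env['CROWD_TINYBIRD_TOKEN']
    if (!resolved) {
      throw new Error('No Tinybird token passed and CROWD_TINYBIRD_TOKEN is not set')
    }
    this.token = resolved
  }
}
```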
```typescript
const maintainersDeletedPostgres = await deleteMaintainersFromPostgres(
  clients.postgres,
  repo.id,
  repo.url,
  dryRun,
)
const maintainersDeletedTinybird = await deleteMaintainersFromTinybird(
  tinybird,
  repo.id,
  repo.url,
  dryRun,
)
const maintainersDeletedSnowflake = await deleteMaintainersFromSnowflake(
  clients.snowflake,
  repo.id,
  repo.url,
  dryRun,
)
```
All these deletions are not in a transaction. What if some deletions succeed and others fail (e.g. the Postgres deletion succeeds but the Tinybird deletion fails)? Can we simply rerun the script?
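Since the three stores cannot share one transaction, rerun safety comes down to each per-store delete being idempotent. A minimal sketch of that idea, with `declare`d stand-ins for the helpers in the diff (signatures and error handling here are illustrative, not the PR's actual code):

```typescript
// Stand-ins for the helpers shown in the diff; signatures are assumptions.
declare const clients: { postgres: unknown; snowflake: unknown }
declare const tinybird: unknown
declare function deleteMaintainersFromPostgres(pg: unknown, id: string, url: string, dry: boolean): Promise<number>
declare function deleteMaintainersFromTinybird(tb: unknown, id: string, url: string, dry: boolean): Promise<number>
declare function deleteMaintainersFromSnowflake(sf: unknown, id: string, url: string, dry: boolean): Promise<number>

// DELETE ... WHERE "repoId" = X is idempotent: rerunning after a partial failure
// deletes zero rows in the stores that already succeeded, so a plain rerun is safe.
async function cleanupRepo(repo: { id: string; url: string }, dryRun: boolean): Promise<void> {
  const steps: Array<[string, () => Promise<number>]> = [
    ['postgres', () => deleteMaintainersFromPostgres(clients.postgres, repo.id, repo.url, dryRun)],
    ['tinybird', () => deleteMaintainersFromTinybird(tinybird, repo.id, repo.url, dryRun)],
    ['snowflake', () => deleteMaintainersFromSnowflake(clients.snowflake, repo.id, repo.url, dryRun)],
  ]
  for (const [store, run] of steps) {
    try {
      await run()
    } catch (err) {
      // Name the failing store so the operator knows a rerun is needed and where it stopped.
      throw new Error(`deletion failed in ${store} for ${repo.url}; rerun the script`, { cause: err })
    }
  }
}
```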
```typescript
/**
 * Execute raw SQL query on Tinybird
 * Useful for queries that don't go through named pipes (e.g., ALTER TABLE, direct SELECT)
 */
async rawSql<T = unknown>(query: string): Promise<T> {
```
Maybe we could think about reusing this rawSql function in pipeSql.
Good catch! done ✅
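A minimal sketch of the reuse being suggested, assuming `pipeSql` ultimately issues a SQL query against Tinybird's `/v0/sql` endpoint and can therefore delegate its HTTP plumbing to `rawSql` (the method shapes here are assumptions):

```typescript
// Sketch only: a hypothetical client showing pipeSql delegating to rawSql.
class TinybirdClient {
  constructor(
    private readonly token: string,
    private readonly host = 'https://api.tinybird.co',
  ) {}

  private getHeaders(): Record<string, string> {
    return { Authorization: `Bearer ${this.token}` }
  }

  // Single place that knows about the SQL endpoint, auth headers, and error handling.
  async rawSql<T = unknown>(query: string): Promise<T> {
    const response = await fetch(`${this.host}/v0/sql?q=${encodeURIComponent(query)}`, {
      headers: this.getHeaders(),
    })
    if (!response.ok) {
      throw new Error(`Tinybird SQL request failed: ${response.status}`)
    }
    return (await response.json()) as T
  }

  // Pipe queries become thin wrappers that only build the SQL text.
  async pipeSql<T = unknown>(pipe: string, where: string): Promise<T> {
    return this.rawSql<T>(`SELECT * FROM ${pipe} WHERE ${where}`)
  }
}
```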
epipav left a comment:
It looks good overall - before merging, let's:
- double-check that the delete jobs have the proper delete_condition set with this call
- reduce the batch size a bit more, maybe to 200
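For the first check, one hedged way to inspect delete jobs, assuming Tinybird's Jobs API (`GET /v0/jobs`) and a `TINYBIRD_TOKEN` env var; the `kind` and `delete_condition` field names are assumptions to verify against the actual API response:

```typescript
// Sketch only: endpoint and response field names should be checked against
// the Tinybird Jobs API documentation before relying on this.
async function listDeleteJobs(): Promise<void> {
  const response = await fetch('https://api.tinybird.co/v0/jobs', {
    headers: { Authorization: `Bearer ${process.env['TINYBIRD_TOKEN']}` },
  })
  const body = (await response.json()) as { jobs: Array<Record<string, unknown>> }
  for (const job of body.jobs) {
    // Keep only deletion jobs and print the condition each one was created with.
    if (String(job['kind']).includes('delete')) {
      console.log(job['id'], job['status'], job['delete_condition'])
    }
  }
}
```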
joanagmaia left a comment:
Hey Mouad, I saw that the table names were not correct for Snowflake. Also, in Snowflake we will need to delete from both the Kafka tables and the Fivetran tables.
Can you try to access https://app.snowflake.com/jnmhvwd/xpb85243/#/homepage with your contractor email? You should have access. You will also need to update the columns you are reading from these tables according to the data schema.
In terms of credentials, we will use COMMUNITY_MANAGEMENT_USER; we are just waiting on delete permissions to all of these tables. Threads:
https://linuxfoundation.slack.com/archives/C06GFSRQQER/p1762177471448389
https://linuxfoundation.slack.com/archives/C02RMNFE9FF/p1762537289862279
```typescript
if (dryRun) {
  log.info(`[DRY RUN] Querying maintainers from Snowflake for repository: ${repoUrl}`)
try {
  const query = `SELECT COUNT(*) as count FROM "maintainersInternal" WHERE "repoId" = '${repoId}'`
```
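The snippet above interpolates `repoId` directly into the SQL string. A hedged sketch of the parameterized alternative using the Node `snowflake-sdk` bind support (the connection setup is assumed; table and column names are copied from the diff):

```typescript
import type { Connection } from 'snowflake-sdk'

// Sketch only: assumes an already-connected snowflake-sdk Connection.
function countMaintainers(conn: Connection, repoId: string): Promise<number> {
  return new Promise((resolve, reject) => {
    conn.execute({
      // The ? placeholder lets the driver bind repoId instead of string interpolation.
      sqlText: 'SELECT COUNT(*) AS "count" FROM "maintainersInternal" WHERE "repoId" = ?',
      binds: [repoId],
      complete: (err, _stmt, rows) => {
        if (err) return reject(err)
        resolve(Number((rows?.[0] as { count: string | number }).count))
      },
    })
  })
}
```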
```typescript
try {
  // Delete from activityRelations first (foreign key dependency)
  log.info('Deleting from activityRelations table...')
  const activityRelationsQuery = `DELETE FROM activityRelations WHERE activityId IN (${idsString})`

  // Delete from activities table
  log.info('Deleting from activities table...')
  const activitiesQuery = `DELETE FROM activities WHERE id IN (${idsString})`
```
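Because `activityRelations` rows depend on `activities` via a foreign key, wrapping both deletes in one transaction (echoing the transaction concern raised earlier in this review) guarantees a partial failure cannot leave orphaned relations. A minimal sketch with node-postgres; the quoted identifiers and pool variable are assumptions:

```typescript
import { Pool } from 'pg'

// Sketch only: assumes node-postgres and quoted camelCase identifiers.
async function deleteActivitiesTransactionally(pool: Pool, ids: string[]): Promise<void> {
  const client = await pool.connect()
  try {
    await client.query('BEGIN')
    // Child rows first so the foreign key on activityRelations is never violated.
    await client.query('DELETE FROM "activityRelations" WHERE "activityId" = ANY($1)', [ids])
    await client.query('DELETE FROM "activities" WHERE "id" = ANY($1)', [ids])
    await client.query('COMMIT') // both deletes land, or neither does
  } catch (err) {
    await client.query('ROLLBACK')
    throw err
  } finally {
    client.release()
  }
}
```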
This pull request introduces enhancements to the Tinybird client library, adds new utility scripts, and updates dependencies to support Snowflake integration. The most significant improvements are the addition of new methods for interacting with Tinybird datasources and the refactoring of header management for API requests.
Tinybird client enhancements:
- Added a `getHeaders` method to centralize and simplify HTTP header creation for Tinybird API requests in `tinybirdClient.ts`, and refactored both GET and POST requests to use it, improving maintainability. [1] [2]
- Added a `rawSql` method to execute raw SQL queries directly against Tinybird, supporting operations outside of named pipes.
- Added a `deleteDatasource` method to allow selective deletion of data from a Tinybird datasource using the official delete API.

Script and dependency updates:
- Added a new script, `cleanup-fork-activities`, to the `script_executor_worker` package for maintenance tasks.
- Updated `pnpm-lock.yaml` and `package.json` to include `@crowd/snowflake`, preparing the project for Snowflake integration. [1] [2]
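A minimal sketch of what `deleteDatasource` could look like against Tinybird's data-deletion endpoint (`POST /v0/datasources/{name}/delete` with a `delete_condition`, matching the delete_condition check requested in review); the host, token handling, and response shape are assumptions to verify against the Tinybird docs:

```typescript
// Sketch only: returns the async deletion job id so callers can poll /v0/jobs.
async function deleteDatasource(name: string, deleteCondition: string): Promise<string> {
  const response = await fetch(`https://api.tinybird.co/v0/datasources/${name}/delete`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env['TINYBIRD_TOKEN']}` },
    body: new URLSearchParams({ delete_condition: deleteCondition }),
  })
  if (!response.ok) {
    throw new Error(`Tinybird delete failed: ${response.status} ${await response.text()}`)
  }
  const job = (await response.json()) as { job_id: string }
  return job.job_id
}

// Example, mirroring the cleanup script's per-repo loop (repo shape assumed):
// const jobId = await deleteDatasource('activities', `repoId = '${repo.id}'`)
```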