Skip to content

Commit 4367098

Browse files
authored
fix: Pass snapshot_id as kwarg to RemoveStatisticsUpdate (apache#2694)
<!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> Closes apache#2558 # Rationale for this change When removing snapshots with statistics, `RemoveStatisticsUpdate` was being instantiated with a positional argument, which, as suggested by @vndv, "violates Pydantic's BaseModel requirement that all fields be passed as keyword arguments". Shout out to @vndv for catching this 🚀 This caused a `TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given` when calling `table.maintenance.expire_snapshots().older_than(...).commit()`. ## Are these changes tested? Yes, and a new test was added. Existing tests only tested the `RemoveStatisticsUpdate` directly, but didnt test the code path through `RemoveSnapshotsUpdate` that triggers the bug. Added `test_update_remove_snapshots_with_statistics` to `test_expire_snapshots.py` to extend coverage for the condition. ## Are there any user-facing changes? No. <!-- In the case of user-facing changes, please add the changelog label. -->
1 parent bb41a6d commit 4367098

File tree

2 files changed

+38
-1
lines changed

2 files changed

+38
-1
lines changed

pyiceberg/table/update/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -528,7 +528,7 @@ def _(update: RemoveSnapshotsUpdate, base_metadata: TableMetadata, context: _Tab
528528
if ref.snapshot_id in update.snapshot_ids
529529
)
530530
remove_statistics_updates = (
531-
RemoveStatisticsUpdate(statistics_file.snapshot_id)
531+
RemoveStatisticsUpdate(snapshot_id=statistics_file.snapshot_id)
532532
for statistics_file in base_metadata.statistics
533533
if statistics_file.snapshot_id in update.snapshot_ids
534534
)

tests/table/test_expire_snapshots.py

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@
2323
import pytest
2424

2525
from pyiceberg.table import CommitTableResponse, Table
26+
from pyiceberg.table.update import RemoveSnapshotsUpdate, update_table_metadata
2627
from pyiceberg.table.update.snapshot import ExpireSnapshots
2728

2829

@@ -280,3 +281,39 @@ def worker2() -> None:
280281

281282
assert results["expire1_snapshots"] == expected_1, "Worker 1 snapshots contaminated"
282283
assert results["expire2_snapshots"] == expected_2, "Worker 2 snapshots contaminated"
284+
285+
286+
def test_update_remove_snapshots_with_statistics(table_v2_with_statistics: Table) -> None:
287+
"""
288+
Test removing snapshots from a table that has statistics.
289+
290+
This test exercises the code path where RemoveStatisticsUpdate is instantiated
291+
within the RemoveSnapshotsUpdate handler. Before the fix for #2558, this would
292+
fail with: TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given
293+
"""
294+
# The table has 2 snapshots with IDs: 3051729675574597004 and 3055729675574597004
295+
# Both snapshots have statistics associated with them
296+
REMOVE_SNAPSHOT = 3051729675574597004
297+
KEEP_SNAPSHOT = 3055729675574597004
298+
299+
# Verify fixture assumptions
300+
assert len(table_v2_with_statistics.metadata.snapshots) == 2
301+
assert len(table_v2_with_statistics.metadata.statistics) == 2
302+
assert any(stat.snapshot_id == REMOVE_SNAPSHOT for stat in table_v2_with_statistics.metadata.statistics), (
303+
"Snapshot to remove should have statistics"
304+
)
305+
306+
# This should trigger RemoveStatisticsUpdate instantiation for the removed snapshot
307+
update = RemoveSnapshotsUpdate(snapshot_ids=[REMOVE_SNAPSHOT])
308+
new_metadata = update_table_metadata(table_v2_with_statistics.metadata, (update,))
309+
310+
# Verify the snapshot was removed
311+
assert len(new_metadata.snapshots) == 1
312+
assert new_metadata.snapshots[0].snapshot_id == KEEP_SNAPSHOT
313+
314+
# Verify the statistics for the removed snapshot were also removed
315+
assert len(new_metadata.statistics) == 1
316+
assert new_metadata.statistics[0].snapshot_id == KEEP_SNAPSHOT
317+
assert not any(stat.snapshot_id == REMOVE_SNAPSHOT for stat in new_metadata.statistics), (
318+
"Statistics for removed snapshot should be gone"
319+
)

0 commit comments

Comments
 (0)