Skip to content

feat(breaking): require columns in .insert() to be a subset of the destination columns #11731

@NickCrews

Description

@NickCrews

Currently, Backend.insert() has what I view as a footgun, where if the column names don't match it falls back to relying on column ordering.

def _build_insert_from_table(
self, *, target: str, source, db: str | None = None, catalog: str | None = None
):
compiler = self.compiler
quoted = compiler.quoted
# Compare the columns between the target table and the object to be inserted
# If source is a subset of target, use source columns for insert list
# Otherwise, assume auto-generated column names and use positional ordering.
target_cols = self.get_schema(target, catalog=catalog, database=db).keys()
columns = (
source_cols
if (source_cols := source.schema().keys()) <= target_cols
else target_cols
)

I think we should require that set(inserted_column_names) <= set(target_table_column_names), and error if this is not the case. The use can do this casting themselves if they want columns to be matched based on position.

I don't think we should worry about a nice deprecation path, of first warning for a full version before breaking. I think this is used rarely enough we can just YOLO it and break people suddenly.

No where else in ibis that I am aware of relies on the ordering of columns.

Started from #11624. This is relevant when trying to add an .upsert() method: we want this to have the same semantics as .insert(). For now we are going to continue that PR with the matching behavior to .insert(). We can change both methods later if this suggestion is accepted.

Metadata

Metadata

Assignees

No one assigned

    Labels

    breaking changeChanges that introduce an API break at any levelfeatureFeatures or general enhancementsuxUser experience related issues

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions