Copilot AI commented Nov 7, 2025

Motivation

Closes the issue where dy.Array columns could not use validation rules on inner elements (e.g., dy.Array(dy.Float64(nullable=False), shape=1024) raised "Validation rules on the inner type of Array are not yet supported"). Polars now supports .arr.eval(), making this implementable.

Changes

Implementation (dataframely/columns/array.py)

  • Added validation_rules() method using expr.arr.eval(inner_expr).arr.all() to validate inner elements, mirroring the List implementation
  • Added validate_dtype() method comparing constructed dtypes directly to handle both flat and nested array representations
  • Removed initialization check that blocked inner validation rules
  • Updated docstring and added PolarsDataType import
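
The "flat and nested array representations" point can be illustrated with a toy model. This is only a sketch: `ArrayType`, `normalize`, and `same_dtype` are hypothetical names, not polars or dataframely APIs, and the PR itself compares constructed dtypes directly rather than normalizing them. The sketch just shows why a shape of `(2, 3)` and an array of width 2 holding arrays of width 3 should be treated as the same type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Toy stand-in for a fixed-size array dtype (hypothetical, not a polars class)."""

    inner: ArrayType | str
    shape: tuple


def normalize(dtype):
    """Collapse nested ArrayType levels into one level carrying the full shape."""
    if isinstance(dtype, str):
        return dtype  # scalar dtype name, nothing to collapse
    inner = normalize(dtype.inner)
    if isinstance(inner, ArrayType):
        # Outer dimensions come first, then the inner array's dimensions.
        return ArrayType(inner.inner, dtype.shape + inner.shape)
    return ArrayType(inner, dtype.shape)


def same_dtype(a, b):
    """Two array dtypes match if their normalized forms are equal."""
    return normalize(a) == normalize(b)


flat = ArrayType("Float64", (2, 3))
nested = ArrayType(ArrayType("Float64", (3,)), (2,))

print(same_dtype(flat, nested))
print(same_dtype(flat, ArrayType("Float64", (3, 2))))
```

Comparing after normalization means a schema declared one way accepts data built the other way, while a genuinely different shape such as `(3, 2)` is still rejected.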

Tests (tests/column_types/test_array.py)

  • Changed test_array_with_rules from expecting ValueError to validating that inner rules work
  • Added test_nested_array_with_rules for nested array validation
  • Added validation_mask import
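
The test bodies are not shown here, so the following is only a plain-Python sketch of the kind of expectation a nested-array test could encode: a row passes when every leaf element, at any nesting depth, satisfies the inner rule. `all_leaves_pass` and `in_unit_range` are hypothetical helpers, not part of the test suite.

```python
def all_leaves_pass(value, rule):
    """Return True if every non-list leaf under `value` satisfies `rule`."""
    if isinstance(value, list):
        return all(all_leaves_pass(item, rule) for item in value)
    return rule(value)


def in_unit_range(x):
    """Inner rule: element must be non-null and within [-1.0, 1.0]."""
    return x is not None and -1.0 <= x <= 1.0


# Three rows of shape (2, 2): one valid, one out of range, one with a null.
rows = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.1, 5.0], [0.3, 0.4]],
    [[0.1, None], [0.3, 0.4]],
]

mask = [all_leaves_pass(row, in_unit_range) for row in rows]
print(mask)  # [True, False, False]
```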

Example usage:

import dataframely as dy

class EmbeddingSchema(dy.Schema):
    doc_id = dy.String(primary_key=True, nullable=False)
    # Inner rules (non-null, min/max bounds) on the Float64 elements are now validated
    embedding = dy.Array(
        dy.Float64(nullable=False, min=-1.0, max=1.0),
        nullable=False,
        shape=768,
    )
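
To make concrete what such inner rules check, here is a minimal stdlib-only model of the "evaluate each element, then require all to pass" pattern that `arr.eval(...).arr.all()` expresses. It is an assumption-laden sketch: the rule names (`inner_nullability`, `inner_min`, `inner_max`) and the helper `inner_rule_masks` are hypothetical, rows have 3 elements instead of 768 for brevity, and the treatment of nulls in the bound checks is a simplification.

```python
def inner_rule_masks(rows, *, min_value, max_value):
    """For each inner rule, compute one boolean per row: True if all elements pass."""
    rules = {
        "inner_nullability": lambda x: x is not None,
        # Bound checks skip nulls so that a null is reported by one rule only.
        "inner_min": lambda x: x is None or x >= min_value,
        "inner_max": lambda x: x is None or x <= max_value,
    }
    return {
        name: [all(check(x) for x in row) for row in rows]
        for name, check in rules.items()
    }


rows = [
    [0.5, -0.25, 1.0],  # passes every rule
    [0.5, 2.0, 0.0],    # violates the upper bound
    [0.5, None, 0.0],   # contains a null element
]

masks = inner_rule_masks(rows, min_value=-1.0, max_value=1.0)
# A row is valid only if it passes every rule.
valid = [all(per_rule) for per_rule in zip(*masks.values())]

print(masks["inner_max"])  # [True, False, True]
print(valid)               # [True, False, False]
```

Keeping one mask per rule, rather than a single combined flag, makes it possible to say which rule a failing row violated.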
Original prompt

This section details the original issue you should resolve.

<issue_title>Array validation</issue_title>
<issue_description>Firstly, thank you for your work on this incredibly useful tool.

I am encountering an issue where I'm trying to use dataframely in a pipeline with vectorised documents, and am struggling to use dy.Array in a schema to validate a column of vectors.

I am wondering if I am just doing it wrong, but I am also slightly confused by the information in the Array docstring and the error messages I get back.

A minimal example, tested with dataframely v1.4.0 and v2.0.0

import dataframely as dy
import numpy as np
import polars as pl

embeddings_df = pl.DataFrame({"some_id": range(10), "embedding": np.random.rand(10, 1024)})

# This one will validate, but only to a list
class EmbeddedDocumentsSchema(dy.Schema):
    some_id = dy.String(primary_key=True, nullable=False)
    embedding = dy.List(dy.Float64(nullable=False), nullable=False)

# This one raises: "Validation rules on the inner type of Array are not yet supported."
class EmbeddedDocumentsSchema(dy.Schema):
    some_id = dy.String(primary_key=True, nullable=False)
    embedding = dy.Array(dy.Float64(nullable=False), nullable=False, shape=1024)

Now, I could always add a length check to the first schema, then cast to array later, but I would prefer not to have to do that.

It is also possible that I have misunderstood how dy.Array is supposed to be used, but the docstring and error messaging suggest it is only partially implemented? If so, is there somewhere where I can track progress on this so I know when I can switch over if and when it does get supported?</issue_description>

<agent_instructions>- Implement the validation_rules method for the Array column just like for the List column, performing only the validation relevant to the array. Inner validation should be performed using arr.eval.

  • Adjust the tests which currently perform extra logic to ensure that an array's inner element does not have a validation rule and align these tests with the existing ones. This probably involves parametrizing a test which already works for list columns a little differently.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@borchero Thanks for bringing this up! An older version of polars did not support `.arr.eval` which is the reason this is currently unsupported. I just checked and this exists now, hence, we can also add the inner validation for arrays :)


Copilot AI and others added 2 commits November 7, 2025 18:41
…port

Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix array validation for embedding column" to "Implement inner validation rules for Array columns" Nov 7, 2025
Copilot AI requested a review from borchero November 7, 2025 18:51
Copilot finished work on behalf of borchero November 7, 2025 18:51

codecov bot commented Nov 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (355acf8) to head (497d622).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #222   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3007      3019   +12     
=========================================
+ Hits          3007      3019   +12     

borchero changed the title from "Implement inner validation rules for Array columns" to "feat: Implement inner validation rules for Array columns" Nov 12, 2025
github-actions bot added the "enhancement" (New feature or request) label Nov 12, 2025
borchero marked this pull request as ready for review November 12, 2025 17:25
borchero merged commit 1660d88 into main Nov 13, 2025
31 checks passed
borchero deleted the copilot/fix-array-validation-issue branch November 13, 2025 09:46

Development

Successfully merging this pull request may close these issues.

Array validation

3 participants