Skip to content

Conversation

@swarnaprakash
Copy link

@swarnaprakash swarnaprakash commented Nov 5, 2025

Pull Request check-list

  • [Y] Do tests and lints pass with this change? Ran linter and corrected changed lines
  • [Y ] Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)? See https://github.com/swarnaprakash/valkey-py/actions/runs/19217879094 . The failed tests are unrelated to this change
  • [Y] Is the new or changed code fully tested? added tests for the new methods and enabled it (even though existing tests don't pass)
  • [Y] Is a documentation update included (if this change modifies existing APIs, or introduces new ones)? T_he docstring is updated withe new parameters_
  • [Y] Is there an example added to the examples folder (if applicable)? NA

Description of change

See #242

The Result class in valkey/commands/search/result.py inappropriately applies UTF-8 decoding to all field values, including binary vector data. This corrupts VECTOR field embeddings and makes valkey-py unsuitable for vector search applications.

The change Adds preserve_bytes and binary_fields parameters to search methods to prevent UTF-8 decoding from corrupting VECTOR field embeddings and other binary data.
The Result class was inappropriately applying UTF-8 decoding to all field values, including binary vector embeddings. This corrupted FLOAT32 vector data and made valkey-py unsuitable for vector search applications.

Adds preserve_bytes and binary_fields parameters to search methods to prevent
UTF-8 decoding from corrupting VECTOR field embeddings and other binary data.

The Result class was inappropriately applying UTF-8 decoding to all field values,
including binary vector embeddings. This corrupted FLOAT32 vector data and made
valkey-py unsuitable for vector search applications.

Changes:
- Add preserve_bytes parameter to search() methods (default: False for backward compatibility)
- Add binary_fields parameter for selective field preservation
- Implement to_string_or_bytes() utility for conditional binary preservation
- Update Result class to handle binary preservation during field processing
- Add comprehensive tests for binary preservation functionality

The fix maintains full backward compatibility while enabling proper vector search
support when preserve_bytes=True is specified.

Fixes vector search corruption where binary embeddings were being decoded as
UTF-8 strings with 'ignore' error handling, silently dropping bytes and
corrupting the vector data.

Signed-off-by: Swarnaprakash Udayakumar <swarnap@amazon.com>
@mkmkme
Copy link
Collaborator

mkmkme commented Nov 11, 2025

@amirreza8002 is this by any chance something you got to work with?
@bogdanp05 I understand it's very unlikely you have worked with this before, but could you also have a look when you have time please?

@mkmkme
Copy link
Collaborator

mkmkme commented Nov 11, 2025

Also keep in mind approving this PR is slightly more problematic due to the fact valkey-search is not yet supported in the CI and the pytest skips that. I'll try to see if it can be easily fixed.

@swarnaprakash
Copy link
Author

Also keep in mind approving this PR is slightly more problematic due to the fact valkey-search is not yet supported in the CI and the pytest skips that. I'll try to see if it can be easily fixed.

@mkmkme I can make an attempt to fix the tests. First I want to clarify the expectation. Is valkey-py expected to be backward compatible with redis? specifically w.r.t search module is the expectation still backward compatibility with Redis search. I understand redis supports different data types (TEXT instead of TAG for example). Wondering if I can just delete/modify these or I need to keep them and make changes such that both redis search and valkey-search are supported

@amirreza8002
Copy link
Contributor

sorry not in my area
i might take a look if i find the time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants