@GaidamakUA

Motivation

It's really awkward to measure string length in bytes when the text contains non-ASCII Unicode characters.

Changes

Replace len_bytes() with len_chars().
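
To make the difference concrete, here is a minimal plain-Python sketch. It assumes that `len_bytes()` counts UTF-8 bytes and `len_chars()` counts Unicode code points, as the Polars string expressions of the same names do. The two helper functions below are illustrative stand-ins, not the code touched by this PR.

```python
def len_bytes(text: str) -> int:
    """Length in UTF-8 bytes (what a byte-based check measures)."""
    return len(text.encode("utf-8"))


def len_chars(text: str) -> int:
    """Length in Unicode code points (what a character-based check measures)."""
    return len(text)


# "\u00e9" is a precomposed "é"; the Cyrillic word has six letters.
samples = ["cafe", "caf\u00e9", "привіт"]
for sample in samples:
    print(f"{sample!r}: {len_chars(sample)} chars, {len_bytes(sample)} bytes")
```

For pure ASCII input both counts agree; they diverge as soon as a character needs more than one byte in UTF-8.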

@github-actions

github-actions bot commented Nov 6, 2025

Thank you for opening this pull request! 👋🏼

This repository requires pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details
No release type found in pull request title "Update string.py to use len_chars() instead of len_bytes()". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit

@borchero
Member

borchero commented Nov 7, 2025

I can see how this can be a little surprising. I think there are two important questions to answer:

  • How does this affect the conversion to SQLAlchemy columns? Some database systems define the length of VARCHAR and CHAR columns in bytes, so a character-based limit could let through values that are too long for the column and cause issues during an upload.
  • Can this somehow be done in a backwards-compatible way? Currently, this is a breaking change, and I don't think we can make it until we release v3 (which is not planned yet).
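
The first concern can be sketched as follows. This is a hedged illustration only: it assumes a database backend that counts column length in UTF-8 bytes (whether a given backend does so varies), and both helper functions are hypothetical names, not part of this library or of SQLAlchemy.

```python
def passes_char_limit(value: str, max_chars: int) -> bool:
    """Hypothetical character-based validation (counts code points)."""
    return len(value) <= max_chars


def fits_byte_limited_column(value: str, max_bytes: int) -> bool:
    """Hypothetical check for a column whose limit is counted in UTF-8 bytes."""
    return len(value.encode("utf-8")) <= max_bytes


limit = 6
value = "привіт"  # 6 code points, 12 bytes in UTF-8

# The value is accepted by a character-based rule ...
print(passes_char_limit(value, limit))
# ... but would not fit a byte-counted column declared with the same limit.
print(fits_byte_limited_column(value, limit))
```

If validation switched to characters while the generated column kept the same numeric limit, data that validates successfully could still be rejected or truncated by such a backend.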

@GaidamakUA
Author

GaidamakUA commented Nov 7, 2025

Yeah, you are right. But how do I work around it? In my opinion, this behaviour is not obvious. Maybe add some kind of boolean flag to count characters instead of bytes?

@borchero
Member

Potentially, we could introduce a separate max_char_length? 🤔
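
One way such a separate parameter could stay backwards-compatible is sketched below. Everything here is an assumption for illustration: `StringRule` is a hypothetical stand-in class, `max_length` is assumed to be the name of the existing byte-based limit, and `max_char_length` is the proposed addition. None of this is the library's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StringRule:
    """Hypothetical stand-in for a string column definition."""

    max_length: Optional[int] = None       # assumed existing limit, in UTF-8 bytes
    max_char_length: Optional[int] = None  # proposed new limit, in code points

    def validate(self, value: str) -> bool:
        # The byte-based check keeps its current meaning, so existing users are unaffected.
        if self.max_length is not None and len(value.encode("utf-8")) > self.max_length:
            return False
        # The character-based check only applies when explicitly requested.
        if self.max_char_length is not None and len(value) > self.max_char_length:
            return False
        return True


print(StringRule(max_length=6).validate("привіт"))
print(StringRule(max_char_length=6).validate("привіт"))
```

Because the new limit defaults to `None`, existing definitions behave exactly as before, and the byte-based limit remains available for deriving database column sizes.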
