This is the first of several new functions named in the style of
utf8_to_uv(). They are designed to be used instead of the problematic
existing functions named in the style of utf8_to_uvchr().
The previous ones throw away crucial information in their returns upon
failure, creating hassles for the caller. In particular, it is hard to
recover from malformed input and continue parsing, which is what modern
UTF-8 handlers have settled on doing.
Originally I planned to replace just the most problematic one,
utf8_to_uvchr_buf(), but I realized that each level threw away
information, so it is better to start with the base-level function,
which utf8_to_uvchr_buf() eventually calls with several parameters set
to 0. The previous functions all had to disambiguate failure returns.
Starting at the base level stops that at the root.
The functions in the new series all return a boolean indicating
success, and share a consistent API. The old series had one outlier,
again utf8_to_uvchr_buf(), whose calling convention and return values
differed from the rest.
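
To illustrate why a boolean return makes recovery easier, here is a
minimal sketch of a decoder in that general style. It is not the code
from this commit: demo_utf8_to_cp(), demo_count(), and their parameters
are hypothetical names, and the choice to substitute U+FFFD and report
how many bytes to skip is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder, NOT the Perl core function. Returns true on
 * success. On failure it stores U+FFFD in *cp. In both cases *advance
 * says how many bytes to skip, so the caller can keep parsing. */
static bool
demo_utf8_to_cp(const uint8_t *s, const uint8_t *e,
                uint32_t *cp, size_t *advance)
{
    if (s >= e) { *cp = 0xFFFD; *advance = 0; return false; }

    size_t avail = (size_t)(e - s);   /* bytes available, at least 1 */
    uint8_t b = s[0];
    size_t need;                      /* continuation bytes expected */
    uint32_t min;                     /* smallest non-overlong value */

    if (b < 0x80)                { *cp = b; *advance = 1; return true; }
    else if ((b & 0xE0) == 0xC0) { need = 1; min = 0x80;    *cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { need = 2; min = 0x800;   *cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { need = 3; min = 0x10000; *cp = b & 0x07; }
    else { *cp = 0xFFFD; *advance = 1; return false; }  /* bad start byte */

    for (size_t i = 1; i <= need; i++) {
        /* Truncated input or a byte that is not a continuation. */
        if (i >= avail || (s[i] & 0xC0) != 0x80) {
            *cp = 0xFFFD; *advance = i; return false;
        }
        *cp = (*cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *advance = need + 1;
    /* Reject overlongs, surrogates, and values above U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF)) {
        *cp = 0xFFFD;
        return false;
    }
    return true;
}

/* Count code points in [s, e), treating each malformed sequence as one
 * replacement character and continuing past it. */
static size_t
demo_count(const uint8_t *s, const uint8_t *e, size_t *errors)
{
    size_t n = 0;
    *errors = 0;
    while (s < e) {
        uint32_t cp;
        size_t adv;
        if (!demo_utf8_to_cp(s, e, &cp, &adv))
            (*errors)++;
        s += adv;   /* adv is at least 1 whenever s < e */
        n++;
    }
    return n;
}
```

Because success is reported separately from the decoded value, a
legitimate NUL (code point 0) cannot be confused with a failure, and
the caller does not have to inspect the value to find out what
happened.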
The basic logic in the base level function, which this commit handles,
was sound. It just failed to return relevant information upon failure.
The new API has somewhat different formal parameter names and uses
Size_t instead of STRLEN for one of the parameters. It also takes the
end-of-string position instead of a length. A length is problematic
because, being unsigned, a value that should have gone negative instead
becomes a huge positive number.
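
The hazard with an unsigned length can be sketched in a few lines of
standalone C. The helper names below are hypothetical and exist only
for this illustration; they are not part of the Perl API.

```c
#include <stddef.h>

/* Length-style bookkeeping: if the caller has consumed more bytes than
 * it was given, the unsigned subtraction wraps around instead of going
 * negative, so the result looks like a huge amount of remaining input. */
static size_t
demo_remaining_by_length(size_t len, size_t consumed)
{
    return len - consumed;
}

/* End-pointer bookkeeping: a plain comparison reports "no input left"
 * once s has reached e, with no arithmetic that can wrap. */
static int
demo_has_input(const unsigned char *s, const unsigned char *e)
{
    return s < e;
}
```

With len = 3 and consumed = 4, demo_remaining_by_length() yields the
largest value a size_t can hold, whereas demo_has_input() simply
returns 0 once the position reaches the end.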
The old base function now merely calls the new one, and throws away the
relevant information, as it always has.