Skip to content

Conversation

@serge-sans-paille
Copy link
Contributor

Fix #1179

@serge-sans-paille
Copy link
Contributor Author

@AntoinePrv : this is just some mockup to check if we're good with the API

@serge-sans-paille
Copy link
Contributor Author

serge-sans-paille commented Oct 29, 2025

Also cc @JohanMabille & @DiamonDinoia for the API. I didn't want to support the generic conversion, say from uint8 to uint32 with larger arrays, let's start humble

@DiamonDinoia
Copy link
Contributor

DiamonDinoia commented Oct 30, 2025

I would call the API extension, extend or widen because it does not just cast, it zero extend and 1 extend. I think in intrinsics terminology is called widening and convert with widening.

On the implementation:

  1. using std::conditional and std::make_signed halves the number of struct you need.
    2. Proposal load/store masked #1162 introduces lower/upper :)
namespace {
template<class T> struct widen;

template<> struct widen<uint8_t>  { using type = uint16_t; };
template<> struct widen<uint16_t> { using type = uint32_t; };
template<> struct widen<uint32_t> { using type = uint64_t; };

template<class T>
using widen_t_unsigned = typename widen<T>::type;

} // namespace 

template<class T>
struct widen {
    static_assert(std::is_integral<T>::value && !std::is_same<T,bool>::value,
                  "integral non-bool type required");

    using U     = typename std::make_unsigned<T>::type;
    using UWide = detail::widen_t_unsigned<U>;

    using type = typename std::conditional<
        std::is_signed<T>::value,
        typename std::make_signed<UWide>::type,
        UWide
    >::type;
};

template<class T>
using widen_t = typename widen<T>::type;

@serge-sans-paille
Copy link
Contributor Author

@DiamonDinoia yep, widen looks like a good name, thanks!

@serge-sans-paille serge-sans-paille force-pushed the bug/1179 branch 5 times, most recently from 520e96d to b76733d Compare October 30, 2025 13:54
Comment on lines +338 to +362
template <typename T>
struct widen : widen<typename std::make_unsigned<T>::type>
{
};

template <>
struct widen<uint32_t>
{
using type = uint64_t;
};
template <>
struct widen<uint16_t>
{
using type = uint32_t;
};
template <>
struct widen<uint8_t>
{
using type = uint16_t;
};
template <>
struct widen<float>
{
using type = double;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a trait that convert a byte size to a uint/float type could be more generic:

template <>
struct sized_uint<4>
{
    using type = uint32_t;
};

...

And then use with batch<sized_uint_t<2*sizeof(T)>> (or an alias for that).

Copy link
Contributor

@DiamonDinoia DiamonDinoia Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not in favor of this. Then we need sized_int and sized_uint and the API will need an std::conditional<std::is_signed<T>, sized_int_t<2*sizeof(T)>, sized_uint_t<2*sizeof(T)>

Copy link
Contributor

@AntoinePrv AntoinePrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time! Looking good

@DiamonDinoia
Copy link
Contributor

My only suggestion would be to merge masked_memory ops. Or merge lower/upper from it and use it here.

@serge-sans-paille
Copy link
Contributor Author

My only suggestion would be to merge masked_memory ops. Or merge lower/upper from it and use it here.

Yup, I'll wait for it to be merged!

@serge-sans-paille serge-sans-paille force-pushed the bug/1179 branch 2 times, most recently from 471c67f to a470cf8 Compare October 30, 2025 21:48
Comment on lines +32 to +33
return { batch<T_out, A>::load_aligned(&out_buffer[0]),
batch<T_out, A>::load_aligned(&out_buffer[batch<T_out, A>::size]) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out_buffer
and out_buffer + batch<T_out, A>::size seems clearer to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I mean doing pointer arithmetic directly instead of using the [] and & operators)

Copy link
Member

@JohanMabille JohanMabille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@serge-sans-paille
Copy link
Contributor Author

serge-sans-paille commented Oct 31, 2025

My only suggestion would be to merge masked_memory ops. Or merge lower/upper from it and use it here.

@DiamonDinoia : Are you fine if I split (haha) that part of your commit in a seperate one so that we don't block each other?

@serge-sans-paille serge-sans-paille force-pushed the bug/1179 branch 2 times, most recently from 6e570d1 to f8136da Compare October 31, 2025 08:32
@DiamonDinoia
Copy link
Contributor

My only suggestion would be to merge masked_memory ops. Or merge lower/upper from it and use it here.

@DiamonDinoia : Are you fine if I split (haha) that part of your commit in a seperate one so that we don't block each other?

Sure, go ahead. I'll rebase once I have time

It was relying on an intrinsic only available with avx512dq
@serge-sans-paille
Copy link
Contributor Author

serge-sans-paille commented Oct 31, 2025

My only suggestion would be to merge masked_memory ops. Or merge lower/upper from it and use it here.

done (as in: I created a separate PR to extract that part of your patch, merged it and rebased my PR on top of it)

Intel + common implementation + test + doc

Fix #1179
@serge-sans-paille serge-sans-paille merged commit 9c7c6de into master Oct 31, 2025
116 of 118 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

batch_cast does not use SIMD intrinsic when available on Intel

5 participants