We should create a `simd_read_array` intrinsic. Until #319 is fixed, `size_of::<Simd<T, N>>()` can be larger than `size_of::<[T; N]>()`. In that case, using `read_unaligned` to read a `Simd<T, N>` out of a `[T; N]` is probably UB: it reads `size_of::<Simd<T, N>>()` bytes, so the bytes that correspond to the padding in the `Simd<T, N>` lie beyond the end of the input array.
We need an intrinsic, rather than just using `memcpy`, because the intrinsic will generate LLVM's `load` instruction with a vector type (LLVM guarantees that a vector load won't read the padding if the load's alignment is small enough). `memcpy`, by contrast, may end up as less efficient array-typed loads, which sometimes lower to scalar code.
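For comparison, the `memcpy`-style fallback can be sketched in plain Rust. This is a minimal sketch under stated assumptions: `PaddedVec3` is a hypothetical stand-in for a padded `Simd<u8, 3>` (modeled as a `#[repr(C, align(4))]` struct because `std::simd` is nightly-only), and `read_array` is a hypothetical helper, not the proposed intrinsic. It copies only `size_of::<[u8; 3]>()` bytes, so it never reads past the end of the source array, but what it hands the optimizer is a byte/array copy rather than a vector `load`.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Hypothetical stand-in for a padded `Simd<u8, 3>`: the alignment forces
// size_of::<PaddedVec3>() == 4 while size_of::<[u8; 3]>() == 3, mirroring
// the case where size_of::<Simd<T, N>>() > size_of::<[T; N]>().
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(4))]
struct PaddedVec3([u8; 3]);

/// Hypothetical helper: builds a `PaddedVec3` from an array by copying
/// exactly size_of::<[u8; 3]>() bytes. By contrast,
/// `ptr::read_unaligned(src.as_ptr().cast::<PaddedVec3>())` would read
/// size_of::<PaddedVec3>() == 4 bytes, one past the end of `src`.
fn read_array(src: &[u8; 3]) -> PaddedVec3 {
    let mut out = MaybeUninit::<PaddedVec3>::uninit();
    unsafe {
        // memcpy only the bytes that actually exist in the source array.
        ptr::copy_nonoverlapping(
            src.as_ptr(),
            out.as_mut_ptr().cast::<u8>(),
            size_of::<[u8; 3]>(),
        );
        // Every field byte is now initialized; the trailing byte is
        // padding, which is allowed to stay uninitialized.
        out.assume_init()
    }
}

fn main() {
    let arr = [1u8, 2, 3];
    let v = read_array(&arr);
    println!("{:?} (array: {} bytes, padded type: {} bytes)",
        v, size_of::<[u8; 3]>(), size_of::<PaddedVec3>());
}
```

This avoids the out-of-bounds read, but whether the copy becomes a single vector load or scalar code is left to the optimizer, which is the efficiency concern that motivates a dedicated intrinsic.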
https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/splat.20no.20longer.20compiles.20for.20release.20builds/near/352101044