I'm trying to get into SIMD by implementing a trivial operation: XOR unmasking of a byte stream as required by the WebSocket specification. The implementation in x86 intrinsics is actually very straightforward, but I have a hard time wrapping my head around expressing it in terms of the Faster iterators API.
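For reference, a scalar sketch of the operation in question, following the masking rule in RFC 6455, section 5.3 (each payload byte i is XORed with mask byte i mod 4). The function name unmask_scalar is made up for illustration and is not part of Faster or any other library:

```rust
/// Scalar reference: XOR every payload byte with the 4-byte masking key,
/// cycling through the key (RFC 6455, section 5.3).
/// `unmask_scalar` is a hypothetical name used only for this sketch.
fn unmask_scalar(payload: &mut [u8], mask: [u8; 4]) {
    for (i, byte) in payload.iter_mut().enumerate() {
        // Byte i of the payload pairs with byte (i mod 4) of the key.
        *byte ^= mask[i % 4];
    }
}
```

Since XOR is its own inverse, the same function both masks and unmasks. A SIMD version would have to produce the same bytes as this loop.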
The part I'm having trouble with is getting an input [u8; 4] to cycle within a SIMD vector of u8. I have looked at:
- load(), which does accept &[u8] as input, but its behavior in case of a length mismatch is completely undocumented. It's also not obvious what the offset parameter does.
- Casting the input [u8; 4] to u32, calling vecs::u32s() and then downcasting repeatedly to get a SIMD vector of u8, but Downcast seems to do something entirely different from what I want.
- Getting a SIMD vector of length 4 with an arbitrary type inside it, loading [u8; 4] into it (lengths now match, so it should work), then downcasting repeatedly until I get a vector of u8 with arbitrary length. Except there seems to be no way to request a SIMD vector of length 4 and arbitrary type.
- After over an hour of head-scratching I've noticed that From<u32x4> is implemented for u8x16, so I could replace Downcast with it in approach 2 and probably get the correct result, except I have no idea how such conversions interact with host endianness.
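To make the desired result and the endianness concern concrete, here is a minimal sketch using only the standard library, not Faster. Both helper names are hypothetical. The first builds the cycled mask directly. The second goes through u32 the way the last two approaches would, assuming the u32-to-u8 vector conversion is a plain bit-level reinterpretation, which is not verified here for Faster's From impls:

```rust
/// Hypothetical helper: repeat a 4-byte mask across an N-byte buffer.
/// N is assumed to be a multiple of 4, as SIMD byte-vector widths are.
fn cycle_mask<const N: usize>(mask: [u8; 4]) -> [u8; N] {
    let mut out = [0u8; N];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = mask[i % 4];
    }
    out
}

/// Hypothetical helper: same result via the u32 route. The mask is
/// reinterpreted as a native-endian u32, "broadcast" to N / 4 lanes,
/// and each lane is written back out in native byte order.
/// Any trailing bytes beyond a multiple of 4 are left as zero.
fn cycle_mask_via_u32<const N: usize>(mask: [u8; 4]) -> [u8; N] {
    let lane = u32::from_ne_bytes(mask);
    let mut out = [0u8; N];
    for chunk in out.chunks_exact_mut(4) {
        chunk.copy_from_slice(&lane.to_ne_bytes());
    }
    out
}
```

Because from_ne_bytes and to_ne_bytes both use the host byte order, the bytes end up in memory in their original order on little-endian and big-endian machines alike. Using from_le_bytes or from_be_bytes on one side only would reverse the mask on hosts of the opposite endianness.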
I actually expected this to be a trivial task. I guess for someone familiar with SIMD it is, but for the likes of me a snippet in the examples/ folder that loads [u8; 4] into a vector would go a long way. Or perhaps even a convenience function in the API that deals with endianness properly, to make it harder to mess up.