Question

Suppose I have an array:

uint8_t arr[256];

and an element

__m128i x

containing 16 bytes,

x_1, x_2, ... x_16

I would like to efficiently fill a new __m128i element

__m128i y

with values from arr depending on the values in x, such that:

y_1  = arr[x_1]
y_2  = arr[x_2]
   .
   .
   .
y_16 = arr[x_16]

An instruction to achieve this would essentially load a register from a non-contiguous set of memory locations. I have a painfully vague memory of having seen documentation for such an instruction, but can't find it now. Does it exist? Thanks in advance for your help.


Solution

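This kind of capability in SIMD architectures is known as gather (for loads) and scatter (for stores). Unfortunately SSE does not have it. The ill-fated Larrabee processor was an early Intel design that did, and later extensions added gather (AVX2) and scatter (AVX-512) instructions, but those operate on 32- and 64-bit elements, so there is still no direct byte-granularity gather of the kind you describe. For now though you will just need to design your data structures in such a way that this kind of functionality is not needed.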

Note that you can achieve the equivalent effect by using e.g. _mm_set_epi8:

y = _mm_set_epi8(arr[x_16], arr[x_15], arr[x_14], ..., arr[x_1]);

although of course this will just generate a bunch of scalar code to load your y vector. This is fine if you are doing this kind of operation outside any performance-critical loops, e.g. as part of initialisation prior to looping, but inside a loop it is likely to be a performance-killer.
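To make the scalar fallback concrete, here is a minimal sketch of the same idea written as a loop rather than a 16-argument _mm_set_epi8 call. The helper name lookup16 is hypothetical (it is not part of any library), and the sketch assumes SSE2 is available. It spills the index vector to memory, does sixteen ordinary byte loads, and reloads the result, which is roughly the work a compiler has to generate for the _mm_set_epi8 version anyway:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns y such that y[i] = arr[x[i]]
   for each of the 16 bytes in x. Purely scalar lookups. */
static __m128i lookup16(const uint8_t arr[256], __m128i x)
{
    uint8_t idx[16];
    uint8_t out[16];

    _mm_storeu_si128((__m128i *)idx, x);    /* spill the 16 indices to memory */
    for (int i = 0; i < 16; ++i)
        out[i] = arr[idx[i]];               /* one scalar table load per byte */
    return _mm_loadu_si128((const __m128i *)out);  /* reload as a vector */
}
```

Usage is simply y = lookup16(arr, x). As with _mm_set_epi8, this is fine outside hot loops but is unlikely to be fast inside one.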

Licensed under: CC-BY-SA with attribution