Question

I just started learning CUDA programming. I was trundling through some simple CUDA C examples and everything was going swimmingly. Then! Suddenly! Thrust! I consider myself well versed in C++ functors and was taken aback by the difference between CUDA C and Thrust.

I find it hard to believe that

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void square(float *a, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx] * a[idx];
    }
}

int main(int argc, char** argv) {
    float *aHost, *aDevice;

    const int N = 10;
    size_t size = N * sizeof(float);

    // Allocate a buffer on the host and a matching one on the device.
    aHost = (float*)malloc(size);
    cudaMalloc((void**)&aDevice, size);

    for (int i = 0; i < N; i++) {
        aHost[i] = (float)i;
    }

    // Copy the input to the device.
    cudaMemcpy(aDevice, aHost, size, cudaMemcpyHostToDevice);

    // Round the grid size up so every element gets a thread.
    int block = 4;
    int nBlock = N/block + (N % block == 0 ? 0 : 1);

    square<<<nBlock, block>>>(aDevice, N);

    // Copy the result back to the host.
    cudaMemcpy(aHost, aDevice, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) {
        printf("%d, %f\n", i, aHost[i]);
    }

    free(aHost);
    cudaFree(aDevice);

    return 0;
}

is equivalent to

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

template <typename T>
struct square {
    __host__ __device__ T operator()(const T& x) const {
        return x * x;
    }
};

int main(int argc, char** argv) {
    const int N = 10;
    thrust::device_vector<float> dVec(N);
    thrust::sequence(dVec.begin(), dVec.end());
    thrust::transform(dVec.begin(), dVec.end(), dVec.begin(), square<float>());
    thrust::copy(dVec.begin(), dVec.end(), std::ostream_iterator<float>(std::cout, "\n"));
}

Am I missing something? Is the above code being run on the GPU? Thrust is a great tool, but I'm skeptical that it takes care of all the heavy C-style memory management.

  • Is the Thrust code being executed on the GPU? How can I tell?
  • How did Thrust eliminate the bizarre syntax of invoking a kernel?
  • Is Thrust actually invoking a kernel?
  • Does Thrust automatically handle the thread index computation?

Thanks for your time. Sorry if these are silly questions, but I find it hard to believe that the examples I've seen jump instantly from what can be described as a Model T to an M3.


Solution

Roughly: yes, of course. Thrust is a library, and like any library it exists to make things easier. Its main strength is that it hides all the explicit CUDA code, which looks strange to programmers coming from other backgrounds, behind a friendly C++-like interface.
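If you want the GPU dispatch spelled out in the source rather than implied, Thrust algorithms also accept an execution policy as their first argument. A minimal sketch (my rewording of the question's Thrust code, not part of the original answer):

#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

template <typename T>
struct square {
    __host__ __device__ T operator()(const T& x) const { return x * x; }
};

int main() {
    thrust::device_vector<float> dVec(10);
    // thrust::device makes the GPU dispatch explicit; iterators from a
    // device_vector select the device backend on their own anyway.
    thrust::sequence(thrust::device, dVec.begin(), dVec.end());
    thrust::transform(thrust::device, dVec.begin(), dVec.end(),
                      dVec.begin(), square<float>());
    return 0;
}

The policy changes nothing here, since device_vector iterators already dispatch to the device; it just makes the intent visible in the code.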

Thrust uses the GPU, but not only the GPU. It performs the same operations you would otherwise write by hand, i.e., C/C++ code for allocating memory, copying data, choosing grid and block sizes... and then it invokes the GPU to execute the kernels.
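Two ways to convince yourself of this (my illustration, not part of the original answer): run the program under a profiler such as nvprof or Nsight Systems and look for the kernel launches that Thrust generates, or pull the raw device pointer out of a device_vector and hand it to the question's own kernel. If the latter compiles and prints the squared values, the data clearly lives in ordinary device memory. A sketch, reusing the square kernel from the question:

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <stdio.h>

__global__ void square(float *a, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx] * a[idx];
    }
}

int main() {
    const int N = 10;
    thrust::device_vector<float> dVec(N);
    thrust::sequence(dVec.begin(), dVec.end());

    // The storage behind dVec is plain device memory, so the raw kernel
    // from the question can operate on it directly.
    float *raw = thrust::raw_pointer_cast(dVec.data());
    int block = 4;
    int nBlock = (N + block - 1) / block;
    square<<<nBlock, block>>>(raw, N);
    cudaDeviceSynchronize();

    for (int i = 0; i < N; i++) {
        printf("%d, %f\n", i, (float)dVec[i]);  // reads one element back per access
    }
    return 0;
}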

It is a good choice for those who don't want to dig into low-level CUDA details but still want to exploit GPU parallelism for simple (but frequent) problems, like vector operations; one such operation is sketched below.
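For example (an illustrative sketch of my own, not from the original answer; the saxpy functor is a common Thrust idiom), the classic SAXPY operation y = a*x + y collapses into a single transform call, with no malloc, memcpy, or launch configuration in sight:

#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

struct saxpy {
    float a;
    saxpy(float _a) : a(_a) {}
    __host__ __device__ float operator()(const float& x, const float& y) const {
        return a * x + y;
    }
};

int main() {
    const int N = 10;
    thrust::device_vector<float> x(N), y(N);
    thrust::sequence(x.begin(), x.end());   // x = 0, 1, 2, ...
    thrust::fill(y.begin(), y.end(), 1.0f); // y = 1, 1, 1, ...

    // One call replaces allocation, grid/block sizing, and a custom kernel.
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy(2.0f));
    return 0;
}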

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow