We are looking for a developer experienced with CUDA and NVIDIA accelerator cards.
I’ve attached the C pseudo-code for the function we want to port to NVIDIA GPUs. This function will be called from several threads (16 by default, but possibly more or fewer depending on the number of cores and processors installed in the server). Each thread works on its own independent buffers, so no synchronization is required between threads when reading or writing them.
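As a rough illustration of the calling pattern described above, here is a minimal C sketch using POSIX threads. The names `process_buffer`, `work_item` and `run_workers`, and the arithmetic inside `process_buffer`, are hypothetical placeholders, not the attached pseudo-code. Because every thread receives a buffer that no other thread touches, no mutex or other synchronization is needed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the attached function: it only touches the
   buffer it is given, so concurrent calls never share data. */
static void process_buffer(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * 2.0f + 1.0f;
}

/* Per-thread work item: each worker owns exactly one buffer. */
typedef struct {
    float *buf;
    size_t n;
} work_item;

static void *worker(void *arg)
{
    work_item *w = (work_item *)arg;
    process_buffer(w->buf, w->n); /* no lock: buffers are disjoint */
    return NULL;
}

/* Runs process_buffer on nthreads independent buffers, one thread each.
   Returns 0 on success, -1 on allocation or thread-creation failure. */
static int run_workers(float **bufs, size_t n, int nthreads)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    work_item *items = malloc(sizeof *items * (size_t)nthreads);
    int created = 0, rc = 0;

    if (tids == NULL || items == NULL) {
        free(tids);
        free(items);
        return -1;
    }
    for (int t = 0; t < nthreads; t++) {
        items[t].buf = bufs[t];
        items[t].n = n;
        if (pthread_create(&tids[t], NULL, worker, &items[t]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    /* Join only the threads that were actually started. */
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    free(tids);
    free(items);
    return rc;
}
```

In a CUDA port, one possible design (an assumption, not a requirement stated here) keeps this host-side structure and has each worker copy its buffer to the GPU and launch its kernel on its own CUDA stream, so that work submitted from different threads is not serialized on a shared default stream.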
The goal of the project is to improve the application's performance by exploiting the parallel processing capabilities of NVIDIA Tesla cards.