Computer Architecture: Data-Level Parallelism in Vector, SIMD, and GPU Architectures
$10-30 USD
In progress
Posted over 4 years ago
Paid on delivery
Consider unrolling the loop and mapping multiple iterations onto vector operations. Assume that you can use scatter-gather loads and stores (vldi and vsti). How does this affect the way you can write the RV64V code for this kernel?
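For orientation, here is a minimal C sketch of what scatter-gather access makes possible. The kernel the question refers to is not included in this posting, so the sketch uses a hypothetical indexed update, y[idx[i]] = a * x[idx[i]] + y[idx[i]]. The helpers gather and scatter are assumed stand-ins that emulate vldi and vsti element by element, and VL is an assumed vector length. None of these names come from the original assignment.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4 /* assumed vector length in elements (hypothetical) */

/* Emulates an indexed (gather) load such as vldi: dst[k] = base[idx[k]]. */
static void gather(double *dst, const double *base, const size_t *idx, size_t vl) {
    for (size_t k = 0; k < vl; k++) {
        dst[k] = base[idx[k]];
    }
}

/* Emulates an indexed (scatter) store such as vsti: base[idx[k]] = src[k]. */
static void scatter(double *base, const size_t *idx, const double *src, size_t vl) {
    for (size_t k = 0; k < vl; k++) {
        base[idx[k]] = src[k];
    }
}

/*
 * Hypothetical kernel: y[idx[i]] = a * x[idx[i]] + y[idx[i]] for i in [0, n).
 * Each strip maps up to VL loop iterations onto one group of "vector" operations.
 * Assumes the indices within a strip are distinct, so lanes do not conflict.
 */
void indexed_axpy(double a, const double *x, double *y, const size_t *idx, size_t n) {
    assert(x != NULL && y != NULL && idx != NULL);
    double vx[VL], vy[VL];
    for (size_t i = 0; i < n; i += VL) {
        size_t vl = (n - i < VL) ? (n - i) : VL; /* last strip may be short */
        gather(vx, x, idx + i, vl);              /* one gather replaces vl scalar loads */
        gather(vy, y, idx + i, vl);
        for (size_t k = 0; k < vl; k++) {        /* stands in for a vector multiply-add */
            vy[k] = a * vx[k] + vy[k];
        }
        scatter(y, idx + i, vy, vl);             /* one scatter replaces vl scalar stores */
    }
}
```

The point of the sketch is that, with indexed loads and stores, each vector lane can carry one loop iteration even when the iterations touch non-contiguous addresses, so the unrolled iterations no longer need unit-stride or fixed-stride layouts to be vectorized.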