In this post I will do my best to explain all my attempts at optimizing the above algorithm. All of them were unsuccessful, which means the two guys at NVIDIA (Louis Bavoil, Kevin Myers) did a pretty good job. However, I did manage to get a few extra percent of utilization in the CROP unit (from 32%-33% to 41.4%…