Iteration Point Difference Analysis is a new static analysis framework for determining the memory coalescing characteristics of parallel loops that target GPU offloading, and for ascertaining the safety and profitability of loop transformations intended to improve those characteristics. The analysis propagates definitions through control flow, handles non-affine expressions, and can analyze expressions that reference conditionally defined values, enabling loop transformations that are both safe and profitable. Experimental results show substantial performance improvements: GPU kernel execution time across the Polybench suite improves by up to $25.5\times$ on an Nvidia P100, with overall benchmark improvement of up to $3.2\times$, and an opportunity detected in a SPEC ACCEL benchmark yields a kernel speedup of $86.5\times$ and an overall benchmark improvement of $3.3\times$. This work also demonstrates how architecture-aware compilers can improve code portability and reduce programmer effort.
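For readers unfamiliar with memory coalescing, the sketch below illustrates the kind of opportunity such an analysis is designed to detect: a parallel loop nest offloaded to a GPU whose thread-mapped index strides through memory, alongside the interchanged form in which adjacent threads access adjacent elements. The code, array and loop names, and OpenMP offloading directives are illustrative assumptions and are not taken from the paper.

```c
/* Illustrative sketch (not from the paper): the same array copy with a
 * strided and with a coalesced access pattern under OpenMP GPU offloading.
 * On typical GPU offloading implementations, consecutive iterations of the
 * thread-mapped (outer parallel) loop run on consecutive threads of a warp;
 * the exact mapping is implementation-dependent. */
#define N 4096

/* Strided: the thread-mapped index is i, so threads in the same warp
 * access elements N doubles apart and their loads cannot coalesce. */
void copy_strided(const double *restrict A, double *restrict B)
{
  #pragma omp target teams distribute parallel for \
      map(to: A[0:N*N]) map(from: B[0:N*N])
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      B[i * N + j] = A[i * N + j];
}

/* After loop interchange: the thread-mapped index is j, so adjacent
 * threads touch adjacent elements and their accesses coalesce. */
void copy_coalesced(const double *restrict A, double *restrict B)
{
  #pragma omp target teams distribute parallel for \
      map(to: A[0:N*N]) map(from: B[0:N*N])
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      B[i * N + j] = A[i * N + j];
}
```

Such an interchange is legal only when loop dependences permit it, which is the kind of safety question the framework is described as answering alongside profitability.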