Iteration Point Difference Analysis is a new static analysis framework for determining the memory coalescing characteristics of parallel loops that target GPU offloading, and for ascertaining the safety and profitability of loop transformations intended to improve those characteristics. The analysis propagates definitions through control flow, handles non-affine expressions, and can analyze expressions that reference conditionally defined values, enabling loop transformations that are both safe and profitable. Experimental results show substantial performance improvements: GPU kernel execution time across the Polybench suite improves by up to $25.5\times$ on an Nvidia P100, with overall benchmark improvement of up to $3.2\times$, and an opportunity detected in a SPEC ACCEL benchmark yields a kernel speedup of $86.5\times$ and an overall benchmark improvement of $3.3\times$. This work also demonstrates how architecture-aware compilers can improve code portability and reduce programmer effort.
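For readers unfamiliar with memory coalescing, the sketch below illustrates the kind of opportunity such an analysis is designed to detect: a parallel loop nest offloaded to a GPU whose thread-mapped index strides through memory, alongside the interchanged form in which adjacent threads access adjacent elements. The code, array and loop names, and OpenMP offloading directives are illustrative assumptions and are not taken from the paper.

```c
/* Illustrative sketch (not from the paper): the same array copy with a
 * strided and with a coalesced access pattern under OpenMP GPU offloading.
 * On typical GPU offloading implementations, consecutive iterations of the
 * thread-mapped (outer parallel) loop run on consecutive threads of a warp;
 * the exact mapping is implementation-dependent. */
#define N 4096

/* Strided: the thread-mapped index is i, so threads in the same warp
 * access elements N doubles apart and their loads cannot coalesce. */
void copy_strided(const double *restrict A, double *restrict B)
{
  #pragma omp target teams distribute parallel for \
      map(to: A[0:N*N]) map(from: B[0:N*N])
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      B[i * N + j] = A[i * N + j];
}

/* After loop interchange: the thread-mapped index is j, so adjacent
 * threads touch adjacent elements and their accesses coalesce. */
void copy_coalesced(const double *restrict A, double *restrict B)
{
  #pragma omp target teams distribute parallel for \
      map(to: A[0:N*N]) map(from: B[0:N*N])
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      B[i * N + j] = A[i * N + j];
}
```

Such an interchange is legal only when loop dependences permit it, which is the kind of safety question the framework is described as answering alongside profitability.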