An affine partitioning algorithm to maximize parallelism and minimize communication
Reducing cache misses using hardware and software page placement