Preserving Collective Performance across Process Failure for a Fault Tolerant MPI
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Joshua Hursey
Fault Tolerance for OpenSHMEM