Preserving Collective Performance across Process Failure for a Fault Tolerant MPI
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Richard L. Graham
Fault Tolerance for OpenSHMEM