ACM - Standards
Song Fu
Cheng-Zhong Xu
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC ’07
San Jose, CA
On the Reliability of the IBM MVS/XA Operating System
Critical event prediction for proactive management in large-scale computer clusters
International Conference on Dependable Systems and Networks, p
A large-scale study of failures in high-performance computing systems
BlackJack: Hard Error Detection with Redundant Threads on SMT