A large-scale study of failures in high-performance computing systems
International Conference on Dependable Systems and Networks (DSN’06)
B. Schroeder
Exploring event correlation for failure prediction in coalitions of clusters