A large-scale study of failures in high-performance computing systems
International Conference on Dependable Systems and Networks (DSN’06)
G.A. Gibson
Exploring event correlation for failure prediction in coalitions of clusters