Soft errors caused by transient hardware faults can lead to silent data corruptions (SDCs) in scientific applications, potentially compromising the correctness and reliability of their results. Traditional fault injection (FI) methods provide accurate vulnerability measurements but are prohibitively time-consuming and resource-intensive. In this work, we propose a function-level framework that uses Graph Neural Networks (GNNs) to predict SDC vulnerability and criticality in CPU-based scientific applications. Static code features are extracted from the LLVM intermediate representation (IR) and used to construct function call graphs, enabling Graph Convolutional Network (GCN), Graph Attention Network (GAT), and GraphSAGE models to capture both intra-function characteristics and inter-function dependencies. The problem is formulated both as a regression task, predicting continuous vulnerability and criticality scores, and as a classification task, predicting binary labels. The evaluation covers 30 applications (90 functions) from the PolyBench benchmark suite and uses leave-one-application-out cross-validation, so every model is tested on an application it has not seen during training. Among the evaluated architectures, GraphSAGE achieves the highest performance (F1 = 0.80, MAE = 0.17), showing strong generalization across diverse workloads. Feature correlation and model-based importance analyses identify the most influential LLVM features. Overall, the results demonstrate that the proposed approach provides fine-grained, accurate predictions without exhaustive FI campaigns, enabling more efficient and targeted fault-tolerance strategies.
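The key idea behind the GraphSAGE variant is that each function's representation combines its own static IR features with the averaged features of the functions it is connected to in the call graph. The snippet below is a minimal, untrained sketch of one mean-aggregator GraphSAGE layer followed by a sigmoid output head, written in plain Python. Everything specific in it is an assumption for illustration, not the authors' configuration: the three-feature encoding (hypothetical normalized counts of loads, stores, and floating-point ops), the toy call graph, treating call edges as undirected, the hand-picked weights, and the 0.5 decision threshold. In a real model the weights would be learned from FI-derived labels.

```python
import math

# Hypothetical per-function features (e.g., normalized counts of LLVM IR
# loads, stores, and floating-point ops). Values are illustrative only.
FEATURES = {
    "main":   [0.1, 0.1, 0.0],
    "init":   [0.2, 0.6, 0.1],
    "kernel": [0.5, 0.2, 0.8],
}

# Toy call graph edges (caller -> callee); treated as undirected below.
CALLS = [("main", "init"), ("main", "kernel")]


def build_adjacency(nodes, edges):
    """Return an undirected neighbor list for every function node."""
    adj = {n: [] for n in nodes}
    for caller, callee in edges:
        adj[caller].append(callee)
        adj[callee].append(caller)
    return adj


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sage_layer(features, adj, w_self, w_neigh):
    """One GraphSAGE layer with a mean aggregator:
    h_v' = ReLU(W_self h_v + W_neigh * mean_{u in N(v)} h_u)."""
    out = {}
    for v, h in features.items():
        nbrs = adj[v]
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs)
                    for i in range(len(h))]
        else:
            mean = [0.0] * len(h)  # isolated function: no neighbor signal
        combined = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out[v] = [max(0.0, c) for c in combined]  # ReLU
    return out


def predict(embeddings, w_out, b_out, threshold=0.5):
    """Map each embedding to a score in (0, 1) and a binary label."""
    result = {}
    for v, h in embeddings.items():
        z = sum(wi * hi for wi, hi in zip(w_out, h)) + b_out
        score = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        result[v] = (score, int(score >= threshold))
    return result


if __name__ == "__main__":
    adj = build_adjacency(FEATURES, CALLS)
    # Hand-picked (untrained) weights, for illustration only.
    w_self = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    w_neigh = [[0.5 if i == j else 0.0 for j in range(3)] for i in range(3)]
    emb = sage_layer(FEATURES, adj, w_self, w_neigh)
    for fn, (score, label) in predict(emb, [1.0, -0.5, 2.0], -1.0).items():
        print(f"{fn}: score={score:.3f} label={label}")
```

Writing the layer as two separate linear maps (self and neighbor) is equivalent to the common "concatenate, then apply one weight matrix" formulation. The sigmoid score plays the role of a continuous prediction, and thresholding it yields a binary label, mirroring the regression and classification views described in the abstract.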
| Primary Language | English |
|---|---|
| Subjects | Dependable Systems, Deep Learning |
| Journal Section | Research Article |
| Submission Date | August 19, 2025 |
| Acceptance Date | November 14, 2025 |
| Publication Date | December 31, 2025 |
| Published in Issue | Year 2025, Volume 12, Issue 4 |