Research Article

Graph Neural Network-Based Prediction of Soft Error Vulnerability and Criticality of Functions in Scientific Applications

Year 2025, Volume: 12 Issue: 4, 979 - 998, 31.12.2025
https://doi.org/10.54287/gujsa.1766028

Abstract

Soft errors caused by transient hardware faults can lead to silent data corruptions (SDCs) in scientific applications, potentially compromising correctness and reliability. Traditional fault injection (FI) methods measure vulnerability accurately but are prohibitively time-consuming and resource-intensive. In this work, we propose a framework that uses Graph Neural Networks (GNNs) to predict SDC vulnerability and criticality at the function level in CPU-based scientific applications. Static code features are extracted from the LLVM intermediate representation and used to construct function call graphs, enabling GCN, GAT, and GraphSAGE models to capture both intra-function characteristics and inter-function dependencies. The problem is formulated as both regression and classification: the models predict continuous vulnerability and criticality scores as well as binary labels. The evaluation covers 30 applications (90 functions) from the PolyBench benchmark suite and uses leave-one-application-out cross-validation, so that each model is always tested on an application it has not seen during training. Among the evaluated architectures, GraphSAGE achieves the best performance (F1 = 0.80, MAE = 0.17) and generalizes well across diverse workloads. Feature correlation and model-based importance analyses identify the most influential LLVM features. The results demonstrate that the proposed approach provides fine-grained, accurate predictions without exhaustive FI campaigns, enabling more efficient and targeted fault-tolerance strategies.
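The leave-one-application-out protocol and the two reported metrics (MAE for regression, F1 for classification) can be illustrated with a minimal sketch. It is not the article's implementation. The helper names, the tuple layout, and the toy vulnerability scores below are assumptions made purely for illustration, and no model is trained here.

```python
from collections import defaultdict


def leave_one_application_out(samples):
    """Yield (held_out_app, train, test), holding out one application per fold.

    `samples` is a list of (app_name, function_name, score) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_app = defaultdict(list)
    for sample in samples:
        by_app[sample[0]].append(sample)
    for app in sorted(by_app):
        test = by_app[app]
        train = [s for other, group in by_app.items() if other != app for s in group]
        yield app, train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for binary labels, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: the scores are invented for illustration, not measured values.
samples = [
    ("gemm", "init_array", 0.10),
    ("gemm", "kernel_gemm", 0.70),
    ("atax", "init_array", 0.20),
    ("atax", "kernel_atax", 0.60),
    ("jacobi-2d", "kernel_jacobi_2d", 0.50),
]

splits = list(leave_one_application_out(samples))
for app, train, test in splits:
    # No function of the held-out application may appear in the training set.
    assert all(s[0] != app for s in train)
    print(app, len(train), len(test))
```

Splitting by application rather than by function matters because functions from one application share a call graph. A per-function random split would let the model see part of the test application during training.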


Details

Primary Language English
Subjects Dependable Systems, Deep Learning
Journal Section Research Article
Authors

Sanem Arslan Yılmaz 0000-0003-3019-7070

Submission Date August 19, 2025
Acceptance Date November 14, 2025
Publication Date December 31, 2025
Published in Issue Year 2025 Volume: 12 Issue: 4

Cite

APA Arslan Yılmaz, S. (2025). Graph Neural Network-Based Prediction of Soft Error Vulnerability and Criticality of Functions in Scientific Applications. Gazi University Journal of Science Part A: Engineering and Innovation, 12(4), 979-998. https://doi.org/10.54287/gujsa.1766028