Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness

Mücahit Cihan

doi:10.54287/gujsa.1870409

Abstract

Tabular deep learning remains challenging in real-world settings. Many datasets combine numerical and categorical variables with substantial missingness, and applications demand not only strong classification performance but also interpretability and reliable probability estimates. DA2-Net addresses this problem with a dual-branch architecture that combines an interpretable additive pathway for feature-wise main effects with a selective self-attention pathway for higher-order interactions. Features are ranked by additive contribution magnitude, uncertainty, and missingness-aware scaling, and only the top-k subset is passed to a single multi-head self-attention block. The final prediction is obtained through uncertainty-aware gated fusion, and training is supported by sparsity, stability, and Brier-based calibration regularization. This design balances expressive interaction modeling with transparency and robustness under incomplete data. DA2-Net is evaluated on four public binary tabular benchmarks (AdultIncome, DefaultCredit, HeartDisease, and BankMarketing) under controlled Missing Completely At Random (MCAR) missingness levels of 0.0, 0.1, 0.2, and 0.3. The evaluation uses 5-fold stratified cross-validation repeated across three random seeds, which yields 15 runs for each dataset and missingness condition. With eight metrics (AUC, AUPRC, ACC, F1, sensitivity, specificity, Brier score, and Expected Calibration Error (ECE)), this gives 4 × 4 × 8 = 128 evaluation blocks in total. Across this benchmark, DA2-Net achieves the best overall mean rank of 3.078 ± 2.044, ahead of SAINT-Lite at 3.980 ± 2.624. It achieves or shares the best result in all 16 AUC blocks, 13 of 16 AUPRC blocks, 10 of 16 ACC blocks, 11 of 16 Brier blocks, and 7 of 16 ECE blocks. These results indicate that its main strength lies in robust ranking-based discrimination and strong overall probability quality under missingness.
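The evaluation setup described above can be illustrated with a minimal sketch, assuming plain Python lists and equal-width probability bins for ECE. The function names (`mcar_mask`, `brier_score`, `expected_calibration_error`) and the choice of 10 bins are illustrative assumptions, not the paper's implementation; only the dataset names, missingness levels, metric list, and fold and seed counts come from the abstract.

```python
import random

def mcar_mask(rows, rate, seed=0):
    """Blank each cell with probability `rate`, independent of any value (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE for binary labels (the bin count is an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        freq = sum(y for y, _ in members) / len(members)  # observed positive rate
        conf = sum(p for _, p in members) / len(members)  # mean predicted probability
        ece += len(members) / n * abs(freq - conf)
    return ece

# Benchmark bookkeeping as stated in the abstract.
DATASETS = ["AdultIncome", "DefaultCredit", "HeartDisease", "BankMarketing"]
MCAR_RATES = [0.0, 0.1, 0.2, 0.3]
METRICS = ["AUC", "AUPRC", "ACC", "F1", "sensitivity", "specificity", "Brier", "ECE"]
RUNS_PER_CONDITION = 5 * 3  # 5 folds x 3 seeds
EVALUATION_BLOCKS = len(DATASETS) * len(MCAR_RATES) * len(METRICS)
```

Lower values are better for both the Brier score and ECE. A Brier score of 0.25 corresponds to always predicting 0.5 on a binary task, which gives a rough reference point when reading calibration results.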
DA2-Net also shows a favorable practical-efficiency profile in the current benchmark, remaining more compact and inference-efficient than the main transformer-like baselines. Epoch-wise loss analysis shows stable convergence across all four datasets: the binary cross-entropy (BCE) term drives the optimization, while the auxiliary regularizers act as controlled refinements. The ablation study further confirms that the interaction branch is essential. Removing it in the AdditiveOnly variant causes the clearest degradation in both predictive and calibration metrics, whereas removing the gate or the auxiliary regularization terms leads only to minor changes. A sensitivity analysis supports the selected interaction subset size k=10 and spline knot count K=8 as balanced settings, and additive shape-function visualizations provide direct qualitative evidence for feature-wise interpretability.
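The feature-selection and fusion steps named in the abstract can be sketched as follows. The abstract lists the ingredients of the ranking (additive contribution magnitude, uncertainty, missingness-aware scaling) and states that fusion is gated by uncertainty, but it does not give formulas. The product-form score, the sigmoid gate, and the parameters `w` and `b` below are therefore hypothetical choices for illustration only, and the self-attention block that would produce `interaction_logit` is omitted.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def rank_score(contribution, uncertainty, missing_rate):
    # ASSUMPTION: this product form is hypothetical; the source only names
    # the three ingredients, not how they are combined.
    return abs(contribution) * (1.0 + uncertainty) * (1.0 - missing_rate)

def select_top_k(contributions, uncertainties, missing_rates, k):
    """Return the indices of the k highest-scoring features, in index order."""
    scores = [
        rank_score(c, u, m)
        for c, u, m in zip(contributions, uncertainties, missing_rates)
    ]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])

def gated_fusion(additive_logit, interaction_logit, uncertainty, w=-1.0, b=0.0):
    # ASSUMPTION: a scalar sigmoid gate driven by uncertainty; w and b are
    # hypothetical parameters. With w < 0, higher uncertainty shrinks the gate,
    # so the prediction leans more on the additive branch.
    gate = sigmoid(w * uncertainty + b)
    return sigmoid(additive_logit + gate * interaction_logit)
```

With k=10, as selected in the sensitivity analysis, only ten feature tokens would reach the attention block regardless of the total feature count, which is one way a top-k design can keep the interaction pathway compact.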

Ethical Statement

This study uses publicly available datasets and does not involve human participants, animals, or any prospective data collection. All analyses were conducted on de-identified, publicly accessible data; therefore, ethical approval and informed consent were not required. The study was performed in accordance with applicable institutional and international research integrity principles.

Details

Primary Language

English

Subjects

Deep Learning, Neural Networks

Journal Section

Research Article

Authors

Mücahit Cihan*
0000-0002-1426-319X
Türkiye

Publication Date

June 30, 2026

Submission Date

January 23, 2026

Acceptance Date

April 20, 2026

Published in Issue

Year 2026 Volume: 13 Number: 2

DOI

https://doi.org/10.54287/gujsa.1870409

IZ

https://izlik.org/JA26JA49YT

Cite

APA

Cihan, M. (2026). Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness. Gazi University Journal of Science Part A: Engineering and Innovation, 13(2), 865-905. https://doi.org/10.54287/gujsa.1870409

AMA

1. Cihan M. Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness. GU J Sci, Part A. 2026;13(2):865-905. doi:10.54287/gujsa.1870409

Chicago

Cihan, Mücahit. 2026. “Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness”. Gazi University Journal of Science Part A: Engineering and Innovation 13 (2): 865-905. https://doi.org/10.54287/gujsa.1870409.

EndNote

Cihan M (June 1, 2026) Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness. Gazi University Journal of Science Part A: Engineering and Innovation 13 2 865–905.

IEEE

[1] M. Cihan, “Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness”, GU J Sci, Part A, vol. 13, no. 2, pp. 865–905, June 2026, doi: 10.54287/gujsa.1870409.

ISNAD

Cihan, Mücahit. “Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness”. Gazi University Journal of Science Part A: Engineering and Innovation 13/2 (June 1, 2026): 865-905. https://doi.org/10.54287/gujsa.1870409.

JAMA

1. Cihan M. Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness. GU J Sci, Part A. 2026;13:865–905.

MLA

Cihan, Mücahit. “Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness”. Gazi University Journal of Science Part A: Engineering and Innovation, vol. 13, no. 2, June 2026, pp. 865-905, doi:10.54287/gujsa.1870409.

Vancouver

1. Mücahit Cihan. Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness. GU J Sci, Part A. 2026 Jun. 1;13(2):865-905. doi:10.54287/gujsa.1870409