Research Article

Scene Classification via Attention-Guided Integration of Visual and Auditory Data Streams

Volume: 15, Issue: 2, 1 July 2026

Abstract

This study proposes a novel multi-source deep learning architecture, the Gated Cross-Modal Fusion Transformer (GCM-FT), designed to integrate the complementary visual and auditory information sources more effectively in scene classification. The framework extracts deep representations from the visual stream with an EfficientNetV2 backbone and, for the auditory stream, processes the MFCC-based time–frequency features provided with the dataset. The representation vectors from the two streams are fused dynamically through a gated attention mechanism. A multi-headed loss function, auxiliary stream outputs, and an attention-based fusion block allow the model to learn the contributions of visual and auditory information in a stable and balanced manner. Extensive cross-validation experiments show that GCM-FT achieves higher accuracy, lower variance, and more consistent class-wise performance than single-stream models and existing fusion approaches. These findings indicate that attention-guided fusion is an effective and generalizable integration strategy for visual–auditory scene classification.
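
The abstract does not give the equations of the fusion block or the loss, so the following is only a minimal sketch of the two general ideas it names: a learned gate that blends a visual and an auditory embedding, and a loss that adds auxiliary per-stream terms to the fused prediction. The element-wise gate form, the function names (`gated_fusion`, `multi_head_loss`), and the weight `aux_weight=0.3` are assumptions made for illustration and are not taken from the GCM-FT paper, whose fusion block is attention-based.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, audio, w_v, w_a, bias):
    """Blend two equal-length embeddings with a per-dimension gate.

    Assumed (hypothetical) gate form:
        gate_i  = sigmoid(w_v[i] * visual[i] + w_a[i] * audio[i] + bias[i])
        fused_i = gate_i * visual[i] + (1 - gate_i) * audio[i]
    """
    if not (len(visual) == len(audio) == len(w_v) == len(w_a) == len(bias)):
        raise ValueError("all inputs must have the same length")
    gates = [
        sigmoid(wv * v + wa * a + b)
        for v, a, wv, wa, b in zip(visual, audio, w_v, w_a, bias)
    ]
    fused = [g * v + (1.0 - g) * a for g, v, a in zip(gates, visual, audio)]
    return fused, gates


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (log-sum-exp stabilised)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def multi_head_loss(fused_logits, visual_logits, audio_logits, target,
                    aux_weight=0.3):
    """Main loss on the fused head plus weighted auxiliary stream losses.

    aux_weight is an illustrative value, not one reported in the paper.
    """
    main = cross_entropy(fused_logits, target)
    aux = cross_entropy(visual_logits, target) + cross_entropy(audio_logits, target)
    return main + aux_weight * aux
```

With all gate parameters at zero, every gate equals 0.5 and the fused vector is the plain average of the two streams; training would move the gates toward whichever stream is more informative per dimension. The auxiliary terms keep each single-stream head predictive on its own, which is one common way to stop a fusion model from relying on a single modality.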

Details

Primary Language

English

Subjects

Information Systems (Other)

Section

Research Article

Publication Date

1 July 2026

Submission Date

22 November 2025

Acceptance Date

1 June 2026

Published in Issue

Year 2026, Volume: 15, Issue: 2

Cite

APA
Çelik, Y. (2026). Scene Classification via Attention-Guided Integration of Visual and Auditory Data Streams. Turkish Journal of Nature and Science, 15(2), 207-214. https://doi.org/10.46810/tdfd.1828359