Single-Image HDR Reconstruction with Attention-Driven Autoencoder
Year 2025, Volume: 15, Issue: 3, 45-54, 29.09.2025
Cevher Renk, Serdar Çiftçi
Abstract
High Dynamic Range (HDR) imaging enables the representation of details in both bright and dark areas of a scene, aligning closely with human visual perception. Traditional multi-exposure HDR methods face challenges such as ghosting, hardware dependency, and high processing costs. This study adopts a baseline model that synthesizes multi-exposure representations from a single Standard Dynamic Range (SDR) image using a U-Net-like autoencoder, and enhances it by integrating five distinct attention mechanisms: Spatial, Channel, Bottleneck, Squeeze-and-Excitation, and Self-Attention. Each attention module is individually embedded into the encoder and decoder layers to form separate model variants, all trained independently on the DrTMO dataset. Quantitative evaluations based on SSIM, PSNR, and LPIPS demonstrate that the Spatial Attention variant delivers the best performance across all metrics. The results highlight that incorporating attention mechanisms into autoencoder-based HDR reconstruction architectures significantly enhances both structural fidelity and perceptual image quality, making such architectures promising for efficient single-image HDR synthesis.
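To make the architectural idea more concrete, the sketch below shows, in PyTorch, how a spatial attention block of the kind described above might be embedded into a single encoder stage of a U-Net-like autoencoder. This is a minimal illustrative sketch, not the authors' implementation: the module names, kernel size, channel counts, and the CBAM-style average/max pooling formulation are all assumptions.

```python
# Minimal, illustrative PyTorch sketch (NOT the authors' code): a CBAM-style
# spatial attention block wrapped around one hypothetical U-Net encoder stage.
# Module names, kernel size, and channel counts are assumptions for illustration.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Per-pixel attention map computed from channel-wise average and max pooling."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = torch.mean(x, dim=1, keepdim=True)        # (B, 1, H, W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)      # (B, 1, H, W)
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                     # spatially reweighted features


class AttentiveEncoderStage(nn.Module):
    """One encoder stage: two convolutions -> spatial attention -> 2x downsampling."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.attention = SpatialAttention()
        self.down = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor):
        skip = self.attention(self.block(x))  # attended features also feed the skip connection
        return self.down(skip), skip


if __name__ == "__main__":
    stage = AttentiveEncoderStage(3, 64)
    sdr = torch.randn(1, 3, 256, 256)         # a single SDR input image
    pooled, skip = stage(sdr)
    print(pooled.shape, skip.shape)           # (1, 64, 128, 128), (1, 64, 256, 256)
```

Applying attention before the skip connection means both the downsampling path and the decoder receive spatially reweighted features; whether the baseline model places the module exactly there is an assumption of this sketch.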
References
[1] P.-H. Le, Q. Le, R. Nguyen, and B.-S. Hua, “Single-image HDR reconstruction by multi-exposure generation,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), 2023, pp. 4063–4072.
[2] Z. Liu, Y. Wang, B. Zeng, and S. Liu, “Ghost-free high dynamic range imaging with context-aware transformer,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Cham, Switzerland: Springer, Oct. 2022, pp. 344–360.
[3] Y.-L. Liu, W.-S. Lai, Y.-S. Chen, Y.-L. Kao, M.-H. Yang, Y.-Y. Chuang, and J.-B. Huang, “Single-image HDR reconstruction by learning to reverse the camera pipeline,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 1651–1660.
[4] X. Chen, Y. Liu, Z. Zhang, Y. Qiao, and C. Dong, “HDRUNet: Single image HDR reconstruction with denoising and dequantization,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 354–363.
[5] K.-Y. Chen and J.-J. Leou, “Low light image enhancement using autoencoder-based deep neural networks,” in Image Processing, Computer Vision, and Pattern Recognition and Information and Knowledge Engineering, Las Vegas, NV, USA: Springer, 2025, pp. 73–84. doi: 10.1007/978-3-031-85933-5_6.
[6] A. Sharif, S. M. Naqvi, M. Biswas, and S. Kim, “A two-stage deep network for high dynamic range image reconstruction,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 550–559.
[7] S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, et al., “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 6881–6890.
[8] Q. Yan, T. Hu, G. Chen, W. Dong, and Y. Zhang, “Boosting HDR image reconstruction via semantic knowledge transfer,” arXiv preprint arXiv:2503.15361, 2025.
[9] J. Ma and H. Zhang, “HDR-DANet: Single HDR image reconstruction via dual attention,” Multimedia Syst., vol. 31, no. 1, pp. 1–12, 2025.
[10] S. Y. Kim, J. Oh, and M. Kim, “JSI-GAN: GAN-based joint super-resolution and inverse tone-mapping with pixel-wise task-specific filters for UHD HDR video,” in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 7, Apr. 2020, pp. 11287–11295.
[11] Y. I. Park, J. W. Song, and S. J. Kang, “65‐3: Invited paper: Deep learning‐based image enhancement for HDR imaging,” in SID Symp. Dig. Tech. Papers, vol. 53, no. 1, Jun. 2022, pp. 865–868.
[12] A. de Santana Correia and E. L. Colombini, “Attention, please! A survey of neural attention models in deep learning,” Artif. Intell. Rev., vol. 55, no. 8, pp. 6037–6124, 2022.
[13] J. Shen and T. Wu, “Learning spatially-adaptive squeeze-excitation networks for few-shot image synthesis,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2023, pp. 2855–2859.
[14] Z. Zhang, Z. Xu, X. Gu, and J. Xiong, “Cross-CBAM: A lightweight network for scene segmentation,” arXiv preprint arXiv:2306.02306, 2023.
[15] S. Mekruksavanich and A. Jitpattanakul, “Hybrid convolution neural network with channel attention mechanism for sensor-based human activity recognition,” Sci. Rep., vol. 13, no. 1, p. 12067, 2023.
[16] A. Srinivas, T.-Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani, “Bottleneck transformers for visual recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 16519–16529.
[17] H. Y. Lin, Y. R. Lin, W. C. Lin, and C. C. Chang, “Reconstructing high dynamic range image from a single low dynamic range image using histogram learning,” Appl. Sci., vol. 14, no. 21, p. 9847, 2024.
[18] H. Zhang, K. Zu, J. Lu, Y. Zou, and D. Meng, “EPSANet: An efficient pyramid squeeze attention block on convolutional neural network,” in Proc. Asian Conf. Comput. Vis. (ACCV), 2022, pp. 1161–1177.
[19] B. Zhang, S. Gu, B. Zhang, J. Bao, D. Chen, F. Wen, et al., “StyleSwin: Transformer-based GAN for high-resolution image generation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 11304–11314.
[20] Y. Al Najjar, “Comparative analysis of image quality assessment metrics: MSE, PSNR, SSIM and FSIM,” Int. J. Sci. Res. (IJSR), vol. 13, no. 3, pp. 110–114, 2024.
[21] M. Azimi, “PU21: A novel perceptually uniform encoding for adapting existing quality metrics for HDR,” in Proc. Picture Coding Symp. (PCS), Jun. 2021, pp. 1–5.
[22] A. Ghildyal and F. Liu, “Shift-tolerant perceptual similarity metric,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Cham, Switzerland: Springer, Oct. 2022, pp. 91–107.
Ethical Statement
This study did not involve any experiments, surveys, or human or animal trials requiring ethics committee approval. All data used in the article were obtained from publicly available sources; no private or personal data were used.
The author(s) declare that the study was prepared in accordance with all academic and ethical rules, and that no unethical conduct such as plagiarism, fabrication, falsification, duplicate publication, salami slicing, unfair authorship, or similar practices took place.
All contributing authors have approved the content of the article and have been informed about the submission process. There is no conflict of interest.
Supporting Institution
This study did not receive financial support from any institution or organization.