<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article  article-type="research-article"        dtd-version="1.4">
            <front>

                <journal-meta>
                                                                <journal-id>saucis</journal-id>
            <journal-title-group>
                                                                                    <journal-title>Sakarya University Journal of Computer and Information Sciences</journal-title>
            </journal-title-group>
                                        <issn pub-type="epub">2636-8129</issn>
                                                                                            <publisher>
                    <publisher-name>Sakarya University</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.35377/saucis...1517723</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Computer Software</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Bilgisayar Yazılımı</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                                                            <article-title>Joint Detection and Removal of Specular Highlights using Vision Transformer with Multi-scale Patch Attention</article-title>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-2764-5258</contrib-id>
                                                                <name>
                                    <surname>Karacan</surname>
                                    <given-names>Levent</given-names>
                                </name>
                                                                    <aff>Gaziantep University</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="20250328">
                    <day>28</day>
                    <month>03</month>
                    <year>2025</year>
                </pub-date>
                                        <volume>8</volume>
                                        <issue>1</issue>
                                        <fpage>47</fpage>
                                        <lpage>57</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20240717">
                        <day>17</day>
                        <month>07</month>
                        <year>2024</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20250222">
                        <day>22</day>
                        <month>02</month>
                        <year>2025</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2018, Sakarya University Journal of Computer and Information Sciences</copyright-statement>
                    <copyright-year>2018</copyright-year>
                    <copyright-holder>Sakarya University Journal of Computer and Information Sciences</copyright-holder>
                </permissions>
            
                                                                                                                        <abstract><p>Specular highlights play a pivotal role in understanding scenes in visual environments. Nevertheless, their presence can degrade the performance of many computer vision tasks. Current methods typically use Convolutional Neural Network (CNN)-based U-Net architectures for specular highlight detection. However, CNNs are limited in capturing global contextual information, even though they excel at local context analysis. To exploit global context, this paper proposes a novel network architecture that leverages Vision Transformers (ViTs) to jointly detect and remove specular highlights in a given image. The proposed model incorporates a multi-scale patch-based self-attention mechanism to capture global context, alongside a CNN-based feed-forward network for local contextual cues. Quantitative and qualitative experimental results demonstrate that the proposed approach achieves state-of-the-art performance.</p></abstract>
                                                            
            
                                                                                        <kwd-group>
                                                    <kwd>Specular highlight detection</kwd>
                                                    <kwd>Specular highlight removal</kwd>
                                                    <kwd>Vision transformers</kwd>
                                                    <kwd>Convolutional neural networks</kwd>
                                            </kwd-group>
                            
                                                                                                                                                    </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">S. Jiddi, P. Robert, and E. Marchand, “Detecting specular reflections and cast shadows to estimate reflectance and illumination of dynamic indoor scenes,” IEEE Trans. Vis. Comput. Graph., vol. 28, no. 2, pp. 1249–1260, 2020.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">S. A. Shafer, “Using color to separate reflection components,” Color Res. Appl., vol. 10, no. 4, pp. 210–218, 1985.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">L. T. Maloney and B. A. Wandell, “Color constancy: a method for recovering surface spectral reflectance,” in Readings in Computer Vision, Elsevier, 1987, pp. 293–297.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Osadchy and Ramamoorthi, “Using specularities for recognition,” in IEEE ICCV, IEEE, 2003, pp. 1512–1519.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">J. B. Park and A. C. Kak, “A truncated least squares approach to detecting specular highlights in color images,” in IEEE ICRA, IEEE, 2003, pp. 1397–1403.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">O. El Meslouhi, M. Kardouchi, H. Allali, T. Gadi, and Y. A. Benkaddour, “Automatic detection and inpainting of specular reflections for colposcopic images,” Cent. Eur. J. Comput. Sci., vol. 1, pp. 341–354, 2011.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">R. Li, J. Pan, Y. Si, B. Yan, Y. Hu, and H. Qin, “Specular reflections removal for endoscopic image sequences with adaptive-RPCA decomposition,” IEEE Trans. Med. Imaging, vol. 39, no. 2, pp. 328–340, 2019.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">W. Zhang, X. Zhao, J.-M. Morvan, and L. Chen, “Improving shadow suppression for illumination robust face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 3, pp. 611–624, 2018.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">Q. Yang, S. Wang, and N. Ahuja, “Real-time specular highlight removal using bilateral filtering,” in Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV 11, Springer, 2010, pp. 87–100.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">H. Kim, H. Jin, S. Hadap, and I. Kweon, “Specular reflection separation using dark channel prior,” in IEEE CVPR, 2013, pp. 1460–1467.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Q. Yang, J. Tang, and N. Ahuja, “Efficient and robust specular highlight removal,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 6, pp. 1304–1311, 2014.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">Y. Liu, Z. Yuan, N. Zheng, and Y. Wu, “Saturation-preserving specular reflection separation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3725–3733.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">J. Suo, D. An, X. Ji, H. Wang, and Q. Dai, “Fast and high quality highlight removal from a single image,” IEEE Trans. Image Process., vol. 25, no. 11, pp. 5441–5454, 2016.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">T. Yamamoto, T. Kitajima, and R. Kawauchi, “Efficient improvement method for separation of reflection components based on an energy function,” in 2017 IEEE international conference on image processing (ICIP), IEEE, 2017, pp. 4222–4226.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">I. Funke, S. Bodenstedt, C. Riediger, J. Weitz, and S. Speidel, “Generative adversarial networks for specular highlight removal in endoscopic images,” in Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling, SPIE, 2018, pp. 8–16.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">S. Muhammad, M. N. Dailey, M. Farooq, M. F. Majeed, and M. Ekpanyapong, “Spec-Net and Spec-CGAN: Deep learning models for specularity removal from faces,” Image Vis. Comput., vol. 93, p. 103823, 2020.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">G. Fu, Q. Zhang, Q. Lin, L. Zhu, and C. Xiao, “Learning to Detect Specular Highlights from Real-world Images,” in ACM Multimedia, 2020, pp. 1873–1881.</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">G. Fu, Q. Zhang, L. Zhu, P. Li, and C. Xiao, “A multi-task network for joint specular highlight detection and removal,” in IEEE/CVF CVPR, 2021, pp. 7752–7761.</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">Z. Wu, C. Zhuang, J. Shi, J. Xiao, and J. Guo, “Deep specular highlight removal for single real-world image,” in SIGGRAPH Asia 2020 Posters, 2020, pp. 1–2.</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">G. Fu, Q. Zhang, L. Zhu, C. Xiao, and P. Li, “Towards High-Quality Specular Highlight Removal by Leveraging Large-Scale Synthetic Data,” in IEEE/CVF ICCV, 2023, pp. 12857–12865.</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">Z. Wu, J. Guo, C. Zhuang, J. Xiao, D.-M. Yan, and X. Zhang, “Joint specular highlight detection and removal in single images via Unet-Transformer,” Comput. Vis. Media, vol. 9, no. 1, pp. 141–154, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">J. Shi, Y. Dong, H. Su, and S. X. Yu, “Learning non-lambertian object intrinsics across shapenet categories,” in IEEE CVPR, 2017, pp. 1685–1694.</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">Z. Liu et al., “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada: IEEE, Oct. 2021, pp. 9992–10002. doi: 10.1109/ICCV48922.2021.00986.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” in International Conference on Learning Representations, 2020.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">Y. Li, K. Zhang, J. Cao, R. Timofte, and L. Van Gool, “LocalViT: Bringing locality to vision transformers,” arXiv preprint arXiv:2104.05707, 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">L. Karacan, “Multi-image transformer for multi-focus image fusion,” Signal Process. Image Commun., vol. 119, p. 117058, 2023.</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in IEEE CVPR, 2018, pp. 7132–7141.</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. Jorge Cardoso, “Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, Proceedings 3, Springer, 2017, pp. 240–248.</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">H.-L. Shen, H.-G. Zhang, S.-J. Shao, and J. H. Xin, “Chromaticity-based separation of reflection components in a single image,” Pattern Recognit., vol. 41, no. 8, pp. 2461–2469, 2008.</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">J. Lin, M. El Amine Seddik, M. Tamaazousti, Y. Tamaazousti, and A. Bartoli, “Deep multi-class adversarial specularity removal,” in Image Analysis: 21st Scandinavian Conference, SCIA 2019, Norrköping, Sweden, June 11–13, 2019, Proceedings 21, Springer, 2019, pp. 3–15.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
