Research Article

Performance Comparison in R-CNN and Dalle-3 Based Image Processing

Year 2025, Volume: 6 Issue: 2, 1 - 26, 23.12.2025

Abstract

Image processing involves the manipulation and analysis of digital images, while artificial intelligence encompasses technologies that mimic human intelligence. Integrating the two fields improves efficiency and accuracy in applications such as automatic image recognition, object detection, and classification.
In this context, the Faster R-CNN deep learning model and the Dalle-3 artificial intelligence program were analyzed with descriptive statistics using Python. Their object recognition and tracking capabilities in art and design, educational technologies, and security systems were evaluated in terms of creativity; the study adopted comparative analysis and logical reasoning techniques from qualitative research methods and was limited in scope to the Faster R-CNN model and the Dalle-3 program.
The findings show that deep learning and object detection technologies have significant potential to solve complex image processing problems and to enhance creative problem-solving capacity. The results reveal that these technologies offer strategic advantages and can provide creative solutions even under challenging visual conditions, and the study provides recommendations for their future use and development.
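The descriptive-statistics comparison described above can be sketched in Python. The detection-confidence values below are hypothetical placeholders for illustration only, not the study's data; the standard-library `statistics` module summarizes each model's scores.

```python
import statistics

# Hypothetical detection-confidence scores (0-1) for the same test images;
# illustrative placeholders, not the study's actual measurements.
faster_rcnn_scores = [0.91, 0.88, 0.95, 0.79, 0.86]
dalle3_scores = [0.84, 0.90, 0.77, 0.81, 0.88]

def describe(name, scores):
    """Return basic descriptive statistics for one model's score list."""
    return {
        "model": name,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }

for summary in (describe("Faster R-CNN", faster_rcnn_scores),
                describe("Dalle-3", dalle3_scores)):
    print(f"{summary['model']}: mean={summary['mean']:.3f}, "
          f"stdev={summary['stdev']:.3f}, "
          f"range=[{summary['min']:.2f}, {summary['max']:.2f}]")
```

The same pattern extends to any per-image metric (accuracy, loss, detection time) by swapping in the corresponding score lists.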

References

  • Russell, S. J., & Norvig, P. (2016). Artificial intelligence: a modern approach. Pearson.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
  • McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5, 115-133.
  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61, 85-117.
  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).
  • Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.
  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
  • Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. Advances in neural information processing systems, 29.
  • Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... & Sutskever, I. (2021, July). Zero-shot text-to-image generation. In International conference on machine learning (pp. 8821-8831). PMLR.
  • Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.
  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
  • Özdal, M. A. (2023). Yapay Zekâ İle Oluşturulan Eserlerin Telif Hakkı Ve Kişisel Verilerin Korunması. Hakkari Review, 7(1), 90-110.
  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March). On the dangers of stochastic parrots: Can language models be too big?. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 610-623).
  • Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International journal of computer vision, 104, 154-171.
  • Özdal, M. A. (2020). Dijital Sanatta Gerçekliğin Yeri. Uluslararası Sanat Tasarım ve Eğitim Dergisi, 4(2), 11-21.
  • Özdal, M. A. (2024). Yapay zekâ ile üretilen görsel ve illüstrasyon eserlerinin telif hakları ve kişisel veri güvenliği. Disiplinlerarası Yenilik Araştırmaları Dergisi, 4(1), 7-31.
  • Chakraborty, T., KS, U. R., Naik, S. M., Panja, M., & Manvitha, B. (2024). Ten years of generative adversarial nets (GANs): a survey of the state-of-the-art. Machine Learning: Science and Technology, 5(1), 011001.
  • Tan, B., Li, Y., Zhao, H., Li, X., & Ding, S. (2020). A novel dictionary learning method for sparse representation with nonconvex regularizations. Neurocomputing, 417, 128-141.
  • İnik, Ö., & Ülker, E. (2017). Derin öğrenme ve görüntü analizinde kullanılan derin öğrenme modelleri. Gaziosmanpaşa Bilimsel Araştırma Dergisi, 6(3), 85-104.
  • Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251-1258).
  • Ma, Z., Wei, X., Hong, X., & Gong, Y. (2019). Bayesian loss for crowd count estimation with point supervision. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6142-6151).
  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117-2125).
  • Kantor, C., Rauby, B., Boussioux, L., Jehanno, E., & Talbot, H. (2020). Over-CAM: Gradient-Based Localization and Spatial Attention for Confidence Measure in Fine-Grained Recognition using Deep Neural Networks.
  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14 (pp. 21-37). Springer International Publishing.
  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
  • Leng, G. (2018). The heart of the brain: the hypothalamus and its hormones. MIT Press.
  • Özdal, M. A. Yapay Zeka Destekli Grafik Tasarımın Yasal Boyutu. Uluslararası İşletme Bilimi ve Uygulamaları Dergisi, 3(2), 53-78.
  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).
  • Sapkota, R., Ahmed, D., & Karkee, M. (2024). Creating image datasets in agricultural environments using DALL.E: Generative AI-powered large language model. (March 24, 2024).
  • Li, H., Gu, J., Koner, R., Sharifzadeh, S., & Tresp, V. (2023). Do Dalle and flamingo understand each other?. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1999-2010).
  • Ge, Y., Xu, J., Zhao, B. N., Joshi, N., Itti, L., & Vineet, V. (2022). Dalle for Detection: Language-driven Compositional Image Synthesis for Object Detection. arXiv preprint arXiv:2206.09592.
  • From Faster R-CNN to Mask R-CNN. (2022). https://elifmeseci.medium.com/r-cnn-ailesi-part-ii-76cce9e4a9d6. Accessed March 13, 2024.
  • Chess pieces created with the Dalle-3 artificial intelligence program. (2024). https://chatgpt.com/g/g-2fkFE8rbu-dall-e/c/4cbf365e-845e-4c59-b2e7-327942906b18. Accessed April 10, 2024.
  • Human face prototype created with the Dalle-3 artificial intelligence program. (2024). https://chatgpt.com/g/g-2fkFE8rbu-dall-e/c/4cbf365e-845e-4c59-b2e7-327942906b18. Accessed April 17, 2024.
  • Comparison of chess pieces and a human face created with the Dalle-3 artificial intelligence program, analyzed with the Faster R-CNN model. (2024). https://chatgpt.com/g/g-2fkFE8rbu-dall-e/c/4cbf365e-845e-4c59-b2e7-327942906b18. Accessed April 18, 2024.
  • Python statistical performance analysis of the object and face recognition process using the Faster R-CNN and Dalle-3 models. (2024). https://www.python.org/. Accessed May 12, 2024.
  • Python statistical analysis of the effects of the Faster R-CNN and Dalle-3 models on creativity: applications in art, educational technologies, and security systems. (2024). https://www.python.org/. Accessed May 15, 2024.
  • Accuracy-loss graph showing model performance. (2024). https://chatgpt.com/c/670af7ee-96e8-8001-bccf-6c4a0536cc64


Details

Primary Language English
Subjects Deep Learning
Journal Section Research Article
Authors

Mehmet Akif Özdal 0000-0003-3148-8988

Submission Date March 15, 2025
Acceptance Date July 2, 2025
Publication Date December 23, 2025
Published in Issue Year 2025 Volume: 6 Issue: 2

Cite

Vancouver Özdal MA. Performance Comparison in R-CNN and Dalle-3 Based Image Processing. BUTS. 2025;6(2):1-26.
This journal is prepared and published by the Bingöl University Technical Sciences journal team.