ArchiJury: Exploring the Capabilities of Vision-Language Models to Generate Architectural Critique

Selen Çiçek; Mehmet Sadık Aksu; Emre Öztürk; Kaan Bingöl; Gizem Mersin; Mustafa Koç; Oben Kazım Akmaz; Lale Başarır

doi:10.53710/jcode.1618548

Research Article

Architectural Critique with Artificial Intelligence: Generating Architectural Commentary with Vision-Language Models

Year 2025, Volume: 6 Issue: 1, 165 - 190, 31.03.2025

Abstract

Generative Artificial Intelligence (AI) models, which have begun to radically transform contemporary design practice, offer significant potential for architectural critique, an activity that is critical to the in-depth evaluation and development of the design process. Obtaining architectural critique is especially difficult in settings such as architectural design competitions, which attract large numbers of participants and demand comprehensive, consistent critique. In response, this study proposes a framework that prioritizes the use of an AI model architecture known as Vision-Language Models to interrogate design problems and to develop commentary and architectural critique on the proposed architectural solutions. Although AI tools in architectural design practice are mostly used to obtain concrete outputs such as generation, visual representation, and optimization, they still see limited use in areas such as architectural critique, which require intuition, questioning, and contextual awareness. The research aims to integrate the proposed AI model into the intuitive, interpretive dimensions of architectural critique that cannot be measured with quantitative data, in order to develop consistent and scalable critiques. The proposed model was trained on a dataset designed by domain experts to ensure both context sensitivity and conformity with architectural values. Model development consists of two main phases. The first phase, "v1", examines the binary classification of visual architectural features (for example, traditional or contemporary, angular or organic forms) and aims to test whether the model architecture developed in the second phase is feasible for the defined research problem. In the second phase, "v2", the model architecture was developed to generate comprehensive and detailed textual critiques using predefined evaluation criteria (context, scale, design strategies, programmatic relationships, etc.).
Following the evaluation of the first-phase results, the second version of the model was trained on an expanded visual dataset and on architectural commentary obtained through expert evaluation, increasing its capacity to produce comprehensive and consistent critiques. During this process, every critique produced by the model was reviewed and revised by domain experts for accuracy and consistency. By integrating ecological intelligence into this framework, the study intends for the critique to also evaluate designs in terms of their environmental impact and sustainability practices, and to encourage a holistic approach that aligns architectural innovation with ecological responsibility. The results underline that Vision-Language Models have the potential to advance the dialogue between architectural design practice and artificial intelligence by supporting traditional jury processes with structured, scalable, and context-sensitive critiques.

Keywords

Architectural Critique, Artificial Intelligence, Vision-Language Models (VLM), AI and Architectural Design, Architectural Design Competitions

References

Adadi, A., & Berrada, M. (2018). Peeking Inside the Black-Box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160. https://doi.org/10.1109/access.2018.2870052
As, I., Pal, S., & Basu, P. (2018). Artificial intelligence in architecture: Generating conceptual design via deep learning. International Journal of Architectural Computing, 16(4), 306–327. https://doi.org/10.1177/1478077118800982
Bordes, F., Pang, R. Y., Ajay, A., Li, A. C., Bardes, A., Petryk, S., Mañas, O., Lin, Z., Mahmoud, A., Jayaraman, B., Ibrahim, M., Hall, M., Xiong, Y., Lebensold, J., Ross, C., Jayakumar, S., Guo, C., Bouchacourt, D., Al-Tahan, H., . . . Chandra, V. (2024). An introduction to Vision-Language modeling. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2405.17247
Denzler, J., Rodner, E., & Simon, M. (2016). Convolutional neural networks as a computational model for the underlying processes of aesthetics perception. In Lecture notes in computer science (pp. 871–887). https://doi.org/10.1007/978-3-319-46604-0_60
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.14314
Fischer, G., Nakakoji, K., Ostwald, J., Stahl, G., & Sumner, T. (1993). Embedding critics in design environments. The Knowledge Engineering Review, 8(4), 285–307. https://doi.org/10.1017/s026988890000031x
Frederickson, M. P. (1990). Design Juries: A study in Lines of communication. Journal of Architectural Education, 43(2), 22–27. https://doi.org/10.1080/10464883.1990.10758556
Ghosh, A., Acharya, A., Saha, S., Jain, V., & Chadha, A. (2024). Exploring the Frontier of Vision-Language Models: A survey of current methodologies and future directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2404.07214
Gokhale, T., Palangi, H., Nushi, B., Vineet, V., Horvitz, E., Kamar, E., Baral, C., & Yang, Y. (2022). Benchmarking spatial relationships in Text-to-Image generation. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2212.10015
Güzelci, O. Z., & Şener, S. M. (2019). An entropy-Based Design Evaluation Model for architectural competitions through multiple factors. Entropy (Basel, Switzerland), 21(11), 1064. https://doi.org/10.3390/e21111064
Güzer, C. A. G. (1994). The Limits of architectural criticism: Architecture as a process of representation, commodification and legitimation [PhD Dissertation, Middle East Technical University]. https://open.metu.edu.tr/handle/11511/858
Laurençon, H., Marafioti, A., Sanh, V., & Tronchon, L. (2024). Building and better understanding vision-language models: insights and future directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2408.12637
Li, C., Zhang, T., Du, X., Zhang, Y., & Xie, H. (2024). Generative AI models for different steps in architectural design: A literature review. Frontiers of Architectural Research. https://doi.org/10.1016/j.foar.2024.10.001
Luther, K., Tolentino, J., Wu, W., Pavel, A., Bailey, B. P., Agrawala, M., Hartmann, B., & Dow, S. P. (2015). Structuring, Aggregating, and Evaluating Crowdsourced Design Critique. CSCW ’15: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. https://doi.org/10.1145/2675133.2675283
Luther, K., Williams, A., Hicks, J., & Dow, S. P. (2015). CrowdCrit: A crowdsourced approach to critique for design education. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 163–172.
Lymer, G. (2009). Demonstrating Professional vision: The work of critique in Architectural education. Mind Culture and Activity, 16(2), 145–171. https://doi.org/10.1080/10749030802590580
Marafioti, A., Zohar, O., Farré, M., Noyan, M., Bakouch, E., Cuenca, P., … Wolf, T. (2025). SmolVLM: Redefining small and efficient multimodal models.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on Bias and Fairness in Machine Learning. ACM Computing Surveys, 54(6), 1–35. https://doi.org/10.1145/3457607
Mittal, A., Murthy, R., Kumar, V., & Bhat, R. (2024). Towards understanding and mitigating the hallucinations in NLP and Speech. CODS-COMAD ’24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data, 489–492. https://doi.org/10.1145/3632410.3633297
Rönn, M. (2011). Architectural quality in competitions. A dialogue based assessment of design proposals. FormAkademisk - Forskningstidsskrift for Design Og Designdidaktikk, 4(1). https://doi.org/10.7577/formakademisk.130
Salem, A., Mansour, Y., & Eldaly, H. (2024). Generative vs. Non-Generative AI: Analyzing the Effects of AI on the Architectural Design Process. Engineering Research Journal (Shoubra), 53(2), 119–128. https://doi.org/10.21608/erjsh.2024.255372.1256
Sanalan, A. (2022). The role of artificial intelligence and big data technologies in architectural design processes. Maltepe University Graduate Institute. Retrieved from Maltepe Open Access
Shen, S., Logeswaran, L., Lee, M., Lee, H., Poria, S., & Mihalcea, R. (2024). Understanding the capabilities and limitations of large language models for cultural commonsense. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2405.04655
SmolVLM - small yet mighty Vision Language Model. (2024). https://huggingface.co/blog/smolvlm
Wu, T., Liu, C., & Li, Y. (2020). Visual classification in architectural design: A neural network approach. Architectural Computing Journal, 18(2), 102–118.
Zhang, M., Press, O., Merrill, W., Liu, A., & Smith, N. A. (2023). How language model hallucinations can snowball. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.13534

Abstract

Artificial Intelligence (AI) offers a potent opportunity to rethink architectural critique, particularly in settings such as architectural design competitions. The challenge lies in capturing the interpretive depth required for design evaluation, an inherently human process that connects intuition, reasoning, and contextual sensitivity. Building on this premise, the proposed approach uses a domain-specific dataset, curated and validated by experienced architects acting as domain experts, to train a context-aware Vision-Language Model (VLM) capable of delivering nuanced critique. Model development follows two distinct phases: an initial prototype (v1) explores feasibility through classification of visual architectural attributes, while the second phase (v2) evolves into a structure that generates detailed critique texts guided by predefined criteria such as context, form, and programmatic considerations. The proposed model aims to bridge the gap between computational precision and the complexities of architectural judgment, offering a structured yet adaptable framework for using AI in the evaluative aspects of design. By integrating ecological intelligence into this framework, the critique can also assess designs based on their environmental impact and sustainability practices, encouraging a holistic approach that aligns architectural innovation with ecological responsibility. Although still in its early stages, this work opens a pathway to complement traditional review processes with reliable, scalable, and context-sensitive feedback, laying a foundation for incorporating the patterns of tacit knowledge in architectural design into the review process.
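
As a rough illustration of the two phases described in the abstract, the sketch below shows one way training records for v1 and v2 might be organized. It is not the authors' implementation: the class names, field names, and dictionary keys are hypothetical. Only the example attribute pairs (traditional or contemporary, angular or organic) and the example criteria (context, scale, design strategies, programmatic relationships) are taken from the article's abstracts.

```python
from dataclasses import dataclass
from typing import Dict

# Example attribute pairs for the v1 binary-classification phase.
# The pair values come from the abstract; the keys are hypothetical.
V1_ATTRIBUTE_PAIRS = {
    "style": ("traditional", "contemporary"),
    "form": ("angular", "organic"),
}

# Example evaluation criteria for the v2 critique-generation phase,
# as listed in the abstract (the list there ends with "etc.").
V2_CRITERIA = (
    "context",
    "scale",
    "design strategies",
    "programmatic relationships",
)


@dataclass
class V1Record:
    """One image with binary attribute labels (hypothetical layout)."""
    image_path: str
    labels: Dict[str, str]

    def validate(self) -> None:
        # Each label must be one of the two values allowed for its attribute.
        for name, value in self.labels.items():
            if name not in V1_ATTRIBUTE_PAIRS:
                raise ValueError(f"unknown attribute: {name}")
            if value not in V1_ATTRIBUTE_PAIRS[name]:
                raise ValueError(f"invalid value for {name}: {value}")


@dataclass
class V2Record:
    """One image with expert commentary per criterion (hypothetical layout)."""
    image_path: str
    critique: Dict[str, str]

    def to_training_text(self) -> str:
        # Require commentary for every criterion, then emit one line each.
        missing = [c for c in V2_CRITERIA if c not in self.critique]
        if missing:
            raise ValueError(f"missing criteria: {missing}")
        return "\n".join(
            f"{c.title()}: {self.critique[c]}" for c in V2_CRITERIA
        )
```

Keeping the criteria in a fixed tuple means every critique text follows the same order, which is one simple way to approximate the structured, consistent feedback the abstract aims for.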

Keywords

Architectural Critique, Artificial Intelligence (AI), Vision-Language Models (VLM), AI and Architectural Design, Architecture Competitions

There are 25 citations in total.

Details

Primary Language	English
Subjects	Natural Language Processing, Architectural Science and Technology, Information Technologies in Architecture and Design
Journal Section	Research Article
Authors	Selen Çiçek (0000-0003-2489-2536); Mehmet Sadık Aksu (0009-0004-7024-3304); Emre Öztürk (0009-0009-9937-4099); Kaan Bingöl (0000-0001-7175-3198); Gizem Mersin (0009-0000-2295-353X); Mustafa Koç (0000-0001-8131-8878); Oben Kazım Akmaz; Lale Başarır (0000-0001-8620-6429)
Submission Date	January 13, 2025
Acceptance Date	March 20, 2025
Early Pub Date	March 28, 2025
Publication Date	March 31, 2025
Published in Issue	Year 2025 Volume: 6 Issue: 1

Cite

APA	Çiçek, S., Aksu, M. S., Öztürk, E., … Bingöl, K. (2025). ArchiJury: Exploring the Capabilities of Vision-Language Models to Generate Architectural Critique. Journal of Computational Design, 6(1), 165-190. https://doi.org/10.53710/jcode.1618548

The papers published in JCoDe are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.