Research Article

DESIGN OF METADATA MANAGEMENT PLATFORM USING ARTIFICIAL INTELLIGENCE

Year 2025, Volume: 8 Issue: 1, 41 - 58, 26.08.2025
https://doi.org/10.56809/icujtas.1563267

Abstract

Data volumes are growing faster every day. Alongside structured data, unstructured data is now an established part of the data landscape. Many different types of devices produce and transmit data, and institutions have come to treat data as an asset. Yet as data grows and diversifies this quickly, managing both the data itself and its metadata (the data about the data), deriving benefit from it, and achieving data-driven business transformation become increasingly difficult. To address management problems in the field of metadata, this study presents a system that lets companies track, end to end, the data and digital assets that are expanding rapidly in the age of digitalization. The system also aims to group data with the support of large language models, classify data baskets with machine learning methods, comply with the data security policies required by KVKK (the Turkish Personal Data Protection Law) using natural language processing methods, and provide a platform on which companies can analyze their own metadata. The design phase of the platform has been completed. By applying machine learning methods, including natural language processing and quality assessment methods, the platform will support data profiling, data quality improvement, and grouping of related data, enabling institutions to use the full potential of their data in decision-making. In addition, institutions will be able to manage data across their data pipelines on the same platform without the need for other tools.

Project Number

3220241

YAPAY ZEKA YÖNTEMLERİ KULLANILARAK METAVERİ YÖNETİM PLATFORMU TASARIMI

Abstract

The growth rate of data is accelerating with every passing day. In addition to structured data, unstructured data has now become part of the data world. Today, as many different types of devices produce and transmit data, data has come to represent an asset and a value for institutions. However, at a point where data grows and diversifies this rapidly, managing the data itself and the metadata that describes it, deriving benefit from it, and achieving data-driven business transformation is an increasingly difficult area. To solve management problems in the field of metadata, this study aims to create a platform on which institutions can track, end to end, their data and digital assets that are expanding rapidly with the age of digitalization, group data with the support of large language models, comply with the data security policies arising from KVKK using natural language processing methods, and perform analyses with their own data. With this platform, whose design phase has been carried out, the use of machine learning methods that include natural language processing and attribute evaluation methods will, through features such as data profiling, data quality improvement, and grouping of related data, enable institutions to use the full potential of their data in decision-making. In addition, institutions will be able to manage data in their data pipelines on the same platform without needing other tools.

Ethical Statement

None

Supporting Institution

TÜBİTAK

Acknowledgements

This work was supported within the scope of the project of DIP Bilgisayar Yazılım Ticaret Anonim Şirketi, code 3220241, titled "Yapay Zeka ve Makina Öğrenmesi Destekli Veri ve MetaVeri Yönetim Platformu Programı" (Artificial Intelligence and Machine Learning Supported Data and Metadata Management Platform Program), accepted under the TÜBİTAK TEYDEB Program. We thank TÜBİTAK for its support.

Details

Primary Language: English
Subjects: Computer Software
Section: Research Article
Authors

Kıvanç Kişlal 0009-0008-9015-5773

Buket Doğan 0000-0003-1062-2439

Ammar Abdurrauf 0009-0001-5925-5425

Project Number: 3220241
Publication Date: 26 August 2025
Submission Date: 8 October 2024
Acceptance Date: 12 December 2024
Published in Issue: Year 2025, Volume: 8, Issue: 1

Cite This Article

APA Kişlal, K., Doğan, B., & Abdurrauf, A. (2025). DESIGN OF METADATA MANAGEMENT PLATFORM USING ARTIFICIAL INTELLIGENCE. İstanbul Ticaret Üniversitesi Teknoloji ve Uygulamalı Bilimler Dergisi, 8(1), 41-58. https://doi.org/10.56809/icujtas.1563267