Makine Öğrenmesi Yöntemleri Kullanılarak Mahkeme Kararlarlarının Kümelenmesi

Muhammed Burak Görentaş; Taner Uçkan

doi:10.53070/bbd.1318518

Research Article

Clustering Court Orders Using Machine Learning Methods

Year 2023, Volume: 8, Issue: 2, 148 - 158, 20.12.2023

Cited By: 1

Abstract

Artificial intelligence (AI) is a rapidly evolving technology with applications in many areas of life, including law. In the legal domain, AI is used for legal research, case management, legal consultancy, legal language analysis, analysis of case precedents, and legal risk assessment. Many of these applications are built on natural language processing (NLP) techniques, and text clustering is one of them. Text clustering is a technique used in NLP and machine learning to group similar texts by their content or linguistic features. Because legal texts are complex and voluminous, clustering methods are particularly valuable: by grouping cases with similar attributes in a specific subject area, they help us better understand legal principles and judicial trends. They also give legal researchers quick access to a wide range of cases and improve the legal analysis process. Furthermore, the outcomes of clustering can be used in areas such as the development of legal strategies, pre-trial preparation, and the substantiation of legal decisions. In this study, decisions of the Dispute Resolution Court (Uyuşmazlık Mahkemesi) were processed with natural language processing using the TF-IDF method and then clustered using the CURE, K-Means, DBSCAN, AGNES, Affinity, and BIRCH algorithms. Based on the evaluation metrics, the BIRCH algorithm yielded the best results.
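
The pipeline summarized in the abstract (TF-IDF weighting followed by clustering and metric-based comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses a hypothetical six-sentence toy corpus in place of the court decisions, covers only three of the six algorithms (K-Means, AGNES via agglomerative clustering, and BIRCH), and picks the silhouette score as an example metric, since the abstract does not name the metrics used. The function name `cluster_and_score` and all parameter values are illustrative assumptions.

```python
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus standing in for court decisions (not the study's data).
docs = [
    "tenant landlord rent lease eviction deposit",
    "tenant landlord rent lease eviction notice",
    "tenant landlord rent lease eviction repair",
    "tax penalty appeal assessment authority audit",
    "tax penalty appeal assessment authority refund",
    "tax penalty appeal assessment authority interest",
]


def cluster_and_score(texts, n_clusters=2):
    """Vectorize texts with TF-IDF, cluster them, and score each clustering."""
    # TF-IDF turns each document into a weighted term vector.
    X = TfidfVectorizer().fit_transform(texts).toarray()

    # Assumed parameter values; a real study would tune these.
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        "AGNES": AgglomerativeClustering(n_clusters=n_clusters),
        "BIRCH": Birch(n_clusters=n_clusters, threshold=0.2),
    }

    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        results[name] = {
            "labels": labels.tolist(),
            # Silhouette lies in [-1, 1]; higher means better-separated clusters.
            "silhouette": float(silhouette_score(X, labels)),
        }
    return results


if __name__ == "__main__":
    for name, res in cluster_and_score(docs).items():
        print(name, res["labels"], round(res["silhouette"], 3))
```

On a real corpus the algorithms would generally disagree, and the comparison of their scores is what would single out one of them, as the study reports for BIRCH.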

Keywords

Artificial Intelligence, Law, Text Clustering

References

Aizawa, A. (2003). An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1), 45–65. https://doi.org/10.1016/S0306-4573(02)00021-3
Altszyler, E., Ribeiro, S., Sigman, M., & Fernández Slezak, D. (2017). The interpretation of dream meaning: Resolving ambiguity using Latent Semantic Analysis in a small corpus of text. Consciousness and Cognition, 56, 178–187. https://doi.org/10.1016/j.concog.2017.09.004
Anders, K.-H. (2003, April). A hierarchical graph-clustering approach to find groups of objects. 5th Workshop on Progress in Automated Map Generalization, 1–8.
Ay, S. (2018). Türkiye’deki Ceza Davalarının İstatistiksel Analizi. İstanbul Ticaret Üniversitesi Sosyal Bilimler Dergisi, 17(33), 25–36.
Aydın, Ö. (2020). Mobbing İçerikli Yargı Kararlarının Makine Öğrenmesi Algoritmaları ile Sınıflandırılması [Yayımlanmış Yüksek Lisans Tezi]. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü.
Bateni, M., Behnezhad, S., Derakhshan, M., Hajiaghayi, M., Kiveris, R., Lattanzi, S., & Mirrokni, V. (2017). Affinity clustering: Hierarchical clustering at scale. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 30 (pp. 1–11).
Guha, S., Rastogi, R., & Shim, K. (2001). Cure: an efficient clustering algorithm for large databases. Information Systems, 26(1), 35–58. https://doi.org/10.1016/S0306-4379(01)00008-4
Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
Hosseini, S., & Varzaneh, Z. A. (2022). Deep text clustering using stacked AutoEncoder. Multimedia Tools and Applications, 81(8), 10861–10881. https://doi.org/10.1007/s11042-022-12155-0
Kılıç, B., & Öner, Y. (2021). Yargıtay Kararlarının Suç Türlerine Göre Makine Öğrenmesi Yöntemleri İle Sınıflandırılması (Vol. 4, Issue 3). https://dergipark.org.tr/en/download/article-file/2032425
Kodinariya, T. M., & Makwana, P. R. (2013). Review on determining number of Cluster in K-Means Clustering. International Journal of Advance Research in Computer Science and Management Studies, 1(6), 90–95.
Lovmar, L., Ahlford, A., Jonsson, M., & Syvänen, A.-C. (2005). Silhouette scores for assessment of SNP genotype clusters. BMC Genomics, 6(1), 35. https://doi.org/10.1186/1471-2164-6-35
Lukasik, S., Kowalski, P. A., Charytanowicz, M., & Kulczycki, P. (2016). Clustering using flower pollination algorithm and Calinski-Harabasz index. 2016 IEEE Congress on Evolutionary Computation (CEC), 2724–2728. https://doi.org/10.1109/CEC.2016.7744132
Mumcuoğlu, E., Öztürk, C. E., Ozaktas, H. M., & Koç, A. (2021). Natural language processing in law: Prediction of outcomes in the higher courts of Turkey. Information Processing and Management, 58(5). https://doi.org/10.1016/j.ipm.2021.102684
Ogbuabor, G., & Ugwoke, F. N. (2018). Clustering Algorithm for a Healthcare Dataset Using Silhouette Score Value. International Journal of Computer Science and Information Technology, 10(2), 27–37. https://doi.org/10.5121/ijcsit.2018.10203
Petrovic, S. (2006). A Comparison Between The Silhouette Index And The Davies-Bouldin Index in Labelling IDS Clusters. 11th Nordic Workshop of Secure IT Systems, 53–64.
Prameswari, P., Zulkarnain, Surjandari, I., & Laoh, E. (2017). Mining online reviews in Indonesia’s priority tourist destinations using sentiment analysis and text summarization approach. 2017 IEEE 8th International Conference on Awareness Science and Technology (ICAST), 121–126. https://doi.org/10.1109/ICAwST.2017.8256429
Ramadhani, S., Azzahra, D., & Z, T. (2022). Comparison of K-Means and K-Medoids Algorithms in Text Mining based on Davies Bouldin Index Testing for Classification of Student’s Thesis. Digital Zone: Jurnal Teknologi Informasi Dan Komunikasi, 13(1), 24–33. https://doi.org/10.31849/digitalzone.v13i1.9292
Rashid, J., Shah, S. M. A., & Irtaza, A. (2020). An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering. Mehran University Research Journal of Engineering and Technology, 39(1), 213–222. https://doi.org/10.22581/muet1982.2001.20
Rehman, S. U., Asghar, S., Fong, S., & Sarasvady, S. (2014). DBSCAN: Past, present and future. The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), 232–238. https://doi.org/10.1109/ICADIWT.2014.6814687
Shahapure, K. R., & Nicholas, C. (2020). Cluster Quality Analysis Using Silhouette Score. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 747–748. https://doi.org/10.1109/DSAA49011.2020.00096
Singh, A. K., Mittal, S., Malhotra, P., & Srivastava, Y. V. (2020). Clustering Evaluation by Davies-Bouldin Index (DBI) in Cereal data using K-Means. 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 306–310. https://doi.org/10.1109/ICCMC48092.2020.ICCMC-00057
Suchacka, G., Cabri, A., Rovetta, S., & Masulli, F. (2021). Efficient on-the-fly Web bot detection. Knowledge-Based Systems, 223, 107074. https://doi.org/10.1016/j.knosys.2021.107074
Wang, X., & Xu, Y. (2019). An improved index for clustering validation based on Silhouette index and Calinski-Harabasz index. IOP Conference Series: Materials Science and Engineering, 569(5), 052024. https://doi.org/10.1088/1757-899X/569/5/052024
Zhang, T., Ramakrishnan, R., & Livny, M. (1997). BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery, 1(2), 141–182. https://doi.org/10.1023/A:1009783824328
Zhao, Q., & Fränti, P. (2014). WB-index: A sum-of-squares based index for cluster validity. Data & Knowledge Engineering, 92, 77–89. https://doi.org/10.1016/j.datak.2014.07.008

Abstract (translated from the Turkish original)

Artificial intelligence is a technology that has developed rapidly in recent years and has found application in almost every area of life. It has come into use in health, automotive, education, music, finance, agriculture, and many other fields. Law is one of these fields. Artificial intelligence has many areas of application in the legal world. Alongside its use as a supporting tool in legal research, case management, legal consultancy, legal language analysis, case-law searches, and legal risk analysis, it is also used for purposes such as the analysis of judicial decisions. Many artificial intelligence applications in the legal field have been developed using natural language processing technology. Text clustering is one of these application areas. Text clustering is a technique used in natural language processing and machine learning that helps group similar texts according to their content or linguistic features. Because the legal field in particular involves a complex and extensive body of text, clustering methods make a valuable contribution. By grouping cases with similar characteristics on a given subject, these methods help us better understand legal principles and judicial trends. Clustering methods offer advantages such as giving legal researchers quick access to a wide range of cases and improving the legal analysis process. In addition, clustering results are used in many different areas, such as developing legal strategies, pre-trial preparation, and substantiating legal decisions. In this study, decisions of the Dispute Resolution Court (Uyuşmazlık Mahkemesi) were processed with natural language processing using the TF-IDF method and then clustered with artificial intelligence methods such as CURE, K-MEANS, DBSCAN, AGNES, AFFINITY, and BIRCH. According to the evaluation metrics, the BIRCH algorithm was found to give the best results.

Keywords

Law, Text Clustering, Artificial Intelligence

There are 25 citations in total.

Details

Primary Language	Turkish
Subjects	Machine Learning (Other)
Journal Section	PAPERS
Authors	Muhammed Burak Görentaş (ORCID 0000-0001-8898-9631), Taner Uçkan (ORCID 0000-0001-5385-6775)
Publication Date	December 20, 2023
Submission Date	June 22, 2023
Acceptance Date	October 6, 2023
Published in Issue	Year 2023, Volume: 8, Issue: 2

Cite

APA	Görentaş, M. B., & Uçkan, T. (2023). Makine Öğrenmesi Yöntemleri Kullanılarak Mahkeme Kararlarlarının Kümelenmesi. Computer Science, 8(2), 148-158. https://doi.org/10.53070/bbd.1318518

Cited By

The Effectiveness of Machine Learning Algorithms in Extractive Text Summarization: A Comparative Analysis of K-Means, Random Forest, GBM, Logistic Regression, and SVM

Doğu Fen Bilimleri Dergisi

https://doi.org/10.57244/dfbd.1538959

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.