Research Article

Investigating Word Association Mining Techniques

Year 2022, Volume: 5 Issue: 2, 106 - 114, 25.12.2022

Abstract

This study investigates how conditional entropy, mutual information (MI), the log-likelihood ratio (LLR), and simple co-occurrence counts perform in extracting strong syntagmatic relations. Experiments are conducted on the Yelp Academic Dataset, from which 10,000 restaurant reviews were extracted. The mutual information values of word pairs are used to extract the most strongly syntagmatically related words from the corpus. For this purpose, Spyder 3.3.6 and the Python Natural Language Toolkit (NLTK) library are used. The mutual information values are then compared with simple co-occurrence counts. The results indicate that the three word collocation techniques give similar results and that all of them can therefore be used effectively for extracting word collocations.
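The comparison between mutual information and raw co-occurrence counts can be illustrated with a minimal sketch. It is not the authors' NLTK pipeline: it uses only the Python standard library, assumes that each review is one segment and that each word is a binary "present or absent" variable per segment, and applies add-0.25 smoothing to the four cells of the contingency table. The function names and the four toy reviews are hypothetical and are not taken from the Yelp data.

```python
import math
from collections import Counter
from itertools import combinations


def segment_presence(segments):
    """Count in how many segments each word and each unordered word pair occurs."""
    word_df = Counter()
    pair_df = Counter()
    for text in segments:
        # Binary presence: a word counts once per segment, however often it repeats.
        words = sorted(set(text.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return word_df, pair_df


def mutual_information(n, n_a, n_b, n_ab, smooth=0.25):
    """Mutual information (in bits) between the presence variables of two words.

    n: number of segments; n_a, n_b: segments containing each word;
    n_ab: segments containing both. Smoothing is an assumption of this sketch.
    """
    total = n + 4 * smooth
    p11 = (n_ab + smooth) / total                    # both present
    p10 = (n_a - n_ab + smooth) / total              # only word a present
    p01 = (n_b - n_ab + smooth) / total              # only word b present
    p00 = (n - n_a - n_b + n_ab + smooth) / total    # neither present
    pa1, pb1 = p11 + p10, p11 + p01
    pa0, pb0 = 1.0 - pa1, 1.0 - pb1
    cells = [(p11, pa1, pb1), (p10, pa1, pb0), (p01, pa0, pb1), (p00, pa0, pb0)]
    return sum(p * math.log2(p / (pa * pb)) for p, pa, pb in cells)


def rank_pairs(segments):
    """Return (pair, MI, co-occurrence count) tuples sorted by MI, highest first."""
    word_df, pair_df = segment_presence(segments)
    n = len(segments)
    scored = [
        (pair, mutual_information(n, word_df[pair[0]], word_df[pair[1]], count), count)
        for pair, count in pair_df.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy reviews invented for illustration only.
reviews = [
    "great pizza and friendly service",
    "friendly service but cold pizza",
    "the pasta was great",
    "slow service and cold pasta",
]

for pair, mi, count in rank_pairs(reviews)[:5]:
    print(pair, round(mi, 3), count)
```

Sorting the same tuples by the count field instead of the MI field gives the simple co-occurrence ranking, so the two orderings can be compared side by side on a real corpus.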


Turkish title: Kelime Birliktelik Madenciliği Tekniklerinin İncelenmesi



Details

Primary Language English
Subjects Engineering
Section Articles
Authors

Duygu Bağcı Daş

Sevdanur Genç 0000-0003-4774-9265

Publication Date December 25, 2022
Published in Issue Year 2022 Volume: 5 Issue: 2

Cite

APA Bağcı Daş, D., & Genç, S. (2022). Investigating Word Association Mining Techniques. Veri Bilimi, 5(2), 106-114.


