Research Article

Investigating Word Association Mining Techniques

Year 2022, Volume: 5 Issue: 2, 106 - 114, 25.12.2022

Abstract

This study investigates how conditional entropy, mutual information (MI), the log-likelihood ratio (LLR), and simple co-occurrence counts perform in extracting strong syntagmatic relationships between words. Experiments are conducted on 10,000 restaurant reviews extracted from the Yelp Academic Dataset. The MI values of word pairs are used to extract the most strongly syntagmatically related words from the corpus, using Spyder 3.3.6 and the Python Natural Language Toolkit (NLTK) library. The MI values are then compared with simple co-occurrence counts. The results indicate that the three word collocation techniques give similar results, so any of them can be used effectively for extracting word collocations.
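As a rough illustration of the comparison described in the abstract, the sketch below ranks word pairs by mutual information and reports their simple co-occurrence counts alongside. It is not the authors' code: it uses only the Python standard library rather than NLTK, treats each review as one segment, models each word as a binary "present in the review" variable, and applies additive smoothing. The function names, the whitespace tokenizer, and the smoothing constant of 0.25 are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def presence_counts(docs):
    """Count how many documents contain each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        # Naive whitespace tokenizer (assumption); each word counts once per document.
        vocab = sorted(set(doc.lower().split()))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def mutual_information(n_a, n_b, n_ab, n_docs, smooth=0.25):
    """Mutual information (in bits) between the presence variables of two words.

    n_a, n_b: documents containing word a / word b
    n_ab:     documents containing both
    smooth:   pseudo-count added to each of the four joint cells (assumed value)
    """
    total = n_docs + 4 * smooth
    p11 = (n_ab + smooth) / total                        # both present
    p10 = (n_a - n_ab + smooth) / total                  # only a present
    p01 = (n_b - n_ab + smooth) / total                  # only b present
    p00 = (n_docs - n_a - n_b + n_ab + smooth) / total   # neither present
    pa1, pb1 = p11 + p10, p11 + p01
    pa0, pb0 = 1.0 - pa1, 1.0 - pb1
    cells = ((p11, pa1, pb1), (p10, pa1, pb0), (p01, pa0, pb1), (p00, pa0, pb0))
    return sum(pj * math.log2(pj / (pa * pb)) for pj, pa, pb in cells)


def rank_pairs(docs, top_k=5):
    """Return up to top_k (pair, co-occurrence count, MI) tuples, highest MI first.

    Only pairs that co-occur in at least one document are scored.
    """
    word_df, pair_df = presence_counts(docs)
    n_docs = len(docs)
    scored = [
        (pair, n_ab, mutual_information(word_df[pair[0]], word_df[pair[1]], n_ab, n_docs))
        for pair, n_ab in pair_df.items()
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy reviews invented for illustration; they are not from the Yelp dataset.
    reviews = [
        "great food friendly staff",
        "friendly staff slow service",
        "great food slow service",
        "terrible food",
    ]
    for pair, cooc, mi in rank_pairs(reviews):
        print(pair, cooc, round(mi, 4))
```

Scoring presence per review rather than adjacent bigrams is one of several reasonable choices. A pair that appears in many reviews can still get a low MI score if both words are common everywhere, which is the main way the two rankings can differ.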


Kelime Birliktelik Madenciliği Tekniklerinin İncelenmesi (Investigating Word Association Mining Techniques)


Abstract

This study presents an investigation of the effect of conditional entropy, mutual information (MI) values, the log-likelihood ratio (LLR), and simple co-occurrences on the extraction of strong syntagmatic relationships. The experiments were carried out using the Yelp Academic Dataset, which contains 10,000 restaurant reviews. The word pairs with the highest mutual information values are taken to yield the top syntagmatically related words in the corpus. Spyder 3.3.6 and the Python Natural Language Toolkit (NLTK) library were used for this purpose. The mutual information values are then compared with simple co-occurrence counts. The analysis results showed that the three different word collocation techniques give similar results and that all of them can therefore be used effectively for word collocations.


Details

Primary Language: English
Subjects: Engineering
Journal Section: Articles
Authors: Duygu Bağcı Daş; Sevdanur Genç (ORCID 0000-0003-4774-9265)
Publication Date: December 25, 2022
Published in Issue: Year 2022, Volume 5, Issue 2

Cite

APA Bağcı Daş, D., & Genç, S. (2022). Investigating Word Association Mining Techniques. Veri Bilimi, 5(2), 106-114.


