<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article article-type="research-article" dtd-version="1.4" xml:lang="en">
            <front>

                <journal-meta>
                                                                <journal-id>data sci. j.</journal-id>
            <journal-title-group>
                                                                                    <journal-title>Veri Bilimi</journal-title>
            </journal-title-group>
                                        <issn pub-type="epub">2667-582X</issn>
                                                                                            <publisher>
                    <publisher-name>Murat GÖK</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Engineering</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Mühendislik</subject>
                                                    </subj-group>
                                    </article-categories>
            <title-group>
                <article-title>Investigating Word Association Mining Techniques</article-title>
                <trans-title-group xml:lang="tr">
                    <trans-title>Kelime Birliktelik Madenciliği Tekniklerinin İncelenmesi</trans-title>
                </trans-title-group>
            </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                <name>
                                    <surname>Bağcı Daş</surname>
                                    <given-names>Duygu</given-names>
                                </name>
                    <aff>Ege University</aff>
                                                            </contrib>
                                                    <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4774-9265</contrib-id>
                                                                <name>
                                    <surname>Genç</surname>
                                    <given-names>Sevdanur</given-names>
                                </name>
                    <aff>Kastamonu University</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
            <pub-date pub-type="pub" iso-8601-date="20221225">
                <day>25</day>
                <month>12</month>
                <year>2022</year>
            </pub-date>
                                        <volume>5</volume>
                                        <issue>2</issue>
                                        <fpage>106</fpage>
                                        <lpage>114</lpage>
                        
                        <history>
                <date date-type="received" iso-8601-date="20221007">
                    <day>07</day>
                    <month>10</month>
                    <year>2022</year>
                </date>
                <date date-type="accepted" iso-8601-date="20221121">
                    <day>21</day>
                    <month>11</month>
                    <year>2022</year>
                </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2018, Veri Bilimi</copyright-statement>
                    <copyright-year>2018</copyright-year>
                    <copyright-holder>Veri Bilimi</copyright-holder>
                </permissions>
            
            <abstract><p>This study investigates the effect of conditional entropy, mutual information (MI), the log-likelihood ratio (LLR), and simple co-occurrence counts on extracting strong syntagmatic relationships between words. Experiments are conducted on 10,000 restaurant reviews extracted from the Yelp Academic Dataset. The MI values of word pairs are used to extract the most strongly syntagmatically related words from the corpus. For this purpose, Spyder 3.3.6 and the Python Natural Language Toolkit (NLTK) library are used. The MI values are then compared with simple co-occurrence counts. The results indicate that the three word collocation techniques give similar results and, therefore, that all of them can be used effectively for extracting word collocations.</p></abstract>
            <trans-abstract xml:lang="tr">
                <p>Bu çalışma, koşullu entropi, ortak bilgi (MI) değerleri, log-olabilirlik oranı (LLR) ve basit ortak oluşumların güçlü sözdizimsel ilişkilerin çıkarılması üzerindeki etkisinin araştırılmasını sunmaktadır. Deneyler, 10.000 restoran yorumunu içeren Yelp Akademik Veri Kümesi kullanılarak gerçekleştirilmiştir. Ortak bilgi değeri en yüksek sözcük çiftlerinin, sözdizimsel olarak ilişkili en üstteki sözcükleri derlemden çıkardığı kabul edilir. Bu amaçla Spyder 3.3.6 ve Python Natural Language Toolkit (NLTK) kütüphanesi kullanılmıştır. Ortak bilgi değerleri daha sonra basit ortak oluşum sayıları ile karşılaştırılır. Analiz sonuçları, üç farklı kelime eşdizimleme tekniğinin benzer sonuçlar verdiğini ve bu nedenle bunların hepsinin kelime eşdizimleri için etkili bir şekilde kullanılabileceğini göstermiştir.</p></trans-abstract>
                                                            
            
            <kwd-group xml:lang="en">
                <kwd>Word Collocation</kwd>
                <kwd>Collocation Mining</kwd>
                <kwd>Collocation Extraction</kwd>
                <kwd>Mutual Information</kwd>
                <kwd>Text Mining</kwd>
            </kwd-group>
                                                        
            <kwd-group xml:lang="tr">
                <kwd>Kelime birlikteliği</kwd>
                <kwd>Birliktelik madenciliği</kwd>
                <kwd>Eşdizim çıkarma</kwd>
                <kwd>Ortak bilgi</kwd>
                <kwd>Metin madenciliği</kwd>
            </kwd-group>
                                                                                                            </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="book">Zhai, C. X., Massung, S., Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. ACM Books, 2016.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">Church, K. W., Hanks, P., Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29, 1990.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="confproc">Damani, O. P., Improving pointwise mutual information (PMI) by incorporating significant co-occurrence. 17th Conference on Computational Natural Language Learning (CoNLL), 2013.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Khan, F. H., Qamar, U., Bashir, S., SentiMI: Introducing point-wise mutual information with SentiWordNet to improve sentiment polarity detection. Applied Soft Computing, 39, 140–153, 2016.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">Jain, A. K., Pandey, Y., Analysis and implementation of sentiment classification using lexical POS markers. Int. J. Comput. Commun. Netw., 2(1), 36–40, 2013.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">Xu, T., Peng, Q., Cheng, Y., Identifying the semantic orientation of terms using S-HAL for sentiment analysis. Knowl.-Based Syst., 35, 279–289, 2012.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="book">Manning, C. D., Raghavan, P., Schütze, H., Introduction to Information Retrieval. Cambridge University Press, 2008.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="confproc">Garrett, M., et al., Leveraging mutual information to generate domain specific lexicons. Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling, &amp; Prediction and Behavior Representation in Modeling and Simulation, Washington DC, USA, 2018.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">Kang, B.-M., Collocation and word association: Comparing collocation measuring methods. International Journal of Corpus Linguistics, 23(1), 85–113, 2018.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">Lai, H.-L., Collocation analysis of news discourse and its ideological implications. Pragmatics, 29(4), 545–570, 2019.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Liu, X., et al., Recognition of collocation frames from sentences. IEICE Transactions on Information and Systems, 102(3), 620–627, 2019.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="preprint">Williams, C. K. I., On suspicious coincidences and pointwise mutual information. arXiv preprint arXiv:2203.08089, 2022.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">Krenn, B., Collocation mining: Exploiting corpora for collocation identification and representation. Entropy 1, 2000.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="confproc">Zhang, K., et al., A construction method of electric power professional domain corpus based on multi-model collaboration. 2022 4th Asia Energy and Electrical Engineering Symposium (AEEES), IEEE, 2022.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="webpage">Yelp Dataset. https://www.yelp.com/dataset</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
