AUTOMATIC DISCOVERY OF SIMILAR WORDS BY SUBSTITUTE VECTORS

İnci Düzenli; M. Fatih Amasyalı

Research Article

Year 2016, Volume: 34 Issue: 1, 125 - 133, 01.03.2016

Abstract

Patterns between words are commonly used for automatic information extraction. However, such patterns can only find related words that occur close to each other. This study proposes a method based on substitute vectors that overcomes this limitation. First, sets of words that share the same substitute vector are constructed. Then, sets of similar words are obtained according to the number of sets in which the words co-occur. In the resulting sets, the ratio of semantically related words is above 70%. The proposed method is unsupervised because it does not require any manually labeled seed words.
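The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (a list of word occurrences, each paired with a precomputed substitute vector), the function names `group_by_substitute_vector` and `similar_pairs`, the `min_shared_sets` threshold, and the toy English data are all assumptions made for illustration. How the substitute vectors themselves are produced is not described in the abstract and is left out.

```python
from collections import defaultdict
from itertools import combinations


def group_by_substitute_vector(occurrences):
    """Step 1 (assumed form): collect words whose occurrences have an
    identical substitute vector into one set."""
    groups = defaultdict(set)
    for word, substitutes in occurrences:
        groups[tuple(substitutes)].add(word)
    # A set with a single word carries no similarity information.
    return [words for words in groups.values() if len(words) > 1]


def similar_pairs(word_sets, min_shared_sets=2):
    """Step 2 (assumed form): count, for every word pair, the number of
    sets in which both words appear, and keep the frequent pairs."""
    counts = defaultdict(int)
    for words in word_sets:
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared_sets}


# Hypothetical toy data: (word, substitute vector) per occurrence.
occurrences = [
    ("car", ("vehicle", "truck")),
    ("automobile", ("vehicle", "truck")),
    ("car", ("bus", "taxi")),
    ("automobile", ("bus", "taxi")),
    ("train", ("bus", "taxi")),
    ("apple", ("fruit", "pear")),
]

word_sets = group_by_substitute_vector(occurrences)
pairs = similar_pairs(word_sets)
print(pairs)
```

In this toy input, "car" and "automobile" share two sets and pass the threshold, while "train" shares only one set with each of them and is filtered out. No seed words are supplied at any point, which is the sense in which such a procedure is unsupervised.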

Keywords

Automatic information extraction, discovery of similar words, synonyms, near-synonyms, substitute vectors, natural language processing, artificial intelligence

Details

Primary Language	English
Journal Section	Research Articles
Authors	İnci Düzenli, M. Fatih Amasyalı
Publication Date	March 1, 2016
Submission Date	November 4, 2013
Published in Issue	Year 2016 Volume: 34 Issue: 1

Cite

Vancouver	Düzenli İ, Amasyalı MF. AUTOMATIC DISCOVERY OF SIMILAR WORDS BY SUBSTITUTE VECTORS. SIGMA. 2016;34(1):125-33.
