
Using Word Embeddings for Ontology Enrichment

Year 2016, Volume: 4 Issue: 3, 49 - 56, 01.11.2016
https://doi.org/10.18201/ijisae.58806

Abstract

Word embeddings, distributed word representations in a reduced-dimensional vector space, show considerable promise for accomplishing Natural Language Processing (NLP) tasks in an unsupervised manner. In this study, we investigate whether the success of word2vec, a neural-network-based word embedding algorithm, can be replicated in an agglutinative language like Turkish. Turkish is more challenging than languages like English for complex NLP tasks because of its rich morphology. We picked ontology enrichment, again a relatively hard NLP task, as our test application. First, we show how ontological relations can be automatically extracted from Turkish Wikipedia to construct a gold standard. We then show experimentally that the word vector representations produced by word2vec are useful for detecting the ontological relations encoded in Wikipedia. We propose a simple yet effective weakly supervised ontology enrichment algorithm in which, for a given word, a few known ontologically related concepts, coupled with similarity scores computed from word2vec models, lead to the discovery of other related concepts. We also discuss how our algorithm can be improved and augmented to make it a viable component of an ontology learning and population framework.
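
The abstract only sketches the enrichment procedure, so below is a minimal illustration of the general idea in Python with gensim (the framework cited in the references). The function name enrich, the candidate pool size, the averaging-based score, the similarity threshold, the model filename, and the example Turkish words are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of weakly supervised ontology enrichment with word2vec similarities.
# Assumes a gensim word2vec model trained on a Turkish Wikipedia dump; the
# model path, scoring rule, and threshold below are hypothetical choices.
from gensim.models import Word2Vec

def enrich(model, target, known_related, topn=100, threshold=0.5):
    """Suggest concepts related to `target`, using a few already-known
    related concepts (`known_related`) as weak supervision."""
    wv = model.wv
    seeds = [w for w in known_related if w in wv]
    if target not in wv or not seeds:
        return []
    # Candidate pool: vocabulary words closest to the target word.
    candidates = wv.most_similar(target, topn=topn)
    suggestions = []
    for word, _ in candidates:
        if word in known_related:
            continue
        # Score each candidate by its average similarity to the known concepts.
        avg_sim = sum(wv.similarity(word, s) for s in seeds) / len(seeds)
        if avg_sim >= threshold:
            suggestions.append((word, avg_sim))
    return sorted(suggestions, key=lambda s: s[1], reverse=True)

# Hypothetical usage: rank words near "kedi" (cat) by their average
# similarity to concepts already known to be related to it.
# model = Word2Vec.load("trwiki_word2vec.model")
# print(enrich(model, "kedi", ["köpek", "kuş", "memeli"]))
```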

References

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research. 12:2493-537.
  • Mikolov T, Chen K, Corrado G, Dean J. 2013. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
  • Petasis G, Karkaletsis V, Paliouras G, Krithara A, Zavitsanos E. 2011. Ontology Population and Enrichment: State of the Art. In Knowledge-Driven Multimedia Information Extraction and Ontology Evolution (pp. 134-166). Springer-Verlag.
  • Zouaq A, Gasevic D, Hatala M. 2011. Towards Open Ontology Learning and Filtering. Information Systems. 36(7):1064-81.
  • Tanev H, Magnini B. 2008. Weakly supervised approaches for ontology population. In Proceedings of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge (pp. 129-143).
  • Rong X. 2014. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
  • Pennington J, Socher R, Manning CD. 2014. GloVe: Global Vectors for Word Representation. In EMNLP 2014 (Vol. 14, pp. 1532-1543).
  • Ji S, Yun H, Yanardag P, Matsushima S, Vishwanathan SV. 2015. WordRank: Learning Word Embeddings via Robust Ranking. arXiv preprint arXiv:1506.02761.
  • Le QV, Mikolov T. 2014. Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053.
  • Barkan O, Koenigstein N. 2016. Item2Vec: Neural Item Embedding for Collaborative Filtering. arXiv preprint arXiv:1603.04259.
  • Perozzi B, Al-Rfou R, Skiena S. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.
  • Vilnis L, McCallum A. 2015. Word representations via gaussian embedding. In Proceedings of International Conference on Learning Representations 2015.
  • Arora S, Li Y, Liang Y, Ma T, Risteski A. 2015. Random walks on context spaces: Towards an explanation of the mysteries of semantic word embeddings. arXiv preprint arXiv:1502.03520.
  • Levy O, Goldberg Y. 2014. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 2014 (pp. 2177-2185).
  • Tamagawa S, Sakurai S, Tejima T, Morita T, Izumi N, Yamaguchi T. 2010. Learning a large scale of ontology from Japanese Wikipedia. In Web Intelligence and Intelligent Agent Technology (WI-IAT), IEEE/WIC/ACM International Conference on 2010 Aug 31 (Vol. 1, pp. 279-286). IEEE.
  • Wu F, Weld DS. 2008. Automatically refining the Wikipedia infobox ontology. In Proceedings of the 17th international conference on World Wide Web (pp. 635-644). ACM.
  • Janik M, Kochut KJ. 2008. Wikipedia in action: Ontological knowledge in text categorization. In Semantic Computing, 2008 IEEE International Conference (pp. 268-275). IEEE.
  • Kim HJ, Hong KJ. 2015. Building Semantic Concept Networks by Wikipedia-Based Formal Concept Analysis. Advanced Science Letters. 21(3):435-8.
  • Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C. 2015. DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web. 6(2):167-95.
  • Hoffart J, Suchanek FM, Berberich K, Weikum G. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence. 194:28-61.
  • Hearst MA. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th conference on Computational Linguistics - Volume 2 (pp. 539-545). Association for Computational Linguistics.
  • Maynard D, Funk A, Peters W. 2008. Using lexico-syntactic ontology design patterns for ontology creation and population. In Proc. of the Workshop on Ontology Patterns.
  • Yeh E, Ramage D, Manning CD, Agirre E, Soroa A. 2009. WikiWalk: random walks on Wikipedia for semantic relatedness. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (pp. 41-49). Association for Computational Linguistics.
  • Zesch T, Gurevych I. 2007. Analysis of the Wikipedia category graph for NLP applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT 2007) (pp. 1-8).
  • Van der Maaten L, Hinton G. 2008. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research. 9:2579-2605.
  • Řehůřek R., Sojka P. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks.

Details

Journal Section: Research Article
Authors

İzzet Pembeci

Publication Date: November 1, 2016
Published in Issue: Year 2016, Volume: 4, Issue: 3

Cite

APA Pembeci, İ. (2016). Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering, 4(3), 49-56. https://doi.org/10.18201/ijisae.58806
AMA Pembeci İ. Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering. November 2016;4(3):49-56. doi:10.18201/ijisae.58806
Chicago Pembeci, İzzet. “Using Word Embeddings for Ontology Enrichment”. International Journal of Intelligent Systems and Applications in Engineering 4, no. 3 (November 2016): 49-56. https://doi.org/10.18201/ijisae.58806.
EndNote Pembeci İ (November 1, 2016) Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering 4 3 49–56.
IEEE İ. Pembeci, “Using Word Embeddings for Ontology Enrichment”, International Journal of Intelligent Systems and Applications in Engineering, vol. 4, no. 3, pp. 49–56, 2016, doi: 10.18201/ijisae.58806.
ISNAD Pembeci, İzzet. “Using Word Embeddings for Ontology Enrichment”. International Journal of Intelligent Systems and Applications in Engineering 4/3 (November 2016), 49-56. https://doi.org/10.18201/ijisae.58806.
JAMA Pembeci İ. Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering. 2016;4:49–56.
MLA Pembeci, İzzet. “Using Word Embeddings for Ontology Enrichment”. International Journal of Intelligent Systems and Applications in Engineering, vol. 4, no. 3, 2016, pp. 49-56, doi:10.18201/ijisae.58806.
Vancouver Pembeci İ. Using Word Embeddings for Ontology Enrichment. International Journal of Intelligent Systems and Applications in Engineering. 2016;4(3):49-56.