Research Article

Latent Dirichlet Allocation for Medical Dataset

Year 2020, Volume: 22, Issue: 64, pp. 67-80, 24.01.2020
https://doi.org/10.21205/deufmd.2020226408

Abstract

Examining the literature of the relevant field is an essential stage of any
scientific study. When the literature is reviewed manually, a comprehensive
review is either impossible or takes a very long time. Automatic searching of
the literature, on the other hand, does not allow in-depth semantic analysis.
In this study, Latent Dirichlet Allocation (LDA), a topic modelling method, is
applied to perform automatic, semantic analysis of medical articles published
by researchers in Turkey. The experiments were carried out on articles from the
last 11 (eleven) years of medical literature, retrieved year by year from the
PubMed medical database. Analysis of the experimental results shows that the
research topics that trended over these 11 (eleven) years were discovered
successfully.
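The abstract does not spell out how the LDA model is fitted or how trends are read off it. As a minimal sketch, assuming collapsed Gibbs sampling (the inference scheme of Griffiths and Steyvers, 2004, listed in the references) with symmetric Dirichlet priors `alpha` and `beta`, LDA can be fitted in plain Python; averaging each document's topic proportions per publication year then gives one simple way to look for trending topics. The function names `lda_gibbs` and `yearly_topic_share`, the hyperparameter values and the toy corpus are illustrative assumptions, not the authors' implementation or data.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] estimates P(topic k | document d) and
    phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    topics = range(K)

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in topics]
    n_k = [0] * K

    # Give every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(K) for _ in doc]
        for w, k in zip(doc, z_d):
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in topics]
    return theta, phi, vocab


def yearly_topic_share(theta, years):
    """Average the topic proportions of all documents published in each year."""
    by_year = defaultdict(list)
    for row, year in zip(theta, years):
        by_year[year].append(row)
    return {year: [sum(col) / len(rows) for col in zip(*rows)]
            for year, rows in sorted(by_year.items())}


# Hypothetical toy corpus and publication years, for illustration only.
docs = [
    "cancer tumor chemotherapy tumor".split(),
    "tumor cancer radiotherapy".split(),
    "diabetes insulin glucose insulin".split(),
    "glucose diabetes obesity".split(),
]
years = [2008, 2008, 2018, 2018]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
print(yearly_topic_share(theta, years))
```

A real run would replace the toy corpus with preprocessed PubMed abstracts and their publication years. Plain Python keeps the sketch dependency-free, but it is slow; for a corpus of PubMed size, a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would be the practical choice, bearing in mind that both use online variational inference rather than Gibbs sampling.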

References

  • Blei, D.M., Ng, A.Y., Jordan, M.I. 2003. Latent Dirichlet allocation, The Journal of Machine Learning Research, Vol. 3, pp. 993-1022.
  • Agrawal, A., Fu, W., Menzies, T. 2018. What is wrong with topic modeling? And how to fix it using search-based software engineering, Information and Software Technology, Vol. 98, pp. 74-88. DOI: 10.1016/j.infsof.2018.02.005
  • Holzinger, A., Dehmer, M., Jurisica, I. 2014. Knowledge discovery and interactive data mining in bioinformatics: State-of-the-art, future challenges and research directions, BMC Bioinformatics, Vol. 15, pp. 1-9. DOI: 10.1186/1471-2105-15-S6-I1
  • Lin, J.M., Bohland, J.W., Andrews, P., Burns, G.A.P.C., Allen, C.B., Mitra, P.P. 2008. An Analysis of the Abstracts Presented at the Annual Meetings of the Society for Neuroscience from 2001 to 2006, PLoS One, Vol. 3, pp. 1-9. DOI: 10.1371/journal.pone.0002052
  • Bundschus, M., Tresp, V., Kriegel, H.P. 2009. Topic Models for Semantically Annotated Document Collections. NIPS Workshop: Applications for Topic Models: Text and Beyond, 7-10 December, Whistler, pp. 1-4.
  • Jiang, Z., Zhou, X., Zhang, X., Chen, S. 2012. Using Link Topic Model to Analyze Traditional Chinese Medicine Clinical Symptom-Herb Regularities. 2012 IEEE 14th International Conference on e-Health Networking, Applications and Services (Healthcom), 10-13 October, Beijing, pp. 15-18.
  • Redfield, C.K., Lou, X., Karaletsos, T., Crosbie, C., Gardos, S., Artz, D., Ratsch, G. 2013. An empirical analysis of topic modeling for mining cancer clinical notes. 2013 IEEE 13th International Conference on Data Mining Workshops, 7-10 December, Dallas, pp. 56-63.
  • Cui, M., Liang, Y., Li, Y., Guan, R. 2015. Exploring Trends of Cancer Research Based on Topic Model. 1st International Workshop on Semantic Technologies, 9-12 March, Changchun, pp. 7-18.
  • Beykikhoshk, A., Arandjelović, O., Venkatesh, S., Phung, D. 2015. Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature. pp. 550-562 in: Cao, T., Lim, E.P., Zhou, Z.H., Ho, T.B., Cheung, D., Motoda, H. (eds.) 2015. Advances in Knowledge Discovery and Data Mining, Springer, Cham, Switzerland, 763 pp.
  • Song, M., Heo, G.E., Lee, D. 2015. Identifying the landscape of Alzheimer’s disease research with network and content analysis, Scientometrics, Vol. 102, pp. 905-927. DOI: 10.1007/s11192-014-1372-x
  • van Altena, A.J., Moerland, P.D., Zwinderman, A.H., Olabarriaga, S.D. 2016. Understanding big data themes from scientific biomedical literature through topic modeling, Journal of Big Data, Vol. 3, pp. 1-21. DOI: 10.1186/s40537-016-0057-0
  • Speier, W., Ong, M.K., Arnold, C.W. 2016. Using phrases and document metadata to improve topic modeling of clinical reports, Journal of Biomedical Informatics, Vol. 61C, pp. 260-266. DOI: 10.1016/j.jbi.2016.04.005
  • Hahn, A., Mohanty, S.D., Manda, P. 2017. What’s Hot and What’s Not? - Exploring Trends in Bioinformatics Literature Using Topic Modeling and Keyword Analysis. pp. 279-290 in: Cai, Z., Daescu, O., Li, M. (eds.) 2017. Bioinformatics Research and Applications, Springer, Cham, Switzerland, 499 pp.
  • Drosatos, G., Kavvadias, S.E., Kaldoudi, E. 2018. Topics and Trends Analysis in eHealth Literature. pp. 563-566 in: Eskola, H., Väisänen, O., Viik, J., Hyttinen, J. (eds.) 2018. IFMBE Proceedings, Springer, Singapore, 1139 pp.
  • Steyvers, M., Griffiths, T. 2007. Probabilistic Topic Models. pp. 1-15 in: Landauer, T., McNamara, D.S., Dennis, S., Kintsch, W. (eds.) 2007. Handbook of Latent Semantic Analysis: A Road to Meaning, Psychology Press, USA, 544 pp.
  • Lu, Y., Mei, Q., Zhai, C.X. 2011. Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA, Information Retrieval, Vol. 14, pp. 178-203. DOI: 10.1007/s10791-010-9141-9
  • Blei, D.M. 2012. Probabilistic Topic Models, Communications of the ACM, Vol. 55, pp. 77-84. DOI: 10.1145/2133806.2133826
  • Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R. 1990. Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, Vol. 41, pp. 391-407. DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • Hofmann, T. 1999. Probabilistic Latent Semantic Analysis. Fifteenth Conference on Uncertainty in Artificial Intelligence, 20 July-1 August, Stockholm, pp. 289-296.
  • Popescul, A., Ungar, L., Pennock, D., Lawrence, S. 2001. Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. 17th Conference on Uncertainty in Artificial Intelligence, 2-5 August, Washington, pp. 437-444.
  • Ng, K.W., Tian, G.L., Tang, M.L. 2011. Dirichlet and Related Distributions: Theory, Methods and Applications. Wiley, New York, 337 pp.
  • Ekinci, E., İlhan Omurca, S. 2017. Ürün Özelliklerinin Konu Modelleme Yöntemi ile Çıkarılması [Extracting Product Features with a Topic Modelling Method], Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, Vol. 9, pp. 51-58.
  • Griffiths, T.L., Steyvers, M. 2004. Finding Scientific Topics, Proceedings of the National Academy of Sciences of the United States of America, Vol. 101, pp. 5228-5235. DOI: 10.1073/pnas.0307752101
  • Mimno, D., Wallach, H.M., Talley, E., Leenders, M., McCallum, A. 2011. Optimizing semantic coherence in topic models. Conference on Empirical Methods in Natural Language Processing, 27-31 July, Edinburgh, pp. 262-272.
  • Newman, D., Lau, J.H., Grieser, K., Baldwin, T., McCallum, A. 2010. Automatic evaluation of topic coherence. The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2-4 June, California, pp. 100-108.
  • Atıcı, B., İlhan Omurca, S., Ekinci, E. 2017. Kullanıcı Şikayetlerindeki Ürün Özelliklerinin Gizli Dirichlet Ayırımı ile Saptanması [Detecting Product Features in User Complaints with Latent Dirichlet Allocation]. 2017 Uluslararası Bilgisayar Bilimleri ve Mühendisliği Konferansı [2017 International Conference on Computer Science and Engineering], 5-8 October, Antalya, pp. 250-254.

Turkish title: Tıp Veri Kümesi için Gizli Dirichlet Ayrımı

There are 26 citations in total.

Details

Primary Language Turkish
Journal Section Research Article
Authors

Ekin Ekinci 0000-0003-0658-592X

Sevinç İlhan Omurca 0000-0003-1214-9235

Elif Kırık 0000-0001-7314-4521

Şeymanur Taşçı 0000-0002-5872-6731

Publication Date January 24, 2020
Published in Issue Year 2020 Volume: 22 Issue: 64

Cite

APA Ekinci, E., Omurca, S. İ., Kırık, E., Taşçı, Ş. (2020). Tıp Veri Kümesi için Gizli Dirichlet Ayrımı. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, 22(64), 67-80. https://doi.org/10.21205/deufmd.2020226408
AMA Ekinci E, Omurca Sİ, Kırık E, Taşçı Ş. Tıp Veri Kümesi için Gizli Dirichlet Ayrımı. DEUFMD. January 2020;22(64):67-80. doi:10.21205/deufmd.2020226408
Chicago Ekinci, Ekin, Sevinç İlhan Omurca, Elif Kırık, and Şeymanur Taşçı. “Tıp Veri Kümesi için Gizli Dirichlet Ayrımı”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi 22, no. 64 (January 2020): 67-80. https://doi.org/10.21205/deufmd.2020226408.
EndNote Ekinci E, Omurca Sİ, Kırık E, Taşçı Ş (January 1, 2020) Tıp Veri Kümesi için Gizli Dirichlet Ayrımı. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 22 64 67–80.
IEEE E. Ekinci, S. İ. Omurca, E. Kırık, and Ş. Taşçı, “Tıp Veri Kümesi için Gizli Dirichlet Ayrımı”, DEUFMD, vol. 22, no. 64, pp. 67–80, 2020, doi: 10.21205/deufmd.2020226408.
ISNAD Ekinci, Ekin et al. “Tıp Veri Kümesi için Gizli Dirichlet Ayrımı”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 22/64 (January 2020), 67-80. https://doi.org/10.21205/deufmd.2020226408.
JAMA Ekinci E, Omurca Sİ, Kırık E, Taşçı Ş. Tıp Veri Kümesi için Gizli Dirichlet Ayrımı. DEUFMD. 2020;22:67–80.
MLA Ekinci, Ekin et al. “Tıp Veri Kümesi için Gizli Dirichlet Ayrımı”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, vol. 22, no. 64, 2020, pp. 67-80, doi:10.21205/deufmd.2020226408.
Vancouver Ekinci E, Omurca Sİ, Kırık E, Taşçı Ş. Tıp Veri Kümesi için Gizli Dirichlet Ayrımı. DEUFMD. 2020;22(64):67-80.

Dokuz Eylül University, Faculty of Engineering Dean's Office, Tınaztepe Campus, Adatepe Mah. Doğuş Cad. No: 207-I / 35390 Buca-İZMİR.