Research Article

Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance

Volume: 11 Number: 40 May 3, 2020
  • Pius Marthın *
  • Duygu İçen *

Abstract

Online product reviews have become a valuable source of information that facilitates customer decisions about a particular product. Because these reviews carry a wealth of information about users' satisfaction and experiences with a particular drug, pharmaceutical companies use online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models that facilitate decision making in various fields. In this manuscript we used the drug review dataset of Gräßer, Kallumadi, Malberg, & Zaunseder (2018), freely available from the University of California Irvine (UCI) Machine Learning Repository, to identify the machine learning model that best predicts overall drug performance from users' reviews. In addition to several manipulations made to improve model accuracy, we followed all the procedures required for text analysis, including text cleaning and transformation of the texts to a numeric format suitable for training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customers' reviews were summarized and visualized using a bar plot and a word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only a sample of the dataset: we randomly sampled 15,000 observations from the 161,297 observations in the training dataset and 10,000 observations from the 53,766 observations in the testing dataset. Several machine learning models were trained using 10-fold cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), a classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Splines (MARS), support vector machines (SVM) with both radial and linear kernels, and classification using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. The support vector machine (SVM) with a linear kernel performed significantly better than the rest, with an accuracy of 83%. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and the Latent Semantic Analysis (LSA) technique to our term-document matrix (TDM).
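The TF-IDF weighting and LSA reduction mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is not shown on this page. The function names `tfidf_tdm` and `lsa_document_features`, the whitespace tokenizer, the particular IDF formula, and the four toy reviews are all assumptions made only for illustration.

```python
from collections import Counter

import numpy as np


def tfidf_tdm(docs):
    """Build a terms x documents TF-IDF matrix (hypothetical helper).

    Assumes every document is non-empty and uses plain whitespace
    tokenization, which is much simpler than real text cleaning.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    tdm = np.zeros((len(vocab), n_docs))
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            tdm[index[term], j] = count / len(toks)  # relative term frequency
    doc_freq = np.count_nonzero(tdm, axis=1)  # documents containing each term
    idf = np.log(n_docs / doc_freq)  # one common IDF variant (an assumption)
    return vocab, tdm * idf[:, None]


def lsa_document_features(tdm, k):
    """Project documents onto the top-k singular directions of the TDM."""
    _, s, vt = np.linalg.svd(tdm, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # shape: documents x k


# Toy reviews invented for this example; they are not from the UCI dataset.
reviews = [
    "great drug no side effects",
    "terrible side effects stopped taking it",
    "works great for my pain",
    "no relief and terrible headache",
]
vocab, tdm = tfidf_tdm(reviews)
features = lsa_document_features(tdm, k=2)
print(features.shape)
```

Each row of `features` is a low-dimensional numeric representation of one review. In a workflow like the one the abstract describes, such rows would serve as inputs to a classifier (for example, a linear-kernel SVM) evaluated with stratified cross-validation.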

References

  1. Bhargava, A. (2019). Grouping of Medicinal Drugs Used for Similar Symptoms by Mining Clusters from Drug Benefits Reviews. Available at SSRN: https://ssrn.com or http://dx.doi.org/10.2139/ssrn.3356314
  2. Denecke, K., & Deng, Y. (2015). Sentiment analysis in medical settings: new opportunities and challenges. Artificial Intelligence in Medicine, 64(1), 17–27.
  3. Gräßer, F., Kallumadi, S., Malberg, H., & Zaunseder, S. (2018). Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning. Proceedings of the 2018 International Conference on Digital Health - DH '18. doi:10.1145/3194658.3194677. Dataset: https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29
  4. IBM Corporation. (2013). Data-driven healthcare organizations use big data analytics for big gains. Somers, NY: IBM Corporation.
  5. Jiménez-Zafra, S.M., Martín-Valdivia, M.T., & Ureña-López, L.A. (2019). How do we talk about doctors and drugs? Sentiment analysis in forums expressing opinions for the medical domain. Artificial Intelligence in Medicine, 93, 50–57. doi:10.1016/j.artmed.2018.03.007
  6. Denecke, K. (2015). Sentiment Analysis from Medical Texts. Springer International Publishing, Cham, 83–98. https://doi.org/10.1007/978-3-319-20582-3_10
  7. Kho, S.J., Padhee, S., Bajaj, G., Thirunarayan, K., & Sheth, A. (2019). Domain-Specific Use Cases for Knowledge-Enabled Social Media Analysis. In: Agarwal, N., Dokoohaki, N., & Tokdemir, S. (eds) Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining. Lecture Notes in Social Networks. Springer, Cham.

Details

Primary Language

English

Subjects

-

Journal Section

Research Article

Authors

Pius Marthın *
0000-0003-3529-0311
Türkiye

Duygu İçen *
0000-0002-7940-5064
Türkiye

Publication Date

May 3, 2020

Submission Date

February 26, 2020

Acceptance Date

-

Published in Issue

Year 2020 Volume: 11 Number: 40

APA
Marthın, P., & İçen, D. (2020). Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance. AJIT-E: Academic Journal of Information Technology, 11(40), 8-23. https://doi.org/10.5824/ajite.2020.01.001.x
AMA
1.Marthın P, İçen D. Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance. AJIT-e: Academic Journal of Information Technology. 2020;11(40):8-23. doi:10.5824/ajite.2020.01.001.x
Chicago
Marthın, Pius, and Duygu İçen. 2020. “Application of Natural Language Processing With Supervised Machine Learning Techniques to Predict the Overall Drugs Performance”. AJIT-E: Academic Journal of Information Technology 11 (40): 8-23. https://doi.org/10.5824/ajite.2020.01.001.x.
EndNote
Marthın P, İçen D (May 1, 2020) Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance. AJIT-e: Academic Journal of Information Technology 11 40 8–23.
IEEE
[1]P. Marthın and D. İçen, “Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance”, AJIT-e: Academic Journal of Information Technology, vol. 11, no. 40, pp. 8–23, May 2020, doi: 10.5824/ajite.2020.01.001.x.
ISNAD
Marthın, Pius - İçen, Duygu. “Application of Natural Language Processing With Supervised Machine Learning Techniques to Predict the Overall Drugs Performance”. AJIT-e: Academic Journal of Information Technology 11/40 (May 1, 2020): 8-23. https://doi.org/10.5824/ajite.2020.01.001.x.
JAMA
1.Marthın P, İçen D. Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance. AJIT-e: Academic Journal of Information Technology. 2020;11:8–23.
MLA
Marthın, Pius, and Duygu İçen. “Application of Natural Language Processing With Supervised Machine Learning Techniques to Predict the Overall Drugs Performance”. AJIT-E: Academic Journal of Information Technology, vol. 11, no. 40, May 2020, pp. 8-23, doi:10.5824/ajite.2020.01.001.x.
Vancouver
1.Pius Marthın, Duygu İçen. Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance. AJIT-e: Academic Journal of Information Technology. 2020 May 1;11(40):8-23. doi:10.5824/ajite.2020.01.001.x