Year 2019, Volume 23, Issue 1, Pages 126-132, 2019-04-01

DNA Microarray Gene Expression Data Classification Using SVM, MLP, and RF with Feature Selection Methods Relief and LASSO

Kıvanç GÜÇKIRAN [1], İsmail CANTÜRK [2], Lale ÖZYILMAZ [3]


DNA microarray technology is a novel method for monitoring the expression levels of a large number of genes simultaneously, and these expression profiles are now used to diagnose many diseases. Using multiple microarray datasets, this paper cross-compares two feature selection methods and three classifiers. Because each microarray sample contains a very large number of genes, the most informative genes (those with the highest information gain) should be selected before classification. For this selection, we used the Relief and LASSO feature selection methods. After the most informative genes were selected, the data were classified with Support Vector Machines (SVM), Multilayer Perceptron networks (MLP), and Random Forests (RF), all of which are widely used classifiers. The combination of LASSO feature selection and SVM outperforms most previously proposed approaches in both accuracy and training speed.
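The abstract does not state how Relief and LASSO were configured, so the following is only a rough pure-Python sketch of the two feature-selection steps it names: the basic two-class Relief weighting of Kira and Rendell, and a LASSO fit by cyclic coordinate descent whose non-zero coefficients mark the selected genes. The function names, the toy data, the value of `lam`, and the choice to use the class labels (coded as +1/-1) directly as the LASSO regression target are all assumptions for illustration, not the authors' setup.

```python
import math


def relief_weights(X, y):
    """Two-class Relief weights in the spirit of Kira & Rendell (1992).

    Sketch assumptions: each sample is used once as the reference instance
    (instead of m random draws), every class has at least two samples, and
    distances are Manhattan distances over range-normalised features.
    """
    n, p = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(p)]
    hi = [max(row[j] for row in X) for j in range(p)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(p)]  # constant gene: avoid /0

    def diff(j, a, b):
        # Difference on gene j, scaled into [0, 1].
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(p))

    w = [0.0] * p
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest same-class sample
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(p):
            # Reward genes that differ on the nearest miss, penalise those
            # that differ on the nearest hit.
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return w


def standardise(X, y):
    """Centre and scale every column to unit variance, and centre y."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    y_mu = sum(y) / n
    return [list(row) for row in zip(*cols)], [v - y_mu for v in y]


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_select(X, y, lam, n_iter=200):
    """Indices of genes whose LASSO coefficient is non-zero.

    Minimises (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent on
    standardised data. The class labels are used directly as a numeric target
    (a simplifying assumption of this sketch).
    """
    Xs, yc = standardise(X, y)
    n, p = len(Xs), len(Xs[0])
    b = [0.0] * p
    r = list(yc)  # residual y - Xb; b starts at zero
    z = [sum(Xs[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant gene: coefficient stays at zero
            # Correlation of gene j with the partial residual (b_j added back).
            rho = sum(Xs[i][j] * (r[i] + Xs[i][j] * b[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / z[j]
            for i in range(n):
                r[i] -= Xs[i][j] * (new_b - b[j])
            b[j] = new_b
    return [j for j in range(p) if b[j] != 0.0]


if __name__ == "__main__":
    # Hypothetical toy data: gene 0 separates the classes, genes 1 and 2 are noise.
    X = [[1.0, 0.2, 0.5], [0.9, 0.8, 0.1], [1.1, 0.5, 0.9],
         [-1.0, 0.3, 0.4], [-0.9, 0.7, 0.2], [-1.1, 0.6, 0.8]]
    y = [1, 1, 1, -1, -1, -1]
    w = relief_weights(X, y)
    print(max(range(len(w)), key=lambda j: w[j]))  # 0
    print(lasso_select(X, y, lam=0.1))  # [0]
```

Either selector yields a reduced gene set that would then be passed to the classifiers; scikit-learn, for instance, provides `SVC`, `MLPClassifier`, and `RandomForestClassifier` for that stage. A larger `lam` drives more LASSO coefficients to exactly zero and so keeps fewer genes, while Relief produces a ranking that has to be cut at a chosen number of top genes. The explicit loops favour readability; on real microarray data with thousands of genes, a vectorised implementation would be far faster.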

Primary Language: English
Subjects: Engineering
Journal Section: Articles
Authors

ORCID: 0000-0002-9501-2068
Author: Kıvanç GÜÇKIRAN (Primary Author)
Institution: Department of Electronics and Communication Engineering
Country: Turkey

ORCID: 0000-0003-0690-1873
Author: İsmail CANTÜRK
Institution: Department of Electronics and Communication Engineering
Country: Turkey

ORCID: 0000-0001-9720-9852
Author: Lale ÖZYILMAZ
Institution: Department of Electronics and Communication Engineering
Country: Turkey


Dates

Publication Date: April 1, 2019

BibTeX
@article{sdufenbed453462,
  author = {Güçkıran, Kıvanç and Cantürk, İsmail and Özyılmaz, Lale},
  title = {DNA Microarray Gene Expression Data Classification Using SVM, MLP, and RF with Feature Selection Methods Relief and LASSO},
  journal = {Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi},
  publisher = {Süleyman Demirel University},
  year = {2019},
  volume = {23},
  number = {1},
  pages = {126--132},
  eissn = {1308-6529},
  doi = {10.19113/sdufenbed.453462},
  url = {https://dergipark.org.tr/en/pub/sdufenbed/issue/39838/453462}
}