Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması

Fırat İsmailoğlu

doi:10.29130/dubited.886353

Research Article

Year 2021, Volume: 9, Issue: 5, pp. 1739-1753, 31.10.2021

Abstract

Because cancerous tissues are heterogeneous by nature, many cancers have subtypes, and cancer treatment cannot hit its target unless these subtypes are identified. With advances in microarray gene technology and data technology, identifying cancer subtypes by applying machine learning to microarray gene expression data from cancerous tissues has become widespread in recent years. The main difficulty is that each gene corresponds to one feature in the data set, which gives rise to the high-dimensionality problem. In this study, three metric learning methods (LMNN, ITML, and NCA) were each used separately to map microarray gene data sets of various cancer types into reduced-dimensional spaces. In this way, unlike classical dimensionality reduction methods such as PCA, samples of the same class (cancer subtype) are pulled closer together in the reduced space, while samples of different classes are pushed apart. Visualizing the reduced spaces with t-SNE confirmed that the classes separate from one another. In addition, to show that classifiers perform better in these new spaces, instance-based classification algorithms such as k-NN, nearest centroid, and LVQ were run, and their performance in identifying cancer types was observed to improve by up to about 30% relative to the original space.
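The idea shared by the three methods named above can be written compactly. The notation here ($L$, $M$, $k$, $d$) is introduced only for illustration and does not come from the abstract; it is the standard Mahalanobis form used in the metric learning literature, and each method differs in how it fits $M$ or $L$ to the class labels.

```latex
% Learned distance between two samples x, y in R^d (d = number of genes)
d_M(x, y) = \sqrt{(x - y)^\top M \,(x - y)}, \qquad M \succeq 0 .

% Factoring M = L^T L turns the learned distance into an ordinary
% Euclidean distance after the linear map x -> Lx
M = L^\top L
\;\Longrightarrow\;
d_M(x, y) = \lVert Lx - Ly \rVert_2 , \qquad L \in \mathbb{R}^{k \times d} .
```

When $k < d$, the map $x \mapsto Lx$ also reduces the dimension. Classifiers that rely on Euclidean distance, such as k-NN or nearest centroid, can then be run unchanged on the transformed samples.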

Keywords

Microarray Gene Data, Cancer Classification, Metric Learning, Instance-Based Classification

Classifying Microarray Gene Data of Various Cancerous Tissues Using Metric Learning

Abstract

Due to the heterogeneous structure of cancerous tissues, cancers have several subclasses. Unless the subclasses are detected, cancer treatment cannot be carried out accurately. With the advent of microarray gene technology and data science, machine learning methods that use the microarray gene expression data of cancerous tissues to classify cancer subclasses have become increasingly popular. However, because there is one feature for each gene, the curse of dimensionality arises. In the present study, the microarray gene expression data of various cancer types were transferred to dimensionality-reduced spaces by means of three metric learning methods: LMNN, ITML, and NCA. As a result, and unlike with conventional dimensionality reduction methods such as PCA, instances of the same class come closer together in the reduced space, while those from different classes lie far from each other. To verify this, the dimensionality-reduced spaces were visualized with the t-SNE method. Additionally, to show that classification algorithms perform better in these new spaces, instance-based classifiers (k-NN, the nearest mean classifier, and LVQ) were built, and their performance was observed to increase by up to 30% compared with their performance in the original space.
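The workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for a real microarray set. It covers only NCA, because LMNN and ITML are not included in scikit-learn. The sample count, feature count, number of components, and k are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for microarray data (assumption): few samples, many features.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=20,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Baseline: k-NN in the original (high-dimensional) space.
knn_orig = KNeighborsClassifier(n_neighbors=3).fit(X_train_s, y_train)
acc_orig = knn_orig.score(X_test_s, y_test)

# Learn a supervised linear map to 10 dimensions with NCA,
# then run the same k-NN classifier in the learned space.
nca = NeighborhoodComponentsAnalysis(n_components=10, random_state=0)
Z_train = nca.fit_transform(X_train_s, y_train)
Z_test = nca.transform(X_test_s)
knn_nca = KNeighborsClassifier(n_neighbors=3).fit(Z_train, y_train)
acc_nca = knn_nca.score(Z_test, y_test)

print(f"k-NN accuracy, original space: {acc_orig:.3f}")
print(f"k-NN accuracy, NCA space:      {acc_nca:.3f}")
```

The accuracies printed for synthetic data say nothing about the reported results. On real data the comparison would typically be repeated under cross-validation and with the other instance-based classifiers mentioned above.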

Keywords

Cancer Classification, Metric Learning, Microarray Gene Expressions, Instance-based Classification

The article lists 26 references in total.

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	Fırat İsmailoğlu (ORCID: 0000-0002-6680-7291)
Publication Date	31 October 2021
Published in Issue	Year 2021, Volume: 9, Issue: 5

Cite

APA	İsmailoğlu, F. (2021). Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması. Duzce University Journal of Science and Technology, 9(5), 1739-1753. https://doi.org/10.29130/dubited.886353
AMA	İsmailoğlu F. Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması. DÜBİTED. October 2021;9(5):1739-1753. doi:10.29130/dubited.886353
Chicago	İsmailoğlu, Fırat. “Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması”. Duzce University Journal of Science and Technology 9, no. 5 (October 2021): 1739-53. https://doi.org/10.29130/dubited.886353.
EndNote	İsmailoğlu F (01 October 2021) Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması. Duzce University Journal of Science and Technology 9 5 1739–1753.
IEEE	F. İsmailoğlu, “Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması”, DÜBİTED, vol. 9, no. 5, pp. 1739–1753, 2021, doi: 10.29130/dubited.886353.
ISNAD	İsmailoğlu, Fırat. “Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması”. Duzce University Journal of Science and Technology 9/5 (October 2021), 1739-1753. https://doi.org/10.29130/dubited.886353.
JAMA	İsmailoğlu F. Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması. DÜBİTED. 2021;9:1739–1753.
MLA	İsmailoğlu, Fırat. “Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması”. Duzce University Journal of Science and Technology, vol. 9, no. 5, 2021, pp. 1739-53, doi:10.29130/dubited.886353.
Vancouver	İsmailoğlu F. Metrik Öğrenmesi Kullanarak Çeşitli Kanser Dokularına Ait Mikro Dizi Gen Verilerinin Sınıflandırılması. DÜBİTED. 2021;9(5):1739-53.
