A Weakly Supervised Clustering Method for Cancer Subgroup Identification

Duygu Ozcelik; Öznur Taştan

doi:10.17694/bajece.1033807


Abstract

Identifying subgroups of cancer patients is important because it opens up possibilities for targeted therapeutics. A widely applied approach is to group patients with unsupervised clustering techniques based on molecular data from tumor samples. The resulting patient clusters are considered of interest if they can be associated with a clinical outcome variable, such as patient survival. However, these clinical variables of interest do not participate in the clustering decisions. We propose WSURFC (Weakly Supervised Random Forest Clustering), an approach in which the clustering process is weakly supervised with a clinical variable of interest. The supervision step is handled by learning a similarity metric with features that are selected to predict this clinical variable. More specifically, WSURFC first trains a random forest classifier to predict the clinical variable, in this case the survival class. Subsequently, the internal nodes of the trees are used to derive a random forest similarity metric between pairs of samples. In this way, the clustering step utilizes the nonlinear subspace of the original features learned in the classification step. We first demonstrate WSURFC on handwritten digit datasets, where it is able to capture salient structural similarities between digit pairs. Next, we apply WSURFC to find breast cancer subtypes using mRNA, protein, and microRNA expression as features. Our results on breast cancer show that WSURFC could identify interesting patient subgroups more effectively than widely adopted methods.
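
The similarity step described above can be illustrated with a minimal sketch. It assumes the classic random forest proximity (the fraction of trees in which two samples reach the same leaf), which is a simpler stand-in and not necessarily the internal-node metric that WSURFC uses. The function names and the toy leaf table are hypothetical; in practice the leaf indices could come from a forest trained on the survival class, for example through scikit-learn's RandomForestClassifier.apply.

```python
from typing import List, Sequence


def forest_proximity(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of trees in which each pair of samples shares a leaf.

    leaf_ids[i][t] is the leaf index that sample i reaches in tree t
    (an assumed input format, not taken from the paper).
    """
    n_samples = len(leaf_ids)
    if n_samples == 0:
        return []
    n_trees = len(leaf_ids[0])
    if n_trees == 0:
        raise ValueError("each sample needs a leaf index for at least one tree")

    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            # Count the trees in which samples i and j land in the same leaf.
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def proximity_to_distance(prox: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn similarities in [0, 1] into distances usable by a clustering step."""
    return [[1.0 - p for p in row] for row in prox]


# Toy leaf assignments: 4 samples, 3 trees (made-up values for illustration).
toy_leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [5, 4, 3],
    [5, 4, 1],
]
proximity = forest_proximity(toy_leaves)
distance = proximity_to_distance(proximity)
```

The resulting distance matrix could then be handed to any clustering algorithm that accepts precomputed pairwise distances, such as hierarchical clustering. Because the forest is trained to predict the clinical variable, samples end up close only when they agree on the features the classifier found predictive.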


Details

Primary Language

English

Subjects

Artificial Intelligence

Section

Research Article

Authors

Duygu Ozcelik
0000-0001-8980-6200
Türkiye

Öznur Taştan*
0000-0001-7058-5372
Türkiye

Publication Date

April 30, 2022

Submission Date

December 7, 2021

Acceptance Date

March 5, 2022

Published in Issue

Year 2022 Volume: 10 Issue: 2

DOI

https://doi.org/10.17694/bajece.1033807

IZ

https://izlik.org/JA24WF55AN

Cite

APA

Ozcelik, D., & Taştan, Ö. (2022). A Weakly Supervised Clustering Method for Cancer Subgroup Identification. Balkan Journal of Electrical and Computer Engineering, 10(2), 178-186. https://doi.org/10.17694/bajece.1033807

AMA

1. Ozcelik D, Taştan Ö. A Weakly Supervised Clustering Method for Cancer Subgroup Identification. Balkan Journal of Electrical and Computer Engineering. 2022;10(2):178-186. doi:10.17694/bajece.1033807

Chicago

Ozcelik, Duygu, and Öznur Taştan. 2022. “A Weakly Supervised Clustering Method for Cancer Subgroup Identification”. Balkan Journal of Electrical and Computer Engineering 10 (2): 178-86. https://doi.org/10.17694/bajece.1033807.

EndNote

Ozcelik D, Taştan Ö (01 April 2022) A Weakly Supervised Clustering Method for Cancer Subgroup Identification. Balkan Journal of Electrical and Computer Engineering 10 2 178–186.

IEEE

[1] D. Ozcelik and Ö. Taştan, “A Weakly Supervised Clustering Method for Cancer Subgroup Identification”, Balkan Journal of Electrical and Computer Engineering, vol. 10, no. 2, pp. 178–186, Apr. 2022, doi: 10.17694/bajece.1033807.

ISNAD

Ozcelik, Duygu - Taştan, Öznur. “A Weakly Supervised Clustering Method for Cancer Subgroup Identification”. Balkan Journal of Electrical and Computer Engineering 10/2 (01 April 2022): 178-186. https://doi.org/10.17694/bajece.1033807.

JAMA

1. Ozcelik D, Taştan Ö. A Weakly Supervised Clustering Method for Cancer Subgroup Identification. Balkan Journal of Electrical and Computer Engineering. 2022;10:178–186.

MLA

Ozcelik, Duygu, and Öznur Taştan. “A Weakly Supervised Clustering Method for Cancer Subgroup Identification”. Balkan Journal of Electrical and Computer Engineering, vol. 10, no. 2, April 2022, pp. 178-86, doi:10.17694/bajece.1033807.

Vancouver

1. Duygu Ozcelik, Öznur Taştan. A Weakly Supervised Clustering Method for Cancer Subgroup Identification. Balkan Journal of Electrical and Computer Engineering. 01 April 2022;10(2):178-86. doi:10.17694/bajece.1033807