
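The Success of Logistic Regression with Feature Reduction Techniques in the Classification of Gene Sequences (Lojistik Regresyonun Özellik Azaltma Teknikleri ile Gen Dizilimlerinin Sınıflandırılmasındaki Başarısı)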

Year 2015, Volume: 8 Issue: 1, 1 - 12, 24.06.2016

Abstract

The classification of gene sequences plays a very important role in predicting and diagnosing diseases. Because effective classification over the entire gene sequence is not possible, it is important to use feature reduction algorithms to single out the genes (features) that carry the information needed for reliable classification. This study analyzes different methods for reducing features, such as heuristic search techniques and feature reduction approaches (filter, wrapper, etc.), so that the pre-processing step can be carried out more effectively and the resulting data sets can in turn be classified more effectively with powerful classification tools such as LR (Logistic Regression) and SVM (Support Vector Machines). With feature reduction methods, the LR classifier, which is regarded as a powerful classifier in machine learning, has become as valid and effective a classification tool as SVM for the classification of gene sequences.
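As a rough illustration of the filter approach named in the abstract, the sketch below scores each gene independently of any classifier and keeps the top k. The signal-to-noise score, the function names `snr_score` and `filter_select`, and the synthetic data are all assumptions made for illustration; the abstract does not state which filter criterion or data sets the authors used.

```python
import math
import random

def snr_score(values, labels):
    """Filter score for one gene: |mean0 - mean1| / (std0 + std1)."""
    groups = {0: [], 1: []}
    for v, y in zip(values, labels):
        groups[y].append(v)
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    stds = {
        c: math.sqrt(sum((v - means[c]) ** 2 for v in g) / len(g))
        for c, g in groups.items()
    }
    # Small constant guards against division by zero for constant genes.
    return abs(means[0] - means[1]) / (stds[0] + stds[1] + 1e-12)

def filter_select(X, labels, k):
    """Return the (sorted) column indices of the k highest-scoring genes."""
    n_genes = len(X[0])
    scores = [snr_score([row[j] for row in X], labels) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Synthetic data (an assumption, not the paper's data): 40 samples, 50 genes,
# of which only genes 0, 1 and 2 are shifted by 4 units in class 1.
random.seed(0)
labels = [i % 2 for i in range(40)]
X = [
    [random.gauss(4.0 * y if j < 3 else 0.0, 1.0) for j in range(50)]
    for y in labels
]

selected = filter_select(X, labels, k=3)
print(selected)
```

Because the score never consults a classifier, this kind of ranking is cheap on thousands of genes, but it evaluates each gene in isolation and so cannot detect redundancy between genes.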


The Success of Logistic Regression with Feature Reduction Techniques on Microarray Gene Classification


Abstract

DNA microarray classification is important because the discovery of genes that are differentially expressed between normal and diseased patients is a central research problem in bioinformatics. Not all of the genes used in the expression profile are informative, and many of them are redundant. A pre-processing step that reduces the number of genes by feature selection while still retaining the best class prediction accuracy for the classifier is crucial for precise tumor classification. In this study, the class prediction accuracy of two classifiers, LR (Logistic Regression) and SVM (Support Vector Machines), was compared using the best genes selected by wrapper and filter techniques that rely on heuristic search methods. We conclude that LR combined with heuristic-search-based feature selection is as efficient as SVM for microarray gene prediction.
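To make the wrapper idea concrete, here is a minimal sketch in which a greedy forward search (one simple heuristic search) adds genes one at a time and scores each candidate subset by the validation accuracy of a logistic regression trained on it. Everything specific here is an assumption for illustration: the plain gradient-descent LR, the train/validation split, the stopping rule, the function names, and the synthetic data. The abstract does not specify the authors' search strategy or tooling.

```python
import math
import random

def train_lr(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            z = max(-30.0, min(30.0, z))  # keep exp() well inside float range
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err
            for j, xi in enumerate(row):
                grad_w[j] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted class (score > 0) matches the label."""
    hits = sum(
        ((b + sum(wi * xi for wi, xi in zip(w, row))) > 0) == (target == 1)
        for row, target in zip(X, y)
    )
    return hits / len(y)

def columns(X, cols):
    """Keep only the listed gene columns of X."""
    return [[row[j] for j in cols] for row in X]

def wrapper_forward_select(X_tr, y_tr, X_va, y_va, max_genes):
    """Greedy forward search scored by the validation accuracy of LR."""
    n_genes = len(X_tr[0])
    chosen, best_acc = [], 0.0
    while len(chosen) < min(max_genes, n_genes):
        trials = []
        for j in range(n_genes):
            if j in chosen:
                continue
            cols = chosen + [j]
            w, b = train_lr(columns(X_tr, cols), y_tr)
            # Ties on accuracy are broken in favour of the lower gene index.
            trials.append((accuracy(w, b, columns(X_va, cols), y_va), -j, j))
        acc, _, gene = max(trials)
        if acc <= best_acc:  # stop when no candidate improves the score
            break
        chosen.append(gene)
        best_acc = acc
    return chosen, best_acc

# Synthetic data (an assumption): 80 samples, 8 genes; only genes 0 and 1
# carry class information (mean -2 for class 0, +2 for class 1).
random.seed(1)
y = [i % 2 for i in range(80)]
X = [
    [random.gauss((2.0 if label else -2.0) if j < 2 else 0.0, 1.0) for j in range(8)]
    for label in y
]
chosen, best_acc = wrapper_forward_select(X[:40], y[:40], X[40:], y[40:], max_genes=3)
print(chosen, best_acc)
```

Unlike a filter, the wrapper retrains the classifier for every candidate subset, which is why it is usually applied after a cheaper step has already cut the gene pool down to a manageable size.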

There are 33 citations in total.

Details

Other ID JA37NE27NJ
Journal Section Articles (Research)
Authors

Yeliz Yengi

Sevinç İlhan Omurca

Publication Date June 24, 2016
Published in Issue Year 2015 Volume: 8 Issue: 1

Cite

APA Yengi, Y., & İlhan Omurca, S. (2016). Lojistik Regresyonun Özellik Azaltma Teknikleri ile Gen Dizilimlerinin Sınıflandırılmasındaki Başarısı. Türkiye Bilişim Vakfı Bilgisayar Bilimleri Ve Mühendisliği Dergisi, 8(1), 1-12.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.