A New Approach to Dimension Reduction for Microbiota Data (Mikrobiyota Verileri İçin Boyut İndirgemede Yeni Bir Yaklaşım)

Handan Ankaralı; Süleyman Yıldırım; Nurgül Bulut

Research Article

Year 2021, Volume: 4 Issue: 1, 23 - 30, 15.01.2021

Abstract

Microorganisms associated with the human skin, the nasopharyngeal and oral cavities, the vaginal tract, and the gastrointestinal system make up the human microbiota. The microbiota strongly influences the physiological, metabolic, and immune systems, and its association with many diseases has been demonstrated. Recent advances in DNA sequencing technology have made it easier to profile microbial communities through high-throughput sequencing of amplicons of marker genes such as 16S rRNA for bacteria, 18S rRNA, or ITS. The resulting data consist of frequency values for a very large number of microbiota species and contain an abundance of zeros. For high-dimensional data such as microbiota data to be analyzed with various statistical models, species that make no meaningful contribution to the result must be removed from the analysis at the preprocessing stage. In the statistical literature, this procedure is called dimension reduction or variable elimination.
In this study, a new approach is proposed for dimension reduction in high-dimensional frequency-type data sets that contain many zero values. For this purpose, univariate tests, a zero-inflated negative binomial model, classification and regression trees, and a variable selection algorithm were used.
The proposed approach was tested on microbiota genera obtained from Parkinson’s disease patients, individuals with early dementia, and control individuals. The 19 candidate genera selected from among 199 bacterial genera turned out to be genera that are also highlighted clinically in many studies. To evaluate how well the selected candidate genera perform in disease diagnosis, a multiple logistic regression model was fitted with a further round of stepwise variable elimination; with this model, the patient and control groups were successfully discriminated using only a few bacterial genera.
The new hybrid approach proposed in this study makes it possible to carry forward into the data analysis the variables identified by the joint decision of several methods. Similar approaches can be tried with different methods and used on different data types.

Keywords

Zero-inflated models, Frequency data, Classification and regression trees, Variable selection algorithms, Microbiota, Parkinson’s disease

A New Approach to Dimension Reduction for Microbiota Data

Abstract

Microorganisms associated with human skin, the nasopharyngeal and oral cavities, the vaginal tract, and the gastrointestinal system make up the human microbiota. The microbiota strongly influences physiological, metabolic, and immune functions and has been shown to be associated with many diseases. Recent advances in DNA sequencing technology have facilitated the profiling of these microbial communities through high-throughput sequencing of amplicons of marker genes such as 16S rRNA for bacteria, 18S rRNA, or ITS. Data generated by such sequencing efforts are preprocessed into compositions or relative abundances, which are often presented in species abundance (OTU/ASV) tables. The resulting data consist of frequencies for a very large number of microbiota species and contain a large proportion of zero values. The high-dimensional data in such tables must therefore be treated with dimension reduction techniques before sensible conclusions can be drawn. In the statistical literature, this process is called dimension reduction or variable selection.
The aim of this study is to propose a novel approach to dimension reduction for microbiota data, which are high-dimensional, inherently zero-inflated, and of frequency (count) type. For this purpose, univariate tests, a zero-inflated negative binomial model, classification and regression trees, and a feature selection and variable screening algorithm were used. Using these four methods together enabled us to select the most important features of the microbiota dataset for subsequent downstream analyses.
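For reference, one common parameterization of the zero-inflated negative binomial distribution named above is sketched here. The symbols are assumptions introduced for illustration and do not appear in the abstract: \pi is the probability of a structural zero, \mu is the mean of the count component, and \theta is its dispersion parameter. The abstract does not state which parameterization or covariates the study used.

```latex
P(Y = 0) = \pi + (1 - \pi)\left(\frac{\theta}{\theta + \mu}\right)^{\theta},
\qquad
P(Y = y) = (1 - \pi)\,
  \frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!}
  \left(\frac{\theta}{\theta + \mu}\right)^{\theta}
  \left(\frac{\mu}{\theta + \mu}\right)^{y},
\quad y = 1, 2, \dots
```

Under this form, zeros arise both from the point mass \pi and from the negative binomial component itself, which is why such a model suits count tables with many zeros.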
We tested this approach on a microbiota dataset that we recently generated from stool samples of a cohort of Parkinson’s disease patients. From 199 bacterial genera, the approach selected 19 candidate biomarker genera, which are often implicated in critical metabolic activities in the human body, such as the production of short-chain fatty acids. To assess the potential of these candidate biomarkers to differentiate disease and healthy states, we developed a multiple logistic regression model and further screened the candidates by stepwise variable selection.
Big data analysis necessarily entails the use of increasingly sophisticated and combinatorial methods. Here we demonstrated that a previously untested combination of feature selection methods enables more useful predictive models. Similar approaches can be tried with different methods and used on different data types.
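The idea of combining several feature selection methods can be illustrated with a minimal voting sketch. This is not the authors' procedure: the abstract does not say how the four methods' results were combined, so the rule used here (keep a genus if at least `min_votes` methods selected it), the function name `consensus_features`, the method labels, and the genus names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_features(selections: Dict[str, Iterable[str]], min_votes: int) -> List[str]:
    """Return features chosen by at least `min_votes` methods.

    The result is ordered by descending vote count, then by name.
    The voting rule is an assumption made for illustration only.
    """
    votes: Counter = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each method votes at most once per feature
    kept = [feature for feature, count in votes.items() if count >= min_votes]
    return sorted(kept, key=lambda feature: (-votes[feature], feature))


# Hypothetical per-method selections; the genus names are placeholders,
# not results reported in the study.
selections = {
    "univariate_tests": {"genus_A", "genus_B", "genus_C"},
    "zinb_model": {"genus_A", "genus_C", "genus_D"},
    "cart": {"genus_A", "genus_B", "genus_D"},
    "screening_algorithm": {"genus_A", "genus_C"},
}

candidates = consensus_features(selections, min_votes=3)
print(candidates)  # ['genus_A', 'genus_C']
```

A stricter or looser threshold trades the number of retained candidates against agreement between methods; the retained set would then feed a downstream model such as the multiple logistic regression described above.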

Keywords

Zero-inflated models, Frequency data, Classification and regression trees, Variable screening algorithm, Microbiota, Parkinson’s disease

Details

Primary Language	Turkish
Journal Section	Articles
Authors	Handan Ankaralı 0000-0002-3613-0523; Süleyman Yıldırım 0000-0002-2752-1223; Nurgül Bulut 0000-0002-7247-6302
Publication Date	January 15, 2021
Published in Issue	Year 2021 Volume: 4 Issue: 1

Cite

APA	Ankaralı, H., Yıldırım, S., & Bulut, N. (2021). Mikrobiyota Verileri İçin Boyut İndirgemede Yeni Bir Yaklaşım. Veri Bilimi, 4(1), 23-30.