K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data

Polina Lemenkova

Research Article

Year 2019, Volume: 2 Issue: 1, 1 - 26, 23.09.2019

Abstract

This paper applies cluster analysis with the k-means algorithm in R.
The study assesses the similarity of sampling data derived from a GIS
project by the homogeneity of their attribute parameters. The aim is to
identify clusters of similar observations across several groups of
parameters: geology (location on the tectonic plates, sediment
thickness, igneous volcanic areas), bathymetry (depth ranges) and
geomorphology (slope steepness and aspect). The geological case study
is the Mariana Trench. Clustering is an effective statistical method
for detecting similar groups in a data set. The main R libraries used
are {cluster}, {factoextra} and {ggplot2}; the minor ones are
{wordcloud} and {tm}. Cluster numbers from 2 to 7 were tested, and the
optimal number is 5. The findings, illustrated by 8 figures, include
the following computed and visualized results: 1) a correlation matrix
showing cross-correlations among the combinations of factors; 2) a
comparison of factor pairs revealing pairwise correlation; 3) a
pairwise comparative analysis showing how the variables influence each
other as bi-factors: slope angles change in parallel with decreasing
sediment thickness; 4) the location of the volcanic igneous areas
causes a cyclic repetition in the curve of the slope angles, and the
volcanic zones correlate with slope angle and aspect degree. The
findings reveal that four variables affect the geomorphology of the
trench: slope angle, sediment thickness, aspect degree and volcanic
igneous areas. The paper includes 7 listings of R code so that the
algorithms can be reproduced in similar research.
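
The workflow outlined in the abstract (a correlation matrix, then k-means runs for 2 to 7 clusters) can be sketched roughly as follows. This is a minimal illustration and not one of the paper's listings: the data are simulated, the column names and value ranges are hypothetical, and only base R functions (`cor()`, `scale()`, `kmeans()`) are used so that the block runs without the {cluster} or {factoextra} libraries.

```r
# Minimal sketch: correlation matrix and k-means for k = 2..7.
# ASSUMPTION: synthetic data; column names and ranges are hypothetical,
# chosen only to resemble the variables named in the abstract.
set.seed(42)  # reproducible simulated data and k-means starts

n <- 250  # hypothetical number of sampling points
obs <- data.frame(
  slope_angle    = runif(n, 0, 60),         # degrees
  sediment_thick = rexp(n, rate = 1 / 100), # metres
  aspect_degree  = runif(n, 0, 360),        # degrees
  depth          = -runif(n, 2000, 11000)   # metres below sea level
)

# Pairwise Pearson correlations between the variables
corr_matrix <- cor(obs)

# Standardize each variable so none dominates the Euclidean distance
obs_scaled <- scale(obs)

# Fit k-means for each candidate number of clusters and record the
# total within-cluster sum of squares
k_values <- 2:7
fits <- lapply(k_values, function(k) {
  kmeans(obs_scaled, centers = k, nstart = 25, iter.max = 50)
})
wss <- vapply(fits, function(fit) fit$tot.withinss, numeric(1))
names(wss) <- k_values

print(round(corr_matrix, 2))
print(round(wss, 1))

# Cluster sizes for k = 5, the number the abstract reports as optimal
fit5 <- fits[[which(k_values == 5)]]
print(table(fit5$cluster))
```

Plotting `wss` against `k_values` gives an elbow-style view for comparing cluster numbers. If {factoextra} is installed, `fviz_nbclust(obs_scaled, kmeans, method = "wss")` should draw a comparable plot; whether the paper selected 5 clusters by this criterion is not stated in the abstract.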

Keywords

R, programming language, statistics, geospatial data, k-means clustering, cluster analysis, data grouping, marine geology

Details

Primary Language	English
Subjects	Software Engineering (Other)
Journal Section	Articles
Authors	Polina Lemenkova 0000-0002-5759-1089
Publication Date	September 23, 2019
Acceptance Date	August 3, 2019
Published in Issue	Year 2019 Volume: 2 Issue: 1

Cite

APA	Lemenkova, P. (2019). K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data. International Journal of Informatics and Applied Mathematics, 2(1), 1-26.
AMA	Lemenkova P. K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data. IJIAM. September 2019;2(1):1-26.
Chicago	Lemenkova, Polina. “K-Means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data”. International Journal of Informatics and Applied Mathematics 2, no. 1 (September 2019): 1-26.
EndNote	Lemenkova P (September 1, 2019) K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data. International Journal of Informatics and Applied Mathematics 2 1 1–26.
IEEE	P. Lemenkova, “K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data”, IJIAM, vol. 2, no. 1, pp. 1–26, 2019.
ISNAD	Lemenkova, Polina. “K-Means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data”. International Journal of Informatics and Applied Mathematics 2/1 (September 2019), 1-26.
JAMA	Lemenkova P. K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data. IJIAM. 2019;2:1–26.
MLA	Lemenkova, Polina. “K-Means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data”. International Journal of Informatics and Applied Mathematics, vol. 2, no. 1, 2019, pp. 1-26.
Vancouver	Lemenkova P. K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data. IJIAM. 2019;2(1):1-26.

International Journal of Informatics and Applied Mathematics