Analyzing huge amounts of data and detecting anomalies in them
is a major challenge. On the one hand, we face the problem of storing
large volumes of data; on the other, we must process the data and
detect anomalies in reasonable time, or even in real time. Real-time analytics can be
defined as the capacity to use all available enterprise data and sources at the
moment they arrive or occur in the system. In this paper, we present an
infrastructure that we have implemented in order to analyze data from big log
files in real time. We also present algorithms used for anomaly
detection in big data, implemented in the R language. The main components
of the infrastructure are Redis, Logstash, Elasticsearch, the elastic-R client, and
Kibana. We explore the implementation of several filters that post-process
the log information and produce various statistics that suit our needs in
analyzing log files containing SQL queries from a large national system in
education. The post-processing of the SQL queries focuses mainly on
preparing the log information in an adequate format and on information extraction.
We also compare the anomaly detection algorithms to determine which of them
best suits our needs, and we add the elastic-R client to the infrastructure we
developed for big data analytics in order to detect anomalies. The purpose of
the analysis is to monitor performance and detect anomalies so that possible
problems can be prevented in real time.
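
The abstract does not name the anomaly detection algorithms that were compared, and the original implementation is in R. Purely as an illustration of the kind of check such a pipeline might run on extracted query statistics, the following Python sketch flags values that lie far from the median, measured in median absolute deviations (MAD). The function name `mad_anomalies`, the threshold of 3.5, and the sample durations are all assumptions made for this example and are not taken from the paper.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median, so no value can be scored.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical SQL query durations in milliseconds (not data from the paper).
durations_ms = [12, 15, 11, 14, 13, 250, 12, 16, 14, 13]
print(mad_anomalies(durations_ms))  # index of the 250 ms query
```

A median-based score is used here because a single extreme value barely shifts the median, whereas it can inflate a mean and standard deviation enough to hide itself.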
Keywords: Big data, anomaly detection algorithm, log data, Logstash, Elasticsearch, elastic-R client, Kibana
Journal Section | Articles |
---|---|
Authors | |
Publication Date | June 30, 2017 |
Published in Issue | Year 2017 |
PressAcademia Procedia (PAP) publishes proceedings of conferences, seminars and symposiums. PressAcademia Procedia aims to provide a source for academic researchers, practitioners and policy makers in the area of social and behavioral sciences, and engineering.