
## INFRASTRUCTURE WITH R PACKAGE FOR ANOMALY DETECTION IN REAL TIME BIG LOG DATA

#### Zirje Hasani

Analyzing huge amounts of data and detecting anomalies in them is a big challenge. On the one hand, we face the problem of storing a large amount of data; on the other, we must process it and detect anomalies in reasonable or even real time. Real-time analytics can be defined as the capacity to use all available enterprise data and sources at the moment they arrive or occur in the system. In this paper, we present an infrastructure that we have implemented to analyze data from big log files in real time. We also present algorithms used for anomaly detection in big data; the algorithms are implemented in the R language. The main components of the infrastructure are Redis, Logstash, Elasticsearch, the elastic-R client and Kibana. We explore the implementation of several filters that post-process the log information and produce various statistics suited to our needs in analyzing log files containing SQL queries from a big national education system. The post-processing of the SQL queries focuses mainly on putting the log information into an adequate format and on information extraction. The paper also compares the anomaly detection algorithms in order to conclude which of them is better suited to our needs. We add the elastic-R client to the infrastructure we developed for big data analytics in order to detect anomalies. The purpose of the analysis is to monitor performance and detect anomalies so that possible problems can be prevented in real time.

Keywords: Big data, anomaly detection algorithm, log data, Logstash, Elasticsearch, elastic-R client, Kibana
Hasani, Z. (2017). INFRASTRUCTURE WITH R PACKAGE FOR ANOMALY DETECTION IN REAL TIME BIG LOG DATA. PressAcademia Procedia, 5(1), 181-189. eISSN 2459-0762. https://doi.org/10.17261/Pressacademia.2017.588