Research Article

Log Analysis with Hadoop MapReduce

Year 2021, Volume: 1, Issue: 1, pp. 1-5, 30.06.2021

Abstract

Nearly every aspect of modern life generates data. Logs are records of events and system activities, created automatically by IT systems, and log analysis is the process of making sense of these records. Log data often grows quickly, and conventional database solutions fall short when dealing with large volumes of log files. Hadoop, which is widely used for Big Data analysis, offers a solution to this problem. In this study, Hadoop was installed on two virtual machines, and log files generated by a Python script were analyzed to evaluate system activities. The aim was to validate the importance of Hadoop in meeting the challenge of Big Data. The experiments show that analyzing logs with Hadoop MapReduce makes data processing and the detection of malfunctions and defects faster and simpler.
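As a rough illustration of the kind of analysis the abstract describes, the following Python sketch simulates a MapReduce job locally by counting log entries per severity level. The log line format, the set of level names, the sample data, and the function names (map_line, reduce_counts, run_job) are all assumptions made for this example; they are not taken from the study and should not be read as the authors' implementation.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical log lines. The "<date> <time> <LEVEL> <message>" layout is an
# assumption for illustration, not the format used in the study.
SAMPLE_LOGS = [
    "2021-06-30 10:00:01 INFO service started",
    "2021-06-30 10:00:05 ERROR disk quota exceeded",
    "2021-06-30 10:00:09 WARN slow response",
    "2021-06-30 10:00:12 ERROR connection refused",
    "malformed line",
]

# Assumed set of severity levels.
LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}


def map_line(line):
    """Map step: emit (level, 1) for each well-formed log line."""
    fields = line.split()
    if len(fields) >= 3 and fields[2] in LEVELS:
        yield fields[2], 1


def reduce_counts(level, counts):
    """Reduce step: sum all counts emitted for one level."""
    return level, sum(counts)


def run_job(lines):
    """Run map, sort (standing in for shuffle), and reduce in one process."""
    pairs = [pair for line in lines for pair in map_line(line)]
    pairs.sort(key=itemgetter(0))  # group identical keys next to each other
    return dict(
        reduce_counts(level, (count for _, count in group))
        for level, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    for level, total in sorted(run_job(SAMPLE_LOGS).items()):
        print(level, total)
```

On a real cluster the map and reduce steps would run as separate distributed tasks (for example through Hadoop Streaming, reading from standard input and writing to standard output); the single-process version here only shows the data flow.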



Details

Primary Language: English
Subjects: Computer Software
Journal Section: Research Articles
Authors

Gligor Risteski

Mihiri Chathurika

Beyza Ali

Atanas Hristov 0000-0003-2741-8370

Publication Date: June 30, 2021
Published in Issue: Year 2021, Volume: 1, Issue: 1

Cite

APA: Risteski, G., Chathurika, M., Ali, B., & Hristov, A. (2021). Log Analysis with Hadoop MapReduce. Journal of Emerging Computer Technologies, 1(1), 1-5.
Journal of Emerging Computer Technologies is indexed and abstracted by Index Copernicus, ROAD, Academia.edu, Google Scholar, Asos Index, Academic Resource Index (Researchbib), OpenAIRE, IAD, Cosmos, EuroPub, and Academindex.

Publisher
Izmir Academy Association