Log Analysis with Hadoop MapReduce
Year 2021, Volume 1, Issue 1, pp. 1-5, 30.06.2021
Gligor Risteski
Mihiri Chathurika
Beyza Ali
Atanas Hristov
Abstract
Almost every aspect of modern life now generates data. Logs are records of events and system activities, created automatically by IT systems, and log analysis is the process of making sense of these records. Log data often grows quickly, and conventional database solutions fall short when dealing with large volumes of log files. Hadoop, which is widely used for Big Data analysis, provides a solution to this problem. In this study, Hadoop was installed on two virtual machines, and log files generated by a Python script were analyzed in order to evaluate system activity. The aim was to validate the importance of Hadoop in meeting the challenges of Big Data. The experiments show that analyzing logs with Hadoop MapReduce makes data processing, as well as the detection of malfunctions and defects, faster and simpler.
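To make the map and reduce phases concrete, the following is a minimal single-process sketch in Python. It assumes log lines in the Apache Common Log Format and counts HTTP status codes, a simple way to surface failures such as 404 or 500 responses. The log format, the choice of status code as the key, and the function names `mapper`, `reducer`, and `run_job` are illustrative assumptions, not the method used in the study. On a real cluster, Hadoop would run the mappers and reducers on separate nodes and perform the shuffle-and-sort step itself; here a plain `sorted` call stands in for it.

```python
"""Single-process simulation of a MapReduce job that counts HTTP status codes.

Assumption: each line follows the Apache Common Log Format, e.g.
10.0.0.1 - - [30/Jun/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512
The function names are hypothetical and are not taken from the study.
"""
import re
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Captures the three-digit status code that follows the quoted request field.
LOG_PATTERN = re.compile(r'"[^"]*" (\d{3}) ')


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (status_code, 1) for every well-formed log line."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:  # malformed lines are skipped instead of failing the job
            yield match.group(1), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the counts for each key.

    As in Hadoop, the input must already be sorted by key so that all
    values for one key arrive consecutively.
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle/sort -> reduce on a single machine."""
    shuffled = sorted(mapper(lines))  # stands in for Hadoop's shuffle and sort
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [30/Jun/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
        '10.0.0.2 - - [30/Jun/2021:10:00:01 +0000] "GET /missing HTTP/1.1" 404 0',
        '10.0.0.3 - - [30/Jun/2021:10:00:02 +0000] "POST /login HTTP/1.1" 500 0',
        '10.0.0.1 - - [30/Jun/2021:10:00:03 +0000] "GET /about HTTP/1.1" 200 128',
    ]
    print(run_job(sample))  # → {'200': 2, '404': 1, '500': 1}
```

With Hadoop Streaming, the same mapper and reducer logic would typically be split into two scripts that read lines from standard input and write tab-separated key-value pairs to standard output, leaving distribution and sorting to the framework.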