Research Article

Big Data: Controlling Fraud by Using Machine Learning Libraries on Spark

Volume: 6 Number: 1 March 31, 2018
  • Ferhat Karataş *
  • Sevcan Aytaç Korkmaz

Abstract

Continuous change and the high computational volume in the distribution of network data have made it harder to analyze that data and to detect abnormal behavior within it. For this reason, big data solutions have gained importance. With the advance of internet technologies and the digital age, cyber-attacks have increased steadily. The k-means clustering algorithm is one of the most widely used algorithms in data mining. Clustering algorithms automatically divide data into smaller clusters or sub-clusters, placing statistically similar records in the same group. In this article, we used the k-means method from the machine learning libraries on Spark to determine whether incoming network records represent normal behavior. We used 400 thousand network records obtained from the KDD Cup 1999 data set. With the k-means method, we detected 10 abnormal behaviors among these 400 thousand records.
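
The abstract does not spell out how clustering leads to a list of abnormal records. One common reading is to fit k-means and then flag the records that lie farthest from their nearest cluster center. The following is a minimal pure-Python sketch of that idea, not the authors' Spark code: the function names, the evenly spaced choice of initial centers, and the distance-based ranking are all assumptions made for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means over a list of numeric tuples.

    Assumption: initial centers are taken at evenly spaced positions in the
    input list so that the result is reproducible. This is a simplification
    chosen for the sketch, not how a production library initializes.
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def anomaly_scores(points, centroids):
    """Distance from each point to its nearest center (larger = more unusual)."""
    return [math.sqrt(min(squared_distance(p, c) for c in centroids)) for p in points]


def top_anomalies(points, centroids, n):
    """Return the n points with the largest distance to their nearest center."""
    scores = anomaly_scores(points, centroids)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return [points[i] for i in ranked[:n]]
```

Under this reading, a call such as `top_anomalies(records, kmeans(records, k), 10)` would return the ten records that fit their clusters worst, where `records` is a hypothetical list of numeric feature tuples. On a cluster, the same two steps would be carried out with Spark's own k-means implementation rather than the loops above.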

Keywords

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Authors

Ferhat Karataş *

Sevcan Aytaç Korkmaz

Publication Date

March 31, 2018

Submission Date

February 10, 2018

Acceptance Date

-

Published in Issue

Year 2018 Volume: 6 Number: 1

APA
Karataş, F., & Korkmaz, S. A. (2018). Big Data: Controlling Fraud by Using Machine Learning Libraries on Spark. International Journal of Applied Mathematics Electronics and Computers, 6(1), 1-5. https://doi.org/10.18100/ijamec.2018138629