The K-Nearest Neighbor (K-NN) classifier is a well-known and widely applied
method in data mining. However, its high computational and memory costs make
the classical K-NN infeasible for today's Big Data analysis applications. To
overcome the cost drawbacks of established data mining methods, several
distributed computing alternatives have emerged, and among them the Hadoop
MapReduce ecosystem has attracted significant attention. Recently, several
distributed K-NN-based classification algorithms have been proposed that were
tested in Hadoop environments and are suited to emerging data analysis needs.
In this work, a new distributed algorithm, Z-KNN, is proposed. It improves the
classification accuracy of K-NN by exploiting the representativeness of the
instances belonging to different data classes. The proposed algorithm relies on
class representations derived, for each class, from the Z instances of that
class that are closest to the test instance. Z-KNN was tested on a physical
Hadoop cluster using several real datasets from different application areas.
The results of extensive experiments, presented in this paper, show that Z-KNN
is a competitive alternative to other recently proposed approaches in the
literature.
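The abstract does not state how the per-class representation is built or how the work is divided across the cluster. The Python sketch below is therefore only an illustration under stated assumptions: it uses a local-mean rule (each class is represented by the centroid of its Z nearest instances, and the test instance goes to the class whose centroid is closest), and it imitates the Hadoop split with plain functions. The names `map_partition`, `reduce_partials`, and `z_knn_predict` are hypothetical, and the local-mean rule is a stand-in, not necessarily the authors' Z-KNN decision rule.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, str]       # (feature vector, class label)
Candidate = Tuple[float, Point]   # (distance to query, feature vector)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_partition(partition: List[Labeled], query: Point, z: int) -> Dict[str, List[Candidate]]:
    """Hypothetical mapper: keep the z closest instances of each class within one data split."""
    by_class: Dict[str, List[Candidate]] = defaultdict(list)
    for x, label in partition:
        by_class[label].append((euclidean(x, query), x))
    return {label: heapq.nsmallest(z, c, key=lambda t: t[0]) for label, c in by_class.items()}


def reduce_partials(partials: List[Dict[str, List[Candidate]]], z: int) -> Dict[str, List[Candidate]]:
    """Hypothetical reducer: merge per-split candidates, keeping the global z closest per class."""
    merged: Dict[str, List[Candidate]] = defaultdict(list)
    for partial in partials:
        for label, cands in partial.items():
            merged[label].extend(cands)
    return {label: heapq.nsmallest(z, c, key=lambda t: t[0]) for label, c in merged.items()}


def z_knn_predict(partitions: List[List[Labeled]], query: Point, z: int) -> Optional[str]:
    """Assumed local-mean rule: choose the class whose centroid of its z nearest instances is closest."""
    nearest = reduce_partials([map_partition(p, query, z) for p in partitions], z)
    best_label: Optional[str] = None
    best_dist = math.inf
    for label in sorted(nearest):  # sorted order makes ties deterministic
        cands = nearest[label]
        centroid = tuple(sum(p[i] for _, p in cands) / len(cands) for i in range(len(query)))
        d = euclidean(centroid, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


if __name__ == "__main__":
    splits = [
        [((0.0, 0.0), "a"), ((5.0, 5.0), "b")],
        [((0.5, 0.2), "a"), ((4.0, 4.5), "b"), ((1.0, 1.0), "b")],
    ]
    print(z_knn_predict(splits, (0.3, 0.3), z=2))  # a
```

The merge step is exact: the global Z nearest instances of a class always lie in the union of each split's Z nearest instances of that class. So only Z candidates per class per split need to leave a mapper, and the result matches a single-machine search.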
Primary Language | English |
---|---|
Subjects | Engineering |
Journal Section | Research Article |
Authors | |
Publication Date | April 30, 2018 |
Published in Issue | Year 2018 Volume: 6 Issue: 2 |
All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit and adapt the work provided the original work and source is appropriately cited.