TY - JOUR
T1 - Improve Image Classification Using Data Optimization
AU - Berrabah, Djamel
AU - Gafour, Yacine
PY - 2023
DA - December
DO - 10.55549/epstem.1409569
JF - The Eurasia Proceedings of Science Technology Engineering and Mathematics
JO - EPSTEM
PB - ISRES Publishing
WT - DergiPark
SN - 2602-3199
SP - 262
EP - 271
VL - 26
LA - en
AB - Image classification is a fundamental task in machine learning that involves assigning labels or classes to images based on their content. It is often performed using convolutional neural networks (CNNs), which can learn and generalize patterns from large amounts of data. However, if the data is not sufficiently voluminous, overfitting can occur, and in such cases it is recommended to turn to classical machine learning techniques. Moreover, data that is insufficient for deep learning may still exceed the processing capacity of the machine, which poses significant challenges in terms of the storage, memory, and computational power required for training. Our proposed approach addresses these challenges by optimizing the content of the dataset while preserving the information essential for classification. Specifically, identical or highly similar images are identified, grouped together, and represented by the most representative image of each group. At the same time, image sizes can be reduced. Another significant challenge that our approach addresses is managing class imbalance within the dataset. Our approach has been evaluated and the results are promising.
KW - Unsupervised linear/non-linear dimensionality reduction
KW - data visualization technique unsupervised learning algorithm
KW - dataset optimization
CR - Comon, P. (1994). Independent component analysis: A new concept? Signal Processing, 36(3), 287-314.
CR - Cox, T., & Cox, M. (1994). Multidimensional scaling. London: Chapman & Hall.
CR - Dutta, S., & Ghosh, A. K. (2016). On some transformations of high dimension, low sample size data for nearest neighbor classification. Machine Learning, 102, 57–83.
UR - https://doi.org/10.55549/epstem.1409569
L1 - https://dergipark.org.tr/en/download/article-file/3618990
ER -