Classification of Animals with Different Deep Learning Models
Abstract
The purpose of this study is to classify 14 different animals using different deep learning models. Deep learning, a subfield of artificial intelligence, has been widely used in recent years, especially in advanced image processing, speech recognition and natural language processing. One of the main reasons for its wide use in image analysis is that it performs feature extraction on the image by itself and achieves high accuracy. It learns by building representations of each image at different levels. Unlike other machine learning methods, it does not require an expert to extract features from the images. The Convolutional Neural Network (CNN), the basic architecture of deep learning models, consists of different layers: the Convolution layer, the ReLU layer, the Pooling layer and the Fully Connected layer. Deep learning models are designed using different numbers of these layers. In this study, the AlexNet and VggNet models are used to classify 14 different animals: Horse, Camel, Cow, Goat, Sheep, Wolf, Dog, Cat, Deer, Pig, Bear, Leopard, Elephant and Kangaroo. These are the animals most likely to be encountered on the road while driving; they were selected because this work is intended as a preliminary study for the control of autonomous vehicle driving. The animal images were collected in color (RGB) from the internet, and, to increase data diversity, additional images were taken from existing data sets. A total of 150 images were collected for each animal, 125 for training and 25 for testing. Two different data sets were created, one with 224x224 images and one with 227x227 images. The animals were classified with 91.2% accuracy using VggNet and 67.65% accuracy using AlexNet. The high error rate of AlexNet is attributed to the small number of layers in the network and the large values chosen for some of its parameters.
For example, the first convolution layer of AlexNet uses 11x11 filters with a stride of 4, which causes information to be lost when it is passed to the next layer. In contrast, VggNet uses 3x3 filters with a stride of 1, so little information is lost in the transfer to the next layer.
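To make the four CNN layer types named above concrete, the following Python sketch implements each of them on plain nested lists. It is a didactic sketch under stated assumptions, not the AlexNet or VggNet implementation used in this study: the toy sizes (a 3x8x8 input, four 3x3 filters, 2x2 max pooling, 14 outputs for the 14 animal classes), the random weights, the helper names (`conv2d`, `relu`, `max_pool`, `fully_connected`, `rand_tensor`) and the final softmax step are all illustrative choices.

```python
import math
import random


def conv2d(x, filters, biases, stride=1, pad=0):
    """Convolution layer. x is [C_in][H][W]; filters is [C_out][C_in][F][F].
    Like most deep learning libraries, it computes a cross-correlation."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    f = len(filters[0][0])

    def at(c, i, j):
        # Read x as if it were surrounded by `pad` pixels of zeros.
        i, j = i - pad, j - pad
        return x[c][i][j] if 0 <= i < h and 0 <= j < w else 0.0

    h_out = (h + 2 * pad - f) // stride + 1
    w_out = (w + 2 * pad - f) // stride + 1
    # One output feature map per filter: bias + sum over channels and the FxF window.
    return [[[b + sum(k[c][u][v] * at(c, i * stride + u, j * stride + v)
                      for c in range(c_in) for u in range(f) for v in range(f))
              for j in range(w_out)]
             for i in range(h_out)]
            for k, b in zip(filters, biases)]


def relu(x):
    """ReLU layer: element-wise max(0, value)."""
    return [[[max(0.0, v) for v in row] for row in fmap] for fmap in x]


def max_pool(x, size=2, stride=2):
    """Pooling layer: keep the largest value in each size x size window."""
    h_out = (len(x[0]) - size) // stride + 1
    w_out = (len(x[0][0]) - size) // stride + 1
    return [[[max(fmap[i * stride + u][j * stride + v]
                  for u in range(size) for v in range(size))
              for j in range(w_out)]
             for i in range(h_out)]
            for fmap in x]


def fully_connected(x, weights, biases):
    """Fully connected layer: flatten all feature maps, then one dot product per output."""
    flat = [v for fmap in x for row in fmap for v in row]
    return [b + sum(w_i * v for w_i, v in zip(w, flat)) for w, b in zip(weights, biases)]


def softmax(z):
    """Turn class scores into probabilities (shifted by the max for numerical stability)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_tensor(rng, *shape):
    """Hypothetical helper: nested list of the given shape with values in [-1, 1]."""
    if len(shape) == 1:
        return [rng.uniform(-1, 1) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]


# Toy forward pass with random (untrained) weights.
rng = random.Random(0)
image = rand_tensor(rng, 3, 8, 8)                                  # RGB, 8x8 pixels
features = conv2d(image, rand_tensor(rng, 4, 3, 3, 3), [0.0] * 4, stride=1, pad=1)  # 4x8x8
features = max_pool(relu(features))                                # 4x4x4
scores = fully_connected(features, rand_tensor(rng, 14, 4 * 4 * 4), [0.0] * 14)     # 14 classes
probs = softmax(scores)
```

Real networks stack many such blocks and learn the filter weights by backpropagation; AlexNet and VggNet differ, among other things, in how many blocks they use and in the filter sizes and strides of their convolution layers.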
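The per-class split described in the abstract (150 images per animal, 125 for training and 25 for testing) can be sketched as a stratified split, which keeps the same ratio for every animal. The abstract does not say how the test images were chosen, so the random shuffle, the seed, the function name `split_per_class` and the file names below are assumptions for illustration only.

```python
import random

ANIMALS = ["Horse", "Camel", "Cow", "Goat", "Sheep", "Wolf", "Dog",
           "Cat", "Deer", "Pig", "Bear", "Leopard", "Elephant", "Kangaroo"]


def split_per_class(images_by_class, n_train=125, n_test=25, seed=0):
    """Stratified split: shuffle each class's images (assumed random selection),
    then take n_train for training and the next n_test for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label, images in images_by_class.items():
        if len(images) < n_train + n_test:
            raise ValueError(f"{label}: need {n_train + n_test} images, got {len(images)}")
        shuffled = list(images)
        rng.shuffle(shuffled)
        train += [(path, label) for path in shuffled[:n_train]]
        test += [(path, label) for path in shuffled[n_train:n_train + n_test]]
    return train, test


# Hypothetical file names standing in for the collected images (150 per animal).
images = {a: [f"{a.lower()}_{i:03d}.jpg" for i in range(150)] for a in ANIMALS}
train, test = split_per_class(images)
print(len(train), len(test))  # 1750 350
```

Across the 14 animals this gives 2100 images in total: 1750 for training and 350 for testing.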
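The effect of filter size and stride can be quantified with the standard formula for the spatial output size of a convolution, floor((W - F + 2P) / S) + 1, where W is the input width, F the filter size, P the padding and S the stride. The sketch below assumes no padding for AlexNet's first layer and 1 pixel of padding for VggNet's 3x3 layers, as in common implementations; these padding values are not stated in the study, and `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(in_size, filter_size, stride, padding=0):
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (in_size - filter_size + 2 * padding) // stride + 1


# AlexNet's first convolution: 11x11 filters, stride 4, assumed padding 0, 227x227 input.
alexnet_out = conv_output_size(227, 11, stride=4)
# A VggNet 3x3 convolution: stride 1, assumed padding 1, 224x224 input.
vgg_out = conv_output_size(224, 3, stride=1, padding=1)

print(alexnet_out, vgg_out)                    # 55 224
print(round(227 ** 2 / alexnet_out ** 2, 1))   # 17.0 input pixels per output position
print((227 - 11) % 4, (224 - 11) % 4)          # 0 1 (nonzero: windows do not tile exactly)
```

Under these assumptions AlexNet's first layer reduces a 227x227 input to a 55x55 map, about 17 times fewer spatial positions, while a padded 3x3 VggNet convolution keeps the 224x224 size. The same formula suggests why 227x227 is a common AlexNet input size: 227 - 11 is divisible by 4, so the 11x11 windows cover the image exactly, whereas with a 224x224 input the last pixel row and column are never covered.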
Keywords