Research Article

A Comparative Study of Deep Learning Approaches for Human Action Recognition

Year 2025, Volume: 9 Issue: 2, 281 - 289
https://doi.org/10.31127/tuje.1579795

Abstract

Human Action Recognition (HAR) plays a crucial role in understanding and categorizing human activities from visual data, with applications ranging from surveillance and healthcare to human-computer interaction. However, accurately recognizing a diverse range of actions remains challenging due to variations in appearance, occlusions, and complex motion patterns. This study investigates how different deep learning architectures affect HAR performance on a dataset encompassing 15 distinct action classes. Our evaluation examines three primary architectural approaches: baseline EfficientNet models, EfficientNet models augmented with Squeeze-and-Excitation (SE) blocks, and models combining SE blocks with residual networks. Our findings demonstrate that incorporating SE blocks consistently enhances classification accuracy across all tested models, underscoring the utility of channel attention mechanisms in refining feature representations for HAR tasks. Notably, the architecture combining SE blocks with residual networks achieved the highest accuracy, raising performance from 69.68% for the baseline EfficientNet to 76.75%, a substantial improvement. Additionally, alternative models, such as EfficientNet combined with a Support Vector Machine classifier (EfficientNet-SVM) and zero-shot learning models, exhibited promising results, highlighting the adaptability and potential of diverse methodological approaches for addressing the complexities of HAR. These findings provide a foundation for future research on optimizing HAR systems, with implications for improving the robustness and accuracy of action recognition applications.
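
To make the channel attention idea concrete, the sketch below shows one way an SE block can be wired into a residual branch, following the general formulation of Hu et al. (2018). This is a minimal PyTorch illustration, not the paper's exact configuration: the layer sizes, the reduction ratio of 16, and the placement of the SE block before the identity addition are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al., 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average per channel
        self.fc = nn.Sequential(             # excitation: bottleneck MLP -> channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight each feature map by its learned importance


class SEResidualBlock(nn.Module):
    """Residual block with an SE block on the convolutional branch (illustrative)."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.se = SEBlock(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # identity shortcut plus SE-rescaled residual branch
        return self.act(x + self.se(self.body(x)))


if __name__ == "__main__":
    x = torch.randn(2, 64, 56, 56)  # batch of intermediate feature maps
    block = SEResidualBlock(64)
    print(block(x).shape)           # torch.Size([2, 64, 56, 56])
```

In this arrangement the SE block rescales each channel of the convolutional branch before the identity addition, so the attention weights modulate only the learned residual. This is consistent with the intuition behind the accuracy gains reported above for SE-augmented variants.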

References

  • Lu, M., Hu, Y., & Lu, X. (2020). Driver action recognition using deformable and dilated Faster R-CNN with optimized region proposals. Applied Intelligence, 50(4), 1100–1111. https://doi.org/10.1007/s10489-019-01603-4
  • Lin, W., Sun, M.-T., Poovendran, R., & Zhang, Z. (2008). Human activity recognition for video surveillance. In Proceedings of the 2008 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 2737–2740). IEEE. https://doi.org/10.1109/ISCAS.2008.4542023
  • Rodomagoulakis, I., Kardaris, N., Pitsikalis, V., Mavroudi, E., Katsamanis, A., Tsiami, A., & Maragos, P. (2016). Multimodal human action recognition in assistive human-robot interaction. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2702–2706). IEEE. https://doi.org/10.1109/ICASSP.2016.7472168
  • Dentamaro, V., Gattulli, V., Impedovo, D., & Manca, F. (2024). Human activity recognition with smartphone-integrated sensors: A survey. Expert Systems with Applications, Article 123143. https://doi.org/10.1016/j.eswa.2024.123143
  • Aydın, V. A. (2024). Comparison of CNN-based methods for yoga pose classification. Turkish Journal of Engineering, 8(1), 65–75. https://doi.org/10.31127/tuje.1275826
  • Gülgün, O. D., & Erol, H. (2020). Classification performance comparisons of deep learning models in pneumonia diagnosis using chest x-ray images. Turkish Journal of Engineering, 4(3), 129–141. https://doi.org/10.31127/tuje.652358
  • Polater, S. N., & Sevli, O. (2024). Deep learning based classification for Alzheimer’s disease detection using MRI images. Turkish Journal of Engineering, 8(4), 729–740. https://doi.org/10.31127/tuje.1434866
  • Willems, G., Tuytelaars, T., & Van Gool, L. (2008). An efficient dense and scale-invariant spatio-temporal interest point detector. In Computer Vision – ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12–18, 2008, Proceedings, Part II (pp. 650–663). Springer. https://doi.org/10.1007/978-3-540-88688-4_48
  • Klaser, A., Marszalek, M., & Schmid, C. (2008). A spatio-temporal descriptor based on 3D-gradients. In BMVC 2008 – 19th British Machine Vision Conference (pp. 275:1–10). British Machine Vision Association. https://doi.org/10.5244/C.22.99
  • Bux, A., Angelov, P., & Habib, Z. (2017). Vision based human activity recognition: A review. In Advances in Computational Intelligence Systems: Contributions Presented at the 16th UK Workshop on Computational Intelligence, September 7–9, 2016, Lancaster, UK (pp. 341–371). Springer. https://doi.org/10.1007/978-3-319-46562-3_23
  • Charalampous, K., & Gasteratos, A. (2016). Online deep learning method for action recognition. Pattern Analysis and Applications, 19, 337–354. https://doi.org/10.1007/s10044-014-0404-8
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105. https://doi.org/10.1145/3065386
  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9). https://doi.org/10.1109/CVPR.2015.7298594
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
  • Ronao, C. A., & Cho, S.-B. (2016). Human activity recognition with smartphone sensors using deep learning neural networks. Expert Systems with Applications, 59, 235–244. https://doi.org/10.1016/j.eswa.2016.04.032
  • Hughes, D., & Correll, N. (2018). Distributed convolutional neural networks for human activity recognition in wearable robotics. In Distributed Autonomous Robotic Systems: The 13th International Symposium (pp. 619–631). Springer. https://doi.org/10.1007/978-3-319-73008-0_43
  • Dong, M., Han, J., He, Y., & Jing, X. (2019). HAR-Net: Fusing deep representation and hand-crafted features for human activity recognition. In Signal and Information Processing, Networking and Computers: Proceedings of the 5th International Conference on Signal and Information Processing, Networking and Computers (ICSINC) (pp. 32–40). Springer. https://doi.org/10.1007/978-981-13-7123-3_4
  • Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221–231. https://doi.org/10.1109/TPAMI.2012.59
  • Taylor, G. W., Fergus, R., LeCun, Y., & Bregler, C. (2010). Convolutional learning of spatio-temporal features. In Computer Vision – ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part VI (pp. 140–153). Springer. https://doi.org/10.1007/978-3-642-15567-3_11
  • Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 4489–4497). https://doi.org/10.1109/ICCV.2015.510
  • Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems, 27. https://doi.org/10.48550/arXiv.1406.2199
  • Lv, M., Xu, W., & Chen, T. (2019). A hybrid deep convolutional and recurrent neural network for complex activity recognition using multimodal sensors. Neurocomputing, 362, 33–40. https://doi.org/10.1016/j.neucom.2019.06.051
  • Ketykó, I., Kovács, F., & Varga, K. Z. (2019). Domain adaptation for sEMG-based gesture recognition with recurrent neural networks. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN) (pp. 1–7). IEEE. https://doi.org/10.1109/IJCNN.2019.8852018
  • Inoue, M., Inoue, S., & Nishida, T. (2018). Deep recurrent neural network for mobile human activity recognition with high throughput. Artificial Life and Robotics, 23, 173–185. https://doi.org/10.1007/s10015-017-0422-x
  • Shi, J., Zuo, D., & Zhang, Z. (2021). A GAN-based data augmentation method for human activity recognition via the caching ability. Internet Technology Letters, 4(5), e257. https://doi.org/10.1002/itl2.257
  • Wang, J., Chen, Y., Gu, Y., Xiao, Y., & Pan, H. (2018). SensoryGANs: An effective generative adversarial framework for sensor-based human activity recognition. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE. https://doi.org/10.1109/IJCNN.2018.8489106
  • Chan, M. H., & Noor, M. H. M. (2021). A unified generative model using generative adversarial network for activity recognition. Journal of Ambient Intelligence and Humanized Computing, 12(7), 8119–8128. https://doi.org/10.1007/s12652-020-02548-0
  • Li, X., Luo, J., & Younes, R. (2020). ActivityGAN: Generative adversarial networks for data augmentation in sensor-based human activity recognition. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers (pp. 249–254). https://doi.org/10.1145/3410530.3414367
  • Soleimani, E., & Nazerfard, E. (2021). Cross-subject transfer learning in human activity recognition systems using generative adversarial networks. Neurocomputing, 426, 26–34. https://doi.org/10.1016/j.neucom.2020.10.056
  • Tan, M., & Le, Q. (2021). EfficientNetV2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning (pp. 10096–10106). PMLR. https://doi.org/10.48550/arXiv.2104.00298
  • Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7132–7141). https://doi.org/10.1109/CVPR.2018.00745
  • Kaggle. (2024). Human action recognition dataset. Kaggle. https://www.kaggle.com/datasets/meetnagadia/human-action-recognition-har-dataset
There are 32 citations in total.

Details

Primary Language English
Subjects Software Engineering (Other)
Journal Section Articles
Authors

Gülsüm Yiğit (ORCID: 0000-0001-7010-169X)

Early Pub Date January 19, 2025
Publication Date
Submission Date November 5, 2024
Acceptance Date December 21, 2024
Published in Issue Year 2025 Volume: 9 Issue: 2

Cite

APA Yiğit, G. (2025). A Comparative Study of Deep Learning Approaches for Human Action Recognition. Turkish Journal of Engineering, 9(2), 281-289. https://doi.org/10.31127/tuje.1579795
AMA Yiğit G. A Comparative Study of Deep Learning Approaches for Human Action Recognition. TUJE. 9(2):281-289. doi:10.31127/tuje.1579795
Chicago Yiğit, Gülsüm. “A Comparative Study of Deep Learning Approaches for Human Action Recognition”. Turkish Journal of Engineering 9, no. 2 (2025): 281-89. https://doi.org/10.31127/tuje.1579795.
EndNote Yiğit G A Comparative Study of Deep Learning Approaches for Human Action Recognition. Turkish Journal of Engineering 9 2 281–289.
IEEE G. Yiğit, “A Comparative Study of Deep Learning Approaches for Human Action Recognition”, TUJE, vol. 9, no. 2, pp. 281–289, doi: 10.31127/tuje.1579795.
ISNAD Yiğit, Gülsüm. “A Comparative Study of Deep Learning Approaches for Human Action Recognition”. Turkish Journal of Engineering 9/2 (2025), 281-289. https://doi.org/10.31127/tuje.1579795.
JAMA Yiğit G. A Comparative Study of Deep Learning Approaches for Human Action Recognition. TUJE.;9:281–289.
MLA Yiğit, Gülsüm. “A Comparative Study of Deep Learning Approaches for Human Action Recognition”. Turkish Journal of Engineering, vol. 9, no. 2, pp. 281-9, doi:10.31127/tuje.1579795.
Vancouver Yiğit G. A Comparative Study of Deep Learning Approaches for Human Action Recognition. TUJE. 9(2):281-9.