Human activity recognition and classification using convolutional neural networks and recurrent neural networks

A version of this paper was presented at the 9th International Conference on Advanced Technologies (ICAT'20), 10-12 August 2020, Istanbul, Turkey, under the title “A Comparison of Convolutional and Recurrent Network Algorithms On Human Activity Recognition”.


Introduction
Thanks to recent advances in sensing technologies, human activity recognition has attracted considerable attention. Many applications depend on it, in fields such as sports, fitness tracking and healthcare [1][2][3][4]. The type of human activity can be recognized in various ways, for example with cameras or with sensors. In the first case, detection is based on video or images captured by a camera; in the second, it is based on data from sensors built into smartphones [5]. Several sensor types are available, such as accelerometers, sound sensors and luminance sensors [6][7][8]. Feature extraction is essential because it captures the details that a machine learning or deep learning algorithm then uses to assign each sample to the correct activity. The more discriminative the features extracted from the dataset, the higher the classification accuracy [9].
The widespread use of sensors such as accelerometers has made it easier to identify and predict the activity a person is performing. More recently, real-time processing became feasible, so algorithms can run much faster without the need to transfer data to a more powerful server [10].
Human activity classification has considerable potential in healthcare applications, for example helping doctors make the right decision by tracking patients and monitoring their daily behavior in order to deliver a more precise diagnosis [11].
In this paper, we used the open-source WIreless Sensor Data Mining (WISDM) dataset [12]. The dataset contains six activities, each recorded as values along the X, Y and Z axes. The six activities are standing, sitting, walking, jogging, ascending stairs and descending stairs, as shown in Figure 1. The three axes are shown in Figure 2 [13]. Some deep learning models can both extract features and classify, such as the Convolutional Neural Network (CNN), which is a type of Deep Neural Network (DNN) [14], and the Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells [8]. In this paper, human activity recognition was analyzed with two deep learning models, CNN and RNN-LSTM. The aim is to identify the more accurate of the two models for recognizing daily activities, in support of healthcare and activity-monitoring applications.

Convolutional Neural Network (CNN)
Several studies have used CNNs to recognize human activity. The study in [15] used values gathered from smartphone sensors and reported an accuracy of 93.27%. The study in [16] worked on the raw sensor data and reported an accuracy of 91.9%.

Recurrent Neural Network (RNN-LSTM)
The study in [17] focused on an RNN-LSTM model, which achieved an accuracy of 97%. The human activity data were gathered from the gyroscope and accelerometer of a smartphone. Another study [18] relied on camera data to classify patient activity, with a reported result of 92.49%.

Method
Our models were trained on the WISDM dataset, a raw file containing the six human activities mentioned above; the values were recorded while 36 persons performed those activities. Each sample has three values, one per axis (X, Y and Z). The number of epochs was set to 15, and Python 3.7 was used for both the CNN and RNN-LSTM models.
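As an illustration of how such a raw file can be prepared for a neural network, the following is a minimal sketch and not our exact preprocessing. It assumes each raw record has the form `user,activity,timestamp,x,y,z;`, and the helper names, the window length and the step are hypothetical choices made only for the example; the sample values are invented.

```python
from collections import Counter

def parse_wisdm_line(line):
    """Parse one record of the ASSUMED form 'user,activity,timestamp,x,y,z;'."""
    fields = line.strip().rstrip(";").split(",")
    if len(fields) != 6:
        return None  # skip malformed or empty records
    user, activity, _timestamp, x, y, z = fields
    try:
        return int(user), activity, (float(x), float(y), float(z))
    except ValueError:
        return None  # skip records with non-numeric values

def make_windows(records, window_size=4, step=3):
    """Cut the record stream into fixed-length windows labelled by majority vote.

    window_size and step are illustrative assumptions, not values from the paper.
    """
    windows = []
    for start in range(0, len(records) - window_size + 1, step):
        chunk = records[start:start + window_size]
        samples = [xyz for _, _, xyz in chunk]
        label = Counter(act for _, act, _ in chunk).most_common(1)[0][0]
        windows.append((samples, label))
    return windows

# Invented sample records, for demonstration only.
raw_lines = [
    "33,Jogging,1,-0.69,12.68,0.50;",
    "33,Jogging,2,5.01,11.26,0.95;",
    "33,Jogging,3,4.90,10.88,-0.08;",
    "33,Jogging,4,-0.61,18.49,3.02;",
    "33,Walking,5,0.27,10.88,-0.57;",
    "33,Walking,6,1.50,9.10,0.30;",
    "33,Walking,7,0.84,9.72,0.11;",
    "corrupted line",
]
records = [r for r in map(parse_wisdm_line, raw_lines) if r is not None]
windows = make_windows(records)
```

Each element of `windows` pairs a list of (X, Y, Z) triples with one activity label, which is the shape of input that both a CNN and an RNN-LSTM can consume.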

Convolutional Neural Network (CNN)
The CNN model in our study consists of two convolutional modules; its architecture is illustrated in Figure 3.
The data were divided into 80% for training and 20% for testing, and the number of epochs was 15. During training, accuracy peaked at epoch 15 and started to decrease after that, so 15 epochs were adopted for the final result.
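To make the idea of stacking two convolutional modules concrete, the sketch below applies a one-dimensional convolution, a ReLU and max pooling to a single axis, twice in sequence. It is an assumption-laden illustration in plain Python: the composition of a module (convolution, ReLU, pooling), the kernel values and the pool sizes are all hypothetical and are not taken from the architecture in Figure 3.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation (the 'convolution' used in CNNs), no padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial block is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def conv_module(signal, kernel, pool_size=2):
    """One hypothetical module: convolution, then ReLU, then max pooling."""
    return max_pool(relu(conv1d(signal, kernel)), pool_size)

# Invented readings for one axis; the kernel responds to rising values.
x_axis = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, -3.0, -2.0, 0.0]
rising_kernel = [-1.0, 0.0, 1.0]

first = conv_module(x_axis, rising_kernel)            # first module
second = conv_module(first, [0.5, 0.5], pool_size=1)  # second module
```

In a trained network the kernel values are learned rather than fixed by hand, and many kernels run in parallel over all three axes.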
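The 80%/20% division and the choice of the best epoch can be sketched as follows. This is a minimal illustration: shuffling before the split, the fixed seed and the function names are assumptions for the example, and the accuracy values passed to `best_epoch` are hypothetical, not our measurements.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test).

    Shuffling and the seed are assumptions made for reproducibility.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def best_epoch(accuracy_per_epoch):
    """Return the 1-based epoch with the highest accuracy (first one on ties)."""
    best_index = max(range(len(accuracy_per_epoch)),
                     key=accuracy_per_epoch.__getitem__)
    return best_index + 1

# Stand-in for 100 windows of sensor data.
all_windows = list(range(100))
train, test = train_test_split(all_windows)

# Hypothetical accuracy curve over three epochs.
chosen = best_epoch([0.60, 0.72, 0.70])
```

With 100 samples this yields 80 training and 20 test samples, and every sample lands in exactly one of the two sets.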

Recurrent Neural Network (RNN-LSTM)
Our RNN-LSTM model, illustrated in Figure 4, has two layers and uses a fully connected layer scheme. The number of epochs was set to 15.
Figure 5 shows the six activity categories. Python libraries were used, such as TensorFlow and a plotting library to draw the graphs. Three-axis graphs were produced for each activity; the jogging activity is shown as an example in Figure 6. As Figures 7 and 8 show, applying the CNN algorithm to the dataset and training for 15 epochs gave an accuracy of 84.31% and a loss of 51.50%.

Recurrent Neural Network (RNN-LSTM)
With the RNN-LSTM model, the accuracy was 91.77% and the loss was 40.89% after 15 epochs, as shown in Figures 9 and 10.

Convolutional Neural Network (CNN)
When the CNN algorithm was applied to the dataset, we observed during training that accuracy was highest at epoch 15, so we adopted this number of epochs. The overall accuracy on the test set was 81%. The confusion matrix in Figure 11 shows that some activities are confused with one another when predicted labels are compared with true labels.

Recurrent Neural Network (RNN-LSTM)
When the RNN-LSTM model was applied to the dataset for 15 epochs, the final test accuracy was 91%. The confusion matrix in Figure 12 shows the six activities in terms of predicted and true labels. Some samples whose true label is upstairs were predicted as downstairs, but the number of such cases is low.
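For readers unfamiliar with how a confusion matrix and the overall test accuracy relate, the following sketch builds both from lists of true and predicted labels. The label lists are invented for demonstration and do not reproduce Figures 11 or 12; the class names and function names are assumptions.

```python
ACTIVITIES = ["Walking", "Jogging", "Upstairs", "Downstairs", "Sitting", "Standing"]

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Invented labels: one Upstairs sample is mistaken for Downstairs.
y_true = ["Walking", "Jogging", "Upstairs", "Upstairs",
          "Downstairs", "Sitting", "Standing", "Walking"]
y_pred = ["Walking", "Jogging", "Upstairs", "Downstairs",
          "Downstairs", "Sitting", "Standing", "Walking"]

cm = confusion_matrix(y_true, y_pred, ACTIVITIES)
overall = accuracy(cm)
```

The off-diagonal entry in the Upstairs row and Downstairs column counts exactly the kind of confusion described above, while the diagonal entries determine the overall accuracy.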

Conclusion
In this work, we compared two deep learning models on the WISDM dataset: a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN-LSTM). The RNN-LSTM outperformed the CNN, with test accuracies of 91% and 81%, respectively.
In future work, a larger dataset may be used with more advanced deep learning methods to obtain more accurate and more reliable results and to classify the activities with less confusion.