Comparison of methods for determining activity from physical movements (Fiziksel hareketlerden aktivite belirleme yöntemlerinin karşılaştırılması)

This study evaluates methods for detecting a person's basic physical activities (going downstairs, going upstairs, sitting, standing, walking, running) from inertial measurement unit (IMU) data. The performance of classical approaches (ANN, SVM, k-NN) and a current approach (convolutional neural networks, CNN) in mapping IMU data to activity classes was compared. The study was carried out in three stages: 1) data acquisition; 2) creation of training/test sets; 3) construction of the network architectures and classification. In the data acquisition stage, an accelerometer sensor was placed on each of 10 people to record 6 different physical activities, and the repeated movements of each person were recorded. In the second stage, the long accelerometer recordings were divided into packets in the form of short time windows. The training set for the classical approaches was constructed by extracting features from each packet of one-dimensional acceleration data. For the deep learning-based approach, the one-dimensional signals were transformed into two-dimensional image matrices. In the third stage, the ANN, SVM, k-NN and CNN architectures were constructed and classification was carried out. The experiments showed that the accuracy of the IMU-to-activity mapping was 99% with the ANN method and 95% with the CNN method.


INTRODUCTION
Wearable technology occupies more space in our lives every day. It has directed researchers toward new studies in areas such as recognizing the daily activities of individuals, health and defense technologies. Intelligent systems are among the most important factors increasing the popularity of human-computer interaction. With such systems, human behaviors can be recognized, interpreted and correctly analyzed. These intelligent systems are built on different types of data.
Many sensors, such as cameras, gyroscopes and accelerometers, are used to obtain these data. With the data obtained, different applications are developed in areas such as health, security, transportation and monitoring [1]. Among these sensors, vision-based systems are generally used in settings such as a home or a laboratory, where the environmental parameters are controlled. Vision-based systems can also intrude on the privacy of individuals, which may cause them to behave differently than normal. A second challenge is that capturing 3-D images requires the simultaneous use of multiple cameras, each of which must be calibrated separately. Wireless sensors do not depend on any camera viewpoint, and they produce high-precision data.
Sensor technology is therefore used intensively. These flexible sensors have an advantage over visual systems because they record data related to the individual's physical state, such as position change, movement direction and speed. Activity detection studies have been developed with inertial sensors, which impose fewer constraints than visual systems [2]. Some studies have compared video camera data with inertial sensor data [3]. Studies have focused on detecting various human activities with inertial sensors, in particular daily life monitoring of pedestrian status [4], personal biometric signature and navigation [5], fall detection [6], rehabilitation and physical therapy [7], and detection of the type of transportation [8]. The literature shows that different machine learning approaches are used to examine the physical state of individuals. With accelerometer sensors attached to the body, actions such as standing, walking or running have been identified with an accuracy of over 90% [9]. In other work, data from the accelerometers and gyroscopes built into smartphones were classified with different machine learning methods [10]. Activity recognition performance was improved by combining gyroscope and accelerometer data with a Kalman filter [11]; however, the computational cost of this Kalman filter-based approach is high. The stages of swimming were detected automatically with accelerometer sensors attached to the arms and legs [12]. In addition to smartphones, there are several studies on smart watch accelerometer data. The contribution of gyroscope data alongside accelerometer data was evaluated for recognizing gymnastic activities with the Galaxy Gear smart watch [13]. In a study using both a smartphone and a smart watch, 9 different activities were classified [14]. For activities involving wrist movements, smart watches were observed to achieve high activity detection rates [15]. Overall, these studies use both smartphones/smart watches and body-worn sensors to classify human movements.
In this study, machine learning methods that can automatically detect six different activities are compared. These activities are going downstairs, going upstairs, sitting, standing, walking and running. Wearable sensors are attached to the body to collect the IMU data for these activities. An IMU incorporates accelerometer, gyroscope and magnetometer sensors. Because the aim is to detect activities with less energy, memory and processing power, only the accelerometer data is used. Accordingly, while an activity is performed, the acceleration values on the three axes (x, y, z) are recorded. To map the recorded accelerometer data to activity classes, conventional and deep learning-based approaches were compared. The study was carried out in three stages: 1) data acquisition; 2) data preprocessing (constructing training/test sets); 3) construction of the network architectures and classification. The following sections discuss the data collection, feature extraction and classification stages in detail. The last section outlines plans for future studies.

METHODOLOGY
In this study, a sensor-based activity recognition methodology was developed. The main steps of the developed methodology are shown in Figure 1. As seen in Figure 1, the conventional approach contains two main steps: 1) feature extraction, 2) classification. The deep learning-based approach also contains two main steps: 1) image transform, 2) classification.

Data Acquisition
Inertial sensors are seamless, compact and lightweight, which makes them a popular choice for applications such as motion tracking, human-computer interfaces and animation. In this study, which aims to determine human activities, the commercially available MTw inertial sensor from Xsens was used. This sensor provides 3-axis gyroscope, 3-axis accelerometer and 3-axis magnetometer data. The MTw is worn with flexible straps and, powered by a LiPo battery, can operate for up to 6 hours. Data from the MTw is transferred to the computer via a wireless adapter. Xsens MT Manager 4.8 software was used to record the data transferred to the computer. Figure 2 shows the MTw sensor and the interface of the data acquisition software. The three-axis accelerometer data from the MTw was used to build the data set used in the study. The MTw sensors were set to a sampling frequency of 50 Hz. Thus, 50 Hz × 3 (axes) = 150 acceleration values are recorded per second. To obtain the data, 10 volunteers wearing the MTw sensor were asked to repeat 6 different activities several times (to obtain reliable kinematics). The sensor was placed at chest height, close to the body's center of gravity. The experiments were performed on a group of volunteers between 20 and 40 years of age. Activity durations vary from individual to individual, from 30 seconds to 1 minute. The number of recordings per class is shown in Table 1. A 2.56-second segment of an activity recording is considered sufficient to determine which class the activity belongs to. Therefore, each activity recording was analyzed as packets of 2.56 seconds.
Since the IMU sensor delivers 3-axis acceleration data at 50 Hz, each packet contains 50 × 2.56 = 128 acceleration samples for each of the 3 axes. When all activity data of the classes are packaged, the total number of packets is 138930.
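As a rough illustration of the packaging step, the sketch below cuts a three-axis recording into 2.56-second packets of 128 samples. The function name `split_into_packets` and the `step` parameter are assumptions made for this example: the text does not state whether consecutive packets overlap, so the step size is left configurable, and random numbers stand in for sensor data.

```python
import numpy as np

SAMPLING_RATE_HZ = 50   # sampling frequency stated in the text
WINDOW_SECONDS = 2.56   # packet duration stated in the text
WINDOW_SAMPLES = round(SAMPLING_RATE_HZ * WINDOW_SECONDS)  # 128 samples per axis


def split_into_packets(recording, window=WINDOW_SAMPLES, step=WINDOW_SAMPLES):
    """Cut a (n_samples, 3) recording into packets of shape (window, 3).

    ASSUMPTION: `step` is not given in the source. step == window yields
    non-overlapping packets; a smaller step yields overlapping packets.
    """
    recording = np.asarray(recording, dtype=float)
    starts = range(0, len(recording) - window + 1, step)
    packets = [recording[s:s + window] for s in starts]
    if not packets:
        return np.empty((0, window, recording.shape[1]))
    return np.stack(packets)


# Synthetic 30-second recording (random values standing in for x, y, z data).
rng = np.random.default_rng(0)
recording = rng.normal(size=(30 * SAMPLING_RATE_HZ, 3))
packets = split_into_packets(recording)
print(packets.shape)  # (11, 128, 3)
```

With a smaller step, for example `split_into_packets(recording, step=64)`, the same recording yields more, overlapping packets.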
To classify with the conventional approaches, features are extracted from each packet. In this way, the high-dimensional input vector is represented in a lower-dimensional space. The mean, the standard deviation and the greatest eigenvalue were calculated as features of the acceleration data. These three features are calculated for each axis. Thus, a packet containing 128 × 3 = 384 acceleration values is reduced to 9 features. The feature variables used in Figure 3 and their explanations are given in Table 2.
Convolutional neural networks can map a two-dimensional image given to the input layer to a class in the output layer [17]. They have attracted the attention of many researchers because they can classify images with high accuracy without the need for feature extraction.
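The reduction of a packet from 384 values to 9 features, described above for the conventional classifiers, might look like the following sketch. The mean and standard deviation follow the text directly. The eigenvalue feature is an assumption: the text does not say which matrix the greatest eigenvalue is taken from, so the hypothetical helper `largest_eigenvalue` uses the covariance matrix of a lag-embedded copy of the signal as one possible reading. The ordering of the 9 features is also an assumption.

```python
import numpy as np


def largest_eigenvalue(signal, lags=8):
    """Largest eigenvalue of the covariance matrix of a lag-embedded signal.

    ASSUMPTION: the source does not define its eigenvalue feature; this
    lag-embedding construction (and lags=8) is only one possible reading.
    """
    signal = np.asarray(signal, dtype=float)
    # Each row is a length-`lags` snippet of the signal, shifted by one sample.
    embedded = np.lib.stride_tricks.sliding_window_view(signal, lags)
    cov = np.cov(embedded, rowvar=False)       # (lags, lags) covariance matrix
    return float(np.linalg.eigvalsh(cov)[-1])  # eigvalsh sorts ascending


def packet_features(packet):
    """Map a (128, 3) packet to 9 features: mean, std, eigenvalue per axis."""
    packet = np.asarray(packet, dtype=float)
    features = []
    for axis in range(packet.shape[1]):
        column = packet[:, axis]
        features += [column.mean(), column.std(), largest_eigenvalue(column)]
    return np.array(features)


# Synthetic packet (random values standing in for accelerometer samples).
rng = np.random.default_rng(0)
example_packet = rng.normal(size=(128, 3))
example_features = packet_features(example_packet)
print(example_features.shape)
```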
The three-axis acceleration data collected for activity detection is a one-dimensional signal. To give this signal as input to the CNN, the one-dimensional vector (1×384) is converted to a two-dimensional image matrix (16×24). Figure 4 shows the image transforms of the acceleration data in 20 randomly selected packets.
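The conversion from a 1×384 vector to a 16×24 image can be sketched as a plain reshape. Two details below are assumptions, since the text does not specify them: the order in which the three axes are concatenated (here all x values, then y, then z) and the min-max scaling of the values to the range [0, 1]. The name `packet_to_image` is hypothetical.

```python
import numpy as np


def packet_to_image(packet, shape=(16, 24)):
    """Turn a (128, 3) packet into a 16x24 image matrix.

    ASSUMPTIONS: axes are concatenated as x, then y, then z, and values are
    min-max scaled to [0, 1]; the source specifies neither.
    """
    packet = np.asarray(packet, dtype=float)
    flat = packet.T.reshape(-1)          # 1 x 384 vector: x..., y..., z...
    lo, hi = flat.min(), flat.max()
    scaled = (flat - lo) / (hi - lo) if hi > lo else np.zeros_like(flat)
    return scaled.reshape(shape)         # 16 * 24 = 384 values


# Synthetic packet (random values standing in for accelerometer samples).
rng = np.random.default_rng(0)
image = packet_to_image(rng.normal(size=(128, 3)))
print(image.shape)  # (16, 24)
```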

Classification Techniques
The classification techniques used in this study are the k-nearest neighbor algorithm (k-NN), support vector machines (SVM) and artificial neural networks (ANN).
• k-NN is a supervised classification technique that can be considered a direct classification method. It does not require a learning process; it only requires that all the training data be stored. To classify a new observation, the k-NN algorithm uses the similarity (distance) between the new observation and the observations in the training set. The distance to the neighbors of an observation is calculated with a similarity function such as the Euclidean distance [18], [19].
• SVM is a classifier derived from statistical learning theory. In its standard formulation, SVM is a linear classifier. Non-linear SVM classification can be obtained by extending the linear SVM model with kernel methods. This classifier has been applied successfully to problems in many areas, such as object, voice, fingerprint and handwriting recognition [20], [21].
• ANN is used in many areas today because of its ability to solve problems in different domains. Artificial neural networks are typically represented by a network diagram of nodes connected by directed links. The nodes are arranged in layers, and the most commonly used neural network structure consists of three layers: input, hidden and output [22].
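To make the k-NN description above concrete, the following minimal sketch classifies a new observation by majority vote among its k nearest training points under the Euclidean distance. The function name `knn_predict`, the value k = 3 and the two-dimensional toy data are assumptions chosen for illustration; they are not the configuration or the data used in the study.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every stored training point.
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors (illustrative only, not data from the study).
train_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
train_y = ["sitting", "sitting", "sitting", "running", "running", "running"]

print(knn_predict(train_x, train_y, [0.1, 0.2]))  # sitting
print(knn_predict(train_x, train_y, [3.1, 3.0]))  # running
```

No training step appears in the sketch: the "model" is simply the stored training set, which matches the description of k-NN as a direct classification method.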
The convolutional neural network architecture used in this study is shown in Figure 5. It consists of convolution, ReLU, max-pooling and fully connected layers. The Adam method was preferred for updating the network parameters during backpropagation [23].

EXPERIMENTAL RESULTS
In this study, human activities are classified using accelerometer data. The patterns generated from the MTw sensor data were tested with the k-NN, SVM and ANN methods, and their performances were compared. A 10-fold cross-validation test was used for each algorithm. Specifically, the data set is divided into 10 parts; 9 of them are used to train the classifier and the remaining part is used to evaluate it. The confusion matrices obtained with the different methods are shown in Table 4.
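The 10-fold procedure described above can be sketched with index lists alone. The generator name `k_fold_indices`, the shuffling of the samples before splitting and the fixed random seed are assumptions; the text only states that the data is divided into 10 parts, with 9 used for training and 1 for evaluation.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    ASSUMPTION: samples are shuffled before splitting; the source does not
    say how the 10 parts were formed.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds parts of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(100))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))  # 90 10
```

Each sample appears in exactly one test fold, so averaging the 10 fold accuracies uses every packet for evaluation once.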

Table 4. Confusion matrices of the ANN, k-NN and SVM approaches
When the performances of the different classifiers are examined, all three methods achieve high accuracy on all activities using only the accelerometer sensor. Although the activities contain similar patterns, they are well separated by a single three-axis accelerometer. The upstairs and downstairs activities are similar to each other and are also confused with other activities. Since neither activity involves forward/backward (x axis) movement, the confusion rate increases. The ANN method, which achieves an average accuracy of 99% (Table 5), identified the running activity with 100% accuracy. However, this level was not reached for the other activities.
The upstairs activity is detected with 99.2% accuracy. Because the accelerometer data varies more during running, the classification performance for running is higher. In the confusion matrix of the k-NN method (Table 4), which has no training stage, the sitting and standing activities give higher accuracy rates than the other classes. The activities with the lowest accuracy in the k-NN method are downstairs and upstairs.
In the confusion matrix of the SVM method (Table 4), as with the k-NN method, the sitting and standing activities give higher accuracy than the other activities, while the downstairs and upstairs activities have lower accuracy.
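The per-activity and average accuracy values discussed above are read off a confusion matrix. The sketch below shows one way to compute them, assuming that rows hold the true class and columns the predicted class (the source does not state its convention). The 3×3 matrix is a made-up toy example, not a result from Table 4.

```python
import numpy as np


def per_class_accuracy(confusion):
    """Fraction of each true class that was predicted correctly."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())


# Toy confusion matrix (illustrative counts only, NOT the study's results).
# ASSUMPTION: rows = true class, columns = predicted class.
toy_confusion = [
    [50, 0, 0],
    [2, 45, 3],
    [1, 4, 45],
]
print(per_class_accuracy(toy_confusion))
print(overall_accuracy(toy_confusion))
```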

Table 5 shows that the ANN method is more successful than the other methods, with a success rate of 99.8%. The accuracy obtained with twelve k-NN and SVM classifier models in total is given in Table 6. Combinations of sampling rate and window size were also evaluated. Sampling rates of 50 Hz and 100 Hz and window sizes of 2, 2.56 and 3 seconds were selected, and their accuracies were compared. The accuracy rates of ANN, k-NN and SVM for these combinations are given in Table 7.
The best combination of sampling rate and window size was observed to be 50 Hz with 128 samples per window. Overall, the highest accuracy, 99.8%, was obtained at 50 Hz with a 2.56-second window.
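The number of samples per window follows from multiplying the sampling rate by the window length, as the short sketch below shows for the six combinations named above. The variable names are hypothetical. Reading the "twelve" models of Table 6 as six combinations times two classifiers (k-NN and SVM) is an interpretation that is consistent with the text but not stated in it.

```python
from itertools import product

sampling_rates_hz = [50, 100]       # sampling rates named in the text
window_seconds = [2.0, 2.56, 3.0]   # window sizes named in the text
classifiers = ["k-NN", "SVM"]

# Samples per window for every (sampling rate, window size) combination.
samples_per_window = {
    (rate, seconds): round(rate * seconds)
    for rate, seconds in product(sampling_rates_hz, window_seconds)
}
for (rate, seconds), n in samples_per_window.items():
    print(f"{rate} Hz x {seconds} s = {n} samples")

# ASSUMED reading of "twelve": 6 combinations x 2 classifiers.
n_models = len(samples_per_window) * len(classifiers)
print(n_models)  # 12
```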

RESULTS AND FUTURE STUDIES
In this study, the performances of conventional and deep learning-based approaches were compared for automatically detecting six basic physical activities. 10 individuals were asked to perform 6 different activities, and the acceleration data of the MTw (inertial) sensors were recorded while the activities were performed. The recorded acceleration data were then classified using the ANN, k-NN, SVM and CNN methods. The results show that the ANN approach provides the highest mapping accuracy. This was attributed to the selection of appropriate features, which yield a suitable low-dimensional representation of the data.
In future studies, it is planned to increase the number of sensors placed on the individuals performing the activities and to carry out studies in the health field on the detection and treatment of diseases based on human activities. In addition to the accelerometer, the gyroscope and magnetometer are also planned to be used.

Obtaining Training/Test Datasets
Each activity recording, which can last up to one minute, consists of three-axis accelerometer data. For this data to be classified by the conventional and deep learning-based methods, three different processes are performed: 1) packaging; 2) feature extraction; 3) image transform. The construction of the training set is summarized in Figure 3.

Figure 3. Obtaining the training set data

A total of 138930 packets for the six activity classes were stored as images. About 90% of these images (125040) were used for training and 10% (13890) for testing. The learning rate was chosen as α = 0.001. The accuracy and loss values obtained during the training process, which lasted about 20 minutes, are shown in Figure 6. The training and test accuracy was approximately 95% after 1000 iterations. Training was stopped manually after all the data had been seen four times.

Figure 6. Training results with the CNN approach: (a) accuracy, (b) loss

Table 1. Number of activity recordings by class
Activity      Recordings
downstairs    72
sitting       48
walking       72
upstairs      48
standing      48
running       72

Table 2. Features and explanations

Table 5. Comparison of the classification performance of the algorithms for human activities

Table 6. Accuracy rates for each model

Table 7. Accuracy of the sampling rate and window size combinations