Analyzing the Relationship between Student’s Assignment Submission Behaviors and Course Achievement through Process Mining Analysis

This study aims to examine the relationship between university students' assignment submission behaviors and their course achievement. To this end, the behaviors of 75 students enrolled in the Operating Systems and Applications course at a public university were analyzed as they submitted an assignment given in the fourth week of the course through the Moodle learning management system. Students exhibiting different assignment submission behaviors were also compared in terms of their end-of-term grades. In the analysis, the steps students followed while submitting the assignment were determined sequentially, and students exhibiting similar patterns were grouped by means of cluster analysis. In addition, the assignment submission processes of students in the different groups were analyzed in detail using process mining. The analyses showed that students can be divided into three groups according to their assignment submission behaviors. In terms of course achievement, a large proportion of the students who submitted the assignment passed the course, whereas a large proportion of those who did not submit it failed. The findings will provide guidance for identifying, in the early weeks, students who are at high risk of failing the course and for designing possible interventions for them.


Introduction
Log records derived from learner interactions in online learning environments help researchers understand students' learning processes. Data obtained in this way are analyzed with machine learning methods and used for purposes such as predicting students' academic achievement and determining the strategies they follow while preparing for a course. Researchers' interest in such studies, which fall under the headings of learning analytics, educational data mining, and artificial intelligence in education, has been increasing in recent years. These studies have produced tangible outcomes both for online learning and for other learning models supported by online learning. Following a similar approach, this study aims to model students' assignment submission behaviors using clustering and process mining analysis, and to examine the relationship between the resulting profiles and the students' course achievement.

Online Learning Experiences in terms of Blended Learning
Blended learning environments supported by online learning activities contribute significantly to students' learning processes. Supporting face-to-face lessons in higher education with online learning environments provides more flexible learning opportunities for students and instructors than face-to-face learning alone (Graham, 2006; Smyth, Houghton, Cooney, & Casey, 2012). What matters at this point is to determine how effective learning experiences emerge in the blended learning process, taking individual differences into account, and to use the results in learning design.
Studies show that students' individual characteristics and online learning experiences have a significant impact on course achievement and learning outcomes. Some studies in the literature show that individual characteristics such as self-efficacy, self-regulation skills, academic procrastination, technology literacy, and locus of control play a significant role in the effectiveness of blended learning (Broadbent, 2017; Prasad, Maag, Redestowicz, & Hoe, 2018; Rasheed, Kamsin, & Abdullah, 2020). In this regard, it is emphasized that a learning design based on students' individual characteristics increases interaction and flexibility in blended learning and makes the learning process more efficient (Porter, Graham, Spring, & Welch, 2014). To make blended learning effective, therefore, students' online learning experiences need to be understood in depth on the basis of behavioral data (Kokoç & Altun, 2019).
Analyzing learning analytics data is important for understanding students' online learning experiences and for predicting their online learning performance. Accordingly, several studies have aimed to predict learning outcomes from log data collected in online learning environments. Zacharis (2015) examined whether students' interaction logs predicted their end-of-term achievement in a programming language course delivered through blended learning. Of the 29 variables derived from interactions on the Moodle learning management system, 14 were significant in predicting students' final grades, and four interaction variables (reading and sending messages, contributing to content creation, effort on exams, and number of files viewed) explained 52% of the variance in final grades. Jo, Park, Kim, and Song (2014) investigated whether students' online behavior predicts learning performance in blended learning environments, using data collected from two blended learning courses supported by a learning management system. The resulting regression model showed that interaction data reflecting students' online learning behavior could explain 70% of the variance in learning performance. In another study, an early warning system was developed to predict the academic performance of at-risk students using learning analytics data from a Computer Hardware course delivered through blended learning (Akçapınar, Altun, & Aşkar, 2019). That study, which applied different classification algorithms, found that prediction models built on students' interaction data could identify, as early as the third week of the course, the students who would go on to fail.
Lu et al. (2018), unlike other studies, found that student data from both face-to-face and online learning environments played a statistically significant role in predicting student performance in blended learning. Taken together, these studies indicate that interaction data obtained from students' online learning experiences are effective in predicting student performance in blended learning.
In blended learning, students' online learning experiences are related to the learning resources, activities, and tasks offered to them during the learning process. Kokoç and Altun (2019) found that students' online interaction data group under six factors and that these factors reflect students' online learning experiences. Online assignments are among the learning activities that play a crucial role in creating effective online learning experiences. Within learning analytics, therefore, examining students' interactions with online assignments and learning tasks is useful for understanding their learning experiences.

Online Assignments and Learning Analytics Studies
Assignments are a frequently used learning activity for testing the effectiveness of the learning process. By its nature, online learning expects students to take more responsibility for their own learning processes (Dabbagh & Kitsantas, 2005). Students who take responsibility for online learning tasks and complete them regularly are therefore regarded as successful (You, 2016). Students who complete and submit at least one online assignment are known to be more likely to complete distance learning courses (Lim, 2016). Beyond the number of assignments completed, this raises the question of how students interact with online assignment and learning task pages.
Assignment interaction data generally correspond to the number of times students upload assignment files for learning tasks, the time they spend on assignment pages, and the frequency and duration of exams provided by instructors. The learning analytics literature contains some studies on the relationship between students' assignment interactions and their learning performance. MacFadyen and Dawson (2010) investigated the relationship between students' academic performance and interaction data through logistic regression analysis and concluded that the number of completed online assignments is one of the important variables predicting students' end-of-term achievement. In a study of a blended learning process supported by the Moodle learning management system, the number of assignments students submitted significantly predicted their final grades (Zacharis, 2015). Likewise, some studies aiming to predict student success in blended learning environments found that the marks students receive on weekly online assignments are a significant predictor of learning performance (Huang & Fang, 2013; Lu et al., 2018). Another study set out to estimate students' academic performance from their effort on learning tasks (Kovanovic et al., 2015). Its regression models showed that how often students view assignment pages and how much time they spend on assignments are important in predicting academic performance. Together, these results indicate that students' interactions with learning tasks and assignments play a significant role in achieving effective online learning experiences in blended learning environments.
Some of the studies that show the importance of assignment submission in blended learning relate to academic procrastination. Academic procrastination is associated with variables such as whether students submit their assignments, whether they submit them late, and simultaneous effort on an assignment as an individual or in a group. Although one study in a blended learning context found no significant relationship between students' tendency to postpone academic tasks and the time at which they submitted activities (Bayrak, 2018), assignment submission behaviors are emphasized as substantial indicators of academic procrastination that affect course success. One such study modeled students' interaction and procrastination behaviors together with their learning performance in blended learning (Cerezo, Esteban, Sánchez-Santillán, & Núñez, 2017). In that study, the number of days students waited before checking their weekly assignments was treated as the interaction variable associated with procrastination. Using association rule mining, the study obtained high-confidence rules indicating that students who submit assignments late tend to underperform. You (2015, 2016) concluded that late submission of assignments has a significant negative effect on students' grades and exam scores, treating assignment submission behavior as an indicator of academic procrastination. Other studies of a similar nature concluded that learning performance is higher when students take less time to upload their assignments (Paule-Ruiz, Riestra-Gonzalez, Sánchez-Santillan, & Pérez-Pérez, 2015).
Conversely, when students perform their learning tasks late and take longer to submit the assignment, their learning performance is lower (Cerezo, Sánchez-Santillán, Paule-Ruiz, & Núñez, 2016).
The literature shows that students' interaction with online assignments affects learning outcomes in blended learning. It is also recommended that students' assignment submission behaviors be examined in detail within the framework of learning analytics (Bayrak, 2018; Cerezo et al., 2017; You, 2016). A review of the literature shows that very few studies focus on the online assignment submission process in blended learning or model assignment submission behaviors in an authentic learning process. This study therefore investigates the relationship between students' assignment submission behaviors and course achievement in a blended learning context.
For this purpose, students' assignment submission logs in the Moodle environment were converted into time series format through data transformation steps, and students who followed similar assignment submission patterns were then grouped using cluster analysis. The assignment submission processes and academic performance of the students in the different clusters were also analyzed. The research seeks answers to the following three questions.
1. Can students exhibiting similar assignment submission behavior be identified by analyzing log records?
2. What kinds of differences are there among students who exhibit different assignment submission behavior?
3. Is there a significant relationship between assignment submission behavior and students' course achievement?
It is hoped that the results of this study will provide insight into students' assignment submission processes and support the early identification of students at risk of failing the course. The study also aims to offer useful suggestions for online learning design in blended learning. What distinguishes this study from others is that it analyzes assignment submission behaviors on the basis of learning analytics and models them using process mining.

Methodology
This study is an educational data mining study in which interaction data based on students' assignment submission behaviors are modeled and related to course achievement. The steps followed and the data analysis process are explained in detail below.

Participants and Data Collection Process
The participants were 75 students enrolled in the Operating Systems course in the Department of Computer Education and Instructional Technologies at a public university. In addition to face-to-face lessons, the Moodle online learning environment was actively used as part of the course. Students' activities in Moodle consisted of following the course resources, participating in discussions, and doing assignments. The role of the Moodle system in the course was explained to the students at the beginning of the semester, and they were told that their assignments on this system would account for 25% of their final score. The assignments consisted of open-ended questions related to the subject to be taught that week. The instructor prepared the questions to reflect students' knowledge of the subject and to have them analyze and report information compiled from different sources. Assignments were uploaded to the system within 1-2 days after the lesson, and students were given 5-6 days to complete them. The start time of each lesson was set as the deadline for the previous week's assignment. Late submissions were not allowed. The instructor graded the assignments and announced the scores on the system. During the semester, 10 assignments with different properties were given to the students; this study analyzes the data from the assignment given in the fourth week. This assignment was chosen because it was the first comprehensive assignment of the semester.

Data Pre-Processing
The data of six students who did not take the final exam were excluded from the analysis. Eight students who had no log records related to the assignment but took the final exam were included in the study. In total, 2928 log records for 69 students were analyzed. All activities that students can perform on an assignment are presented in Table 1. A student's log sequence may include all of these activities or only some of them, and each activity can occur more than once in a sequence. Among the examined records, the shortest log sequence contains a single record, while the longest consists of 268 records. The average sequence contains 48 records (2928 records across the 61 students with log data), and the median is 41. An example log sequence of 14 records for one student is as follows:

Assignment viewed -> Attempt started -> Question viewed -> Question viewed -> Question viewed -> Question viewed -> Question viewed -> Question viewed -> Assignment submitted -> Assignment viewed -> Question reviewed -> Question reviewed -> Question reviewed -> Question reviewed.
During data pre-processing, the Moodle log records were processed and the activities carried out by each student on the assignment were recorded sequentially in the analysis file. To standardize the records, transitions separated by less than three seconds were eliminated.

Table 1. Activities that students can perform in the assignment submission process
Activity: Description
Assignment viewed: The student viewed the assignment module and saw the assignment description, but did not open the questions.
Attempt started: Logged only when the student views the assignment for the first time; it does not occur again on subsequent visits.
Question viewed: Logged each time the student displays a question in the assignment. Displaying a question also records the text in the answer field.
Assignment submitted: Logged when the student completes the assignment. The student can submit the assignment only once and cannot change the answers afterwards.
Question reviewed: Logged when the student displays the assignment after the deadline. At this stage, the student can view the answers s/he gave or see the grade if the assignment has been graded.
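The pre-processing step can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the study used R): the row layout `(student_id, timestamp, activity)`, the function name `build_sequences`, and the example rows are all hypothetical. The text does not say which event is discarded when two events are less than three seconds apart; the sketch assumes the later one is dropped.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MIN_GAP = timedelta(seconds=3)  # threshold stated in the text


def build_sequences(log_rows, min_gap=MIN_GAP):
    """Group (student_id, timestamp, activity) rows into ordered activity lists.

    Assumption: an event is dropped when it follows the same student's
    previous raw event by less than `min_gap`.
    """
    by_student = defaultdict(list)
    for student_id, timestamp, activity in log_rows:
        by_student[student_id].append((timestamp, activity))

    sequences = {}
    for student_id, events in by_student.items():
        events.sort(key=lambda event: event[0])  # chronological order
        kept = []
        previous_time = None
        for timestamp, activity in events:
            if previous_time is None or timestamp - previous_time >= min_gap:
                kept.append(activity)
            previous_time = timestamp
        sequences[student_id] = kept
    return sequences


# Hypothetical rows for illustration only; not data from the study.
t0 = datetime(2020, 1, 1, 10, 0, 0)
example_rows = [
    ("s1", t0, "Assignment viewed"),
    ("s1", t0 + timedelta(seconds=10), "Attempt started"),
    ("s1", t0 + timedelta(seconds=11), "Question viewed"),  # 1 s gap: dropped
    ("s1", t0 + timedelta(seconds=40), "Question viewed"),
    ("s2", t0 + timedelta(seconds=5), "Assignment viewed"),
]
example_sequences = build_sequences(example_rows)
```

Each value in `example_sequences` is one student's ordered activity list, which is the form the later clustering and process mining steps operate on.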
The log activities of all students are presented visually in Figure 1. As can be seen, students followed different patterns when submitting the assignment. Some students spent much more time on the submission process, whereas others appear to have completed it quickly. Likewise, some students checked their responses after submission, while others did not reopen the assignment. Some students did not view the assignment or answer the questions, and did not submit the assignment.

Data Analysis
Data analysis was carried out with the R programming language (R Core Team, 2017). Sequential data were visualized and analyzed with the TraMineR package (Gabadinho, Ritschard, Müller, & Studer, 2011). Ward's distance criterion was used to group similar students, and the Ward distances were visualized as a dendrogram to decide on the number of clusters. The assignment submission processes of the students in the different clusters were analyzed with process mining. One-way analysis of variance (ANOVA) was conducted to examine whether students in different clusters differed in academic performance.
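The grouping step can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of the R/TraMineR analysis: the text does not name the dissimilarity measure between sequences, so a plain edit distance is assumed here, and Ward's criterion is applied through the Lance-Williams update on squared dissimilarities (a heuristic when the dissimilarity is not Euclidean). The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two activity sequences (lists of labels)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ward_merges(sequences):
    """Agglomerative clustering with the Lance-Williams update for Ward's method.

    Returns the merge history as (members_a, members_b, height) tuples,
    which is the information a dendrogram displays.
    """
    clusters = {i: [i] for i in range(len(sequences))}
    # Squared dissimilarities between active clusters, keyed by (low_id, high_id).
    d2 = {(i, j): edit_distance(sequences[i], sequences[j]) ** 2
          for i, j in combinations(range(len(sequences)), 2)}
    merges = []
    next_id = len(sequences)
    while len(clusters) > 1:
        (i, j), best = min(d2.items(), key=lambda item: item[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        new_d2 = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            dik = d2[tuple(sorted((i, k)))]
            djk = d2[tuple(sorted((j, k)))]
            new_d2[(k, next_id)] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * best
            ) / (ni + nj + nk)
        merges.append((clusters[i], clusters[j], best ** 0.5))
        # Drop every entry that involves a merged cluster, then add the new one.
        d2 = {pair: v for pair, v in d2.items() if i not in pair and j not in pair}
        d2.update(new_d2)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


# Toy sequences for illustration only; not data from the study.
example_sequences = [
    ["Assignment viewed", "Attempt started", "Question viewed",
     "Question viewed", "Assignment submitted"],
    ["Assignment viewed", "Attempt started", "Question viewed",
     "Assignment submitted"],
    ["Assignment viewed"],
    ["Assignment viewed", "Assignment viewed"],
]
merges = ward_merges(example_sequences)
```

In the toy example the two submitting sequences merge first, the two non-submitting sequences merge next, and the final merge joins the two groups at a much larger height. A jump of this kind in the merge heights is what one looks for in a dendrogram when choosing the number of clusters.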

Cluster Analysis
According to the cluster analysis results presented in Figure 2, the students can be divided into two, three, or five groups on the basis of their assignment submission data. Taking the Ward distances into account, three clusters were judged suitable for this data set.

Figure 2. Results of cluster analysis
The assignment submission behaviors of the students in each cluster are presented visually in Figure 3. Students who submitted the assignment are grouped in Cluster 1 or Cluster 2, whereas Cluster 3 consists of students who never viewed the assignment or viewed it but did not submit it. Comparing Cluster 1 and Cluster 2, students in Cluster 1 moved between questions more often and spread the submission over several sessions, displaying the assignment more than once instead of completing it in a single session. Students in Cluster 2 completed the assignment in fewer steps, and a substantial share of them completed it in a single session. As for review behavior after submission, students in both groups viewed the assignment again after submitting it to check the instructor's grading and to look at the feedback, if any.

Modelling the Process of Students' Assignment Submission
The assignment consists of sub-questions, each on a separate page. Each page view of the assignment module is recorded as Assignment viewed. Displaying the assignment does not mean opening it: a student who has displayed the assignment must first start an attempt in order to see the questions. The student can then navigate between the questions, and each such view is recorded as Question viewed. Until the assignment deadline passes, the student can leave this stage and resume at any time. The answers are saved automatically while the student navigates between questions, so there is no separate save event. Once the assignment is complete, the student must submit it before the deadline. This step is recorded as Assignment submitted. After submission, a student can display the assignment again at any time and review the answers given. Once the instructor has graded the assignment, the student can also see the score for each question and read the instructor's feedback on each question. After the assignment is submitted, each question view is recorded as Question reviewed. Attempt started and Assignment submitted occur only once for each student, whereas Assignment viewed, Question viewed, and Question reviewed can occur several times.
The assignment submission processes of the students in Cluster 1 and Cluster 2 were examined through process mining, and the results are presented in Figure 4. Two patterns stand out, one related to the submission process and one related to the review process. The submission pattern is: Assignment viewed -> Attempt started -> Question viewed (repeatedly) -> Assignment submitted. The review pattern is: Assignment viewed -> Question reviewed. In addition to these two patterns, a further pattern emerges in the first cluster: Assignment viewed -> Question viewed (repeatedly). This can be interpreted as students in the first cluster submitting their assignment over more than one session, while students in the second cluster submit in a single session.
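The text does not name the process mining algorithm or tool behind Figure 4. A common starting point for process maps of this kind is the directly-follows relation, which counts how often one activity is immediately followed by another. The Python sketch below assumes that approach; the function name `directly_follows` is hypothetical, and the input is the 14-record example sequence given in the Data Pre-Processing section.

```python
from collections import Counter


def directly_follows(sequences):
    """Count how often activity `a` is immediately followed by activity `b`."""
    counts = Counter()
    for sequence in sequences:
        counts.update(zip(sequence, sequence[1:]))
    return counts


# The 14-record example sequence from the Data Pre-Processing section.
example_sequence = (
    ["Assignment viewed", "Attempt started"]
    + ["Question viewed"] * 6
    + ["Assignment submitted", "Assignment viewed"]
    + ["Question reviewed"] * 4
)
transitions = directly_follows([example_sequence])

for (source, target), count in sorted(transitions.items()):
    print(f"{source} -> {target}: {count}")
```

For this single sequence the counts reflect both patterns described above: a submission path with a self-loop on Question viewed, and a review path from Assignment viewed to Question reviewed. Applied to all sequences in a cluster, a transition such as Assignment viewed -> Question viewed would indicate students returning to the questions in a later session.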

Examination of the Relationship Between Submission of Assignment and Course Achievement
Descriptive statistics on the academic performance of the students in each cluster are presented in Table 2. End-of-semester academic performance was determined from the midterm exam, the final exam, and the grades for the assignments submitted during the term. A one-way ANOVA was carried out to examine whether students in different clusters differed in end-of-semester academic performance; the results are shown in Table 3. Normality was tested with Kruskal-Wallis analysis and equality of variances with the Levene test, and the assumptions were met. The results in Table 3 show a significant difference between the clusters in mean course achievement [F(2, 66) = 17.27, p < 0.001, effect size = 0.343]. Post-hoc analyses were carried out with the Tukey test to determine which groups differed from each other. According to the Tukey results, there was no significant difference between Cluster 1 and Cluster 2 in mean course achievement (p > 0.1), whereas there were significant differences between Cluster 1 and Cluster 3 (p < 0.001) and between Cluster 2 and Cluster 3 (p < 0.001). In other words, students in Cluster 1 scored on average 23.7 points higher than students in Cluster 3, and students in Cluster 2 scored on average 18.4 points higher than students in Cluster 3. Students in Cluster 1 scored on average 5.3 points higher than students in Cluster 2, but this difference was not statistically significant.
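The reported statistics can be cross-checked with a short Python sketch. It computes the one-way ANOVA F statistic from raw scores and recovers eta squared from a reported F and its degrees of freedom. The text does not say which effect size measure was used, so eta squared is an assumption; the function names and the toy scores are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of score lists."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, means) for x in group)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a reported F statistic and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


# Reported in the text: F(2, 66) = 17.27 (3 clusters, 69 students).
reported_eta = eta_squared_from_f(17.27, 2, 66)

# Made-up toy scores, only to exercise the function; not data from the study.
toy_f, toy_df1, toy_df2, toy_eta = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With the reported F and degrees of freedom, `reported_eta` comes out at roughly 0.34, in line with the reported effect size of 0.343 if that value is eta squared. The degrees of freedom are also consistent with three clusters and 69 students (3 - 1 = 2 and 69 - 3 = 66).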

Discussion and Conclusion
In this study, cluster analysis and process mining were used to analyze students' assignment submission behavior in the Moodle learning management system. Assignment submission behaviors were derived from students' interaction data in an authentic learning context. Students were profiled on the basis of their assignment submission behaviors, and the submission behaviors of the resulting profiles were modeled with process mining. In addition, the study analyzed whether course achievement differed among students who exhibited different assignment submission behaviors.
The study found that students fall into three groups according to their assignment submission behavior. Two of these groups consisted of students who completed the assignment, while the third consisted entirely of students who did not submit it. Process mining made it possible to compare the two submitting groups in more detail. Students in Cluster 1 and Cluster 2 submitted their assignments in similar ways, but with some differences: unlike those in Cluster 2, students in Cluster 1 submitted their assignments over more than one session. This result shows that students' assignment submission behaviors differ and that some students invest more time in their assignment pages. Kovanovic et al. (2015) concluded that students' academic performance increased with their effort on learning tasks. In this study, by contrast, no significant difference in course achievement was found between the groups that submitted the assignment on time but differed in their submission behaviors. For this reason, future studies could build new models that take the time dimension into account.
The end-of-term results of students with different assignment submission behaviors showed that 86% of the students in Cluster 3, who did not submit the assignment, failed the course, whereas 88% of the students in Cluster 1 and 67% of the students in Cluster 2, who submitted the assignment, passed the course. Similarly, Zacharis (2015) showed that, in a blended learning environment supported by the Moodle learning management system, students with few assignment submissions were more likely to fail. This finding is also consistent with studies showing that assignment and interaction data are significant in predicting student achievement (Kovanovic et al., 2015; MacFadyen & Dawson, 2010). The finding that students in Cluster 3, who did not submit their assignment, underachieved supports earlier findings that failure to submit assignments is an indicator of academic procrastination and that such students underachieve (Cerezo et al., 2017; You, 2015, 2016). On this basis, automatic interventions and instant feedback may be suggested to increase learner interaction with online assignments and thereby learner success in blended learning. Data on students' assignment submission behaviors should also be considered in learning analytics dashboards. The findings can be used to identify, in the first weeks of the course, the students who are likely to fail at the end of the semester. Interventions at this stage can be effective in preventing possible failures. Studies show that students' end-of-term performance can be predicted accurately from the first weeks of a course (Akçapınar, Altun, & Aşkar, 2019).
Future studies should design and test interventions, especially for students who are likely to fail the course. In this study, the data analyzed came from the assignment given in the fourth week of the course, which was selected because it was the first comprehensive assignment of the course. Analyzing a single assignment is one of the limitations of the study. Future studies could examine whether students' behavior patterns change with the difficulty level of the assignments and with students' individual differences.
Ethics Committee Approval Information: Ethics committee approval for this research was obtained from Trabzon University, Social and Humanities Ethics Committee, with the date 5 March 2020 and document number 81614018-000.E109.