Student Activity Analytics in an e-Learning Platform: Anticipating Potential Failing Students

The evolution of learning technology and tools changed the way students access information and build their knowledge. Registering the interaction of students with these tools generates a large amount of data that, once critically analysed, can provide important clues about the students' learning progress. Nevertheless, research has still to be conducted to fully understand how (and if) the students' interaction with the learning technologies relates to their learning success. In parallel, new analytical tools must be developed to allow teachers to fully exploit the information embedded in this data, in a friendly but flexible way. This article is a contribution to this effort and presents a study where the use of a learning management system (LMS) in a specific semester-long course in an Engineering school produced data that was analysed and correlated to the students' success. Results indicate that some correlation exists between the effective use of some of the tools integrated in the LMS and student success, which points the way to building specific applications that provide teachers with indicators of students in danger of failing.


INTRODUCTION
For some time now, a multitude of technology-based learning tools have been used to support online and offline courses. Among these tools, Learning Management Systems (LMS) are probably the most used platforms, especially in Higher Education. These applications provide teachers and students with an extensive range of information and communication tools, depending on the structure defined by the teacher (Mota et al., 2014). Furthermore, access to the LMS is ubiquitous, in time and location, which necessarily changes the way students approach the learning process.
These platforms record and store all the user activity, from entry to exit: the number of accesses, duration of accesses, paths traversed in the platform, tools used, resources used or downloaded, access to files and folders, performed tasks and activities, messages and posts read and sent, quizzes attempted and answered, assignments submitted, etc. (Marques et al., 2010; Preidys and Sakalauskas, 2010; Mostow and Beck, 2006).
Teachers have access to this data, but the sheer size of the collected information, the lack of synthetic views over it and the inability to apply adequate techniques and tools to mine it usually drive teachers away from making effective use of it. Furthermore, data is normally obtained from three different sources: (1) recorded text, (2) web server log files, and (3) learning software log files (Black et al., 2008), and as such it is not stored in a systematic way, so its thorough analysis requires long and tedious preprocessing (Kruger et al., 2010). Therefore, it is mostly researchers that apply concepts of Data Analysis, Big Data and Learning Analytics to exploit the data (Alves et al., 2015; Black et al., 2008; Garcia and Secada, 2013; Kruger et al., 2010; Lino et al., 2017), as teachers neither have the required knowledge to apply these techniques nor friendly tools that perform the analysis and provide them with the processed information. These techniques have been applied to the assessment of students' performance, to support course adaptation, to scaffold recommender systems, to detect atypical student behavior and even to detect students' learning styles (Romero et al., 2007; Liyanage et al., 2015; Khribi et al., 2009).
For instance, Alves studied the access to virtual learning environments (VLEs) and reported on the large quantities of data resulting from the activities that both students and teachers develop in those environments (Alves et al., 2015). Black used e-learning tools to generate relevant information, for the teacher and the students, to optimize the learning process. That study combined data processing and learning analytics to improve higher education learning processes. The authors concluded that activity logs of virtual learning environments provided real knowledge of the use of these environments, but also identified the need for new pedagogical approaches to exploit that data (Black et al., 2008).
The AAT tool was created to "…access and analyse student behaviour data in learning systems by enabling users [learning designers and/or teachers] to extract detailed information about how students interact with and learn from online courses in a learning system" (Graf et al., 2011). The Moodle Data Mining (MDM) tool addressed the knowledge discovery process from student data registered in Moodle courses (Luna et al., 2017). CourseVis is a visualization tool that shows graphical representations of the students' access data (Mazza and Dimitrova, 2007). GISMO complements the previous tool by giving information on the actual use of contents and activities (Mazza and Milani, 2004). Mostow et al. (2005) created a tool to represent the teacher-student interaction based on their communication logs. MATEP gets data from the LMS log files but also from the academic portal to generate dynamic reports (Zorrilla and Álvarez, 2008).
Some authors have used content analysis methods to study the interaction in discussion forums (Lin et al., 2009; Dringus and Ellis, 2005). Using text analysis methods (a process that uses algorithms capable of analyzing collections of text documents in order to extract knowledge), the authors identified types of online discussion. The results of this experiment helped teachers monitor the online activities that took place in the discussion forums. Other authors have created software agents based on mathematical methods and statistical analysis to perform that data analysis (Castro et al., 2007; Mamčenko and Šileikienė, 2006; Preidys and Sakalauskas, 2010).
In this paper, we present the analysis of data on the behavior of students in an LMS to measure and contextualize their access (where, how many times, and in what form), their digital paths in the platform (what tools are used, actions or queries, use of resources, forum participation, etc.) and the correlation with student learning success. The intention was to determine significant correlations that could lead to the creation of a tool allowing teachers to make an early identification of students with problems. This work was developed in the context of a PhD project whose objective was to study, discuss, propose and validate a support model for the adoption of Information and Communication Technologies (ICT) for pedagogical purposes in the Higher Education scenario, and to propose a coherent and consistent model of institutional and pedagogical activity, centered on teachers (Marques, 2015). This article is an improved, revised and extended version of the publication by Marques (2017).

STUDENT INTERACTION WITH LEARNING TOOLS VS. STUDENT SUCCESS
The LMS MOODLE collects a set of data that, when analyzed, can give teachers indicators to follow students' behavior in order to identify critical situations and prevent dropouts. For this study, data was collected during a period of 6 months (from September 1, 2014 to February 28, 2015) from the course on Computer Principles (PRCMP) of the BSc in Computer Engineering (LEI) of the Department of Informatics Engineering (DEI). The population consisted of 364 individuals.
This study followed the proposal of Gaudioso and Talavera (2006) in the sense that the work started from a question derived from intuition, based on the authors' own experience, and data was collected to confirm it. In this case, the hypothesis was that there was a correlation between the level of involvement of the students with the learning tools and their final learning success.
Figure 1 shows the histogram of the final grades obtained by the students. Approximately 3% of the students failed the course outright because they did not obtain the minimum grade required. Nevertheless, it is possible to see that the majority of students (85%) achieved a grade of 10 or higher, which means they were successful. Analysing the access data is important to help prevent the failure of the remaining 15% of students, particularly the 3% who dropped out.
For the collection of data related to the use of the MOODLE LMS and its tools, it was necessary to combine the internal functionality of the platform, which can generate some usage reports, with a set of purpose-built applications that extracted the information directly from the MOODLE databases. This was time-consuming and specialized programming work that is clearly out of reach for most teachers, even at the Higher Education level.
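To illustrate the kind of purpose-built extraction involved, the sketch below counts per-student accesses by querying a simplified, in-memory stand-in for Moodle's standard log store (`mdl_logstore_standard_log`); the table contents, student ids and course id are mock data invented for illustration.

```python
import sqlite3

# Simplified stand-in for Moodle's standard log store; the column
# names follow the real schema, but this mock table holds only the
# fields needed to count accesses.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mdl_logstore_standard_log (
    userid INTEGER, courseid INTEGER, action TEXT, timecreated INTEGER)""")
rows = [
    (1, 42, "viewed", 1410000000),  # hypothetical student 1, course 42
    (1, 42, "viewed", 1410003600),
    (2, 42, "viewed", 1410007200),
]
conn.executemany(
    "INSERT INTO mdl_logstore_standard_log VALUES (?, ?, ?, ?)", rows)

def accesses_per_student(conn, courseid):
    """Total number of logged accesses per student for one course."""
    cur = conn.execute(
        """SELECT userid, COUNT(*) FROM mdl_logstore_standard_log
           WHERE courseid = ? GROUP BY userid""", (courseid,))
    return dict(cur.fetchall())

print(accesses_per_student(conn, 42))  # {1: 2, 2: 1}
```

In the study this extraction was performed against the production MOODLE databases; the same aggregation logic applies regardless of the backing database engine.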
Data related to the distribution of accesses per hour and day was collected to characterize the students' use of the platform (Table 1).
This table is better viewed in the two following figures. In Figure 2, the average number of accesses per student is shown in hourly intervals, on workdays and at weekends. There are more accesses on working days than at the weekend, mainly between 8:00 and 24:00 (with a drop during the lunch and dinner periods). This hourly trend is also visible at the weekend, but it is less pronounced. Figure 3 shows the number of enrolled students that accessed the platform in each hour. This distribution can be explained by the fact that, on working days, students access the LMS to accompany the classes (night classes run until 23:30), whereas at the weekend students enter the LMS to work autonomously. This explains why the total number of accesses is higher on working days than at the weekend. This ratio between working days and weekends is also visible in Figure 3, which shows the percentage of students who accessed the platform throughout the whole semester. Naturally, the use of the tool on weekends is more significant, as it reflects the autonomous and self-motivated use of the platform, contrary to most working-day use. It is therefore quite relevant that about 75% of all students accessed the platform on weekends at least once, as it shows a very high commitment level.

The summary of total accesses made by students is presented in Table 2, ranked by their grades (the grading system uses a scale from 0 to 20 points). The table also presents the normalization (total number of accesses / number of students per range) of those ratings.
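The workday/weekend hourly breakdown behind Figures 2 and 3 can be sketched as follows; the timestamps are hypothetical examples, assuming Unix-epoch access times such as those stored in the LMS logs.

```python
import calendar
from datetime import datetime
from collections import Counter

def hourly_profile(timestamps):
    """Group access timestamps into (workday/weekend, hour-of-day)
    buckets, mirroring the breakdown shown in Figures 2 and 3."""
    buckets = Counter()
    for ts in timestamps:
        dt = datetime.utcfromtimestamp(ts)
        day_type = "weekend" if dt.weekday() >= 5 else "workday"
        buckets[(day_type, dt.hour)] += 1
    return buckets

# Hypothetical accesses: two on Wed 10 Sep 2014 at 09:00 UTC,
# one on Sat 13 Sep 2014 at 15:00 UTC.
wed = calendar.timegm(datetime(2014, 9, 10, 9, 0).timetuple())
sat = calendar.timegm(datetime(2014, 9, 13, 15, 0).timetuple())
profile = hourly_profile([wed, wed, sat])
print(profile)  # Counter({('workday', 9): 2, ('weekend', 15): 1})
```

Dividing each bucket by the number of enrolled students yields the per-student averages plotted in Figure 2.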
From the collected data, also represented in Figure 4, we can see that the largest share of accesses corresponds to students who scored between 12 and 16, representing more than 40% of the accesses.
So, although using the platform is not mandatory, its frequent use seems to lead to good results. Figure 5 shows the average number of accesses per student at each classification level. It is clear that students who succeeded (grade of 10 or higher) have more accesses on average. Curiously, the highest-ranked students are not the ones with the highest access average, a phenomenon that deserves further study. Our observation leads to the preliminary remark that these students do not feel the need to go to the platform as often as the others because they are quite confident in their knowledge of the course contents. Looking at the failing students, the dropouts have the lowest average number of accesses, which is natural as they stopped accessing the platform during the semester. Nevertheless, even excluding the dropouts, the remaining failing students (those who proceeded in the course until the end but failed) had the lowest average number of accesses. Although it was not possible to establish a full correlation between the average number of accesses and the final grade, it was possible to conclude that there is a correlation between the lowest numbers of accesses and the failing students.
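The per-grade-band averages of the kind plotted in Figure 5 can be computed as a simple normalisation; the (grade, access count) records and the band boundaries below are invented for illustration and are not the study's actual values.

```python
from collections import defaultdict

# Hypothetical (final grade, total accesses) pairs, one per student.
records = [(4, 20), (9, 35), (11, 90), (13, 140), (18, 100)]

def avg_accesses_by_band(records,
                         bands=((0, 9), (10, 13), (14, 17), (18, 20))):
    """Average number of accesses per student within each grade band
    (the normalisation used for Table 2 and Figure 5)."""
    totals = defaultdict(lambda: [0, 0])  # band -> [sum, count]
    for grade, accesses in records:
        for lo, hi in bands:
            if lo <= grade <= hi:
                totals[(lo, hi)][0] += accesses
                totals[(lo, hi)][1] += 1
                break
    return {band: t / n for band, (t, n) in totals.items()}

print(avg_accesses_by_band(records))
```

With real data, comparing the failing band's average against the passing bands' averages reproduces the pattern described above.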
Table 3 shows several correlation scores calculated using access and participation data from LMS activities and the grades obtained by the students. The idea was to refine the previous analysis and try to identify which online learning activities would better correlate with the students' results.
It can be seen that access to documents (information) has the highest correlation with learning success. In fact, it is the only value with significance, as the others do not reflect any meaningful correlation. Therefore, a more detailed analysis was conducted to evaluate that aspect.
Figure 6 shows the relation between the students' access to the course documentation and the final grade obtained. A higher-granularity analysis was then conducted to identify stronger correlations, analyzing the number of accesses to the course documents for different subgroups of students. For example, Figure 9 shows the correlation for the students who obtained at least the minimum score of the course (8.0 values) and made more than 150 accesses to the course platform.
The correlations obtained with this more granular subdivision were much lower than the correlation already obtained for the complete group, so they did not allow drawing additional conclusions. Nevertheless, it was considered that by identifying students with a lower number of accesses, particularly when these accesses did not lead to reading or downloading the available documentation, it would be possible to signal potential failing students.
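Such a signal could be implemented as a simple rule over the two indicators identified above; the per-student counters and the thresholds in this sketch are hypothetical and would need to be calibrated on real course data.

```python
# Hypothetical per-student activity counters; student ids, counts and
# thresholds are invented for illustration, not values from the study.
activity = {
    "s01": {"logins": 12,  "doc_accesses": 3},
    "s02": {"logins": 160, "doc_accesses": 40},
    "s03": {"logins": 30,  "doc_accesses": 1},
}

def at_risk(activity, min_logins=50, min_doc_accesses=5):
    """Flag students whose platform use AND documentation use both fall
    below their thresholds -- the two indicators suggested by the study."""
    return sorted(sid for sid, a in activity.items()
                  if a["logins"] < min_logins
                  and a["doc_accesses"] < min_doc_accesses)

print(at_risk(activity))  # ['s01', 's03']
```

Requiring both indicators to be low (rather than either one) reduces false alarms for students who, like the top-ranked group above, access the platform rarely but still succeed.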

CONCLUSIONS
This study was part of a more general approach to identify key factors in the process of acceptance and adoption of learning technologies by teachers, in order to foster the use of technology-based pedagogical tools. In fact, although teachers sometimes do not demonstrate a strong initial motivation, it was shown that it is possible to foster that adoption. However, the need for better tools was highlighted. The use of tools to analyse the data resulting from the students' interaction with learning tools is one example. This data, when critically analysed, can provide important clues about the students' learning progress. Nevertheless, new analytical tools must be developed to allow teachers to fully exploit the information embedded in this data, in a friendly but flexible way.
In particular, this article focused on the need to alert teachers to potential student problems (dropping out or failing). This alert can come from analyzing indicators derived from the quantitative usage data of the institutional learning management system. An exhaustive data collection process was organized, covering every aspect of the access to and use of the platform tools. This data was then correlated with the students' success, namely through their final grade in the course. An initial difficulty was related to the actual collection and treatment of the data. Nevertheless, the collected results showed two potential indicators for these students: 1) a lower number of accesses to the platform, and 2) fewer accesses to the documentation available on the platform.
Even if the extracted data made it possible to obtain relevant course information, it still did not allow the design of an automatic tool for the treatment of the data collected from the LMS logs. In fact, other factors not included in this study might increase the complexity of the analysis, for example the effect of class attendance. Another relevant problem is that the data used (logs) is very dependent on the structural organization of the course in the learning support platform; obtaining significant results implies a compulsory use of the platform and restructuring the courses according to a given model. To sum up, although some positive results were observed, more data must be collected to achieve more significant conclusions.

ACKNOWLEDGEMENT
We would like to acknowledge the support of ISEP's management to this work.

Figure 1. Distribution of students according to the final grade of the course

Figure 2. Average number of accesses per student
Figure 3. Number of enrolled students that accessed the platform in each hour

Figure 4. Total number of platform accesses by grade
Figure 5. Average number of accesses per student at each classification level

Figure 6. Distribution of accesses to course documents versus students' score

Figure 7. Students who did not obtain the minimum grade of the course (8.0 values), regardless of the number of accesses

Figure 9. Students who obtained at least the minimum score of the course (8.0 values) and made more than 150 accesses

Table 1. Accesses by students (working days and weekends)

Table 2. Access data and number of students per grade range