Abstract
Learning Management Systems (LMS) are currently used in the majority of educational institutions to provide learning materials online. Because these systems record every click, they yield a rich body of data about students’ online behaviour as a by-product, and many researchers have recently started to investigate these data. Interpreting and contextualizing data about students in order to improve learning and teaching is known as learning analytics. This document describes the project “EXCTRA - EXploiting the Click-TRAil. Assessing the benefits of Learning Analytics”. The main objective of the project, materialized through three reports, is to determine whether and how LMS data can be used to predict student performance. In particular, early prediction of student performance is an important input for a range of educational interventions aimed at reducing student failure.
The first report is a literature review that explicitly identifies gaps in research on the prediction of student performance; we address these gaps in the other reports. The second report is a manual that can be used to convert raw LMS log data into analysable data. The manual facilitates the analysis of LMS data by teachers who are not familiar with the data-handling techniques needed to prepare the LMS data. The third report describes an empirical study using LMS data from seventeen blended courses with 4,989 students taught at Eindhoven University of Technology, combined with data from a test for prospective students (the “TU/e Study Choice Check”). Among other matters, it examines to what extent LMS data can be used for the prediction of student performance across the different courses.
REPORT ONE: Literature review
The literature review describes three categories of research in the relatively new field of learning analytics. By far the most common topic is the prediction of student performance. These studies show how a wide variety of variables extracted from the data, analysed with a wide variety of methods, can reveal relations between online behaviour and course performance. Little theory is used to motivate the inclusion of predictor variables, which makes it hard to draw general conclusions about which variables best predict student performance. In addition, most current studies predict student performance only at the end of the course: they establish whether prediction is possible in principle, but at a time when interventions are no longer possible. Moreover, often only LMS data are used, even though additional student characteristics (for instance high-school grade point average) and performance data (for instance in-between test scores) have been shown to be robust predictors over decades.
The second category consists of analytics and visualization tools, which are built to help researchers, teachers, and students analyse and interpret the (complex) LMS data. Several of these tools exist, but they are still in their infancy, quite diverse, and mostly applied in just a couple of places, for instance restricted to a handful of courses. The third, emerging category focusses on the actual implementation of the analyses to improve learning and teaching. Research on this theme shows much more promise and should be extended to gain insight into the impact of learning analytics and into which interventions are useful in which situations.
REPORT TWO: A manual for pre-processing LMS data
LMS data are stored in large “raw” log tables that are hard to transform into analysable data tables. Moreover, to predict student performance, the data need to be merged with performance data (grades), which are often stored in a different database with a different data structure. This so-called pre-processing takes a lot of time and effort, especially for teachers and researchers who lack a background in data transformation. In fact, we feel this is one of the main reasons why LMS log data are relatively rarely used by educational researchers: they typically do not have the data-handling skills necessary to convert the raw data into an analysable data set. Therefore, our second report offers a manual for pre-processing the raw LMS data and performance data into data that can be used for further analyses. It includes scripts and explanations of the decisions made during pre-processing, so that any researcher willing to invest a couple of days should be able to create a data set that is analysable through standard statistical techniques.
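To give a rough impression of what such pre-processing involves, the following minimal Python sketch aggregates click-level log rows into one record per student and merges the result with grades from a separate source. It is an illustration only and not the scripts from the manual: the row layout, the names `build_features` and `merge_with_grades`, the example data, and the 30-minute session timeout are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical raw log rows: (student_id, ISO timestamp of one click).
raw_log = [
    ("s1", "2015-09-01T09:00:00"),
    ("s1", "2015-09-01T09:10:00"),
    ("s1", "2015-09-01T11:00:00"),
    ("s2", "2015-09-03T14:00:00"),
    ("s3", "2015-09-02T08:00:00"),
]
# Hypothetical grade records from a separate source, on a 0-10 scale.
grades = {"s1": 7.5, "s2": 5.0}

COURSE_START = datetime(2015, 9, 1)
SESSION_GAP = timedelta(minutes=30)  # assumed timeout, not taken from the report


def build_features(log_rows, course_start, session_gap):
    """Aggregate click-level rows into one feature record per student."""
    clicks = defaultdict(list)
    for student_id, stamp in log_rows:
        clicks[student_id].append(datetime.fromisoformat(stamp))

    features = {}
    for student_id, stamps in clicks.items():
        stamps.sort()
        # A new session starts whenever the gap to the previous click is too long.
        sessions = 1 + sum(
            1 for prev, cur in zip(stamps, stamps[1:]) if cur - prev > session_gap
        )
        features[student_id] = {
            "total_clicks": len(stamps),
            "total_sessions": sessions,
            "hours_until_first_activity": (stamps[0] - course_start).total_seconds() / 3600,
        }
    return features


def merge_with_grades(features, grade_by_student):
    """Inner join: keep only students present in both sources."""
    return [
        {"student_id": sid, **feats, "grade": grade_by_student[sid]}
        for sid, feats in sorted(features.items())
        if sid in grade_by_student
    ]


analysable = merge_with_grades(build_features(raw_log, COURSE_START, SESSION_GAP), grades)
for row in analysable:
    print(row)
```

In this sketch, students without a grade record (here `s3`) are dropped by the inner join. Whether to drop or keep such students is exactly the kind of pre-processing decision that needs to be made explicitly and documented.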
REPORT THREE: Predicting student performance
In the third report we investigate how LMS data can be used. Firstly, we characterize the TU/e courses with respect to the LMS features that they use. The courses that use the Moodle LMS at Eindhoven University of Technology mainly use it to provide content and quizzes. More interactive features such as discussion forums, wikis, and peer-reviewed assignments were also used, but not consistently across courses. Secondly, we show that LMS data can indeed be used to predict student performance at the course level. However, consistent with previous research, we find that the effects of the LMS predictors differ across courses. One could have naively hoped to find that, say, spending a lot of time online is predictive of a high grade, and we do find courses where this is the case, but we do not find many consistent results of this kind across all courses. Only the in-between assessment grades, the total number of sessions, and the time until the first activity were found to be robust predictors. Hence, it is hard to draw general conclusions, that is, conclusions that hold across all courses, about which LMS data are useful for predicting final exam grades. Still, the data can be used to predict student performance per course.
Thirdly, we find that learner data outperform LMS data in the prediction of student performance. As soon as in-between assessment grades are added to LMS data, learner data have a much lower predictive value. The combination of LMS data and learner data is especially useful for the early prediction of student performance, before the in-between assessments are available. However, the predictions are still far from accurate (confidence intervals are typically the predicted grade plus or minus 1.35 points on a scale of 0 to 10), so one has to be careful in using them for early interventions. Fourthly, we considered the relationship between LMS data and learner data. We find that most learner data do not correlate strongly with LMS data. However, conscientiousness, time management, and in-between assessment grade did show significant correlations with most of the LMS variables, with low to moderate effect sizes. This offers some promise that, at least for these concepts, LMS data might be of use to measure them continuously as the academic year progresses.
Original language | English |
---|---|
Publisher | Eindhoven University of Technology |
Publication status | Published - 2016 |