Using educational data mining techniques to increase the prediction accuracy of student academic performance

Publication Date: 08 Jul 2019
Author: Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani, James Lim, Pablo Garcia
Subject: Library & information science
Gomathy Ramaswami, Teo Susnjak and Anuradha Mathrani
School of Natural and Computational Sciences (SNCS), Massey University - Albany
Campus, North Shore City, New Zealand
James Lim
Department of Civil Engineering, University of Auckland, Auckland,
New Zealand, and
Pablo Garcia
Xorro Solutions, Auckland, New Zealand
Purpose: This paper aims to evaluate educational data mining methods to increase the predictive accuracy
of student academic performance for a university course setting. Student engagement data collected in real
time and over self-paced activities assisted this investigation.
Design/methodology/approach: Classification data mining techniques have been adapted to predict
students' academic performance. Four algorithms, Naïve Bayes, Logistic Regression, k-Nearest Neighbour
and Random Forest, were used to generate predictive models. Process mining features have also been
integrated to determine their effectiveness in improving the accuracy of predictions.
Findings: The results show that when general features derived from student activities are combined with
process mining features, there is some improvement in the accuracy of the predictions. Of the four algorithms,
the study finds Random Forest to be more accurate than the other three algorithms in a statistically
significant way. The validation of the best-known classifier model is then tested by predicting students'
final-year academic performance for the subsequent year.
Research limitations/implications: The present study was limited to datasets gathered over one
semester and for one course. The outcomes would be more promising if the dataset comprised more courses.
Moreover, the addition of demographic information could have provided further representations of students'
performance. Future work will address some of these limitations.
Originality/value: The model developed from this research can provide value to institutions in making
process- and data-driven predictions on students' academic performances.
Keywords: Classification, Model evaluation, Predictions, Educational data mining, Process mining,
Data mining technique
Paper type: Research paper
Received 17 March 2019
Revised 25 June 2019
Accepted 8 July 2019
Information and Learning
Vol. 120 No. 7/8, 2019
pp. 451-467
© Emerald Publishing Limited
DOI 10.1108/ILS-03-2019-0017
1. Introduction
Data mining has attracted much attention in recent years due to the availability of large
amounts of static and dynamic data that are captured over different operational processes
within an organisational context. Computational exploration of the underlying historical
data could lead to identification of some consistent patterns that depict systematic
relationships between variables (Han et al., 2011). Data mining functions can then be used to
validate these findings by applying the detected patterns to new subsets of data. To enhance
the accuracy of the mining function, we have to prepare the data first. Data preparation (or
pre-processing) involves integrating datasets from different operational environments,
transforming data into the required computational format, and cleaning it to eliminate
aspects that clutter the dataset (Ribeiro, 2013). Data mining is being used in numerous fields
(e.g. banking, finance, retail sales, health and education) to support organisations in finding
ways to improve their performance (Chamizo-Gonzalez et al., 2015).
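
The three data preparation steps named above (integrating, transforming and cleaning) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the record layout, the field names (`attempts`, `avg_score`, `passed`) and the sample values are all hypothetical, chosen only to show the shape of each step.

```python
# Hypothetical extracts from two operational systems, keyed by student id.
# All field names and values here are illustrative assumptions.
activity_log = [
    {"student_id": "s1", "attempts": "12", "avg_score": "0.80"},
    {"student_id": "s2", "attempts": "3", "avg_score": ""},       # missing score
    {"student_id": "s3", "attempts": "7", "avg_score": "0.55"},
    {"student_id": "s5", "attempts": "9", "avg_score": "0.70"},   # no grade record
]
grades = {"s1": "pass", "s2": "fail", "s3": "pass", "s4": "fail"}


def prepare(activity_rows, grade_lookup):
    """Integrate, clean and transform the two extracts into one dataset."""
    prepared = []
    for row in activity_rows:
        sid = row["student_id"]
        # Integrate: keep only students that appear in both sources.
        if sid not in grade_lookup:
            continue
        # Clean: drop records with missing values that would clutter the dataset.
        if row["attempts"] == "" or row["avg_score"] == "":
            continue
        # Transform: convert text fields into numeric features and a 0/1 label.
        prepared.append({
            "student_id": sid,
            "attempts": int(row["attempts"]),
            "avg_score": float(row["avg_score"]),
            "passed": 1 if grade_lookup[sid] == "pass" else 0,
        })
    return prepared


dataset = prepare(activity_log, grades)
for record in dataset:
    print(record)
```

Dropping incomplete records is only one possible cleaning rule; imputing missing values would be an equally reasonable choice, depending on how much data is available.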
Educational data mining (EDM) is a new field in which data mining techniques are
applied to educational data (Baker et al., 2004). Raw data are extracted from learning
platforms and analysed to gather insights on current educational practice methods for the
purpose of enhancing existing capabilities (Romero and Ventura, 2010). These insights
could relate to tracking progress of individual students, setting up alerts for planning
actions that enhance student retention or monitoring course performances (Lewis, 2018).
Because institutions are informed by user-generated data, they are better placed to plan any
intervention strategy.
Machine learning algorithms combined with various visualisation techniques assist in
the development of predictive models for information discovery and in its visual
representation in an easy-to-understand format for education providers, students,
instructors and policymakers. Once a model performs well on previously seen data, the
analyst can feed in new data, and the model can be used to predict and understand aspects
of newly observed data (VanderPlas, 2016). Therefore, EDM allows educational institutions
an opportunity to make use of data-driven insights and establish more efficient operational
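
The workflow described in this paragraph, in which a model is fitted, checked against data with known outcomes and then applied to newly observed data, can be sketched as follows. This is an illustrative example rather than the models built in this study: it uses a hand-written k-Nearest Neighbour classifier (one of the four algorithm families the paper names), and the two features and all sample values are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features per student: (participation rate, average quiz score).
train_X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.75), (0.2, 0.3), (0.1, 0.2), (0.3, 0.1)]
train_y = ["pass", "pass", "pass", "at_risk", "at_risk", "at_risk"]

# Step 1: check the model against held-out students whose outcome is known.
held_out_X = [(0.85, 0.7), (0.15, 0.25)]
held_out_y = ["pass", "at_risk"]
predictions = [knn_predict(train_X, train_y, x) for x in held_out_X]
accuracy = sum(p == y for p, y in zip(predictions, held_out_y)) / len(held_out_y)
print("held-out accuracy:", accuracy)

# Step 2: apply the same model to a newly observed student.
new_student = (0.25, 0.2)
new_prediction = knn_predict(train_X, train_y, new_student)
print("prediction for new student:", new_prediction)
```

With such a small sample the accuracy figure carries no statistical weight; it only shows where evaluation sits relative to prediction on new data.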
Process mining (PM) is one of the techniques used in EDM. The idea behind PM is to
infer process-related knowledge from event logs that are generated within educational
environments. Learning paths are identified and scrutinised so as to improve existing
educational processes (Cairns et al., 2015). PM involves construction of a process model
(represented via a Petri net) that reproduces the behaviour observed in the log and
conformance checking that monitors deviations between the two behaviours. Umer et al.
(2017) add that EDM in combination with PM techniques can reveal dynamic aspects of
process-related knowledge which are otherwise missing in dataset extracts.
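
The idea of conformance checking can be illustrated with a deliberately simplified sketch. Instead of a Petri net with token replay, the reference model below is assumed to be a set of allowed "directly follows" steps, and each student's trace is scored by the fraction of its steps that the model permits. The activity names, the model and the event log are all hypothetical; the resulting score is the kind of per-student value that could serve as a process-derived feature.

```python
START, END = "<start>", "<end>"

# Hypothetical reference model: the activity transitions considered conforming.
allowed_steps = {
    (START, "open_activity"),
    ("open_activity", "answer_question"),
    ("answer_question", "answer_question"),
    ("answer_question", "submit"),
    ("submit", END),
}


def trace_fitness(trace, allowed):
    """Return the share of consecutive steps in a trace that the model allows."""
    path = [START, *trace, END]
    steps = list(zip(path, path[1:]))
    conforming = sum(step in allowed for step in steps)
    return conforming / len(steps)


# Hypothetical event log: one ordered list of activities per student.
event_log = {
    "s1": ["open_activity", "answer_question", "answer_question", "submit"],
    "s2": ["open_activity", "submit"],  # submits without answering
}

fitness = {sid: trace_fitness(trace, allowed_steps) for sid, trace in event_log.items()}
for sid, score in fitness.items():
    print(sid, round(score, 2))
```

A score of 1.0 means every step in the trace is permitted by the model, while lower values flag deviations such as the skipped answering step for `s2`.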
This study uses EDM techniques to explore data collected from an interactive
educational tool (Xorro-Q) used in an engineering course offering. Xorro-Q is a Web-based
interactive tool that is not limited to asynchronous instructor-student or student-student
exchanges; rather, it offers dynamic interactions which can be very effective in classroom
teaching deliveries. The objective of this study is to use data from Xorro-Q activities to
evaluate the effectiveness of EDM methods in predicting students' academic performance by
focussing on students who are at risk of failing the course. In addition, PM features obtained
as a result of process conformance testing are incorporated to determine whether these
features help in increasing the accuracy of predicting students' performance. A popular data
mining technique, namely classification, along with four widely used classification
algorithms, namely Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbour (kNN) and
Random Forest (RF), are used in this study. Most of the studies on EDM and learning
analytics (LA) are limited and have been detailed mainly at a descriptive level (Oakleaf,
2018; Showers and Stone, 2014); however, our study utilises a real-life case study to provide
empirically based evidence on the leeway offered by predictive-modelling techniques. It
provides a computational and descriptive view to inform educational institutions on
predictive modelling techniques that can be used to predict students' success in a course and
