Research Question

How can causal inference be integrated with predictive learning analytics to estimate which interventions actually improve student outcomes rather than only predict risk?

diah basuki

Created at May 7, 2026

AI Novelty Assessment

8/10

High Novelty

This research question explores a largely uncharted area with significant potential for new discoveries.

Detailed Analysis

Most educational data mining (EDM) work remains predictive rather than causal. There is substantial room for studies that estimate intervention effects, disentangle prediction from action, and test whether analytics-guided support actually changes outcomes.
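To make the gap between prediction and causal estimation concrete, the following is a minimal sketch on synthetic data, not a method taken from any of the papers listed here. It assumes a hypothetical cohort in which higher-risk students are more likely to receive support, a true intervention effect of +5 points, and a known assignment probability (in real data this propensity would have to be estimated). All names (simulate, naive_difference, ipw_effect) and all numeric parameters are illustrative assumptions.

```python
import random


def simulate(n=20000, true_effect=5.0, seed=0):
    """Build a synthetic cohort of (propensity, treated, score) rows.

    Assumption: risk is uniform on [0, 1] and riskier students are more
    likely to receive the intervention, which confounds a naive comparison.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.random()
        propensity = 0.2 + 0.6 * risk  # assumed known assignment probability
        treated = 1 if rng.random() < propensity else 0
        # Assumed outcome model: risk lowers the score, support raises it.
        score = 70.0 - 30.0 * risk + true_effect * treated + rng.gauss(0.0, 5.0)
        rows.append((propensity, treated, score))
    return rows


def naive_difference(rows):
    """Mean score of supported students minus mean score of the rest."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(rows):
    """Normalized inverse-propensity-weighted estimate of the average effect."""
    w1 = sum(t / p for p, t, _ in rows)
    w0 = sum((1 - t) / (1 - p) for p, t, _ in rows)
    y1 = sum(t * y / p for p, t, y in rows) / w1
    y0 = sum((1 - t) * y / (1 - p) for p, t, y in rows) / w0
    return y1 - y0


rows = simulate()
naive = naive_difference(rows)
ipw = ipw_effect(rows)
print(f"naive difference: {naive:.2f}")
print(f"IPW estimate:     {ipw:.2f}")
```

Under these assumptions the naive comparison makes the support look harmful, because supported students started out at higher risk, while the weighted estimate lands close to the simulated effect. The weighting only removes confounding from variables that are observed and included in the propensity, so it does not replace a randomized evaluation.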

Related Academic Papers

10 papers found relevant to this research question. Each paper is scored by how closely it relates to the question.

An Extended Learning Analytics Framework Integrating Machine Learning and Pedagogical Approaches for Student Performance Prediction and Intervention

Khalid Alalawi, R. Athauda, Raymond Chiong (2024)

10/10 Relevance

25 citations

Abstract

The use of educational data mining and machine learning to analyse large data sets collected by educational institutions has the potential to discover valuable insights for decision-making. One such area that has gained attention is to predict student performance by analysing large educational data sets. In the relevant literature, many studies have focused on developing prediction models on student performance but comparatively less work exists on actions taken based on predicted at-risk students and evaluating their impact. Learning Analytics Intervention (LAI) studies have emerged as an approach that aims to address this gap. In LAI studies, student risk levels are predicted and disseminated to relevant stakeholders (academics, administrators and students) using learning analytics (LA) tools for targeted interventions. The interventions themselves are mainly left under the discretion of the academics and/or administrators, who are aware of the learning context and have the authority to make decisions, with LA tools facilitating this process. LAI studies have shown success in improving outcomes (e.g. improve pass rates, retention, grades), but their uptake has been slow. The main impediment to piloting LAIs by academics has been the lack of access to LAI infrastructure, which requires institutional investments to develop predictive models collecting data from diverse IT systems. Another challenge in LAIs is the development of effective interventions. This paper presents an extended LAI framework, termed Student Performance Prediction and Action (SPPA), which provides access to LAI infrastructure for academics to pilot LAIs in their courses without the need for institution-wide efforts. SPPA and its features are seamlessly accessed via a web browser and academics can develop course-specific predictive models based on historical course assessment data. 
Furthermore, SPPA integrates sound pedagogical approaches and provides relevant information (such as students’ knowledge gaps, personalised study plans) to assist academics in providing effective interventions. SPPA was evaluated by a number of academics piloting LAIs in their courses. Quantitative and qualitative data was collected and analysed. Academics were able to provide effective interventions using SPPA and also had a positive outlook on using SPPA and its features. SPPA is also provided as an open-source project for further development and can be a catalyst for widescale uptake in LAIs. Furthermore, a model for continuous improvement in LAIs is outlined along with a number of areas for future research and development.

Why this paper is relevant

Framework integrates analytics and pedagogical approaches; relevant for an intervention-oriented causal framing, though it does not apply formal causal inference.

Recent advances in Predictive Learning Analytics: A decade systematic review (2012–2022)

Nabila Sghir, Amina Adadi, Mohammed Lahmer (2022)

8/10 Relevance

179 citations

Abstract

The last few years have witnessed an upsurge in the number of studies using Machine and Deep learning models to predict vital academic outcomes based on different kinds and sources of student-related data, with the goal of improving the learning process from all perspectives. This has led to the emergence of predictive modelling as a core practice in Learning Analytics and Educational Data Mining. The aim of this study is to review the most recent research body related to Predictive Analytics in Higher Education. Articles published during the last decade between 2012 and 2022 were systematically reviewed following PRISMA guidelines. We identified the outcomes frequently predicted in the literature as well as the learning features employed in the prediction and investigated their relationship. We also deeply analyzed the process of predictive modelling, including data collection sources and types, data preprocessing methods, Machine Learning models and their categorization, and key performance metrics. Lastly, we discussed the relevant gaps in the current literature and the future research directions in this area. This study is expected to serve as a comprehensive and up-to-date reference for interested researchers intended to quickly grasp the current progress in the Predictive Learning Analytics field. The review results can also inform educational stakeholders and decision-makers about future prospects and potential opportunities.

Why this paper is relevant

Review highlights that prediction often dominates over intervention evaluation, motivating causal studies.

Mining Sequential Learning Trajectories With Hidden Markov Models For Early Prediction of At-Risk Students in E-Learning Environments

Anika Gupta, Deepak Garg, Parteek Kumar (2022)

8/10 Relevance

17 citations

Abstract

With the onset of online education via technology-enhanced learning platforms, large amount of educational data is being generated in the form of logs, clickstreams, performance, etc. These Virtual Learning Environments provide an opportunity to the researchers for the application of educational data mining and learning analytics, for mining the students learning behavior. This further helps them in data-driven decision making through timely intervention via early warning systems (EWS), reflecting and optimizing educational environments, and refining pedagogical designs. In this, the role of EWS is to timely identify the at-risk students. This study proposes a modeling methodology deploying interpretable Hidden Markov Model for mining of the sequential learning behavior built upon derived performance features from light-weight assessments. The public OULA dataset having diversified courses and 32 593 student records is used for validation. The results on the unseen test data achieve a classification accuracy ranging from 87.67% to 94.83% and AUC from 0.927 to 0.989, and outperforms other baseline models. For implementation of EWS, the study also predicts the optimal time-period, during the first and second quarter of the course with sufficient number of light-weight assessments in place. With the outcomes, this study tries to establish an efficient generalized modeling framework that may lead the higher educational institutes toward sustainable development.

Why this paper is relevant

Sequential learning trajectories and early prediction can be used as the basis for causal intervention timing.

Educational data mining to predict students' academic performance: A survey study

Saba Batool, Junaid Rashid, M. W. Nisar, Jungeun Kim, Hyuk-Yoon Kwon, Amir Hussain (2022)

8/10 Relevance

162 citations

Why this paper is relevant

Survey of prediction methods; useful to show the field's emphasis on forecasting rather than intervention effects.

Predicting student performance using sequence classification with time-based windows

Galina Deeva, Johannes De Smedt, Cecilia Saint-Pierre, R. Weber, Jochen De Weerdt (2022)

8/10 Relevance

22 citations

Abstract

A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in importance and ubiquity of online education. Among the major advantages of e-learning is not only improving students’ learning experience and widening their educational prospects, but also an opportunity to gain insights into students’ learning processes with learning analytics. This study contributes to the topic of improving and understanding e-learning processes in the following ways. First, we demonstrate that accurate predictive models can be built based on sequential patterns derived from students’ behavioral data, which are able to identify underperforming students early in the course. Second, we investigate the specificity-generalizability trade-off in building such predictive models by investigating whether predictive models should be built for every course individually based on course-specific sequential patterns, or across several courses based on more general behavioral patterns. Finally, we present a methodology for capturing temporal aspects in behavioral data and analyze its influence on the predictive performance of the models. The results of our improved sequence classification technique are capable to predict student performance with high levels of accuracy, reaching 90% for course-specific models.

Why this paper is relevant

Sequence classification for performance prediction; shows strong prediction without causal intervention analysis.

Towards Fair Educational Data Mining: A Case Study on Detecting At-Risk Students

Qian Hu, Huzefa Rangwala (2020)

7/10 Relevance

Abstract

Over the past decade, machine learning has become an integral part of educational technologies. With more and more applications such as students' performance prediction, course recommendation, dropout prediction and knowledge tracing relying upon machine learning models, there is increasing evidence and concerns about bias and unfairness of these models. Unfair models can lead to inequitable outcomes for some groups of students and negatively impact their learning. We show by real-world examples that educational data has embedded bias that leads to biased student modeling, which urges the development of fairness formalizations and fair algorithms for educational applications. Several formalizations of fairness have been proposed that can be classified into two types: (i) group fairness and (ii) individual fairness. Group fairness guarantees that groups are treated fairly as a whole, which might not be fair to some individuals. Thus individual fairness has been proposed to make sure fairness is achieved on individual level. In this work, we focus on developing an individually fair model for identifying students at-risk of underperforming. We propose a model which is based on the idea that the prediction for a student (identifying at-risk students) should not be influenced by his/her sensitive attributes. The proposed model is shown to effectively remove bias from these predictions and hence, making them useful in aiding all students. [For the full proceedings, see ED607784.]

Why this paper is relevant

Fair EDM case study; relevant if causal methods are used to avoid reinforcing inequities through interventions.

The use of video clickstream data to predict university students’ test performance: A comprehensive educational data mining approach

Ozan Raşit Yürüm, T. Taşkaya-Temizel, Soner Yıldırım (2022)

7/10 Relevance

19 citations

Abstract

Video clickstream behaviors such as pause, forward, and backward offer great potential for educational data mining and learning analytics since students exhibit a significant amount of these behaviors in online courses. The purpose of this study is to investigate the predictive relationship between video clickstream behaviors and students’ test performance with two consecutive experiments. The first experiment was performed as an exploratory study with 22 university students using a single test performance measure and basic statistical techniques. The second experiment was performed as a conclusive study with 16 students using repeated measures and comprehensive data mining techniques. The findings show that a positive correlation exists between the total number of clicks and students’ test performance. Those students who performed a high number of clicks, slow backward speed or doing backwards or pauses achieved better test performance than those who performed a lower number of clicks, or who used fast-backward or fast-forward. In addition, students’ test performance could be predicted using video clickstream data with a good level of accuracy (Root Mean Squared Error Percentage (%RMSE) ranged between 15 and 20). Furthermore, the mean of backward speed, number of pauses, and number/percentage of backwards were found to be the most important indicators in predicting students’ test performance. These findings may help educators or researchers identify students who are at risk of failure. Finally, the study provides design suggestions based on the findings for the preparation of video-based lectures.

Why this paper is relevant

Clickstream prediction study offers predictive signals but not causal estimates of instructional change.

Predictive analytics in education: a comparison of deep learning frameworks

Tenzin Doleck, D. Lemay, Ram B. Basnet, Paul Bazelais (2019)

7/10 Relevance

91 citations

Why this paper is relevant

Framework comparison for predictive analytics; baseline for separating prediction from intervention.

Student performance analysis and prediction in classroom learning: A review of educational data mining studies

Anupam Khan, S. Ghosh (2020)

7/10 Relevance

195 citations

Why this paper is relevant

Review of classroom prediction studies; helps establish that most models are associative.

Data-driven Decision Making in Higher Education Institutions: State-of-play

Silvia N. Gaftandzhieva, Sadiq Hussain, S. Hilčenko, R. Doneva, Kirina Boykova (2023)

6/10 Relevance

36 citations

Abstract

The paper highlights the importance of using data-driven decision-making tools in Higher Education Institutions (HEIs) to improve academic performance and support sustainable development. HEIs must utilize data analytics tools, including educational data mining, learning analytics, and business intelligence, to extract insights and knowledge from educational data. These tools can help HEIs’ leadership monitor and improve student enrolment campaigns, track student performance, evaluate academic staff, and make data-driven decisions. Although decision support systems have many advantages, they are still underutilized in HEIs, leaving field for further research and implementation. To address this, the authors summarize the benefits of applying data-driven decision approaches in HEIs and review various frameworks and methodologies, such as a course recommendation system and an academic prediction model, to aid educational decision-making. These tools articulate pedagogical theories, frameworks, and educational phenomena to establish mainstay significant components of learning to enable the scheming of superior learning systems. The tools can be utilized by the placement agencies or companies to find out their probable trainees/recruitees. They can help students in course selection, and educational management in being more efficient and effective.

Why this paper is relevant

State-of-play on data-driven decision making; useful for discussing how analytics are used operationally rather than causally.
