Prediction of Academic Dropout in Higher Education: The Case of Face-to-Face Undergraduate Courses at the University of Brasília
Dropout, Dropout Indicators, Machine Learning Algorithms, Forecasting, Prediction
For some time now, researchers in Brazil and abroad have been studying dropout in higher education courses, classifying it into two types: students who drop out of the university, and students who drop out of higher education altogether. Both situations harm the institution, the students, and society at large. Starting in 1995, with the creation of ANDIFES, such studies became more frequent in Brazil. This commission produced reports analyzing graduation, retention, and dropout rates in undergraduate courses at Brazilian universities. Institutional dropout, described as a student leaving their original course without completing it, was one of the subjects studied. The University of Brasília (UnB), aware of the issues surrounding student dropout, has created mechanisms to increase student retention in undergraduate courses. The objective of this work was to develop and evaluate an analytical model that uses academic data to predict dropout in face-to-face undergraduate courses. A Systematic Literature Review was conducted to identify the factors that affect dropout and to define indicators that can be extracted from UnB's academic systems. The review also guided the selection of algorithms and tools to support the analysis. Its main result was the identification of 29 factors used by researchers, of which average score, gender, and course grades were the most common. Among the tools, Regression, Decision Tree, and Neural Network were the most frequently used algorithms. Based on this preliminary result, the Undergraduate Analysis Model (MAGRA) was created. It combines indicators already available in UnB's academic systems with machine learning tools to identify students at risk of dropping out. The research covered two scenarios: the first was developed at the Faculty of Gama (FGA), which served as a prototype for MAGRA's creation, and the second at UnB.
In the testing stages, where only the courses taken by students were considered, the number of times a student takes a course was shown to be a possible indicator of their difficulty in completing it within the designated time frame. Employing the new variables increased the number of valid models and, consequently, the number of courses and classes analyzed, resulting in a higher number of predictions. This effect was observed in the studies conducted at both the Faculty of Gama and the University of Brasília. To improve the early identification of students with dropout characteristics, it is necessary to create feedback mechanisms from course coordinators, introduce new systems, improve data quality, and adjust algorithm parameters.
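As an illustration of the repeat-attempt indicator discussed above, the following minimal Python sketch derives, for each student, the highest number of times they enrolled in any single course. It is not MAGRA's implementation: the record layout `(student, course, term)`, the function names, and the sample data are all hypothetical and chosen only to show the idea.

```python
from collections import Counter

def attempt_counts(enrollments):
    """Count how many times each (student, course) pair appears.

    `enrollments` is an iterable of (student, course, term) tuples.
    This layout is an assumption, not the schema of UnB's systems.
    """
    return Counter((student, course) for student, course, _term in enrollments)

def max_attempts_per_student(enrollments):
    """For each student, return the largest attempt count over all courses."""
    result = {}
    for (student, _course), n in attempt_counts(enrollments).items():
        result[student] = max(result.get(student, 0), n)
    return result

# Hypothetical sample records, for illustration only.
records = [
    ("s1", "CALC1", "2019/1"),
    ("s1", "CALC1", "2019/2"),
    ("s1", "CALC1", "2020/1"),
    ("s1", "PHYS1", "2019/1"),
    ("s2", "CALC1", "2019/1"),
    ("s2", "PHYS1", "2019/2"),
]

features = max_attempts_per_student(records)
print(features)
```

A value such as this could serve as one input column, alongside other indicators, for whichever classifier is being evaluated.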