DEFENSE committee: José Ronaldo Agra de Souza Filho

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT : José Ronaldo Agra de Souza Filho
DATE: 10/09/2024
TIME: 14:30
LOCATION: Online

DUBI: a framework for automatic evaluation of chatbots


chatbots, static evaluation, interactive evaluation, automation, quality

PAGES: 102
BIG AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computer Systems
SPECIALTY: Computer Systems Architecture

The spread of Artificial Intelligence (AI) has driven the adoption of chatbots, also known as virtual assistants, which are conversational systems for automated interaction with users. However, evaluating chatbots remains complex and laborious; it is often carried out manually, which makes it impractical at scale. A review of the state of the art indicated that two distinct evaluation methods are commonly used: static and interactive. The former analyzes the structure and training content of the virtual assistant, while the latter assesses the system by interacting with it. Previous studies, however, do not combine these methods, so the tested features cover less than a comprehensive system diagnosis requires. In this context, this work introduces the DUBI framework, an automated approach to chatbot assessment that covers both the static and interactive components of the system. The static evaluation module computes a range of metrics and uses them to identify improvement points in the chatbot's structure, such as intent balance and similarity between intents. The interactive evaluation module measures metrics related to the virtual assistant's performance and conversation quality. Compared with manual evaluation, the DUBI framework saves time and resources and mitigates evaluation variability and potential biases. The technical feasibility of the proposal was demonstrated through an experiment conducted with a real chatbot. The experiment exercised all functional aspects of DUBI, demonstrating its effectiveness in assessing the performance and quality of the virtual assistant and in objectively identifying areas for improvement in chatbot modeling.
Preliminary results indicate that improvements to the virtual assistant's structure positively impact its performance, for instance by improving metrics such as F1-score and fallback rate. Hence, the DUBI framework has the potential to contribute significantly to the ongoing improvement of chatbot projects, and it addresses existing limitations in the current state of the art.
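
The metrics named in the abstract (intent balance, F1-score, fallback rate) can be illustrated with a minimal sketch. This is not DUBI's implementation: the function names, the min/max definition of intent balance, the `"fallback"` label, and the sample data are all assumptions made for illustration only.

```python
def intent_balance(examples_per_intent):
    """Ratio of the smallest to the largest intent (1.0 = perfectly balanced).

    Assumed definition; the thesis may measure balance differently.
    """
    counts = list(examples_per_intent.values())
    return min(counts) / max(counts)


def fallback_rate(predicted, fallback_label="fallback"):
    """Share of test utterances answered with the (assumed) fallback intent."""
    return sum(1 for p in predicted if p == fallback_label) / len(predicted)


def macro_f1(expected, predicted):
    """Unweighted mean of per-intent F1 over the expected intents."""
    scores = []
    for label in sorted(set(expected)):
        pairs = list(zip(expected, predicted))
        tp = sum(1 for e, p in pairs if e == label and p == label)
        fp = sum(1 for e, p in pairs if e != label and p == label)
        fn = sum(1 for e, p in pairs if e == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: training examples per intent (static view) and the
# expected vs. predicted intents from a scripted conversation (interactive view).
training = {"greet": 40, "order_status": 10, "cancel": 20}
expected = ["greet", "greet", "cancel", "order_status"]
predicted = ["greet", "fallback", "cancel", "cancel"]

print(f"intent balance: {intent_balance(training):.2f}")
print(f"fallback rate:  {fallback_rate(predicted):.2f}")
print(f"macro F1:       {macro_f1(expected, predicted):.2f}")
```

In this toy data the under-represented `order_status` intent is the one that is misclassified, which is the kind of link between structural imbalance and interactive performance that the abstract describes.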

Chair - 1489499 - JACIR LUIZ BORDIM
Internal member - 402520 - MARCELO LADEIRA
Notice registered on: 10/09/2024 09:46