Graduate Programs Portal (UnB)

SIGAA - Sistema Integrado de Gestão de Atividades Acadêmicas (Integrated System for Management of Academic Activities)

PPCA - Graduate Program in Applied Computing (Professional)
Institute of Exact Sciences
Phone/Extension: (61) 98114-0478
E-mail: ednacanedo@unb.br
https://www.unb.br/pos-graduacao

DEFENSE Committee: Wagner Miranda Costa

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: Wagner Miranda Costa
DATE: 21/12/2023
TIME: 09:00
LOCATION: Remote
TITLE: Semantic Similarity between Judgments to Support the Formulation of TCU Jurisprudence

KEYWORDS:

Natural Language Processing, Information Retrieval, Document Vector Representation, Bag-of-Concepts, Word Embeddings

PAGES: 70
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Methodology and Techniques
SPECIALTY: Information Systems
SUMMARY:

Jurisprudence refers to a set of repeated decisions on a given subject and constitutes a type of judicial precedent. At the Federal Audit Court (TCU), the body responsible for the external control of the Federal Public Administration, jurisprudence represents the consolidated interpretations of the rules applicable to the financial and operational oversight of the public accounts of the Union's bodies and entities. Because jurisprudence is formulated by grouping similar rulings, automated tools that assist the specialists responsible for this activity are valuable. Automating this task is nonetheless challenging, owing to the specialized vocabulary of the rulings and the large volume of data to be processed, so the approaches must be scalable, effective, and efficient, with low computational cost. This work studies and implements several approaches for representing these documents, at both the word level and the concept level. As its main contribution, it proposes a new approach, BoC-Th (Bag of Concepts with Thesaurus), which builds weighted concept histograms in which each word's weight is based on the distance between the word and its most similar term in a thesaurus. This weighting emphasizes words that carry more meaning in the context, yielding more discriminative vectors. Experimental evaluations compared the proposed approach with traditional document representation approaches, and the proposed method achieved the best results among the evaluated techniques for retrieving jurisprudential documents. BoC-Th increased average accuracy relative to traditional approaches, including the original BoC (Bag of Concepts), and was also faster than the traditional BoW, BM25, and TF-IDF representations.
The proposed approach contributes to a domain with distinctive characteristics, providing a resource for retrieving textual information more accurately and quickly than the other natural language processing techniques evaluated.
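The summary describes BoC-Th only at a high level, so the following is a hedged sketch of one possible reading, not the author's implementation. It assumes that concepts are centroids of clustered word embeddings (as in the original Bag-of-Concepts), that each word is assigned to its nearest concept, and that a word contributes 1 / (1 + d) to its concept bin, where d is the Euclidean distance from the word's embedding to the closest thesaurus-term embedding. Retrieval by cosine similarity is also an assumption. The function names (boc_th_vector, rank) and the toy two-dimensional embeddings, concepts, and thesaurus are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(vec: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate vector closest to `vec`."""
    return min(range(len(candidates)), key=lambda i: euclidean(vec, candidates[i]))


def boc_th_vector(
    tokens: List[str],
    embeddings: Dict[str, Vector],
    concept_centroids: Sequence[Vector],
    thesaurus_vectors: Sequence[Vector],
) -> List[float]:
    """Hypothetical BoC-Th-style document vector (assumed weighting, not the thesis code).

    Each in-vocabulary token is assigned to its nearest concept centroid and adds
    1 / (1 + d) to that concept's bin, where d is the distance from the token to the
    closest thesaurus term. The histogram is then L2-normalized.
    """
    hist = [0.0] * len(concept_centroids)
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:  # skip out-of-vocabulary tokens
            continue
        concept = nearest_index(vec, concept_centroids)
        closest_term = thesaurus_vectors[nearest_index(vec, thesaurus_vectors)]
        hist[concept] += 1.0 / (1.0 + euclidean(vec, closest_term))
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: Vector, docs: Sequence[Vector]) -> List[int]:
    """Document indices sorted by decreasing cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


# Toy 2-D embeddings and thesaurus, made up purely for illustration.
EMB = {"bid": (1.0, 0.0), "tender": (0.9, 0.1), "audit": (0.0, 1.0), "lunch": (0.0, 3.0)}
CENTROIDS = [(1.0, 0.0), (0.0, 1.0)]  # two hypothetical concepts
THESAURUS = [(1.0, 0.0), (0.0, 1.0)]  # embeddings of hypothetical thesaurus terms

if __name__ == "__main__":
    corpus = (["bid", "tender"], ["audit", "lunch"])
    docs = [boc_th_vector(d, EMB, CENTROIDS, THESAURUS) for d in corpus]
    query = boc_th_vector(["tender"], EMB, CENTROIDS, THESAURUS)
    print(rank(query, docs))  # [0, 1]
```

With these toy values, "bid" coincides with a thesaurus term and adds 1.0 to its concept bin, while "lunch" lies far from every thesaurus term and adds only 1/3, which mirrors the emphasis on meaningful words described above. The actual method's distance measure, weighting function, and clustering may differ.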

COMMITTEE MEMBERS:
Chair - 3064724 - GLAUCO VITOR PEDROSA
Internal - 1937247 - BRUNO CESAR RIBAS
Internal - 1821656 - THIAGO DE PAULO FALEIROS
External to the Institution - EDUARDO DE PAULA COSTA - USP

News item registered on: 01/12/2023 16:35