DEFENSE Committee: Lucas Coelho de Almeida

A MASTER'S DEFENSE committee was registered by the program.
STUDENT: Lucas Coelho de Almeida
DATE: 16/12/2022
TIME: 16:00

Not provided.


Not provided.

PAGES: 122
BIG AREA: Engineering
AREA: Electrical Engineering

The digitization of relationships and information has exponentially increased humanity's ability to produce data. As new data is created at this pace, it becomes increasingly necessary to understand and mine large databases, often unstructured, unformatted, and assembled for different purposes. In this context, indexing data with search engines and interpreting datasets in order to classify and categorize them is indispensable in Big Data and Data Lake scenarios, where information can come from sources with different technical and semantic characteristics, requiring multi-class classification and natural language processing (NLP) techniques. It is also necessary to determine whether the classification tools are biased and whether their results are useful and consistent with expectations, especially in cybercrime investigations. This is the problem of decision-making transparency, that is, the clear and legible representation of the parameters that led the machine to a given decision or classification. An ideal search system should therefore be able to index large databases, understand their semantics, adapt or learn to operate in different scenarios, and, at the end of the process, return results enriched with the parameters that led the machine to its decisions, so that the transparency of the process can later be audited. This dissertation therefore proposes an end-to-end search engine architecture that indexes Web page data and interprets it metasemantically using natural language processing techniques, while also providing examples of parameters similar to those behind the classifications derived from the samples.
The "meta" prefix in the term "metasemantics" refers to a set of classification, prediction, and data enrichment techniques applied to emulate the semantic indexing process while preserving its auditability. To validate the proposal, samples of Web pages were created and official databases were used to train machine learning models that simulate real application contexts of the project. The validation shows that the proposed search engine can store and process raw data originating from Web pages and increases the speed and objectivity with which investigations are carried out and audited in natural language processing contexts, which is especially relevant to cybercrime investigations.
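As a rough illustration of returning search results enriched with the parameters behind each classification, the sketch below builds a small inverted index in Python and stores, for every indexed page, a label together with the terms that produced it. It is not the architecture proposed in the dissertation: the keyword lexicons in `CATEGORY_TERMS`, the category names, and the `AuditableIndex` class are hypothetical, and a simple term-matching rule stands in for the trained NLP models described above.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical category lexicons: a stand-in for a trained NLP classifier.
CATEGORY_TERMS = {
    "fraud": {"pix", "boleto", "transfer", "password"},
    "drugs": {"delivery", "gram"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


@dataclass
class Classification:
    label: str
    evidence: dict[str, int]  # term -> count: the "parameters" kept for auditing


def classify(tokens: list[str]) -> Classification:
    """Pick the category whose lexicon matches the most tokens, keeping the matches."""
    best_label, best_evidence = "unknown", {}
    for label, terms in CATEGORY_TERMS.items():
        evidence: dict[str, int] = {}
        for tok in tokens:
            if tok in terms:
                evidence[tok] = evidence.get(tok, 0) + 1
        if sum(evidence.values()) > sum(best_evidence.values()):
            best_label, best_evidence = label, evidence
    return Classification(best_label, best_evidence)


@dataclass
class AuditableIndex:
    # Inverted index: term -> ids of the pages that contain it.
    postings: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # Per-page classification plus the evidence that produced it.
    metadata: dict[str, Classification] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        tokens = tokenize(text)
        for tok in tokens:
            self.postings[tok].add(doc_id)
        self.metadata[doc_id] = classify(tokens)

    def search(self, query: str) -> list[tuple[str, Classification]]:
        """AND query: pages containing every query term, each with its audit trail."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [(doc_id, self.metadata[doc_id]) for doc_id in sorted(hits)]


if __name__ == "__main__":
    index = AuditableIndex()
    index.add("page1", "Send the PIX transfer and your password today")
    index.add("page2", "Fast delivery, price per gram")
    for doc_id, c in index.search("transfer"):
        print(doc_id, c.label, c.evidence)
        # prints: page1 fraud {'pix': 1, 'transfer': 1, 'password': 1}
```

Keeping the evidence next to each label lets an auditor see why a page was classified a certain way without re-running the model; a real system would presumably store model-specific explanations rather than raw term counts.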

Internal member - 906.575.601-97 - DANIEL ALVES DA SILVA - UnB
Chair - 1311780 - FABIO LUCIO LOPES DE MENDONCA
Announcement registered on: 12/12/2022 13:47