Dissertations/Theses

Click here to access the files directly from the Biblioteca Digital de Teses e Dissertações da UnB

2024
Dissertations
1
  • CARLOS JOEL TAVARES DA SILVA
  • A Multi-Robot System Architecture with Multi-Agent Planning.

  • Advisor : CELIA GHEDINI RALHA
  • COMMITTEE MEMBERS :
  • CELIA GHEDINI RALHA
  • GENAINA NUNES RODRIGUES
  • RICARDO PEZZUOL JACOBI
  • YARA A. RIZK
  • Date: Feb 1, 2024

  • Abstract:
  • Guaranteeing goal achievement in a Multi-Robot System (MRS) is challenging, especially for operations in dynamic environments. Although the Multi-Agent Planning (MAP) literature presents several approaches to this problem, there is room for improvement, and plan recovery remains an open challenge in MRS. This work therefore integrates MAP into MRS, presenting the Multi-Robot System Architecture with Planning (MuRoSA-Plan), which focuses on plan recovery. The architecture is illustrated with a multi-robot mission-coordination case in a healthcare service, using the Robot Operating System (ROS2) and the IPyHOP planner with hierarchical task networks. Experiments with the MuRoSA-Plan prototype show improvements over the Planning System Framework for ROS2 (PlanSys2). The experimental results show that MuRoSA-Plan generates runtime-adapted plans that mitigate mission disruptions and satisfy the goals of the healthcare service case, indicating a promising solution for plan recovery in MRS.
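    As a rough illustration of the hierarchical-task-network style of planning that IPyHOP-like planners use, the sketch below expands abstract tasks into primitive actions via methods; the healthcare task names and method bodies are hypothetical and are not taken from MuRoSA-Plan.

      # Minimal HTN-style decomposition sketch (hypothetical tasks, not MuRoSA-Plan code).
      def deliver_medication(state, robot, patient):
          """Method: refine the abstract task into a list of subtasks."""
          if state["battery"][robot] < 20:        # precondition check
              return None                         # method not applicable
          return [("navigate", robot, "pharmacy"),
                  ("pick", robot, "medication"),
                  ("navigate", robot, state["room"][patient]),
                  ("handover", robot, patient)]

      def plan(state, tasks, methods, actions):
          """Depth-first HTN expansion: returns a list of primitive actions or None."""
          if not tasks:
              return []
          head, rest = tasks[0], tasks[1:]
          name = head[0]
          if name in actions:                     # primitive: apply action, recurse
              new_state = actions[name](state, *head[1:])
              if new_state is None:
                  return None
              tail = plan(new_state, rest, methods, actions)
              return None if tail is None else [head] + tail
          for method in methods.get(name, []):    # abstract: try each method
              subtasks = method(state, *head[1:])
              if subtasks is not None:
                  result = plan(state, subtasks + rest, methods, actions)
                  if result is not None:
                      return result
          return None                             # no applicable method: failure

      state = {"battery": {"r1": 80}, "room": {"p7": "302"}}
      methods = {"deliver_medication": [deliver_medication]}
      actions = {n: (lambda s, *a: s) for n in ("navigate", "pick", "handover")}  # no-op stubs
      print(plan(state, [("deliver_medication", "r1", "p7")], methods, actions))

    Plan recovery in this style amounts to re-invoking the planner from the state observed after a failure.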

2
  • JAQUELINE GUTIERRI COELHO
  • A database containing non-coding RNAs involved in colorectal cancer.

  • Advisor : MARIA EMILIA MACHADO TELLES WALTER
  • COMMITTEE MEMBERS :
  • MARIA EMILIA MACHADO TELLES WALTER
  • MARISTELA TERTO DE HOLANDA
  • NALVO FRANCO DE ALMEIDA JUNIOR
  • Date: Apr 4, 2024

  • Abstract:
  • In nature, there are two main nucleic acids: DNA and RNA. The central dogma of molecular biology describes the process by which DNA is transcribed into RNA, which in turn is translated into proteins. Contrary to the classical view of this dogma, it has been discovered that DNA transcription also generates non-coding nucleic acids, such as microRNAs and long non-coding RNAs. These ncRNAs play essential roles in gene regulation and other cellular processes, highlighting the complexity of the genetic machinery and the functional diversity of ncRNAs in cell biology. In this context, the optimization and enhancement of the Perci Database, a centralized and reliable source of information on colorectal cancer (CRC), has the potential to significantly boost research in this area. This improvement is important to facilitate understanding of the cellular mechanisms underlying tumor detection, progression and prognosis. Furthermore, this study aims to compile relevant data on colorectal cancer, with a focus on ncRNAs, in order to make them available for public consultation. An online database was developed covering five CRC descriptors and three specific categories of ncRNAs: long ncRNAs (lncRNAs), long circular ncRNAs (circ ncRNAs) and microRNAs (miRNAs), along with associated transcriptomic features. The cancer descriptors covered include colorectal cancer, colon cancer, adenocarcinoma, tumor and liver metastasis of colorectal cancer. Questions such as "Which circRNAs are implicated in colorectal neoplasia?", "Which circRNAs and miRNAs are associated with colorectal cancer liver metastasis?" and "Given a specific lncRNA (H19), which cancer descriptors are related to this ncRNA?" can be answered using our database. The web system can be accessed at: http://percidatabase.com.br/.
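    For illustration only, a query of the kind quoted above could be answered as follows against a hypothetical relational schema (the tables ncrna and association are invented here; the actual Perci Database schema is not described in this abstract):

      import sqlite3

      # Hypothetical schema for illustration; the real Perci Database schema may differ.
      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE ncrna (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
          CREATE TABLE association (ncrna_id INTEGER, descriptor TEXT);
          INSERT INTO ncrna VALUES (1, 'hsa_circ_0001946', 'circRNA'),
                                   (2, 'H19', 'lncRNA');
          INSERT INTO association VALUES (1, 'colorectal cancer'), (2, 'colon cancer');
      """)

      # "Which circRNAs are implicated in colorectal neoplasia?"
      rows = con.execute("""
          SELECT n.name
          FROM ncrna n JOIN association a ON a.ncrna_id = n.id
          WHERE n.type = 'circRNA' AND a.descriptor = 'colorectal cancer'
      """).fetchall()
      print(rows)   # [('hsa_circ_0001946',)]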

3
  • Wedrey Nunes da Silva
  • Analysis of CS through Biosignals: an approach with Symbolic Machine Learning

  • Advisor : RICARDO PEZZUOL JACOBI
  • COMMITTEE MEMBERS :
  • RICARDO PEZZUOL JACOBI
  • CELIA GHEDINI RALHA
  • TIAGO BARROS PONTES E SILVA
  • THIAGO MALHEIROS PORCINO
  • Date: Apr 19, 2024

  • Abstract:
  • Cybersickness (CS) represents one of the main obstacles to the use and adoption of Virtual Reality (VR). Symptoms associated with CS vary from person to person and include nausea, vertigo, eyestrain, and headache, and can last from a few minutes to hours after exposure to VR. Although the reported incidence of CS among VR users varies, studies indicate that a large portion of the population, approximately 40% to 60%, may experience moderate to severe symptoms of CS. CS is one of the main obstacles to ensuring comfort when using immersive systems, and it often occurs when using devices such as Head-Mounted Displays (HMD). Although there are several theories about the possible causes of CS, there is no easy or systematic method to measure and quantify it. It is common for researchers to use subjective measures to identify the intensity of CS, measured through pre- and post-experience self-reported questionnaires such as the Virtual Reality Sickness Questionnaire (VRSQ) [1]. In previous studies, several approaches were used to measure the intensity of CS, using both subjective and objective measures. According to researchers, CS has a significant impact on physiological signals, including the delta wave of the EEG, HR, HRV, GSR, and EGG, which show a significant correlation with this condition. The general objective of this work is to investigate the physiological alterations associated with CS in virtual reality games, collecting user profile, game, and biosignal (ECG, EDA, and ACC) data from a total of 30 healthy people. Participants will be immersed in two VR games, the first in a racing car and the second in a flight. A symbolic ML classifier will also be used to detect the potential causes of CS that occurred during the VR experiments, based on profile, game, and biosignal data. This work intends to validate the hypothesis that physiological signals can be effective in the elaboration of strategies to reduce the symptoms of CS.
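    A minimal sketch of what a symbolic (rule-producing, interpretable) classifier over tabular biosignal features might look like; the feature names, the toy data, and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not the dissertation's actual pipeline.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier, export_text

      # Hypothetical feature table: one row per session window
      # (feature names are illustrative, not from the dissertation).
      features = ["mean_hr", "hrv_rmssd", "eda_peaks", "acc_energy"]
      X = np.array([[72, 45.0, 3, 0.2],
                    [95, 20.0, 9, 0.8],
                    [68, 50.0, 2, 0.1],
                    [101, 15.0, 11, 0.9]])
      y = np.array([0, 1, 0, 1])   # 0 = no CS, 1 = cybersickness reported

      # A decision tree is "symbolic": the fitted model is a readable set of rules.
      clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
      print(export_text(clf, feature_names=features))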

Theses
1
  • Patricia Medyna Lauritzen de Lucena Drumond
  • Visual and Textual Feature Fusion for Document Analysis

  • Advisor : TEOFILO EMIDIO DE CAMPOS
  • COMMITTEE MEMBERS :
  • CAROLINA SCARTON
  • FABRICIO ATAIDES BRAZ
  • LI WEIGANG
  • RICARDO MARCONDES MARCACINI
  • TEOFILO EMIDIO DE CAMPOS
  • Date: Jan 18, 2024

  • Abstract:
  • The large volume of documents produced daily in all sectors, such as industry, commerce, and government agencies, has increased the amount of research aimed at automating the process of reading, understanding, and analyzing documents. Business documents can be born digital, as electronic files, or can be digitized from handwritten or printed paper. In addition, these documents often come in various layouts and formats, organized in different ways, from plain text and multi-column layouts to a wide variety of tables, forms, and figures. In many documents, the spatial relationship of text blocks usually carries important semantic information for downstream tasks, and the relative position of text blocks plays a crucial role in document understanding. However, embedding layout information in the representation of a page instance is not trivial. In the last decade, Computer Vision (CV) and Natural Language Processing (NLP) pre-training techniques have been advancing in extracting content from document images considering visual, textual, and layout features. Deep learning methods, especially pre-training techniques represented by the Transformer architecture, have become a new paradigm for solving various downstream tasks. However, a major drawback of such pre-trained models is their high computational cost. Unlike these models, we propose a simple, traditional rule-based spatial layout encoding method that combines textual and spatial information from text blocks. We show that this enables a standard NLP pipeline to be significantly enhanced without requiring expensive mid- or high-level multimodal fusion. We evaluate our method on two datasets, Tobacco800 and RVL-CDIP, for document image classification tasks. Document classification with our method obtained an accuracy of 83.6% on the large-scale RVL-CDIP dataset and 99.5% on Tobacco800. To validate the effectiveness of our method, we intend to carry out more experiments: first with other, more robust datasets, and then varying parameters such as the number of quadrants, the insertion or deletion of positional tokens, and other classifiers.
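    A minimal sketch of a rule-based spatial layout encoding of the kind described above: each text block's bounding-box center is mapped to a page quadrant and a positional token is prepended to the block's text before it enters a standard NLP pipeline. The 2x2 grid and the token naming are assumptions for illustration.

      # Hypothetical rule-based layout encoding: tag each text block with a
      # positional token derived from the quadrant of its bounding-box center.
      def quadrant_token(box, page_w, page_h, cols=2, rows=2):
          """box = (x0, y0, x1, y1) in page coordinates; returns e.g. '[Q21]'."""
          cx = (box[0] + box[2]) / 2
          cy = (box[1] + box[3]) / 2
          col = min(int(cx / page_w * cols), cols - 1)
          row = min(int(cy / page_h * rows), rows - 1)
          return f"[Q{row + 1}{col + 1}]"

      def encode_page(blocks, page_w, page_h):
          """blocks = [(text, box), ...] -> single token sequence for an NLP pipeline."""
          return " ".join(f"{quadrant_token(b, page_w, page_h)} {t}" for t, b in blocks)

      blocks = [("Invoice No. 42", (50, 40, 300, 70)),
                ("Total: $99.00", (400, 700, 580, 730))]
      print(encode_page(blocks, page_w=612, page_h=792))
      # [Q11] Invoice No. 42 [Q22] Total: $99.00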

2
  • Gabriel Ferreira Silva
  • Towards Nominal AC-Unification

  • Advisor : MAURICIO AYALA RINCON
  • COMMITTEE MEMBERS :
  • CHRISTIAN URBAN
  • CÉSAR MUÑOZ
  • JOSÉ MESEGUER
  • MAURICIO AYALA RINCON
  • VANDER RAMOS ALVES
  • Date: Jan 26, 2024

  • Abstract:
  • The nominal syntax extends first-order syntax and allows us to smoothly represent systems with bindings. In order to profit from the nominal setting, we must adapt important notions to it, such as unification and matching. This thesis is about nominal unification and matching in the presence of an equational theory E, and our efforts towards obtaining a nominal AC-unification algorithm. First, we extend and formalise a nominal C-unification algorithm to also handle matching and equality checking by adding an extra parameter X for protected variables, i.e., variables that cannot be instantiated. The formalised algorithm is used to test a manual Python implementation of the algorithm. Then, as a first step towards nominal AC-unification, we give the first formalisation of a first-order AC-unification algorithm. We chose to verify Stickel's tried-and-tested algorithm. The proof of termination employs an intricate (but duly motivated) lexicographic measure based on Fages' proof of termination. Finally, we adapt the first-order AC-unification algorithm to propose the first nominal AC-matching algorithm and formalise it to be terminating, sound, and complete. As was the case for nominal C-unification, we use a parameter X for protected variables, and this approach also let us obtain a verified nominal AC-equality checker as a byproduct. The three formalisations described above were done in the PVS proof assistant and are available in NASALib, the main PVS repository of formalisations. For each of the three formalisations, we describe the files that compose it, pointing out their structure, hierarchy, and size. We were not able to propose a nominal AC-unification algorithm, but we show how the problem raises two interesting questions: generating solutions to π · X ≈? X and proving termination. For the first question we propose a non-deterministic enumeration procedure and exemplify how it can compute non-obvious solutions. For the second question we demonstrate that the problem f(X, W) ≈? f(π · X, π · Y) gives rise to a loop and prove that it is enough to loop a limited number of times, where this limit depends on the order of the permutation π. Unfortunately, we were not able to generalise our reasoning to similar problems.
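    The loop bound mentioned above depends on the order of the permutation π, i.e., the smallest k > 0 with π^k = id. As a small side illustration (not from the thesis), the order of a finite permutation is the least common multiple of its cycle lengths:

      from math import lcm

      def permutation_order(perm):
          """perm maps each atom to its image, e.g. {'a': 'b', 'b': 'a'}.
          Returns the smallest k > 0 such that applying perm k times is the
          identity: the lcm of the cycle lengths."""
          order, seen = 1, set()
          for start in perm:
              if start in seen:
                  continue
              length, x = 0, start
              while x not in seen:              # walk one cycle
                  seen.add(x)
                  x = perm[x]
                  length += 1
              order = lcm(order, length)
          return order

      # The swapping (a b) composed with the 3-cycle (c d e) has order lcm(2, 3) = 6.
      print(permutation_order({'a': 'b', 'b': 'a', 'c': 'd', 'd': 'e', 'e': 'c'}))  # 6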

3
  • Aldo Henrique Dias Mendes
  • Multi-agent Architecture with Distinct Reasoning Models for Resource Management across Multiple Cloud Providers

  • Advisor : CELIA GHEDINI RALHA
  • COMMITTEE MEMBERS :
  • ALETEIA PATRICIA FAVACHO DE ARAUJO VON PAUMGARTTEN
  • ALEXANDRE DA COSTA SENA
  • CELIA GHEDINI RALHA
  • LUCIA MARIA DE ASSUMPCAO DRUMMOND
  • RICARDO PEZZUOL JACOBI
  • Date: Feb 2, 2024

  • Abstract:
  • Nowadays, scientific and commercial applications are often deployed to cloud environments requiring multiple resource types, which increases the need for efficient resource management. However, efficient resource management remains challenging due to the complex nature of modern cloud-distributed systems, since resources involve different characteristics, technologies, and financial costs. Thus, optimized cloud resource management that supports the heterogeneous nature of applications while balancing cost, time, and waste remains a challenge. Multi-agent technologies can offer noticeable improvements for resource management, with intelligent agents deciding on Virtual Machine (VM) resources. This work proposes MAS-Cloud+, a novel agent-based architecture for predicting, provisioning, and monitoring optimized cloud computing resources. MAS-Cloud+ implements agents with three reasoning models: heuristic, formal optimization, and metaheuristic. MAS-Cloud+ instantiates VMs considering Service Level Agreements (SLA) on cloud platforms, prioritizing user needs in terms of time, cost, and waste of resources, providing an appropriate selection for the evaluated workloads. To validate MAS-Cloud+, we use a DNA sequence comparison application subjected to different workload sizes and a comparative study with a state-of-the-art work using Apache Spark benchmark applications executed on AWS EC2. Our results show that for the sequence comparison application the best performance was obtained by the optimization model, whereas the heuristic model presented the best cost. By providing the choice among multiple reasoning models, MAS-Cloud+ delivers a more cost-effective selection of instances, reducing the average execution cost of the WordCount, Sort, and PageRank BigDataBench benchmarking workloads by approximately 58%. As for execution time, WordCount and PageRank present reductions, the latter of approximately 58%. The results indicate a promising solution for efficient cloud resource management.
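    A minimal sketch of the heuristic reasoning style mentioned above: pick the cheapest instance whose predicted runtime meets a deadline. The instance catalog, prices, and runtime prediction are hypothetical; MAS-Cloud+'s actual models are richer.

      # Hypothetical catalog: (name, vCPUs, price per hour in USD).
      CATALOG = [("small", 2, 0.10), ("medium", 4, 0.20), ("large", 8, 0.45)]

      def predicted_runtime_h(work_units, vcpus):
          """Toy runtime prediction: perfect scaling over vCPUs (an assumption)."""
          return work_units / vcpus

      def cheapest_meeting_deadline(work_units, deadline_h):
          """Greedy heuristic: among instances meeting the deadline, minimize cost."""
          feasible = []
          for name, vcpus, price in CATALOG:
              t = predicted_runtime_h(work_units, vcpus)
              if t <= deadline_h:
                  feasible.append((t * price, name, t))
          if not feasible:
              return None                       # no SLA-compliant option: escalate
          cost, name, t = min(feasible)
          return name, t, cost

      print(cheapest_meeting_deadline(work_units=16, deadline_h=5))
      # ('medium', 4.0, 0.8): cheaper than 'large' and still within the deadline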

4
  • Lucas Angelo da Silveira
  • Dynamically Reconfigurable Heterogeneous Parallel Island Model.

  • Advisor : MAURICIO AYALA RINCON
  • COMMITTEE MEMBERS :
  • MAURICIO AYALA RINCON
  • DANIEL MAURICIO MUNOZ ARBOLEDA
  • TELMA WOERLE DE LIMA SOARES
  • CARLOS ARTEMIO COELLO COELLO
  • LEANDRO DOS SANTOS COELHO
  • Date: Apr 26, 2024

  • Abstract:
  • Optimization problems are encountered in various fields of activity, and as the understanding and practice in these fields advance, their complexities become more pronounced. Several bioinspired algorithms have been proposed in recent decades to address optimization problems. Each of these algorithms possesses unique characteristics that impact the evolutionary process and the quality of the solutions achieved in distinct ways.
    The parallel island model is a strategy for parallelizing bioinspired algorithms that yields significant gains in solution accuracy. In this model, the set of candidate solutions is divided into subpopulations known as islands. Each island evolves its set of solutions through its own bioinspired algorithm, operating in parallel with the other islands. Periodically, islands exchange solutions through the migration process. This movement of solutions between islands is conditioned by the model's topology and a set of rules comprising the migration policy.
    This work proposes a new implementation approach for parallel island models inspired by heterogeneity and algorithmic reconfiguration, introducing stagnation-based reconfigurable heterogeneous island models. Heterogeneity allows the execution of different bioinspired algorithms on the islands, increasing model diversity, while algorithmic reconfiguration replaces the applied bioinspired algorithm when an island's stagnation is detected. During the evolutionary process, each island maintains a record of its progress, measured by the fitness of the best individual on the island in the current and previous two generations. Whenever an island presents stagnation, i.e., no progress is detected, the island is reconfigured to continue the evolutionary process by executing the best bioinspired algorithm up to that point. This approach is beneficial for optimization problems where finding optimal solutions in polynomial time is impractical. Additionally, it stands out for its autonomy, as it does not require user intervention to perform reconfiguration: the automatic adjustment of the model and the decision to reconfigure are determined by stagnation.
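    A skeleton of the stagnation-based reconfiguration loop described above, with placeholder evolution operators; the real model's algorithm pool, migration policy, and objectives are more elaborate.

      import random

      # Skeleton of a stagnation-based reconfigurable heterogeneous island model.
      # The algorithm pool and the evolve steps are placeholders, not the thesis code.
      def evolve_ga(pop):  return [x + random.gauss(0, 0.5) for x in pop]   # stand-in
      def evolve_pso(pop): return [x + random.gauss(0, 0.1) for x in pop]   # stand-in
      POOL = {"GA": evolve_ga, "PSO": evolve_pso}

      def fitness(x):                      # toy objective: maximize -x^2
          return -x * x

      def run(n_islands=4, generations=50):
          islands = [[random.uniform(-5, 5) for _ in range(20)] for _ in range(n_islands)]
          algo = [random.choice(list(POOL)) for _ in range(n_islands)]   # heterogeneity
          history = [[] for _ in range(n_islands)]
          for g in range(generations):
              for i in range(n_islands):
                  islands[i] = POOL[algo[i]](islands[i])
                  history[i].append(max(map(fitness, islands[i])))
                  # Stagnation: best fitness did not improve over the last 3 generations.
                  if len(history[i]) >= 3 and history[i][-1] <= history[i][-3]:
                      # Reconfigure: switch to the algorithm of the best island so far.
                      best = max(range(n_islands), key=lambda j: history[j][-1])
                      algo[i] = algo[best]
              if g % 5 == 0:               # periodic ring migration of the best individual
                  for i in range(n_islands):
                      islands[(i + 1) % n_islands].append(max(islands[i], key=fitness))
          return max((x for isl in islands for x in isl), key=fitness)

      print(run())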

5
  • Italo Barbosa Brasileiro
  • Core Switching Paradigms in Multi-core Elastic Optical Networks

  • Advisor : ANDRE COSTA DRUMMOND
  • COMMITTEE MEMBERS :
  • ANDRE CASTELO BRANCO SOARES
  • ANDRE COSTA DRUMMOND
  • GUSTAVO BITTENCOURT FIGUEIREDO
  • PRISCILA AMERICA SOLIS MENDEZ BARRETO
  • Date: Jun 27, 2024

  • Abstract:
  • Elastic Optical Networks (EONs) emerge as a technology for efficient spectral allocation in optical fibers. A single EON fiber supports multiple circuits in parallel, allocating distinct spectral channels and accommodating circuits with variable bandwidth requirements. Multi-core fibers (MCF) emerge to increase resource availability further. An MCF can hold multiple cores (usually 7 or 12), and conceptually each MCF operates as a group of single-core fibers. MCFs enable Spatial Division Multiplexing (SDM) in EONs, which increases spectral resources by utilizing the different spatial channels (cores). The current SDM-EON literature branches into two main paradigms: core-constrained and Spatial Lane Change (SLC). The former defines architectures where a circuit must remain in the same core along its route; the latter, SLC architectures, allow core switching along the route. The trade-off between the two architectures lies in the deployment and energy cost versus the degree of flexibility in the resource allocation procedure. This thesis proposes solutions to improve resource utilization efficiency in both paradigms. For architectures in the core-constrained paradigm, a resource allocation solution with dedicated cores for different circuit categories is proposed. This solution aims to keep the circuits that are more resilient to physical interference in the most affected cores, besides adopting techniques to reduce spectral fragmentation during allocation. For the SLC paradigm, this thesis presents a solution adapted to network defragmentation scenarios. The proposed solution combines a defragmentation heuristic with two techniques for spectral reorganization without service interruption or suspension. The goal is to reduce the fragmentation state whenever a trigger is activated, resulting in greater resource availability after spectral reorganization. Finally, a novel core switching paradigm, named sparse core switching, is introduced, which entails an architecture in which different nodes in the network possess distinct degrees of flexibility to perform core switching. The main objective is to drastically reduce the deployment cost while maintaining switching flexibility only in the more advantageous nodes. This approach saves resources on multiple levels and performs efficiently compared to core-constrained and SLC approaches.

2023
Dissertations
1
  • GEOVANA RAMOS SOUSA SILVA
  • Human Factors in Chatbot Interaction Design: Conversational Design Practices

  • Advisor : EDNA DIAS CANEDO
  • COMMITTEE MEMBERS :
  • EDNA DIAS CANEDO
  • GENAINA NUNES RODRIGUES
  • ANA PAULA CHAVES STEINMACHER
  • MAIRIELI SANTOS WESSEL
  • Date: Jan 31, 2023

  • Abstract:
  • Context: Chatbots are intelligent agents that mimic human behavior to carry on meaningful conversations. The conversational nature of chatbots poses challenges to designers, since their development is different from other software and requires investigating new practices in the context of human-AI interaction and their impact on user experience. Since chatbots usually act as a brand's representative, improving the conversational experience for users directly impacts how users perceive the organization the chatbot represents. Objective: The objective of this work is to identify textual, visual, or interactive elements of text-based chatbot interactions and how these elements can strengthen or weaken users' perceptions and feelings, such as satisfaction, engagement, and trust, for the creation of the Guidelines for Chatbot Conversational Design (GCCD) guide. Method: We used multiple research methods to generate, validate, and verify the guide. First, we conducted a Systematic Literature Review (SLR) to identify conversational design practices and their impacts. These practices were incorporated into the GCCD guide through qualitative analysis and coding of the SLR results. Then, the guide was validated through a survey to implement improvements regarding its presentation. Results: The guide's validation by software developers with different levels of experience showed that they strongly agreed that the guide could induce greater user satisfaction and engagement. Furthermore, they also strongly agreed that the guide is clear and understandable, as well as easy and flexible to use. Although participants suggested some improvements, they reported that the guide's main strengths are objectivity and clarity. Conclusion and Future Work: The guide proved to be useful for developers with different levels of knowledge, with the potential to become a strong ally for developers in the conversational design process. As a next step, a case study will be carried out to verify the guide's effectiveness when used in chatbot conversations.

2
  • Danilo José Bispo Galvão
  • An Approach for High-Level Verification of Multi-Robot Missions in UPPAAL

  • Advisor : GENAINA NUNES RODRIGUES
  • COMMITTEE MEMBERS :
  • GENAINA NUNES RODRIGUES
  • RODRIGO BONIFACIO DE ALMEIDA
  • VANDER RAMOS ALVES
  • RADU CALINESCU
  • Date: Jan 31, 2023

  • Abstract:
  • The need for means to specify robotic missions at a high abstraction level has gained momentum due to the growing popularity of robotic applications. As such, it is paramount to guarantee not only that the robotic mission is correctly specified, but also that it provides degrees of safety, given the growing complexity of tasks assigned to Multi-Robot Systems (MRS). Therefore, robot missions now need to be specified and formally verified for both the robots and the other agents involved in the mission operation. However, many mission specifications lack a streamlined verification process that ensures all mission properties are thoroughly verified through model checking. This work proposes a preliminary model checking process for mission specification and decomposition of MRS in the UPPAAL model checker. In particular, we present a semi-automated model in which hierarchical domain definition properties are transformed into UPPAAL templates and mission properties are formalized in TCTL, the timed temporal logic used by UPPAAL. In the future, we intend to fully generate the models automatically and to verify additional mission specification properties not currently covered. We have evaluated our approach on a food logistics mission specification, and the results show that the expected behaviour is correctly verified and the corresponding properties are satisfied in the UPPAAL model checking tool.

3
  • Guo Ruizhe
  • Improving the quality of the Chinese to Portuguese machine translation with RoBERTa

  • Advisor : LI WEIGANG
  • COMMITTEE MEMBERS :
  • LI WEIGANG
  • MARISTELA TERTO DE HOLANDA
  • THIAGO DE PAULO FALEIROS
  • ZHAO LIANG
  • Date: Jan 31, 2023

  • Abstract:
  • The continuous changes of the information age have promoted the development of the translation field, and machine translation, accompanied by the rise of artificial intelligence, is showing a trend of prosperity and development. Machine translation is an important topic in natural language processing. The application of neural machine translation has been revived and developed in recent years: with the introduction of better algorithms and the improvement of computing power, neural machine translation has shown great potential.

    There are big differences in form and expression between Portuguese and Chinese, communication between the two languages is still in a development stage, and basic translation materials are very scarce. The study of automatic translation between Chinese and Portuguese will not only help Chinese- and Portuguese-speaking populations, but is also an important topic for translation between languages where basic data are scarce.

    This dissertation presents a study on Neural Machine Translation (NMT) for the Portuguese (PT)-Chinese (ZH) language pair and adds the Chinese-Portuguese (Brazil) and Portuguese (Brazil)-Chinese translation directions. The objective is to seek a more suitable model for the above languages, with advanced algorithms and architectures, in order to improve the current level of Chinese-Portuguese translation, as well as of Chinese-Portuguese (Brazil) translation.

    State-of-the-art translation models are used for Chinese-Portuguese machine translation: RoBERTa, with a mixed word segmentation framework for pre-training, and BERT for the subsequent translation. Among the publicly available Chinese-Portuguese parallel corpora, OpenSubtitles2016, which has the largest amount of data, was selected, and BLEU and ROUGE, two evaluation metrics widely used for machine translation, are employed.

    In the end, we obtained results on the impact of different factors on Chinese-Portuguese machine translation under existing resources and a better model for Chinese-Portuguese machine translation, while also identifying effective lines of work for the field of Chinese-Portuguese machine translation in the future.
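    A minimal sketch of how a BLEU score of the kind used above can be computed; NLTK's sentence_bleu serves here as an illustrative stand-in for the dissertation's actual evaluation setup.

      from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

      # Toy PT hypothesis vs. reference (tokenized); real evaluation uses a test corpus.
      reference = [["o", "gato", "está", "no", "tapete"]]
      hypothesis = ["o", "gato", "está", "sobre", "o", "tapete"]

      # Smoothing avoids zero scores when higher-order n-grams have no matches.
      score = sentence_bleu(reference, hypothesis,
                            smoothing_function=SmoothingFunction().method1)
      print(f"BLEU = {score:.3f}")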

4
  • Yuri Barcellos Galli
  • Machine Learning as an aid in detecting signs of osteoporosis by analyzing oral panoramic radiographs

  • Advisor : BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • COMMITTEE MEMBERS :
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • FLAVIO DE BARROS VIDAL
  • PEDRO DE AZEVEDO BERGER
  • FÁBIO WILDSON GURGEL COSTA
  • Date: Feb 8, 2023

  • Abstract:
  • Osteoporosis is synonymous with bone fragility; it is a silent disease that is commonly detected only after it has already harmed the person who has it. This bone-fragility disease makes fractures more common and more harmful to those affected, and for this reason it is a public health issue. Identifying the disease at an early stage is essential to help prevent its damage, and in this task artificial intelligence and machine learning have proven very helpful in recent years. Machine learning algorithms can predict the risk of osteoporosis by analyzing patient images from routine exams, such as panoramic radiographs. The proposed methodology is a two-step process composed of image preprocessing and machine learning. Image preprocessing consists of transforming the raw panoramic oral images into reduced regions of interest that are more specific and clearer for classification. The machine learning stage consists of supplying these preprocessed images to computational algorithms that classify them. This work proposes a Convolutional Neural Network (CNN) architecture, compared with Support Vector Machine (SVM) and Random Forest (RF) classifiers, which aims to identify signs of osteoporosis in this type of image. The goal is to improve on the results of the reference technique, the CNN of article [1], by using a modified CNN structure to perform automatic detection of osteoporosis with high sensitivity and the RF method for a system with high specificity. In sensitivity, the proposed customized CNN obtained 77.19%, while the reference CNN obtained 70.18%. In specificity, the proposed Random Forest obtained 75.95%, while the reference CNN obtained 22.78%. Combining these results, we improve on what is obtained by the reference technique for the presented dataset, which is quite challenging: elderly patients from poor communities, in analog images with various artifacts and characteristics that make classification difficult.
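    A minimal sketch of a CNN of the general kind described (binary classification over grayscale region-of-interest crops), using PyTorch; the layer sizes and the 128x128 input are illustrative assumptions, not the proposed architecture.

      import torch
      import torch.nn as nn

      # Toy CNN for binary classification of grayscale ROI crops (illustrative sizes).
      class SmallCNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.features = nn.Sequential(
                  nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
              )
              self.classifier = nn.Sequential(
                  nn.Flatten(),
                  nn.Linear(32 * 32 * 32, 64), nn.ReLU(),
                  nn.Linear(64, 2),            # logits: osteoporosis vs. control
              )

          def forward(self, x):                # x: (batch, 1, 128, 128)
              return self.classifier(self.features(x))

      logits = SmallCNN()(torch.randn(4, 1, 128, 128))
      print(logits.shape)                      # torch.Size([4, 2])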

5
  • Beatriz Fragnan Pimento de Oliveira
  • NoSQL-Based Data Warehouse Lifecycle: Architectural Adaptations and Performance Analysis

  • Advisor : MARISTELA TERTO DE HOLANDA
  • COMMITTEE MEMBERS :
  • MARISTELA TERTO DE HOLANDA
  • ALETEIA PATRICIA FAVACHO DE ARAUJO VON PAUMGARTTEN
  • CELIA GHEDINI RALHA
  • DANIEL CARDOSO MORAES DE OLIVEIRA
  • Date: Feb 23, 2023

  • Abstract:
  • The Data Warehouse (DW) context is constantly changing in public and private organizations. Considering that DWs originally relied on relational databases, with the emergence of Big Data new proposals for the management of large volumes of data have been defined in the literature, motivating several organizations to invest in alternative solutions. As the center of a Decision Support System (DSS), the DW needs to extract value from this large mass of available data. Thus, one of the existing alternatives is to use Not-only SQL (NoSQL) solutions to model and process the DW, due to their flexibility and scalability. In this context, this work aims to analyze the challenges arising from the adoption of the NoSQL paradigm and to suggest an adaptation of the DW life cycle proposed by Kimball when migrating to NoSQL, for different pre-selected NoSQL databases. Subsequently, a case study will be carried out to develop a DW based on NoSQL databases with open data from the Brazilian Army. With the implementation of the case study, it will be possible not only to verify the influence of data modeling on the performance of the selected queries, but also to compare the performance of the relational and non-relational paradigms.

6
  • PEDRO BORGES PIO
  • Noise detection algorithm recommendation using meta-learning

  • Advisor : LUIS PAULO FAINA GARCIA
  • COMMITTEE MEMBERS :
  • ANDRE CARLOS PONCE DE LEON FERREIRA DE CARVALHO
  • LUIS PAULO FAINA GARCIA
  • THIAGO DE PAULO FALEIROS
  • VINICIUS RUELA PEREIRA BORGES
  • Date: Feb 24, 2023

  • Abstract:
  • This work implements a noise detection algorithm recommendation using meta-learning techniques. First, a systematic review of the literature on meta-learning for preprocessing algorithm recommendation was performed. The review verified which preprocessing techniques, meta-features, machine learning algorithms, and performance metrics are commonly used in the area of recommending preprocessing algorithms. Next, two different approaches were implemented for recommending noise filters using meta-learning techniques. The first is a ranking approach (MtL-Rank), which performs the suggestion using regressors that predict the value of the f1-score performance metric. The other approach performs the recommendation through a sequence of linked classifiers (MtL-Multi). The performance of the approaches was also evaluated when recommending the filters together with their hyperparameters. In total, we used eight noise filters, or 27 when considering their hyperparameter variations, four machine learning techniques to extract the performance metric, and three meta-rankers or meta-classifiers to perform the recommendation. The system is evaluated at both the meta and base levels. At the meta level, the performance of a meta-learner is evaluated through its accuracy. At the base level, the average gain in the performance metric (f1-score) is verified. The results showed that the MtL-Rank approach obtained a higher average gain at the base level, with significantly better results than the filter used as the baseline. On the other hand, the MtL-Multi approach obtained better results at the meta level, reaching an accuracy of up to 49%. In addition, it was verified that suggesting hyperparameters together with the noise filter can yield a performance gain compared with recommending the filter alone.
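    A minimal sketch of the ranking-style recommendation idea (a regressor predicts each candidate filter's downstream f1-score from dataset meta-features, and the top-ranked filter is suggested); the meta-features, filter names, and toy meta-dataset below are placeholders, not the MtL-Rank implementation.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      FILTERS = ["ENN", "AENN", "TomekLinks"]        # placeholder noise filters

      # Meta-dataset: rows = dataset meta-features + one-hot filter -> observed f1.
      # Meta-features here (n_instances, n_features, class_entropy) are illustrative.
      X_meta = np.array([[1000, 20, 0.9, 1, 0, 0],
                         [1000, 20, 0.9, 0, 1, 0],
                         [ 500,  5, 0.4, 0, 0, 1],
                         [ 500,  5, 0.4, 1, 0, 0]])
      y_f1 = np.array([0.81, 0.78, 0.70, 0.66])

      meta_regressor = RandomForestRegressor(random_state=0).fit(X_meta, y_f1)

      def recommend(meta_features):
          """Predict f1 for each candidate filter and return the best-ranked one."""
          rows = [meta_features + onehot
                  for onehot in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
          preds = meta_regressor.predict(np.array(rows))
          return FILTERS[int(np.argmax(preds))], preds

      print(recommend([800, 12, 0.7]))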

7
  • Matheus Schmitz Oliveira
  • A Contextual Deep Reinforcement Learning Trading Model for the Brazilian Stock Market

  • Advisor : GERALDO PEREIRA ROCHA FILHO
  • COMMITTEE MEMBERS :
  • GERALDO PEREIRA ROCHA FILHO
  • MARCELO ANTONIO MAROTTA
  • VINICIUS RUELA PEREIRA BORGES
  • RENATO HIDAKA TORRES
  • Date: Mar 10, 2023

  • Abstract:
  • Stock exchanges have been present in society over the last few centuries, being fundamental for moving the economy and building great fortunes. However, company prices fluctuate, making the task of identifying the best opportunities for buying and selling shares a challenge. In this sense, the use of algorithms for automatic trading on stock exchanges has gained prominence, showing positive characteristics such as efficiency and freedom from emotional bias in decision-making. Reinforcement Learning is applied to problems involving sequences of decisions in complex environments, and is promising for modeling asset-trading environments. Despite the significant advances seen in recent years, a gap was identified in the combination of numerical market data and textual data from multiple sources of information. Thus, the present work fills this gap by investigating, proposing, and validating a contextual model based on Deep Reinforcement Learning (DRL) for the individualized trading of assets in the Brazilian financial market. The proposal was evaluated in four different scenarios, based on combinations of the amount of data used and the reward schemes adopted by the DRL agent of the trained contextual model. For the evaluation, three benchmarks were chosen: the initial investment, Buy & Hold of the specific company, and Buy & Hold of BOVA11. The results showed that the developed contextual model outperformed the initial invested equity in 94.5% of cases in the best scenario. Furthermore, the scenarios that used the Sharpe ratio as the reward function more often reported net worth above the selected benchmarks. Finally, all scenarios consider the simulation of transaction fees charged by financial institutions, making the results even more realistic.
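    A minimal sketch of a Sharpe-ratio reward of the kind mentioned above, computed over a window of per-step portfolio returns; the window length and parameters are illustrative choices.

      import numpy as np

      def sharpe_reward(returns, risk_free=0.0, eps=1e-8):
          """Sharpe ratio over a window of per-step portfolio returns.
          Used as the RL reward signal instead of raw profit."""
          excess = np.asarray(returns) - risk_free
          return float(excess.mean() / (excess.std() + eps))

      # Example: reward for the last 20 trading steps of an episode (toy returns).
      rng = np.random.default_rng(0)
      window = rng.normal(0.001, 0.01, size=20)
      print(f"reward = {sharpe_reward(window):.3f}")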

8
  • Rodrigo Pereira de Mesquita
  • Guide for Elicitation techniques applied to Agile Software Development

  • Advisor : EDNA DIAS CANEDO
  • COMMITTEE MEMBERS :
  • EDNA DIAS CANEDO
  • ALETEIA PATRICIA FAVACHO DE ARAUJO VON PAUMGARTTEN
  • VANDER RAMOS ALVES
  • SABRINA DOS SANTOS MARCZAK
  • Date: Apr 27, 2023

  • Abstract:
  • Background: Requirements elicitation techniques are essential to help requirements engineers gain a better understanding of the needs of users and stakeholders. Although there are several techniques available to support Requirements Engineering (RE), software development teams may be unsure about which technique to use during requirements elicitation. Objective: The goal of this work is to identify the requirements elicitation techniques most used in the literature and compare them with the techniques most used by professionals in industry. In addition, we identified the challenges related to requirements elicitation and the pros and cons of the main techniques identified in the literature, and, based on those pros and cons, analyzed possible combinations of requirements elicitation techniques that can minimize the challenges identified in the literature and in industry. Method: We performed a Systematic Literature Review (SLR) to identify requirements elicitation techniques and challenges discussed in the literature or industry. Moreover, we performed a survey to investigate the perception of software practitioners (individuals working in the software industry in a wide variety of roles and positions) of the identified techniques and subsequently compared it with the results obtained in the SLR. Finally, using the focus group technique, we ran two validation sessions with nineteen specialists to evaluate the technique combinations and the findings provided in this guide. Results: 54 primary studies were identified in the SLR, and they demonstrated that traditional techniques are still the most used in both the literature and software industry projects. In addition, some techniques, such as Persona, are gaining ground, helping requirements engineers find different ways to elicit requirements from end users and stakeholders. Moreover, we investigated combinations of techniques already discussed and presented in the literature; based on the strengths found in the literature for each technique, it was possible to identify combinations that could overcome most of the identified challenges. Furthermore, the validation sessions provided the view of specialists, which complemented the techniques and combinations in use by the community. Conclusion: The techniques most mentioned in the literature that are also used in the software industry are: Prototyping, Interview, User Stories, Brainstorming, Observation, Scenarios, Questionnaires, and Mind Mapping. In addition, Ethnography, Joint Application Development (JAD), and Workshop have many references in the literature while not being appealing for real projects in industry. On the other hand, Persona, at least in the papers retrieved during this search, is not largely discussed in the literature whereas it has shown to be widely used in industry. Combining the use of RE techniques can help overcome the challenges identified in the literature. A guide describing all techniques identified in the literature, with their advantages and disadvantages, can support requirements engineers during requirements elicitation. Making this study available to support software practitioners in eliciting requirements will allow the software engineering community to contribute feedback on combinations of techniques, thus allowing improvement and dissemination of practitioners' perceptions of the combinations of RE techniques. Hence, the guide can support software practitioners in choosing the techniques to be used and/or combined.

9
  • Cristiano Perez Garcia
  • Intelligent and Safe UAM with Deep Reinforcement Learning

  • Advisor : LI WEIGANG
  • COMMITTEE MEMBERS :
  • LI WEIGANG
  • GERALDO PEREIRA ROCHA FILHO
  • MARCELO ANTONIO MAROTTA
  • MARCELO XAVIER GUTERRES
  • Date: Jun 15, 2023

  • Abstract:
  • Aircraft with electric propulsion that are capable of performing vertical takeoffs and landings, also known as eVTOL, are under development by several manufacturers and have the potential to revolutionize urban air mobility in the coming years. Adoption tends to be gradual, but once a certain level of maturity of this type of transport is reached, the expected large number of simultaneous flights will pose challenges for air traffic control systems. In addition, these aircraft are expected to be able to operate without a pilot on board, and they are supposed to fly direct routes, making detours only when necessary. Therefore, a set of redundant conflict detection and resolution systems is desired, one of which is responsible for tactical conflict resolution. This requires developing specific tools for the new scenario, consisting of aircraft with performance characteristics that do not yet exist. This work investigates the possibility of using deep reinforcement learning models to solve this problem. Conflict detection can be performed independently using embedded systems as sensors, such as ADS-B. After the training phase, deep reinforcement learning models can suggest actions to achieve the desired goal even in scenarios that have not been previously observed. This capability makes these models suitable for conflict resolution, since it is impracticable to train a system with all possible conflict configurations. A system based on Deep Q-Network models was used to manage the trajectories in case of conflict detection, carrying out route deviations to resolve the conflict while deviating the aircraft involved as little as necessary from their ideal trajectories. A customized simulator was implemented to run tests using several deep reinforcement learning agents and compare them with alternative strategies. The obtained results indicate that the models can suggest maneuvers capable of reducing the number of conflicts without significantly affecting displacement or fuel consumption.
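    A minimal sketch of the geometric conflict check a tactical layer relies on: the time of closest point of approach (CPA) from relative position and velocity in 2D. The separation threshold of 5 units is an arbitrary illustration.

      import numpy as np

      def time_of_cpa(p_rel, v_rel):
          """Time at which two aircraft with relative position p_rel and relative
          velocity v_rel are closest: minimize |p_rel + t*v_rel| over t >= 0."""
          vv = float(np.dot(v_rel, v_rel))
          if vv == 0.0:
              return 0.0                       # same velocity: distance is constant
          return max(0.0, -float(np.dot(p_rel, v_rel)) / vv)

      def conflict(p1, v1, p2, v2, min_sep=5.0):
          """True if the predicted closest distance violates the separation minimum."""
          p_rel, v_rel = np.subtract(p2, p1), np.subtract(v2, v1)
          t = time_of_cpa(p_rel, v_rel)
          d_min = float(np.linalg.norm(p_rel + t * v_rel))
          return d_min < min_sep, t, d_min

      # Head-on encounter: conflict predicted at t = 5 with zero miss distance.
      print(conflict(p1=(0, 0), v1=(1, 0), p2=(10, 0), v2=(-1, 0)))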

10
  • Rafael Oliveira Ribeiro
  • Methods for calculating the likelihood ratio for using face recognition systems in forensic scenarios.

  • Advisor : FLAVIO DE BARROS VIDAL
  • COMMITTEE MEMBERS :
  • DAVID MENOTTI GOMES
  • DIBIO LEANDRO BORGES
  • FLAVIO DE BARROS VIDAL
  • JOÃO CARLOS RAPOSO NEVES
  • Date: Jun 19, 2023

  • Abstract:
  • Forensic face comparison is becoming more relevant as the number of devices with image recording capabilities increases, with a consequent increase in the number of crimes in which the face of the perpetrator is recorded. This forensic examination is still based on the manual analysis and comparison of morphological features of the faces. Its results are expressed qualitatively, making them difficult to reproduce and to combine with other evidence. This work evaluates methods to obtain a quantitative result for the examination, through the computation of score-based likelihood ratios (LR). Face recognition systems are used to obtain scores that are then converted to an LR. The methods investigated in this work facilitate reproducibility, a critical aspect in forensics, and also allow for the empirical validation of performance under the conditions of each forensic case. We evaluate parametric and non-parametric methods for LR computation. Two open-source face recognition models (ArcFace and FaceNet) were used on images from five datasets that are representative of common scenarios in forensic casework: images from social media and images from CCTV cameras. We also investigate strategies for embedding aggregation in cases where there is more than one image of the person of interest. These experiments demonstrate substantial improvements in forensic evaluation settings, with improvements in Cllr of up to 95% (from 0.249 to 0.012) for CCTV images and of up to 96% (from 0.083 to 0.003) for social media images.
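    A minimal sketch of a non-parametric score-based LR of the kind evaluated above: densities of same-source and different-source comparison scores are estimated with kernel density estimation, and the LR for a new score is the ratio of the two densities. The scores below are synthetic.

      import numpy as np
      from scipy.stats import gaussian_kde

      rng = np.random.default_rng(42)
      # Synthetic similarity scores from calibration comparisons.
      same_source = rng.normal(0.80, 0.08, 1000)    # mated pairs
      diff_source = rng.normal(0.30, 0.10, 1000)    # non-mated pairs

      f_same = gaussian_kde(same_source)            # numerator density
      f_diff = gaussian_kde(diff_source)            # denominator density

      def likelihood_ratio(score):
          """LR = P(score | same source) / P(score | different sources)."""
          return (f_same(score) / f_diff(score)).item()

      print(f"LR at score 0.70 = {likelihood_ratio(0.70):.1f}")   # supports same source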

11
  • Ismael Coelho Medeiros
  • DogeFuzz: a Framework for Conducting Exploratory Studies of Fuzzing Test to Analyze Smart Contract

  • Advisor : RODRIGO BONIFACIO DE ALMEIDA
  • COMMITTEE MEMBERS :
  • RODRIGO BONIFACIO DE ALMEIDA
  • EDUARDO ADILIO PELINSON ALCHIERI
  • GENAINA NUNES RODRIGUES
  • WILKERSON DE LUCENA ANDRADE
  • Date: Jul 7, 2023

  • Abstract:
  • Smart contracts are Turing-complete programs executed on a blockchain network. This type of program often stores valuable digital assets, and on a blockchain such as Ethereum each smart contract's binary is public and transparent, so it can be accessed by anyone. This makes this type of program a constant target for many kinds of attack and makes its security critical. This work aims to experiment with advanced fuzzing techniques for automatic vulnerability detection in smart contracts. The technique to be explored is directed greybox fuzzing, whose objective is to generate inputs that explore specific points of the program. To that end, the ContractFuzzer tool will be extended to use this new technique. To evaluate the performance of this version of the tool, it is necessary to create a dataset to compare the exploration capacity of the two versions of ContractFuzzer.
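    A minimal sketch of the greybox fuzzing loop underlying tools in this space: mutate seeds, execute, and keep inputs that reach new coverage. The target program and coverage function are toy placeholders, not ContractFuzzer internals.

      import random

      def target(data):
          """Toy program under test; returns the set of branch ids it executed."""
          cov = {"entry"}
          if data and data[0] == ord("A"):
              cov.add("b1")
              if len(data) > 1 and data[1] == ord("B"):
                  cov.add("b2")              # deeper branch, reached via feedback
          return cov

      def mutate(data):
          """Random byte flip/append: the simplest greybox mutation operators."""
          data = bytearray(data or b"\x00")
          if random.random() < 0.5 and data:
              data[random.randrange(len(data))] = random.randrange(256)
          else:
              data.append(random.randrange(256))
          return bytes(data)

      seeds, seen = [b"seed"], set()
      for _ in range(20000):
          candidate = mutate(random.choice(seeds))
          cov = target(candidate)
          if not cov <= seen:                # new coverage: keep input as a seed
              seen |= cov
              seeds.append(candidate)
      print(sorted(seen), len(seeds))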

12
  • Rodrigo Cardoso Aniceto
  • Helping the plagiarism detection process in introductory programming courses

  • Advisor : MARISTELA TERTO DE HOLANDA
  • COMMITTEE MEMBERS :
  • ALETEIA PATRICIA FAVACHO DE ARAUJO VON PAUMGARTTEN
  • DILMA DA SILVA
  • MARISTELA TERTO DE HOLANDA
  • VINICIUS RUELA PEREIRA BORGES
  • Date: Jul 12, 2023

  • Abstract:
  • This work proposes an application to help teachers identify students suspected of plagiarism in source codes in an introductory programming course with a virtual teaching environment. This is done through the integration of automatic plagiarism detection tools with data on student behavior in the course, for the generation of unified reports. This behavioral data includes the assignment submission pattern and classroom data such as attendance and grades. It can be applied in distance or face-to-face teaching. This application will be tested with real data in order to simplify the plagiarism identification process. It is also expected to learn more about the profile of students who copy source codes to outline policies aimed at reducing the occurrence of this practice.

13
  • RUBENS MARQUES CHAVES
  • Financial distress forecast on imbalanced data stream

  • Advisor : LUIS PAULO FAINA GARCIA
  • COMMITTEE MEMBERS :
  • LUIS PAULO FAINA GARCIA
  • CELIA GHEDINI RALHA
  • THIAGO DE PAULO FALEIROS
  • RICARDO CERRI
  • Date: Jul 20, 2023

  • Abstract:
  • Corporate bankruptcy predictions are important to companies, investors, and authorities. However, as most bankruptcy prediction models in previous studies have been based on a single time dimension, they tend to ignore the two main characteristics of financial distress data: unbalanced data sets and concept drift in the data stream. To overcome these issues, this study tries to identify the most appropriate techniques for dealing with these problems in the financial statements provided quarterly by companies to the CVM, using a system of sliding windows and a forgetting mechanism to avoid degradation of the predictive model. An empirical experiment was carried out on a sample of data collected from the CVM open data portal over a period of 10 years (2011 to 2020), with 905 different corporations and 23,468 records with 102 indicators each; 21,750 records correspond to companies without financial difficulties and 1,718 to companies in financial distress. Due to the characteristics of the problem, especially the data imbalance, the performance of the model was measured through AUC (area under the ROC curve), G-measure, and F-measure.
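    A minimal sketch of the sliding-window-with-forgetting idea: train only on the most recent quarters so that older regimes are forgotten. The window length and model are illustrative assumptions.

      from collections import deque
      from sklearn.linear_model import LogisticRegression

      WINDOW_QUARTERS = 8                       # illustrative forgetting horizon
      window = deque(maxlen=WINDOW_QUARTERS)    # old quarters fall off automatically

      def on_new_quarter(X_q, y_q):
          """Add the newest quarter, drop the oldest, retrain on the window."""
          window.append((X_q, y_q))
          X = [row for Xq, _ in window for row in Xq]
          y = [lbl for _, yq in window for lbl in yq]
          model = LogisticRegression(max_iter=1000, class_weight="balanced")
          return model.fit(X, y)                # class_weight tackles the imbalance

      # model = on_new_quarter(X_2023q4, y_2023q4)   # hypothetical quarterly batch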

14
  • NIKSON BERNARDES FERNANDES FERREIRA
  • Improving the Safety of Numerical Programs

  • Advisor : MAURICIO AYALA RINCON
  • COMMITTEE MEMBERS :
  • AARON DUTLE
  • LAURA TITOLO
  • MAURICIO AYALA RINCON
  • VANDER RAMOS ALVES
  • Date: Jul 21, 2023

  • Abstract:
  • This work discusses how the presence of rounding errors in real-world implementations of DAIDALUS, NASA's detect-and-avoid library for unmanned vehicles, affects the overall safety of the system. The DAIDALUS library provides formal definitions of avionics Detect and Avoid concepts, mechanically verified in the proof assistant PVS. However, such verifications are only certificates of the well-behavedness of the specification from the logical point of view; they do not guarantee the accuracy of the algorithms implemented under floating-point arithmetic. Our analysis assumes the IEEE 754 floating-point standard, implemented in several programming languages, and the verification technique is grounded on generating a first-order specification of the numerical computations. A prominent feature of the approach is splitting the specification into slices defined according to the different computation branches. Slicing is crucial to simplify the formal analysis of floating-point arithmetic computations.
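    A tiny illustration (not from the thesis) of why logically verified real-number specifications can diverge under IEEE 754 arithmetic: identities that hold over the reals fail in floating point.

      # Associativity holds over the reals but fails in IEEE 754 binary64:
      a, b, c = 1e16, -1e16, 1.0
      print((a + b) + c)   # 1.0
      print(a + (b + c))   # 0.0  (c is absorbed: -1e16 + 1.0 rounds back to -1e16)

      # A guard like "x == 0" in a specification can flip under rounding:
      x = 0.1 + 0.2 - 0.3
      print(x, x == 0.0)   # 5.551115123125783e-17 False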

    Note: Substitute member Mariano Miguel Moscato - National Institute of Aerospace - NASA LaRC

15
  • Fernanda Amaral Melo
  • A Meta-Learning Approach for Concept Drift Detection

  • Advisor : LUIS PAULO FAINA GARCIA
  • COMMITTEE MEMBERS :
  • LUIS PAULO FAINA GARCIA
  • GERALDO PEREIRA ROCHA FILHO
  • VINICIUS RUELA PEREIRA BORGES
  • ANDRE CARLOS PONCE DE LEON FERREIRA DE CARVALHO
  • Date: Aug 30, 2023

  • Abstract:
  • Advances in data generation and transmission have multiplied data stream applications. These highly dynamic environments often come with the concept drift problem, the phenomenon in which the statistical properties of the variables change over time, resulting in performance loss for Machine Learning models. This work presents a new concept drift detection tool for Machine Learning systems based on Meta-Learning. The algorithm is proposed for data stream problems containing concept drift with a large target arrival delay. Meta-Learning was chosen because of its robustness and adaptation to data stream problems; however, unlike the traditional algorithm-recommendation Meta-Learning approach, a regressor is used at the meta level to predict the base model's performance, and these predictions can be used to generate concept drift alerts before the target arrives. The meta-model training includes several unsupervised meta-features from the Meta-Learning literature; in addition, unsupervised concept drift detection metrics are added to the attributes in order to increase the predictive power of the generated meta-regressor. The algorithm was applied to commonly used data stream databases, and the performance at the meta level is evaluated through the Mean Squared Error, compared to the original Meta-Learning approach and to the baseline, a measure from the last known performance window. Finally, the importance of the variables for the meta-regressor is analyzed to find the real contribution of the meta-features proposed in this research, such as the concept change detection measures. Preliminary results show that the proposed algorithm generates, on average, an error reduction of 12.8% compared to traditional Meta-Learning and of 38% compared to the baseline in predicting the performance of the base model. Future work includes defining concept drift alerts based on the meta-model predictions and comparing the proposed technique with existing concept drift metrics on databases with labeled concept drift.

16
  • Paulo Victor Gonçalves Farias
  • Proposal for Congestion Control Sensitive to Safety and Positioning Requirements in Vehicular Ad Hoc Networks

  • Advisor : JACIR LUIZ BORDIM
  • COMMITTEE MEMBERS :
  • EDUARDO ADILIO PELINSON ALCHIERI
  • JACIR LUIZ BORDIM
  • JO UEYAMA
  • MARCELO ANTONIO MAROTTA
  • Date: Sep 22, 2023

  • Abstract:
  • Periodic message transmission is one of the possible ways to enable the operation of applications in a Vehicular Ad Hoc Network (VANET). These messages are known as beacons, and they carry information about a vehicle's position, speed, and direction. Safety applications in VANETs dictate that beacons must be sent at high frequencies to ensure reliable and efficient operation. However, due to transmission channel constraints, the number of messages transmitted simultaneously can lead to collisions in environments with high vehicle density. When there is a high number of collisions, new messages are not sent correctly; this event is called a broadcast storm. Congestion control algorithms have been proposed as effective solutions to this problem, adjusting parameters like transmission rate and transmission power. Despite this, it was noticed that these techniques can impair the operation of safety applications that rely on positioning information with a certain level of accuracy. In this work, a Proactive Geocast Beacon Transmission Algorithm (PGBTA) is proposed as a solution to control network congestion and ensure the positioning requirements of applications. In PGBTA, beacons are transmitted by geocast, where the neighbors of a node are divided into geocast groups according to their distance. PGBTA prioritizes sending more frequent updates to groups of neighbors at short distances through a position prediction mechanism, which considers the position accuracy requirements defined in the literature. Simulations were carried out in a realistic scenario where it was possible to verify the feasibility of the PGBTA algorithm, considering metrics such as the number of beacons generated, the delay between beacon transmissions, the number of neighbors, and the positioning error.
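    A minimal sketch of the distance-based grouping idea: neighbors are binned into geocast groups, and nearer groups receive higher beacon rates. The thresholds and rates are illustrative, not PGBTA's actual parameters.

      # Illustrative distance thresholds (m) -> geocast group and beacon rate (Hz).
      GROUPS = [(50.0, "near", 10.0), (150.0, "mid", 2.0), (float("inf"), "far", 0.5)]

      def geocast_group(distance_m):
          """Assign a neighbor to a geocast group by distance; nearer = faster updates."""
          for limit, name, rate_hz in GROUPS:
              if distance_m <= limit:
                  return name, rate_hz
          raise ValueError("unreachable")

      for d in (12.0, 90.0, 400.0):
          print(d, geocast_group(d))
      # 12.0 ('near', 10.0) / 90.0 ('mid', 2.0) / 400.0 ('far', 0.5)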

17
  • Herval Alexandre Dias Hubner
  • Analysis of evolution of software product lines

  • Advisor : VANDER RAMOS ALVES
  • COMMITTEE MEMBERS :
  • GENAINA NUNES RODRIGUES
  • LEOPOLDO MOTTA TEIXEIRA
  • VANDER RAMOS ALVES
  • VINICIUS RUELA PEREIRA BORGES
  • Date: Oct 26, 2023

  • Abstract:
  • In the current software engineering scenario, Software Product Lines (SPL) emerge as a fundamental approach to face the challenges of mass customization. SPLs allow the construction of individual solutions based on reusable components, providing efficiency and flexibility in software development, and are fundamental for improving productivity and quality thanks to component reuse and rapid adaptation to new requirements. Variability is central to SPLs, facilitating adaptation to diverse product situations through features that can be activated. SPL analysis is crucial for identifying common and variant requirements, but it faces challenges such as time constraints and the scarcity of empirical studies characterizing and detailing the evolution of SPLs. In this work, we developed the ASTool tool (software for analyzing changes in the abstract syntax tree) to examine several Software Product Lines and thus characterize their evolution. The results of this analysis reveal that, with regard to the average depth of changes in the Abstract Syntax Tree (AST), modifications occur at shallow levels, that is, close to the root of the syntax tree. Regarding the average number of files changed per commit, a significant number of modified files per commit was observed. As for the average number of gaps between changed lines in the code, the values obtained indicate a low frequency of changes. The results of this study may inform the decision of whether or not to use memoization to improve the effectiveness of analyses.
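    A small illustration (using Python's ast module as a stand-in for whatever parser ASTool employs) of measuring how deep in the syntax tree a construct sits, the kind of depth metric referred to above:

      import ast

      def node_depths(source):
          """Map each AST node to its depth below the module root."""
          tree = ast.parse(source)
          depths = {}
          def walk(node, depth):
              depths[node] = depth
              for child in ast.iter_child_nodes(node):
                  walk(child, depth + 1)
          walk(tree, 0)
          return depths

      src = "def f(x):\n    return x + 1\n"
      for node, d in node_depths(src).items():
          print(d, type(node).__name__)
      # 0 Module / 1 FunctionDef / 2 arguments / 3 arg / 2 Return / 3 BinOp / 4 Name ...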

18
  • Lucélia Vieira Mota
  • Creation of a labeled dataset in Portuguese using a weak supervision approach

  • Advisor : THIAGO DE PAULO FALEIROS
  • COMMITTEE MEMBERS :
  • ALAN DEMÉTRIUS BARIA VALEJO
  • GERALDO PEREIRA ROCHA FILHO
  • LUIS PAULO FAINA GARCIA
  • THIAGO DE PAULO FALEIROS
  • Date: Dec 14, 2023

  • Abstract:
  • Labeling training data has become one of the main obstacles to the use of machine learning. Among the several data labeling paradigms, weak supervision has emerged as an opportunity to alleviate the manual labeling bottleneck, since it allows training labels to be synthesized programmatically from multiple, potentially noisy supervision sources. This dissertation presents experiments with one of the weak supervision approaches. In particular, a brief literature review was carried out on the theoretical basis that supports the use of this approach, and a learning and labeling workflow is described for the problem of named entity recognition with weak supervision. Finally, experiments were carried out to evaluate the gains of using this approach to assist in the labeling of datasets within the context of the Brazilian Public Administration, and thus to inspire future research directions in the field.
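    A minimal sketch of the weak supervision idea: several noisy labeling functions vote on each example and their outputs are combined, here by simple majority (systems like Snorkel fit a generative label model instead). The rules below are hypothetical.

      from collections import Counter

      ABSTAIN, ORG, PERSON = None, "ORG", "PERSON"

      # Hypothetical labeling functions: cheap, noisy, programmatic rules.
      def lf_title(tok):   return PERSON if tok.istitle() and " " not in tok else ABSTAIN
      def lf_suffix(tok):  return ORG if tok.endswith(("S.A.", "Ltda")) else ABSTAIN
      def lf_keyword(tok): return ORG if "Banco" in tok else ABSTAIN

      LFS = [lf_title, lf_suffix, lf_keyword]

      def weak_label(token):
          """Majority vote over non-abstaining labeling functions."""
          votes = [lf(token) for lf in LFS if lf(token) is not ABSTAIN]
          return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

      for tok in ["Maria", "Banco Central", "Petrobras S.A.", "contrato"]:
          print(tok, "->", weak_label(tok))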

19
  • Leandro Dias Carneiro
  • Assessing the influence of degradations on deep learning models used in facial recognition

  • Advisor : FLAVIO DE BARROS VIDAL
  • COMMITTEE MEMBERS :
  • FLAVIO DE BARROS VIDAL
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • CAMILO CHANG DOREA
  • HELIO PEDRINI
  • Date: Dec 21, 2023

  • Abstract:
  • Facial recognition systems have been increasingly used in criminal prosecution: in addition to the accuracy of the systems increasing considerably in recent years, the number of cameras on public roads, in homes, and in commercial establishments keeps growing. Currently, most commercial systems present as a result a metric that represents the similarity between two faces, or simply a qualitative description, leaving aside other analyses of the quality and actual usefulness of the material used for comparison. This work aims to estimate the impact of image degradations on deep-learning-based facial recognition systems in order to minimize mistakes made when analyzing the result. To achieve this objective, two sequential steps will be carried out: first, the creation of a database, and second, a model capable of identifying the degradation (and its intensity) present in an image. The database will be created from 3 facial detection algorithms, eight facial recognition algorithms, 14 types of degradation with six intensity levels each, and four face databases, with scores calculated for the accuracy, precision, and recall metrics. After creating the database, a deep learning model will be developed, capable of identifying the degradation present in an image. With this identification, it will be possible to consult the database results and estimate the performance drop for new images. For the face databases analyzed, facial recognition models had a minimum impact of 17% on average and a maximum impact of 43% on average. Furthermore, the models trained on the degradation detection task reached approximately 71% and 94% accuracy. Both the algorithms and the face databases are public. The project's final objective is to identify the quality limits necessary for a result to be considered robust by facial recognition systems, and to create a model capable of estimating, with reasonable accuracy, the type of degradation present in an image.

Thesis
1
  • Lucas Maciel Vieira
  • Exploring relevant features of colorectal cancer using clinical and biological data: a bioinformatics approach

  • Advisor : MARIA EMILIA MACHADO TELLES WALTER
  • COMMITTEE MEMBERS :
  • MARIA EMILIA MACHADO TELLES WALTER
  • CELIA GHEDINI RALHA
  • ANDRE CARLOS PONCE DE LEON FERREIRA DE CARVALHO
  • JOÃO CARLOS SETUBAL
  • PETER FLORIAN STADLER
  • Data: Feb 28, 2023


  • Show Abstract
  • Colorectal cancer (CRC) is one of the most frequent and lethal types of cancer around the world, being the second most frequent cancer in Brazil [1]. CRC is a heterogeneous cancer that settles in the lower part of the large bowel and can be classified according to its anatomical site as colon, rectum, or rectosigmoid junction cancer. The most common type of CRC is adenocarcinoma, which accounts for 90% of the cases. Most CRC deaths are related to its metastases, and early detection considerably improves the patient's survival chances. The disease can be influenced by many environmental aspects, such as eating habits, age, and weight. Its treatment can also differ according to the anatomical site; the recommended treatment is usually surgery first, followed by chemotherapy. An inaccurate identification of the CRC anatomical site can lead to under- or overtreatment, which can impact the patient's likelihood of mortality. In order to help CRC prognosis, prevention, and treatment, it is crucial to understand the molecular mechanisms and external factors that affect CRC development and progression.

    Regarding the biological aspects of CRC, we can describe the impact of coding and non-coding RNAs on the disease's underlying mechanisms. In particular, we can highlight three molecules: long non-coding RNAs (lncRNAs), micro RNAs (miRNAs), and messenger RNAs (mRNAs). In eukaryotes, mature mRNAs are formed after the pre-mRNA generated by transcription undergoes a process known as splicing, which removes some regions (introns) of the pre-mRNA while joining others (exons), thus forming the mature mRNA. The splicing process can generate more than one protein from a single gene, in a process known as alternative splicing. The generated proteins are then used to regulate the organism's functions, taking part in metabolic reactions and affecting many biological processes, such as disease development.

    The miRNAs play an essential role in gene expression, more specifically by binding to mRNAs and then starting the processes of inhibition or degradation of their targets. The lncRNAs, on the other hand, do not take part directly in this mRNA expression regulation process, but play essential roles such as altering other molecules' functions, thereby affecting protein expression and, consequently, disease development and suppression. Given the specific role of each described molecule in disease development, recent studies also highlighted the importance of a mechanism known as competing endogenous RNA (ceRNA) networks, in which lncRNAs, miRNAs, and mRNAs interact among themselves. In this mechanism, the miRNAs, in addition to their capability of binding to the mRNAs, can also bind to the ceRNAs, which then act as modulators of miRNAs, therefore indirectly regulating mRNA expression. The identification of ceRNA networks related to CRC development and its underlying mechanisms can help doctors to better understand the disease and better identify the patient's prognosis. In the literature, we can find studies that use bioinformatics approaches to analyze and create ceRNA networks and to indicate potential prognostic biomarkers for colon, rectal, and colorectal cancer in general [2, 3, 4, 5, 6, 7, 8].

    Although some studies were conducted with ceRNA network construction in mind, to the best of our knowledge, our study was the first to establish specific ceRNA networks for: (i) colon; (ii) rectum; and (iii) rectosigmoid junction, and to relate them with specific biological mechanisms in order to clarify the differences and common factors between these sites.

    On the other hand, some studies suggest the use of machine learning methods on clinical features to predict CRC patient prognosis [9, 10, 11]. In particular, Gründner et al. [9] explored a method that combined biological and clinical features to predict prognosis aspects for CRC patients from South Africa. These studies showed promising results in predicting CRC patients' prognosis but, to the best of our knowledge, our study was the first one to use open data and machine learning to predict CRC recurrence and patient survival, by using biological markers extracted from the colon, rectal, and rectosigmoid cancer ceRNA networks combined with clinical features.

    In this thesis, as the first step, we propose a pipeline that uses open-access data from patients with CRC, extracted from The Cancer Genome Atlas (TCGA), to construct CRC-specific ceRNA networks and potential biological markers that affect patient prognosis. We aim to identify molecules that can be used as biological markers for the three CRC anatomical sites: colon, rectum, and rectosigmoid junction. To construct these networks and propose the biological markers, raw RNA expression and clinical data from the CRC patients were collected. The RNA expression profiles were assessed with bioinformatics analysis tools, and a ceRNA network was constructed for each CRC anatomical site, with the ceRNA networks and the molecules present in them as output. Afterwards, a functional enrichment analysis was performed, in which we assessed the potential biological pathways activated by the molecules obtained in the previous step. Finally, an overall survival analysis was performed to identify the impact of these molecules on patient prognosis, giving as output a list of potential biological markers.

    As an overall result of the first pipeline of this thesis, several potential prognostic markers for colon, rectum, and rectosigmoid junction cancer were found. Also, specific ceRNA networks for each anatomical site were constructed, and we identified different biological pathways that highlight differences in CRC behavior at the different anatomical sites, thus reinforcing the importance of correctly identifying the tumor site. As output, a group of potential biological markers involved in CRC prognosis was generated; specifically, we can highlight the site-specific prognostic biomarkers hsa-miR-1271-5p, NRG1, hsa-miR-130a-3p, SNHG16, and hsa-miR-495-3p in the colon; E2F8 in the rectum; and DMD and hsa-miR-130b-3p in the rectosigmoid junction.

    With the list of potential biological markers related to CRC prognosis in hand, we then proceeded to the second part of this thesis: the proposal of a pipeline to predict CRC recurrence and patient survival using supervised machine learning (ML) methods. Clinical factors such as age and weight, as well as biological factors, can affect CRC progression and prognosis. To better understand the mechanisms of CRC and to identify the impact of both clinical and biological factors on its prognosis, we used patient clinical features combined with the previously found biological markers as biological features to train our ML models. In order to achieve high predictive performance and interpretability of the proposed findings, we evaluated and compared the following ML algorithms: Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Adaptive Boosting (AB). To establish the importance of each feature while building the models to predict CRC recurrence and patient survival, a feature extraction analysis was first performed, to filter and rank which of these features in fact have an impact on the constructed prediction model. With the selected relevant biological and clinical features in hand, we then constructed the ML models and evaluated their performance. Finally, as output, we generated ML models to predict CRC recurrence and patient survival and a list of potential biological and clinical features relevant to patient prognosis. A minimal sketch of this model comparison step follows below.
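    A minimal sketch of the idea behind this step, assuming synthetic stand-in data: compare classifiers on combined clinical and biological features and inspect feature importance. The feature names reuse markers cited in this abstract; the data itself is random and purely illustrative.

```python
# Compare LR and RF with cross-validation, then rank features by importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))        # stand-in for clinical + biological features
y = rng.integers(0, 2, size=200)     # stand-in for recurrence / survival labels
feature_names = ["age", "pathological_stage", "chemotherapy", "weight",
                 "SNHG16", "hsa-miR-130b-3p", "hsa-miR-495-3p"]

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.2f}")

# Impurity-based importance from the Random Forest, highest first.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for fname, imp in sorted(zip(feature_names, rf.feature_importances_),
                         key=lambda t: -t[1]):
    print(f"{fname}: {imp:.3f}")
```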

    Regarding the overall results of the second pipeline, several potential biological and clinical markers were pointed out as important in CRC recurrence and patient survival. For feature importance, we pointed out SNHG16, hsa-miR-130b-3p, hsa-miR-495-3p, and KCNQ1OT1 as biological features; and age, ethnicity, pathological stage, chemotherapy, height and weight, positive lymph node count, and lymph node count as clinical features. Finally, by using LR and RF we achieved the best accuracies of 90% and 82% for predicting patient survival and CRC recurrence, respectively. Also, the six proposed ML algorithms showed good overall performance; in particular, LR and RF displayed good overall results, which was also highlighted in other studies [9, 10, 11].

    This study strongly suggests that bioinformatics approaches should be used together with ML algorithms to enhance the interpretation of CRC mechanisms and patient prognosis. However, we should highlight some limiting factors, such as the amount of available data, as the number of available patients for certain anatomical sites was low, and the fact that the data consisted mainly of patients from the USA. Following the proposed pipelines, doctors can better understand the underlying mechanisms of CRC at its anatomical sites, and also use our model to help predict patient prognosis. Finally, running these pipelines on Brazilian patient data could improve CRC data interpretation, especially in a circumstance where there is diversity and inequality in the country's demographic landscape, which can affect CRC prognosis.

2
  • Willian de Oliveira Barreiros Júnior
  • Efficient Execution of Microscopy Image Analysis on Hybrid Distributed-Memory Machines

  • Advisor : GEORGE LUIZ MEDEIROS TEODORO
  • COMMITTEE MEMBERS :
  • GEORGE LUIZ MEDEIROS TEODORO
  • RICARDO PEZZUOL JACOBI
  • ALFREDO GOLDMAN VEL LEJBMAN
  • CRISTIANA BARBOSA BENTES
  • RENATO ANTÔNIO CELSO FERREIRA
  • Data: Mar 23, 2023


  • Show Abstract
  • The analysis of high-resolution whole slide tissue images (WSIs) is a computationally expensive task, whose cost adversely impacts the large-scale usage of pathology imaging data in research. Parallel solutions to optimize such applications have been proposed targeting multiple devices and environments, such as CPUs, GPUs, hybrid compute nodes, and distributed systems. However, the generalization of efficiently executing parallel code on hybrid and/or distributed machines remains an open challenge for digital histopathology. An application developer may have to implement multiple versions of data processing codes targeted at different compute devices. The developer also has to tackle the challenges of efficiently distributing the computational load among the nodes of a distributed-memory machine and among the computing devices within a node. This can be particularly difficult for the analysis of high-resolution images with content-dependent computing costs. This thesis aims to provide a solution that simplifies the development of WSI analysis workflows while also enabling efficient use of distributed and hybrid (CPU-GPU) resources. To this end, a high-level execution model, coupled with an automatic workload partitioning method, was proposed. In order to validate the proposed methods and algorithms, a high-level image processing language (Halide) was used as a local-resource (CPU/GPU) parallel solution, together with Region Templates (RT), a system for managing data/task coordination among distributed nodes. A novel cost-aware data partitioning strategy that considers workload irregularity to minimize load imbalance was also developed. For it, two partitioning algorithms were proposed: the Expected Cost Bisection (ECB) and the Background Removal Bisection (BRB). Experimental results show significant performance improvements on hybrid CPU-GPU machines, as compared with using a single compute device (CPU or GPU), as well as with multi-GPU systems. The partitioning algorithms were compared with a baseline hierarchical KD-Tree (KDT) approach on multi-GPU-only, hybrid CPU-GPU, and large-scale distributed CPU node environments. Results show speedups of up to 2.72× for ECB and 4.52× for BRB, both compared to KDT. In addition to the simpler development model for domain experts, the attained performance in both hybrid and large-scale distributed computing environments demonstrates the efficacy of the proposed system for large-scale WSI studies. Improvements to the performance of the cost-aware data partitioning (CADP) algorithms and to the accuracy of the execution cost estimation model are expected as future work for the proposed system. A schematic cost-aware bisection sketch follows below.
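    A hedged sketch in the spirit of cost-aware bisection (not the thesis implementation): recursively split a 2D cost map at the cut that best balances the estimated cost of the two halves, so expensive regions receive smaller tiles.

```python
# Cost-balanced recursive bisection of a 2D cost map into 2**depth tiles.
import numpy as np

def bisect_by_cost(cost, depth, r0=0, c0=0):
    """Return (row_start, row_end, col_start, col_end) tiles with roughly
    balanced total cost; the split axis alternates with recursion depth."""
    if depth == 0 or min(cost.shape) < 2:
        return [(r0, r0 + cost.shape[0], c0, c0 + cost.shape[1])]
    axis = depth % 2                       # alternate row cuts and column cuts
    marginal = cost.sum(axis=1 - axis)     # cost per row (axis 0) or column (axis 1)
    cum = np.cumsum(marginal)
    cut = int(np.argmin(np.abs(cum - cum[-1] / 2.0)))
    cut = max(1, min(cut + 1, marginal.size - 1))   # keep both halves non-empty
    if axis == 0:
        halves = [(cost[:cut, :], r0, c0), (cost[cut:, :], r0 + cut, c0)]
    else:
        halves = [(cost[:, :cut], r0, c0), (cost[:, cut:], r0, c0 + cut)]
    tiles = []
    for sub, r, c in halves:
        tiles += bisect_by_cost(sub, depth - 1, r, c)
    return tiles

# Synthetic cost map: the expensive corner ends up in smaller tiles.
cost = np.ones((256, 256))
cost[:64, :64] = 20.0
for tile in bisect_by_cost(cost, depth=2):
    print(tile)
```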

3
  • Liriam Michi Enamoto
  • GeMGF - Generic Multimodal Gradient-Based Meta Framework

  • Advisor : LI WEIGANG
  • COMMITTEE MEMBERS :
  • LI WEIGANG
  • GERALDO PEREIRA ROCHA FILHO
  • LUIS PAULO FAINA GARCIA
  • JO UEYAMA
  • PAULO CESAR GUERREIRO DA COSTA
  • Data: Apr 13, 2023


  • Show Abstract
  • The emergence of the Transformer (a model pre-trained on a large-scale dataset) and its recent new versions have revolutionized research in Machine Learning, especially in Natural Language Processing (NLP) and Computer Vision. The excellent results of Transformer-based models depend on labeled, high-quality, domain-specific data. However, due to the diversity of contexts in which these models are used, it is challenging to create models that learn from limited data. The model may suffer from a lack of generalization, language bias, and fairness issues caused by large pre-trained models, resulting in unexpected outcomes in real-world applications. This open problem leads to research in multimodal Few-Shot Learning (FSL).

    In this thesis, we propose the Generic Multimodal Gradient-Based Meta Framework (GeMGF). To compensate for the scarcity of data, we use multimodal data, in which supplementary and complementary information from one modality can help the data representation. The multimodal data are extracted using deep learning models and represented in a unified vector space. The framework uses the Prototypical Network and the Relation Network in the FSL step. Reptile, an optimization-based meta-learner, helps the model avoid degradation on unseen data. In addition to the multimodal framework, we propose a unimodal version to evaluate the flexibility and adaptability of the framework in different scenarios.

    The framework was evaluated using ten datasets of various domains and characteristics, including short texts from Twitter, long texts from the legal domain, texts in alphabetic (English and Portuguese) and non-alphabetic (Japanese) languages, medical domain images, and multimodal benchmark datasets. Our multimodal framework was evaluated using the CUB-200-2011 and Oxford-102 datasets, outperforming the state-of-the-art model of Munjal et al. [1] by 1.43% on CUB-200-2011 and Pahde et al. [2] by 1.93% on Oxford-102. The result of the multimodal framework on CUB-200-2011 was 34.68% higher than that of the unimodal framework for image, and 13.96% higher on Oxford-102. The results suggest that text and image data jointly helped the framework learn rich information and improve overall performance. The multimodal GeMGF is a simple and compact framework using only 14 million parameters, 99.8% fewer than the Multimodal Transformer. The unimodal framework for text achieved excellent results on the Japanese dataset, outperforming Transformer BERT by 58.30% with 90.90% fewer parameters. These results suggest that our framework achieves better performance with a significant reduction in computational cost.

    The main contributions of our research are: (i) a novel multimodal FSL framework, GeMGF, developed to reduce the degradation of models trained on limited data; (ii) GeMGF is trained without external knowledge, avoiding language bias and fairness issues; (iii) GeMGF has independent and flexible feature extractors that enhance its applicability; and (iv) the unimodal framework for text can be adapted to process alphabetic and non-alphabetic languages with high performance. A minimal sketch of the prototype-based classification step follows below.
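    A minimal numpy illustration of the Prototypical Network step used in FSL (an illustration of the general technique, not GeMGF itself): class prototypes are the mean embedding of each class's support set, and queries take the label of the nearest prototype.

```python
# Nearest-prototype classification on a synthetic 5-way, 2-shot episode.
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Mean embedding per class: (n_classes, dim)."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    """Label each query with its nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
support = rng.normal(size=(10, 64))            # 5 classes x 2 shots of embeddings
labels = np.repeat(np.arange(5), 2)
queries = support[::2] + 0.01 * rng.normal(size=(5, 64))   # near one shot each
print(classify(queries, prototypes(support, labels, 5)))   # expected: [0 1 2 3 4]
```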

4
  • Lucas Borges Monteiro
  • Conflict Detection and Resolution in ATM using 4D Trajectory Modeling based on NoSQL Databases and Search Algorithms


  • Advisor : LI WEIGANG
  • COMMITTEE MEMBERS :
  • LI WEIGANG
  • GERALDO PEREIRA ROCHA FILHO
  • VINICIUS RUELA PEREIRA BORGES
  • CLAUDIO BARBIERI DA CUNHA
  • ZHAO LIANG
  • Data: May 26, 2023


  • Show Abstract
  • The progress of science and technology has greatly increased the amount of data produced in various fields, including air transportation. Correctly handling these massive data can bring important results, as it makes decision-making more accurate. In this sense, focusing on the new Trajectory-Based Operations (TBO) paradigm of Air Traffic Management (ATM), this work presents two models for conflict detection and resolution (CDR). The first one is based on a NoSQL database and search algorithms. The second one, called 4DNavMCTS, also applies the concepts of Monte Carlo Tree Search (MCTS) and the Vector Space Model (VSM) to the NoSQL-based modeling. Considering the big data of air transport, in the tests carried out, the two models were able to perform CDR under the artificial intelligence (AI) paradigm, finding and resolving potential conflicts between aircraft and improving flight safety with reasonable predictions.
    The main objectives achieved with the research were: i) increased safety for ATM; ii) processing the large amount of data generated by regional and global traffic in four-dimensional navigation; iii) dealing with uncertainties of human and environmental factors such as climate and temperature; and iv) trajectory management to ensure conflict-free scenarios even if the model itself occasionally interferes. An illustrative conflict check follows below.
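    An illustrative conflict check between 4D trajectories (an assumption-laden sketch, not the thesis code): two flights conflict if, at any shared timestamp, their horizontal distance at the same flight level falls below a separation minimum. The coordinate units and the 5 NM threshold are assumptions.

```python
# Detect separation violations between two sampled 4D trajectories.
import math

SEPARATION_NM = 5.0  # assumed horizontal separation minimum

def detect_conflicts(traj_a, traj_b):
    """Each trajectory maps timestamp -> (x_nm, y_nm, flight_level).
    Returns the timestamps where separation is violated at the same level."""
    conflicts = []
    for t in sorted(set(traj_a) & set(traj_b)):
        (xa, ya, fla), (xb, yb, flb) = traj_a[t], traj_b[t]
        if fla == flb and math.hypot(xa - xb, ya - yb) < SEPARATION_NM:
            conflicts.append(t)
    return conflicts

a = {0: (0, 0, 350), 60: (8, 0, 350), 120: (16, 0, 350)}
b = {0: (16, 0, 350), 60: (10, 0, 350), 120: (4, 0, 350)}
print(detect_conflicts(a, b))  # [60]: the closing aircraft breach 5 NM at t=60
```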

5
  • Thiago Mendonça Ferreira Ramos
  • Verifying the Computational Properties of a First-Order Functional Model.

  • Advisor : MAURICIO AYALA RINCON
  • COMMITTEE MEMBERS :
  • DOMINIQUE LARCHEY-WENDLING
  • LAURA TITOLO
  • MAURICIO AYALA RINCON
  • NATARAJAN SHANKAR
  • VANDER RAMOS ALVES
  • Data: Jun 15, 2023


  • Show Abstract
  • This work describes the mechanization of the computational properties of a functional-language model that has been applied to reasoning about the automation of program termination. The formalization was developed using the higher-order proof assistant Prototype Verification System (PVS). The language model, called PVS0, was designed to mimic the first-order fragment of PVS functional specifications. Two different computational models are considered: the first specifies functional programs through a unique function (single-function PVS0 model, or SF-PVS0), and the second allows the simultaneous specification of multiple functions (multiple-function PVS0 model, or MF-PVS0). The operational semantics of recursion in single-function PVS0 specifications supports recursion over the whole program only.

    In contrast, in multiple-function PVS0 programs, calls are allowed to all functions specified in the program. This work aims to mathematically certify the robustness of the PVS0 models as universal computational models. To do so, crucial properties and theorems were formalized, including Turing Completeness, the undecidability of the Halting Problem, the Recursion Theorem, Rice's Theorem, and the Fixed Point Theorem. Furthermore, the work discusses advances in the undecidability of the Word Problem and the Post Correspondence Problem.

    The undecidability of the Halting Problem was formalized considering properties of the semantic evaluation of PVS0 programs that were applied in verifying the termination of PVS specifications. The equivalence between the predicative and functional evaluation operators was vital to this aim. Furthermore, the compositionality of multiple-function PVS0 programs, straightforwardly enabled by the possibility of calling different functions, eases the formalization of properties such as Turing Completeness. Therefore, enriching the model was an important design decision to simplify the mechanization of this property and of the theorems mentioned above. The diagonalization argument at the core of the Halting Problem result is sketched below.
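    A Python rendering of the classic diagonalization argument, merely to illustrate what the thesis formalizes for PVS0 inside PVS: if a total decider `halts(program, input)` existed, the program below would contradict it on itself.

```python
# If `halts` were a correct, total halting decider, `d(d)` would halt exactly
# when `halts(d, d)` says it does not: a contradiction either way.
def paradox(halts):
    def d(p):
        if halts(p, p):       # alleged decider claims p halts on input p...
            while True:       # ...so d loops forever, contradicting the claim
                pass
        return "halted"       # otherwise d halts, again contradicting the claim
    return d(d)

# With a (necessarily wrong) stub decider the sketch simply halts:
print(paradox(lambda program, data: False))   # prints "halted"
```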

6
  • Aurélio Ribeiro Costa
  • Adaptive Model for Community Detection in Dynamic Social Networks

  • Advisor : CELIA GHEDINI RALHA
  • COMMITTEE MEMBERS :
  • CELIA GHEDINI RALHA
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • RICARDO PEZZUOL JACOBI
  • DANIEL RATTON FIGUEIREDO
  • FRANCISCO APARECIDO RODRIGUES
  • Data: Jun 16, 2023


  • Show Abstract
  • A vital problem tackled in network analysis is community structure identification. However, current uses of network analysis techniques concentrate on analyzing static community structures, leaving a gap by not considering dynamic aspects. Some solutions for the community detection problem adapted to the dynamicity of networks present limitations in the resulting performance, and others do not fit such contexts. This situation is aggravated by the demand to analyze constantly growing social networks. This research aims to fill this gap by focusing on topology changes along a time frame and applying deep reinforcement learning as an alternative solution to the problem of community detection in dynamic social networks. We propose an adaptive model that maximizes the local modularity density of a community structure. Our model comprises an actor-critic reinforcement-learning-based architecture with a graph neural network to cope with the changing aspects of large social networks. Experiments conducted with the proposed architecture on synthetic and real-world dynamic social network datasets show accuracy comparable to state-of-the-art solutions. Although the results indicate that the architecture copes well with dynamic real-world social networks, further investigation is necessary to improve its computational performance. A small illustration of the objective side of the problem follows below.
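    A hedged illustration of the objective side of the problem, using standard modularity as a stand-in for the local modularity density the thesis maximizes: score a candidate community structure with networkx and re-evaluate after the graph changes, which is the adaptation step the thesis performs with reinforcement learning instead.

```python
# Score community structures on a changing graph with networkx.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G = nx.karate_club_graph()                 # stand-in for a social-network snapshot
communities = list(greedy_modularity_communities(G))
print("communities:", [sorted(c) for c in communities])
print("modularity:", round(modularity(G, communities), 3))

# A dynamic snapshot: add an edge and re-evaluate the structure.
G.add_edge(0, 33)
print("after change:",
      round(modularity(G, list(greedy_modularity_communities(G))), 3))
```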

7
  • Leonardo Henrique Moreira
  • Recovery Strategies for Multi-Agent Planning in Dynamic Environments

  • Advisor : CELIA GHEDINI RALHA
  • COMMITTEE MEMBERS :
  • ANTONÍN KOMENDA
  • CELIA GHEDINI RALHA
  • EDISON PIGNATON DE FREITAS
  • GENAINA NUNES RODRIGUES
  • LI WEIGANG
  • Data: Jun 30, 2023


  • Show Abstract
  • This thesis explores Multi-Agent Planning (MAP) and its application in dynamic environments. MAP combines artificial intelligence planning with multi-agent systems to coordinate intelligent agents in achieving individual or group goals. Planning in dynamic environments introduces challenges in coordination and execution due to non-deterministic outcomes. Plan recovery strategies, like replanning and repairing, aim to handle failures and restore desired conditions. A comprehensive literature review highlighted key contributors and institutions in MAP research, offering insights into concepts, techniques, and open challenges. However, the combination of different recovery strategies in MAP models is a research challenge not yet addressed in the literature. In this thesis, we address this challenge by proposing an evaluation method for recovery strategies in dynamic environments that combines replanning and repairing. This approach considers planning complexity, coordination allied to execution issues, and agents attempting local repairs before seeking other agents' assistance, as sketched below. The main objective and results aim to contribute to the MAP field by evaluating the combination of replanning and repairing in planning solution models for dynamic environments.
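    A schematic sketch of such a recovery policy (names and interfaces are assumptions for illustration, not the thesis API): on failure, an agent first attempts a local repair, then asks peers for assistance, and only as a last resort triggers full replanning.

```python
# Prefer cheap local repairs over peer-assisted repair, and both over replanning.
def recover(agent, plan, failure, peers, planner):
    """Return a recovered plan for the failed action."""
    repaired = agent.try_local_repair(plan, failure)     # assumed interface
    if repaired is not None:
        return repaired
    for peer in peers:
        repaired = peer.try_assist_repair(plan, failure)  # assumed interface
        if repaired is not None:
            return repaired
    # Last resort: replan from the current state toward the original goals.
    return planner.replan(agent.current_state(), plan.goals)
```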

8
  • Leia Sousa de Sousa
  • Metropolitan Optical Networks: Architectures and Traffic Engineering
  • Advisor : ANDRE COSTA DRUMMOND
  • COMMITTEE MEMBERS :
  • ANDRE CASTELO BRANCO SOARES
  • ANDRE COSTA DRUMMOND
  • EDUARDO ADILIO PELINSON ALCHIERI
  • GUSTAVO BITTENCOURT FIGUEIREDO
  • JACIR LUIZ BORDIM
  • Data: Aug 24, 2023


  • Show Abstract
  • Metropolitan Optical Networks (MONs) are high-speed communication networks that interconnect different locations in a metropolitan area. Different types of applications are offered to customers through MONs, from cloud computing applications, increasingly closer to the end user, to the recent Internet of Things services. These applications are driving increasing demands from enterprise and private customers for scalable, flexible, transparent, terabit-speed, and personalized bandwidth services.
    MONs use fiber optic technology to transmit data at high speeds from any point of their infrastructure. Unlike core optical networks, MONs have a wide variety of service granularities, with heterogeneous architectures and traffic profiles and an unbalanced distribution of traffic flows along their nodes. Because of this, MONs must be handled differently.
    For network providers, it is of great importance to determine the existing regions, such as residential and business areas, so that the behavior of local traffic can be analyzed and interventions in the infrastructure can be proposed at the critical points of the network. Currently, MONs are undergoing major transformations that include the adoption of a variety of transmission rates, subdivision into several hierarchical levels, and the assignment of new roles to the various nodes. This work presents a survey of the newly proposed MON architectures, both single- and multi-layered.
    In addition, traffic engineering solutions for MONs based on Elastic Optical Networks (EONs), called MEONs, are discussed, analyzing area-aware solutions that result in a lower Bandwidth Blocking Rate (BBR) on the networks. In addition to the general BBR metric, this work considers BBR by area and BBR by cluster, metrics not yet identified in the current scientific literature. The proposed solutions achieve twice the improvement of other area-aware solutions in the literature in terms of bandwidth blocking. An illustrative BBR computation follows below.
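    An illustrative computation of the BBR metric and its per-area variant (the request format is an assumption for the sketch, not the thesis simulator):

```python
# Bandwidth Blocking Rate: blocked bandwidth over total demanded bandwidth.
def bbr(requests):
    """requests: list of (area, demanded_gbps, blocked: bool)."""
    demanded = sum(d for _, d, _ in requests)
    blocked = sum(d for _, d, b in requests if b)
    return blocked / demanded if demanded else 0.0

def bbr_by_area(requests):
    """The per-area variant this work reports alongside the general metric."""
    areas = {a for a, _, _ in requests}
    return {a: bbr([r for r in requests if r[0] == a]) for a in areas}

reqs = [("residential", 10, False), ("residential", 40, True),
        ("business", 100, False), ("business", 25, True)]
print(round(bbr(reqs), 3))   # 0.371 (65 of 175 Gb/s blocked)
print(bbr_by_area(reqs))
```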
9
  • Lucas dos Santos Althoff
  • Impacts of Alignment Edits on the User Experience of 360-degree Videos.

     

  • Advisor : MYLENE CHRISTINE QUEIROZ DE FARIAS
  • COMMITTEE MEMBERS :
  • MYLENE CHRISTINE QUEIROZ DE FARIAS
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • CELIA GHEDINI RALHA
  • RUDINEI GOULARTE
  • DEBORA CHRISTINA MUCHALUAT SAADE
  • Data: Dec 20, 2023


  • Show Abstract
  • When watching 360-degree videos, the user has greater interactive control over the content; thus, conventional visual quality is not enough to describe the stimulus, which extends to the broader concept of quality of experience (QoE). This dissertation examines quality factors of today's most popular immersive media format, 360-degree videos, with particular emphasis on the subjective assessment of QoE. Subjective experiments provide the data to build adequate solutions and are essential to developing and improving multimedia systems and applications. Visual quality assessment supports researchers in establishing baselines for coding and streaming applications. However, the reliability of quality measurements in subjective experiments can vary depending on the type of media. The optimization of QoE faces two major roadblocks: inaccurate viewport prediction and viewers missing the plot of a story. Alignment edits have emerged as a promising mechanism to avoid both issues at once. These "re-targeting edits" act on the content in real time, aligning the user's viewport with a region of interest in the video content. In this dissertation, we investigate the effects of alignment edits on user QoE by conducting two subjective experiments, in which we introduce gradual alignment edits in the videos, inspired by a VR gaming technique. The results confirmed that the proposed gradual alignment achieves a level of comfort and presence similar to that of instant edits. Moreover, all alignment edits tested reduced head speed after the edit, confirming the usefulness of these edits for streaming video on demand. Furthermore, we observed that the proposed gradual edits can achieve a head-speed reduction 8% greater than that of instant alignment techniques. A schematic sketch of a gradual alignment follows below.
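    A schematic sketch of the gradual alignment idea (an illustration of the concept, not the experiments' code): rotate the scene so the region of interest drifts into the viewport at a capped angular speed instead of snapping instantly. The per-frame speed cap is an assumption.

```python
# Yield per-frame scene-rotation offsets until the ROI is centered.
def gradual_alignment(yaw_user, yaw_roi, max_deg_per_frame=2.0):
    """Gradually rotate toward the ROI yaw, never exceeding the speed cap."""
    error = (yaw_roi - yaw_user + 180.0) % 360.0 - 180.0   # shortest signed angle
    offset = 0.0
    while abs(error - offset) > 1e-6:
        step = max(-max_deg_per_frame, min(max_deg_per_frame, error - offset))
        offset += step
        yield offset

# At 2 degrees/frame, a 90-degree misalignment resolves in 45 frames.
print(len(list(gradual_alignment(0.0, 90.0))))  # 45
```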

2022
Dissertations
1
  • MARCOS PAULO CAYRES ROSA
  • Dynamic Difficulty Adjustment Based on Player Performance and Profile in Platform Games

  • Advisor : RICARDO PEZZUOL JACOBI
  • COMMITTEE MEMBERS :
  • RICARDO PEZZUOL JACOBI
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • TIAGO BARROS PONTES E SILVA
  • ESTEBAN WALTER GONZALEZ CLUA
  • Data: Jul 29, 2022


  • Show Abstract
  • The Dynamic Difficulty Adjustment (DDA) of games can play an important role in increasing player engagement and fun. Gameplay difficulty can be adapted according to the player's performance, their affective state, or a hybrid model that combines both approaches. In addition, game settings or components can be adapted, using pre-established metrics or machine learning to analyze what is to be adapted. This work investigates the different mechanisms of a DDA system for a platform game to adequately adapt its difficulty level and keep the player in a state of flow. This work contributes the definition of a method that estimates the game's difficulty based on specific characteristics of components common to the platform genre. Metrics for measuring the flow state and the player profile are also reviewed, and rules for creating levels when testing DDA models are proposed. The proposed adjustment varies the size of the platforms and the height of the jump, comparing different approaches from the game systems and verifying the efficiency of each one with respect to monitoring, data analysis, and control of component adaptation. An open-source platform game was adapted to support the DDA algorithms and to run tests with sample groups, in which participants answered questionnaires and had their data collected for research purposes. The results indicated that the difficulty of platform games can be estimated from the components of the levels, including a correlation between difficulty and player performance data. In addition, player profiles were predicted from raw game session data and used with machine learning methods to define the difficulty progression. Finally, the DDA models were able to adjust the game difficulty to the players, decreasing the dispersion of the performance data and keeping the player in a state of flow, especially when using feedforward neural networks to predict the experienced difficulty and the player's profile. A minimal rule-based adjustment sketch follows below.
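    A minimal sketch of a performance-based DDA rule (illustrative only; the work itself also predicts the player profile with neural networks): nudge the two level parameters mentioned in the abstract to keep the observed success rate near a flow target. The target and step values are assumptions.

```python
# Widen platforms / raise jumps when the player struggles, and vice versa.
def adjust_difficulty(platform_width, jump_height, success_rate,
                      target=0.7, step=0.05):
    """Return updated level parameters steering success_rate toward the target."""
    if success_rate < target:            # too hard: ease the level
        platform_width *= 1.0 + step
        jump_height *= 1.0 + step
    elif success_rate > target + 0.1:    # too easy: tighten the level
        platform_width *= 1.0 - step
        jump_height *= 1.0 - step
    return platform_width, jump_height

print(adjust_difficulty(3.0, 1.2, success_rate=0.5))  # (3.15, 1.26)
```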

2
  • OSMAR LUIZ FERREIRA DE CARVALHO
  • Deep learning and remote sensing: pushing the frontiers in image segmentation

  • Advisor : DIBIO LEANDRO BORGES
  • COMMITTEE MEMBERS :
  • CELIA GHEDINI RALHA
  • DIBIO LEANDRO BORGES
  • ERALDO APARECIDO TRONDOLI MATRICARDI
  • YOSIO EDEMIR SHIMABUKURO
  • Data: Sep 23, 2022


  • Show Abstract
  • Image segmentation aims to simplify the understanding of digital images. Deep-learning-based methods using convolutional neural networks were game-changing, allowing the exploration of different tasks (e.g., semantic, instance, and panoptic segmentation). Semantic segmentation assigns a class to every pixel in an image; instance segmentation classifies objects at a pixel level with a unique identifier for each target; and panoptic segmentation combines instance-level predictions with the different backgrounds. Remote sensing data largely benefits from those methods, being very suitable for developing new DL algorithms and creating solutions using top-view images. However, some peculiarities prevent remote sensing with orbital and aerial imagery from growing at the pace of traditional ground-level images (e.g., camera photos): (1) the images are extensive; (2) they present different characteristics (e.g., number of channels and image format); (3) there is a high number of pre-processing and post-processing steps (e.g., extracting patches and classifying large scenes); and (4) most open software for labeling and deep learning applications is not friendly to remote sensing for the aforementioned reasons. This dissertation aims to advance all three main categories of image segmentation. Within the instance segmentation domain, we proposed three experiments. First, we enhanced the box-based instance segmentation approach for classifying large scenes, allowing practical pipelines to be implemented. Second, we created a bounding-box-free method that reaches instance segmentation results by using semantic segmentation models in a scenario with sparse objects. Third, we improved the previous method for crowded scenes and developed the first study considering semi-supervised learning using remote sensing and GIS data. Next, in the panoptic segmentation domain, we presented the first remote sensing panoptic segmentation dataset, containing fourteen classes, and provided software and a methodology for converting GIS data into the panoptic segmentation format. Since our first study considered RGB images, we extended this approach to multispectral data. Finally, we leveraged the box-free method initially designed for instance segmentation for the panoptic segmentation task. This dissertation analyzed various segmentation methods and types of images, and the developed solutions enable the exploration of new tasks (such as panoptic segmentation), the simplification of labeling data (using the proposed semi-supervised learning procedure), and a simplified way to obtain instance and panoptic predictions using simple semantic segmentation models. A sketch of the box-free idea follows below.
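    A hedged sketch of the box-free idea for sparse objects (not the dissertation's exact method): derive instance labels from a semantic segmentation mask by labeling its connected components.

```python
# Turn a binary semantic mask into per-pixel instance ids via connected components.
import numpy as np
from scipy import ndimage

semantic_mask = np.array([[0, 1, 1, 0, 0],
                          [0, 1, 0, 0, 1],
                          [0, 0, 0, 1, 1]])   # 1 = target class, 0 = background

instances, n = ndimage.label(semantic_mask)   # each connected blob gets a unique id
print(n)          # 2 instances found
print(instances)  # per-pixel instance ids (0 = background)
```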

3
  • JOAO PAULO COSTA DE ARAUJO
  • Enhancing Property Specification of Cyber-Physical Systems using Negative Selection

  • Advisor : GENAINA NUNES RODRIGUES
  • COMMITTEE MEMBERS :
  • GENAINA NUNES RODRIGUES
  • LUIS PAULO FAINA GARCIA
  • VANDER RAMOS ALVES
  • LARS GRUNSKE
  • Data: Nov 14, 2022


  • Show Abstract
  • Cyber-physical systems (CPSs) are a definite reality in our day-to-day lives, especially in recent years. Nevertheless, the complexity inherent in these domains raises challenges: unforeseen events, difficulties in depicting either the cyber or the physical processes, and incomplete knowledge of the environmental contexts, for example, might make the CPS unreliable at runtime, which could have disastrous effects. Seeking inspiration in processes from other fields is a very common activity in Computer Science. Nature, especially biology, has long served as a fruitful source of methodologies, such as artificial intelligence approaches. The Negative Selection Algorithm (NSA), for example, is an immune-inspired technique with multiple successful applications in the field of CPSs, primarily in fault diagnostics for the identification of anomalous behavior. The algorithm's explainability may bring expressive benefits for the design and verification of CPSs by helping understand property violation patterns, and thus enhance the system specification. In this work, we propose a methodology that aims at increasing the reliability of CPSs. This is achieved by a systematic diagnosis of system property violations based on data generated by a prototype, performed in the early stages of development. The NSA serves as an analytical redundancy method to isolate and identify the cause of property violations in the system. We believe that, by reasoning about why property violations happen, the system specification and the properties themselves may be refined, fault-tolerant mechanisms may be added, and thus safer and better applications might be written. A minimal NSA sketch follows below.
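    A minimal Negative Selection sketch (illustrative of the general algorithm, not the proposed methodology): detectors are random points kept only if they do not match any "self" sample; a new sample matching some detector is flagged as anomalous. The radius and detector count are assumptions.

```python
# Censoring-based detector generation and anomaly matching in 2D.
import numpy as np

def train_detectors(self_samples, n_detectors=200, radius=0.15, seed=0):
    """Keep random candidate detectors that do not cover self (normal) behavior."""
    rng = np.random.default_rng(seed)
    detectors = []
    while len(detectors) < n_detectors:
        d = rng.random(self_samples.shape[1])
        if np.linalg.norm(self_samples - d, axis=1).min() > radius:
            detectors.append(d)
    return np.array(detectors)

def is_anomalous(x, detectors, radius=0.15):
    """A sample is anomalous if any detector lies within the matching radius."""
    return bool(np.linalg.norm(detectors - x, axis=1).min() <= radius)

normal = np.random.default_rng(1).random((100, 2)) * 0.4   # self region
detectors = train_detectors(normal)
print(is_anomalous(np.array([0.2, 0.2]), detectors))  # expected False: inside self
print(is_anomalous(np.array([0.9, 0.9]), detectors))  # expected True: novel region
```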

4
  • DIEGO SANTOS DA SILVA
  • SCAN-NF: a Machine Learning System for Classifying Invoice Product Transactions through Short-Text Processing.

  • Advisor : LI WEIGANG
  • COMMITTEE MEMBERS :
  • LI WEIGANG
  • GERALDO PEREIRA ROCHA FILHO
  • THIAGO DE PAULO FALEIROS
  • ANNE MAGALY DE PAULA CANUTO
  • Data: Dec 8, 2022


  • Show Abstract
  • An electronic invoice (e-invoice) is a document that records transactions of goods and services electronically, both in storage and in exchanges. E-invoicing is an emerging practice and presents a valuable source of information for many areas. Processing these invoices is often a challenging task: the reported information is often incomplete or contains mistakes. Before any meaningful processing of these invoices, it is necessary to identify the product represented in each document. The available literature indicates that specialized architectures are necessary to deal with this type of information. This research frames invoice processing as a short-text processing problem, aiming to correctly identify the product of each transaction. This work provides both a contextual framework for invoice processing and the architecture of a system to aid tax auditors. A case study using real-world invoice data is presented. We compare traditional term frequency models to sentence classification models based on convolutional neural networks. Experiments suggest that, even though invoice text descriptions are brief and contain many mistakes and typos, simple term frequency models can achieve strong baseline results on product code assignment. A minimal term-frequency baseline sketch follows below.
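    A hedged sketch of the term-frequency baseline, on invented toy data: TF-IDF features over short, typo-prone product descriptions feeding a linear classifier. Character n-grams are one common choice for noisy short text; the descriptions and labels below are assumptions.

```python
# TF-IDF over character n-grams + logistic regression for short invoice text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = ["CANETA ESFEROG AZUL 1.0", "PAPEL A4 75G 500FLS",
                "CANETA GEL PRETA", "PAPEL A4 RECICLADO"]
product_codes = ["pen", "paper", "pen", "paper"]   # stand-ins for real product codes

model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                      LogisticRegression(max_iter=1000))
model.fit(descriptions, product_codes)
print(model.predict(["CANETA AZUL"]))   # expected: ['pen']
```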

Thesis
1
  • Aloisio Dourado Neto
  • Towards Complete 3D Indoor Scene Understanding from a Single Point-of-View

  • Advisor : TEOFILO EMIDIO DE CAMPOS
  • COMMITTEE MEMBERS :
  • TEOFILO EMIDIO DE CAMPOS
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • VINICIUS RUELA PEREIRA BORGES
  • ANDERSON DE REZENDE ROCHA
  • GABRIELA OTILIA CSURKA KHEDARI
  • Data: Oct 11, 2022


  • Show Abstract
  • While reasoning about scenes in 3D is a natural task for humans, it remains a challenging problem in Computer Vision, despite the great advances we have seen in the last few decades. Automatic understanding of the complete 3D geometry of an indoor scene and of the semantics of each occupied 3D voxel has many applications, such as robotics, surveillance, assistive computing, augmented reality, and immersive spatial audio reproduction. With this research project, we intend to contribute to enhancing the current computational results on scene understanding, in both accuracy and coverage. We focus on the task of Semantic Scene Completion (SSC), one of the most complete tasks related to scene understanding, as it aims to infer the complete 3D geometry and the semantic label of each voxel in a scene, including occluded regions. In this thesis, we formulate and assess a series of hypotheses to improve current methods both in quality and in scene coverage. Before getting into the problem of 3D SSC, we explored Domain Adaptation methods to address problems related to the scarcity of labeled training data in 2D image segmentation tasks, to later apply them to 3D. In the 3D SSC domain, we introduced and evaluated a completely new way to explore the RGB information provided in the RGB-D input and complement the depth information. We showed that this leads to an enhancement in the segmentation of hard-to-detect objects in the scene. We further advanced the use of RGB data by using semantic priors from the 2D image as semantic guidance for the 3D segmentation and completion in a multi-modal, data-augmented 3D FCN. We complete the contributions related to quality improvement by combining a Domain Adaptation technique assessed in the earlier stages of the research with our multi-modal network, with impressive results. Regarding scene coverage, which today is restricted to the limited field of view of regular RGB-D sensors like the Microsoft Kinect, we complete our contributions with a new approach to extend the current methods to 360 degrees using panoramic RGB images and corresponding depth maps from 360-degree sensors or stereo 3D 360-degree cameras. One geometric building block of such pipelines is sketched below.
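    A hedged sketch of one standard building block behind SSC pipelines (not the thesis code): unproject an RGB-D depth map into 3D camera-frame points with pinhole intrinsics, the geometry that voxel-based scene completion consumes. The intrinsics below are common Kinect-like values, assumed for illustration.

```python
# Inverse pinhole projection: depth map -> 3D point cloud.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """depth: (H, W) array in meters -> (H*W, 3) camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((480, 640), 2.0)   # synthetic flat wall 2 m away
pts = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(pts.shape, pts[0])           # (307200, 3), top-left pixel unprojected
```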

2
  • André Lauar Sampaio Meirelles
  • Effective and Efficient Active Learning for Analysis of Pathology Images Using Deep Learning

  • Advisor : GEORGE LUIZ MEDEIROS TEODORO
  • COMMITTEE MEMBERS :
  • ADRIANO ALONSO VELOSO
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • CELIA GHEDINI RALHA
  • GEORGE LUIZ MEDEIROS TEODORO
  • RENATO ANTÔNIO CELSO FERREIRA
  • Data: Oct 14, 2022


  • Show Abstract
  • Deep learning methods have demonstrated remarkable performance in pathology image segmentation and classification tasks. However, these models require a large amount of annotated training data. Training data generation is a labor-intensive process in digital pathology, often requiring a substantial time commitment from expert pathologists. Active learning (AL) offers an iterative approach to generating the training data needed by deep learning models, reducing the cost of manual data annotation. In this work, a new AL acquisition method, named Diversity-Aware Data Acquisition (DADA), is proposed and evaluated regarding its effectiveness in patch-based detection and classification of tissue image regions. The proposed method uses a clustering logic that takes into account image features, extracted from the deep learning model being trained, and model prediction uncertainty to select meaningful training samples (image patches). Besides reducing training set sizes, annotation costs are also diminished through computation time gains from a CNN simplification solution also developed in this work, the Network Auto-Reduction (NAR). With NAR, both uncertainty calculation costs and model training times are strongly reduced. Additionally, to make these solutions viable in practice, a Web-based graphical interface was adapted to be used with DADA. The DADA/NAR solutions were experimentally evaluated with a collection of cancer tissue images and are able to: (i) select image patches that accelerate the training process by reducing the number of patches required to attain a given Area Under the Curve (AUC) value; (ii) dramatically reduce, through a subpooling approach, the iteration times needed to select a new annotation set; and (iii) bring execution times down even further when DADA and NAR are combined, reaching practical levels while keeping the predictive capacity of the models. The generalization of both DADA and NAR to other contexts and applications is expected future work, including application in areas such as remote sensing and image segmentation problems. A minimal acquisition-step sketch follows below.
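    A hedged sketch of a diversity-aware acquisition step in the spirit of DADA (not the dissertation's implementation): cluster patch embeddings, then pick the most uncertain patch from each cluster for annotation, so the selected set is both informative and diverse.

```python
# Uncertainty + diversity acquisition: entropy per patch, one pick per cluster.
import numpy as np
from sklearn.cluster import KMeans

def acquire(features, probs, n_annotate):
    """features: (N, D) patch embeddings; probs: (N, C) model predictions.
    Returns the indices of n_annotate patches to send for annotation."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # prediction uncertainty
    clusters = KMeans(n_clusters=n_annotate, n_init=10).fit_predict(features)
    picks = []
    for c in range(n_annotate):
        members = np.flatnonzero(clusters == c)
        picks.append(int(members[entropy[members].argmax()]))  # most uncertain per cluster
    return picks

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 32))              # stand-in patch embeddings
probs = rng.dirichlet(np.ones(2), size=500)     # stand-in 2-class predictions
print(acquire(feats, probs, n_annotate=5))      # 5 diverse, uncertain patches
```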

3
  • Elton Sarmanho Siqueira
  • An automated approach to estimate player experience in game events from psychophysiological data

  • Advisor : RICARDO PEZZUOL JACOBI
  • COMMITTEE MEMBERS :
  • BRUNO LUIGGI MACCHIAVELLO ESPINOZA
  • DANIELA GORSKI TREVIZAN
  • ESTEBAN WALTER GONZALEZ CLUA
  • RICARDO PEZZUOL JACOBI
  • TIAGO BARROS PONTES E SILVA
  • Data: Nov 14, 2022


  • Show Abstract
  • Electronic games have emerged as one of the biggest forms of entertainment in the world. As a consequence, a better understanding of the player experience is necessary. Thus, studies on this topic have grown exponentially and concentrate on developing approaches to improve many aspects of games, such as game design, which directly impacts the player's interaction with the game.
    There are several approaches to evaluating the player's experience. Among them, we have traditional evaluation approaches that make use of self-reports, direct observation, questionnaires, and video recording, as well as more sophisticated approaches, such as psychophysiological methods, which make use of biosensors and image processing. This last evaluation approach has been gathering strength in the academic field, since several lines of research emerge from it. Currently, results obtained from psychophysiological approaches have contributed to a better understanding of the player's experience. Generally, traditional evaluation approaches use small numbers of participants. Because of that, some studies show that it is not always possible to capture the true experience of the player during a game session, especially after long game-play sessions. In this case, evaluations using psychophysiological methods can reduce some problems of the traditional approaches. Thus, the present study proposes a process that evaluates the player's experience by adopting a prediction model developed with a neural network, which makes use of an affective dataset derived from the psychophysiological data (EDA, BVP, and facial expressions) of the participants of the experiment. In addition, this study compares the proposed model with the participant's self-report model, in order to verify whether the results of those models agree and whether the developed model enables a new way of evaluating the player's interaction with the game.
    In summary, this research performs a player experience evaluation using psychophysiological data as a different and robust approach. The author proposes some recommendations for employing this approach alongside traditional approaches, creating a hybrid method. However, there are some limitations to the use of psychophysiological evaluation in terms of its financial costs, processing, and execution. Finally, the methodology used in this work, together with its results, presents a new contribution to research related to the process of evaluating the player's experience. An illustrative sketch of such a prediction model follows below.
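    An illustrative sketch (synthetic data; the feature names are assumptions) of the general shape of such a model: a small neural network mapping psychophysiological features to an experience label.

```python
# A tiny MLP over stand-in psychophysiological features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))   # assumed columns: EDA level, BVP amplitude, smile score
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # synthetic "engaged vs. not" label

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=0))
model.fit(X[:250], y[:250])
print("held-out accuracy:", model.score(X[250:], y[250:]))
```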
