|
Dissertations |
|
1
|
-
PEDRO CARVALHO BROM
-
Relation between Variance and Range of Financial Returns
-
Advisor: RAUL YUKIHIRO MATSUSHITA
-
COMMITTEE MEMBERS:
-
RAUL YUKIHIRO MATSUSHITA
-
ALAN RICARDO DA SILVA
-
ROBERTO VILA GABRIEL
-
REGINA CÉLIA BUENO DA FONSECA
-
Date: 31-Jan-2023
-
-
Abstract
-
This work, organized as a collection of three articles, proposes a solution to the truncation problem, reconciling past-bounded information and future-unbounded events. We show that this is possible by applying a power law relating the length of the truncation (ℓ) and the standard deviation of the data (σ), given by ℓ = ζσ^β, where ζ and β are positive coefficients. This approach is applicable to a wide class of symmetric distributions, including truncated Lévy flights, as it does not require the exact form of the probability distribution function. In addition, distributional moments may vary over time. In particular, we applied the proposed methodology to intraday financial returns of exchange rates for different currencies, totaling more than 32 million observations. In this case, we propose a non-Gaussian standardization of the form z = r/σ^β, where r is a financial return (typically subject to volatility clusters) and z is the standardized return without volatility clusters.
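As a minimal sketch of the standardization described in the abstract, the snippet below applies z = r/σ^β to synthetic returns. The β value, the window length, and the rolling-window volatility estimator are illustrative assumptions, not the thesis's estimated quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic returns with a volatility cluster: two regimes of different sigma.
r = np.concatenate([rng.normal(0, 0.5, 500), rng.normal(0, 2.0, 500)])

beta = 0.8  # hypothetical exponent; the thesis estimates it from data

# Local volatility via a rolling window (a simplified stand-in for the
# thesis's estimator of sigma).
window = 50
sigma = np.array([r[max(0, i - window):i + 1].std(ddof=1) if i > 0 else abs(r[0])
                  for i in range(len(r))])
sigma = np.maximum(sigma, 1e-8)  # guard against zero volatility

# Non-Gaussian standardization from the abstract: z = r / sigma**beta.
z = r / sigma**beta
```

With β = 1 this reduces to the usual volatility standardization; the point of the power law is that β need not equal 1.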
|
|
2
|
-
Rodrigo Marques dos Santos
-
A Bayesian method for checking the fit of the three parameter logistic model in item response theory
-
Advisor: ANTONIO EDUARDO GOMES
-
COMMITTEE MEMBERS:
-
ANTONIO EDUARDO GOMES
-
ANDRE LUIZ FERNANDES CANCADO
-
RAUL YUKIHIRO MATSUSHITA
-
DALTON FRANCISCO DE ANDRADE
-
Date: 27-Feb-2023
-
-
Abstract
-
Item Response Theory has been increasingly used in studies that aim to estimate latent traits and, among the existing models, the logistic ones are the most used. However, more and more studies show that the assumption that Item Characteristic Curves (ICCs) follow the logistic form is not always valid, making it increasingly important to check this assumption. Therefore, estimating the ICC in alternative, nonparametric ways can be a powerful tool for comparison with the ICC generated by the logistic model, allowing inference about the validity of this assumption. This study proposes a nonparametric test that uses Bayesian inference, more specifically the Posterior Predictive Model Checking (PPMC) method, to test this hypothesis. To compare with the ICC calculated by the logistic model, isotonic and Nadaraya-Watson regressions were used to create 6 test statistics. Two analyses were carried out, one using a simulated data set and the other applying the test to real data from a SARESP application. The simulation results were satisfactory: the test indicated significant differences in very few items that actually followed the 3-parameter logistic model, and it recognized well those items that had a non-monotonic ICC. Despite this, the test recognized only one of the items that were mixtures of distributions. For the real data, the isotonic regression estimators indicated different values than those from the Nadaraya-Watson regression for most items.
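To illustrate the nonparametric side of the comparison, the sketch below computes a Nadaraya-Watson estimate of an ICC from simulated responses. The data-generating curve, bandwidth, and grid are illustrative assumptions, not the thesis's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated abilities and correct/incorrect responses for one item, drawn
# from a logistic-style curve (a stand-in for the 3PL setup in the thesis).
theta = rng.normal(0, 1, 2000)
p_true = 1 / (1 + np.exp(-1.5 * (theta - 0.2)))
y = rng.binomial(1, p_true)

def nadaraya_watson(x0, x, y, h=0.3):
    """Kernel-weighted average of responses: a nonparametric ICC estimate."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)  # Gaussian kernel
    return (w * y).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(-2, 2, 9)
icc_hat = nadaraya_watson(grid, theta, y)
```

Comparing `icc_hat` against the parametric ICC on the same grid is the kind of discrepancy that the test statistics in the thesis quantify.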
|
|
3
|
-
Arthur Canotilho Machado
-
Approximate Bayesian Computation via factorisation of the posterior distribution
-
Advisor: GUILHERME SOUZA RODRIGUES
-
COMMITTEE MEMBERS:
-
GUILHERME SOUZA RODRIGUES
-
RAUL YUKIHIRO MATSUSHITA
-
THAIS CARVALHO VALADARES RODRIGUES
-
KELLY CRISTINA MOTA GONÇALVES
-
Date: 01-Mar-2023
-
-
Abstract
-
It is common in modern Bayesian inference problems to come across complex and/or high-dimensional models, such as those that arise in the field of population genetics (Beaumont, Zhang, & Balding, 2002), where the likelihood function and marginal distributions are difficult or even intractable to compute, leading to problems in obtaining the posterior distribution. There are several methods for approximating the posterior distribution in these cases, including the Approximate Gibbs Sampler proposed by Rodrigues, Nott, and Sisson (2019), which allows the generation of samples from an approximate posterior distribution using principles of Approximate Bayesian Computation (ABC) and Gibbs sampling. Santos (2021) proposed an improvement to the technique by first decorrelating the parameters of interest and using quantile regression models via neural networks in the process of approximating the complete conditional distributions. In this work, we suggest replacing the Approximate Gibbs Sampler with an algorithm that approximates the terms of a convenient factorisation of the posterior distribution. We present a review of the theory and practical applications comparing the methods of Rodrigues, Nott, and Sisson (2019), of Santos (2021), and the one proposed in this work. Synthetic datasets were generated to compare the methods. The algorithm proposed in this work showed good performance compared to its peers.
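The ABC principle underlying all three methods can be shown with a basic rejection sampler: draw parameters from the prior, simulate data, and keep draws whose simulated summary is close to the observed one. The model, prior, summary statistic, and tolerance below are illustrative, not those of the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed data from an unknown mean (we pretend the likelihood is
# intractable and only forward simulation is available).
theta_true = 1.0
x_obs = rng.normal(theta_true, 1, 100)
s_obs = x_obs.mean()  # summary statistic

# ABC rejection: prior draws, simulated summaries, distance-based acceptance.
n_sims, eps = 20000, 0.05
theta = rng.normal(0, 2, n_sims)              # prior draws
s_sim = rng.normal(theta, 1 / np.sqrt(100))   # simulate the sample mean
accepted = theta[np.abs(s_sim - s_obs) < eps]
```

The accepted draws approximate the posterior; the methods compared in the thesis replace this brute-force step with more efficient conditional or factorised approximations.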
|
|
4
|
-
Ricardo Torres Bispo Reis
-
Quantile-based Recalibration of Artificial Neural Networks
-
Advisor: GUILHERME SOUZA RODRIGUES
-
COMMITTEE MEMBERS:
-
GUILHERME SOUZA RODRIGUES
-
JOSE AUGUSTO FIORUCCI
-
THAIS CARVALHO VALADARES RODRIGUES
-
RAFAEL IZBICKI
-
Date: 01-Mar-2023
-
-
Abstract
-
Artificial neural networks (ANNs) are powerful tools for prediction and data modeling. Although they are becoming ever more powerful, modern improvements have compromised their calibration in favor of enhanced prediction accuracy, making their true confidence harder to assess. To address this problem, we propose a new post-processing, quantile-based recalibration method for ANNs. To illustrate the method's mechanics we present two toy examples. In both, recalibration reduced the mean squared error relative to the original uncalibrated models and provided a better representation of the data-generating model. To further investigate the effects of the proposed recalibration procedure, we also present a simulation study comparing various parameter configurations; the recalibration successfully improved performance over the base models in all scenarios under consideration. Finally, we apply the proposed method to a problem of diamond price prediction, where it also improved overall model performance.
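A generic quantile-based recalibration (in the spirit of the abstract, not the thesis's exact procedure) maps each nominal quantile level through the empirical distribution of probability integral transform (PIT) values on a validation set. The Gaussian predictive model and its miscalibration below are illustrative assumptions:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(3)

# Suppose a model outputs Gaussian predictive distributions that are
# over-confident: the true noise sd is 2, but the model claims sd 1.
mu = rng.normal(0, 1, 5000)        # predictive means on a validation set
y = mu + rng.normal(0, 2, 5000)    # observed targets
claimed_sd = 1.0

def norm_cdf(z):
    return 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))

# PIT values under the (miscalibrated) predictive distribution.
pit = norm_cdf((y - mu) / claimed_sd)

# Recalibration: the adjusted level to query from the model is the empirical
# tau-quantile of the PITs, so recalibrated intervals have correct coverage.
def recalibrate(tau):
    return np.quantile(pit, tau)

# A nominal 90% central interval [0.05, 0.95] is widened after recalibration.
lo, hi = recalibrate(0.05), recalibrate(0.95)
```

Because the model is over-confident, `lo` falls well below 0.05 and `hi` well above 0.95, so the recalibrated predictive intervals are wider than the nominal ones.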
|
|
5
|
-
Lucas José Gonçalves Freitas
-
Text clustering applied to the treatment of unbalanced legal data
-
Advisor: THAIS CARVALHO VALADARES RODRIGUES
-
COMMITTEE MEMBERS:
-
THAIS CARVALHO VALADARES RODRIGUES
-
ANDRE LUIZ FERNANDES CANCADO
-
NÁDIA FELIX FELIPE DA SILVA
-
RAFAEL BASSI STERN
-
Date: 02-Mar-2023
-
-
Abstract
-
The Federal Supreme Court (STF), the highest instance of the Brazilian judicial system, produces, like courts of other instances, an immense amount of data organized in text form, through decisions, petitions, injunctions, appeals and other legal documents. Such documents are classified and grouped by public employees specialized in the cataloging of judicial processes, who in specific cases use technological support tools. Some processes in the STF, for example, are classified under one or more Sustainable Development Goals (SDGs) of the United Nations (UN) 2030 Agenda. As this is a repetitive task related to pattern recognition, it is possible to develop machine learning tools for this purpose. In this work, Natural Language Processing (NLP) models are proposed for clustering processes, in order to naturally augment the database for SDGs with few entries. Clustering, which is of enormous importance in its own right, is also able to gather unlabeled entries around cases already classified by court officials, allowing new labels to be allocated to similar cases. The results show that cluster-augmented sets can be used in supervised learning pipelines to aid in the classification of legal texts, especially in contexts with unbalanced data.
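The idea of gathering unlabeled documents around already-labeled ones can be sketched with bag-of-words vectors and cosine similarity. The snippets and SDG labels below are hypothetical toy data, and nearest-labeled-neighbour assignment is a minimal stand-in for the clustering models in the thesis:

```python
from collections import Counter
import math

# Toy corpus: labeled summaries per theme plus unlabeled ones (hypothetical
# snippets; the thesis works with STF documents and SDG labels).
labeled = {
    "clean water access and sanitation policy": "SDG6",
    "water resources and basic sanitation ruling": "SDG6",
    "gender equality in labor relations": "SDG5",
    "equal pay and gender discrimination case": "SDG5",
}
unlabeled = ["sanitation infrastructure and water supply dispute",
             "discrimination against women in employment"]

def vec(text):
    return Counter(text.split())  # bag-of-words term counts

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# Assign each unlabeled document the label of its most similar labeled one.
assigned = {}
for doc in unlabeled:
    best = max(labeled, key=lambda ref: cosine(vec(doc), vec(ref)))
    assigned[doc] = labeled[best]
```

Real pipelines would use TF-IDF or embedding representations and proper clustering, but the label-propagation principle is the same.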
|
|
6
|
-
Gustavo Martins Venancio Pires
-
A hybrid model for hierarchical time series with multiple seasonality
-
Advisor: JOSE AUGUSTO FIORUCCI
-
COMMITTEE MEMBERS:
-
DIEGO CARVALHO DO NASCIMENTO
-
EDUARDO YOSHIO NAKANO
-
JOSE AUGUSTO FIORUCCI
-
PAULO HENRIQUE FERREIRA DA SILVA
-
Date: 14-Mar-2023
-
-
Abstract
-
This Master’s thesis proposes a hybrid model capable of forecasting hierarchical time series with multiple seasonality. The hybrid methodology consists of using a machine learning model with input variables derived from statistical time series methodologies to generate coherent forecasts. The methodology was applied to the M5 Forecasting (2020) competition available through Kaggle, in which the objective was to predict as accurately as possible the daily sales of 3,409 products distributed across 5 levels of hierarchy over a 28-day horizon. In the dissertation, 5 different approaches were compared, and the Light Gradient Boosting Machine (LGBM) model containing a variable based on TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components) obtained an accuracy gain of 27% compared to the LGBM models without that variable. This model would have obtained 318th place in the competition, placing among the top 6% of competitors.
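The hybrid idea of feeding a statistical forecast into a learner can be sketched without external dependencies. Here a seasonal-naive value plays the role of the TBATS-based variable, and a plain least-squares fit stands in for LGBM; both substitutions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy daily series with weekly seasonality plus trend.
n = 28 * 10
t = np.arange(n)
y = 10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, n)

# Statistical feature in the spirit of the hybrid approach: the seasonal-naive
# value (same weekday, previous week) used as an input to the learner.
seasonal_naive = np.roll(y, 7)
X = np.column_stack([np.ones(n), t, seasonal_naive])[7:]
target = y[7:]

coef, *_ = np.linalg.lstsq(X, target, rcond=None)
pred = X @ coef
rmse_hybrid = np.sqrt(np.mean((pred - target) ** 2))
rmse_naive = np.sqrt(np.mean((seasonal_naive[7:] - target) ** 2))
```

Even this simple learner improves on the raw statistical feature, which is the effect the 27% accuracy gain in the thesis quantifies at a much larger scale.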
|
|
7
|
-
Roberto de Souza Marques Buffone
-
Analysis of traffic accident rates with victims using Geographically Weighted Beta Regression.
-
Advisor: ALAN RICARDO DA SILVA
-
COMMITTEE MEMBERS:
-
ALAN RICARDO DA SILVA
-
ANDRE LUIZ FERNANDES CANCADO
-
TEREZINHA KESSIA DE ASSIS RIBEIRO
-
FLÁVIO JOSÉ CRAVEIRO CUNTO
-
Date: 14-Jun-2023
-
-
Abstract
-
Classical linear regression allows, in a simple way, a continuous quantitative variable to be modeled from other variables. However, this methodology carries certain assumptions, such as independence between observations, which if ignored can lead to methodological issues. Additionally, not all data follow a normal distribution, which motivates alternative modeling methods. In this context, Geographically Weighted Beta Regression (GWBR) is presented with the aim of incorporating spatial dependence into the modeling, along with the analysis of rates and proportions using the beta distribution. The beta distribution, with support on the unit interval and a flexible shape, easily adapts to the analyzed data. In this study, GWBR was applied to the rate of traffic accidents with victims in Fortaleza-CE, Brazil, from 2009 to 2011, comparing its results to global and local classical regression models, classical regression with a logit transformation of the response variable, and global beta regression. Additionally, the ‘gwbr’ package was developed in R, providing the algorithms needed to apply GWBR. In conclusion, the local approach using the beta distribution was found to be a viable model for explaining the rate of traffic accidents with victims, given its suitability to both asymmetric and symmetric distributions. Therefore, when analyzing rates, the use of the beta distribution is always recommended.
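The "geographically weighted" part of GWBR means each local fit weights observations by distance from the regression point. A Gaussian kernel is one common choice; the coordinates and bandwidth below are illustrative:

```python
import numpy as np

def gw_weights(coords, point, bandwidth):
    """Gaussian kernel weights for one regression point: weights decay
    with Euclidean distance from that point."""
    d = np.sqrt(((coords - point) ** 2).sum(axis=1))
    return np.exp(-0.5 * (d / bandwidth) ** 2)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
w = gw_weights(coords, np.array([0.0, 0.0]), bandwidth=2.0)
```

These weights then enter the local (beta) likelihood at each point, so nearby accidents influence the local coefficients more than distant ones.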
|
|
8
|
-
Matheus Stivali
-
Two essays on yield curve modelling
-
Advisor: JOSE AUGUSTO FIORUCCI
-
COMMITTEE MEMBERS:
-
JOSE AUGUSTO FIORUCCI
-
EDUARDO YOSHIO NAKANO
-
RAUL YUKIHIRO MATSUSHITA
-
GERALDO NUNES SILVA
-
Date: 12-Dec-2023
-
-
Abstract
-
The dissertation undertakes two distinct lines of statistical analysis of the yield curve for Brazil: the first involves the interpolation of daily observed data to estimate the complete curve, while the second focuses on extrapolating past information to forecast the yield curve. These analyses aim to model the behaviour of interest rates in Brazil, offering insights for improved macroeconomic management and supporting investment decisions. The analysis uses data from interest rate futures contracts traded in Brazil between January 2018 and April 2023. The second chapter is dedicated to estimating empirical models of the term structure of interest rates. Although B3 periodically releases yield curve estimates for monitoring the Brazilian market, various estimation techniques are considered for alternative purposes due to inherent trade-offs. The relationship between interest rate and maturity holds for all terms, but daily observations are limited to the specific maturities of traded securities or derivatives, so estimating the entire curve from these observed data points is crucial. This chapter evaluates empirical models, which do not impose restrictions derived from theoretical term structure models during estimation. These models focus on obtaining a smooth function from observed data while adhering to specific constraints, such as the non-negativity of interest rates. The evaluation criteria include quality of fit, robustness to outliers, and smoothness of the estimated function. The chapter contributes to the literature by assessing models not previously applied to yield curve estimation and by using a multiple comparison procedure. Results highlight the strong fit of spline models, emphasize the greater smoothness of Nelson-Siegel family models, and recognize the noteworthy performance of the previously overlooked loess model.
The third chapter delves into modelling yield curve dynamics from a factor model perspective to generate curve predictions. The analysis incorporates Brazilian data by implementing the Dynamic Nelson-Siegel model proposed by Diebold and Li (2006) and further developed in Diebold et al. (2006). Both original estimation procedures, two-step and one-step, are considered, with a focus on the latter via the Kalman filter. Out-of-sample predictive capacity is assessed through the Diebold-Mariano test, comparing the performance of these implementations against simpler models.
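A key property exploited by Nelson-Siegel estimation is that, once the decay parameter λ is fixed, the three factors (level, slope, curvature) enter linearly and can be recovered by ordinary least squares. The maturities, factor values, and λ below are illustrative, not Brazilian market data:

```python
import numpy as np

def ns_loadings(tau, lambda_):
    """Nelson-Siegel factor loadings for maturities tau (years) and decay
    parameter lambda_: level, slope, and curvature columns."""
    x = lambda_ * tau
    l1 = np.ones_like(tau)
    l2 = (1 - np.exp(-x)) / x
    l3 = l2 - np.exp(-x)
    return np.column_stack([l1, l2, l3])

tau = np.array([0.25, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0])
beta_true = np.array([0.10, -0.03, 0.02])   # level, slope, curvature
lambda_ = 0.6
y = ns_loadings(tau, lambda_) @ beta_true   # noiseless synthetic yields

# With lambda_ fixed, the factors are recovered by least squares.
beta_hat, *_ = np.linalg.lstsq(ns_loadings(tau, lambda_), y, rcond=None)
```

The dynamic version of the model treats these three factors as latent time series, which is where the Kalman filter enters in the one-step estimation.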
|
|
9
|
-
Gabriel Ângelo da Silva Gomes
-
Essays on fingerprint data statistical analysis
-
Advisor: RAUL YUKIHIRO MATSUSHITA
-
COMMITTEE MEMBERS:
-
RAUL YUKIHIRO MATSUSHITA
-
GLADSTON LUIZ DA SILVA
-
ROBERTO VILA GABRIEL
-
REGINA CÉLIA BUENO DA FONSECA
-
Date: 13-Dec-2023
-
-
Abstract
-
This dissertation is organized as a collection of five articles on the application of statistical tools in fingerprint studies. The first applies convolutional neural networks to fingerprint data to predict human attributes such as sex, hand type (left or right), and finger position (right index finger, for example). The second presents a bibliometric review, from 2018 to 2023, of automated minutiae counting initiatives; we noted that most involve convolutional neural networks. The third deals with a statistical analysis of the distribution of Level 2 details with respect to Levels 1 and 3, in addition to considering sex and finger type. The fourth suggests an initiative to disseminate 1,000 fingerprints sampled from Brazilians (50 males and 50 females) for ethical, non-profit academic and scientific research, aiming to promote fingerprint identification studies. Finally, the fifth essay suggests Rényi’s divergence as an alternative to the traditional chi-square test to evaluate goodness-of-fit, homogeneity, and independence in contingency tables involving rare events. We illustrate this method using fingerprint minutiae data sampled from Brazilian Federal Police records.
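The Rényi divergence of order α between discrete distributions p and q is D_α(p‖q) = log(Σᵢ pᵢ^α qᵢ^(1−α)) / (α − 1), with the Kullback-Leibler divergence as the α → 1 limit. The cell proportions below are illustrative, not the Federal Police data:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha between discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if alpha == 1.0:  # limit case: Kullback-Leibler divergence
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1))

# Illustrative observed vs expected cell proportions with rare cells.
observed = np.array([0.70, 0.25, 0.04, 0.01])
expected = np.array([0.65, 0.30, 0.04, 0.01])
d = renyi_divergence(observed, expected, alpha=0.5)
```

Orders α < 1 down-weight the influence of rare cells relative to the chi-square statistic, which is one motivation for using it with contingency tables involving rare events.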
|
|
10
|
-
Aitcheou Gauthier Zountchegnon
-
Time series forecasting applied to sales data of a large retailer in Brazil.
-
Advisor: JOSE AUGUSTO FIORUCCI
-
COMMITTEE MEMBERS:
-
JOSE AUGUSTO FIORUCCI
-
EDUARDO YOSHIO NAKANO
-
GUILHERME SOUZA RODRIGUES
-
MARINHO GOMES DE ANDRADE FILHO
-
Date: 19-Dec-2023
-
-
Abstract
-
Retail trade plays a crucial role in the Brazilian economy, and planning for sales volume and other factors related to the retail sector is of great importance for its growth. To effectively forecast and plan sales quantities, time series methodologies can be employed. This study focuses on the development and evaluation of predictive models, which must account for typical characteristics of such data, such as hierarchical structure, the presence of multiple seasonalities in higher-level series, and intermittent behavior in lower-level series.
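For the intermittent lower-level series mentioned above, a classical baseline is Croston's method, which separately smooths nonzero demand sizes and the intervals between them. This generic sketch (with an illustrative smoothing constant) is not the thesis's exact model:

```python
def croston(demand, alpha=0.1):
    """Croston's method: exponential smoothing of nonzero demand sizes and
    of inter-demand intervals; returns the mean demand per period."""
    size = interval = None
    periods_since = 0
    for d in demand:
        periods_since += 1
        if d > 0:
            if size is None:  # initialize on the first nonzero demand
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 0
    return size / interval if size is not None else 0.0

forecast = croston([0, 0, 3, 0, 0, 0, 2, 0, 4, 0, 0])
```

The size/interval decomposition avoids the downward drift that plain exponential smoothing shows on series dominated by zeros.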
|
|