QUALIFYING EXAM Committee: Mayana Wanderley Pereira

A DOCTORAL QUALIFYING EXAM committee has been registered by the program.
STUDENT: Mayana Wanderley Pereira
DATE: 25/05/2023
TIME: 10:00
LOCATION: Videoconference
TITLE:

ADVANCING FAIRNESS AND DIFFERENTIAL PRIVACY IN MACHINE LEARNING FOR SOCIALLY RELEVANT APPLICATIONS


KEY WORDS:

machine learning, differential privacy, synthetic data, child sexual abuse media, algorithmic fairness, artificial intelligence


PAGES: 76
BROAD AREA: Engineering
AREA: Electrical Engineering
SUBAREA: Telecommunications
SPECIALTY: Telecommunication Systems
SUMMARY:

This thesis investigates privacy-preserving machine learning techniques for socially relevant applications, focusing on two specific areas: the detection and identification of Child Sexual Abuse Media (CSAM) and the generation of synthetic datasets that respect privacy and fairness concerns. We address the challenge of developing machine learning-based solutions for CSAM detection while considering the ethical and legal constraints of using explicit imagery for model training. To circumvent these limitations, we propose a novel framework that leverages file metadata for CSAM identification. Our approach involves training and evaluating deployment-ready machine learning models based on file paths, demonstrating its effectiveness on a dataset of over one million file paths collected from actual investigations. Additionally, we assess the robustness of our solution against adversarial attacks and explore the use of differential privacy to protect the model from model inference attacks without sacrificing utility. In the second part of this thesis, we investigate the opportunities and challenges of utilizing synthetic data generation in the context of increasing global privacy regulations. Synthetic data, which mimics real data without replicating personal information, offers various possibilities for data analysis and machine learning tasks. However, little is understood about the impacts of using synthetic datasets in machine learning pipelines, especially when only synthetic data is available for training and evaluation. This study examines the relationship between differential privacy and machine learning fairness, exploring how different synthetic data generation methods affect fairness and comparing the performance of models trained and tested with synthetic data versus real data. The findings contribute to a better understanding of synthetic data usage in machine learning pipelines and its potential to advance research across various fields.
As future work, we aim to develop protocols for generating synthetic datasets from distributed sources with differential privacy guarantees, without the need for a trusted dealer. The goal of this approach is to enable data holders to share data without violating legal and ethical restrictions.


COMMITTEE MEMBERS:
External to the Institution - RICARDO FELIPE CUSTODIO - UFSC
External to the Institution - MARIO LARANGEIRA
External to the Program - 2311780 - FABIO LUCIO LOPES DE MENDONCA
External to the Program - 2556078 - GEORGES DANIEL AMVAME NZE
Chair - 1771918 - UGO SILVA DIAS
Notice registered on: 17/05/2023 07:45