A new machine-learning-based framework to classify and analyze industry-specific regulations
Government Transparency, Natural Language Processing (NLP), Text Classification, Machine Learning, Regulation Metrics
Government transparency and openness are key factors in modernizing the state. The combination of transparency and digital information has given rise to the concept of Open Government, which increases citizens' understanding and monitoring of government actions and, in turn, improves the quality of public services and of government decision making. With the goal of improving legislative transparency and the understanding of the Brazilian regulatory process and its characteristics, this work introduces RegBR, the first national framework to centralize, classify and analyze regulations from the Brazilian government. To this end, a centralized database of Brazilian federal legislation was built with automated ETL routines and processed with data mining and machine learning techniques. The framework evaluates different natural language processing (NLP) models in a text classification task on a novel Portuguese legal corpus and performs regulatory analysis based on metrics of linguistic complexity, restrictiveness, popularity, and industry-specific citation relevance. Policy makers can use the proposed metrics to assess their own work. The framework thereby aims to increase the openness and transparency of the public process and to support new studies of Brazilian regulatory impact.
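To make the kind of metric mentioned above more concrete, the following is a minimal sketch of how restrictiveness and linguistic complexity could be computed for a single regulation text. It is an illustration under stated assumptions, not RegBR's actual implementation: the term list `RESTRICTIVE_TERMS`, the function names, and the choice of Shannon entropy and mean sentence length as complexity measures are all hypothetical and are not taken from the text above.

```python
import math
import re
from collections import Counter

# Hypothetical list of restrictive Portuguese terms (assumption: the
# abstract does not say which terms, if any, the framework counts).
RESTRICTIVE_TERMS = ("deve", "devem", "obrigatório", "proibido", "vedado")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def restrictiveness(text):
    """Count occurrences of the assumed restrictive terms in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in RESTRICTIVE_TERMS)


def shannon_entropy(text):
    """Shannon entropy (in bits) of the word distribution of the text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)
```

For example, `restrictiveness("As empresas devem informar o órgão. É vedado o uso.")` counts one occurrence each of "devem" and "vedado". A production pipeline would likely need lemmatization and multi-word expressions, which this sketch leaves out.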