Abstractive Summarization of Long Documents Used in Inspections and Procedural Instructions
Natural Language Processing, Abstractive Summarization, Long Documents, Legal Documents
The Brazilian Federal Court of Accounts organizes its work by processes, and over its life cycle each process usually accumulates from tens to hundreds of legal documents, each of which easily reaches a few dozen pages. The number of processes and documents tends to grow over time, producing a large body of material that is rich in content but difficult to consume, since reading each process takes considerable time. Processes are usually read to check whether they contain content relevant to an inspection or procedural instruction in progress. Besides the high cost of reading a process, the auditor discards part of the content because it is unrelated to their current work, so part of the reading time is wasted. To make this activity more efficient, we propose in this work the development of an automatic text summarization solution based on machine learning applied to natural language processing. The solution will apply abstractive summarization to long documents with legal content, using state-of-the-art models for the task based on transformers with linear attention mechanisms. It will be made available as a web application with a microservice, for easier integration with the applications that make up the auditor's work process. The summaries generated by the models will be evaluated mainly with metrics that focus on the semantics of the generated text and therefore adhere more closely to the desired content. Users will provide feedback on the generated summaries, and this feedback will later be used to improve the model.