Conference article

FinDSE@FinTOC-2019 Shared Task

Carla Abreu
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal

Henrique Lopes Cardoso
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal

Eugénio Oliveira
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal / LIACC, Porto, Portugal

Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:10, p. 69-73

NEALT Proceedings Series 40:10, p. 69-73

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present the approach developed at the Faculty of Engineering of the University of Porto for the title detection sub-task of the FinTOC-2019 Financial Document Structure Extraction shared task. Many financial documents are produced in machine-readable format, but because they are poorly structured, retrieving the desired information from them is arduous. The aim of this sub-task is to detect titles in documents of this kind. We propose a supervised learning approach that uses linguistic, semantic and morphological features to classify a text block as title or non-title. The proposed methodology achieved an F1 score of 97.01%.
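As an illustration only, the following minimal sketch shows what classifying text blocks as title or non-title and scoring the result with F1 can look like. It is not the authors' system: the surface features, the perceptron learner, the toy training blocks and every function name here are assumptions chosen to keep the example self-contained, and the abstract does not specify which features or classifier were actually used.

```python
import re

def block_features(text):
    """Hypothetical surface features for one text block (not the paper's feature set)."""
    words = text.split()
    n = max(len(words), 1)
    return [
        1.0,                                                       # bias term
        min(len(words), 50) / 50.0,                                # block length, capped and scaled
        sum(w[0].isupper() for w in words) / n,                    # share of capitalised words
        1.0 if text.rstrip().endswith((".", ";", ",")) else 0.0,   # sentence-like ending
        1.0 if re.match(r"^\d+(\.\d+)*\.?\s", text) else 0.0,      # leading section number
    ]

def predict(weights, text):
    """Return 1 (title) if the weighted feature sum is positive, else 0 (non-title)."""
    score = sum(w * v for w, v in zip(weights, block_features(text)))
    return 1 if score > 0 else 0

def train_perceptron(samples, epochs=20):
    """Plain perceptron updates; a stand-in for whatever supervised learner is used."""
    weights = [0.0] * len(block_features(""))
    for _ in range(epochs):
        for text, label in samples:
            error = label - predict(weights, text)
            if error != 0:
                for i, v in enumerate(block_features(text)):
                    weights[i] += error * v
    return weights

def f1_score(gold, pred):
    """F1 for the positive (title) class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy blocks, for illustration only (1 = title, 0 = non-title).
samples = [
    ("1. Introduction", 1),
    ("The fund invests mainly in European equities.", 0),
    ("Investment Objective", 1),
    ("Past performance is not a guide to future returns,", 0),
]
weights = train_perceptron(samples)
gold = [label for _, label in samples]
pred = [predict(weights, text) for text, _ in samples]
```

Here `f1_score(gold, pred)` is computed on the training blocks themselves, so it only shows how the metric is calculated and says nothing about performance on real financial documents.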

Keywords

Machine Learning, Natural Language Processing, Document Structure Extraction
