Kerstin Enflo

Professor

Joint Handwritten Text Recognition and Word Classification for Tabular Information Extraction

Author

  • Christopher Blomqvist
  • Kerstin Enflo
  • Andreas Jakobsson
  • Kalle Åström

Summary, in English

In this paper, we present a system for extracting tabular information from loosely structured handwritten documents. The system consists of three parts: (i) a U-Net-like CNN-based method for text detection and segmentation, (ii) a new attention-based method for simultaneous text recognition and classification of word parts, and (iii) a method for matching the word parts into a tabular structure for each entry. A key contribution is the observation that the new attention-based recognition and classification module enables improved spatial analysis of the tabular information. The method is evaluated on a unique historical document: the Swedish Wealth Tax of 1571, consisting of 11,453 pages of handwritten tax records. The evaluation shows that the system provides a significant improvement over the state of the art in tabular extraction from loosely structured historical documents.
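
As a rough illustration of step (iii), the sketch below shows one simple way that recognized and classified word parts could be grouped into table rows. It is not the method from the paper: the `WordPart` fields, the class labels "name" and "amount", the greedy row grouping, and the `y_tolerance` value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class WordPart:
    text: str   # recognized transcription of the word part
    label: str  # predicted class (hypothetical labels: "name", "amount")
    x: float    # horizontal centre of the detected box
    y: float    # vertical centre of the detected box


def group_into_rows(parts, y_tolerance=10.0):
    """Greedily group word parts into rows by vertical proximity."""
    rows = []
    for part in sorted(parts, key=lambda p: p.y):
        # Append to the current row if the part is vertically close to the
        # row's last part; otherwise start a new row.
        if rows and abs(part.y - rows[-1][-1].y) <= y_tolerance:
            rows[-1].append(part)
        else:
            rows.append([part])
    return rows


def rows_to_table(rows, columns):
    """Build one dict per row, joining same-class parts from left to right."""
    table = []
    for row in rows:
        entry = {col: "" for col in columns}
        for part in sorted(row, key=lambda p: p.x):
            if part.label in entry:
                entry[part.label] = (entry[part.label] + " " + part.text).strip()
        table.append(entry)
    return table
```

With made-up detections, calling `rows_to_table(group_into_rows(parts), ["name", "amount"])` yields one dictionary per table entry. Using the predicted class of each word part as the column key is what lets a layout this loose be read as a table at all, which is the intuition behind combining recognition and classification.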

Department/s

  • Department of Economic History
  • Growth, technological change, and inequality
  • LTH Profile Area: AI and Digitalization
  • eSSENCE: The e-Science Collaboration
  • Mathematical Statistics
  • Biomedical Modelling and Computation
  • Statistical Signal Processing Group
  • Stroke Imaging Research group
  • Mathematics (Faculty of Engineering)
  • ELLIIT: the Linköping-Lund initiative on IT and mobile communication
  • Mathematical Imaging Group

Publishing year

2022-11-29

Language

English

Pages

1564-1570

Publication/Series

2022 26th International Conference on Pattern Recognition (ICPR)

Document type

Conference paper

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

Topic

  • Computer Vision and Robotics (Autonomous Systems)
  • Economic History

Keywords

  • Histograms
  • Image segmentation
  • Text recognition
  • Finance
  • Writing
  • Information retrieval
  • Decoding

Conference name

26th International Conference on Pattern Recognition, 2022

Conference date

2022-08-21 - 2022-08-25

Conference place

Montreal, Canada

Status

Published

Project

  • Praise the people or praise the place: How culture and specialization drive long-term regional growth

Research group

  • Biomedical Modelling and Computation
  • Statistical Signal Processing Group
  • Stroke Imaging Research group
  • Mathematical Imaging Group

ISBN/ISSN/Other

  • ISBN: 978-1-6654-9062-7
  • ISBN: 978-1-6654-9063-4