Document Layout Analysis System

A generic tool to identify and extract regions of text by analyzing connected components

Target users and customers

Anyone who works with document image analysis.

Layout analysis is the first major step in a document image analysis workflow. Correct page segmentation and region classification are crucial, because the resulting representation is the basis for all subsequent analysis and recognition processes.

Application sectors

  • Industry
  • Services
  • Cultural heritage
  • Publishing
  • Public administration


The system identifies and extracts regions of text by analyzing connected components constrained by black separators and white (background) separators. Everything else is filtered out as non-text. First, the image is binarized, any skew is corrected, and black page borders are removed. Connected components are then extracted and filtered by size, with very small components discarded.
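
As an illustration only, the binarization, component extraction, and size filtering steps described above can be sketched in plain Python. This is not the Jouve implementation: the function names, the fixed threshold of 128, the use of 8-connectivity, and the `min_pixels` cutoff are all assumptions made for the example, and skew correction, border removal, and separator detection are left out.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 = ink (dark), 0 = background.
    A fixed global threshold is an assumption for this sketch."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Return a list of components, each a list of (row, col) ink pixels.
    Pixels are grouped with 8-connectivity using a breadth-first search."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def filter_small(components, min_pixels=3):
    """Discard components with fewer than min_pixels pixels (hypothetical cutoff)."""
    return [comp for comp in components if len(comp) >= min_pixels]

def bounding_box(component):
    """Return (top, left, bottom, right) of a component, inclusive."""
    ys = [y for y, _ in component]
    xs = [x for _, x in component]
    return (min(ys), min(xs), max(ys), max(xs))

# Tiny example page: a 2x2 dark blob and one isolated dark speck.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255, 255],
    [255,   0,   0, 255, 255,  10],
    [255, 255, 255, 255, 255, 255],
]
components = connected_components(binarize(page))
kept = filter_small(components, min_pixels=3)
boxes = [bounding_box(comp) for comp in kept]
```

On this toy page the isolated speck is dropped by the size filter and only the 2x2 blob survives. A real system would typically derive the threshold from the image and the size cutoff from the scan resolution rather than hard-coding them.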

Technical requirements:

Any POSIX-compliant system

Conditions for access and use:

Contact Jouve



  • Jouve

Contact details:

Jean-Pierre Raysz

Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne