Document Image Analysis

Documents are mainly paper-based media containing information of various kinds, such as text, graphics, pictures, mathematical formulas, and tables. Document analysis is concerned with the problems of converting documents into electronic form. It is a key area of research for various applications in machine vision and media processing, including page readers, forms processing, content-based document retrieval and transmission, and digital libraries. Document analysis will provide important tools for the development of value-added services for next-generation telecommunication systems, such as content production for media services or intelligent terminals.

The wide variety of document types and the degradation of documents cause problems for automated analysis. Most applications, however, need very reliable, robust and fairly generic algorithms.

Our objective is to make document analysis more generic and reliable than is possible with today's methodology, and in this way to create a basis for the development of innovative applications. Since 1993 our group has investigated various methods for document image analysis, including shape description, binarization, skew detection, preprocessing, page segmentation and forms processing, and has obtained very promising results.

