Keywords

handwritten documents, text, separating lines

Abstract

We present an approach to finding (and separating) lines of text in free-form handwritten historical document images. After preprocessing, our method uses the count of foreground/background transitions in a binarized image to determine areas of the document that are likely to be text lines. Alternatively, an Adaptive Local Connectivity Map (ALCM) found in the literature can be used for this step of the process. We then use a min-cut/max-flow graph cut algorithm to split up text areas that appear to encompass more than one line of text. After removing text lines containing relatively little text information (or merging them with nearby text lines), we create output images for each line. A grayscale output image is created, as well as a special mask image containing both the foreground and information flagging ambiguous pixels. Foreground pixels that belong to other text lines are removed from the output images to provide cleaner line images useful for further processing. While some refinement is still necessary, the result of early experimentation with our method is encouraging.
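The transition-counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized image held as a 2-D NumPy array (nonzero = foreground), counts transitions over whole rows instead of local regions, and uses a hypothetical threshold `min_transitions`. The function names are illustrative only.

```python
import numpy as np

def row_transition_counts(binary):
    """Count foreground/background transitions along each image row."""
    img = np.asarray(binary, dtype=bool)
    # A transition occurs wherever two horizontally adjacent pixels differ.
    return np.count_nonzero(img[:, 1:] != img[:, :-1], axis=1)

def candidate_line_bands(counts, min_transitions=2):
    """Group consecutive rows with enough transitions into (start, end) bands.

    `end` is exclusive. `min_transitions` is a hypothetical threshold,
    not a value taken from the paper.
    """
    bands = []
    start = None
    for y, c in enumerate(counts):
        if c >= min_transitions and start is None:
            start = y                      # a band of text-like rows begins
        elif c < min_transitions and start is not None:
            bands.append((start, y))       # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(counts)))  # band runs to the bottom edge
    return bands

# Tiny synthetic page: two "text" bands of alternating pixels on a blank background.
page = np.zeros((8, 10), dtype=np.uint8)
page[1:3, ::2] = 1
page[5:7, ::2] = 1
counts = row_transition_counts(page)
print(candidate_line_bands(counts))  # [(1, 3), (5, 7)]
```

Rows crossed by many pen strokes produce many transitions, while blank rows produce none, so bands of high-count rows are candidate text lines. On free-form handwriting, where lines are skewed or curved, a whole-row profile like this one would likely be too coarse, which is presumably why a local measure and a later graph cut step are used to split regions that span more than one line.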

Original Publication Citation

Douglas J. Kennard and William A. Barrett, "Separating Lines of Text in Free-Form Handwritten Historical Documents," IEEE Proceedings, 2nd International Conference on Document Image Analysis for Libraries (DIAL 2006), pp. 12-23, Lyon, France, April 2006.