This paper presents a novel binarization algorithm for color document images. Conventional thresholding methods do not produce satisfactory results for documents whose foreground and background colors are close or mixed. First, statistical image features are extracted from the luminance distribution. Then, a decision-tree-based binarization method is proposed, which selects among color features to binarize the image: (1) if the document image colors are concentrated within a limited range, saturation is used; (2) if the foreground colors are significant, luminance is used; (3) if the background colors are concentrated within a limited range, luminance is also used; (4) if the total number of pixels with low luminance (below 60) is limited, saturation is used; otherwise, both luminance and saturation are used. Our experiments include 519 color images, most of which are uniform invoice and name-card document images. The proposed method produces better results than other available methods in terms of shape and connected-component measurements, and it also achieves higher recognition accuracy in a commercial OCR system than comparable methods.
Published in: IEEE Transactions on Image Processing, vol. 11, no. 4, pp. 434-451
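The four-step feature-selection decision tree described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives only the luminance cutoff of 60. The statistical features used here (luminance spread, a foreground-contrast measure, the share of pixels near the histogram mode, and the fraction of dark pixels), together with every other threshold, are hypothetical stand-ins. The low-luminance test uses a fraction rather than a raw pixel count so that it does not depend on image size. Luminance uses the ITU-R BT.601 weights and saturation uses the HSV definition from Python's `colorsys`; the paper may define both differently.

```python
import colorsys
import statistics

# Hypothetical thresholds: the abstract gives no values for these tests.
SPREAD_T = 20.0         # max luminance std-dev for "colors concentrated"
FG_CONTRAST_T = 100.0   # min (mean - 10th percentile) for "foreground significant"
BG_BAND = 10            # half-width of the luminance band around the histogram mode
BG_FRACTION_T = 0.6     # min share of pixels in that band for "background concentrated"
LOW_LUM = 60            # low-luminance cutoff stated in the abstract
LOW_FRACTION_T = 0.05   # hypothetical: "limited" share of low-luminance pixels


def luminance(rgb):
    """BT.601 luma of an (R, G, B) pixel with 0-255 channels."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def saturation(rgb):
    """HSV saturation in [0, 1] of an (R, G, B) pixel with 0-255 channels."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[1]


def luminance_features(pixels):
    """Hypothetical statistics computed from the luminance distribution."""
    lum = [luminance(p) for p in pixels]
    n = len(lum)
    hist = [0] * 256
    for y in lum:
        hist[min(255, int(y))] += 1
    mode = max(range(256), key=hist.__getitem__)  # first peak on ties
    in_band = sum(hist[max(0, mode - BG_BAND):min(256, mode + BG_BAND + 1)])
    p10 = sorted(lum)[n // 10]                     # approximate 10th percentile
    return {
        "std": statistics.pstdev(lum),
        "fg_contrast": statistics.fmean(lum) - p10,
        "bg_fraction": in_band / n,
        "low_fraction": sum(1 for y in lum if y < LOW_LUM) / n,
    }


def select_features(pixels):
    """Walk the four tests in order and return the color feature(s) to use."""
    f = luminance_features(pixels)
    if f["std"] < SPREAD_T:                 # 1. colors concentrated
        return ("saturation",)
    if f["fg_contrast"] > FG_CONTRAST_T:    # 2. foreground significant
        return ("luminance",)
    if f["bg_fraction"] > BG_FRACTION_T:    # 3. background concentrated
        return ("luminance",)
    if f["low_fraction"] < LOW_FRACTION_T:  # 4. few low-luminance pixels
        return ("saturation",)
    return ("luminance", "saturation")      # otherwise: both


def feature_values(pixels, features):
    """Per-pixel values of the selected feature(s), ready for thresholding."""
    funcs = {"luminance": luminance, "saturation": saturation}
    return {name: [funcs[name](p) for p in pixels] for name in features}
```

For example, a page of 80 white and 20 black pixels has a large gap between its mean luminance and its darkest tenth, so `select_features` returns `("luminance",)` at the second test. The sketch only chooses which feature(s) to threshold; the subsequent binarization step is not reproduced here.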