The classical rotation algorithm applied to monochromatic images introduces white holes in black areas, makes edges uneven, and disconnects neighboring elements. Several algorithms in the literature address only the white-hole problem. This paper proposes a new algorithm that solves all three problems, producing better-quality images.
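The white-hole effect can be reproduced with a minimal sketch. It assumes that the "classical" algorithm maps each black source pixel forward to a rounded target coordinate, which the abstract does not state. The function names and the 21x21 test square are hypothetical. Inverse mapping is used here only as a reference for counting holes; it is not the algorithm the paper proposes.

```python
import math

def rotate_forward(black, angle_deg):
    """Forward mapping: rotate each black pixel about the origin and round."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return {(round(x * c - y * s), round(x * s + y * c)) for x, y in black}

def rotate_inverse(black, angle_deg, lo, hi):
    """Inverse mapping: for each target pixel, look up its rounded source pixel."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    out = set()
    for y in range(lo, hi + 1):
        for x in range(lo, hi + 1):
            src = (round(x * c + y * s), round(-x * s + y * c))
            if src in black:
                out.add((x, y))
    return out

# A solid black square centred on the origin (assumed test shape).
square = {(x, y) for x in range(-10, 11) for y in range(-10, 11)}

fwd = rotate_forward(square, 30)
ref = rotate_inverse(square, 30, -20, 20)

# Target pixels that should be black but that no source pixel landed on.
holes = ref - fwd
print(len(square), len(fwd), len(holes))
```

Because several source pixels can round to the same target, the forward-mapped set is smaller than the source set, and the uncovered targets show up as white holes inside the rotated shape.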
This paper presents an algorithm based on Flood Fill, Component Labelling, and Region Adjacency Graphs for removing the noisy borders that digitization with automatically fed scanners introduces into monochromatic images of documents. The new algorithm was tested on 20,000 images and provided better-quality images and better time-space performance than its predecessors, including widely used commercial tools.
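As a rough illustration of the Flood Fill ingredient only, the sketch below whitens every black region connected to the image frame. It is not the paper's algorithm: it leaves out Component Labelling and Region Adjacency Graphs. The function name, the 0/1 pixel encoding, and the sample page are assumptions made for the example.

```python
from collections import deque

def remove_border_noise(img):
    """Return a copy of img (rows of 0 = white, 1 = black) in which every
    black region touching the image frame is whitened (4-connected fill)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    queue = deque()

    # Seed the fill with every black pixel lying on the image frame.
    for y in range(h):
        for x in range(w):
            on_frame = y in (0, h - 1) or x in (0, w - 1)
            if on_frame and out[y][x] == 1:
                out[y][x] = 0
                queue.append((y, x))

    # Spread inward through black neighbours, whitening as we go.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1:
                out[ny][nx] = 0
                queue.append((ny, nx))
    return out

# A black scanner border surrounding a white page with two black "ink" pixels.
page = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
cleaned = remove_border_noise(page)
```

A plain fill like this also erases any document content that touches the noisy border, so a practical method needs additional checks to tell border noise from document content.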
BigBatch is a tool designed to automatically process thousands of monochromatic images of documents generated by production-line scanners. It removes noisy borders, checks and corrects orientation, calculates and compensates for the skew angle, crops the image to standardize document sizes, and finally compresses it according to a user-defined file format. BigBatch encompasses the best and recently developed algorithms for such kind
The digitization of documents very frequently produces images rotated by small angles relative to the original image axis. The skew introduced makes the images harder for human users to view. It also increases the complexity of any sort of automatic image recognition, degrades the performance of OCR tools, and increases the space needed for image storage. Skew correction is therefore an important part of any document processing system and has concerned researchers for almost two decades. The search for faster, good-quality solutions to this problem is still on. This paper presents an efficient algorithm for skew detection and correction in images of documents that include non-textual graphical elements, such as pictures and tables. The new algorithm was tested on over 10,000 images, yielding satisfactory performance.
... Very often the filing punches on the left margin are torn off. ... The parameters adopted in the tests performed were: Border Percent = 40; White Noise Length = 40; Variance = 40. ... In Figure 07, part of the document information was removed. ...
BigBatch is a processing environment designed to automatically process batches of millions of monochromatic images of documents generated by production-line scanners. It removes noisy borders, checks and corrects orientation, calculates and compensates for the skew angle, crops the image to standardize document sizes, and finally compresses it according to a user-defined file format. BigBatch encompasses the best recently developed algorithms for this kind of document image. BigBatch may work in either standalone or operator-assisted mode. In standalone mode, BigBatch can also run on clusters of workstations or on grids.
Papers by Bruno Avila