Dr. Nikolaos Stamatopoulos was born in 1984 in Athens, Greece. He received his Bachelor's degree in Informatics and Telecommunications in 2006 and his Ph.D. in 2011, both from the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens. His Ph.D. thesis concerns the optical processing and analysis of historical documents. He is currently a research associate at the Institute of Informatics and Telecommunications of the National Center for Scientific Research "Demokritos", Athens, Greece. He has participated in several research and industrial projects and serves on the program committees of several international conferences and workshops (ICDAR, ICFHR, DAS, MEDPRAI). His main research interests are image processing and document image analysis, processing of historical documents, OCR and pattern recognition, and he has authored many papers in journals and conference proceedings in these areas.
Proceedings of the 2016 ACM Symposium on Document Engineering - DocEng '16, 2016
In this paper we propose a new method for the automated segmentation of scanned newspaper pages into articles. Article regions are produced by merging sub-article-level content and title regions. We use a Bayesian Gaussian mixture model to model page connected-component information and to cluster the input into sub-article components. The Bayesian model is conditioned on a prior distribution over region features, which aids classification into titles and content. Using a Dirichlet prior, we are able to correctly estimate the number of title and article regions automatically. The method is tested on a dataset of digitized historical newspapers, where the visually assessed experimental results are very promising.
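The paper's model is a Bayesian Gaussian mixture with a Dirichlet prior. The sketch below swaps in a simpler, related technique, DP-means (a hard-assignment limit of a Dirichlet-process Gaussian mixture), only to illustrate how the number of clusters can emerge from the data instead of being fixed in advance. It assumes each connected component is summarised by a feature vector (here just its height in pixels) and that the squared-distance penalty `lam` is chosen by hand; the function names, the feature and `lam` are illustrative assumptions, not details from the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]  # hypothetical per-component feature vector, e.g. (height,)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Tuple[float, ...]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def dp_means(points: List[Point], lam: float, max_iter: int = 100):
    """DP-means: open a new cluster whenever a point lies farther than
    `lam` (squared distance) from every existing centre."""
    if not points:
        return [], []
    centres = [centroid(points)]  # start with a single global cluster
    labels = [-1] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            d = [sq_dist(p, c) for c in centres]
            k = min(range(len(centres)), key=d.__getitem__)
            if d[k] > lam:  # too far from everything: start a new cluster here
                centres.append(tuple(p))
                k = len(centres) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Renumber the surviving clusters and recompute their centres.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in labels]
        centres = [centroid([p for p, l in zip(points, labels) if l == k])
                   for k in range(len(used))]
        if not changed:
            break
    return labels, centres


# Hypothetical component heights: body text around 10-12 px, titles around 30 px.
heights = [10, 11, 10, 12, 11, 30, 32, 10, 11, 31, 29, 12, 11, 10]
labels, centres = dp_means([(h,) for h in heights], lam=50.0)
print(len(centres), labels)
```

A larger `lam` merges clusters and a smaller one splits them, so `lam` plays roughly the role that the prior's concentration plays in a full Bayesian treatment.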
2016 12th IAPR Workshop on Document Analysis Systems (DAS), 2016
Word segmentation refers to the process of defining the word regions of a text line. It is a critical stage towards word and character recognition as well as word spotting, and it mainly involves three basic stages, namely preprocessing, distance computation and gap classification. In this paper, we propose a novel word segmentation method that uses the Student's-t distribution for the gap classification stage. The main advantage of the Student's-t distribution is its robustness to outliers. To test the efficiency of the proposed method, we used the four benchmark datasets of the ICDAR/ICFHR Handwriting Segmentation Contests as well as a historical typewritten dataset of Greek polytonic text. We observe that mixtures of Student's-t distributions outperform other gap classification methods for word segmentation in terms of Recognition Accuracy and F-Measure. Moreover, across all examined benchmarks, the Student's-t mixture produces a perfect segmentation result in significantly more cases than the state-of-the-art Gaussian mixture model.
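A minimal sketch of Student's-t gap classification, assuming the gap widths (in pixels) between adjacent connected components of one text line are already computed. It fits a two-component 1-D Student's-t mixture by EM with the degrees of freedom ν fixed at 3 (an assumption; ν can also be estimated), then labels a gap as inter-word when the component with the larger mean is more probable. The function names, the quartile-based initialisation and all parameters are hypothetical, not the authors' implementation.

```python
import math
from statistics import fmean, pvariance

VAR_FLOOR = 1e-3  # keeps a component from collapsing onto a single gap


def t_pdf(x, mu, var, nu):
    """Density of a 1-D Student's-t with location mu, squared scale var, nu dof."""
    d = (x - mu) ** 2 / var
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * var))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(d / nu))


def fit_t_mixture(gaps, nu=3.0, iters=200):
    """EM for a two-component 1-D Student's-t mixture with fixed nu."""
    if len(gaps) < 2:
        raise ValueError("need at least two gaps")
    xs = sorted(gaps)
    n = len(xs)
    # Initialise by splitting at the midpoint of the lower and upper quartiles.
    cut = (xs[n // 4] + xs[(3 * n) // 4]) / 2
    groups = [[x for x in xs if x <= cut], [x for x in xs if x > cut]]
    if not groups[1]:  # (nearly) constant gaps: split the sorted list in half
        groups = [xs[: n // 2], xs[n // 2:]]
    pi = [len(g) / n for g in groups]
    mu = [fmean(g) for g in groups]
    var = [max(pvariance(g), VAR_FLOOR) for g in groups]
    for _ in range(iters):
        # E-step: responsibilities tau[i][k] and latent precision weights u[i][k].
        # u shrinks for gaps far from mu[k], which is what down-weights outliers.
        tau, u = [], []
        for x in gaps:
            p = [pi[k] * t_pdf(x, mu[k], var[k], nu) for k in range(2)]
            s = sum(p) or 1e-300
            tau.append([pk / s for pk in p])
            u.append([(nu + 1) / (nu + (x - mu[k]) ** 2 / var[k]) for k in range(2)])
        # M-step: weighted updates of mixing weights, locations and scales.
        for k in range(2):
            tk = sum(t[k] for t in tau) or 1e-300
            w = [t[k] * uk[k] for t, uk in zip(tau, u)]
            sw = sum(w) or 1e-300
            pi[k] = tk / n
            mu[k] = sum(wi * x for wi, x in zip(w, gaps)) / sw
            var[k] = max(sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, gaps)) / tk,
                         VAR_FLOOR)
    return pi, mu, var


def classify_gaps(gaps, nu=3.0):
    """Label each gap True (inter-word) or False (intra-word)."""
    pi, mu, var = fit_t_mixture(gaps, nu)
    word = 0 if mu[0] > mu[1] else 1  # the wider-gap component = word gaps
    labels = []
    for x in gaps:
        p = [pi[k] * t_pdf(x, mu[k], var[k], nu) for k in range(2)]
        labels.append(p[word] > p[1 - word])
    return labels


# Hypothetical gaps for one line; the 60 px gap plays the role of an outlier.
print(classify_gaps([2, 3, 2, 4, 3, 15, 2, 3, 18, 2, 3, 16, 60]))
```

Because the weights `u` shrink for gaps far from a component's centre, a single very wide gap pulls the inter-word location much less than it would in a Gaussian mixture fitted to the same data.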
When a document is captured with a digital camera, the resulting document image is often framed by a noisy black border or includes noisy text regions from neighbouring pages. In this paper, we present a novel technique for enhancing document images captured by a digital camera by automatically detecting the document borders and cutting out noisy black borders as well…
Image segmentation is a major task of handwritten document image processing. Many of the proposed image segmentation techniques are complementary, in the sense that each of them, using a different approach, can solve different difficult problems such as overlapping and touching components, the influence of author or font style, etc. In this paper, a combination method of different segmentation…
Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage - DATeCH '14, 2014
In this paper, we introduce the H-DocPro platform, a publicly available document image processing platform for historical documents. H-DocPro is a result of our recent and ongoing research on historical document image processing and has been developed to monitor the successive application of several new or state-of-the-art document image processing methods. It is an open-architecture software platform that allows several document image processing modules and methods (e.g. binarization, image enhancement, page split) to be used in an easy-to-define processing workflow. We provide detailed information on how to use H-DocPro, the available modules and methods, and how users can add their own components by exploiting the platform's open architecture. Representative examples and experimental results on large sets of historical document images demonstrate the efficiency of the H-DocPro methods.
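The open, module-based workflow described in this abstract can be pictured with a small registry pattern. This is a hypothetical sketch, not H-DocPro's actual API: modules are plain functions on a grayscale image stored as nested lists, `register`, `REGISTRY` and `run_workflow` are invented names, and the fixed global threshold stands in for a real binarization method rather than reproducing one of the platform's.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale pixels, 0 = black ... 255 = white
Module = Callable[[Image], Image]

# Hypothetical registry mapping a module name to its processing function.
REGISTRY: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Decorator that makes a processing function available to workflows."""
    def wrap(fn: Module) -> Module:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("binarization")
def global_threshold(img: Image) -> Image:
    # Placeholder: a fixed global threshold of 128, not an H-DocPro method.
    return [[0 if px < 128 else 255 for px in row] for row in img]


def run_workflow(img: Image, steps: List[str]) -> Image:
    """Apply the named modules in order, feeding each output to the next."""
    for name in steps:
        if name not in REGISTRY:
            raise KeyError(f"unknown module: {name}")
        img = REGISTRY[name](img)
    return img


# A user-supplied component plugs in without changing the workflow code.
@register("invert")
def invert(img: Image) -> Image:
    return [[255 - px for px in row] for row in img]


page = [[10, 200], [130, 127]]
print(run_workflow(page, ["binarization", "invert"]))  # prints [[255, 0], [0, 255]]
```

The design choice illustrated is that a workflow is just an ordered list of module names, so adding a component only requires registering a new function.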
2010 12th International Conference on Frontiers in Handwriting Recognition, 2010
One of the major issues in document image processing is the efficient creation of ground truth for training and evaluation purposes. Since a large number of tools have to be trained and evaluated under realistic circumstances, we need a quick and low-cost way to create the corresponding ground truth. Moreover, the specific…