
BYANJON: A Ground Truth Preparation System for Online Handwritten Bangla Documents

Authors:

Shibaprasad Sen,

Ankan Bhattacharyya,

Ram Sarkar,

Kaushik Roy

Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 6

Article No.: 106, Pages 1 - 16

https://doi.org/10.1145/3464379

Published: 12 August 2021


Abstract

The work reported in this article presents a ground truth generation scheme for online handwritten Bangla documents at the text-line, word, and stroke levels. The aim of the proposed scheme is twofold. Firstly, it builds a document-level database that future researchers can use for research in this field. Secondly, the ground truth information will help other researchers evaluate the performance of their algorithms for text-line extraction, word extraction, word segmentation, stroke recognition, and word recognition. The reported ground truth generation scheme starts by extracting text-lines from the online handwritten Bangla documents, then extracts words from the text-lines, and finally segments those words into basic strokes. After word segmentation, the basic strokes are assigned appropriate class labels using a modified distance-based feature extraction procedure and an MLP (Multi-layer Perceptron) classifier. The Unicode strings for the words are then generated from the sequence of stroke labels. XML files are used to store the stroke-, word-, and text-line-level ground truth information for the corresponding documents. The proposed system is semi-automatic, and each step (text-line extraction, word extraction, word segmentation, and stroke recognition) is implemented with a different algorithm. As a result, the proposed ground truth generation procedure substantially reduces manual intervention by cutting the number of mouse clicks required to extract text-lines and words from a document and to segment the words into basic strokes. The integrated stroke recognition module further reduces the manual labor needed to assign appropriate stroke labels. The system is freely available and can be accessed at https://byanjon.herokuapp.com/.
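The abstract states that XML files hold the stroke-, word-, and text-line-level ground truth for each document, but it does not give the schema. The sketch below assumes a hypothetical nesting (a `document` root containing `textline`, `word`, and `stroke` elements) with made-up element and attribute names and made-up stroke label ids, and shows how such a hierarchy could be written with Python's standard `xml.etree.ElementTree`. It is an illustration of the idea, not the BYANJON file format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class Stroke:
    label: str           # stroke class label (hypothetical ids, e.g. "s12")
    points: List[Point]  # pen trajectory as (x, y) samples in capture order


@dataclass
class Word:
    unicode: str  # Unicode string derived from the word's stroke-label sequence
    strokes: List[Stroke] = field(default_factory=list)


@dataclass
class TextLine:
    words: List[Word] = field(default_factory=list)


def build_ground_truth(doc_id: str, lines: List[TextLine]) -> ET.Element:
    """Nest text-line, word, and stroke elements under one document root.

    Element and attribute names are assumptions made for this sketch.
    """
    root = ET.Element("document", id=doc_id)
    for li, line in enumerate(lines):
        line_el = ET.SubElement(root, "textline", id=str(li))
        for wi, word in enumerate(line.words):
            word_el = ET.SubElement(line_el, "word", id=str(wi), unicode=word.unicode)
            for si, stroke in enumerate(word.strokes):
                stroke_el = ET.SubElement(word_el, "stroke", id=str(si), label=stroke.label)
                # Store the trajectory as space-separated "x,y" pairs.
                stroke_el.text = " ".join(f"{x},{y}" for x, y in stroke.points)
    return root


if __name__ == "__main__":
    # One text-line holding one word (Bangla KA, U+0995) made of a single stroke.
    word = Word("\u0995", [Stroke("s12", [(10, 20), (12, 24), (15, 30)])])
    doc = build_ground_truth("doc_001", [TextLine([word])])
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping each level as its own element lets an evaluator for, say, word extraction read only the `word` nodes, while stroke-recognition experiments can descend to the `stroke` nodes of the same file.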
