Search | arXiv e-print repository

Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model

Authors: Till Grutschus, Ola Karrar, Emir Esenov, Ekta Vats

Abstract: This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video and leverages a pretrained vision transformer for multi-class action detection, with classes: "Fall", "Lying" and "Other/Activities of daily living (ADL)". A method for temporal action localization that relies on a simple cutup of untrimmed videos is… ▽ More This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video and leverages a pretrained vision transformer for multi-class action detection, with classes: "Fall", "Lying" and "Other/Activities of daily living (ADL)". A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated. The methodology includes a preprocessing pipeline that converts datasets with timestamp action annotations into labeled datasets of short action clips. Simple and effective clip-sampling strategies are introduced. The effectiveness of the proposed method has been empirically evaluated on the publicly available High-Quality Fall Simulation Dataset (HQFSD). The experimental results validate the performance of the proposed pipeline. The results are promising for real-time application, and the falls are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD dataset under the given experimental settings. The source code will be made available on GitHub. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2303.05929 [pdf, other]

Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition

Authors: Liang Cheng, Jonas Frankemölle, Adam Axelsson, Ekta Vats

Abstract: The pressing need for digitization of historical documents has led to a strong interest in designing computerised image processing methods for automatic handwritten text recognition. However, not much attention has been paid on studying the handwritten text written in the margins, i.e. marginalia, that also forms an important source of information. Nevertheless, training an accurate and robust rec… ▽ More The pressing need for digitization of historical documents has led to a strong interest in designing computerised image processing methods for automatic handwritten text recognition. However, not much attention has been paid on studying the handwritten text written in the margins, i.e. marginalia, that also forms an important source of information. Nevertheless, training an accurate and robust recognition system for marginalia calls for data-efficient approaches due to the unavailability of sufficient amounts of annotated multi-writer texts. Therefore, this work presents an end-to-end framework for automatic detection and recognition of handwritten marginalia, and leverages data augmentation and transfer learning to overcome training data scarcity. The detection phase involves investigation of R-CNN and Faster R-CNN networks. The recognition phase includes an attention-based sequence-to-sequence model, with ResNet feature extraction, bidirectional LSTM-based sequence modeling, and attention-based prediction of marginalia. The effectiveness of the proposed framework has been empirically evaluated on the data from early book collections found in the Uppsala University Library in Sweden. Source code and pre-trained models are available at Github. △ Less

Submitted 29 January, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2201.09633 [pdf, other]

Paired Image to Image Translation for Strikethrough Removal From Handwritten Words

Authors: Raphaela Heil, Ekta Vats, Anders Hast

Abstract: Transcribing struck-through, handwritten words, for example for the purpose of genetic criticism, can pose a challenge to both humans and machines, due to the obstructive properties of the superimposed strokes. This paper investigates the use of paired image to image translation approaches to remove strikethrough strokes from handwritten words. Four different neural network architectures are exami… ▽ More Transcribing struck-through, handwritten words, for example for the purpose of genetic criticism, can pose a challenge to both humans and machines, due to the obstructive properties of the superimposed strokes. This paper investigates the use of paired image to image translation approaches to remove strikethrough strokes from handwritten words. Four different neural network architectures are examined, ranging from a few simple convolutional layers to deeper ones, employing Dense blocks. Experimental results, obtained from one synthetic and one genuine paired strikethrough dataset, confirm that the proposed paired models outperform the CycleGAN-based state of the art, while using less than a sixth of the trainable parameters. △ Less

Submitted 1 April, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

Comments: accepted at DAS2022

arXiv:2201.09390 [pdf, other]

AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks

Authors: Dmitrijs Kass, Ekta Vats

Abstract: This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, this work leverages models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models. ResNet feature extraction and bidirectional LSTM-based se… ▽ More This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, this work leverages models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models. ResNet feature extraction and bidirectional LSTM-based sequence modeling stages together form an encoder. The prediction stage consists of a decoder and a content-based attention mechanism. The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset Imgur5K and the IAM dataset. The experimental results evaluate the performance of the HTR framework, further supported by an in-depth analysis of the error cases. Source code and pre-trained models are available at https://github.com/dmitrijsk/AttentionHTR. △ Less

Submitted 12 September, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

Comments: 15th IAPR International Workshop on Document Analysis System (DAS)

arXiv:1801.05367 [pdf, other]

doi 10.1007/978-3-319-73165-0_8

TexT - Text Extractor Tool for Handwritten Document Transcription and Annotation

Authors: Anders Hast, Per Cullhed, Ekta Vats

Abstract: This paper presents a framework for semi-automatic transcription of large-scale historical handwritten documents and proposes a simple user-friendly text extractor tool, TexT for transcription. The proposed approach provides a quick and easy transcription of text using computer assisted interactive technique. The algorithm finds multiple occurrences of the marked text on-the-fly using a word spott… ▽ More This paper presents a framework for semi-automatic transcription of large-scale historical handwritten documents and proposes a simple user-friendly text extractor tool, TexT for transcription. The proposed approach provides a quick and easy transcription of text using computer assisted interactive technique. The algorithm finds multiple occurrences of the marked text on-the-fly using a word spotting system. TexT is also capable of performing on-the-fly annotation of handwritten text with automatic generation of ground truth labels, and dynamic adjustment and correction of user generated bounding box annotations with the word being perfectly encapsulated. The user can view the document and the found words in the original form or with background noise removed for easier visualization of transcription results. The effectiveness of TexT is demonstrated on an archival manuscript collection from well-known publicly available dataset. △ Less

Submitted 22 November, 2017; originally announced January 2018.

Journal ref: Digital Libraries and Multimedia Archives. IRCDL 2018. Communications in Computer and Information Science, vol 806. Springer, Cham

arXiv:1712.03738 [pdf, other]

doi 10.1109/DAS.2018.14

Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing

Authors: Prashant Singh, Ekta Vats, Anders Hast

Abstract: Computation of document image quality metrics often depends upon the availability of a ground truth image corresponding to the document. This limits the applicability of quality metrics in applications such as hyperparameter optimization of image processing algorithms that operate on-the-fly on unseen documents. This work proposes the use of surrogate models to learn the behavior of a given docume… ▽ More Computation of document image quality metrics often depends upon the availability of a ground truth image corresponding to the document. This limits the applicability of quality metrics in applications such as hyperparameter optimization of image processing algorithms that operate on-the-fly on unseen documents. This work proposes the use of surrogate models to learn the behavior of a given document quality metric on existing datasets where ground truth images are available. The trained surrogate model can later be used to predict the metric value on previously unseen document images without requiring access to ground truth images. The surrogate model is empirically evaluated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets. △ Less

Submitted 11 December, 2017; originally announced December 2017.

arXiv:1709.01788 [pdf, other]

doi 10.24132/JWSCG.2018.26.1.4

Radial Line Fourier Descriptor for Historical Handwritten Text Representation

Authors: Anders Hast, Ekta Vats

Abstract: Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exist… ▽ More Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exist popular feature descriptors such as Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), the invariant properties of these descriptors amplify the noise in the degraded document images, rendering them more sensitive to noise and complex characteristics of historical manuscripts. Therefore, an efficient and relaxed feature descriptor is required as handwritten words across different documents are indeed similar, but not identical. This paper introduces a Radial Line Fourier (RLF) descriptor for handwritten word representation, with a short feature vector of 32 dimensions. A segmentation-free and training-free handwritten word spotting method is studied herein that relies on the proposed RLF descriptor, takes into account different keypoint representations and uses a simple preconditioner-based feature matching algorithm. The effectiveness of the RLF descriptor for segmentation-free handwritten word spotting is empirically evaluated on well-known historical handwritten datasets using standard evaluation measures. △ Less

Submitted 20 March, 2018; v1 submitted 6 September, 2017; originally announced September 2017.

Comments: under review

arXiv:1709.01782 [pdf, other]

doi 10.1145/3151509.3151520

Automatic Document Image Binarization using Bayesian Optimization

Authors: Ekta Vats, Anders Hast, Prashant Singh

Abstract: Document image binarization is often a challenging task due to various forms of degradation. Although there exist several binarization techniques in literature, the binarized image is typically sensitive to control parameter settings of the employed technique. This paper presents an automatic document image binarization algorithm to segment the text from heavily degraded document images. The propo… ▽ More Document image binarization is often a challenging task due to various forms of degradation. Although there exist several binarization techniques in literature, the binarized image is typically sensitive to control parameter settings of the employed technique. This paper presents an automatic document image binarization algorithm to segment the text from heavily degraded document images. The proposed technique uses a two band-pass filtering approach for background noise removal, and Bayesian optimization for automatic hyperparameter selection for optimal results. The effectiveness of the proposed binarization technique is empirically demonstrated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets. △ Less

Submitted 21 October, 2017; v1 submitted 6 September, 2017; originally announced September 2017.

Journal ref: 4th International Workshop on Historical Document Imaging and Processing (HIP2017). ACM, New York, NY, USA, 89-94

arXiv:1709.01775 [pdf, other]

doi 10.1109/ICDAR.2017.374

On-the-fly Historical Handwritten Text Annotation

Authors: Ekta Vats, Anders Hast

Abstract: The performance of information retrieval algorithms depends upon the availability of ground truth labels annotated by experts. This is an important prerequisite, and difficulties arise when the annotated ground truth labels are incorrect or incomplete due to high levels of degradation. To address this problem, this paper presents a simple method to perform on-the-fly annotation of degraded histori… ▽ More The performance of information retrieval algorithms depends upon the availability of ground truth labels annotated by experts. This is an important prerequisite, and difficulties arise when the annotated ground truth labels are incorrect or incomplete due to high levels of degradation. To address this problem, this paper presents a simple method to perform on-the-fly annotation of degraded historical handwritten text in ancient manuscripts. The proposed method aims at quick generation of ground truth and correction of inaccurate annotations such that the bounding box perfectly encapsulates the word, and contains no added noise from the background or surroundings. This method will potentially be of help to historians and researchers in generating and correcting word labels in a document dynamically. The effectiveness of the annotation method is empirically evaluated on an archival manuscript collection from well-known publicly available datasets. △ Less

Submitted 6 September, 2017; originally announced September 2017.

Journal ref: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Volume 8, IEEE, Kyoto, Japan, 2017, pp. 10-14

arXiv:1412.0439 [pdf, ps, other]

doi 10.1016/j.patcog.2014.11.016

Fuzzy human motion analysis: A review

Authors: Chern Hong Lim, Ekta Vats, Chee Seng Chan

Abstract: Human Motion Analysis (HMA) is currently one of the most popularly active research domains as such significant research interests are motivated by a number of real world applications such as video surveillance, sports analysis, healthcare monitoring and so on. However, most of these real world applications face high levels of uncertainties that can affect the operations of such applications. Hence… ▽ More Human Motion Analysis (HMA) is currently one of the most popularly active research domains as such significant research interests are motivated by a number of real world applications such as video surveillance, sports analysis, healthcare monitoring and so on. However, most of these real world applications face high levels of uncertainties that can affect the operations of such applications. Hence, the fuzzy set theory has been applied and showed great success in the recent past. In this paper, we aim at reviewing the fuzzy set oriented approaches for HMA, individuating how the fuzzy set may improve the HMA, envisaging and delineating the future perspectives. To the best of our knowledge, there is not found a single survey in the current literature that has discussed and reviewed fuzzy approaches towards the HMA. For ease of understanding, we conceptually classify the human motion into three broad levels: Low-Level (LoL), Mid-Level (MiL), and High-Level (HiL) HMA. △ Less

Submitted 2 December, 2014; v1 submitted 1 December, 2014; originally announced December 2014.

Comments: Accepted in Pattern Recognition, first survey paper that discusses and reviews fuzzy approaches towards HMA

Journal ref: Pattern Recognition 48(5) 2015 1773-1796

Showing 1–10 of 10 results for author: Vats, E