short-paper

A Hybrid Transformer Network for Detection of Risk Situations on Multimodal Life-Log Health Data

Authors:

Rupayan Mallick,

Jenny Benois-Pineau,

Thinhinane Yebda,

Hélène Amieva,

Laura Middleton

ICDAR '22: Proceedings of the 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval

Pages 22 - 26

https://doi.org/10.1145/3512731.3534212

Published: 27 June 2022

Abstract

This paper develops hybrid transformer architectures for detecting risk events in multimodal data recorded on a person with visual and signal sensors. The proposed two-stream architecture consists of a visual transformer and a linear transformer for time series. The linear transformer is benchmarked on the publicly available UCI-HAR dataset. Experiments with the full architecture were conducted on the in-the-wild BIRDS dataset. The hybrid transformer architecture performs better empirically than the 3D CNNs and RNNs used in previous work. The accuracy of risk-situation detection improves by 10% over the single-stream transformers.
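
The two-stream idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the "linear transformer" compresses keys and values with low-rank projections (the Linformer approach), uses random matrices where a real model would learn weights, and fuses the streams by simple concatenation. All names, sizes, and the fusion choice are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def linear_attention(q, k, v, proj_k, proj_v):
    """Attention with keys and values compressed from n rows to r rows.

    q, k, v are n x d matrices; proj_k and proj_v are r x n projections
    (learned in a real model, random here). The score matrix is n x r
    instead of n x n, so the cost grows linearly with sequence length n.
    """
    d = len(q[0])
    k_small = matmul(proj_k, k)                 # r x d
    v_small = matmul(proj_v, v)                 # r x d
    scores = matmul(q, transpose(k_small))      # n x r
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v_small)  # n x d

def mean_pool(m):
    """Average over the time axis to get one feature vector per stream."""
    return [sum(col) / len(m) for col in zip(*m)]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d, r = 16, 4, 3  # hypothetical: 16 time steps, 4 sensor features, rank 3
signal = random_matrix(n, d, rng)
proj_k = random_matrix(r, n, rng)
proj_v = random_matrix(r, n, rng)
attended = linear_attention(signal, signal, signal, proj_k, proj_v)
signal_features = mean_pool(attended)

# Stand-in for the pooled output of the visual stream (not computed here).
visual_features = [rng.gauss(0.0, 1.0) for _ in range(6)]

# Late fusion by concatenation (an assumption); a classifier head would follow.
fused = visual_features + signal_features
```

The point of the sketch is the shape of the score matrix: with a fixed rank r, doubling the length of the sensor sequence doubles the attention cost rather than quadrupling it.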

Supplementary Material

MP4 File (282_ICMR_ICDAR.mp4)

This work combines several modalities, such as wearable video and sensor signals, to classify risk situations for frail subjects in an @home environment. It presents an extended taxonomy of risk situations covering both sensor and visual modalities, to better understand long-term and immediate risks. The video also highlights the challenges of using these recorded signals, and hence the importance of preprocessing the data. Once the data have been obtained and synchronized, the hybrid transformer model is applied. The hybrid transformer architecture extracts and exploits features from the two weakly synchronized streams to classify risk situations, with the sensor features complementing the video features. Using both modalities, video and sensors, increases scores by up to 10% and 15% compared with single-modality use.
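
One way to picture the synchronization step is to pair each video clip with the sensor samples whose timestamps fall inside the clip's time window, allowing some slack. The sketch below is only an illustration under stated assumptions: both streams carry timestamps in seconds, sensor timestamps are sorted, and the function name, clip length, and tolerance are hypothetical rather than taken from the paper.

```python
from bisect import bisect_left, bisect_right

def align_windows(clip_starts, clip_len, sensor_times, tolerance=0.0):
    """For each video clip, return the indices of sensor samples in its window.

    clip_starts:  start times of the video clips, in seconds.
    clip_len:     clip duration, in seconds.
    sensor_times: sorted sensor timestamps, in seconds.
    tolerance:    slack in seconds added on both sides of the window, to
                  absorb small offsets between the two recordings.
    """
    aligned = []
    for start in clip_starts:
        lo = bisect_left(sensor_times, start - tolerance)
        hi = bisect_right(sensor_times, start + clip_len + tolerance)
        aligned.append(list(range(lo, hi)))
    return aligned

# Hypothetical example: 2-second clips, one sensor sample every 0.5 s.
sensor_times = [i * 0.5 for i in range(12)]  # 0.0, 0.5, ..., 5.5
clips = align_windows([0.0, 2.0, 4.0], 2.0, sensor_times, tolerance=0.25)
# clips[0] covers samples 0..4, clips[1] covers 4..8, clips[2] covers 8..11
```

With a non-zero tolerance, neighbouring clips share the samples near their boundary, which is one simple way to tolerate imperfect alignment between the streams.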

Published In

ICDAR '22: Proceedings of the 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval

June 2022, 80 pages

ISBN: 9781450392419

DOI: 10.1145/3512731

General Chair: Minh-Son Dao (National Institute of Information and Communications Technology, Japan)

Program Chairs: Duc-Tien Dang-Nguyen (Bergen University, Norway), Michael Riegler (Simula Metropolitan Center for Digital Engineering, Norway)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

MESR and French National ANRT Fund

Conference

ICMR '22

Sponsor:

SIGMM

ICMR '22: International Conference on Multimedia Retrieval

June 27 - 30, 2022

Newark, NJ, USA
