DOI: 10.1145/3512731.3534212
short-paper

A Hybrid Transformer Network for Detection of Risk Situations on Multimodal Life-Log Health Data

Published: 27 June 2022

Abstract

This paper develops hybrid transformer architectures for detecting risk events in multimodal data recorded on a person with visual and signal sensors. The proposed two-stream architecture consists of a visual transformer and a linear transformer for time series. The linear transformer is benchmarked on the publicly available UCI-HAR dataset. Experiments with the full architecture were conducted on the in-the-wild BIRDS dataset. The hybrid transformer architecture performs better empirically than the 3D CNNs and RNNs used in previous work. The accuracy of risk-situation detection improves by 10% over single-stream transformers.
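
To make the two-stream idea concrete, here is a minimal sketch of one common way to combine two feature streams: concatenate the per-stream feature vectors and pass them through a linear classifier with a softmax. This is an assumption for illustration only; the abstract does not state how the visual and time-series streams are fused. The function names, feature sizes, and weights below are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(visual_feat, signal_feat, weights, bias):
    """Concatenate two stream features and apply a linear classifier.

    Hypothetical fusion by concatenation; not necessarily the paper's method.
    weights has one row per class, each row as long as the fused vector.
    """
    fused = list(visual_feat) + list(signal_feat)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    if len(bias) != len(weights):
        raise ValueError("bias must have one entry per class")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs: a 3-dim visual feature and a 2-dim sensor feature (made up).
visual_feat = [0.2, -0.1, 0.4]
signal_feat = [0.5, 0.3]
weights = [
    [0.1, 0.0, 0.2, 0.3, -0.1],   # class 0: "no risk" (illustrative label)
    [-0.2, 0.1, 0.0, 0.1, 0.4],   # class 1: "risk" (illustrative label)
]
bias = [0.0, 0.1]

probs = fuse_and_classify(visual_feat, signal_feat, weights, bias)
print(probs)
```

In a real model each stream's feature vector would come from its own transformer encoder and the classifier weights would be learned; the sketch only shows where the two streams meet.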

Supplementary Material

MP4 File (282_ICMR_ICDAR.mp4)
This work combines several modalities, such as wearable video and sensor signals, to classify risk situations for frail subjects in an @home environment. It presents an extended taxonomy of risk situations covering both sensor and visual modalities, to better understand long-term and immediate risks. The video also highlights the challenges of using the recorded signals and hence the importance of preprocessing the data. Once the data have been obtained and synchronized, the hybrid transformer model is applied. The hybrid transformer architecture extracts and exploits features from the two weakly synchronized streams to classify risk situations, with the sensor features complementing the video features. Using both modalities, video and sensors, raises scores by up to 10% and 15% compared with using a single modality.
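
As an illustration of the synchronization step, the sketch below aligns a sensor stream to video frame timestamps by averaging the sensor samples that fall within a tolerance window around each frame. This is a hypothetical approach under stated assumptions (sorted timestamps in seconds, scalar sensor values, a fixed window); the description above does not specify how the recorded streams are actually synchronized.

```python
from bisect import bisect_left, bisect_right


def align_sensor_to_frames(frame_times, sensor_times, sensor_values, half_window):
    """For each frame time, average sensor samples within +/- half_window seconds.

    sensor_times must be sorted ascending. Frames with no sensor sample
    in their window get None, so gaps stay visible to later preprocessing.
    """
    if len(sensor_times) != len(sensor_values):
        raise ValueError("sensor_times and sensor_values must have equal length")
    aligned = []
    for t in frame_times:
        lo = bisect_left(sensor_times, t - half_window)
        hi = bisect_right(sensor_times, t + half_window)
        window = sensor_values[lo:hi]
        aligned.append(sum(window) / len(window) if window else None)
    return aligned


# Toy data: video at 1 frame/s, sensor at 4 samples/s (made-up rates and values).
frame_times = [0.0, 1.0, 2.0, 10.0]
sensor_times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
sensor_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

aligned = align_sensor_to_frames(frame_times, sensor_times, sensor_values, 0.25)
print(aligned)  # [1.5, 5.0, 8.5, None]
```

The window tolerance is what makes the alignment loose rather than exact: the two streams need only agree to within `half_window`, and frames without nearby sensor data are flagged instead of silently filled.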

Published In

ICDAR '22: Proceedings of the 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval
June 2022
80 pages
ISBN:9781450392419
DOI:10.1145/3512731
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. multimodal data
  3. risk detection
  4. transformers

Qualifiers

  • Short-paper

Funding Sources

  • MESR and French National ANRT Fund

Conference

ICMR '22