Multimodal Information Coding System for Wearable Devices of Advanced Uniform

Ronzhin, Andrey L.; Basov, Oleg O.; Motienko, Anna I.; Karpov, Alexey A.; Mikhailov, Yuri V.; Zelezny, Milos

doi:10.1007/978-3-319-40349-6_52

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9734)

Included in the following conference series:

International Conference on Human Interface and the Management of Information

Abstract

The paper presents a mathematical model of a subsystem for multimodal information coding. Analytical expressions for the quality and speed of information transmission are obtained. The results of experimental studies of the developed multimodal information coding system are presented. The requirements for using the developed model and system for data processing in wearable devices of advanced uniform are discussed.

1 Introduction

Nowadays, there is a need to improve speech compression, to use the network resources of a mobile communication system more efficiently, and to take into account new features of modern communication systems. It is therefore important to develop new coding methods and algorithms and to improve the existing ones. The solution to this problem is connected with the creation of effective multimodal information coding systems for wearable devices of advanced uniform.

Even now, there are prerequisites for abandoning the traditional principle of separating transmitted information into communication services and for implementing polymodal infocommunication systems (PICS). Such a system is a coherent set of data processing and information storage facilities and telecommunication networks that operate under single management for the purpose of collecting, processing, storing, protecting, transmitting, distributing, displaying and using multimodal information, taking into account the meaning of the transmitted messages, the identity of the users, and their mood and physiological and psycho-emotional state. The user's state is estimated from parameters obtained by non-invasive registration methods that operate while users communicate and carry out their duties [1–9].

It should be noted that a multimodal information coding system is one of the most important elements of PICS, as the quality indicators of the multimodal information coding procedure determine the upper bound on the quality of communication provided in the chain “soldier-squad-platoon-company” [10]. Unlike traditional telecommunications, transmission of information via PICS is carried out in the form of a set of signals of the modalities corresponding to the main channels of interpersonal communication [11]. The signals of the individual modalities (speech, lip movement, eye movement, movement of the facial muscles, gestures, handwritten or keyboard input, or input via sensors), processed in the subscriber terminal of the soldier, are transmitted together through hardware and software means of communication and further along the existing communication channels of the data network.

Most information passes from subscriber to subscriber via the visual and acoustic communication channels. However, depending on the situation, efficient compression of the modality signals always makes it technically possible to free bandwidth for transmitting additional data.

The analysis shows that the following contradictory objectives are the most relevant to a multimodal information coding system for wearable devices of advanced uniform:

to increase the quality of encoding messages while maintaining one of the speeds of multimodal information transmission;
to adaptively reduce the speed under conditions of multimodal information transmission via data networks with varying parameters (e.g., along radio channels) without degradation of the qualitative assessments of encoding messages.

The solution to the formulated tasks will improve the efficiency of interpersonal communication in the chain “soldier-squad-platoon-company” by increasing the amount of information about military personnel state necessary for making adequate managerial decisions during combat operations. However, such an approach is in conflict with the traditional principle of providing communication services and informatization to the users of a data transmission network. It is required to validate scientific and methodical modeling tools, including mathematical ones; to determine potential and ultimate characteristics of PICS for the total number of modalities in each managerial situation; to choose an apparatus for estimating parameters of data network during multimodal information transmission.

2 Model of Multimodal Information Coding System

The synthesis of PICS requires theoretical models for assessing the quality of multimodal information encoding with the given resources of a data transmission network (DTN) and for estimating the volume required to transmit the maximum number of messages of different modalities with the specified quality. It also requires a simulation-based rationale for choosing the methods of transmitting such messages.

In view of the existing models of speech and video codecs, a set-theoretic model of a multimodal information coding system with multi-parameter adaptation can be represented as follows (Fig. 1). We consider the parametric coding of redundant messages and propose a universal representation of signals of different modalities in the following form. The input of a multimodal information coding system can be described as the sets of messages from sources of different modalities $ \left\{ {\vec{A}_{w} } \right\} $, $ w = \overline{1,W} . $ The internal sub-system parameters are:

(1) number of sources (modalities) $ W $;
(2) number $ K_{w} $, $ w = \overline{1,W} $, of the values of a random variable describing the w-th message source in the analysis period $ T_{A} $;
(3) mapping type of a parametric analysis $ G_{PA} $, determined by the mode of creation, combination and mapping of the parameter set $ \vec{X}_{w} $ of the analyzed messages $ \vec{A}_{w} $;
(4) number $ p_{w} $, $ w = \overline{1,W} $, and representation of the coding parameters $ \vec{X}_{w,j} $, $ j = \overline{1,p_{w}} $, of the multimodal information sources;
(5) mapping mode $ G_{Kw,j} $, $ w = \overline{1,W} $, $ j = \overline{1,p_{w}} $, which determines the quantization procedures of the observed parameters of the multimodal information sources;
(6) number $ o_{j} $ of quantization levels for each of the observed parameters of the information sources $ \left( j = \overline{1,p_{1} + \ldots + p_{W}} \right) $, which determines the cardinality of the subsets of coding parameters $ \left\{ C_{j} \right\} $ at the DTN input;
(7) mapping mode of a statistical analysis $ G_{SA} $, which determines the classification procedure for the initial messages from the $ W $ sources;
(8) number $ H $ of states of the user terminal $ \overline{CMT} $, which determines the set of modes for distributing the information capacity of a communication channel among the sets of coding parameters of the $ W $ sources, where

$$ \begin{aligned} \overline{CMT} = {} & \left\{ No_{G_{K1,1}} \right\} \times \ldots \times \left\{ No_{G_{K1,p_{1}}} \right\} \times \ldots \times \left\{ No_{G_{KW,1}} \right\} \times \ldots \times \left\{ No_{G_{KW,p_{W}}} \right\} \\ & \times \left\{ o_{1} \right\} \times \ldots \times \left\{ o_{p_{1}} \right\} \times \ldots \times \left\{ o_{p_{1} + \ldots + p_{W-1} + 1} \right\} \times \ldots \times \left\{ o_{p_{1} + \ldots + p_{W}} \right\} \end{aligned} $$

where $ No_{G_{Kw,j}} $ is the index of the mapping $ G_{Kw,j} $, $ w = \overline{1,W} $, $ j = \overline{1,p_{w}} $. Setting $ o_{j} = 1 $ makes it possible to exclude the j-th subspace from the structure of the space of coding parameters. In the particular case of a one-to-one correspondence between the values $ o_{j} $ and the mappings $ G_{Kw,j} $, the set $ \overline{CMT} $ takes the form

$$ \overline{CMT} = \left\{ o_{1} \right\} \times \ldots \times \left\{ o_{p_{1}} \right\} \times \ldots \times \left\{ o_{p_{1} + \ldots + p_{W-1} + 1} \right\} \times \ldots \times \left\{ o_{p_{1} + \ldots + p_{W}} \right\}. $$

For a highly adaptive system, the number of possible states of the user terminal can be infinitely large: $ H = \left| {\overline{CMT} } \right| = \infty $.
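For a terminal with finitely many options, the one-to-one special case can be illustrated with a short sketch. It assumes that each set $ \left\{ o_{j} \right\} $ is a finite list of admissible quantization-level counts for the j-th coding parameter; the function names and the numeric values are hypothetical and serve only to show how $ H $ arises as the size of a Cartesian product.

```python
from itertools import product
from math import prod


def terminal_states(level_options):
    """Enumerate the set CMT as the Cartesian product of the per-parameter
    sets {o_j} (one-to-one special case; finite option lists are assumed)."""
    return list(product(*level_options))


def state_count(level_options):
    """H = |CMT|: product of the sizes of the sets {o_j}."""
    return prod(len(options) for options in level_options)


# Hypothetical example: W = 2 sources with p_1 = 2 and p_2 = 1 parameters.
# A level count of 1 means the corresponding subspace is excluded.
level_options = [
    [1, 16, 32],  # admissible values of o_1
    [1, 8],       # admissible values of o_2
    [1, 4, 64],   # admissible values of o_3
]

states = terminal_states(level_options)
H = state_count(level_options)
print(H, states[0], states[-1])
```

Each tuple in `states` is one terminal state, that is, one way of distributing quantization levels among the coding parameters. The count grows multiplicatively with every added option, which is consistent with the remark that a highly adaptive system can have an unbounded number of states.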

Given the distortions produced in the encoding and the impact of the communication channel defined by mapping $ G_{CH} $, with known mapping $ G_{DK} $, which uniquely determines a decoding procedure, the mathematical description of the entire coding subsystem relative to the external parameter characterizing the quality of encoding messages of different modalities can be represented as follows:

$$ D_{\text{sum}} = \sum\limits_{w = 1}^{W} \left( \frac{D_{w} \left[ \left\{ \vec{A} \right\},\left\{ \hat{\vec{A}} \right\} \right]}{P_{C_{w}}} + \sigma \sum\limits_{q = 1}^{W} \left( \frac{D_{w} \left[ \left\{ \vec{A} \right\},\left\{ \hat{\vec{A}} \right\} \right]}{P_{C_{w}}} - \frac{D_{q} \left[ \left\{ \vec{A} \right\},\left\{ \hat{\vec{A}} \right\} \right]}{P_{C_{q}}} \right)^{2} \right), $$

where $ D_{w} \left[ \left\{ \vec{A} \right\},\left\{ \hat{\vec{A}} \right\} \right] $, $ w = \overline{1,W} $, is the mean square error between the sets of the initial and recovered messages, i.e. the noise energy when restoring the messages of the w-th source; $ \sigma $ is an empirical coefficient that determines the degree of influence of the penalty; $ P_{C_{w}} = \sum\limits_{i = 1}^{U} \vec{A}_{w,i}^{T} \vec{A}_{w,i} $ is the message energy of the w-th source.
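The total distortion defined above can be evaluated numerically as in the following minimal sketch. It assumes that $ D_{w} $ is taken as the noise energy (the sum of squared errors over all message vectors of a source), so that each ratio is a noise-to-signal energy ratio; the function names and the data layout (one list of vectors per source) are illustrative assumptions, not part of the original system.

```python
def energy(vectors):
    """Message energy P_Cw: sum over i of A_i^T A_i."""
    return sum(x * x for vector in vectors for x in vector)


def noise_energy(original, recovered):
    """Assumed form of D_w: summed squared error between original and
    recovered message vectors of one source."""
    return sum(
        (x - y) ** 2
        for vec, vec_hat in zip(original, recovered)
        for x, y in zip(vec, vec_hat)
    )


def total_distortion(originals, recovered, sigma):
    """Sum over sources of the normalized distortion plus a penalty,
    weighted by sigma, on the squared differences between sources."""
    ratios = [
        noise_energy(a, a_hat) / energy(a)
        for a, a_hat in zip(originals, recovered)
    ]
    total = 0.0
    for r_w in ratios:
        penalty = sum((r_w - r_q) ** 2 for r_q in ratios)
        total += r_w + sigma * penalty
    return total


# Hypothetical data: two sources; the first loses one vector component,
# the second is recovered exactly.
originals = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]]]
recovered = [[[1.0, 0.0], [0.0, 0.0]], [[2.0, 0.0]]]
print(total_distortion(originals, recovered, sigma=1.0))
```

With $ \sigma = 0 $ the expression reduces to the plain sum of normalized distortions. A positive $ \sigma $ additionally penalizes an uneven spread of distortion across the modalities.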

The above-mentioned list of the internal parameters of a multimodal information coding system allows us to formulate the mathematical model of such a system in relation to the external parameter – transmission speed (information output):

$$ B_{(W)} = \frac{\log \prod\limits_{w = 1}^{W} \prod\limits_{j = p_{w - 1} + 1}^{p_{w}} o_{j} + r + \log d_{\varphi} + \log H}{T_{A}}\;{\text{bit/s}}, $$

where $ r = \left\lceil \log_{2} \left( 1 + \sum\limits_{i = 1}^{t_{\text{INT}}} C_{n}^{i} \right) \right\rceil $ is the number of check bits, and $ t_{\text{INT}} $ is the correcting ability of an error-correcting code; $ C_{n}^{i} $ is the number of combinations of $ n $ by $ i $; $ d_{\varphi} = \left| \left\{ \varphi \right\} \right| $ is the cardinality of a sub-space $ \left\{ \varphi \right\} \in \left\{ \vec{D} \right\} $, characterizing the set of error-correcting coding modes.
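A small numerical sketch may help in reading the rate expression. It assumes that all logarithms are base 2 (so that the result is in bit/s), that $ n $ is the codeword length of the error-correcting code, and that the quantization-level counts $ o_{j} $ of all active parameters are passed as one flat list; the function names and the example values are hypothetical.

```python
from math import ceil, comb, log2, prod


def check_bits(n, t_int):
    """r = ceil(log2(1 + sum_{i=1..t_INT} C(n, i))): number of check bits
    for a code of assumed length n correcting up to t_INT errors."""
    return ceil(log2(1 + sum(comb(n, i) for i in range(1, t_int + 1))))


def transmission_rate(levels, n, t_int, d_phi, num_states, t_a):
    """Transmission speed in bit/s for one analysis period t_a (seconds).

    levels      flat list of quantization-level counts o_j
    d_phi       number of error-correcting coding modes
    num_states  number H of user-terminal states
    """
    parameter_bits = log2(prod(levels))
    total_bits = (
        parameter_bits + check_bits(n, t_int) + log2(d_phi) + log2(num_states)
    )
    return total_bits / t_a


# Hypothetical values: three parameters, a length-15 single-error-correcting
# code, two coding modes, four terminal states, 20 ms analysis period.
rate = transmission_rate(
    levels=[16, 8, 4], n=15, t_int=1, d_phi=2, num_states=4, t_a=0.02
)
print(round(rate), "bit/s")
```

In this example the parameters contribute 9 bits, the check bits 4, the coding-mode index 1 and the terminal-state index 2, giving 16 bits per 20 ms period, or about 800 bit/s. Setting some $ o_{j} $ to 1 removes that parameter's contribution, which is how the model expresses rate reduction by dropping subspaces.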

Thus, the obtained formalisms determine the important internal functional features of DTN taking into account the patterns of transmission of data blocks corresponding to the active modalities. The proposed mathematical model of information modality processing is being introduced in the telecommunication system connecting distributed users and a central control station.

3 Coding System for a Multimodal Speech Signal

The coding system for a multimodal speech signal (Fig. 2), developed on the basis of the presented mathematical model, is described in [12]. A set of subjective evaluations (Table 1) shows that introducing an adaptation procedure into the multimodal speech coding system ensured high-quality reconstruction of the speech signal for the transmission speeds $ \left\{ V \right\} $. Furthermore, subjective listening tests of the speech signal encoded and decoded with algorithms Nos. 1–3 indicate marked superiority in the intelligibility and naturalness of the synthesized speech, as well as in the recognizability of the speaker’s voice, over the algorithms of the FS1015, FS1017 and FS1016 standards, respectively. With algorithm No. 4, the tests indicate quality comparable to the standard full-rate GSM codec (13 kbps).

Table 1. Qualitative characteristics of the multimodal speech coding system

The peak computational complexity $ Q $ of the algorithms is calculated with allowance for the need to perform the required number of operations in real time for the maximum volumes of the VQ codebooks. The transition to adaptive coding required a substantial (approximately twofold) increase in the amount $ W $ of information stored in memory devices, because new program segments and additional variants of codebooks have to be stored. The quality gains of the developed algorithms rest on an in-depth analysis of the speech signal frames and adaptation to their parameters, which increases the computational complexity of the speech coding procedures. At the hardware level, the developed algorithms, in comparison with similar standard algorithms, require increased efficiency of estimators and additional capacity of memory elements.

4 Conclusion

The wearable and embedded devices of user uniform have limited energy resources that impose constraints on using sensors and methods of data processing and communication. For this reason, the proposed model providing energy optimization during communication via several information modalities is useful for application, and the proposed multimodal speech coding system will be implemented for the organization of military communications in the chain “soldier-squad-platoon-company”. Increasing the number of transmitted modalities will allow solving a range of other important practical problems, for example, improving the quality of identification of psycho-physical state of the soldier and other tasks [13–16].

References

1. Gregory, F.D., Dai, L.: Multisensory information processing for enhanced human-machine symbiosis. In: Yamamoto, S., Abbott, A.A. (eds.) HIMI 2015. LNCS, vol. 9172, pp. 354–365. Springer, Heidelberg (2015). doi:10.1007/978-3-319-20612-7_34
2. Goldberg, D.H., Vogelstein, R., Socolinsky, D.A., Wolff, L.B.: Toward a wearable, neurally-enhanced augmented reality system. In: Schmorrow, D.D., Fidopiastis, C.M. (eds.) FAC 2011. LNCS, vol. 6780, pp. 493–499. Springer, Heidelberg (2011)
3. Tao, X.: Handbook of Smart Textiles. Springer, Singapore (2015)
4. Meng, F., Spence, C.: Tactile warning signals for in-vehicle systems. Accid. Anal. Prev. 75, 333–346 (2015)
5. White, T.L., Krausman, A.S.: Effects of inter-stimulus interval and intensity on the perceived urgency of tactile patterns. Appl. Ergon. 48, 121–129 (2015)
6. Ayuso, A.J.R., Lopez-Soler, J.M.: Speech Recognition and Coding: New Advances and Trends. NATO ASI Series, vol. 147. Springer, Berlin (1995)
7. Karpov, A., Ronzhin, A.: A universal assistive technology with multimodal input and multimedia output interfaces. In: Stephanidis, C., Antona, M. (eds.) UAHCI 2014, Part I. LNCS, vol. 8513, pp. 369–378. Springer, Heidelberg (2014)
8. Karpov, A., Akarun, L., Yalçın, H., Ronzhin, Al., Demiröz, B., Çoban, A., Zelezny, M.: Audio-visual signal processing in a multimodal assisted living environment. In: Proceedings of the 15th International Conference INTERSPEECH-2014, Singapore, pp. 1023–1027 (2014)
9. Karpov, A., Ronzhin, A., Kipyatkova, I.: An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision. In: Jacko, J.A. (ed.) Human-Computer Interaction, Part II, HCII 2011. LNCS, vol. 6762, pp. 454–463. Springer, Heidelberg (2011)
10. http://yourtactic.com/news/view/12
11. Basov, O.O.: Reasoning of the transition to polymodal infocommunicational systems. In: Distributed Computer and Communication Networks: Control, Computation, Communication (DCCN-2015), pp. 19–22 (2015)
12. Saveliev, A., Basov, O., Ronzhin, A., Ronzhin, A.: Algorithms for low bit-rate coding with adaptation to statistical characteristics of speech signal. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 65–72. Springer, Heidelberg (2015)
13. Balatskaya, L.N., Choinzonov, E.L., Chizevskaya, S.Yu., Kostyuchenko, E.U., Meshcheryakov, R.V.: Software for assessing voice quality in rehabilitation of patients after surgical treatment of cancer of oral cavity, oropharynx and upper jaw. In: Železný, M., Habernal, I., Ronzhin, A. (eds.) SPECOM 2013. LNCS, vol. 8113, pp. 294–301. Springer, Heidelberg (2013)
14. Volf, D., Meshcheryakov, R., Kharchenko, S.: The singular estimation pitch tracker. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 454–462. Springer, Heidelberg (2015)
15. Karpov, A., Ronzhin, A., Kipyatkova, I.: An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision. In: Jacko, J.A. (ed.) Human-Computer Interaction, Part II, HCII 2011. LNCS, vol. 6762, pp. 454–463. Springer, Heidelberg (2011)
16. Potapova, R., Komalova, L., Bobrov, N.: Acoustic markers of emotional state “aggression”. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 55–64. Springer, Heidelberg (2015)

Acknowledgments

This work is partially supported by the Russian Foundation for Basic Research (grants № 16-08-00696-a, 15-07-06774-a).

Author information

Authors and Affiliations

SPIIRAS, 39, 14th Line, St. Petersburg, 199178, Russia
Andrey L. Ronzhin, Anna I. Motienko, Alexey A. Karpov & Yuri V. Mikhailov
Academy of FAP of Russia, 35, Priborostroitelnaya, Orel, 302034, Russia
Oleg O. Basov
University of West Bohemia, Pilsen, Czech Republic
Milos Zelezny

Corresponding author

Correspondence to Andrey L. Ronzhin.

Editor information

Editors and Affiliations

Tokyo University of Science, Tokyo, Japan
Sakae Yamamoto

Cite this paper

Ronzhin, A.L., Basov, O.O., Motienko, A.I., Karpov, A.A., Mikhailov, Y.V., Zelezny, M. (2016). Multimodal Information Coding System for Wearable Devices of Advanced Uniform. In: Yamamoto, S. (eds) Human Interface and the Management of Information: Information, Design and Interaction. HIMI 2016. Lecture Notes in Computer Science, vol 9734. Springer, Cham. https://doi.org/10.1007/978-3-319-40349-6_52

DOI: https://doi.org/10.1007/978-3-319-40349-6_52
Published: 21 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40348-9
Online ISBN: 978-3-319-40349-6
eBook Packages: Computer Science, Computer Science (R0)