Abstract
The paper presents a mathematical model of a subsystem for multimodal information coding. Analytical expressions for the quality and speed of information transmission are obtained. The results of experimental studies of the developed multimodal information coding system are presented. The requirements for using the developed model and system for data processing in wearable devices of advanced uniform are discussed.
Keywords
- Multimodal information
- Coding algorithms
- Uniform
- Wearable devices
- Data transmission
- Energy consumption reduction
1 Introduction
Nowadays, there is a need to improve speech compression, to use the network resources of mobile communication systems more efficiently, and to take into account new features in the development of modern communication systems. It is therefore important to work out new coding methods and algorithms and to improve the existing ones. The solution to this problem is connected with the creation of effective multimodal information coding systems for wearable devices of advanced uniform.
Even now, there are prerequisites for rejecting the traditional principle of separating transmitted information into communication services and for implementing polymodal infocommunication systems (PICS). Such systems comprise a coherent set of data processing and information storage facilities and telecommunication networks that operate under a single management for the purpose of collecting, processing, storing, protecting, transmitting, distributing, displaying and using multimodal information, taking into account the meaning of the transmitted messages, the identity of users, their mood, and their physiological and psycho-emotional state. The user state is estimated from parameters obtained by non-invasive registration methods that operate during communication and while users are on duty [1–9].
It should be noted that a multimodal information coding system is one of the most important elements of PICS, as the quality indicators of the procedure for multimodal information coding determine the upper bound to the quality of communication provided in the chain “soldier-squad-platoon-company” [10]. Unlike traditional telecommunications, transmission of information via PICS is carried out in the form of a set of signals of the modalities corresponding to the main channels of interpersonal communication [11]. The signals of the individual modalities (speech, lip movement, eye movement, movement of the facial muscles, gestures, handwritten keyboard input, or input via sensors), processed in the subscriber terminal of the soldier, are transmitted together through hardware and software means of communication and further along the existing communication channels of data network.
The maximum amount of information is transferred from subscriber to subscriber via the visual and acoustic communication channels. However, depending on the situation, it is always technically possible to allocate bandwidth for additional data by efficiently compressing the signals of the modalities.
The analysis shows that the following contradictory objectives are the most relevant to a multimodal information coding system for wearable devices of advanced uniform:
- to increase the quality of encoding messages while maintaining one of the speeds of multimodal information transmission;
- to adaptively reduce the speed under conditions of multimodal information transmission via data networks with varying parameters (e.g., along radio channels) without degradation of the qualitative assessments of encoding messages.
The solution to the formulated tasks will improve the efficiency of interpersonal communication in the chain “soldier-squad-platoon-company” by increasing the amount of information about military personnel state necessary for making adequate managerial decisions during combat operations. However, such an approach is in conflict with the traditional principle of providing communication services and informatization to the users of a data transmission network. It is required to validate scientific and methodical modeling tools, including mathematical ones; to determine potential and ultimate characteristics of PICS for the total number of modalities in each managerial situation; to choose an apparatus for estimating parameters of data network during multimodal information transmission.
2 Model of Multimodal Information Coding System
The synthesis of PICS requires theoretical models that assess the quality of encoding multimodal information with the given resources of the data transmission network (DTN) and the volume required to transmit the maximum number of messages of different modalities with the specified quality. It also requires a rationale, based on simulation results, for the choice of methods of transmitting such messages.
In view of the existing models of speech and video codecs, a set-theoretic model of a multimodal information coding system with multi-parameter adaptation can be represented as follows (Fig. 1). We consider the parametric coding of redundant messages and propose a universal representation of signals of different modalities in the following form. The input of a multimodal information coding system can be described as the sets of messages from sources of different modalities \( \left\{ {\vec{A}_{w} } \right\} \), \( w = \overline{1,W} . \) The internal sub-system parameters are:
(1) number \( W \) of sources (modalities);
(2) number \( K_{w} \), \( w = \overline{1,W} \), of values of a random variable describing the w-th message source in the analysis period \( T_{A} \);
(3) mapping type of the parametric analysis \( G_{PA} \), determined by the mode of creation, combination and mapping of the parameter set \( \vec{X}_{w} \) of the analyzed messages \( \vec{A}_{w} \);
(4) number \( p_{w} \), \( w = \overline{1,W} \), and representation of the coding parameters \( \vec{X}_{w,j} \), \( j = \overline{{1,p_{w} }} \), of the multimodal information sources;
(5) mapping mode \( G_{Kw,j} \), \( w = \overline{1,W} \), \( j = \overline{{1,p_{w} }} \), which determines the quantization procedures of the observed parameters of the multimodal information sources;
(6) number \( o_{j} \) of quantization levels for each of the observed parameters of the information sources \( \left( {j = \overline{{1,p_{1} + \ldots + p_{W} }} } \right) \), which determines the cardinality of the subsets of coding parameters \( \left\{ {C_{j} } \right\} \) at the DTN input;
(7) mapping mode of the statistical analysis \( G_{SA} \), which determines the classification procedure of the initial messages from the \( W \) sources;
(8) number \( H \) of states of the user terminal \( \overline{CMT} \), which determines the set of modes for distributing the information capacity of a communication channel among the sets of coding parameters of the \( W \) sources, where
$$ \begin{aligned} \overline{CMT} = {} & \left\{ No_{G_{K1,1}} \right\} \times \ldots \times \left\{ No_{G_{K1,p_{1}}} \right\} \times \ldots \times \left\{ No_{G_{KW,1}} \right\} \times \ldots \times \left\{ No_{G_{KW,p_{W}}} \right\} \\ & \times \left\{ o_{1} \right\} \times \ldots \times \left\{ o_{p_{1}} \right\} \times \ldots \times \left\{ o_{p_{1} + \ldots + p_{W-1} + 1} \right\} \times \ldots \times \left\{ o_{p_{1} + \ldots + p_{W}} \right\} \end{aligned} $$
where \( No_{G_{Kw,j}} \) is the number of the mapping \( G_{Kw,j} \), \( w = \overline{1,W} \), \( j = \overline{{1,p_{w} }} \). A value \( o_{j} = 1 \) makes it possible to exclude the j-th subspace from the structure of the space of coding parameters. In the particular case of a one-to-one correspondence between the values \( o_{j} \) and the mappings \( G_{Kw,j} \), the set \( \overline{CMT} \) takes the form:
.
For a highly adaptive system, the number of possible states of the user terminal can be infinitely large: \( H = \left| {\overline{CMT} } \right| = \infty \).
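For a terminal with finitely many admissible modes, the Cartesian-product structure of \( \overline{CMT} \) can be illustrated with a minimal Python sketch. The choice sets below (two sources with \( p_{1} = 2 \) and \( p_{2} = 1 \) coding parameters, and the specific mapping numbers and level counts) are hypothetical values chosen for illustration only; they are not taken from the paper.

```python
from itertools import product
from math import prod

# Hypothetical choice sets (assumptions, not values from the paper).
# mapping_choices[k]: admissible quantization-mapping numbers for parameter k.
mapping_choices = [[1, 2], [1], [1, 2, 3]]
# level_choices[k]: admissible numbers of quantization levels o_k for parameter k.
level_choices = [[1, 16, 32], [8, 16], [1, 64]]


def terminal_states(mappings, levels):
    """Enumerate every element of the Cartesian product of all choice sets."""
    return list(product(*mappings, *levels))


def state_count(mappings, levels):
    """Cardinality H of the product set: the product of the factor-set sizes."""
    return prod(len(choices) for choices in (*mappings, *levels))


states = terminal_states(mapping_choices, level_choices)
H = state_count(mapping_choices, level_choices)
```

Each element of `states` is one terminal state: a tuple of mapping numbers followed by a tuple of quantization level counts. Counting via `state_count` avoids enumerating the set, which matters because \( H \) grows multiplicatively with every added parameter.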
Given the distortions produced in the encoding and the impact of the communication channel defined by mapping \( G_{CH} \), with known mapping \( G_{DK} \), which uniquely determines a decoding procedure, the mathematical description of the entire coding subsystem relative to the external parameter characterizing the quality of encoding messages of different modalities can be represented as follows:
where \( D_{w} \left[ {\left\{ {\vec{A}} \right\},\left\{ {\hat{\vec{A}}} \right\}} \right] \), \( w = \overline{1,W} \), is the mean square error between the sets of the initial and recovered messages, or the noise energy introduced when restoring the messages of the w-th source; \( \sigma \) is the empirical coefficient that determines the degree of influence of the penalty; \( P_{{{\text{C}}_{w} }} = \sum\limits_{i = 1}^{U} {\vec{A}_{w\,i}^{T} \vec{A}_{w\,i} } \) is the message energy of the w-th source.
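The two quantities defined here, the message energy \( P_{{\text{C}}_{w}} \) and the error between initial and recovered messages, can be computed as in the following Python sketch. It covers only these two building blocks and does not reproduce the quality criterion itself. The function names and the sample vectors are assumptions made for illustration, and the per-sample averaging in `mean_square_error` is one possible normalization, not necessarily the one used by the authors.

```python
def message_energy(messages):
    """P_C = sum over i of A_i^T A_i for a list of message vectors."""
    return sum(sum(x * x for x in a) for a in messages)


def squared_error(original, recovered):
    """Total squared error (noise energy) between paired message vectors."""
    return sum(
        sum((x - y) ** 2 for x, y in zip(a, b))
        for a, b in zip(original, recovered)
    )


def mean_square_error(original, recovered):
    """Squared error averaged over all samples (assumed normalization)."""
    n_samples = sum(len(a) for a in original)
    return squared_error(original, recovered) / n_samples


# Hypothetical messages of one source (U = 2 vectors) and their reconstructions.
original = [[1.0, 2.0], [3.0, 4.0]]
recovered = [[1.0, 2.5], [2.5, 4.0]]

energy = message_energy(original)
noise = squared_error(original, recovered)
mse = mean_square_error(original, recovered)
```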
The above-mentioned list of the internal parameters of a multimodal information coding system allows us to formulate the mathematical model of such a system in relation to the external parameter – transmission speed (information output):
where \( r = \left\lceil {\log_{2} \left( {1 + \sum\limits_{i = 1}^{{t_{\text{INT}} }} {C_{n}^{i} } } \right)} \right\rceil \) is the number of check bits, and \( t_{\text{INT}} \) is the correcting ability of the error-correcting code; \( C_{n}^{i} \) is the number of combinations of \( n \) elements taken \( i \) at a time; \( d_{\varphi } = \left| {\left\{ \varphi \right\}} \right| \) is the cardinality of the sub-space \( \left\{ \varphi \right\} \in \left\{ {\vec{D}} \right\} \), characterizing the set of error-correcting coding modes.
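The expression for the number of check bits \( r \) can be evaluated directly, as in the following Python sketch. It assumes that \( n \) denotes the length of the code block in bits, which the surviving text does not state explicitly; the function name `check_bits` is likewise introduced here for illustration.

```python
from math import comb


def check_bits(n, t):
    """r = ceil(log2(1 + sum_{i=1}^{t} C(n, i))), computed with integers.

    n: block length in bits (assumed meaning of n), t: correcting ability.
    """
    total = 1 + sum(comb(n, i) for i in range(1, t + 1))
    # For an integer total >= 1, ceil(log2(total)) == (total - 1).bit_length().
    return (total - 1).bit_length()


r_single = check_bits(7, 1)    # 1 + 7 = 8 error patterns
r_triple = check_bits(23, 3)   # 1 + 23 + 253 + 1771 = 2048 error patterns
```

Integer arithmetic is used instead of `math.log2` so that the ceiling is exact even when the sum is a large power of two.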
Thus, the obtained formalisms determine the important internal functional features of DTN taking into account the patterns of transmission of data blocks corresponding to the active modalities. The proposed mathematical model of information modality processing is being introduced in the telecommunication system connecting distributed users and a central control station.
3 Coding System for a Multimodal Speech Signal
The coding system for a multimodal speech signal (Fig. 2), developed on the basis of the presented mathematical model, is described in [12]. A set of subjective evaluations (Table 1) shows that introducing an adaptation procedure into the multimodal speech coding system ensured high-quality reconstruction of the speech signal for the transmission speeds \( \left\{ V \right\} \). Furthermore, subjective listening tests of the speech signal encoded and decoded with algorithms No. 1–3 indicate marked superiority in the intelligibility and naturalness of the synthesized speech, as well as in recognition of the speaker's voice, over the algorithms of the FS1015, FS1017 and FS1016 standards, respectively. With algorithm No. 4, the tests indicate quality comparable to the standard Full-rate GSM (13 kbps).
The peak computational complexity \( Q \) of the algorithms is calculated with allowance for the need to perform the required number of operations in real time for the maximum sizes of the VQ codebooks. The transition to adaptive coding required a substantial (roughly twofold) increase in the amount \( W \) of information stored in memory devices, owing to the need to store new program segments and additional variants of codebooks. The improved quality parameters of the developed algorithms rest on an in-depth analysis of the speech signal frames and adaptation to their parameters, which increases the computational complexity of the speech coding procedures. At the hardware level, the developed algorithms require, in comparison with similar standard algorithms, increased performance of the computing units and additional memory capacity.
4 Conclusion
The wearable and embedded devices of a user's uniform have limited energy resources, which constrains the sensors and the data processing and communication methods that can be used. For this reason, the proposed model, which provides energy optimization during communication via several information modalities, is of practical use, and the proposed multimodal speech coding system will be implemented for the organization of military communications in the chain “soldier-squad-platoon-company”. Increasing the number of transmitted modalities will make it possible to solve a range of other important practical problems, for example, improving the quality of identification of the psycho-physical state of the soldier [13–16].
References
Gregory, F.D., Dai, L.: Multisensory information processing for enhanced human-machine symbiosis. In: Yamamoto, S., Abbott, A.A. (eds.) HIMI 2015. LNCS, vol. 9172, pp. 354–365. Springer, Heidelberg (2015). doi:10.1007/978-3-319-20612-7_34
Goldberg, D.H., Vogelstein, R., Socolinsky, D.A., Wolff, L.B.: Toward a wearable, neurally-enhanced augmented reality system. In: Schmorrow, D.D., Fidopiastis, C.M. (eds.) FAC 2011. LNCS, vol. 6780, pp. 493–499. Springer, Heidelberg (2011)
Tao, X.: Handbook of Smart Textiles. Springer, Singapore (2015)
Meng, F., Spence, C.: Tactile warning signals for in-vehicle systems. Accid. Anal. Prev. 75, 333–346 (2015)
White, T.L., Krausman, A.S.: Effects of inter-stimulus interval and intensity on the perceived urgency of tactile patterns. Appl. Ergon. 48, 121–129 (2015)
Ayuso, A.J.R., Lopez-Soler, J.M.: Speech Recognition and Coding: New Advances and Trends. NATO ASI Series, vol. 147. Springer, Berlin (1995). Germany, 464 p.
Karpov, A., Ronzhin, A.: A universal assistive technology with multimodal input and multimedia output interfaces. In: Stephanidis, C., Antona, M. (eds.) UAHCI 2014, Part I. LNCS, vol. 8513, pp. 369–378. Springer, Heidelberg (2014)
Karpov, A., Akarun, L., Yalçın, H., Ronzhin, Al., Demiröz, B., Çoban, A., Zelezny, M.: Audio-visual signal processing in a multimodal assisted living environment. In: Proceeding of 15th International Conference INTERSPEECH-2014, Singapore, pp. 1023–1027 (2014)
Karpov, A., Ronzhin, A., Kipyatkova, I.: An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision. In: Jacko, J.A. (ed.) Human-Computer Interaction, Part II, HCII 2011. LNCS, vol. 6762, pp. 454–463. Springer, Heidelberg (2011)
Basov, O.O.: Reasoning of the transition to polymodal infocommunicational systems. In: Distributed Computer and Communication Networks: Control, Computation, Communication. – DCCN-2015, pp. 19–22, October 2015
Saveliev, A., Basov, O., Ronzhin, A., Ronzhin, A.: Algorithms for low bit-rate coding with adaptation to statistical characteristics of speech signal. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 65–72. Springer, Heidelberg (2015)
Balatskaya, L.N., Choinzonov, E.L., Chizevskaya, S.Yu., Kostyuchenko, E.U., Meshcheryakov, R.V.: Software for assessing voice quality in rehabilitation of patients after surgical treatment of cancer of oral cavity, oropharynx and upper jaw. In: Železný, M., Habernal, I., Ronzhin, A. (eds.) SPECOM 2013. LNCS, vol. 8113, pp. 294–301. Springer, Heidelberg (2013)
Volf, D., Meshcheryakov, R., Kharchenko, S.: The singular estimation pitch tracker. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 454–462. Springer, Heidelberg (2015)
Karpov, A., Ronzhin, A., Kipyatkova, I.: An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision. In: Jacko, J.A. (ed.) Human-Computer Interaction, Part II, HCII 2011. LNCS, vol. 6762, pp. 454–463. Springer, Heidelberg (2011)
Potapova, R., Komalova, L., Bobrov, N.: Acoustic markers of emotional state “aggression”. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 55–64. Springer, Heidelberg (2015)
Acknowledgments
This work is partially supported by the Russian Foundation for Basic Research (grants № 16-08-00696-a, 15-07-06774-a).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Ronzhin, A.L., Basov, O.O., Motienko, A.I., Karpov, A.A., Mikhailov, Y.V., Zelezny, M. (2016). Multimodal Information Coding System for Wearable Devices of Advanced Uniform. In: Yamamoto, S. (eds) Human Interface and the Management of Information: Information, Design and Interaction. HIMI 2016. Lecture Notes in Computer Science(), vol 9734. Springer, Cham. https://doi.org/10.1007/978-3-319-40349-6_52
DOI: https://doi.org/10.1007/978-3-319-40349-6_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40348-9
Online ISBN: 978-3-319-40349-6
eBook Packages: Computer Science, Computer Science (R0)