Buffer Jitter
ETSI TR 126 935 V9.0.0 (2010-01)
Technical Report
Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Packet Switched (PS) conversational multimedia applications; Performance characterization of default codecs (3GPP TR 26.935 version 9.0.0 Release 9)
Reference
RTR/TSGS-0426935v900
Keywords
GSM, LTE, UMTS
ETSI
650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88
Important notice
Individual copies of the present document can be downloaded from: http://www.etsi.org

The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within the ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp
Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2010. All rights reserved.

DECT, PLUGTESTS, UMTS, TIPHON, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. LTE is a Trade Mark of ETSI currently being registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.
Foreword
This Technical Report (TR) has been produced by ETSI 3rd Generation Partnership Project (3GPP). The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables. The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.
Contents
Intellectual Property Rights ................................................................................................................................2
Foreword.............................................................................................................................................................2
Foreword.............................................................................................................................................................6
1 Scope
2 References
3 Abbreviations
3.1 Abbreviations
4 General Overview.....................................................................................................................................9
4.1 Introduction ........................................................................................................................................ 9
4.2 Tests over DCH radio channels .......................................................................................................... 9
4.3 Tests over HSDPA/EUL radio channels .......................................................................................... 10
5
5.1 5.2 5.2.1 5.2.2 5.2.3 5.2.3.1 5.2.3.2 5.2.3.3 5.2.4 5.2.5 5.2.6 5.2.6.1 5.2.6.2 5.3 5.4
6
6.1 6.1.1 6.1.2 6.1.3 6.2
7 Analysis of test results for DCH channels for Phase 1 and 2 .................................................................23
7.1 Conversation Tests ........................................................................................................................... 23
7.2 Experimental Design and Statistical Procedures .............................................................................. 24
7.3 Narrowband Test - Symmetric Conditions (Set 1) ........................................................................... 25
7.4 Narrowband Test - Asymmetric Conditions (Set 2)......................................................................... 31
7.5 Wideband Test - Symmetric Conditions (Set 3) .............................................................................. 33
7.6 Wideband Test - Asymmetric Conditions (Set 4) ............................................................................ 38
7.7 Phase 2 - ITU-T Codec Tests (Set 5) ................................................................................................ 41
7.8 Summary of Test Result Analysis .................................................................................................... 44
8.A.6 Test Results ................................................................................................................................. 47
8.A.7 Delay analysis ............................................................................................................................. 52
8.A.8 Listening only test conclusions ................................................................................................... 58
8.B Conversation Tests ........................................................................................................................... 58
8.B.1 Introduction................................................................................................................................. 58
8.B.2 The Test Plan .............................................................................................................................. 58
8.B.3 Cross-check of Test Lab Results................................................................................................. 60
8.B.4 Test Results ................................................................................................................................. 60
8.B.4.1 Mean Scores by Experiment and by Test Lab ....................................................................... 60
8.B.4.2 Subject Consistency Measures for Test Labs ........................................................................ 64
8.B.4.3 Multivariate Analysis of Variance (MANOVA) ................................................................... 64
8.B.4.3.1 MANOVA Results and Statistics .................................................................................... 65
8.B.4.3.2 Composite Scores - Conversational Quality ................................................................... 67
8.B.4.3.3 Conversational Quality by Experimental Factors ............................................................ 69
8.B.5 Conversation tests conclusions ................................................................................................... 73
9 Conclusions ............................................................................................................................................73
9.1 Tests over DCH radio channels ........................................................................................................ 73
9.2 Tests over HSDPA/EUL radio channels; listening only tests ........................................................... 73
9.3 Tests over HSDPA/EUL radio channels; conversation tests ............................................................ 73
9.4 General consideration ....................................................................................................................... 73
Annex A: Conversation test composite dependent variable scores by condition and Lab ..............74
Annex B: Instructions to subjects .........................................................................................76
Annex C: Example Scenarios for the conversation test ......................................................77
Annex D: Test Plan for the AMR Narrow-Band Packet Switched Conversation Test....................79
Annex E: Test Plan for the AMR Wide-Band Packet Switched Conversation Test ........................96
Annex F: Test plan for Packet Switched Conversation Tests for Comparison of Quality Offered by Different Speech Coders .................................................................................113
Annex G: Test Plan for Global Analysis of PSS Conversation Tests...............................................125
Annex H: Test Plan for Performance characterisation of VoIMS over HSDPA/EUL channels; listening only tests ...............................................................................................131
H.1 Introduction .................................................................................................................................... 131
H.2 Listening only test conditions ......................................................................................................... 131
H.3 End-to-end delay analysis .............................................................................................................. 132
H.4 Listening only experiments ............................................................................................................ 132
H.5 Test material processing ................................................................................................................. 133
Annex I:
I.1 I.2 I.2.1 I.2.2
Annex J:
J.1 J.2 J.3 J.4 J.5 J.6 J.7 J.8 J.9
Annex K:
Annex L: Test Plan for the AMR NB/WB Conversation Test in UMTS over HSDPA/EUL ........154
L.1 Introduction .................................................................................................................................... 154
L.2 General Information ....................................................................................................................... 154
L.2.1 Permanent Documents .............................................................................................................. 154
L.2.2 Key Acronyms .......................................................................................................................... 154
L.2.3 Contacts .................................................................................................................................... 155
L.2.4 Participants ............................................................................................................................... 155
L.3 Test Methodology........................................................................................................................... 156
L.3.1 Introduction............................................................................................................................... 156
L.3.2 Test Design ............................................................................................................................... 157
L.3.2.1 Description of the Test Bed................................................................................................. 157
L.3.2.2 Transmission System .......................................................................................................... 158
L.3.2.3 Radio Access Bearers .......................................................................................................... 159
L.3.2.4 Test environment................................................................................................................. 159
L.3.3 Test Conditions ......................................................................................................................... 160
L.4 Test Procedure ................................................................................................................................ 164
L.4.1 Time Projection ........................................................................................................................ 164
L.4.2 Instructions to the Subjects ....................................................................................................... 165
L.4.3 Test Materials ........................................................................................................................... 166
L.4.4 Deliverables .............................................................................................................................. 167
L.4.5 Data Analysis ............................................................................................................................ 167
L.5 Working Document for the Performance Characterization of VoIMS over HSDPA/EDCH ......... 167
L.5.1 Introduction............................................................................................................................... 167
L.5.2 System Overview ...................................................................................................................... 167
L.5.3 Radio Access Bearers ............................................................................................................... 169
L.5.4 Delay ......................................................................................................................................... 172
L.5.5 RN Simulator ............................................................................................................................ 173
L.5.6 Core Network............................................................................................................................ 174
L.5.7 VoIP Client ............................................................................................................................... 174
L.5.8 Interfaces................................................................................................................................... 176
L.5.8.1 Interface 1 ........................................................................................................................... 177
L.5.8.2 Interface 2 ........................................................................................................................... 177
L.5.8.3 Interface 3 ........................................................................................................................... 178
L.5.9 Simulated HSPA Air-Interface ................................................................................................. 178
L.5.9.1 General Description ............................................................................................................ 178
L.5.9.2 Error-Delay Profiles ........................................................................................................... 179
L.5.A.1 Network Parameters ............................................................................................................ 182
L.5.A.2 Traffic Assumptions (example: AMR 7.95)........................................................................ 183
L.5.A.3 Other Assumptions .............................................................................................................. 184
L.5.A.4 Simulation Methodology ..................................................................................................... 185
History ............................................................................................................................................................187
Foreword
This Technical Report has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
  1 presented to TSG for information;
  2 presented to TSG for approval;
  3 or greater indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z the third digit is incremented when editorial only changes have been incorporated in the document.
1 Scope
The present document provides information on the performance of default speech codecs in packet switched conversational multimedia applications. The codecs under test are AMR-NB (Adaptive Multi-Rate Narrowband) and AMR-WB (Adaptive Multi-Rate Wideband). In addition, several ITU-T codecs (G.723.1, G.729, G.722 and G.711) are included in the testing. Experimental results from the speech quality testing are reported to illustrate the behaviour of these codecs. The results give information about the performance of PS conversational multimedia applications under various operating and transmission conditions (e.g. radio transmission errors, IP packet losses, end-to-end delays, and several types of background noise). The performance results can be used, e.g., as guidance for network planning and to appropriately adjust radio network parameters.
2 References
References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific. For a specific reference, subsequent revisions do not apply. For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

[1] ITU-T Recommendation P.800: "Methods for Subjective Determination of Transmission Quality".
[2] ITU-T Recommendation P.831: "Subjective performance evaluation of network echo cancellers".
[3] ITU-T Recommendation G.711: "Pulse code modulation (PCM) of voice frequencies".
[4] ITU-T Recommendation G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP)".
[5] ITU-T Recommendation G.723.1: "Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s".
[6] ITU-T Recommendation G.722: "7 kHz audio-coding within 64 kbit/s".
[7] IETF RFC 1889: "RTP: A Transport Protocol for Real-Time Applications".
[8] IETF RFC 3267: "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs".
[9] 3GPP TS 34.121: "Terminal Conformance Specification, Radio Transmission and Reception (FDD)" (downlink).
[10] 3GPP TS 25.141: "Base Station (BS) conformance testing (FDD)" (uplink).
[11] 3GPP TR 25.853: "Delay budget within the access stratum".
[12] 3GPP TS 26.235: "Packet switched conversational multimedia applications; Default codecs".
[13] 3GPP TS 26.071: "AMR speech Codec; General description".
[14] 3GPP TS 26.171: "AMR speech codec, wideband; General description".
[15] 3GPP TS 25.322: "Radio Link Control (RLC) protocol specification".
[16] IETF RFC 3095: "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed".
[17] 3GPP TS 34.108: "Common test environments for User Equipment (UE) conformance testing".
[18] ETSI TR 101 112: "Universal Mobile Telecommunications System (UMTS); Selection procedures for the choice of radio transmission technologies of the UMTS" (UMTS 30.03 v3.1.0).
[19] 3GPP TS 26.114: "IP Multimedia Subsystem (IMS); Multimedia Telephony; Media handling and interaction".
[20] ITU-T Recommendation P.805: "Subjective evaluation of conversational quality".
3 Abbreviations

3.1 Abbreviations
For the purposes of the present document, the following abbreviations apply:

AMR-NB (or AMR)  Adaptive Multi-Rate Narrowband Speech Codec
AMR-WB     Adaptive Multi-Rate Wideband Speech Codec
ANOVA      Analysis of Variance
ASY        ASYmmetric conditions
BLER       Block Error Rate
CDF        Cumulative Distribution Function
CMR        Codec Mode Request
COND       Test CONDitions
CN         Core Network
CQ         Conversational Quality
CRC        Cyclic Redundancy Check
DCH        Dedicated Channel
DL         Downlink
DMOS       Degradation Mean Opinion Score
DPCH       Dedicated Physical Channel
DTCH       Dedicated Traffic Channel
Eb/No      Ratio of energy per modulating bit to the noise spectral density
EID        Error Insertion Device
FER        Frame Erasure Rate, Frame Error Rate
GAL        Global Analysis Laboratory
GQ         Global Quality (of the conversation)
HM         High Mobility
HT         High Traffic
HSDPA/EUL  High Speed Downlink Packet Access/Enhanced UpLink
IA         InterAction (with your partner)
IP         Internet Protocol
ITU-T      International Telecommunication Union - Telecommunications Standardization Sector
JBM        Jitter Buffer Management
LAB        Listening LABoratory
LM         Low Mobility
LT         Low Traffic
MAC        Medium Access Control
MANOVA     Multivariate Analysis of Variance
Log-MAP    Logarithmic Maximum A Posteriori
MOS        Mean Opinion Score
NB         Narrowband
PC         PerCeption of impairments (also: Personal Computer)
PDCP       Packet Data Convergence Protocol
PDU        Protocol Data Unit
Pa         Sound Pressure Level (in Pascal)
PL         Packet Loss
plc        Packet Loss Concealment
RC         Radio Conditions
PS         Packet Switched
ETSI
RB         Radio Bearer
RAB        Radio Access Bearer
RCV        Receive
RLC        Radio Link Control
ROHC       Robust Header Compression
RRM        Radio Resource Management
RTCP       Real-Time Control Protocol
RTP        Real-time Transport Protocol
SYM        SYMmetric conditions
TB size    Transport Block size
TF         Transport Format
ToC        Table of Content
TrCH       Transport Channel
TTI        Transmission Time Interval
UDP        User Datagram Protocol
UE         User Equipment
UL         Uplink
UM         Unacknowledged Mode
UMD        Unacknowledged Mode Data
US         difficulty UnderStanding (your partner)
VOIP       Voice over IP
VQ         Voice Quality (of your partner)
WB         Wideband
XMIT       Transmit
4 General Overview

4.1 Introduction
The performance of default speech codecs (AMR-NB and AMR-WB) for packet switched conversational multimedia [12, 19] was characterised over DCH channels and over HSDPA/EUL radio channels. The testing over DCH channels was carried out from October 2003 until February 2004. Further subjective testing was carried out from June until October 2007 in order to characterize the performance over HSDPA/EUL radio channels. The main purpose of the latter testing was to evaluate and verify adequate performance of the AMR-NB and AMR-WB speech codecs used as defined in IMS Multimedia Telephony TS 26.114 [19] with a specific focus on jitter buffer management.
4.2 Tests over DCH radio channels

The tests over DCH channels were separated into two phases: Phase 1 considered the default speech codecs AMR-NB [13] and AMR-WB [14] in various operating conditions. Phase 2 also considered several other codecs, including the ITU-T codecs G.723.1 [5], G.729 [4], G.722 [6] and G.711 [3].

In Phase 1, France Telecom R&D acted as host laboratory. The subjective testing laboratories were ARCON for the North American English language, France Telecom R&D for the French language and NTT-AT for the Japanese language. Phase 1 tests consisted of 24 test conditions both for the AMR codec (modes 6.7 and 12.2 kbit/s) and the AMR-WB codec (modes 12.65 and 15.85 kbit/s), with error conditions covering IP packet loss of 0% and 3%, and radio conditions with BLER (Block Error Rate) of 10^-2, 10^-3 and 5x10^-4. End-to-end delays of 300 and 500 ms were covered. Robust Header Compression (ROHC), an optional UMTS functionality, was included for some test cases for AMR-WB. Three types of background noise were used: car, street and cafeteria.

In Phase 2, France Telecom R&D acted as host and listening laboratory. Two languages were used (French and Arabic). The following codecs were tested: AMR-NB (modes 6.7 and 12.2 kbit/s), AMR-WB (modes 12.65 and 15.85 kbit/s), ITU-T G.723.1 (mode 6.4 kbit/s), ITU-T G.729 (mode 8 kbit/s), ITU-T G.722 (mode 64 kbit/s) and ITU-T G.711 (64 kbit/s). Transmission error conditions covered IP packet loss of 0% and 3%.
Siemens provided the real-time air interface simulator for Phase 1. France Telecom provided the IP core network simulator and terminal simulator used in both Phase 1 and Phase 2. IPv6 was employed in the testing. (IPv6 is fully simulated over the radio interface. The CN simulator employs IPv4, but since the only impact is a marginal difference in the end-to-end delay - of the order of ~16 µs - the use of a particular IP version in the CN part has no impact on the performance results.) These tests were the first conversational tests ever conducted in any standardization body. Performance evaluation consisted of assessing five aspects: 1) voice quality, 2) difficulty of understanding words, 3) quality of interaction, 4) degree of impairments, and 5) global communication quality. A 5-category rating scale was used for each aspect. Dynastat performed the global analysis for Phases 1 and 2. The results are contained in Clause 7.
4.3 Tests over HSDPA/EUL radio channels
These listening-only tests characterized the performance of the AMR-NB and AMR-WB speech codecs over HSDPA/EUL channels when conducting buffer adaptation to network delay variations using a simple jitter buffer management (JBM) algorithm. The tests focused on the effect of channel errors and channel jitter on speech quality rather than on the impact of overall end-to-end delay in speech conversation. The end-to-end delay impact was considered separately by conducting a delay analysis on the whole processed test material. The subjective listening-only tests were conducted in the Finnish and Swedish languages at Nokia and Ericsson, respectively. The tests consisted of eight different channel conditions in clean speech and in background noise conditions. AMR-NB was tested in the 12.2 and 5.9 kbit/s modes, and AMR-WB at 12.65 kbit/s. A key objective was to evaluate the performance of adaptive JBM operation in HSDPA/EUL channel conditions. The applied adaptive jitter buffer was a simple implementation conducting buffer adaptation mainly during discontinuous transmission, i.e. speech pauses, and not using any time-scaling operation. A non-implementable fixed jitter buffer with full a priori knowledge of the channel characteristics was used as a reference. Although the average end-to-end delays of the adaptive and fixed jitter buffers were the same, the number and locations of jitter-buffer-induced frame losses differed depending on the channel conditions. The results are contained in Clause 8A.

A program of Conversation Tests was organized to evaluate the performance of AMR-NB and AMR-WB for UMTS over HSDPA/EUL. Three test labs were contracted to conduct the conversation tests and deliver raw voting data to Dynastat, the Global Analysis Lab (GAL), for processing and statistical analysis. Three conversation tests were conducted in each of the three test labs.
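The adaptive-versus-fixed jitter buffer trade-off described for the listening-only tests can be illustrated with a toy model (this is not the JBM algorithm used in the tests; the buffer depths, the adaptation rule and the jitter trace are invented for illustration). A fixed buffer sized for the early part of a call keeps losing frames once channel jitter grows, while even a crude buffer that re-sizes itself during speech pauses recovers:

```python
import random

def simulate_buffers(jitter_ms, fixed_depth_ms, frame_ms=20):
    """Count frames arriving too late to be played out.

    jitter_ms: per-frame network delay jitter (ms beyond the minimum delay).
    A frame is lost when its jitter exceeds the current buffer depth.
    """
    fixed_losses = adaptive_losses = 0
    depth = fixed_depth_ms                     # adaptive buffer starting depth
    for i, j in enumerate(jitter_ms):
        if j > fixed_depth_ms:
            fixed_losses += 1
        if j > depth:
            adaptive_losses += 1
        # crude adaptation "during speech pauses": once per ~1 s of frames,
        # re-size the buffer toward the largest recently observed jitter
        if i % 50 == 49:
            recent = max(jitter_ms[i - 49:i + 1])
            depth = max(frame_ms, min(recent + frame_ms, 200))
    return fixed_losses, adaptive_losses

random.seed(1)
# jitter grows mid-call, e.g. after the radio channel degrades
jitter = [random.uniform(0, 30) for _ in range(250)] + \
         [random.uniform(0, 120) for _ in range(250)]
fixed, adaptive = simulate_buffers(jitter, fixed_depth_ms=60)
print(fixed, adaptive)
```

Note that in the actual tests the fixed reference buffer was sized with full a priori knowledge of the channel, so this sketch only illustrates the adaptation mechanism, not the test setup itself.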
The test labs were FTRD, testing in the French language; BIT, testing in the Chinese language; and Dynastat, testing in North American English. Each of the three conversation tests involved a different speech codec:

Exp. 1: AMR operating at 5.9 kbit/s
Exp. 2: AMR operating at 12.2 kbit/s
Exp. 3: AMR-WB operating at 12.65 kbit/s

The experiments were conducted according to the specifications contained in the ITU-T Recommendation for Conversation Testing, P.805 [20]. Alcatel-Lucent provided the network impairment simulation test-bed. The raw voting data for each test lab and each experiment were delivered to the GAL. The GAL conducted statistical analyses on the raw voting data, and the results of those analyses are contained in Clause 8B.
This section describes the test plan for Phase 1 of the conversation test of AMR-NB (AMR) and AMR-WB in PS networks. All the laboratories participating in this conversation test phase used the same test plan; only the language of the conversation changed. Although the test rooms and test equipment were not exactly the same in all the laboratories, the calibration procedures and the characteristics and performance of the test equipment guaranteed the similarity of the test conditions. Annex B contains the instructions for the subjects participating in the conversation tests.
5.1 Test methodology
The protocol described below evaluates the effect of degradations such as delay and dropped packets on the quality of the communications. It corresponds to the conversation-opinion tests recommended by ITU-T P.800 [1]. First of all, conversation-opinion tests place the subjects in a more realistic situation, close to the actual service conditions experienced by telephone customers. In addition, conversation-opinion tests are suited to assessing the effects of impairments that can cause difficulty while conversing (such as delay). Subjects participate in the test in pairs; they are seated in separate sound-proof rooms and are asked to hold a conversation through the transmission chain implemented by means of UMTS simulators. Communications are impaired by means of an IP impairments simulator, part of the CN simulator, and by the air interface simulator, as shown in Figure 1. The network configurations (including the terminal equipment) are symmetrical (in the two transmission paths). The only asymmetry is due to the presence of background noise in one of the test rooms.
5.2 Test arrangement

5.2.1 Description of the testing system
Figure 1: Testing system - Terminal 1 and Terminal 5 exchange packets ("packets sent to 5" / "packets sent to 1") over an IP network subject to IP network perturbations
The PS audio communication has been simulated using 5 PCs as shown in Figure 2.
Figure 2: Simulation Platform

PC 1 and PC 5 run under Windows OS with the VOIP Terminal Simulator Software of France Telecom R&D. PC 2 and PC 4 run under Linux OS with the Air Interface Simulator from Siemens AG, and PC 3 runs under WinNT OS with the Network Simulator Software (NetDisturb). The platform simulates a PS interactive communication between two users using PC 1 and PC 5 as their respective VOIP terminals. PC 1 sends AMR (or AMR-WB) encoded packets, encapsulated with IP/UDP/RTP headers, to PC 5, and receives IP/UDP/RTP audio packets from PC 5. In practice, the packets created in PC 1 are sent to PC 2. PC 2 simulates the air interface uplink (UL) transmission and then forwards the transmitted packets to PC 4. In the same way, PC 4 simulates the air interface downlink (DL) transmission and then forwards the packets to PC 5. PC 5 decodes and plays the speech back to the listener.
5.2.2 Network simulator
The core network simulator, as implemented, works under IPv4. However, as the core network simulator acts only on packets (loss, delay, etc.), the use of IPv4 or IPv6 is equivalent in this conversation test context. Considering the network perturbations introduced by the simulator and the context of interactive communications, a simulation using an IPv4 perturbation network simulator is adequate to reproduce the behaviour of an IPv6 core network. Figure 3 shows the network simulator parameters that can be modified.
Figure 3: IP simulator interface

On both links, delay and loss laws can be chosen, and the two links can be treated separately or in the same way. Delay, for example, can be set to a fixed value or follow another law such as an exponential law. In these tests only the loss law and the delay law were varied: the delay was set to 0 or 200 ms, and the loss ratio to 0% or 3% under a bursty law. Both links were treated in the same way.
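The "bursty" loss law is not further specified in this report (NetDisturb's exact implementation is proprietary), but a two-state Gilbert model is a common way to realize such bursty loss. The sketch below is illustrative only; the transition probabilities are assumptions chosen to give roughly 3% average loss.

```python
import random

# Two-state (Gilbert) bursty loss sketch: "good" packets pass, "bad" bursts
# drop consecutive packets. Transition probabilities are illustrative values,
# not NetDisturb's actual parameters.
def gilbert_losses(n, p_good_to_bad=0.01, p_bad_to_good=0.30, seed=1):
    """Return a list of booleans, True = packet lost."""
    rng = random.Random(seed)
    bad = False
    losses = []
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_bad_to_good   # stay inside the loss burst
        else:
            bad = rng.random() < p_good_to_bad    # enter a loss burst
        losses.append(bad)
    return losses

loss = gilbert_losses(100_000)
# Stationary loss rate = p_gb / (p_gb + p_bg) = 0.01 / 0.31, i.e. about 3.2%
print(f"{100 * sum(loss) / len(loss):.1f} %")
```

The mean loss ratio matches the stationary probability of the bad state, while losses arrive in short bursts whose mean length is 1/p_bad_to_good packets.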
5.2.3
The transmission of IP/UDP/RTP/AMR (or AMR-WB) packets over the UMTS air interface is simulated using the RAB described in Section 5.2.3.1. The required functions of the RLC layer are implemented according to [15] and work in real-time. The underlying Physical Layer is simulated offline. Error patterns of block errors (i.e. discarded RLC PDUs) are inserted in the real-time simulation as described in Section 5.2.3.2. For more details on the parameter settings of the Physical Layer simulations see Section 5.2.3.3.
5.2.3.1
For the narrowband conversational tests, speech is encoded with AMR at a maximum rate of 12.2 kbit/s. The bitstream is encapsulated using the IP/UDP/RTP protocols. The air interface simulator receives IPv4 packets from the CN simulator; the RTP packets are extracted and, before transmission over the air interface, IPv6/UDP headers are inserted, so that real IPv6 packets are transmitted over the air interface simulator. The payload format is the following:
- the RTP payload format for AMR-NB (cf. [8]) is used;
- bandwidth-efficient mode is used;
- one speech frame is encapsulated in each RTP packet;
- interleaving is not used;
- the payload header consists of the 4 bits of the CMR (Codec Mode Request), followed by 6 bits for the ToC (Table of Contents).
For IPv4, this corresponds to a maximum of 72 bytes per frame, i.e. 28.8 kbit/s. This rises to 92 bytes (36.8 kbit/s) when the IPv6 protocol is used on the air interface.
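These packet sizes follow directly from the frame size and header overheads; the short calculation below reproduces them. It is a sketch using only figures from the text above (20 ms frames, a 10-bit CMR+ToC payload header, and 20/40-byte IPv4/IPv6 headers plus 8-byte UDP and 12-byte RTP headers).

```python
import math

def packet_bytes(codec_bits_per_s, ip_header_bytes, frame_ms=20, payload_hdr_bits=10):
    """Bytes per packet: one speech frame + CMR/ToC payload header + IP/UDP/RTP."""
    frame_bits = codec_bits_per_s * frame_ms // 1000           # 12.2 kbit/s -> 244 bits
    payload_bytes = math.ceil((frame_bits + payload_hdr_bits) / 8)
    return payload_bytes + ip_header_bytes + 8 + 12            # + UDP(8) + RTP(12)

def rate_kbit_s(total_bytes, frame_ms=20):
    """Resulting bit rate when one packet is sent per speech frame."""
    return total_bytes * 8 / frame_ms

# AMR-NB 12.2 kbit/s: IPv4 (20-byte IP header) vs IPv6 (40-byte IP header)
print(packet_bytes(12200, 20), rate_kbit_s(packet_bytes(12200, 20)))   # 72 28.8
print(packet_bytes(12200, 40), rate_kbit_s(packet_bytes(12200, 40)))   # 92 36.8
```

The same function reproduces the wideband figures quoted below: 81 bytes (32.4 kbit/s) and 101 bytes (40.4 kbit/s) for AMR-WB at 15.85 kbit/s over IPv4 and IPv6 respectively.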
RTCP packets are sent. However, in the test conditions defined in the conversation test plans RTCP is not mandatory, since this is not a multicast environment (cf. [7]); RTCP reports were therefore sent but not used. ROHC is an optional functionality in UMTS. In order to reduce the size of the tests and the number of conditions, the ROHC algorithm is not used for the AMR-NB conversation test; this functionality is only tested in the wideband conditions. For the WB conversational tests, AMR-WB encodes speech at a maximum of 15.85 kbit/s. The bitstream is encapsulated and transmitted in the same way as in the NB case. For IPv4, a maximum of 81 bytes per frame is transmitted (41 bytes for the AMR-WB frame and its payload header plus the 40 bytes of IP/UDP/RTP headers), i.e. 32.4 kbit/s; this rises to 101 bytes (40.4 kbit/s) when the IPv6 protocol is used on the air interface. The ROHC algorithm is supported in the AMR-WB conversation test for the 12.65 kbit/s and 15.85 kbit/s modes. Header compression is applied to the IP/UDP/RTP headers (profile 1). ROHC starts in unidirectional mode and switches to bidirectional mode as soon as a packet has reached the decompressor and it has replied with a feedback packet indicating that a mode transition is desired. The Conversational / Speech / UL:46 DL:46 kbps / PS RAB from [17] was used. It is not an optimal RAB for a PS conversational test, but it was the only one available at the time the test bed and the air interface simulator were designed. The RAB description is given in Table 1.

Table 1: RAB description
Higher layer   RAB/Signalling RB                                           RAB
PDCP           PDCP header size, bit                                       8
RLC            Logical channel type                                        DTCH
               RLC mode                                                    UM
               Payload sizes, bit                                          920, 304, 96
               Max data rate, bps                                          46000
               UMD PDU header, bit                                         8
MAC            MAC header, bit                                             0
               MAC multiplexing                                            N/A
Layer 1        TrCH type                                                   DCH
               TB sizes, bit                                               928, 312, 104
               TFS   TF0, bits                                             0x928
                     TF1, bits                                             1x104
                     TF2, bits                                             1x312
                     TF3, bits                                             1x928
               TTI, ms                                                     20
               Coding type                                                 TC
               CRC, bit                                                    16
               Max number of bits/TTI after channel coding                 2844
               Uplink: Max number of bits/radio frame before rate matching 1422
               RM attribute                                                180-220
5.2.3.2
The UMTS air interface simulator (implemented on PCs 2 and 4) receives IP/UDP/RTP/AMR (or AMR-WB) packets on a specified port of the network card (see Figure 4). The packets are passed to the transmission buffer of the RLC layer, which works in Unacknowledged Mode (UM). The RLC segments or concatenates the IP bitstream into RLC PDUs, adding the appropriate RLC headers (sequence number and length indicators). It is assumed that Transport Format TF3 is always chosen on the physical layer, giving an RLC PDU length of 928 bits including the header. In the regular case, one IP packet is placed into an RLC PDU, which is then filled up with padding bits; due to packets delayed by the network simulator, it may also happen that there is no IP packet, or more than one, in the RLC transmission buffer for the current TTI. An RLC PDU is formed in each 20 ms TTI and passed to the error insertion block, which decides whether the RLC PDU is transmitted successfully over the air interface or discarded due to a block error after channel decoding. The physical layer is not simulated in real time; instead, error pattern files are provided. The error patterns of the air interface transmission are simulated offline according to the settings given in Section 5.2.3.3. They consist of binary decisions for each transmitted RLC PDU, resulting in a certain BLER.
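The PDU handling just described can be sketched in a few lines. This is an illustration only (not the actual Siemens simulator code), covering the regular case of one IP packet per 20 ms TTI mapped onto a 928-bit RLC UM PDU, with an offline error pattern deciding which PDUs survive.

```python
# Illustrative sketch of the RLC UM flow described above (not the actual
# simulator): one IP packet per 20 ms TTI goes into a 928-bit PDU (8-bit
# UMD header + payload + padding); an offline error pattern marks lost PDUs.

RLC_PDU_BITS = 928      # TF3 PDU size including header (see Table 1)
RLC_HEADER_BITS = 8     # UMD PDU header

def transmit(ip_packet_sizes_bits, error_pattern):
    """Return the indices (RLC sequence numbers) of the IP packets delivered."""
    delivered = []
    for sn, (size, lost) in enumerate(zip(ip_packet_sizes_bits, error_pattern)):
        # regular case: one IP packet fits in one PDU, the remainder is padding
        assert size <= RLC_PDU_BITS - RLC_HEADER_BITS, "would need segmentation"
        if not lost:
            delivered.append(sn)
    return delivered

# Five 72-byte (576-bit) packets; PDU #2 is hit by a block error, so one
# IP packet (and hence one AMR frame) is lost: FER = 20%
print(transmit([576] * 5, [0, 0, 1, 0, 0]))  # [0, 1, 3, 4]
```

On the receiving side, the gap in the delivered sequence numbers is exactly how the real RLC detects discarded PDUs, as described in the next paragraphs.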
After the error pattern insertion, the RLC of the air interface receiver side receives the RLC PDUs in its reception buffer. The sequence numbers in the RLC headers are checked to detect when RLC PDUs have been discarded due to block errors. A discarded RLC PDU can result in one or more lost IP packets, which translates into a certain packet loss rate of the IP packets and thereby a certain FER of the AMR (or AMR-WB) frames. The IP/UDP/RTP/AMR (or AMR-WB) packets are reassembled and transmitted to the next PC: the network simulator (PC 3) in the case of uplink transmission, or one of the terminals (PC 1 or PC 5) in the case of downlink transmission.
Figure 4: RLC layer of the air interface simulator (transmission buffer, reassembly and removal of RLC headers from the IP/UDP/RTP/AMR packets)
5.2.3.3
The parameters of the physical layer simulation were set according to the parameters for a DCH in multipath fading conditions given in [9] for the downlink and [10] for the uplink. The TB size is 928 bits and the Turbo decoder uses the Log-MAP algorithm with 4 iterations. The rake receiver has 6 fingers at 60 possible positions. The different channel conditions given in Tables 2, 3 and 4 were extracted from [18] (Selection procedures for the choice of radio transmission technologies of the UMTS).
Table 5 (DL) and Table 6 (UL) show approximate results of the air interface simulation corresponding to the considered BLERs.

Table 5: Downlink performance - approximate required power for the different channels and BLERs

Channel (geometry factor Ior/Ioc):
- Indoor, 3 km/h (Ior/Ioc = 9 dB)
- Outdoor to Indoor, 3 km/h (Ior/Ioc = 9 dB)
- Vehicular, 50 km/h (Ior/Ioc = -3 dB)
- Vehicular, 120 km/h (Ior/Ioc = -3 dB)
Table 6: Uplink performance - approximate Eb/N0 for the different channels and BLER

Channel                      Eb/N0 (BLER 5x10^-2)
Indoor, 3 km/h               3.9 dB
Outdoor to Indoor, 3 km/h    3.7 dB
Vehicular, 50 km/h           -0.9 dB
Vehicular, 120 km/h          0.2 dB
Outdoor to Indoor channel was used for uplink and downlink in the simulations.
5.2.4
To avoid echo problems, headsets were used instead of handsets. The monaural headsets are connected to the sound cards of the PCs supporting the speech codec simulators. The sound level in the earphones can be adjusted by the users if needed, but in practice the original settings, defined during the preliminary tests and producing a comfortable listening level, were not modified. The microphones are protected by a foam ball in order to reduce the "pop" effect, and users are advised not to place the acoustic opening of the microphone directly in front of the mouth.
5.2.5  Test environment
Each of the two subjects participating in a conversation is installed in a test room, seated in an armchair in front of a table. The test rooms are acoustically insulated. All the test equipment is installed in a third room connected to the test rooms. When needed, background noise is generated in the appropriate test room through a set of 4 loudspeakers. The background noise level is adjusted and controlled with a sound level meter, whose measurement microphone is located at the equivalent of the centre of the subject's head. The noise level is A-weighted.
5.2.6
5.2.6.1
Before the beginning of a set of experiments, the end-to-end transmission level is checked subjectively to ensure that there is no problem. If the speech level needs to be checked, the following procedure is applied: an artificial mouth placed in front of the microphone of Headset A, in the LRGP position (see ITU-T Rec. P.64), generates the nominal level in the artificial ear (according to ITU-T Rec. P.57) coupled to the earphone of Headset B. If necessary, the level is adjusted with the receiving volume control of the headset. A similar calibration is done with Headsets A and B inverted.
5.2.6.2  Delay
The overall delay (from the input of sound card A to the output of sound card B) is calculated as shown below. On the air interface side, the simulator only receives packets on its network card, processes them and forwards them every 20 ms to the following PC; only processing delay and a possible jitter delay can be added (when a packet arrives just after the sending window of the air interface). The delay budget is the following:
- Encoder side (framing, look-ahead, processing and packetization): 45 ms
- Uplink delay between UE and Iu: 84.4 ms (see [11])
- Core network delay: a few ms
- Routing through IP: depending on the number of routers
- Downlink delay between Iu and UE: 71.8 ms (see [11])
- Decoder side (jitter buffer, de-packetization and processing): 40 ms
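Summing the budget gives the order of magnitude of the end-to-end delay. In this sketch, the core network and IP routing terms, given only as "a few ms" above, are assumed to be 5 ms for illustration.

```python
# End-to-end mouth-to-ear budget assembled from the figures above. The core
# network and IP routing contribution is only "a few ms" in the text, so the
# 5 ms used here is an assumption for illustration.
budget_ms = {
    "encoder (framing, look-ahead, processing, packetization)": 45.0,
    "uplink UE -> Iu": 84.4,
    "core network + IP routing (assumed)": 5.0,
    "downlink Iu -> UE": 71.8,
    "decoder (jitter buffer, de-packetization, processing)": 40.0,
}
total = sum(budget_ms.values())
print(f"{total:.1f} ms")  # about 246 ms
```

The result is close to the fixed delay of around 250 ms used for the Phase 2 conditions (see Section 6.1.3).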
5.3
Tables 7 to 9 summarise the test conditions used for AMR-NB testing. For both the AMR-NB and AMR-WB codecs, two representative modes were chosen for testing. The lowest codec modes (such as AMR-NB 4.75) were not included, since they are mainly intended to be used temporarily to cope with poor radio conditions, and they were expected to provide insufficient quality for conversational applications if used throughout the call (as done in these characterisation tests).
Table 7: Test conditions for AMR-NB
Cond.  Noise Room A  Noise Room B  Radio cond. (BLER)  Mode + delay                Jitter (ms)
1      No            No            10^-2               6.7 kbit/s (delay 300 ms)   5
2      No            No            10^-2               12.2 kbit/s (delay 500 ms)  10
3      No            No            10^-2               12.2 kbit/s (delay 300 ms)  5
4      No            No            10^-2               6.7 kbit/s (delay 300 ms)   10
5      No            No            10^-2               12.2 kbit/s (delay 500 ms)  5
6      No            No            10^-2               12.2 kbit/s (delay 300 ms)  10
7      No            No            10^-3               6.7 kbit/s (delay 300 ms)   5
8      No            No            10^-3               12.2 kbit/s (delay 500 ms)  10
9      No            No            10^-3               12.2 kbit/s (delay 300 ms)  5
10     No            No            10^-3               6.7 kbit/s (delay 300 ms)   10
11     No            No            10^-3               12.2 kbit/s (delay 500 ms)  5
12     No            No            10^-3               12.2 kbit/s (delay 300 ms)  10
13     No            No            5x10^-4             6.7 kbit/s (delay 300 ms)   5
14     No            No            5x10^-4             12.2 kbit/s (delay 500 ms)  10
15     No            No            5x10^-4             12.2 kbit/s (delay 300 ms)  5
16     No            No            5x10^-4             6.7 kbit/s (delay 300 ms)   10
17     No            No            5x10^-4             12.2 kbit/s (delay 500 ms)  5
18     No            No            5x10^-4             12.2 kbit/s (delay 300 ms)  10
19     Car           No            5x10^-4             12.2 kbit/s (delay 300 ms)  5
20     No            Car           5x10^-4             12.2 kbit/s (delay 300 ms)  10
21     Cafeteria     No            5x10^-4             6.7 kbit/s (delay 300 ms)   5
22     No            Cafeteria     5x10^-4             6.7 kbit/s (delay 300 ms)   10
23     Street        No            5x10^-4             12.2 kbit/s (delay 500 ms)  5
24     No            Street        5x10^-4             12.2 kbit/s (delay 500 ms)  10
5.4
For the AMR-WB test, the symmetric conditions 1 to 18 cycle through three modes - 12,65 kbit/s with ROHC, 12,65 kbit/s without ROHC and 15,85 kbit/s with ROHC - across the combinations of radio conditions and packet loss.
Experimental factors - Mode: for the asymmetric conditions, 12,65 kbit/s with ROHC (conditions 19 and 20), 12,65 kbit/s without ROHC (conditions 21 and 22) and 15,85 kbit/s with ROHC (conditions 23 and 24).
6
Phase 2 of the listening test was conducted by a single listening test laboratory (FT R&D). The speech coders used in this test are listed in Table 14.
As there is no standardized packet loss concealment for G.711 and G.722, proprietary packet loss concealment algorithms were used for these codecs. The simulated network was tested under two values of IP packet loss (0% and 3%). The testing was done in one test laboratory only, but in two different languages (Arabic and French). Each IP packet contains 20 ms of speech, except for G.723.1, where an IP packet contains 30 ms of speech; for G.729, the 20 ms packet consists of two 10 ms frames. The test methodology was the same as in Phase 1. Annex B contains the instructions for the subjects participating in the conversation tests.
6.1  Test arrangement
6.1.1  Description of the proposed testing system
PC 1 and PC 5 run under Windows OS with the VOIP Terminal Simulator Software of France Telecom R&D, and PC 3 runs under WinNT OS with the Network Simulator Software (NetDisturb). The platform simulates a packet-switched interactive communication between two users, with PC 1 and PC 5 acting as their respective VOIP terminals. PC 1 sends encoded packets, encapsulated with IP/UDP/RTP headers, to PC 5 and receives IP/UDP/RTP audio packets from PC 5.
6.1.2  Network simulator
The core network simulator is the same as the one presented in Section 5, and the parameters that can be modified are presented in Figure 3 (Section 5.2.2). In this test only the "loss law" takes two values; all other settings are fixed. On both links, delay and loss laws can be chosen, and the two links can be treated separately or in the same way; delay, for example, can be set to a fixed value or follow another law such as an exponential law. Here only the loss law was varied, with values of 0% or 3% under a bursty law, and both links were treated in the same way.

Headsets were again used to reduce echo problems. The monaural headsets are connected to the sound cards of the PCs supporting the different codecs. The sound level in the earphones can be adjusted by the users if needed, but in practice the original settings, defined during the preliminary tests and producing a comfortable listening level, were not modified. The microphones are protected by a foam ball in order to reduce the "pop" effect, and users are advised not to place the acoustic opening of the microphone directly in front of the mouth.

The same test environment as in test Phase 1 is used. Each of the two subjects participating in a conversation is installed in a test room, seated in an armchair in front of a table. The test rooms are acoustically insulated, and all the test equipment is installed in a third room connected to the test rooms. The background noise level is checked with a sound level meter, whose measurement microphone is located at the equivalent of the centre of the subject's head. The noise level is A-weighted.
6.1.3
The speech level checking is done in the same way as for Phase 1 (see Section 5.2.6.1). The overall delay (from the input of sound card A to the output of sound card B) is adjusted for each test condition, taking into account the delay of the codec concerned, in order to obtain a fixed delay of around 250 ms. This value is close to the hypothetical delay computed for AMR-NB and AMR-WB through the UMTS network.
6.2  Test Conditions
Table 14: Test conditions
Cond. Experimental factors IP conditions Mode (Packet loss ratio) 0% 0% 0% 0% 0% 0% 0% 0% 3% 3% 3% 3% 3% 3% 3% 3% AMR-NB 6,7kbit/s AMR-NB 12,2 kbit/s AMR-WB 12,65 kbit/s AMR-WB 15,85 kbit/s G. 723.1 6,4 kbit/s G.729 8 kbit/s G.722 64 kbit/s + plc G.711 + plc AMR-NB 6,7kbit/s AMR-NB 12,2 kbit/s AMR-WB 12,65 kbit/s AMR-WB 15,85 kbit/s G. 723.1 6,4 kbit/s G.729 8 kbit/s G.722 64 kbit/s + plc G.711 + plc
The test conditions and details are described in Tables 14 and 15.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Table 15: Test characteristics

Listening Level:        79 dB SPL
Listeners:              32 naïve listeners per language
Groups:                 16 (2 subjects per group)
Rating Scales:          5
Languages:              2 (French, Arabic)
Listening System:       monaural headset (flat response in the audio bandwidth of interest: 50 Hz - 7 kHz); the other ear is open
Listening Environment:  room noise: Hoth spectrum at 30 dBA (as defined by ITU-T Recommendation P.800, Annex A, clause A.1.1.2.2.1 Room Noise, with Table A.1 and Figure A.1)
This section presents the Global Analysis of the results. The analysis work was performed by Dynastat in its function as the Global Analysis Laboratory (GAL). Annex G presents the GAL Test Plan for characterizing the results of the conversation tests. (Detailed test plans are given in Annexes D and E for Phase 1 and in Annex F for Phase 2). It should be noted that this is the first instance in any standardisation body of conversation tests being used to characterize the performance of standardized speech codecs, and the first instance of codecs in 3GPP being characterized for packet-switched networks. Moreover, the analyses reported in this document represent a new approach to evaluating the results of conversation tests.
7.1  Conversation Tests
The Phase 1 test plan describes the methodology for conducting the conversation tests. In general, the procedure involved a pair of subjects located in different rooms and communicating over a simulated packet-switched network. The subjects were engaged in a task which required them to communicate in order to solve a specific problem. At the end of the task, each subject was required to rate various aspects of the quality of their conversation. Each of these ratings involved a five-point scale with descriptors appropriate to the aspect of the conversation being rated. Table 16 summarises the five rating scales (the first row in each column shows the scale abbreviation used throughout this report).
Table 16: Summary of Rating Scales used in the Conversation Tests
VQ - Voice Quality of your partner: 5 Excellent / 4 Good / 3 Fair / 2 Poor / 1 Bad
US - Difficulty Understanding your partner: 5 Never / 4 Rarely / 3 Sometimes / 2 Often / 1 All the time
IA - Interaction with your partner: 5 Excellent / 4 Good / 3 Fair / 2 Poor / 1 Bad
PC - Perception of impairments: 5 None / 4 Not disturbing / 3 Slightly disturbing / 2 Disturbing / 1 Very disturbing
GQ - Global Quality of the conversation: 5 Excellent / 4 Good / 3 Fair / 2 Poor / 1 Bad
Since each subject makes five ratings for each condition, there are five dependent variables involved in analyses of the response data. We would expect the ratings on the scales in Table 16 to show some degree of inter-correlation across test conditions. If, in fact, all five were perfectly correlated then we would conclude that they were each measuring the same underlying variable. In this scenario, we could combine them into a single measure (e.g., by averaging them) for purposes of statistical analyses and hypothesis testing. If, on the other hand, the ratings were uncorrelated, we would conclude that each scale is measuring a different underlying variable and should be treated separately in subsequent analyses. In practice, the degree of intercorrelation among such dependent variables usually falls somewhere between these two extremes. Multivariate Analysis of Variance (MANOVA) is a statistical technique designed to evaluate the results of experiments with multiple dependent variables and determine the nature and number of underlying variables. MANOVA was proposed in the GAL test plan for the conversation tests and was used extensively in the analyses presented in this report.
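The first step of this reasoning, examining the inter-correlation among the scales, amounts to computing Pearson coefficients across condition means. The sketch below uses made-up rating values for illustration, not the actual test data.

```python
# Pearson inter-correlation between two rating scales, computed over
# per-condition mean scores (the values below are illustrative only).
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical condition-mean scores for two scales
vq = [3.5, 3.8, 3.1, 4.0, 3.3]
gq = [3.4, 3.9, 3.0, 4.1, 3.2]
print(round(pearson(vq, gq), 2))  # 0.99
```

A coefficient this close to 1 would indicate, in the terms used above, that the two scales measure essentially the same underlying variable; values near 0 would argue for treating them separately.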
7.2
The two Phase 1 test plans, AMR Narrowband (AMR-NB) and AMR Wideband (AMR-WB), described similar experimental designs, each experiment involving 24 test conditions (COND) and 16 pairs of subjects. The test plans also specified that the experiments would be conducted by three Listening Laboratories (LAB), each in a different language: Arcon for North American English, NTT-AT for Japanese, and France Telecom for French. Of the 24 conditions in both the NB and WB experiments, 18 were described as Symmetrical (SYM) and six as Asymmetrical (ASY). In the SYM conditions, all subjects were located in a quiet room, i.e. with no introduced background noise. The six ASY conditions were actually three pairs of conditions where one subject in each conversation pair was located in a noisy background and the other subject was in the quiet. The data from these paired conditions were sorted to effect a comparison of sender in noise/receiver in quiet and sender in quiet/receiver in noise for the three conditions involving noise in the rooms. The Phase 2 test plan described a single experiment involving 16 conditions, conducted by one listening lab (France Telecom) but in two languages, French and Arabic. For purposes of the GAL, the data from the three experiments, Phase 1-NB, Phase 1-WB and Phase 2, were separated into five Sets of conditions for statistical analyses:

Set 1: Phase 1 - NB/SYM conditions (1-18)
Set 2: Phase 1 - NB/ASY conditions (19-24)
Set 3: Phase 1 - WB/SYM conditions (1-18)
Set 4: Phase 1 - WB/ASY conditions (19-24)
Set 5: Phase 2 conditions (1-16)

For each of these five Sets of conditions, a three-step statistical process was undertaken to simplify the final analyses and arrive at the most parsimonious and unambiguous statistical method for characterizing the results of the conversation tests. The procedure involved the following steps:
ETSI
25
Step 1) Compute an intercorrelation matrix among the dependent variables for the Set of conditions. Substantial inter-correlation among the dependent variables (i.e., correlation coefficients > .50 or < -.50) indicates that the number of dependent variables can be reduced - that there is a reduced set of underlying variables accounting for the variance in the dependent variables.

Step 2) Conduct a MANOVA on the Set of scores for the effects of conditions (COND) in the Set (18 COND for Set 1, 6 COND for Set 2, etc.), ignoring other factors. The MANOVA procedure determines the linear combination of the dependent variables that best separates the levels of the independent variable, i.e. COND. The initial linear combination of dependent variables is the root that accounts for maximum variance in the independent variable; it also represents the first underlying variable. A Chi-square test is conducted to determine the significance of the root. Subsequent roots are extracted from the residual variance and tested with Chi-square for significance, each subsequent root being orthogonal to the preceding one. The number of significant roots indicates the number of significant underlying variables that account for the variance in the dependent variables.

Step 3) If there is only one significant root for the COND effect, the canonical coefficients for that root are used to compute a weighted average of the dependent variables, estimating the underlying variable. This composite dependent variable is then used in a univariate ANOVA to test the factors involved in the experiment. Such ANOVAs produce results that are more parsimonious and less complicated than presenting the results in the multi-dimensional space that would be necessary with multiple dependent variables.
7.3
Table 18 shows test conditions 1 to 18, involved in the NB symmetric condition conversation tests, together with the mean scores for each rating scale by condition and by listening lab; each score in the table is the average of ratings from 32 subjects. The first step in the process described in the previous section is to examine the inter-correlations among the dependent variables for indications of underlying variables. Table 17 shows the inter-correlation matrix of the five dependent variables for the NB/SYM conditions; absolute values of correlation above .50 are bolded in the table. The table shows a high degree of inter-correlation among the dependent variables, indicating the presence of a reduced set of underlying variables.
Table 17: Intercorrelations Among the Dependent Variables for the NB/SYM Conditions

NB/S   VQ    US    IA    PC    GQ
VQ      1
US            1
IA                  1
PC                 0.56    1
GQ                 0.47  0.69    1
The second step in the analysis determines how many underlying variables account for the variance in the five dependent variables. A MANOVA for the effects of COND was conducted on the NB/SYM data, conditions 1 to 18. Table 19 summarizes the results. The table contains two sections: the top section shows the analysis for the main effect of COND, including the results of univariate ANOVAs for each of the five dependent variables, followed by the results of the multivariate ANOVA (i.e., the MANOVA) for the combination of dependent variables. Table 19 shows that the COND main effect is highly significant for each of the five individual dependent variables in the univariate ANOVAs, as well as for the combination of dependent variables.
Table 18: Test Conditions and Mean Scores for each Condition and for each Lab for the Narrowband Experiment
Narrowband - Experimental Parameters and Mean Scores

Cond  Rm-A       Rm-B       RC       PL  Mode  Del  VQ (Arcon/FT/NTT)  US (Arcon/FT/NTT)  IA (Arcon/FT/NTT)  PC (Arcon/FT/NTT)  GQ (Arcon/FT/NTT)
1     Quiet      Quiet      10^-2    0   6.7   300  3.47/3.81/3.28     3.94/4.06/4.34     3.78/3.69/4.63     4.00/3.84/4.13     3.56/3.53/3.34
2     Quiet      Quiet      10^-2    0   12.2  500  3.50/3.81/3.06     4.16/4.16/4.09     3.59/3.66/4.09     4.06/4.00/3.81     3.66/3.63/3.13
3     Quiet      Quiet      10^-2    0   12.2  300  3.81/3.63/3.47     4.16/3.94/4.34     3.88/3.72/4.56     4.19/3.84/4.19     3.88/3.56/3.53
4     Quiet      Quiet      10^-2    3   6.7   300  3.25/3.22/2.75     3.66/3.31/3.78     3.66/3.13/4.25     3.66/2.94/3.59     3.28/2.81/2.72
5     Quiet      Quiet      10^-2    3   12.2  500  3.44/3.38/2.84     3.69/3.66/3.63     3.72/3.38/4.00     3.84/2.94/3.72     3.50/2.94/2.72
6     Quiet      Quiet      10^-2    3   12.2  300  3.41/3.63/3.16     3.88/3.78/4.03     3.88/3.56/4.41     3.88/3.44/4.00     3.41/3.22/3.13
7     Quiet      Quiet      10^-3    0   6.7   300  3.91/4.16/3.41     4.19/4.47/4.44     3.94/4.00/4.84     4.34/4.38/4.31     3.78/4.00/3.50
8     Quiet      Quiet      10^-3    0   12.2  500  3.72/4.22/3.59     4.22/4.41/4.50     3.72/4.03/4.72     4.09/4.44/4.53     3.97/4.06/3.72
9     Quiet      Quiet      10^-3    0   12.2  300  4.00/4.56/3.47     4.38/4.69/4.44     4.03/4.38/4.72     4.44/4.78/4.31     4.16/4.50/3.44
10    Quiet      Quiet      10^-3    3   6.7   300  3.28/3.66/3.16     3.72/3.94/4.16     3.78/3.88/4.44     3.91/3.72/4.00     3.31/3.41/3.16
11    Quiet      Quiet      10^-3    3   12.2  500  3.75/3.84/3.19     4.13/3.97/4.31     3.81/3.56/4.38     3.94/3.91/4.13     3.66/3.69/3.25
12    Quiet      Quiet      10^-3    3   12.2  300  3.50/3.91/3.41     4.00/4.22/4.44     3.97/4.09/4.66     3.88/4.13/4.25     3.53/3.97/3.53
13    Quiet      Quiet      5x10^-4  0   6.7   300  3.91/4.25/3.59     4.19/4.63/4.47     4.06/4.16/4.72     4.38/4.59/4.44     4.00/4.25/3.59
14    Quiet      Quiet      5x10^-4  0   12.2  500  3.97/4.34/3.50     4.22/4.47/4.56     3.75/3.97/4.44     4.31/4.53/4.44     3.94/3.97/3.44
15    Quiet      Quiet      5x10^-4  0   12.2  300  4.03/4.44/4.03     4.53/4.50/4.75     4.09/4.19/4.88     4.47/4.50/4.69     3.97/4.19/3.97
16    Quiet      Quiet      5x10^-4  3   6.7   300  3.63/3.84/3.19     3.91/3.97/4.25     4.03/3.72/4.63     3.91/3.75/4.06     3.50/3.56/3.34
17    Quiet      Quiet      5x10^-4  3   12.2  500  3.66/3.88/3.22     4.03/4.22/4.25     3.78/3.78/4.34     4.13/4.13/4.09     3.69/3.78/3.19
18    Quiet      Quiet      5x10^-4  3   12.2  300  3.56/3.75/3.25     4.03/3.88/4.22     3.69/3.63/4.59     4.09/3.78/4.19     3.72/3.44/3.19
19    Car        Quiet      5x10^-4  3   12.2  300  3.16/3.63/2.88     3.13/2.97/3.34     3.84/3.06/3.88     3.66/2.72/3.66     3.41/2.53/2.81
20    Quiet      Car        5x10^-4  3   12.2  300  3.81/3.88/3.50     4.13/3.91/4.44     3.94/3.63/4.44     4.31/3.78/4.25     3.78/3.28/3.53
21    Cafeteria  Quiet      5x10^-4  0   6.7   300  3.69/4.06/3.13     3.59/3.69/3.88     3.97/3.53/4.38     4.13/3.44/4.00     3.78/3.28/3.16
22    Quiet      Cafeteria  5x10^-4  0   6.7   300  3.97/4.31/3.53     4.41/4.50/4.50     4.06/4.06/4.66     4.34/4.50/4.38     3.69/4.09/3.56
23    Street     Quiet      5x10^-4  0   12.2  500  3.66/4.03/3.25     3.53/3.72/4.16     4.00/3.47/4.28     3.94/3.44/4.22     3.81/3.31/3.22
24    Quiet      Street     5x10^-4  0   12.2  500  3.84/4.19/3.53     4.22/4.38/4.28     4.00/3.91/4.47     4.44/4.22/4.19     3.91/3.91/3.53

Rm-A/Rm-B: noise environment; RC: radio conditions (BLER); PL: packet loss (%); Mode: bit rate (kbit/s); Del: delay (ms). VQ: Voice Quality; US: Understanding; IA: Interaction; PC: Perception; GQ: Global Quality.
The bottom section of Table 19 shows the Chi-square tests of the MANOVA roots. Only a single root (roots 1 through 5) is significant, indicating that a single underlying variable accounts for the significant variation in the dependent variables for these conditions. The canonical coefficients for this root, also shown in the table, are used to compute the composite dependent variable that represents the underlying variable for the NB/SYM conditions. This composite dependent variable (NB/S-CTQ, for NarrowBand/Symmetric - Conversation Test Quality) is used to characterize the ratings in the NB/SYM conditions. NB/S-CTQ scores for all conditions and all LABs in Set 1 are listed in Annex A. Equation 1 shows the formula used to compute the composite score for the NB/SYM conditions.
Table 19: Results of MANOVA for COND for NB/SYM Conditions

Univariate ANOVAs for Effect COND (df = 17, 1710):
  VQ: F = 8.25,  p = 0.00
  US: F = 8.07,  p = 0.00
  IA: F = 5.51,  p = 0.00
  PC: F = 11.80, p = 0.00
  GQ: F = 10.99, p = 0.00

MANOVA for Effect COND: Value = 0.16, F-Statistic = 3.38, df = 85, 8550, Prob = 0.00

Test of Residual Roots:
  Roots 1 through 5: Chi-Square = 292.56, df = 85
  Roots 2 through 5: Chi-Square = 73.44,  df = 64
  Roots 3 through 5: Chi-Square = 34.14,  df = 45
  Roots 4 through 5: Chi-Square = 11.27,  df = 28
  Root 5:            Chi-Square = 4.23,   df = 13

Canonical coefficients (Dep. Var. VQ, US, IA, PC, GQ): see Equation 1.
Formula used to compute the Conversation Test Quality Score (NB/S-CTQ) for the conditions in Set 1:

NB/S-CTQ = 0.0426*VQ + 0.0620*US - 0.0015*IA + 0.5664*PC + 0.4470*GQ    (1)

The SYM conditions in the NB experiment are categorized by four experimental factors:
- Radio conditions: 10^-2, 10^-3 and 5x10^-4 BLER
- Packet loss: 0% and 3%
- AMR-NB mode (bit rate): 6.7 kbps and 12.2 kbps
- Delay: 300 msec and 500 msec
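Applying Equation (1) is a simple weighted sum over the five scale ratings. The sketch below uses an illustrative set of ratings; note that the canonical coefficients do not sum to 1, so a uniform rating of 4 maps to about 4.47 rather than 4.

```python
# Eq. (1): canonical coefficients for the NB/S-CTQ composite variable
COEF = {"VQ": 0.0426, "US": 0.0620, "IA": -0.0015, "PC": 0.5664, "GQ": 0.4470}

def nbs_ctq(ratings):
    """Weighted sum of the five scale ratings (Equation 1)."""
    return sum(COEF[scale] * ratings[scale] for scale in COEF)

# Illustrative ratings: everything rated "4 - Good"
print(round(nbs_ctq({"VQ": 4, "US": 4, "IA": 4, "PC": 4, "GQ": 4}), 2))  # 4.47
```

The large weights on PC and GQ reflect the MANOVA finding that the Perception and Global Quality scales dominate the single underlying variable.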
These conditions are assigned to two factorial experimental designs for analysing the effects of three of these factors. Table 20a shows the allocation of the 12 conditions used to evaluate the effects of Radio Conditions, Packet Loss, and Mode with Delay held constant at 300 msec. Table 20b shows the allocation of the 12 conditions used to evaluate the effects of Radio Conditions, Packet Loss, and Delay with Mode held constant at 12.2 kbit/s.
Table 20a: NB/SYM: Factorial Design for Effects of Radio Cond., Packet Loss, and Mode (No Noise - 300 msec delay)

RC       | 6.7 kbps / 0% PL | 6.7 kbps / 3% PL | 12.2 kbps / 0% PL | 12.2 kbps / 3% PL
10^-2    |        1         |        4         |         3         |         6
10^-3    |        7         |       10         |         9         |        12
5x10^-4  |       13         |       16         |        15         |        18

Table 20b: NB/SYM: Factorial Design for the Effects of Radio Cond., Packet Loss, and Delay (No Noise - 12.2 kbps)

RC       | 300 msec / 0% PL | 300 msec / 3% PL | 500 msec / 0% PL | 500 msec / 3% PL
10^-2    |        3         |        6         |        2         |        5
10^-3    |        9         |       12         |        8         |       11
5x10^-4  |       15         |       18         |       14         |       17
The composite dependent variable, NB/S-CTQ, was computed for the NB/SYM conditions using the equation shown in Eq.1. These composite scores were subjected to factorial ANOVA for the two experimental designs shown in Tables 20a and 20b. The results of those ANOVAs are shown in Tables 21 and 22, respectively.
Table 21: Results of ANOVA of NB/S-CTQ for the Effects of LAB, Radio Conditions (RC), Packet Loss (PL), and Mode

Source           Sum-of-Squares    df   Mean-Square   F-ratio   Prob
LAB                    1.12         2      0.56          0.79    0.46
RC                    39.49         2     19.74         27.61    0.00
PL                    64.20         1     64.20         89.79    0.00
MODE                   9.74         1      9.74         13.62    0.00
LAB*RC                10.37         4      2.59          3.62    0.01
LAB*PL                 4.42         2      2.21          3.09    0.05
LAB*MODE               0.08         2      0.04          0.06    0.94
RC*PL                  0.63         2      0.32          0.44    0.64
RC*MODE                1.76         2      0.88          1.23    0.29
PL*MODE                0.51         1      0.51          0.71    0.40
LAB*RC*PL              2.17         4      0.54          0.76    0.55
LAB*RC*MODE            2.69         4      0.67          0.94    0.44
LAB*PL*MODE            0.43         2      0.22          0.30    0.74
RC*PL*MODE             0.91         2      0.46          0.64    0.53
LAB*RC*PL*MODE         2.36         4      0.59          0.82    0.51
Error                797.99      1116      0.72
Total                938.88      1151
Table 21 shows that the main effects of Radio Conditions, Packet Loss, and Mode are significant (p < .05) for the NB/S-CTQ composite variable, as are the interactions LAB x RC and LAB x PL. Figure 7 shows the NB/S-CTQ scores with 95% confidence-interval bars for the factors tested in Table 21. The significant interactions of RC x LAB and PL x LAB indicate that the pattern of scores for the levels of RC and PL differed significantly across the three LABs. Figure 8 illustrates the interaction of LAB x RC, and Figure 9 the interaction of LAB x PL.
Figure 7: NB/S-CTQ Scores for the Effects of LAB, Radio Conditions, Packet Loss, and Mode
Figure 8: NB/S-CTQ Scores showing the Interaction of LAB x Radio Conditions
Figure 9: NB/S-CTQ Scores showing the Interaction of LAB x Packet Loss

Table 22: Results of ANOVA of NB/S-CTQ for the Effects of LAB, Radio Conditions (RC), Packet Loss (PL), and Delay
ANOVA for Composite Variable NB/S-CTQ

Source             Sum-of-Squares    df   Mean-Square   F-ratio   Prob
LAB                       3.10         2      1.55         2.41    0.09
RC                       42.54         2     21.27        33.10    0.00
PL                       44.72         1     44.72        69.61    0.00
DELAY                     4.06         1      4.06         6.32    0.01
LAB*RC                   10.47         4      2.62         4.07    0.00
LAB*PL                    3.52         2      1.76         2.74    0.07
LAB*DELAY                 0.64         2      0.32         0.50    0.61
RC*PL                     0.10         2      0.05         0.08    0.92
RC*DELAY                  1.01         2      0.50         0.79    0.46
PL*DELAY                  0.37         1      0.37         0.58    0.45
LAB*RC*PL                 1.45         4      0.36         0.57    0.69
LAB*RC*DELAY              4.46         4      1.12         1.74    0.14
LAB*PL*DELAY              0.80         2      0.40         0.62    0.54
RC*PL*DELAY               1.81         2      0.90         1.41    0.25
LAB*RC*PL*DELAY           4.29         4      1.07         1.67    0.15
Error                   717.03      1116      0.64
Total                   840.39      1151
The results in Table 22 show that the main effects for Radio Conditions, Packet Loss, and Delay are significant while only one interaction, LAB x RC, is significant. Figure 10 shows the NB/S-CTQ scores with 95% confidence-interval bars for the factors tested in Table 22. Figure 11 illustrates the significant interaction of Lab x RC. The figure shows that the pattern of scores for RC is significantly different across LABs.
Figure 10: NB/S-CTQ Scores for the Effects of LAB, Radio Conditions, Packet Loss, and Delay
Figure 11: NB/S-CTQ Scores showing the Interaction of LAB x Radio Conditions
7.4
NB/ASY conditions
Table 18 shows the 6 test conditions involved in the NB asymmetric condition conversation tests (conditions 19 to 24). Also shown in the table are the Mean scores for each rating scale by condition and by listening lab. Each score shown in the table is the average of ratings from 32 subjects. Table 23 shows the inter-correlation matrix for the dependent variables in the NB/ASY conditions. The degree of intercorrelation among the dependent variables suggests that a reduced set of underlying variables accounts for their variation.
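The coefficients in Table 23 are ordinary Pearson correlations between the per-subject rating vectors; a minimal sketch of how such a matrix is assembled (the demo ratings below are illustrative, not the actual test data):

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length rating vectors
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def corr_matrix(ratings):
    # ratings: dict mapping scale name -> list of per-subject scores
    names = list(ratings)
    return {(r, c): round(pearson(ratings[r], ratings[c]), 2)
            for r in names for c in names}

# Illustrative ratings for two of the five scales (not the actual test data)
demo = {"VQ": [4, 5, 3, 4, 2], "US": [4, 4, 3, 5, 2]}
m = corr_matrix(demo)
```

The matrix is symmetric with a unit diagonal, which is why the published tables show only one triangle.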
Table 23: Inter-correlations Among the Dependent Variables for the NB/ASY Conditions

NB/A   VQ     US     IA     PC     GQ
VQ     1
US     0.60   1
IA     0.35   0.56   1
PC     0.44   0.65   0.59   1
GQ     0.65   0.64   0.56   0.68   1
Table 24 shows the results of MANOVA for the effects of COND for the NB/ASY conditions. The analysis shows significant COND effects for all the univariate ANOVAs as well as for the MANOVA. The Chi-square tests of the MANOVA roots show only a single significant root (1 through 5), indicating that a single underlying variable accounts for the significant variation in the dependent variables for these conditions. The canonical coefficients for this root are used to estimate the composite dependent variable that represents the underlying variable for the NB/ASY conditions. The composite dependent variable (NB/A-CTQ, for NarrowBand/Asymmetric-Conversation Test Quality) is used to characterize the ratings in the NB/ASY conditions. NB/A-CTQ scores for all conditions and all LABs in Set 2 are listed in Annex A. Equation 2 shows the formula that was used to compute the values of the composite variable, NB/A-CTQ, for characterizing the NB/ASY conditions.
Table 24: Results of MANOVA for COND for NB/ASY Conditions

Univariate ANOVAs for Effect: COND (df = 5, 570)
Dependent Variable   F-Ratio   Prob
VQ                     7.05    0.00
US                    22.40    0.00
IA                     5.99    0.00
PC                    13.32    0.00
GQ                    10.20    0.00

MANOVA for effect: COND
Statistic      Value   F-Ratio   df         Prob
Pillai Trace   0.18    4.38      25, 2850   0.00

Test of Residual Roots
Roots   Chi-Square   df
1-5       114.89     25
2-5         7.23     16
3-5         2.70      9
4-5         0.31      4
5           0.04      1
Formula used to compute the Conversation Test Quality Score (NB/A-CTQ) for the NB/ASY conditions:

NB/A-CTQ = 0.0894*VQ + 0.3420*US + 0.1851*IA + 0.2761*PC + 0.1074*GQ    (2)
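Equation 2 is just a weighted sum of the five rating scales, with the canonical coefficients as weights (note that they sum to 1.0, so uniform ratings map to the same composite value); a sketch of the computation with illustrative ratings:

```python
# Canonical coefficients from Eq. 2 (NB/A-CTQ); they sum to 1.0,
# so a condition rated 4.0 on every scale gets a composite of 4.0
W = {"VQ": 0.0894, "US": 0.3420, "IA": 0.1851, "PC": 0.2761, "GQ": 0.1074}

def nb_a_ctq(ratings):
    # Composite Conversation Test Quality score (weighted sum of scales)
    return sum(W[k] * ratings[k] for k in W)

# Illustrative ratings, not actual test data
score = nb_a_ctq({"VQ": 4.0, "US": 4.0, "IA": 4.0, "PC": 4.0, "GQ": 4.0})
```

The same pattern applies to Equations 1, 3, 4, and 5, each with its own coefficient set.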
The six NB/ASY conditions are distinguished by two factors. One factor has three levels, with each level differing along a number of dimensions: Noise, Packet Loss, Mode, and Delay. These differences are listed in Table 18, but the factor will be referred to in the following analyses by the factor-name Noise, noting that the conditions differ in more dimensions than noise alone. The second factor relates to the source of the noise: the noise is either in the room of the transmitting subject or in the room of the receiving subject. This factor will be referred to as Room. Table 25 shows the results of ANOVA for NB/A-CTQ for the factors of LAB, Noise, and Room.
Table 25: Results of ANOVA of NB/A-CTQ for the Effects of LAB, Noise, and Room
ANOVA for Composite Variable - NB/A-CTQ

Source               Sum-of-Squares   df    Mean-Square
LAB                        7.09         2       3.55
Noise                     17.07         2       8.54
Room                      43.76         1      43.76
LAB x Noise                3.28         4       0.82
LAB x Room                 2.39         2       1.19
Noise x Room               3.31         2       1.65
LAB x Noise x Room         1.19         4       0.30
Error                    349.80       558       0.63
Total                    427.89       575
The results of the ANOVA for NB/A-CTQ show that all three factors, LAB, Noise, and Room, are significant, but that none of the interactions are significant. Figure 12 shows the NB/A-CTQ scores with 95% confidence-interval bars for the three factors tested in Table 25.
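Table 25 lists only sums of squares and mean squares; the F-ratios behind the significance statements follow by dividing each effect's mean square by the error mean square, as in this quick check (values taken from Table 25):

```python
# Mean squares from the NB/A-CTQ ANOVA (Table 25); error MS = 0.63 (df 558)
ms_effect = {"LAB": 3.55, "Noise": 8.54, "Room": 43.76,
             "LAB x Noise": 0.82, "LAB x Room": 1.19,
             "Noise x Room": 1.65, "LAB x Noise x Room": 0.30}
ms_error = 0.63

# The F-ratio for each effect is MS_effect / MS_error
f_ratio = {name: round(ms / ms_error, 2) for name, ms in ms_effect.items()}
```

Each F-ratio is then compared against the F-distribution with the effect's and the error's degrees of freedom to obtain the significance level.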
Figure 12: NB/A-CTQ Scores for the Effects of LAB, Noise, and Room
7.5
WB/SYM conditions
Table 27 shows the 18 test conditions involved in the AMR-WB conversation tests (conditions 1 to 18). Also shown in the table are the Mean scores for each rating scale by condition and by listening lab. Each score shown in the table is the average of ratings from 32 subjects. The initial step in the analysis is to examine the inter-correlation among the dependent variables for indications of underlying variables. Table 26 shows the inter-correlation matrix of the dependent variables for the WB/SYM conditions. Absolute values of correlation above .50 have been bolded in the table. The table shows a high degree of inter-correlation among the dependent variables indicating the presence of a reduced set of significant underlying variables.
Table 26: Intercorrelations Among the Dependent Variables for the WB/SYM Conditions
WB/S   VQ     US     IA     PC     GQ
VQ     1
US            1
IA                   1
PC                   0.51   1
GQ                   0.55   0.66   1
The second step in the analysis is designed to determine how many underlying variables account for the variance in the five dependent variables. MANOVA for the effects of COND was conducted on the WB/SYM data (conditions 1 to 18). Table 28 summarizes the results of the analysis. The top section shows the analysis for the main effect of COND. This section includes the results of the univariate ANOVAs for each of the five dependent variables, followed by the results of the MANOVA. In the table we can see that the COND main effect is highly significant for each of the five individual dependent variables in the univariate ANOVAs as well as for the combination of dependent variables in the MANOVA. The bottom section of the table shows the Chi-square test of the MANOVA roots, or underlying variables, extracted from the five dependent variables. In Table 28, only the first root (1 through 5) is significant, indicating that a single underlying variable accounts for the significant variation in the dependent variables for these conditions. The canonical coefficients shown in the table are used to estimate the composite dependent variable that represents this root or underlying variable. The composite dependent variable (WB/S-CTQ, for WideBand/Symmetric-Conversation Test Quality) is computed and used in the third-step ANOVAs to test and characterize the factors of interest in the WB/SYM conditions. WB/S-CTQ scores for all conditions and all LABs for Set 3 are listed in Annex A. Equation 3 shows the formula that was used to compute the values of the composite variable, WB/S-CTQ, for characterizing the WB/SYM conditions.
Table 27: Test Conditions and Mean Scores for each LAB for the Wideband Experiment

(Rm-A/Rm-B: noise environment; RC: Radio Conditions; PL: % Packet Loss; Mode: bit rate in kbps; RoHC: RObust Header Compression used. Mean scores are given as Arcon/FT/NTT for Voice Quality (VQ), Understanding (US), Interaction (IA), Perception (PC), and Global Quality (GQ).)

Cond  Rm-A/Rm-B        RC       PL  Mode   RoHC  VQ               US               IA               PC               GQ
1     Quiet/Quiet      10^-2    0   12.65  yes   4.09/4.22/3.84   4.38/4.41/4.34   4.25/4.13/4.53   4.47/4.25/4.31   4.09/4.06/3.75
2     Quiet/Quiet      10^-2    0   12.65  no    4.00/4.44/3.97   4.22/4.84/4.53   4.06/4.38/4.72   4.28/4.41/4.31   3.78/4.31/4.00
3     Quiet/Quiet      10^-2    0   15.85  yes   4.13/4.28/4.13   4.38/4.50/4.69   4.31/4.19/4.66   4.50/4.28/4.59   4.28/4.09/4.22
4     Quiet/Quiet      10^-2    3   12.65  yes   3.88/3.72/3.72   4.19/4.09/4.03   3.91/4.09/4.28   4.34/3.84/4.06   3.88/3.53/3.59
5     Quiet/Quiet      10^-2    3   12.65  no    3.63/3.75/3.72   4.06/3.88/4.06   3.91/3.81/4.38   4.22/3.88/4.16   3.72/3.63/3.69
6     Quiet/Quiet      10^-2    3   15.85  yes   3.91/3.97/3.84   4.19/4.44/4.28   4.06/4.13/4.53   4.22/4.03/4.28   3.84/3.84/3.81
7     Quiet/Quiet      10^-3    0   12.65  yes   4.22/4.38/4.00   4.50/4.56/4.69   4.25/4.22/4.75   4.69/4.56/4.63   4.28/4.19/4.00
8     Quiet/Quiet      10^-3    0   12.65  no    4.06/4.47/4.06   4.28/4.69/4.72   4.22/4.25/4.69   4.31/4.47/4.69   4.16/4.25/4.22
9     Quiet/Quiet      10^-3    0   15.85  yes   3.88/4.63/3.94   4.34/4.75/4.53   4.16/4.38/4.75   4.44/4.50/4.53   3.94/4.38/4.06
10    Quiet/Quiet      10^-3    3   12.65  yes   3.97/4.31/3.97   4.19/4.50/4.41   4.13/4.13/4.66   4.47/4.19/4.53   4.03/3.94/3.97
11    Quiet/Quiet      10^-3    3   12.65  no    4.03/4.25/3.75   4.41/4.56/4.34   4.09/4.16/4.50   4.69/4.16/4.28   3.94/3.97/3.81
12    Quiet/Quiet      10^-3    3   15.85  yes   4.03/4.03/3.91   4.34/4.38/4.47   4.16/4.09/4.66   4.28/4.22/4.38   4.00/3.81/3.91
13    Quiet/Quiet      5x10^-4  0   12.65  yes   4.09/4.34/4.19   4.34/4.63/4.66   4.16/4.22/4.81   4.59/4.53/4.63   4.00/4.13/4.22
14    Quiet/Quiet      5x10^-4  0   12.65  no    4.09/4.59/4.06   4.47/4.81/4.59   4.16/4.44/4.75   4.50/4.56/4.56   4.16/4.38/4.09
15    Quiet/Quiet      5x10^-4  0   15.85  yes   4.19/4.47/4.03   4.47/4.69/4.66   4.44/4.31/4.78   4.59/4.47/4.59   4.38/4.16/4.06
16    Quiet/Quiet      5x10^-4  3   12.65  yes   3.94/3.97/3.91   4.25/4.53/4.41   4.00/3.97/4.63   4.25/4.16/4.38   3.84/3.88/4.00
17    Quiet/Quiet      5x10^-4  3   12.65  no    4.06/4.19/3.88   4.25/4.47/4.41   4.19/4.13/4.47   4.59/4.28/4.28   4.09/3.94/3.84
18    Quiet/Quiet      5x10^-4  3   15.85  yes   4.13/4.34/3.81   4.38/4.53/4.56   4.31/4.06/4.59   4.59/4.19/4.44   4.09/3.91/3.81
19    Car/Quiet        5x10^-4  3   12.65  yes   3.50/4.09/2.97   3.59/3.63/3.00   3.97/3.66/3.47   4.03/3.38/3.19   3.81/3.34/2.78
20    Quiet/Car        5x10^-4  3   12.65  yes   3.97/4.03/3.78   4.09/4.34/4.38   4.19/3.97/4.50   4.34/3.88/4.31   4.03/3.75/3.84
21    Cafeteria/Quiet  5x10^-4  0   12.65  no    3.75/4.38/3.66   3.78/4.38/3.88   3.94/4.09/4.06   4.31/3.97/3.84   3.81/3.81/3.34
22    Quiet/Cafeteria  5x10^-4  0   12.65  no    4.16/4.56/4.13   4.47/4.72/4.69   4.25/4.25/4.72   4.59/4.44/4.59   4.13/4.16/4.22
23    Street/Quiet     5x10^-4  0   15.85  yes   3.81/4.31/3.72   3.63/3.91/4.22   4.13/3.75/4.19   4.41/3.34/4.19   4.13/3.41/3.59
24    Quiet/Street     5x10^-4  0   15.85  yes   3.94/4.44/4.16   4.31/4.59/4.69   4.19/4.03/4.66   4.56/4.25/4.69   4.03/4.09/4.16
Table 28: Results of MANOVA for COND for the WB/SYM Conditions

Dependent Variable   F-Ratio   Prob
GQ                     4.14    0.00
The following formula is used to compute the Conversation Test Quality Score (WB/S-CTQ) for the WB/SYM conditions:

WB/S-CTQ = 0.0685*VQ + 0.3519*US + 0.1612*IA + 0.2619*PC + 0.1565*GQ    (3)

The SYM conditions in the WB experiment are categorized by four experimental factors:
- Radio conditions: 10^-2, 10^-3, and 5x10^-4
- Packet Loss: 0% and 3%
- AMR-WB mode (bit rate): 12.65 kbps and 15.85 kbps
- ROHC: used / not used
These conditions are assigned to two factorial experimental designs for analysing, through ANOVA, the effects of three of these factors. Table 29a shows the allocation of the 12 conditions used to evaluate the effects of Radio Conditions, Packet Loss, and Mode with ROHC held constant. Table 29b shows the allocation of the 12 conditions used to evaluate the effects of Radio Conditions, Packet Loss, and ROHC with Mode held constant at 12.65 kbps.
Table 29a: WB/SYM: Factorial Design for the Effects of Radio Cond., Packet Loss, and Mode
(No Noise, RoHC; cell entries are condition numbers)

RC        12.65 kbps / 0% PL   12.65 kbps / 3% PL   15.85 kbps / 0% PL   15.85 kbps / 3% PL
10^-2             1                    4                    3                    6
10^-3             7                   10                    9                   12
5x10^-4          13                   16                   15                   18

Table 29b: WB/SYM: Factorial Design for the Effects of Radio Cond., Packet Loss, and ROHC
(No Noise, Mode = 12.65 kbps; cell entries are condition numbers)

RC        RoHC / 0% PL   RoHC / 3% PL   No RoHC / 0% PL   No RoHC / 3% PL
10^-2          1              4                2                 5
10^-3          7             10                8                11
5x10^-4       13             16               14                17
The composite dependent variable, WB/S-CTQ, was computed for the WB/SYM conditions and subjected to factorial ANOVA for the two experimental designs shown in Tables 29a and 29b. The results of the ANOVAs are shown in Tables 30 and 31, respectively.
Table 30: Results of ANOVA of WB/S-CTQ for the Effects of Lab, Radio Conditions (RC), Packet Loss (PL), and Mode

ANOVA for Composite Variable WB/S-CTQ

Source            Sum-of-Squares    df   Mean-Square   F-ratio   Prob
LAB                      6.53         2      3.26         6.52    0.00
RC                       6.90         2      3.45         6.90    0.00
PL                      14.33         1     14.33        28.65    0.00
MODE                     1.41         1      1.41         2.81    0.09
LAB*RC                   0.98         4      0.24         0.49    0.75
LAB*PL                   0.23         2      0.12         0.23    0.79
LAB*MODE                 0.04         2      0.02         0.04    0.96
RC*PL                    0.35         2      0.18         0.35    0.70
RC*MODE                  1.96         2      0.98         1.96    0.14
PL*MODE                  0.09         1      0.09         0.17    0.68
LAB*RC*PL                0.45         4      0.11         0.23    0.92
LAB*RC*MODE              2.25         4      0.56         1.12    0.34
LAB*PL*MODE              0.11         2      0.05         0.11    0.90
RC*PL*MODE               0.01         2      0.01         0.01    0.99
LAB*RC*PL*MODE           1.00         4      0.25         0.50    0.74
Error                  558.34      1116      0.50
Total                  594.97      1151
Table 30 shows that the main effects for LAB, Radio Conditions, and Packet Loss are significant for the WB/S-CTQ composite variable. The factor Mode is not significant nor are any of the interactions. Figure 13 shows the WB/S-CTQ scores with 95% confidence-interval bars for the factors tested in Table 30.
Figure 13: WB/S-CTQ Scores for the Effects of LAB, Radio Conditions, Packet Loss, and Mode
Table 31: Results of ANOVA of WB/S-CTQ for the Effects of LAB, Radio Conditions (RC), Packet Loss (PL), and ROHC

ANOVA for Composite Variable WB/S-CTQ

Source            Sum-of-Squares    df   Mean-Square   F-ratio   Prob
LAB                      5.24         2      2.62         5.10    0.01
RC                      13.59         2      6.80        13.23    0.00
PL                      19.41         1     19.41        37.79    0.00
ROHC                     0.07         1      0.07         0.14    0.71
LAB*RC                   0.80         4      0.20         0.39    0.82
LAB*PL                   2.46         2      1.23         2.39    0.09
LAB*ROHC                 0.70         2      0.35         0.68    0.51
RC*PL                    1.57         2      0.78         1.52    0.22
RC*ROHC                  0.24         2      0.12         0.24    0.79
PL*ROHC                  0.11         1      0.11         0.21    0.65
LAB*RC*PL                0.98         4      0.25         0.48    0.75
LAB*RC*ROHC              1.90         4      0.47         0.92    0.45
LAB*PL*ROHC              2.02         2      1.01         1.97    0.14
RC*PL*ROHC               0.50         2      0.25         0.48    0.62
LAB*RC*PL*ROHC           0.85         4      0.21         0.41    0.80
Error                  573.40      1116      0.51
Total                  623.84      1151
The results in Table 31 show that the main effects for LAB, Radio Conditions, and Packet Loss are significant. The factor ROHC is not significant, nor are any of the interactions. Figure 14 shows the WB/S-CTQ scores with 95% confidence-interval bars for the factors tested in Table 31. These listening tests were conducted using the fixed-size RAB available at the time (46 kbit/s). The test results show that when using ROHC the quality stays the same while the bitrate can be drastically reduced by suppressing the IP/UDP/RTP headers; as a result, a smaller RAB could be used.
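The header-suppression saving can be put in rough numbers. Assuming one 20 ms AMR-WB 12.65 kbit/s frame per RTP packet, an uncompressed IPv4/UDP/RTP header stack of 40 octets, and a RoHC-compressed header of about 3 octets (a typical steady-state size, assumed here), the per-stream bitrates work out as:

```python
# Rough VoIP bitrate arithmetic with and without RoHC (illustrative figures;
# real RTP payload formats add a payload header and octet-alignment padding)
FRAME_MS = 20                                # one AMR-WB frame per packet
PAYLOAD_BITS = int(12650 * FRAME_MS / 1000)  # 253 speech bits per packet
HEADER_PLAIN_BITS = 40 * 8                   # IPv4 (20) + UDP (8) + RTP (12) octets
HEADER_ROHC_BITS = 3 * 8                     # assumed compressed header size

def stream_kbps(header_bits):
    packets_per_second = 1000 // FRAME_MS    # 50 packets/s
    return (PAYLOAD_BITS + header_bits) * packets_per_second / 1000.0

plain_kbps = stream_kbps(HEADER_PLAIN_BITS)  # uncompressed headers
rohc_kbps = stream_kbps(HEADER_ROHC_BITS)    # with RoHC
```

Under these assumptions the IP-level rate drops from roughly 28.7 kbit/s to roughly 13.9 kbit/s, which is why a smaller RAB becomes feasible.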
Figure 14: WB/S-CTQ Scores for the Effects of LAB, Radio Conditions, Packet Loss, and ROHC
7.6
WB/ASY conditions
Table 27 shows the 6 test conditions involved in the AMR-WB asymmetric condition conversation tests (conditions 19 to 24). Also shown in the table are the Mean scores for each rating scale by condition and by listening lab. Each score shown in the table is the average of ratings from 32 subjects.
Table 32 shows the inter-correlation matrix for the dependent variables in the WB/ASY conditions. The high degree of inter-correlation shown in the table suggests that a reduced set of underlying variables accounts for the variation in the five dependent variables.
Table 32: Inter-correlations Among the Dependent Variables for the WB/ASY Conditions

WB/A   VQ     US     IA     PC     GQ
VQ     1
US     0.67   1
IA     0.56   0.64   1
PC     0.55   0.65   0.66   1
GQ     0.72   0.73   0.69   0.73   1
Table 33 shows the results of MANOVA for the effects of COND for the WB/ASY conditions. The analysis shows significant COND effects for all the univariate ANOVAs as well as for the MANOVA. The Chi-square tests of the MANOVA roots show only a single significant root (1 through 5), indicating that a single underlying variable accounts for the significant variation in the dependent variables for these conditions. The canonical coefficients for this root were used to compute the composite dependent variable that represents the underlying variable for the WB/Asymmetric conditions. The composite dependent variable (WB/A-CTQ, for WideBand/Asymmetric-Conversation Test Quality) is used to characterize the ratings in the WB/ASY conditions. WB/A-CTQ scores for all conditions and all LABs for Set 4 are listed in Annex A. Equation 4 shows the formula that was used to compute the values of the composite variable, WB/A-CTQ, for characterizing the WB/ASY conditions.
Table 33: Results of MANOVA for COND for WB/ASY Conditions

Univariate ANOVAs for Effect: COND (df = 5, 570)
Dependent Variable   F-Ratio   Prob
VQ                     8.38    0.00
US                    21.63    0.00
IA                     8.16    0.00
PC                    14.10    0.00
GQ                    10.97    0.00

MANOVA for effect: COND
Statistic      Value   F-Ratio   df         Prob
Pillai Trace   0.19    4.53      25, 2850   0.00

Test of Residual Roots
Roots   Chi-Square   df
1-5       118.45     25
2-5        11.19     16
3-5         3.80      9
4-5         1.85      4
5           0.00      1
The following formula is used to compute the Conversation Test Quality Score (WB/A-CTQ) for the WB/ASY conditions:

WB/A-CTQ = -0.0970*VQ + 0.8979*US - 0.1103*IA + 0.4136*PC - 0.1042*GQ    (4)
The six WB/ASY conditions are distinguished by two factors. One factor has three levels, with each level differing along a number of dimensions: Noise, Packet Loss, Mode, and ROHC. These differences are listed in Table 27, but the factor will be referred to in the following analyses by the factor-name Noise, noting that the conditions differ in more dimensions than noise alone. The second factor relates to the source of the noise and has two levels: the noise is either in the room of the transmitting subject or in the room of the receiving subject. This factor is referred to as Room in the following analyses. Table 34 shows the results of ANOVA for WB/A-CTQ for the factors of LAB, Noise, and Room.
Table 34: Results of ANOVA of WB/A-CTQ for the Effects of LAB, Noise, and Room
ANOVA for Composite Variable - WB/A-CTQ

Source               Sum-of-Squares   df    Mean-Square
LAB                        6.06         2       3.03
NOISE                     20.41         2      10.21
ROOM                      63.10         1      63.10
LAB*NOISE                  8.15         4       2.04
LAB*ROOM                   3.16         2       1.58
NOISE*ROOM                 2.19         2       1.09
LAB*NOISE*ROOM             6.20         4       1.55
Error                    444.37       558       0.80
Total                    553.64       575
The results of the ANOVA for WB/A-CTQ show that all three factors, LAB, Noise, and Room, are significant, but only one of the interactions, LAB x Noise, is significant. Figure 15 shows the WB/A-CTQ scores with 95% confidence-interval bars for the three factors tested in Table 34. Figure 16 shows how the pattern of scores for the Noise factor differs over the three LABs, resulting in the significant interaction of LAB x Noise.
Figure 15: WB/A-CTQ Scores for the Effects of LAB, Noise, and Room
Figure 16: WB/A-CTQ Scores showing the Interaction of LAB x Noise
7.7
Codec comparison (Phase 2) conditions
Table 35 shows the test conditions involved in the conversation tests designed to compare the performance of standardized ITU-T codecs in packet switched networks. The test involves eight codecs and two levels of packet loss, 0% and 3%. Scores are shown for each of the five dependent variables by Condition and by Language (Language is referred to by factor-name LAB in the following analyses). Each score shown in the table is the average of ratings from 32 listeners.
Table 35: Test Conditions and Scores for each Condition and Lab (Language) for the Codec (Phase 2) Experiment

Set 5 - Phase II Experimental Parameters                Ph2-CTQ Scores
Cond   PL   Codec, Mode                French   Arabic   Average
1      0    AMR-NB, 6.7 kbit/s         4.22     3.94     4.08
2      0    AMR-NB, 12.2 kbit/s        4.31     4.05     4.18
3      0    AMR-WB, 12.65 kbit/s       4.33     4.30     4.32
4      0    AMR-WB, 15.85 kbit/s       4.46     4.31     4.38
5      0    G.723.1, 6.4 kbit/s        4.15     3.98     4.07
6      0    G.729, 8 kbit/s            4.11     4.18     4.14
7      0    G.722, 64 kbit/s + plc     4.34     4.13     4.24
8      0    G.711 + plc                4.32     4.28     4.30
9      3    AMR-NB, 6.7 kbit/s         3.79     3.58     3.68
10     3    AMR-NB, 12.2 kbit/s        4.03     3.88     3.95
11     3    AMR-WB, 12.65 kbit/s       4.28     4.04     4.16
12     3    AMR-WB, 15.85 kbit/s       4.14     3.99     4.07
13     3    G.723.1, 6.4 kbit/s        3.87     3.51     3.69
14     3    G.729, 8 kbit/s            3.99     3.82     3.90
15     3    G.722, 64 kbit/s + plc     4.33     4.30     4.32
16     3    G.711 + plc                4.34     4.33     4.34
Table 36 shows the inter-correlation matrix for the dependent variables in the Phase 2 experiment. The moderate degree of inter-correlation shown in the table suggests that a reduced set of underlying variables may account for the variation in the five dependent variables. The following acronyms are used in the tables: PL for Packet Loss, FR for French, and AB for Arabic.
Table 36: Inter-correlations Among the Dependent Variables for the Codec Conditions

Ph2    VQ     US     IA     PC     GQ
VQ     1
US     0.47   1
IA     0.50   0.54   1
PC     0.48   0.42   0.51   1
GQ     0.60   0.53   0.62   0.61   1
Table 37 shows the results of MANOVA for the effects of COND for the Phase 2 experiment. The analysis shows significant COND effects for all the univariate ANOVAs as well as for the MANOVA. The Chi-square tests of the MANOVA roots show only a single significant root (1 through 5), indicating that a single underlying variable accounts for the significant variation in the dependent variables for these conditions. The canonical coefficients for this root were used to compute the composite dependent variable that represents the underlying variable for the Phase 2 conditions. The composite dependent variable (Ph2-CTQ, for Phase2-Conversation Test Quality) is computed and used to characterize the ratings in the Phase 2 experiment. Ph2-CTQ scores for all conditions and all LABs for Set 5 are listed in the Appendix. Equation 5 shows the formula that was used to compute the values of the composite variable, Ph2-CTQ, for characterizing the Phase 2 conditions.
Table 37: Results of MANOVA for COND for the Phase 2 Conditions

Univariate ANOVAs for Effect: COND (df = 15, 1008)
Dependent Variable   F-Ratio   Prob
VQ                     5.64    0.00
US                     2.43    0.00
IA                     2.68    0.00
PC                     2.54    0.00
GQ                     4.25    0.00

MANOVA for effect: COND
Statistic      Value   F-Ratio   df         Prob
Pillai Trace   0.12    1.61      75, 5040   0.00

Test of Residual Roots
Roots   Chi-Square   df   Prob
1-5       122.26     75   0.00
2-5        32.44     56   1.00
3-5        19.29     39   1.00
4-5        10.45     24   0.99
5           2.58     11   1.00
The following formula was used to compute the Conversation Test Quality Score (Ph2-CTQ) for the Phase 2 conditions:

Ph2-CTQ = 0.5995*VQ + 0.0860*US - 0.0092*IA + 0.0459*PC + 0.2778*GQ    (5)

The 16 Phase 2 conditions are distinguished by two factors, Codec and Packet Loss. Table 38 shows the results of ANOVA for Ph2-CTQ for these factors.
Table 38: Results of ANOVA of Ph2-CTQ for the Effects of Codec and Packet Loss

ANOVA for Composite Variable - Ph2-CTQ

Source             Sum-of-Squares   df    Mean-Square
LAB                      5.71         1       5.71
Codec                   27.44         7       3.92
PL                      10.33         1      10.33
LAB*Codec                1.70         7       0.24
LAB*PL                   0.07         1       0.07
Codec*PL                 7.09         7       1.01
LAB*Codec*PL             1.45         7       0.21
Error                  474.61       992       0.48
Total                  528.38      1023
The results of the ANOVA for Ph2-CTQ show that all three factors, LAB, Codec, and Packet Loss, are significant as well as the interaction Codec x Packet Loss. Figure 17 shows the Ph2-CTQ scores with 95% confidence-interval bars for the factors tested in Table 38. Figure 18 illustrates the interaction of Codec x Packet Loss.
Figure 17: Ph2-CTQ Scores for the Effects of LAB, Codec, and Packet Loss
Figure 18: Ph2-CTQ Scores Showing the Interaction of Factors Codec and Packet Loss
7.8
Summary
For each of the five sets of conditions in the Packet-Switched Conversation Tests, analysis by MANOVA revealed a single underlying variable that accounts for the significant variation in the five opinion rating scales, VQ, US, IA, PC, and GQ. Conversation Test Quality (CTQ) scores were computed for each set of conditions. The CTQ scores were analysed through ANOVA to characterize the conditions involved in the Conversation Tests.
8
8.A
8.A.1
Overview
The HSDPA/EUL listening-only characterisation tests were conducted by two listening test laboratories (Nokia and Ericsson). The tested languages were Finnish and Swedish. The tested speech codecs were:
- Adaptive Multi-Rate narrowband (AMR-NB), in modes 12.2 kbit/s and 5.9 kbit/s;
- Adaptive Multi-Rate wideband (AMR-WB), in mode 12.65 kbit/s.
The tested jitter buffer implementations were:
- a fixed jitter buffer (as a reference);
- an adaptive jitter buffer compliant with the functional and performance requirements in TS 26.114.
Subjective quality score and delay were used as metrics to evaluate the results. The test was designed based on ITU-T Recommendation P.800, clause 6.2.
8.A.2
Test arrangement
The subjective tests evaluated the impact of the HSDPA/EUL radio channel conditions on speech quality, especially when the channel is subject to packet losses and jitter. The test items were processed using an error insertion device
(EID) introducing jitter and packet losses into a simulated RTP packet stream. The performance of AMR-NB and AMR-WB was evaluated with adaptive jitter buffer management (JBM). A description of the processing of the speech material is found in Annex J.
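Conceptually, the error insertion step applies a per-packet delay and loss pattern from a channel profile to a clean RTP stream; a minimal sketch of that operation (hypothetical profile format, not the actual EID tool):

```python
import random

def apply_channel(packets, delays_ms, loss_rate, seed=0):
    """Tag each packet with an arrival time and drop a fraction at random.

    packets   -- list of (seq, send_time_ms) tuples for the clean stream
    delays_ms -- per-packet transmission delays from the channel profile
    loss_rate -- link loss probability, e.g. 0.01 for the 1 % channels
    """
    rng = random.Random(seed)
    received = []
    for (seq, t_send), d in zip(packets, delays_ms):
        if rng.random() < loss_rate:
            continue                         # packet lost on the link
        received.append((seq, t_send + d))   # arrival time including jitter
    return received

stream = [(i, i * 20) for i in range(5)]     # 20 ms packet spacing
out = apply_channel(stream, [40, 120, 60, 300, 35], loss_rate=0.0)
```

The receiver-side JBM then sees only the surviving packets with their jittered arrival times.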
8.A.3
Jitter buffer implementations
Two different jitter buffer implementations were used in the tests: a fixed JBM and an adaptive JBM. Both are briefly described in the following subsections.
8.A.3.1
Fixed JBM
The fixed jitter buffer, i.e. a buffer that does not change the end-to-end delay during a session, was used in the tests only as a reference condition for the tested codecs. The buffer did not conduct any buffer adaptation at all. The role of the fixed JBM reference condition was to show the performance of a fixed JBM tuned to give a (fixed) end-to-end delay equal to the average end-to-end delay of the adaptive JBM in the same channel condition. This was done by setting the initial buffering delay to a value resulting in the desired end-to-end delay for each channel condition separately. The initial buffer delay for the fixed jitter buffer was thus set with full a priori knowledge of the behaviour of the transmission channel over the whole session and of the transmission delay of the first incoming packet. Such an approach cannot be used in real-life implementations, where neither the (future) channel behaviour nor the delay of the first received packet is known by the receiver. Hence, the fixed JBM was non-causal and thus impossible to use in a real-life implementation. Furthermore, due to its non-adaptive nature, it does not pass the minimum performance requirements for JBM schemes set in 3GPP TS 26.114 [19].
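The non-causal tuning amounts to choosing, offline and with the full delay trace in hand, an initial buffering delay that makes the fixed end-to-end delay equal to the adaptive JBM's session average; a sketch of that offline computation (hypothetical values):

```python
def fixed_initial_buffering(delay_trace_ms, target_e2e_ms):
    """Offline (non-causal) fixed-JBM tuning sketch.

    delay_trace_ms -- transmission delay of every packet in the session,
                      known in advance (impossible in a real receiver)
    target_e2e_ms  -- desired fixed end-to-end delay, set to the adaptive
                      JBM's average end-to-end delay in the same condition
    """
    first_packet_delay = delay_trace_ms[0]
    # Buffering added on top of the first packet's transmission delay
    return target_e2e_ms - first_packet_delay

# Hypothetical trace: first packet took 80 ms, target end-to-end is 200 ms
buf = fixed_initial_buffering([80, 120, 95, 300, 60], target_e2e_ms=200)
```

Because both inputs require knowledge a real receiver cannot have at session start, the scheme only makes sense as a reference condition.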
8.A.3.2
Adaptive JBM
As opposed to the fixed JBM, an adaptive JBM may change the end-to-end delay during a session with the aim of optimising the trade-off between buffering delay and buffer-induced frame losses. The adaptive jitter buffer management algorithm used in the listening-only tests was a simple algorithm conducting buffer adaptation mainly during inactive speech, without any time-scale modification, adapting during active speech only to avoid excessive frame losses. Thus, the adaptation was mainly based on insertion and removal of comfort noise frames. Note, however, that to avoid excessive losses the adaptation may also have taken place during active speech if a sudden increase in transmission delay was detected. The algorithm met both the functional requirements and the minimum performance requirements set in 3GPP TS 26.114. The outline of the operation of this adaptive JBM is described in Annex I of this document. Contrary to the fixed JBM described in the previous subsection, this JBM could be used in real-life implementations, providing the performance shown by the test results presented in the following clauses of this report.
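A toy sketch of such a policy (illustrative only, not the tested algorithm): the buffer target tracks measured jitter plus a margin, moves only between talkspurts by inserting or removing comfort-noise frames, and is raised during active speech only when a delay spike is detected:

```python
def adapt_target(target_ms, measured_jitter_ms, in_speech, spike_detected,
                 step_ms=20, margin_ms=20):
    """Return an updated jitter-buffer target delay (toy policy).

    Adaptation normally happens only between talkspurts, by inserting or
    removing comfort-noise frames; during active speech the target is
    raised only when a sudden delay increase threatens frame losses.
    """
    desired = measured_jitter_ms + margin_ms
    if in_speech:
        return target_ms + step_ms if spike_detected else target_ms
    if desired > target_ms:
        return target_ms + step_ms       # insert comfort-noise frame(s)
    if desired < target_ms - step_ms:
        return target_ms - step_ms       # remove comfort-noise frame(s)
    return target_ms

# Jitter grew during a silence period, so the target is stretched one step
t = adapt_target(100, measured_jitter_ms=150, in_speech=False,
                 spike_detected=False)
```

Stepping in whole frame intervals (here 20 ms) keeps the adaptation aligned with codec frame boundaries.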
8.A.4
Network conditions
The network conditions used when the test material was processed were divided into eight different channels. The conditions were characterized by low mobility, high mobility, low traffic (LT) and high traffic (HT) in the uplink and downlink respectively. All conditions were presented as channel profiles where the transmission end-to-end delay and link losses could be extracted for test file processing. The following radio network condition definitions were used.
Table 39: Definition of Radio Network Conditions
Condition                                            Name
DL: PedB3_km+PedA3_km, network load 40/45/60 per cell           DL-LT
DL: VehA30km+Veh120km+PedB30km, network load 40/45/60 per cell  DH-LT
DL: PedB3_km+PedA3_km, network load 80/100 per cell             DL-HT
DL: VehA30km+Veh120km+PedB30km, network load 80/100 per cell    DH-HT
UL: PedB3_km+PedA3_km                                           UL
UL: VehA30km+Veh120km+PedB30km                                  UH
Based on the radio network conditions in the table above, eight different channels were constructed. These network conditions were composed into channel conditions for the listening tests in the following way.
Table 40: Definition of Radio Network Channel Conditions
Channel   Radio Network Condition
Ch1       DL-LT-UL
Ch2       DL-LT-UH
Ch3       DL-HT-UL
Ch4       DL-HT-UH
Ch5       DH-LT-UL
Ch6       DH-LT-UH
Ch7       DH-HT-UL
Ch8       DH-HT-UH
The radio network conditions were simulated using HSDPA in the downlink and EUL in the uplink. The actual configurations of the radio network simulators can be found in Annex K. The 8 resulting channels all showed a 1% link loss and delay variations in the range of 30-300 msec. The delay profiles of the conditions are shown together with the adaptive JBM buffering in clause 8.A.7 of this report.
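The eight channels in Table 40 are simply the cross-product of downlink mobility (DL/DH), traffic load (LT/HT), and uplink mobility (UL/UH); for instance:

```python
from itertools import product

# Enumerate Ch1..Ch8 as DL-mobility x traffic-load x UL-mobility,
# matching the ordering of Table 40
channels = ["-".join(parts)
            for parts in product(("DL", "DH"), ("LT", "HT"), ("UL", "UH"))]
```

This yields the channel labels DL-LT-UL through DH-HT-UH in the same order as the table.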
8.A.5
Listening experiments
Tables 41 to 46 provide a summary of the listening-only test conditions, and the full test plan is provided in Annex H.
Table 41: Noise types for listening only test

Noise type   Level (dB SNR)
Clean        -
Car          15 dB
Cafeteria    20 dB
AMR and AMR-WB codecs were tested in both clean and background noise in various channel conditions.
Table 43: Test conditions for listening-only tests with AMR-NB

Cond.   Noise Type   Frame Loss Rate   Channel   AMR Mode (fixed RTP delay)
1-1     Clean        0.01              Ch1       5.9 kbit/s (150 ms)
1-2     Clean        0.01              Ch2       5.9 kbit/s (150 ms)
1-3     Clean        0.01              Ch3       12.2 kbit/s (150 ms)
1-4     Clean        0.01              Ch4       12.2 kbit/s (150 ms)
Table 44: Test conditions for listening-only tests with AMR-NB in background noise

Cond.   Noise Type   Frame Loss Rate   Channel   AMR Mode (fixed RTP delay)
-       Car          0.01              Ch5       5.9 kbit/s (150 ms)
-       Cafeteria    0.01              Ch6       5.9 kbit/s (150 ms)
-       Car          0.01              Ch7       12.2 kbit/s (150 ms)
-       Cafeteria    0.01              Ch8       12.2 kbit/s (150 ms)

Table 45: Test conditions for listening-only tests with AMR-WB

Cond.   Noise Type   Frame Loss Rate   Channel   AMR-WB Mode (fixed RTP delay)
-       Clean        0.01              Ch1       12.65 kbit/s (150 ms)
-       Clean        0.01              Ch2       12.65 kbit/s (150 ms)
-       Clean        0.01              Ch3       12.65 kbit/s (150 ms)
-       Clean        0.01              Ch4       12.65 kbit/s (150 ms)
Table 46: Test conditions for listening-only tests with AMR-WB in background noise

Cond.   Noise Type   Frame Loss Rate   Channel   AMR-WB Mode (fixed RTP delay)
-       Car          0.01              Ch5       12.65 kbit/s (150 ms)
-       Car          0.01              Ch6       12.65 kbit/s (150 ms)
-       Cafeteria    0.01              Ch7       12.65 kbit/s (150 ms)
-       Cafeteria    0.01              Ch8       12.65 kbit/s (150 ms)
8.A.6
Test Results
Figures 19 to 26 provide the listening-only test results. For each test condition the MOS/DMOS score with 95 % confidence intervals is shown.
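The 95 % intervals on the result bars are the usual confidence intervals on a per-condition mean; a sketch using the normal approximation (the votes below are illustrative; for small panels the t-distribution is often used instead):

```python
from math import sqrt
from statistics import mean, stdev

def mos_ci95(votes):
    # Mean opinion score and normal-approximation 95 % half-width
    m = mean(votes)
    half_width = 1.96 * stdev(votes) / sqrt(len(votes))
    return round(m, 2), round(half_width, 2)

# Illustrative listener votes for one condition (not actual test data)
m, half = mos_ci95([4, 4, 3, 5, 4, 3, 4, 5])
```

The interval plotted for a condition is then mean ± half_width.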
Figures 19 and 20: MOS scores with 95 % confidence intervals for the AMR-NB clean-speech conditions: MNRU references (5 to 37 dB), direct, AMR-NB at 5.9 and 12.2 kbit/s, and AMR-NB with fixed and adaptive JB over channels 1 to 4.
[Figure: DMOS results for AMR-NB in background noise - MNRU references, direct (clean, car, cafeteria), AMR-NB 5.9 and 12.2, and fixed vs. adaptive JBM over channels 5-8]
[Figure: MOS results for AMR-WB in clean speech - MNRU references (5-45 dB), direct, AMR-WB 12.65, and fixed vs. adaptive JBM over channels 1-4]
[Figure: DMOS results for AMR-WB in background noise - MNRU references, direct (clean, car, cafeteria), AMR-WB 12.65, and fixed vs. adaptive JBM over channels 5-8]
8.A.7 Delay analysis
The delay analysis provided in Table 47 and Figures 27 to 34 has been done only on channels 1 through 8 using AMR-NB with the adaptive JBM, for the tests in laboratory 2 using the Swedish language. Including AMR-WB 12.65 in the analysis would not give any additional information, since the patterns of voice activity are quite similar for both codecs; the same holds for including the Finnish language. The voice activity for AMR-NB 5.9 and AMR-NB 12.2 is identical. The CDF curves are based on the JBM buffering time. The average end-to-end delay figures in Table 47 indicate that the achieved delay performance is suitable for conversational speech. In addition, the error concealment operations caused by the JBM (i.e. frames dropped or inserted by the JBM, e.g. due to late arrival or buffer under/overflow) remain below 0.5 % in all test cases, limiting the impact on media quality to a minor level. The adaptation principle of the tested JBM can be seen by comparing the average buffering times over all frames and over speech frames only. Since the adaptation is conducted mainly during inactive speech, i.e. during silence periods, the two values differ: SID frames are typically forwarded for decoding immediately upon arrival at the receiver, while the jitter buffer target delay is built up by delaying the playback of the first frame of each speech burst.
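The playout behaviour described above (SID frames forwarded immediately, target delay built up at the start of each speech burst) can be illustrated with a toy scheduler. This is a hypothetical sketch of the described principle, not the tested JBM implementation:

```python
FRAME_MS = 20  # AMR frame interval

def playout_delays(frames, target_ms):
    """Toy model of the described playout policy.

    frames: list of (arrival_ms, kind) with kind in {"SPEECH", "SID"},
            in decode order. Returns the per-frame buffering delay [ms].

    SID frames are forwarded for decoding as soon as they arrive, while
    the target delay is built up by holding back the first speech frame
    of each talkspurt; later frames follow at a steady 20 ms pacing.
    """
    delays = []
    playout_clock = None   # scheduled playout time of the previous speech frame
    in_talkspurt = False
    for arrival, kind in frames:
        if kind == "SID":
            in_talkspurt = False
            delays.append(0)              # forwarded immediately
            continue
        if not in_talkspurt:
            playout_clock = arrival + target_ms  # apply full target buffering
            in_talkspurt = True
        else:
            playout_clock += FRAME_MS            # steady 20 ms pacing
        delays.append(max(0, playout_clock - arrival))
    return delays

# One talkspurt with a small arrival-jitter spike, then a SID frame
frames = [(0, "SPEECH"), (20, "SPEECH"), (55, "SPEECH"), (60, "SPEECH"), (80, "SID")]
print(playout_delays(frames, target_ms=40))  # -> [40, 40, 25, 40, 0]
```

Note how the frame arriving 15 ms late (at 55 ms) simply consumes part of its buffering margin rather than being lost.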
Table 47: Delay analysis of adaptive JBM for AMR-NB 12.2 kbps operation

Condition         Encoded frames  Encoded speech frames  Encoded SID frames  Encoded NO_DATA frames
Channel 1, clean  16000           8746                   1029                6225
Channel 2, clean  16000           8746                   1029                6225
Channel 3, clean  16000           8746                   1029                6225
Channel 4, clean  16000           8746                   1029                6225
Channel 5, car    16000           8936                   983                 6081
Channel 6, caf    16000           9583                   939                 5478
Channel 7, car    16000           8935                   981                 6084
Channel 8, caf    16000           9583                   939                 5478

The table further reports, per channel: transmitted frames, received frames, received speech frames, received SID frames, lost frames, late frames, late speech frames, and the late loss rate for speech frames.

Delay statistics [msec]                 ch 1      ch 2     ch 3      ch 4      ch 5      ch 6     ch 7      ch 8
Average buffering time (all frames)     62.1496   45.7399  77.0119   64.4522   60.5685   41.5635  76.0796   60.3656
Average buffering time (speech frames)  98.4819   74.6829  127.074   109.9885  104.3152  76.2146  125.1083  103.1506
Average end-to-end delay                103.0551  77.9534  132.6756  114.6473  108.6259  78.7723  130.4654  106.5322
Buffering time (fixed@startPos)         85.0551   57.9534  96.6756   66.6473   46.6259   46.7723  64.4654   68.5322
Figure 27: Performance, adaptive JBM channel 1, clean. The delay spike at the end of the channel profile was 340 msec. The CDF curve is based on the JBM buffering time.
Figure 28: Performance, adaptive JBM channel 2, clean. The delay spikes of the channel profile were 310, 320 and 300 msec respectively. The CDF curve is based on the JBM buffering time.
Figure 29: Performance, adaptive JBM channel 3, clean. The delay spike of the channel profile was 320 msec. The CDF curve is based on the JBM buffering time.
Figure 30: Performance, adaptive JBM channel 4, clean. The CDF curve is based on the JBM buffering time.
Figure 31: Performance, adaptive JBM channel 5, car. The CDF curve is based on the JBM buffering time.
Figure 32: Performance, adaptive JBM channel 6, caf. The CDF curve is based on the JBM buffering time.
Figure 33: Performance, adaptive JBM channel 7, car. The CDF curve is based on the JBM buffering time.
Figure 34: Performance, adaptive JBM channel 8, caf. The CDF curve is based on the JBM buffering time.
8.A.8 Conclusions
The listening only test results for HSDPA/EUL radio channels indicate that an adaptive JBM conforming to the MTSI performance requirements is able to provide consistent voice quality over varying transmission conditions. The test also showed that the performance of the JBM directly impacts the voice quality. Furthermore, the tested adaptive JBM provides equal or better voice quality than the reference non-causal fixed JBM in all test cases. In test conditions where the channel delay showed small variations the adaptive JBM provided performance equal to the fixed reference JBM, while in the test conditions where the channel behaviour introduced larger delay variations the adaptive JBM outperformed the fixed reference JBM. Thus, the results indicate that an adaptive JBM is needed to cope with the large variations in channel delay.
8.B Conversation Tests
8.B.1 Introduction
3GPP/SA4 developed a test plan [see Annex L] designed to evaluate the performance of AMR and AMR-WB for UMTS over HSDPA/EUL. Three test labs were contracted to conduct conversation tests according to the test plan and to deliver raw voting data to the Global Analysis Lab (GAL) for processing and statistical analysis. This clause reports the results from the three test labs together with the additional statistical analyses conducted by the GAL.
8.B.2

The test plan described three conversation tests, one in each of three test labs: FTRD, testing in French; BIT, testing in Chinese; and Dynastat, testing in North American English. Each conversation test involved a different 3GPP standardized speech codec:
Exp.1 - AMR operating at 5.9 kbps
Exp.2 - AMR operating at 12.2 kbps
Exp.3 - AMR-WB operating at 12.65 kbps
The test plan specified that the experiments should be conducted according to ITU-T Recommendation P.805 for conversation testing. Alcatel-Lucent provided the network impairment simulation test-bed described in the test plan; the test-bed was shipped to each test lab so that the same test conditions could be reproduced in each lab. Each conversation test involved the same 16 network test connections shown in Table 8.B.1. Subjects were paired for the conversation task, and the test conditions were designed such that each condition was evaluated by both members of the conversation pair. In each test condition, subjects were seated in one of four simulated noise environments as specified in Table 8.B.1: Hoth/Quiet (labelled Q in this document), Cafeteria/Babble (B), Car (C), and Street (S). In half of the test conditions both subjects in the pair were in the same noise environment (QQ, BB, CC, SS); in the other half they were in different noise environments (QC, CQ, SB, BS). The noise conditions were also represented in the network simulation as either High Mobility conditions (HM: Car and Street) or Low Mobility conditions (LM: Hoth/Quiet and Cafeteria/Babble). In half of the test connections the test-bed simulated High Traffic (HT) network connections; in the other half it simulated Low Traffic (LT) network connections. The test plan specified common testing parameters so that the conversation test results would be comparable across test labs. Those parameters included the test-bed, the experimental design, the test conditions, the background noise environments, the randomized test-condition presentation order, and the number of subjects (32 subjects in 16 subject-pairs).
Table 8.B.1 Test Conditions for the Conversation Tests
Cond.  Room A     Room B     RNC A->B  RNC B->A  Description
1      Hoth       Hoth       [1]       [1]       Lm.LT.LM / LM.LT.Lm
2      Car        Car        [6]       [6]       Hm.LT.HM / HM.LT.Hm
3      Car        Hoth       [5]       [2]       Hm.LT.LM / HM.LT.Lm
4      Hoth       Car        [2]       [5]       Lm.LT.HM / LM.LT.Hm
5      Cafeteria  Cafeteria  [1]       [1]       Lm.LT.LM / LM.LT.Lm
6      Cafeteria  Street     [2]       [5]       Lm.LT.HM / LM.LT.Hm
7      Street     Cafeteria  [5]       [2]       Hm.LT.LM / HM.LT.Lm
8      Street     Street     [6]       [6]       Hm.LT.HM / HM.LT.Hm
9      Hoth       Hoth       [3]       [3]       Lm.HT.LM / LM.HT.Lm
10     Car        Car        [8]       [8]       Hm.HT.HM / HM.HT.Hm
11     Car        Hoth       [7]       [4]       Hm.HT.LM / HM.HT.Lm
12     Hoth       Car        [4]       [7]       Lm.HT.HM / LM.HT.Hm
13     Cafeteria  Cafeteria  [3]       [3]       Lm.HT.LM / LM.HT.Lm
14     Cafeteria  Street     [4]       [7]       Lm.HT.HM / LM.HT.Hm
15     Street     Cafeteria  [7]       [4]       Hm.HT.LM / HM.HT.Lm
16     Street     Street     [8]       [8]       Hm.HT.HM / HM.HT.Hm
On each test trial, the subjects evaluated the test connection using five rating scales, each with five categories. In this report the results and analyses for the rating scales are labelled by the following conventions:
Question 1 (VQ): Rate the Voice Quality of your partner.
Question 2 (UN): Rate the difficulty of Understanding your partner.
Question 3 (LE): Rate the Level of Effort required to communicate with your partner.
Question 4 (DD): Did you Detect Disturbances in the conversation? If yes, how annoying were they? (see note 1)
Question 5 (OQ): Rate the Overall Quality of the test connection.
8.B.3
The three test labs delivered their raw voting data to the GAL in the Excel spreadsheets provided by the GAL. Each of the test labs also provided test lab reports containing summary results for the conversation tests. Dynastat processed the raw voting data from the data delivery files and cross-checked the resulting scores against those contained in the test lab reports. In all cases the scores computed by Dynastat agreed with those reported by the test labs. The GAL therefore confirms the integrity of the raw data delivery for the three test labs.
8.B.4 Test Results
8.B.4.1 Mean Scores by Experiment and by Test Lab
The GAL was instructed by 3GPP/SA4 to treat the results of individual experiments from the test labs separately rather than making comparisons across experiments or across labs. This approach is justified by the experimental design of the conversation tests. Each experiment in each lab involved a different codec, and each used an independent panel of test subjects. Comparisons of results across experiments within one lab are therefore confounded by both codec and subject panel, and comparisons across labs are further confounded by language and cultural differences in the subject panels. Finally, there are no common conditions across experiments and therefore no basis for transforming scores to a common origin and scale. The results and analyses contained in this report are thus limited to the results from a single experiment in a single lab. Figures 1-9 show the mean scores for each of the five rating scales by experiment and by test lab: Figs. 1-3 for Exp.1, Figs. 4-6 for Exp.2, and Figs. 7-9 for Exp.3.
1 Question 4 contains two parts. In the first part the subject answers whether he detected any disturbances (yes or no); if he answers yes, he then rates how annoying the disturbances were on a five-point scale. For practical purposes, a rating of 6 has been assigned to responses of "no disturbances detected". ITU-T Recommendation P.805 for conversational testing discusses Question 4 but does not address the procedure to be applied to "no" votes.
[Figures 1-3: Mean scores (VQ, UN, LE, DD, OQ) for Exp.1 by noise-environment pair (QQ, BB, QC, BS, CQ, SB, CC, SS), mobility connection (LM>LM, LM>HM, HM>LM, HM>HM) and traffic level (Low/High)]
[Figures 4-6: Mean scores (VQ, UN, LE, DD, OQ) for Exp.2 by noise-environment pair, mobility connection and traffic level]
[Figures 7-9: Mean scores (VQ, UN, LE, DD, OQ) for Exp.3 by noise-environment pair, mobility connection and traffic level]
8.B.4.2
In most subjective tests there are repeated measures, which may be used to evaluate the reliability of an individual subject's performance in the subjective task relative to that of the other subjects in the test panel. In such tests, subjects hear and evaluate the same materials, so there is a basis for comparing their responses across trials. In conversation tests, however, subjects do not have the same materials on which to base their responses (i.e., each conversation is unique) and there are no repeated measures on which to evaluate reliability (i.e., there is only one trial per test condition). The only performance measure available for an individual subject within an experiment is the correlation of his responses across trials with the responses of the other subjects in the experiment. Table 8.B.2 shows the average correlation (across subjects and across rating scales) for each test lab and for each experiment within each lab. These values indicate the consistency of the responses across subjects within an experiment. In general, the values are relatively low compared to those typically obtained in other subjective tests; for MOS tests conducted by Dynastat, the corresponding average correlations are typically around 0.90.
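One common way to obtain such a consistency measure is to correlate each subject's votes across trials with the average votes of the rest of the panel and then average the per-subject correlations; a minimal sketch under that assumption (the helper names and rating data are illustrative, not part of the GAL tooling):

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def avg_subject_correlation(ratings):
    """ratings[s] = one subject's votes across the test trials.

    Each subject is correlated with the mean ratings of the *other*
    subjects, and the per-subject correlations are averaged.
    """
    n_subj = len(ratings)
    n_trials = len(ratings[0])
    corrs = []
    for s in range(n_subj):
        others = [
            sum(ratings[o][t] for o in range(n_subj) if o != s) / (n_subj - 1)
            for t in range(n_trials)
        ]
        corrs.append(pearson(ratings[s], others))
    return sum(corrs) / n_subj

# Illustrative votes: 3 subjects x 5 trials (not actual test data)
ratings = [[4, 3, 5, 2, 4], [4, 2, 5, 3, 4], [3, 3, 4, 2, 5]]
print(round(avg_subject_correlation(ratings), 2))
```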
Table 8.B.2 Consistency Measures by Lab and Experiment
Avg. Subj. R  Exp.1  Exp.2  Exp.3
BIT           0.32   0.29   0.25
FTRD          0.38   0.35   0.28
Dynastat      0.44   0.52   0.47
Since the same 16 test conditions were tested in each of the three experiments, though with a different codec, the results across experiments can be expected to be positively correlated. Table 8.B.3 shows the intercorrelations across experiments for each of the five rating scales for each of the three test labs. The correlations are very high, especially for Dynastat and FTRD, less so for BIT. This finding was encouraging but somewhat unexpected considering the relatively narrow range of mean scores across test conditions, i.e., most mean scores were between 3.0 and 4.5.
Table 8.B.3 Intercorrelations Across Experiments for the Five Rating Scales for Each Lab
Lab       Correlation  Avg.
BIT       R(E1,E2)     0.77
          R(E1,E3)     0.63
          R(E2,E3)     0.64
FTRD      R(E1,E2)     0.81
          R(E1,E3)     0.92
          R(E2,E3)     0.78
Dynastat  R(E1,E2)     0.86
          R(E1,E3)     0.91
          R(E2,E3)     0.88
8.B.4.3
The multiple rating scales used in conversation tests are designed to capture different aspects of the conversation task, e.g., voice quality, difficulty of understanding, level of effort, overall quality. In a previous conversation testing exercise conducted by 3GPP/SA4 [see clause 7] the rating scales were found to be highly intercorrelated, and multivariate analyses (i.e., Multivariate Analysis of Variance, MANOVA) revealed that a single underlying variable accounted for the significant variance in the five rating scales. The MANOVA procedure also provides coefficients for weighting the scores on the individual rating scales to produce a composite score corresponding to the underlying variable. The use of such composite scores makes it easier to compare test factors, since multiple criterion variables often give ambiguous or even conflicting results; furthermore, composite scores are more reliable than scores based on a single criterion variable. For the results reported here, the GAL conducted a MANOVA for each of the nine experiments involved in the conversation tests, where the independent variable was Conditions (n=16) and the dependent variables were the five rating scales VQ, UN, LE, DD, and OQ. The results of the MANOVAs showed that there was never more than one significant composite variable in any experiment. In five of the nine experiments (1F, 1D, 2F, 2D, 3D) there was a single significant underlying variable (criterion: p<0.05). Furthermore, in one experiment (1B) the
composite variable was close to significant (p=0.08). In the three remaining experiments (2B, 3B, 3F) there was no significant composite variable (p>0.05). Nevertheless, in the interests of a parsimonious solution, the GAL computed a composite variable for each of the nine conversation tests based on results from the appropriate MANOVA. Using the precedent set in the previous 3GPP conversation tests, the GAL has labeled each composite variable as the measure of Conversational Quality for the appropriate experiment.
8.B.4.3.1
The raw voting data from Exp.1, conducted at BIT, were subjected to a MANOVA to determine whether the scores for the five rating scales could be represented by a smaller number of underlying variables. Table 8.B.4 shows the results of that MANOVA. The following description of Table 8.B.4 also applies to the MANOVAs for each of the other eight experiments.
Table 8.B.4 Results of MANOVA for Exp.1 AMR-5.9 Lab BIT
MANOVA for Effects of Conditions x Rating Scales
Pillai Trace statistic: Value 0.1810, F-Ratio 1.2421, df 75,2480, Probability 0.0801

Intercorrelations Among Rating Scales (Exp.1B)
      VQ     UN     LE     DD     OQ
VQ    1.000  0.785  0.830  0.847  0.898
UN    0.785  1.000  0.882  0.919  0.881
LE    0.830  0.882  1.000  0.828  0.896
DD    0.847  0.919  0.828  1.000  0.933
OQ    0.898  0.881  0.896  0.933  1.000

Test of residual roots
Root  ChiSq  df  Prob.
1-5   93.49  75  0.0729
2-5   45.38  56  0.8441
3-5   25.96  39  0.9458
4-5   11.42  24  0.9857
5-5    3.74  11  0.9770

Canonical coefficients for Root 1-5 (dependent variables)
VQ   0.2606
UN   0.6287
LE   0.2999
DD  -0.0765
OQ  -0.1127
The first step in the MANOVA process is to examine the intercorrelations among the dependent variables for indications of underlying variables. The left-hand side of Table 8.B.4 shows the intercorrelation matrix of the five dependent variables across conditions for Exp.1 for Lab BIT. The table shows a high degree of intercorrelation, indicating the presence of a reduced set of underlying variables. The right-hand side of Table 8.B.4 shows the results of the MANOVA for the effects of Conditions (independent variable) x Rating Scales (dependent variables). The top section of the table shows the statistical test for the significance of the combination of dependent variables. The Pillai Trace (see note 2) and the associated F-statistic are not significant in this MANOVA, though the probability (p=0.0801) is close to the criterion for significance, p<0.05. The bottom section of Table 8.B.4 shows the Chi-square tests of the MANOVA roots. It shows that only the first root (1-5) is close to significant, indicating that a single underlying variable accounts for the (almost) significant variation in the dependent variables. The canonical coefficients for this root are also shown in the table; they are used to compute the composite dependent variable that corresponds to the underlying variable. The probability of the Chi-square value for the initial root (Chi-square = 93.49, df = 75) is similar to that of the Pillai Trace (i.e., p = 0.07). The probability of the second root (2-5) is not even close to significance (p=0.8441), and the same applies to the succeeding roots 3-5, 4-5, and 5-5. The canonical coefficients for the first root are used to compute a weighted average of the five dependent variables, producing the composite variable, labelled here as Conversational Quality for Exp.1-Lab BIT. The same process is applied to the data for each of the other eight experiments, producing a composite variable, or Conversational Quality measure, for each experiment.
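The composite computation itself is simply a canonical-coefficient weighted sum of the five scale scores; a minimal sketch using the Exp.1B coefficients from Table 8.B.4 (the per-condition scale means are illustrative, not actual test data):

```python
# Canonical coefficients for root 1-5 of the Exp.1-BIT MANOVA (Table 8.B.4)
COEF = {"VQ": 0.2606, "UN": 0.6287, "LE": 0.2999, "DD": -0.0765, "OQ": -0.1127}

def conversational_quality(scale_means):
    """Composite 'Conversational Quality' score for one test condition:
    the canonical-coefficient weighted sum of the five rating-scale means."""
    return sum(COEF[scale] * scale_means[scale] for scale in COEF)

# Illustrative scale means for one condition (not actual test data)
means = {"VQ": 3.8, "UN": 4.1, "LE": 3.9, "DD": 4.6, "OQ": 3.7}
print(round(conversational_quality(means), 3))  # -> 3.969
```

Note that such a score only has meaning relative to other conditions scored with the same coefficients, which is why CQ values cannot be compared across experiments.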
Tables 8.B.5 to 8.B.12 summarize the results of the MANOVAs for the other eight conversation tests.
Table 8.B.5 Results of MANOVA for Exp.1 AMR-5.9 Lab FTRD
MANOVA for Effects of Conditions x Rating Scales
Pillai Trace statistic: Value 0.2062, F-Ratio 1.4223, df 75,2480, Probability 0.0108

Intercorrelations Among Rating Scales (Exp.1F)
      VQ     UN     LE     DD     OQ
VQ    1.000  0.952  0.926  0.937  0.950
UN    0.952  1.000  0.972  0.906  0.925
LE    0.926  0.972  1.000  0.873  0.887
DD    0.937  0.906  0.873  1.000  0.961
OQ    0.950  0.925  0.887  0.961  1.000

Test of residual roots
Root  ChiSq   df  Prob.
1-5   108.85  75  0.0065
2-5    34.51  56  0.9894
3-5    17.60  39  0.9987
4-5     9.79  24  0.9953
5-5     3.33  11  0.9857

Canonical coefficients for Root 1-5 (dependent variables)
VQ   0.2565
UN   0.3720
LE   0.3534
DD  -0.1706
OQ   0.1887
2 For MANOVA there is no single universally accepted procedure for hypothesis testing, but rather a number of different methods. For the analyses that follow, we have chosen the Pillai Trace and the associated F-statistic as the criterion for significance, primarily because of its robustness to violations of MANOVA assumptions.
8.B.4.3.2
The canonical coefficients for the first root were used as weighting factors for the individual rating scales to compute a composite variable, labelled here as Conversational Quality (CQ) (see note 3), for each experiment. The CQ scores provide a simplified method for evaluating the results of each experiment. The validity of a CQ measure is a function of the reliability of the MANOVA from which it was derived: more confidence can be afforded to CQ values from the experiments with a significant underlying variable (1B, 1F, 1D, 2F, 2D, 3D), less to those from experiments with no significant underlying variable (2B, 3B, 3F). Table 8.B.13 shows summary CQ results (means and standard deviations) for Exp.1; Tables 8.B.14 and 8.B.15 show the results for Exp.2 and Exp.3, respectively.
3 The term Conversational Quality was introduced in previous AMR and AMR-WB conversation tests [6] but has not been validated in ITU-T Recommendation P.805 for conversational testing. The Conversational Quality values reported in this document are specific to the particular lab and experiment from which they are derived. Scores are not absolute, and comparisons across experiments are not valid.
8.B.4.3.3
The conversation tests were designed primarily to evaluate two experimental factors, Traffic and Mobility, for each of three codecs. The Traffic factor had two levels: Low Traffic and High Traffic. The Mobility factor had four levels: LM>LM, LM>HM, HM>LM, and HM>HM; in addition, each level of inter-connection was represented by two test conditions (i.e., background noise conditions). The experimental design of the conversation tests does not permit a direct comparison of the effects of Codecs, since each codec was evaluated in a separate conversation test using independent test panels. The Traffic conditions were simulated by RNC settings in the test-bed. The Mobility conditions were simulated by a combination of test-bed RNC settings and background noise conditions in the test rooms. Each mobility connection, e.g., LM>HM, involved two different background noise conditions, so the effects of mobility connection and background noise were confounded and cannot be separated; for this reason the results for the two background noise conditions were often inconsistent within a given level of Mobility. Figures 10-18 show the CQ results for each experiment involved in the conversation tests. Each figure has two parts: on the left are the CQ scores for every test condition, on the right are the average scores for the Traffic and Mobility factors. Each figure stands on its own: the scale and origin of the CQ axis apply only to the specific experiment. The caption of each figure indicates whether the CQ variable was significant in the MANOVA from which it was derived. Figures 10-12 show CQ scores for Exp.1 for Labs BIT, FTRD, and Dynastat, respectively; similarly, Figs. 13-15 show CQ scores for Exp.2, and Figs. 16-18 for Exp.3.
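The factor averages shown on the right-hand side of each figure are plain means of the per-condition CQ scores over the conditions belonging to each factor level; a sketch using the condition-to-level mapping specified in the test plan (Table 8.B.1), with illustrative CQ values:

```python
# Condition-to-level mapping from the test plan: conditions 1-8 are
# Low Traffic, 9-16 High Traffic; Mobility level per condition as designed.
TRAFFIC = {c: ("LT" if c <= 8 else "HT") for c in range(1, 17)}
MOBILITY = {1: "LM>LM", 2: "HM>HM", 3: "HM>LM", 4: "LM>HM",
            5: "LM>LM", 6: "LM>HM", 7: "HM>LM", 8: "HM>HM",
            9: "LM>LM", 10: "HM>HM", 11: "HM>LM", 12: "LM>HM",
            13: "LM>LM", 14: "LM>HM", 15: "HM>LM", 16: "HM>HM"}

def factor_means(cq_by_cond, factor):
    """Average CQ score per factor level (factor maps condition -> level)."""
    totals, counts = {}, {}
    for cond, cq in cq_by_cond.items():
        level = factor[cond]
        totals[level] = totals.get(level, 0.0) + cq
        counts[level] = counts.get(level, 0) + 1
    return {level: totals[level] / counts[level] for level in totals}

# Illustrative per-condition CQ scores (not actual test data)
cq = {c: 4.0 - 0.05 * c for c in range(1, 17)}
print(factor_means(cq, TRAFFIC))   # LT vs HT averages
print(factor_means(cq, MOBILITY))  # averages per mobility connection
```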
Fig. 8.B.10 Conversation Quality Scores for Exp.1-AMR-5.9 for Lab BIT (CQ was not significant, p=0.08)
Fig. 8.B.11 Conversation Quality Scores for Exp.1-AMR-5.9 for Lab FTRD (CQ was significant, p<0.05)
Fig. 8.B.12 Conversation Quality Scores for Exp.1-AMR-5.9 for Lab Dynastat (CQ was significant, p<0.0001)
Fig. 8.B.13 Conversation Quality Scores for Exp.2-AMR-12.2 for Lab BIT (CQ was not significant, p=0.28)
Fig. 8.B.14 Conversation Quality Scores for Exp.2-AMR-12.2 for Lab FTRD (CQ was significant, p<0.05)
Fig. 8.B.15 Conversation Quality Scores for Exp.2-AMR-12.2 for Lab Dynastat (CQ was significant, p<0.0001)
Fig. 8.B.16 Conversation Quality Scores for Exp.3-AMRWB-12.65 for Lab BIT (CQ was not significant, p=0.55)
Fig. 8.B.17 Conversation Quality Scores for Exp.3-AMRWB-12.65 for Lab FTRD (CQ was not significant, p=0.67)
Fig. 8.B.18 Conversation Quality Scores for Exp.3-AMRWB-12.65 for Lab Dynastat (CQ was significant, p<0.0001)
8.B.5 Conclusions
For the Traffic factor, the Conversational Quality results are consistent and confirm expectations: for all three codecs and in all three test labs, CQ is numerically higher in Low Traffic conditions than in High Traffic conditions. In general, the results of the conversation tests show that the effects of Traffic on the performance of AMR and AMR-WB for UMTS over HSDPA/EUL are relatively small. For the Mobility factor, however, the results are not as consistent. This is not surprising, since Mobility conditions were confounded with background noise conditions. It is important to note that the Conversational Quality scores computed in this exercise are specific to the particular lab and experiment from which they are derived; scores are not absolute, and comparisons across experiments are not valid. Furthermore, the variables underlying the CQ scores were not significant in all experiments. Overall, the performance of AMR and AMR-WB for UMTS over HSDPA/EUL is robust under the Traffic, Mobility, and background noise conditions evaluated in the conversation tests.
9 Conclusions
9.1 Tests over DCH radio channels
The results from conversational tests on DCH channels confirm that the default speech codecs (AMR-NB and AMR-WB) operate well for packet switched conversational multimedia applications over various realistic operating conditions (i.e. packet loss, delay, background noise, radio conditions and ROHC). The quality is somewhat reduced when packet losses occur and the end-to-end delay increases, but the overall quality remains acceptable even with a 3 % packet loss rate in the terrestrial IP network and up to a maximum of 1 % BLER on each radio leg. The results also indicate that users have a clear preference for AMR-WB speech over AMR-NB speech.
9.2 Listening-only tests over HSDPA/EUL radio channels
The listening only test results for HSDPA/EUL radio channels indicate that an adaptive JBM conforming to the MTSI performance requirements is able to provide consistent voice quality over varying transmission conditions. The test also shows that the performance of the JBM directly impacts the voice quality. Furthermore, the test results indicate that an adaptive JBM is needed to cope with the large variations in channel delay.
9.3 Conversation tests over HSDPA/EUL radio channels
Overall, the performance of AMR and AMR-WB for UMTS over HSDPA/EUL is robust under conditions of Traffic, Mobility, and Background noise evaluated in the conversation tests.
9.4 General consideration
The performance results can be used as guidance for network planning regarding the QoS parameters for VoIP.
Annex A: Conversation test composite dependent variable scores by condition and Lab
Set 1 - Narrowband/SYM: experimental parameters and NB/S-CTQ scores

Cond.  Rm-A   Rm-B   RC       PL  Mode  Del  Arcon  FT    NTT   Average
1      Quiet  Quiet  10^-2    0   6.7   300  3.80   3.73  3.79  3.77
2      Quiet  Quiet  10^-2    0   12.2  500  3.88   3.85  3.52  3.75
3      Quiet  Quiet  10^-2    0   12.2  300  4.05   3.73  3.91  3.89
4      Quiet  Quiet  10^-2    3   6.7   300  3.49   2.92  3.22  3.21
5      Quiet  Quiet  10^-2    3   12.2  500  3.68   2.99  3.28  3.32
6      Quiet  Quiet  10^-2    3   12.2  300  3.67   3.38  3.62  3.55
7      Quiet  Quiet  10^-3    0   6.7   300  4.09   4.22  3.96  4.09
8      Quiet  Quiet  10^-3    0   12.2  500  4.04   4.28  4.17  4.16
9      Quiet  Quiet  10^-3    0   12.2  300  4.31   4.66  3.94  4.30
10     Quiet  Quiet  10^-3    3   6.7   300  3.63   3.60  3.64  3.63
11     Quiet  Quiet  10^-3    3   12.2  500  3.83   3.82  3.75  3.80
12     Quiet  Quiet  10^-3    3   12.2  300  3.73   4.06  3.94  3.91
13     Quiet  Quiet  5x10^-4  0   6.7   300  4.20   4.45  4.07  4.24
14     Quiet  Quiet  5x10^-4  0   12.2  500  4.14   4.30  4.01  4.15
15     Quiet  Quiet  5x10^-4  0   12.2  300  4.26   4.37  4.38  4.34
16     Quiet  Quiet  5x10^-4  3   6.7   300  3.73   3.69  3.75  3.72
17     Quiet  Quiet  5x10^-4  3   12.2  500  3.93   3.98  3.71  3.87
18     Quiet  Quiet  5x10^-4  3   12.2  300  3.92   3.65  3.75  3.77
Set 2 - Narrowband/ASY: experimental parameters and NB/A-CTQ scores

Cond.  Rm-A       Rm-B       RC       PL  Mode  FT    NTT   Average
19     Car        Quiet      5x10^-4  3   12.2  2.93  3.43  3.27
20     Quiet      Car        5x10^-4  3   12.2  3.75  4.20  4.01
21     Cafeteria  Quiet      5x10^-4  0   6.7   3.58  3.86  3.76
22     Quiet      Cafeteria  5x10^-4  0   6.7   4.36  4.31  4.29
23     Street     Quiet      5x10^-4  0   12.2  3.58  4.01  3.79
24     Quiet      Street     5x10^-4  0   12.2  4.18  4.14  4.16
Set 3 - Wideband/SYM: experimental parameters and Arcon scores

Cond.  Rm-A   Rm-B   RC       PL  Mode   Arcon
1      Quiet  Quiet  10^-2    0   12.65  4.76
2      Quiet  Quiet  10^-2    0   12.65  4.55
3      Quiet  Quiet  10^-2    0   15.85  4.82
4      Quiet  Quiet  10^-2    3   12.65  4.53
5      Quiet  Quiet  10^-2    3   12.65  4.42
6      Quiet  Quiet  10^-2    3   15.85  4.53
7      Quiet  Quiet  10^-3    0   12.65  4.90
8      Quiet  Quiet  10^-3    0   12.65  4.68
9      Quiet  Quiet  10^-3    0   15.85  4.69
10     Quiet  Quiet  10^-3    3   12.65  4.64
11     Quiet  Quiet  10^-3    3   12.65  4.77
12     Quiet  Quiet  10^-3    3   15.85  4.66
13     Quiet  Quiet  5x10^-4  0   12.65  4.74
14     Quiet  Quiet  5x10^-4  0   12.65  4.80
15     Quiet  Quiet  5x10^-4  0   15.85  4.93
16     Quiet  Quiet  5x10^-4  3   12.65  4.55
17     Quiet  Quiet  5x10^-4  3   12.65  4.73
18     Quiet  Quiet  5x10^-4  3   15.85  4.81

RoHC was applied in 13 of the 18 conditions.
Set 4 - Wideband/ASY: experimental parameters and WB/A-CTQ scores

Cond | Rm-A | Rm-B | RC | PL (%) | Mode (kbit/s) | FT | NTT | Average
19 | Car | Quiet | 5x10^-4 | 3 | 12.65 | 3.62 | 3.17 | 3.49
20 | Quiet | Car | 5x10^-4 | 3 | 12.65 | 4.32 | 4.53 | 4.33
21 | Cafeteria | Quiet | 5x10^-4 | 0 | 12.65 | 4.35 | 4.01 | 4.06
22 | Quiet | Cafeteria | 5x10^-4 | 0 | 12.65 | 4.65 | 4.80 | 4.64
23 | Street | Quiet | 5x10^-4 | 0 | 15.85 | 3.87 | 4.32 | 3.97
24 | Quiet | Street | 5x10^-4 | 0 | 15.85 | 4.49 | 4.78 | 4.54
Set 5 - Phase II: experimental parameters and Ph2-CTQ scores

Cond | PL (%) | Codec, Mode | French | Arabic | Average
1 | 0 | AMR-NB, 6.7 kbit/s | 4.22 | 3.94 | 4.08
2 | 0 | AMR-NB, 12.2 kbit/s | 4.31 | 4.05 | 4.18
3 | 0 | AMR-WB, 12.65 kbit/s | 4.33 | 4.30 | 4.32
4 | 0 | AMR-WB, 15.85 kbit/s | 4.46 | 4.31 | 4.38
5 | 0 | G.723.1, 6.4 kbit/s | 4.15 | 3.98 | 4.07
6 | 0 | G.729, 8 kbit/s | 4.11 | 4.18 | 4.14
7 | 0 | G.722, 64 kbit/s + PLC | 4.34 | 4.13 | 4.24
8 | 0 | G.711 + PLC | 4.32 | 4.28 | 4.30
9 | 3 | AMR-NB, 6.7 kbit/s | 3.79 | 3.58 | 3.68
10 | 3 | AMR-NB, 12.2 kbit/s | 4.03 | 3.88 | 3.95
11 | 3 | AMR-WB, 12.65 kbit/s | 4.28 | 4.04 | 4.16
12 | 3 | AMR-WB, 15.85 kbit/s | 4.14 | 3.99 | 4.07
13 | 3 | G.723.1, 6.4 kbit/s | 3.87 | 3.51 | 3.69
14 | 3 | G.729, 8 kbit/s | 3.99 | 3.82 | 3.90
15 | 3 | G.722, 64 kbit/s + PLC | 4.33 | 4.30 | 4.32
16 | 3 | G.711 + PLC | 4.34 | 4.33 | 4.34
Question 2: Did you have difficulties understanding some words? All the time Often Sometimes Rarely Never
Question 3: How did you judge the conversation when you interacted with your partner? Excellent interactivity (similar to a face-to-face situation) Good interactivity (in a few moments, you were talking simultaneously, and you had to interrupt yourself) Fair interactivity (sometimes, you were talking simultaneously, and you had to interrupt yourself) Poor interactivity (often, you were talking simultaneously, and you had to interrupt yourself) Bad interactivity (it was impossible to have an interactive conversation)
Question 4: Did you perceive any impairment (noises, cuts, etc.)? In that case, was it: No impairment Slight impairment, but not disturbing Slightly disturbing impairment Disturbing impairment Very disturbing impairment
Question 5: How do you judge the global quality of the communication? Excellent Good Fair Poor Bad
From then on you will have a break approximately every 30 minutes. The test will last a total of approximately 60 minutes. Please do not discuss your opinions with other listeners participating in the experiment.
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.

Subject 2: Your Name:
Information from which you should select the details which your partner requires
British Airways / Lufthansa

Flight number | London Heathrow departure | Brussels arrival | Brussels departure | Düsseldorf arrival
LH 2615 | 6:30 | | | 7:35
LH 413 | 8:20 | | | 9:25

Information that your partner requires: name, address, telephone number, number of seats, class (Business or Economy)
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Annex D: Test Plan for the AMR Narrow-Band Packet Switched Conversation Test
Source: Siemens1, France Telecom2
Title: Test Plan for the AMR Narrow-Band Packet Switched Conversation Test
Document for: Approval
Agenda Item: 14.1
1. Introduction This document contains the test plan of one conversation test for the Adaptive Multi-Rate Narrow-Band (AMR-NB) in Packet Switched networks.
All the laboratories participating in this conversation test phase will use the same test plan; only the language of the conversation will change. Even if the test rooms or the test equipment are not exactly the same in all the laboratories, the calibration procedures and the test equipment characteristics and performance (as defined in this document) will guarantee the similarity of the test conditions.
Contact: Imre Varga
Imre.Varga@siemens.com
Tel: +49 89 722 47537
Siemens AG, ICM MP
Grillparzerstrasse 10a, 81675 Munich, Germany
Contacts: Jean-Yves Monfort
Jeanyves.monfort@francetelecom.com
Tel: +33 2 96 05 31 71
France Telecom T&I/R&D
2 avenue Pierre Marzin, 22397 Lannion, France
Catherine Quinquis catherine.quinquis@francetelecom.com Tel: +33 2 96 05 14 93 France Telecom T&I/R&D 2 avenue Pierre Marzin, 22397 Lannion, France
Section 2 gives references, conventions and contacts, section 3 details the test methodology, including test arrangement and test procedure, and section 4 defines the financial considerations.
Annex A contains the instructions for the subjects participating in the conversation tests.
Annex B contains the description of results to be provided to the Analysis Laboratory (if any) by the testing laboratories.
Considerations about IPv6 versus IPv4 are given in section 3.2. RoHC is not implemented in the AMR-NB conversation test; the effect of RoHC should be extrapolated from the results observed in the AMR-WB conversation test.
2. References, Conventions, and Contacts

2.1 Permanent Documents

ITU-T Rec. P.800: Methods for Subjective Determination of Transmission Quality
This Recommendation defines conversation test procedures based on handset telephones, and gives inputs for the calibration.
2.2 Abbreviations

AMR-WB: Adaptive Multi-Rate Wide-band Speech Codec
MOS: Mean Opinion Score
2.3 Contact Names The following persons should be contacted for questions related to the test plan.
Section | Contact Person/Email
AOB | ETSI MCC
2.4 Responsibilities Each test laboratory has the responsibility to organize its conversation tests.
Lab | 1 | 2
Reporting | |
3. Test methodology

3.1 Introduction

The protocol described below evaluates the effect of degradations such as delay and dropped packets on the quality of communications. It corresponds to the conversation-opinion tests recommended by ITU-T P.800 [1]. First of all, conversation-opinion tests place the subjects in a more realistic situation, close to the actual service conditions experienced by telephone customers. In addition, conversation-opinion tests are suited to assessing the effects of impairments that can cause difficulty while conversing (such as delay).

Subjects participate in the test in pairs; they are seated in separate sound-proof rooms and are asked to hold a conversation through a transmission chain realized by means of UMTS simulators. Communications are impaired by means of an IP impairment simulator (part of the CN simulator) and by the air interface simulator, as the figure below describes. The network configurations (including the terminal equipment) will be symmetrical in the two transmission paths. The only dissymmetry will be due to the presence of background noise in one of the test rooms.

3.2 Test arrangement

3.2.1 Description of the proposed testing system
This contribution describes a UMTS simulator for the characterization of the AMR speech codecs when the bitstream is transmitted over a PS network. The procedure for the conversational listening test has been described earlier in [1].
PC 1 and PC 5: PCs under Windows OS with the VoIP Terminal Simulator software of France Telecom R&D.
PC 2 and PC 4: PCs under Linux OS with the Air Interface Simulator of Siemens AG.
PC 3: PC under WinNT OS with the Network Simulator software (NetDisturb).
The platform simulates a packet-switched interactive communication between two users, using PC1 and PC5 as their respective VoIP terminals. PC1 sends AMR-encoded packets, encapsulated with IP/UDP/RTP headers, to PC5, and receives the corresponding IP/UDP/RTP audio packets from PC5.
In fact, the packets created in PC1 are sent to PC2. PC2 simulates the air interface Up Link transmission and then forwards the transmitted packets to PC4. In the same way, PC4 simulates the air interface Down Link transmission and then forwards the packets to PC5. PC5 decodes and plays the speech back to the listener.
On both links, one can choose the delay and loss laws. The two links can be configured separately or in the same way. For example, the delay can be set to a fixed value, or it can follow another law, such as an exponential distribution.
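The per-link delay and loss laws described above can be sketched as follows (an illustrative stand-in, not NetDisturb's actual interface; the function and parameter names are invented for the example):

```python
import random

def impair_link(packets, loss_rate=0.0, fixed_delay_ms=0.0,
                exp_mean_ms=None, seed=0):
    """Apply a loss law and a delay law to one link.

    `packets` is a list of (sequence number, send time in ms) tuples.
    The delay is fixed, plus an optional exponential component when
    exp_mean_ms is given.
    """
    rng = random.Random(seed)
    out = []
    for seq, t_ms in packets:
        if rng.random() < loss_rate:           # Bernoulli loss law
            continue                           # packet dropped on this link
        delay = fixed_delay_ms
        if exp_mean_ms is not None:            # exponential delay law
            delay += rng.expovariate(1.0 / exp_mean_ms)
        out.append((seq, t_ms + delay))
    return out

# One RTP packet every 20 ms; 1 % loss and a fixed 100 ms delay on this link.
sent = [(i, 20.0 * i) for i in range(1000)]
received = impair_link(sent, loss_rate=0.01, fixed_delay_ms=100.0)
```

Since the two links are configured independently, the uplink and downlink can be given different laws simply by calling the function with different parameters.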
- The RTP Payload Format for AMR-NB (RFC 3267) will be used;
- Bandwidth-efficient mode will be used;
- One speech frame shall be encapsulated in each RTP packet;
- Interleaving will not be used.
The payload header then consists of the 4 bits of the CMR (Codec Mode Request), followed by 6 bits for the ToC (Table of Contents). For IPv4, this corresponds to a maximum of 72 bytes per frame, that is to say 28.8 kbit/s; this goes up to 92 bytes (36.8 kbit/s) when using the IPv6 protocol on the air interface.

RTCP packets will be sent. However, in the test conditions defined in the conversation test plans, RTCP is not mandatory; as this is not a multicast environment (see IETF RFC 1889), we are not going to make use of the RTCP reports.

ROHC is an optional functionality in UMTS. In order to reduce the size of the tests and the number of conditions, the ROHC algorithm will not be used for the AMR-NB conversation test. This functionality will only be tested in the wideband conditions.

The Conversational / Speech / UL:42.8 DL:42.8 kbps / PS RAB from TS 34.108 v4.7.0 will be used. Here is the RAB description:
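The per-frame arithmetic above can be reproduced in a few lines (a sketch; the AMR-NB frame bit counts are the standard class sizes, and the payload is padded to an octet boundary as in the bandwidth-efficient mode description):

```python
# Per-frame IP/UDP/RTP packet size for AMR-NB, one speech frame per packet.
AMR_NB_BITS = {6.7: 134, 12.2: 244}                   # speech bits per 20 ms frame
HEADERS = {"IPv4": 20 + 8 + 12, "IPv6": 40 + 8 + 12}  # IP + UDP + RTP bytes

def packet_bytes(mode, ip="IPv4"):
    payload_bits = 4 + 6 + AMR_NB_BITS[mode]          # CMR + ToC + speech bits
    payload_bytes = (payload_bits + 7) // 8           # padded to an octet boundary
    return HEADERS[ip] + payload_bytes

def bitrate_kbps(mode, ip="IPv4"):
    return packet_bytes(mode, ip) * 8 / 20.0          # one packet every 20 ms

assert packet_bytes(12.2, "IPv4") == 72               # 28.8 kbit/s, as stated
assert packet_bytes(12.2, "IPv6") == 92               # 36.8 kbit/s
```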
RAB/Signalling RB: RAB
PDCP header size, bit: 8
Logical channel type: DTCH
RLC mode: UM
Payload sizes, bit: 920, 304, 96
Max data rate, bps: 46000
UMD PDU header, bit: 8
MAC header, bit: 0
MAC multiplexing: N/A
TrCH type: DCH
TB sizes, bit: 928, 312, 104
TFS: TF0: 0x928; TF1: 1x104; TF2: 1x312; TF3: 1x928
TTI, ms: 20
Coding type: TC
CRC, bit: 16
Max number of bits/TTI after channel coding: 2844
Uplink: max number of bits/radio frame before rate matching: 1422
RM attribute: 180-220
The air interface transmission will be simulated according to the settings given in Section 0. They consist of binary decisions for each transmitted RLC PDU, resulting in a certain BLER. After the error pattern insertion, the RLC of the air interface receiver side receives RLC PDUs in the reception buffer. The sequence numbers of the RLC headers are checked to detect when RLC PDUs have been discarded due to block errors. A discarded RLC PDU will result in one or more lost IP packets, resulting in a certain packet loss rate of the IP packets and thereby in a certain FER of the AMR frames. The IP/UDP/RTP/AMR packets are reassembled and transmitted to the next PC. This PC is either the network simulator (PC 3) in case of uplink transmission, or one of the terminals (PC 1 or 5) in case of downlink transmission.
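The sequence-number check described above amounts to counting the gaps between consecutively received PDUs (a stdlib-only sketch; UM RLC uses a 7-bit sequence number, hence the modulus of 128):

```python
def lost_pdus(received_sns, modulus=128):
    """Count RLC PDUs discarded between consecutively received ones,
    as detected from the (wrapping) RLC sequence numbers."""
    lost, prev = 0, None
    for sn in received_sns:
        if prev is not None:
            lost += (sn - prev) % modulus - 1   # PDUs skipped in between
        prev = sn
    return lost

# PDUs 3 and 4 were hit by the error pattern and discarded.
assert lost_pdus([0, 1, 2, 5, 6]) == 2
```

Each missing PDU then translates into one or more lost IP packets after reassembly, which is what produces the AMR frame erasures.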
(Figure: RLC layer of the receiver - the RLC header is removed, and IP/UDP/RTP/AMR packets are reassembled from the reception buffer into the transmission buffer.)
Channel A
Tap | Rel. Delay (ns) | Avg. Power (dB)
1 | 0 | 0.0
2 | 310 | -1.0
3 | 710 | -9.0
4 | 1090 | -10.0
5 | 1730 | -15.0
6 | 2510 | -20.0

Channel A
Tap | Rel. Delay (ns) | Avg. Power (dB)
1 | 0 | 0
2 | 110 | -9.7
3 | 190 | -19.2
4 | 410 | -22.8
5 | - | -
6 | - | -
Table 4 (DL) and Table 5 (UL) show approximate results of the air interface simulation for:

- Indoor, 3 km/h (Îor/Ioc = 9 dB)
- Outdoor to Indoor, 3 km/h (Îor/Ioc = 9 dB)
- Vehicular, 50 km/h (Îor/Ioc = -3 dB)
- Vehicular, 120 km/h (Îor/Ioc = -3 dB)

Table 5: Uplink performance - approximate Eb/N0 for the different channels and BLER
3.2.4 Headsets and Sound Card

To avoid echo problems, it has been decided to use headsets instead of handsets. The monaural headsets are connected to the sound cards of the PCs supporting the AMR simulators. The sound level in the earphones can be adjusted, if needed, by the users; in practice, however, the original settings, defined during the preliminary tests and producing a comfortable listening level, will not be modified. The microphones are protected by a foam ball in order to reduce the "pop" effect. Users are also advised to avoid placing the acoustic opening of the microphone directly in front of the mouth.

3.2.5 Test environment

Each of the two subjects participating in the conversations is installed in a test room. They sit on an armchair, in front of a table. The test rooms are acoustically insulated. All the test equipment is installed in a third room, connected to the test rooms. When needed, the background noise is generated in the appropriate test room through a set of 4 loudspeakers. The background noise level is adjusted and controlled with a sound level meter. The measurement microphone, connected to the sound level meter, is located at the equivalent of the center of the subject's head. The noise level is A-weighted.
3.2.6 Calibration and test conditions monitoring

Speech level

Before the beginning of a set of experiments, the end-to-end transmission level is checked subjectively, to ensure that there is no problem. If it is necessary to check the speech level, the following procedure applies: an artificial mouth placed in front of the microphone of Headset A, in the LRGP position (see ITU-T Rec. P.64), generates in the artificial ear (according to ITU-T Rec. P.57) coupled to the earphone of Headset B the nominal level defined in section 4.3. If necessary, the level is adjusted with the receiving volume control of the headset. The same calibration is done by inverting Headsets A and B.
Delay

The overall delay (from the input of sound card A to the output of sound card B) will be evaluated for each test condition. On the air interface side, the simulator only receives packets on its network card, processes them and transmits these packets every 20 ms to the following PC; only processing delay and a possible jitter-induced delay can be added (a packet arriving just after the sending window of the air interface). The hypothetical delay is calculated as follows:

- Encoder side (framing, look-ahead, processing and packetization): 45 ms
- Uplink delay between UE and Iu: 84.4 ms (see TR 25.853)
- Core network delay: a few ms
- Routing through IP: depending on the number of routers
- Downlink delay between Iu and UE: 71.8 ms (see TR 25.853)
- Decoder side (jitter buffer, de-packetization and processing): 40 ms

The total delay to be considered is at least 241.2 ms.

3.3 Test Conditions

Based on circuit-switched testing experiments, SA4 expects AMR 4.75 kbit/s to provide insufficient quality for conversational applications. SA4 does not recommend testing AMR 4.75 kbit/s; this mode is considered a fallback solution in case of poor radio conditions.
Condition | Background noise Room A | Background noise Room B | Radio conditions | Mode + delay
1 | No | No | 10^-2 | 6.7 kbit/s (delay 300 ms)
2 | No | No | 10^-2 | 12.2 kbit/s (delay 500 ms)
Condition | Background noise Room A | Background noise Room B | Radio conditions | Mode + delay
3 | No | No | 10^-2 | 12.2 kbit/s (delay 300 ms)
4 | No | No | 10^-2 | 6.7 kbit/s (delay 300 ms)
5 | No | No | 10^-2 | 12.2 kbit/s (delay 500 ms)
6 | No | No | 10^-2 | 12.2 kbit/s (delay 300 ms)
7 | No | No | 10^-3 | 6.7 kbit/s (delay 300 ms)
8 | No | No | 10^-3 | 12.2 kbit/s (delay 500 ms)
9 | No | No | 10^-3 | 12.2 kbit/s (delay 300 ms)
10 | No | No | 10^-3 | 6.7 kbit/s (delay 300 ms)
11 | No | No | 10^-3 | 12.2 kbit/s (delay 500 ms)
12 | No | No | 10^-3 | 12.2 kbit/s (delay 300 ms)
13 | No | No | 5x10^-4 | 6.7 kbit/s (delay 300 ms)
14 | No | No | 5x10^-4 | 12.2 kbit/s (delay 500 ms)
15 | No | No | 5x10^-4 | 12.2 kbit/s (delay 300 ms)
16 | No | No | 5x10^-4 | 6.7 kbit/s (delay 300 ms)
17 | No | No | 5x10^-4 | 12.2 kbit/s (delay 500 ms)
18 | No | No | 5x10^-4 | 12.2 kbit/s (delay 300 ms)
19 | Car | No | 5x10^-4 | 12.2 kbit/s (delay 300 ms)
20 | No | Car | 5x10^-4 | 12.2 kbit/s (delay 300 ms)
21 | Cafeteria | No | 5x10^-4 | 6.7 kbit/s (delay 300 ms)
22 | No | Cafeteria | 5x10^-4 | 6.7 kbit/s (delay 300 ms)
23 | Street | No | 5x10^-4 | 12.2 kbit/s (delay 500 ms)
24 | No | Street | 5x10^-4 | 12.2 kbit/s (delay 500 ms)
Noise types | Level (dBA)
Car | 60
Cafeteria | 55
Street | 50
1 32 16 5 1 1
Listening environment: see table. Monaural headset (flat response in the audio bandwidth of interest: 50 Hz-7 kHz); the other ear is open. Room noise: Hoth spectrum at 30 dBA (as defined by ITU-T Recommendation P.800, Annex A, section A.1.1.2.2.1 Room Noise, with Table A.1 and Figure A.1), except when background noise is needed (see table).
INSTRUCTIONS TO SUBJECTS
In this experiment we are evaluating systems that might be used for telecommunication services. You are going to have a conversation with another user. The test situation simulates communications between two mobile phones. Most of the situations will correspond to silent environment conditions, but some others will simulate more specific situations, such as in a car, in a railway station, or in an office environment where other people are talking in the background. After the completion of each conversation, you will give your opinion on the quality by answering the following questions, which will be displayed on the screen of the black box in front of you. Your judgment will be stored. You have 8 seconds to answer each question. After "pressing" the button on the screen, the next question will be displayed. You continue the procedure for the 5 following questions. Question 1: How do you judge the quality of the voice of your partner? Excellent Good Fair Poor Bad
Question 2: Did you have difficulties understanding some words? All the time Often Sometimes Rarely Never
Question 3: How did you judge the conversation when you interacted with your partner? Excellent interactivity (similar to a face-to-face situation) Good interactivity (in a few moments, you were talking simultaneously, and you had to interrupt yourself) Fair interactivity (sometimes, you were talking simultaneously, and you had to interrupt yourself) Poor interactivity (often, you were talking simultaneously, and you had to interrupt yourself) Bad interactivity (it was impossible to have an interactive conversation)
Question 4: Did you perceive any impairment (noises, cuts, etc.)? In that case, was it: No impairment Slight impairment, but not disturbing Slightly disturbing impairment Disturbing impairment Very disturbing impairment
Question 5: How do you judge the global quality of the communication? Excellent Good Fair Poor Bad
From then on you will have a break approximately every 30 minutes. The test will last a total of approximately 60 minutes. Please do not discuss your opinions with other listeners participating in the experiment.
Examples taken from ITU-T SG 12, COM12-35, "Development of scenarios for short conversation test", 1997.
Scenario 1: Pizza service

Subject 1: Your Name: Clemence
Reason for the call: 1 large pizza
Condition which should be applied to the exchange of information: for 2 people, vegetarian pizza preferred
Information you want to receive from your partner: topping, price
Information that your partner requires: delivery address: 41 Industry Street, Oxford; phone: 7 34 20
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you: How long will it take?
Subject 2: Your Name : Information from which you should select the details which your partner requires
Pizzeria Roma
Pizzas | 1 person | 2 persons | 4 persons
Toscana (ham, mushrooms, tomatoes, cheese) | 3.20 | 5.95 | 10.50
Tonno (tuna, onions, tomatoes, cheese) | 3.95 | 7.50 | 13.95
Fabrizio (salami, ham, tomatoes, cheese) | 4.20 | 7.95 | 14.95
Vegetaria (spinach, mushrooms, tomatoes, cheese) | 4.50 | 8.50 | 15.95
Information you want to receive from your partner

Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Reason for the call: intended journey: London Heathrow - Düsseldorf, on June 23rd
Condition which should be applied to the exchange of information: morning flight, direct flight preferred
Information you want to receive from your partner: departure, arrival, flight number
Information that your partner requires: reservation: 1 seat, Economy class; address: 66 Middle Street, Sheffield; phone: 21 08 33
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you: From which airport is it easier to get into Cologne center: Düsseldorf or Cologne/Bonn?
Subject 2: Your Name : Information from which you should select the details which your partner requires
Flight schedule

Flight number | London Heathrow departure | Brussels arrival | Brussels departure | Düsseldorf arrival
 | | | | 7:35
 | | | | 9:25
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Annex C: Results to be provided

For contractual purposes, the information which needs to be provided is defined here. The information required from each test laboratory is a table containing the following information for each of the conditions in the experiment:

- The "Mean Opinion Score (MOS)" obtained over all the subjects. When the conditions are symmetrical, the mean value is calculated from all the results for the two test rooms. For the dissymmetric conditions, the mean is calculated over the two test conditions, each result cumulating the results obtained in each background noise condition.
- The standard deviation of the "MOS" obtained over all the subjects, for each test condition.

The specific statistical comparisons are specified in Annex C.
Annex D: Data analysis and presentation of results D.1 Calculation of MOS and Standard Deviation
The (overall) MOS/DMOS over all subjects for condition c, Y_c, can then be obtained from:

Y_c = (1/T) * sum over t = 1..T of Y_{c,t}

The standard deviation for condition c, denoted S_c, can be calculated as:

S_c = sqrt( 1/(L*T - 1) * sum over t = 1..T and l = 1..L of (X_{c,l,t} - Y_c)^2 )

Finally, the confidence interval (CI) at the (1 - alpha) level can be calculated for N = L*T as:

CI_c = t(1 - alpha/2, N - 1) * S_c / sqrt(N)
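The three quantities above can be computed directly (a stdlib-only sketch; the Student t quantile is passed in, since computing it exactly would require a statistics package, e.g. scipy.stats.t.ppf(1 - alpha/2, N - 1)):

```python
import math
from statistics import mean

def condition_stats(votes_per_subject, t_value=1.96):
    """MOS, standard deviation and CI half-width for one condition.

    `votes_per_subject` is a list of L lists of T votes (the X_{c,l,t});
    t_value approximates t(1 - alpha/2, N - 1) for large N at alpha = 0.05.
    """
    flat = [x for subject in votes_per_subject for x in subject]
    n = len(flat)                                                  # N = L * T
    mos = mean(flat)                                               # Y_c
    sd = math.sqrt(sum((x - mos) ** 2 for x in flat) / (n - 1))    # S_c
    ci = t_value * sd / math.sqrt(n)                               # CI_c half-width
    return mos, sd, ci

mos, sd, ci = condition_stats([[4, 4, 5], [3, 4, 4]])
assert abs(mos - 4.0) < 1e-9
```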
D.2 Presentation of results
The test results should be reported by the test laboratory and the Global Analysis Laboratory as follows: calculate and tabulate the "Mean Opinion Scores" for the opinion scales, with Standard Deviations and Confidence Intervals, as shown in Table D.1.

Table D.1 - Layout for presentation of test results

D.3 Thorough analysis
Two statistical analyses should be conducted on the data obtained with these subjective scales. The first analysis consists of a Multiple ANalysis Of VAriance (MANOVA), which globally indicates the possible effect of the experimental factors (i.e., the different conditions). Then, a specific ANOVA should be run on each dependent variable (the five scales) to test whether there is an effect of a specific experimental factor on a given subjective variable. In other words, these statistical analyses indicate whether the differences observed between the MOS obtained for the different conditions are significant, for one given dependent variable (ANOVA) or for the set of dependent variables (MANOVA). Finally, Pearson's linear correlations should be computed between the results of all subjective variables, to see which variables are dominant and which depend on others.
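The last step, Pearson's linear correlation between two subjective scales, can be sketched without a statistics package (the MANOVA and ANOVA themselves are better left to dedicated software):

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# e.g. per-condition "global quality" votes against "interactivity" votes
# (illustrative numbers, not data from this test)
r = pearson([4, 3, 5, 4, 2], [4, 3, 4, 5, 2])
assert 0.7 < r < 1.0
```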
Annex E: Test Plan for the AMR Wide-Band Packet Switched Conversation Test
Source: Siemens1, France Telecom2
Title: Test Plan for the AMR Wide-Band Packet Switched Conversation Test
Document for: Approval
Agenda Item: 14.1
1. Introduction

This document contains the test plan of a conversation test for the Adaptive Multi-Rate Wide-Band (AMR-WB) codec in Packet Switched networks.
All the laboratories participating in this conversation test phase will use the same test plan; only the language of the conversation will change. Even if the test rooms or the test equipment are not exactly the same in all the laboratories, the calibration procedures and the test equipment characteristics and performance (as defined in this document) will guarantee the similarity of the test conditions.

Section 2 gives references, conventions and contacts, section 3 details the test methodology, including test arrangement and test procedure, and section 4 defines the financial considerations.

Annex A contains the instructions for the subjects participating in the conversation tests.
Contact: Imre Varga
Imre.Varga@siemens.com
Tel: +49 89 722 47537
Siemens AG, ICM MP
Grillparzerstrasse 10a, 81675 Munich, Germany
Contacts: Jean-Yves Monfort
Jeanyves.monfort@francetelecom.com
Tel: +33 2 96 05 31 71
France Telecom T&I/R&D
2 avenue Pierre Marzin, 22397 Lannion, France
Catherine Quinquis Catherine.quinquis@francetelecom.com Tel: +33 2 96 05 14 93 France Telecom T&I/R&D 2 avenue Pierre Marzin, 22397 Lannion, France
Annex B contains the description of results to be provided to the Analysis Laboratory (if any) by the testing laboratories. Annex C contains the list of statistical comparisons to be performed.

Considerations about IPv6 versus IPv4 are given in section 3.2. RoHC is implemented for the AMR-WB conversation test, but only for the AMR-WB modes at 12.65 kbit/s and 15.85 kbit/s.

2. References, Conventions, and Contacts

2.1 Permanent Documents

ITU-T Rec. P.800: Methods for Subjective Determination of Transmission Quality
This Recommendation defines conversation test procedures based on handset telephones, and gives inputs for the calibration.
2.2 Abbreviations

AMR-WB: Adaptive Multi-Rate Wide-band Speech Codec
MOS: Mean Opinion Score
2.3 Contact Names The following persons should be contacted for questions related to the test plan.
Section | Contact Person/Email
AOB | ETSI MCC
2.4 Responsibilities Each test laboratory has the responsibility to organize its conversation tests.
Lab | 1 | 2
Language | |
Statistical analysis | |
Reporting | |
3. Test methodology

3.1 Introduction

The protocol described below evaluates the effect of degradations such as delay and dropped packets on the quality of communications. It corresponds to the conversation-opinion tests recommended by ITU-T P.800 [1]. First of all, conversation-opinion tests place the subjects in a more realistic situation, close to the actual service conditions experienced by telephone customers. In addition, conversation-opinion tests are suited to assessing the effects of impairments that can cause difficulty while conversing (such as delay).

Subjects participate in the test in pairs; they are seated in separate sound-proof rooms and are asked to hold a conversation through the transmission chain of the UMTS simulator. Communications are impaired by means of an IP impairment simulator (part of the CN simulator) and by the air interface simulator, as the figure below describes. The network configurations (including the terminal equipment) will be symmetrical in the two transmission paths. The only dissymmetry will be due to the presence of background noise in one of the test rooms.

3.2 Test arrangement

3.2.1 Description of the proposed testing system
This contribution describes a UMTS simulator for the characterization of the AMR speech codecs when the bitstream is transmitted over a PS network. The procedure for the conversational listening test has been described earlier in [1]. Figure 1 describes the system to be simulated:
PC 1 and PC 5: PCs under Windows OS with the VoIP Terminal Simulator software of France Telecom R&D.
PC 2 and PC 4: PCs under Linux OS with the Air Interface Simulator of Siemens AG.
PC 3: PC under WinNT OS with the Network Simulator software (NetDisturb).

Basic principles: The platform simulates a packet-switched interactive communication between two users, using PC1 and PC5 as their respective VoIP terminals. PC1 sends AMR-encoded packets, encapsulated with IP/UDP/RTP headers, to PC5, and receives the corresponding IP/UDP/RTP audio packets from PC5.
In fact, the packets created in PC1 are sent to PC2. PC2 simulates the air interface Up Link transmission and then forwards the transmitted packets to PC4. In the same way, PC4 simulates the air interface Down Link transmission and then forwards the packets to PC5. PC5 decodes and plays the speech back to the listener.
On both links, one can choose the delay and loss laws. The two links can be configured separately or in the same way. For example, the delay can be set to a fixed value, or it can follow another law, such as an exponential distribution.
- The RTP Payload Format for AMR-WB (RFC 3267) will be used;
- Bandwidth-efficient mode will be used;
- One speech frame shall be encapsulated in each RTP packet;
- Interleaving will not be used.
The payload header then consists of the 4 bits of the CMR (Codec Mode Request), followed by 6 bits for the ToC (Table of Contents). For IPv4, a maximum of 81 bytes per frame will be transmitted (41 bytes for the AMR frame and its payload header, plus the 40 bytes of the IP/UDP/RTP headers), that is to say 32.4 kbit/s; this goes up to 101 bytes (40.4 kbit/s) when using the IPv6 protocol on the air interface.

The ROHC algorithm will be supported for the AMR-WB conversation test, for the 12.65 kbit/s and 15.85 kbit/s modes. Header compression will be done on the IP/UDP/RTP headers. ROHC will start in unidirectional mode and switch to bidirectional mode as soon as a packet has reached the decompressor and it has replied with a feedback packet indicating that a mode transition is desired.

The Conversational / Speech / UL:42.8 DL:42.8 kbps / PS RAB from TS 34.108 v4.7.0 will be used. Here is the RAB description:
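The header-compression gain can be put in numbers (a sketch; the 3-byte steady-state compressed header is an assumed typical RoHC value, not a figure from this test plan):

```python
SPEECH_BITS = 317                         # AMR-WB 15.85 kbit/s frame
payload = (4 + 6 + SPEECH_BITS + 7) // 8  # CMR + ToC + speech, padded: 41 bytes
assert 40 + payload == 81                 # IPv4 headers -> 32.4 kbit/s
assert 60 + payload == 101                # IPv6 headers -> 40.4 kbit/s

rohc_packet = 3 + payload                 # assumed ~3-byte compressed header
rohc_kbps = rohc_packet * 8 / 20.0        # one packet every 20 ms
assert abs(rohc_kbps - 17.6) < 1e-9
```

Compared with 32.4 kbit/s over uncompressed IPv4, this roughly halves the air-interface bit rate, which is why RoHC is worth testing for the wideband modes.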
RAB/Signalling RB: RAB
PDCP header size, bit: 8
Logical channel type: DTCH
RLC mode: UM
Payload sizes, bit: 920, 304, 96
Max data rate, bps: 46000
UMD PDU header, bit: 8
MAC header, bit: 0
MAC multiplexing: N/A
TrCH type: DCH
TB sizes, bit: 928, 312, 104
TFS: TF0: 0x928; TF1: 1x104; TF2: 1x312; TF3: 1x928
TTI, ms: 20
Coding type: TC
CRC, bit: 16
Max number of bits/TTI after channel coding: 2844
Uplink: max number of bits/radio frame before rate matching: 1422
RM attribute: 180-220
After the error pattern insertion, the RLC of the air interface receiver side receives RLC PDUs in the reception buffer. The sequence numbers of the RLC headers are checked to detect when RLC PDUs have been discarded due to block errors. A discarded RLC PDU will result in one or more lost IP packets, resulting in a certain packet loss rate of the IP packets and thereby in a certain FER of the AMR frames. The IP/UDP/RTP/AMR packets are reassembled and transmitted to the next PC. This PC is either the network simulator (PC 3) in case of uplink transmission, or one of the terminals (PC 1 or 5) in case of downlink transmission.
(Figure: RLC layer of the receiver - the RLC header is removed, and IP/UDP/RTP/AMR packets are reassembled from the reception buffer into the transmission buffer.)
Channel A
Tap | Rel. Delay (ns) | Avg. Power (dB)
1 | 0 | 0.0
2 | 310 | -1.0
3 | 710 | -9.0
4 | 1090 | -10.0
5 | 1730 | -15.0
6 | 2510 | -20.0

Channel A
Tap | Rel. Delay (ns) | Avg. Power (dB)
1 | 0 | 0
2 | 110 | -9.7
3 | 190 | -19.2
4 | 410 | -22.8
5 | - | -
6 | - | -
Table 4 (DL) and Table 5 (UL) show approximate results of the air interface simulation for
- Indoor, 3 km/h (Îor/Ioc = 9 dB)
- Outdoor to Indoor, 3 km/h (Îor/Ioc = 9 dB)
- Vehicular, 50 km/h (Îor/Ioc = -3 dB)
- Vehicular, 120 km/h (Îor/Ioc = -3 dB)

Table 5: Uplink performance - approximate Eb/N0 for the different channels and BLER
3.2.4 Headsets and Sound Card

To avoid echo problems, it has been decided to use headsets instead of handsets. The monaural headsets are connected to the sound cards of the PCs supporting the AMR simulators. The sound level in the earphones can be adjusted, if needed, by the users; in practice, however, the original settings, defined during the preliminary tests and producing a comfortable listening level, will not be modified. The microphones are protected by a foam ball in order to reduce the "pop" effect. Users are also advised to avoid placing the acoustic opening of the microphone directly in front of the mouth.

3.2.5 Test environment

Each of the two subjects participating in the conversations is installed in a test room. They sit on an armchair, in front of a table. The test rooms are acoustically insulated. All the test equipment is installed in a third room, connected to the test rooms. When needed, the background noise is generated in the appropriate test room through a set of 4 loudspeakers. The background noise level is adjusted and controlled with a sound level meter. The measurement microphone, connected to the sound level meter, is located at the equivalent of the center of the subject's head. The noise level is A-weighted.
3.2.6 Calibration and test conditions monitoring
Speech level
Before the beginning of a set of experiments, the end-to-end transmission level is checked subjectively to ensure that there is no problem. If the speech level needs to be checked, the following procedure applies.
An artificial mouth placed in front of the microphone of headset A, in the LRGP position (see ITU-T Rec. P.64), generates in the artificial ear (according to ITU-T Rec. P.57) coupled to the earphone of headset B the nominal level defined in section 4.3. If necessary, the level is adjusted with the receiving volume control of the headset. The same calibration is done with headsets A and B inverted.
Delay
The overall delay (from the input of sound card A to the output of sound card B) will be evaluated for each test condition. On the air interface side, the simulator only receives packets on its network card, processes them, and transmits them every 20 ms to the following PC; only processing delay, and possibly a jitter-induced delay (a packet arriving just after the sending window of the air interface), can be added. The hypothetical delay is calculated as follows:
- Encoder side, taking into account framing, look-ahead, processing and packetization: 45 ms
- Uplink delay between UE and Iu: 84.4 ms (see TR 25.853)
- Core network delay: a few ms
- Routing through IP: depending on the number of routers
- Downlink delay between Iu and UE: 71.8 ms (see TR 25.853)
- Decoder side, taking into account jitter buffer, de-packetization and processing: 40 ms
The total delay to be considered is at least 241.2 ms.
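The delay budget above can be tallied in a few lines; a minimal sketch using the component values quoted in the text (the dictionary labels are illustrative):

```python
# Hypothetical end-to-end delay budget for AMR/AMR-WB over the simulated
# UMTS chain, using the component values quoted in the text.
DELAY_BUDGET_MS = {
    "encoder (framing + look-ahead + processing + packetization)": 45.0,
    "uplink UE -> Iu (TR 25.853)": 84.4,
    "downlink Iu -> UE (TR 25.853)": 71.8,
    "decoder (jitter buffer + de-packetization + processing)": 40.0,
}

total = sum(DELAY_BUDGET_MS.values())
# Core-network and IP-routing delays add a few more milliseconds on top.
print(f"minimum end-to-end delay: {total:.1f} ms")
```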
Condition   Radio condition   Mode
1           10^-2             12,65 kbit/s, RoHC
2           10^-2             12,65 kbit/s
3           10^-2             15,85 kbit/s, RoHC
4           10^-2             12,65 kbit/s, RoHC
5           10^-2             12,65 kbit/s
6           10^-2             15,85 kbit/s, RoHC
7           10^-3             12,65 kbit/s, RoHC
8           10^-3             12,65 kbit/s
9           10^-3             15,85 kbit/s, RoHC
10          10^-3             12,65 kbit/s, RoHC
11          10^-3             12,65 kbit/s
12          10^-3             15,85 kbit/s, RoHC
13          5x10^-4           12,65 kbit/s, RoHC
14          5x10^-4           12,65 kbit/s
15          5x10^-4           15,85 kbit/s, RoHC
16          5x10^-4           12,65 kbit/s, RoHC
17          5x10^-4           12,65 kbit/s
18          5x10^-4           15,85 kbit/s, RoHC
Condition   Experimental actors   Mode
Condition 20: noise No (room A) / Car (room B); radio condition 5x10^-4; packet loss 3%
Condition 21: Cafeteria / No; packet loss 0%
Condition 22: No / Cafeteria; packet loss 0%
Condition 23: Street / No; packet loss 0%
Condition 24: No / Street; packet loss 0%
Modes (conditions 20-24, in order): 12,65 kbit/s RoHC; 12,65 kbit/s; 12,65 kbit/s; 15,85 kbit/s RoHC; 15,85 kbit/s RoHC
Noise types
Noise type levels (dBA): 60, 55, 50
Listening system: monaural headset (flat response in the audio bandwidth of interest: 50 Hz-7 kHz); the other ear is open.
Listening environment: room noise with Hoth spectrum at 30 dBA (as defined in ITU-T Recommendation P.800, Annex A, section A.1.1.2.2.1 "Room Noise", with Table A.1 and Figure A.1), except when background noise is needed (see table).
INSTRUCTIONS TO SUBJECTS
In this experiment we are evaluating systems that might be used for telecommunication services. You are going to have a conversation with another user. The test situation simulates communications between two mobile phones. Most of the situations will correspond to silent environment conditions, but some will simulate more specific situations, such as in a car, in a railway station, or in an office environment where other people are talking in the background. After the completion of each conversation, you will give your opinion on the quality by answering the following questions, displayed on the screen of the black box in front of you. Your judgment will be stored. You have 8 seconds to answer each question. After "pressing" the button on the screen, the next question will be displayed. You continue the procedure for the 5 questions. Question 1: How do you judge the quality of the voice of your partner? Excellent Good Fair Poor Bad
Question 2: Do you have difficulties understanding some words? All the time Often From time to time Rarely Never
Question 3: How do you judge the conversation when you interacted with your partner? Excellent interactivity (similar to a face-to-face situation) Good interactivity (in a few moments, you were talking simultaneously, and you had to interrupt yourself) Fair interactivity (sometimes, you were talking simultaneously, and you had to interrupt yourself) Poor interactivity (often, you were talking simultaneously, and you had to interrupt yourself) Bad interactivity (it was impossible to have an interactive conversation)
Question 4: Did you perceive any impairment (noises, cuts, ...)? If so, was it: No impairment Slight impairment, but not disturbing Slightly disturbing impairment Disturbing impairment Very disturbing impairment
Question 5: How do you judge the global quality of the communication? Excellent Good Fair Poor Bad
From then on you will have a break approximately every 30 minutes. The test will last a total of approximately 60 minutes. Please do not discuss your opinions with other listeners participating in the experiment.
Examples from ITU-T SG 12, COM12-35, "Development of scenarios for short conversation test", 1997
Scenario 1 : Pizza service Subject 1:
Your Name : Reason for the call Condition which should be applied to the exchange of information Information you want to receive from your partner Information that your partner requires Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Clemence 1 large pizza For 2 people, vegetarian pizza preferred Topping Price Delivery address: 41 Industry Street, Oxford Phone: 7 34 20 How long will it take?
Subject 2: Your Name : Information from which you should select the details which your partner requires Pizzeria Roma
Pizzas Toscana (ham, mushrooms, tomatoes, cheese) Tonno (tuna, onions, tomatoes, cheese) Fabrizio (salami, ham, tomatoes, cheese) Vegetaria (spinach, mushrooms, tomatoes, cheese)
Pizza       1 person   2 persons   4 persons
Toscana     3.2        5.95        10.5
Tonno       3.95       7.5         13.95
Fabrizio    4.2        7.95        14.95
Vegetaria   4.5        8.5         15.95
Information you want to receive from your partner Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Your Name : Reason for the call Condition which should be applied to the exchange of information Information you want to receive from your partner Information that your partner requires
Parker Intended journey: London Heathrow to Düsseldorf on June 23rd, morning flight, direct flight preferred Departure : Arrival Flight number Reservation : 1 seat, economy class Address: 66 Middle Street, Sheffield Phone: 21 08 33 From which airport is it easier to get into Cologne centre: Düsseldorf or Cologne/Bonn?
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Subject 2: Your Name : Information from which you should select the details which your partner requires Heathrow flight information
Flight schedule Flight number London Heathrow departure Brussels arrival Brussels departure Düsseldorf arrival
7:35
9:25
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Annex C: Results to be provided For contractual purposes, the information which needs to be provided is defined here. The information required from each test Laboratory is a table containing the following information for each of the conditions in the experiment: The "Mean Opinion Score (MOS)" obtained for all the subjects. When the conditions are symmetrical, the mean value is calculated from all the results for the two test rooms. For the dissymmetric conditions, the mean is calculated over the two test conditions, each result cumulating the results obtained in each background noise condition. The Standard Deviation of the "MOS" obtained for all the subjects, for each test condition. The specific statistical comparisons are specified in Annex C.
Annex D: Data analysis and presentation of results D.1 Calculation of MOS and Standard Deviation
The (overall) MOS/DMOS pooled over subjects for condition c (Y_c) can then be obtained from:

    Y_c = (1/T) * sum_{t=1..T} Y_{c,t}

The standard deviation for condition c, denoted S_c, can be calculated as:

    S_c = sqrt( (1/(L*T - 1)) * sum_{t=1..T} sum_{l=1..L} (X_{c,l,t} - Y_c)^2 )

Finally, the confidence interval (CI) at the (1 - alpha) level can be calculated for N = L*T as:

    CI_c = t(1 - alpha/2, N - 1) * S_c / sqrt(N)
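The per-condition statistics above can be sketched in a few lines; `mos_stats` is a hypothetical helper, and the Student-t quantile of the annex is approximated here by the normal quantile (a close stand-in for the sample sizes involved, which keeps the sketch dependency-free):

```python
import math
from statistics import NormalDist

def mos_stats(scores, alpha=0.05):
    """Per-condition MOS, standard deviation S_c and confidence interval.

    `scores` holds the individual ratings X_{c,l,t} for one condition,
    pooled over subjects (and talkers, if applicable).
    """
    n = len(scores)
    mos = sum(scores) / n
    # S_c with the (L*T - 1) denominator used in the annex
    s = math.sqrt(sum((x - mos) ** 2 for x in scores) / (n - 1))
    # Normal quantile as a large-N approximation of t(1 - alpha/2, N - 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci = z * s / math.sqrt(n)
    return mos, s, ci

ratings = [4, 4, 3, 5, 4, 3, 4, 5, 4, 3, 4, 4]
mos, s, ci = mos_stats(ratings)
print(f"MOS={mos:.2f}  S={s:.2f}  95% CI=+/-{ci:.2f}")
```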
D.2
The test results should be reported by the test Laboratory and the Global Analysis Laboratory as follows: calculate and tabulate the Mean Opinion Scores for the opinion scales, with Standard Deviations and Confidence Intervals, as shown in Table C.1.
Table C.1 - Layout for presentation of test results. D.3 Thorough analysis
Two statistical analyses should be conducted on the data obtained with these subjective scales. The first is a Multivariate ANalysis Of VAriance (MANOVA), which globally indicates the possible effect of the experimental factors (i.e., the different conditions). Then, a specific ANOVA should be run on each dependent variable (the five scales) to test whether a given experimental factor has an effect on that subjective variable. In other words, these statistical analyses indicate whether the differences observed between the MOS obtained for the different conditions are significant, for one given dependent variable (ANOVA) or for the set of dependent variables taken together (MANOVA). Finally, Pearson's linear correlations should be computed between the results of all subjective variables, to see which of them are preponderant or dependent on others.
Annex F: Test plan for Packet Switched Conversation Tests for Comparison of Quality Offered by Different Speech Coders
Introduction This document proposes a conversation test plan to compare the quality obtained with several speech coders over packet switched networks. The speech coders used in this test are: Adaptive Multi-Rate Narrow-Band (AMR-NB) in modes 6.7 kbit/s and 12.2 kbit/s; Adaptive Multi-Rate Wide-Band (AMR-WB) in modes 12.65 kbit/s and 15.85 kbit/s; ITU-T G.723.1 in mode 6.4 kbit/s; ITU-T G.729 in mode 8 kbit/s; ITU-T G.722 in mode 64 kbit/s with packet loss concealment; and ITU-T G.711 with packet loss concealment. As there is no standardized packet loss concealment, the PLC algorithms for G.711 and G.722 are proprietary. The simulated network will include two values of IP packet loss. The test will be done in one test laboratory only, but in two different languages. Section 2 gives references, conventions and contacts; section 3 details the test methodology, including test arrangement and test procedure; Annex A contains the instructions for the subjects participating in the conversation tests; Annex B contains the description of results to be provided to the Analysis Laboratory (if any) by the testing laboratories; Annex C contains the list of statistical comparisons to be performed.
Source: France Telecom R&D
Title: Test plan for packet switched conversation test. Comparison of quality offered by different speech coders.
Document for: Discussion and Approval
2. References, Conventions, and Contacts
2.1 Permanent Documents
ITU-T Rec. P.800: Methods for Subjective Determination of Transmission Quality
ITU-T Rec. G.711: Pulse code modulation (PCM) of voice frequencies
ITU-T Rec. G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
ITU-T Rec. G.723.1: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s

AMR-WB: Adaptive Multi-Rate Wide-Band Speech Codec
MOS: Mean Opinion Score
2.3 Contact Names The following persons should be contacted for questions related to the test plan.
Section | Contact Person / Email | Organisation | Address | Telephone/Fax
Test plan | France Telecom R&D | 2, Avenue P. Marzin, 22307 Lannion Cedex, France | Tel: +33 2 96 05 07 20; Fax: +33 2 96 05 13 16
AOB | Paolo Usai, paolo.usai@etsi.fr | ETSI MCC | 650 Route des Lucioles, 06921 Sophia Antipolis Cedex, France | Tel: +33 (0)4 92 94 42 36; Fax: +33 (0)4 93 65 28 17
2.4 Responsibilities Each test laboratory has the responsibility to organize its conversation tests.
The list of test laboratories participating in the conversation test phase:
Lab   Company                Language
1     France Telecom R&D     French
2     France Telecom R&D     Arabic
3. Test methodology 3.1 Introduction The protocol described below evaluates the effect of degradations such as delay and dropped packets on the quality of the communications. It corresponds to the conversation-opinion tests recommended by ITU-T P.800 [1]. First of all, conversation-opinion tests place the subjects in a more realistic situation, close to the actual service conditions experienced by telephone customers. In addition, conversation-opinion tests are suited to assessing the effects of impairments that can cause difficulty while conversing (such as delay). Subjects participate in the test in pairs; they are seated in separate sound-proof rooms and are asked to hold a conversation through the transmission chain implemented by means of network simulators, and communications are impaired by means of an IP impairments simulator that is part of the CN simulator, as described in the figure below. 3.2 Test arrangement 3.2.1 Description of the proposed testing system
This contribution describes a network simulator for the characterization of the different speech codecs when the bitstream is transmitted over a PS network. The procedure for the conversational listening test has been described earlier in [1]. Figure 1 describes the system to be simulated:
PC 1 and PC 5: PCs under Windows OS with the VoIP Terminal Simulator software of France Telecom R&D.
PC 3: PC under WinNT OS with the Network Simulator software (NetDisturb).
Basic principles: the platform simulates a packet-switched interactive communication between two users using PC1 and PC5 as their respective VoIP terminals. PC1 sends encoded packets, encapsulated with IP/UDP/RTP headers, to PC5, and receives the corresponding IP/UDP/RTP audio packets from PC5.
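The IP/UDP/RTP encapsulation performed by the terminal simulators can be illustrated with the fixed 12-byte RTP header (RFC 3550/1889 layout); a minimal sketch, in which the payload type and SSRC values are arbitrary:

```python
import struct

def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 97, marker: bool = False) -> bytes:
    """Pack the fixed 12-byte RTP header: V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6                          # version 2, no padding/extension/CSRC
    byte1 = (int(marker) << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

# One 20 ms narrow-band speech frame per packet: the timestamp advances
# by 160 samples per packet at the 8 kHz RTP clock.
hdr = rtp_header(seq=1, timestamp=160, ssrc=0x12345678)
print(hdr.hex())
```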
Figure 3 shows the parameters that can be modified; in this test, only the "Loss Law" takes two values, all other settings being fixed.
On both links, one can choose delay and loss laws. The two links can be treated separately or in the same way. For example, the delay can be set to a fixed value, or follow another law such as an exponential law.
3.2.3 Headsets and Sound Card
To avoid echo problems, headsets are used instead of handsets. The monaural headsets are connected to the sound cards of the PCs supporting the AMR simulators. The sound level in the earphones can be adjusted by the users if needed, but in practice the original settings, defined during the preliminary tests and producing a comfortable listening level, are not modified. The microphones are protected by a foam ball in order to reduce the "pop" effect. Users are also advised not to place the acoustic opening of the microphone directly in front of the mouth.
3.2.4 Test environment
Each of the two subjects participating in a conversation is installed in a test room. They sit in an armchair, in front of a table. The test rooms are acoustically insulated. All the test equipment is installed in a third room, connected to the test rooms. The background noise level is checked with a sound level meter. The measurement microphone, connected to the sound level meter, is located at the equivalent of the centre of the subject's head. The noise level is A-weighted.
3.2.5 Calibration and test conditions monitoring
Speech level
Before the beginning of a set of experiments, the end-to-end transmission level is checked subjectively to ensure that there is no problem. If the speech level needs to be checked, the following procedure applies. An artificial mouth placed in front of the microphone of headset A, in the LRGP position (see ITU-T Rec. P.64), generates in the artificial ear (according to ITU-T Rec. P.57) coupled to the earphone of headset B the nominal level defined in section 4.3. If necessary, the level is adjusted with the receiving volume control of the headset. The same calibration is done with headsets A and B inverted.
Delay The overall delay (from the input of sound card A to the output of sound card B) will be adjusted for each test condition, taking into account the delay of the related codec, in order to obtain a fixed delay of around 250 ms. This value of 250 ms is close to the hypothetical delay computed for AMR and AMR-WB through the UMTS network. 3.3 Test Conditions
Condition   Packet loss   Codec, mode
1           0%            AMR-NB, 6.7 kbit/s
2           0%            AMR-NB, 12.2 kbit/s
3           0%            AMR-WB, 12.65 kbit/s
4           0%            AMR-WB, 15.85 kbit/s
5           0%            G.723.1, 6.4 kbit/s
6           0%            G.729, 8 kbit/s
7           0%            G.722, 64 kbit/s + plc
8           0%            G.711 + plc
9           3%            AMR-NB, 6.7 kbit/s
10          3%            AMR-NB, 12.2 kbit/s
11          3%            AMR-WB, 12.65 kbit/s
12          3%            AMR-WB, 15.85 kbit/s
13          3%            G.723.1, 6.4 kbit/s
14          3%            G.729, 8 kbit/s
15          3%            G.722, 64 kbit/s + plc
16          3%            G.711 + plc
Listening system: monaural headset (flat response in the audio bandwidth of interest: 50 Hz-7 kHz); the other ear is open.
Listening environment: room noise with Hoth spectrum at 30 dBA (as defined in ITU-T Recommendation P.800, Annex A, section A.1.1.2.2.1 "Room Noise", with Table A.1 and Figure A.1).
References
Tdoc S4-030564: Test Plan for the AMR Narrow-Band Packet Switched Conversation Test
Tdoc S4-030565: Test Plan for the AMR Wide-Band Packet Switched Conversation Test
END
INSTRUCTIONS TO SUBJECTS
In this experiment we are evaluating systems that might be used for telecommunication services. You are going to have a conversation with another user. The test situation simulates communications between two mobile phones. All the situations correspond to silent environment conditions. After the completion of each conversation, you will give your opinion on the quality by answering the following questions, displayed on the screen of the black box in front of you. Your judgment will be stored. You have 8 seconds to answer each question. After "pressing" the button on the screen, the next question will be displayed. You continue the procedure for the 5 questions.
Question 1: How do you judge the quality of the voice of your partner? Excellent Good Fair Poor Bad
Question 2: Do you have difficulties understanding some words? All the time Often From time to time Rarely Never
Question 3: How do you judge the conversation when you interacted with your partner? Excellent interactivity (similar to a face-to-face situation) Good interactivity (in a few moments, you were talking simultaneously, and you had to interrupt yourself) Fair interactivity (sometimes, you were talking simultaneously, and you had to interrupt yourself) Poor interactivity (often, you were talking simultaneously, and you had to interrupt yourself) Bad interactivity (it was impossible to have an interactive conversation)
Question 4: Did you perceive any impairment (noises, cuts, ...)? If so, was it: No impairment Slight impairment, but not disturbing Slightly disturbing impairment Disturbing impairment Very disturbing impairment
Question 5: How do you judge the global quality of the communication? Excellent Good Fair Poor Bad
From then on you will have a break approximately every 30 minutes. The test will last a total of approximately 60 minutes. Please do not discuss your opinions with other listeners participating in the experiment.
Examples from ITU-T SG 12, COM12-35, "Development of scenarios for short conversation test", 1997
Subject 1: Your Name : Reason for the call Condition which should be applied to the exchange of information Information you want to receive from your partner Information that your partner requires Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you. Clemence 1 large pizza For 2 people, vegetarian pizza preferred Topping Price Delivery address: 41 Industry Street, Oxford Phone: 7 34 20 How long will it take?
Subject 2: Your Name : Information from which you should select the details which your partner requires Pizzeria Roma
Pizzas Toscana (ham, mushrooms, tomatoes, cheese) Tonno (tuna, onions, tomatoes, cheese) Fabrizio (salami, ham, tomatoes, cheese) Vegetaria (spinach, mushrooms, tomatoes, cheese)
Pizza       1 person   2 persons   4 persons
Toscana     3.2        5.95        10.5
Tonno       3.95       7.5         13.95
Fabrizio    4.2        7.95        14.95
Vegetaria   4.5        8.5         15.95
Information you want to receive from your partner Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Your Name : Reason for the call Condition which should be applied to the exchange of information Information you want to receive from your partner Information that your partner requires
Parker Intended journey: London Heathrow to Düsseldorf on June 23rd, morning flight, direct flight preferred Departure : Arrival Flight number Reservation : 1 seat, economy class Address: 66 Middle Street, Sheffield Phone: 21 08 33 From which airport is it easier to get into Cologne centre: Düsseldorf or Cologne/Bonn?
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Subject 2: Your Name : Information from which you should select the details which your partner requires Heathrow flight information
Flight schedule Flight number London Heathrow departure Brussels arrival Brussels departure Düsseldorf arrival
7:35
9:25
Question to which neither you nor your partner will have information. You should discuss and find a solution that is acceptable to both of you.
Annex C: Results to be provided For contractual purposes, the information which needs to be provided is defined here. The information required from each test Laboratory is a table containing the following information for each of the conditions in the experiment: The "Mean Opinion Score (MOS)" obtained for all the subjects. When the conditions are symmetrical, the mean value is calculated from all the results for the two test rooms. For the dissymmetric conditions, the mean is calculated over the two test conditions, each result cumulating the results obtained in each background noise condition. The Standard Deviation of the "MOS" obtained for all the subjects, for each test condition. The specific statistical comparisons are specified in Annex C.
Annex D: Data analysis and presentation of results D.1 Calculation of MOS and Standard Deviation
The (overall) MOS/DMOS pooled over subjects for condition c (Y_c) can then be obtained from:

    Y_c = (1/T) * sum_{t=1..T} Y_{c,t}

The standard deviation for condition c, denoted S_c, can be calculated as:

    S_c = sqrt( (1/(L*T - 1)) * sum_{t=1..T} sum_{l=1..L} (X_{c,l,t} - Y_c)^2 )

Finally, the confidence interval (CI) at the (1 - alpha) level can be calculated for N = L*T as:

    CI_c = t(1 - alpha/2, N - 1) * S_c / sqrt(N)
D.2
The test results should be reported by the test Laboratory and the Global Analysis Laboratory as follows: calculate and tabulate the Mean Opinion Scores for the opinion scales, with Standard Deviations and Confidence Intervals, as shown in Table C.1. Table C.1 - Layout for presentation of test results. D.3 Thorough analysis
Two statistical analyses should be conducted on the data obtained with these subjective scales. The first is a Multivariate ANalysis Of VAriance (MANOVA), which globally indicates the possible effect of the experimental factors (i.e., the different conditions). Then, a specific ANOVA should be run on each dependent variable (the five scales) to test whether a given experimental factor has an effect on that subjective variable. In other words, these statistical analyses indicate whether the differences observed between the MOS obtained for the different conditions are significant, for one given dependent variable (ANOVA) or for the set of dependent variables taken together (MANOVA). Finally, Pearson's linear correlations should be computed between the results of all subjective variables, to see which of them are preponderant or dependent on others.
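In its simplest form, the per-scale ANOVA described above reduces to a one-way F test across conditions; a minimal sketch of that computation (the full design also includes subject and noise factors, so this is illustrative only):

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: ratings of one subjective scale,
    grouped by test condition. Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-condition and within-condition sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k

# Two conditions with clearly different mean ratings give a large F
f, dfb, dfw = one_way_anova_f([[4, 5, 4, 5], [2, 1, 2, 1]])
print(f, dfb, dfw)
```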
1. Introduction
This contribution presents a proposal for conducting a Global Analysis of the results derived from the 3GPP Conversation Tests for Packet Switched (PS) networks. Phase I of these tests is described in two test plans: S4-030564 for conversation tests using the Adaptive Multi-Rate Narrow-Band (AMR-NB) codec, and S4-030565 for conversation tests using the Adaptive Multi-Rate Wide-Band (AMR-WB) codec. The test plan for the Phase II tests is described in S4-030747, for conversation tests comparing various ITU-T standardized speech codecs. The Phase I test plans specify similar experimental designs involving 24 test conditions and 16 pairs of subjects. They also specify that three Listening Laboratories (LL) will conduct the tests in different languages: Arcon for North American English (NAE), NTT-AT for Japanese, and France Telecom for French. The Phase II test plan involves 16 conditions and a single Listening Lab (France Telecom) conducting the test in two languages (French and Arabic).
Test conditions 1-18 are symmetrical in that both subjects in a conversation pair are listening in quiet (i.e., no noise) rooms. Conditions 19-24, on the other hand, are asymmetrical: one subject is listening in a quiet room, the other in a noisy room. Conditions 1-18 are categorized by four experimental factors:
o Delay: 300 msec and 500 msec
o AMR-NB mode (rate): 6.7 kbps and 12.2 kbps
o Packet Loss: 0% and 3%
o Radio conditions: 10^-2, 10^-3, and 5x10^-4
These conditions can be assigned to two factorial designs for analysing the effects of three of these factors. Table 2 shows the conditions involved in the two three-factor analyses for the AMR-NB experiments. Using the 12 conditions shown in Table 2a, the effects of Rate, Radio Conditions, and Packet Loss can be evaluated (Delay held constant at 300 msec). Using the 12 conditions shown in Table 2b, the effects of Delay, Radio Conditions, and Packet Loss can be evaluated (Rate held constant at 12.2 kbps).
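The factorial sub-designs described above can be enumerated mechanically; a minimal sketch for the 12 conditions of the Table 2a analysis (factor labels are illustrative):

```python
from itertools import product

# Factor levels from the text; Delay is held constant at 300 msec in the
# Table 2a sub-design, so it is not enumerated here.
rates = ["6.7 kbps", "12.2 kbps"]
radio_conditions = ["1e-2", "1e-3", "5e-4"]
packet_loss = ["0%", "3%"]

design = list(product(rates, radio_conditions, packet_loss))
for rate, radio, loss in design:
    print(rate, radio, loss)
print(len(design))   # 2 x 3 x 2 = 12 conditions
```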
The three sets of paired conditions involving noise (i.e., conditions 19/20, 21/22, and 23/24) can be used to compare the effects of sender in noise/receiver in quiet with those for sender in quiet/receiver in noise for the three noise environments.
o RoHC: present and absent
o AMR-NB mode (rate): 6.7 kbps and 12.2 kbps
o Packet Loss: 0% and 3%
o Radio conditions: 10^-2, 10^-3, and 5x10^-4
(Table: AMR-WB test conditions 1-24 with the RoHC setting per condition; RoHC is used in 16 of the 24 conditions.)
Consistent with the AMR-NB tests, conditions 1-18 can be assigned to two factorial designs for analysing the effects of three of these factors. Table 4 shows the conditions involved in the two three-factor analyses for the AMR-WB experiments. Using the 12 conditions shown in Table 4a, the effects of Rate, Radio Conditions, and Packet Loss can be evaluated (RoHC present in all conditions). Using the 12 conditions shown in Table 4b, the effects of RoHC, Radio Conditions, and Packet Loss can be evaluated (Rate held constant at 12.65 kbps).
Again, consistent with the tests for AMR-NB, the three sets of paired conditions involving noise (i.e., conditions 19/20, 21/22, and 23/24) can be used to compare the effects of sender in noise/receiver in quiet with those for sender in quiet/receiver in noise for the three noise environments.
Condition   Codec, Mode
1           AMR-NB, 6.7 kbit/s
2           AMR-NB, 12.2 kbit/s
3           AMR-WB, 12.65 kbit/s
4           AMR-WB, 15.85 kbit/s
5           G.723.1, 6.4 kbit/s
6           G.729, 8 kbit/s
7           G.722, 64 kbit/s + plc
8           G.711 + plc
9           AMR-NB, 6.7 kbit/s
10          AMR-NB, 12.2 kbit/s
11          AMR-WB, 12.65 kbit/s
12          AMR-WB, 15.85 kbit/s
13          G.723.1, 6.4 kbit/s
14          G.729, 8 kbit/s
15          G.722, 64 kbit/s + plc
16          G.711 + plc
5. Global Analyses
The purpose of the Global Analysis task is to bring together the results from the different Listening Labs/languages (Phase I: NAE, French, Japanese; Phase II: French, Arabic) and combine them, where appropriate, such that conclusions may be drawn about the performance of the AMR-NB and AMR-WB codecs in packet switched networks. This task is complicated by the fact that the conversation tests collect multiple criterion measures for each condition. In the tests involved here, listeners are required to rate each condition on five aspects of the communication situation:
o Quality of the voice of their partner
o Difficulty of understanding words
o Quality of interaction with their partner
o Degree of impairments
o Global communication quality
Each of these criteria is measured using ratings on five-category rating scales. Each criterion thus represents a separate dependent variable which must be evaluated in a Global Analysis. The appropriate analysis for this situation is a Multivariate Analysis of Variance (MANOVA). The first step in MANOVA involves an omnibus test for the combination of all dependent variables. A number of statistical techniques may be employed in MANOVA to determine whether the dependent variables are measuring different underlying variables or the same one. Other techniques, discriminant analysis in particular, determine the contribution provided by each dependent variable to a composite variable that maximally separates the data. The omnibus MANOVA test is then followed by separate Analyses of Variance (ANOVA) for each dependent variable. The F-ratios for the individual ANOVAs are adjusted (Bonferroni) to account for the fact that multiple tests are being performed. It is proposed here to perform MANOVAs and the associated univariate ANOVAs separately for each of the six experiments (AMR-NB and AMR-WB from each of the three listening labs). Examination of the results of these analyses will determine whether there is a single composite variable for each experiment and whether these composites are similar across experiments and across listening labs. The results of these analyses will determine whether it is appropriate to combine the results across listening labs. Pearson's correlation coefficients will be computed to identify and illustrate the inter-relationships among the dependent variables. If the results can legitimately be combined across listening labs, a nested ANOVA for Conditions and Listening Labs will be conducted separately for each codec, AMR-NB and AMR-WB. Table 6 shows a generalized Source Table for the appropriate ANOVA, with the effects of Listening Labs nested within the effects of Subjects.
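The Bonferroni adjustment and the Pearson correlations mentioned above can be sketched as follows (`bonferroni_alpha` and `pearson_r` are hypothetical helper names):

```python
import math

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level when running n_tests separate ANOVAs."""
    return alpha / n_tests

def pearson_r(x, y):
    """Pearson correlation between two rating vectors (e.g. two scales)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Five scales -> five univariate ANOVAs, each run at the adjusted level
print(bonferroni_alpha(0.05, 5))
# Two perfectly linearly related rating vectors correlate maximally
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```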
One task of the Global Analysis exercise will be to provide an Excel spreadsheet to the individual Listening Labs for delivery of the raw ratings. The Global Analysis task will also include a comprehensive report containing the results of the various statistical analyses described above. Dynastat will present the final report at the February 2004 meeting of 3GPP-SA4.
Effect (Source of Variation)       F Ratio
Conditions                         MS Cond / MS Cond x SwLL
Subjects                           --
Listening Labs (LL)                MS LL / MS SwLL
Subjects within LL (SwLL)          --
Conditions x Subjects              --
Conditions x LL                    MS Cond x LL / MS Cond x SwLL
Conditions x SwLL                  --
Total                              --
Table 6. Generalized ANOVA Source Table for Combining Results across Listening Labs.
6. References
S4-030564: Test Plan for the AMR Narrow-Band Packet Switched Conversation Test
S4-030565: Test Plan for the AMR Wide-Band Packet Switched Conversation Test
S4-030747: Test plan for Packet Switched Conversation Test. Comparison of quality offered by different speech coders.
Annex H: Test Plan for Performance characterisation of VoIMS over HSDPA/EUL channels; listening only tests
H.1 Introduction
This annex describes subjective evaluation methods for characterising the overall performance of VoIMS over HSDPA/EUL radio channels. The main purpose is to evaluate and verify adequate subjective performance of the AMR and AMR-WB speech codecs defined in TS 26.114. The VoIMS performance characterisation for HSDPA/EUL channels consists of subjective evaluation with listening-only and conversation test methodologies. The former evaluates the basic subjective quality of the selected speech codecs when conducting buffer adaptation to the network delay variations; listening-only tests are further completed with an overall delay analysis. The latter verifies the effect of overall delay variations in conversational situations. Listening-only tests concentrate on the effect of channel errors and channel jitter on speech quality rather than the impact of overall end-to-end delay in conversation; the end-to-end delay impact is considered in the delay analysis conducted on the whole processed test material.
H.2
H.3 Delay analysis
The end-to-end delay analysis shall characterise the additional delay introduced by the tested jitter buffer. The analysis shall include a statistical representation of the buffering time for all channels, as well as an analysis of the error concealment operations introduced by the jitter buffer, i.e. so-called late losses.
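The buffering-time statistics and late-loss rate described above can be computed directly from receiver-side logs. The sketch below is illustrative only: the log layout (one (decode_time_ms, buffering_delay_ms, status) tuple per decoded frame) and the status labels are assumptions for this example, not part of the test plan.

```python
def summarize_jbm_log(log):
    """Return buffering-delay percentiles and the late-loss rate."""
    delays = sorted(d for _, d, _ in log)
    n = len(delays)
    late = sum(1 for _, _, s in log if s == "late_loss")

    def percentile(p):
        # Nearest-rank percentile over the sorted buffering delays.
        return delays[min(n - 1, int(p / 100.0 * n))]

    return {
        "min_ms": delays[0],
        "median_ms": percentile(50),
        "p95_ms": percentile(95),
        "max_ms": delays[-1],
        "late_loss_rate": late / n,
    }

# Synthetic log: 1000 frames, delays cycling 60..100 ms, 1 late loss per 50 frames.
log = [(20 * i, 60 + (i % 5) * 10, "late_loss" if i % 50 == 0 else "ok")
       for i in range(1, 1001)]
stats = summarize_jbm_log(log)
```

A real analysis would feed the RX/DEC logs produced by the JBM pseudo code into the same kind of summary, per channel.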
H.4 Listening-only tests
The goal of this test is to evaluate the impact of the HSDPA/EUL radio channel conditions on speech quality, especially when the channel is subject to packet losses and jitter. Subjective quality scores and delay will be used as metrics to evaluate the results. The test will be designed based on Section 6.2 of ITU-T Recommendation P.800.
5.9 kbit/s (150 ms)  5.9 kbit/s (150 ms)  12.2 kbit/s (150 ms)  12.2 kbit/s (150 ms)
Table H.6: Test conditions for listening-only tests with AMR-NB in background noise
Cond. Noise Type Frame Loss Rate 0.01 0.01 0.01 0.01 Channel AMR-Modes (fixed RTP delay)
5.9 kbit/s (150 ms)  5.9 kbit/s (150 ms)  12.2 kbit/s (150 ms)  12.2 kbit/s (150 ms)
12.65 kbit/s (150 ms) 12.65 kbit/s (150 ms) 12.65 kbit/s (150 ms) 12.65 kbit/s (150 ms)
Table H.8: Test conditions for listening-only tests with AMR-WB in background noise
Cond. Noise Type Frame Loss Rate 0.01 0.01 0.01 0.01 Channel AMR-WB (fixed RTP delay)
12.65 kbit/s (150 ms) 12.65 kbit/s (150 ms) 12.65 kbit/s (150 ms) 12.65 kbit/s (150 ms)
H.5 Test setup
The term VoIP client covers the speech encoder and RTP packetisation on the sender side, and a jitter buffer management (JBM) scheme and speech decoder on the receiver side. Figure H.1 shows the test scenario.
Figure H.1: Test setup for VoIP codecs for listening only test
The implementation of the system shown in Figure H.1 has the following functional components:

I   VoIP client/transmitter, containing
    a) Pre-processing, including e.g. suitable pre-filtering and signal level control
    b) AMR/AMR-WB encoder
    c) RTP payload packetisation
II  Error insertion device (EID), applying error-delay patterns to the transmitted RTP stream

III VoIP client/receiver (and network interface), containing
    a) RTP payload depacketisation
    b) Jitter buffer management (JBM)
    c) AMR/AMR-WB decoder
    d) Post-processing

For the listening-only test, the simulator can be implemented as an off-line tool comprising voice encoding, RTP packetisation and error insertion. Here, the error insertion device reads an input RTP stream stored in a file, applies the given error-delay pattern, and writes the modified output RTP stream into a file. For this purpose the following storage protocol is introduced. The raw-data speech (linear PCM masked to 14 bits, at 8 kHz sampling rate for AMR-NB and at 16 kHz sampling rate for AMR-WB) is carried within the VoIP client and receiver. The encoder output is then stored in the AMR-NB/AMR-WB file storage format according to media types audio/amr and audio/amr-wb, as specified in sections 5.1 and 5.3 of RFC 3267. The data exchanged between RTP packetisation/depacketisation and the error insertion device is a stream of encapsulated RTP packets in the RTPdump format shown in Table 9 and Table 10.
Field     Size                       Description
Start     32 bits (struct timeval)   Start time (GMT) of the file
Source    32 bits (long)             Source (IP) address
Port      16 bits (short)            UDP port number
Padding   16 bits (short)            Padding data to provide 32-bit alignment
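A minimal sketch of packing and unpacking a binary file header with the fields listed above. The layout follows the classic rtptools RTPdump convention, with the struct timeval stored as two 32-bit words (seconds and microseconds) in network byte order; treat this as an illustration, not a normative encoder.

```python
import struct

# sec, usec, source IP, UDP port, 16-bit padding (network byte order)
HDR = "!IIIHH"

def pack_rtpdump_header(sec, usec, source_ip, port):
    """Serialise the RTPdump file header fields into 16 bytes."""
    return struct.pack(HDR, sec, usec, source_ip, port, 0)

def unpack_rtpdump_header(data):
    """Parse the 16-byte header; the padding word is discarded."""
    sec, usec, source_ip, port, _pad = struct.unpack(HDR, data)
    return sec, usec, source_ip, port

# Example: header for a file started 2010-01-01 00:00:00 GMT from 127.0.0.1:5004.
hdr = pack_rtpdump_header(1262304000, 0, 0x7F000001, 5004)
```

Each encapsulated RTP packet record then follows the header in the same file, per the RTPdump format referenced in the text.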
Preparation of the evaluation speech material can be based on the following pseudo code:
Read in first speech packet
receivedPktTime = time of first received speech packet
playoutTime = time of first received speech packet
lastReceivedPkt = 0
do {
    while (lastReceivedPkt == 0 && receivedPktTime <= playoutTime) {
        Deliver the received speech packet to the VoIP client
        Read in next speech packet
        if (no more packets) {
            lastReceivedPkt = 1
            break
        }
        receivedPktTime = time of next received speech packet
    }
    while (playoutTime < receivedPktTime || lastReceivedPkt == 1) {
        Request speech samples from VoIP client
        VoIP client returns Tp sec of speech samples
        Write out Tp sec to file output
        playoutTime = playoutTime + Tp
        if (lastReceivedPkt == 1 && VoIP client has no more PCM samples) break
    }
} while (lastReceivedPkt == 0 || VoIP client has PCM samples)
When requested for speech samples, the VoIP client should return a short duration of PCM samples, e.g. 1 ms. To ensure fair testing and to verify the de-jitter and time warping aspects in a VoIP system, the network-decoder interface controls (i) the delivery of encoded speech packets to the speech decoder and (ii) the output of speech data from the speech decoder. However, to enable more realistic operation, the VoIP client is given the freedom of deciding how many speech samples it outputs for each NCIM speech output request.
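The playout loop above can be rendered as a small runnable model. The sketch below is an assumption-laden simplification: the VoIP client is modelled as a FIFO that turns each delivered packet into 20 ms of PCM and, when idle, the playout clock simply jumps to the next arrival instead of generating concealment audio.

```python
def play_out(arrivals_ms, tp_ms=1.0, frame_ms=20.0):
    """Return total milliseconds of audio written to the output file."""
    buffered_ms = 0.0            # audio delivered to the client, not yet played
    written = 0.0
    i = 0
    playout = arrivals_ms[0]     # playout clock starts at the first arrival
    while True:
        # Reception: hand over every packet that has arrived by `playout`.
        while i < len(arrivals_ms) and arrivals_ms[i] <= playout:
            buffered_ms += frame_ms
            i += 1
        done_rx = (i == len(arrivals_ms))
        # Playout: request a tp_ms chunk while there is audio to play.
        if buffered_ms > 0:
            chunk = min(tp_ms, buffered_ms)
            buffered_ms -= chunk
            written += chunk
            playout += chunk
        elif done_rx:
            return written       # all packets received and played out
        else:
            playout = arrivals_ms[i]   # idle until the next packet arrives
```

For three packets at 0, 20 and 40 ms this writes 60 ms of audio; a gap in arrivals simply shifts the playout clock in this simplified model.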
I.1 Pseudo code

The JBM operation consists of two functional parts:
1. Reception functionality, including the decapsulation of the received RTP payload and storing the received speech frames into a buffer.
2. Decoding functionality, taking care of reading the frames from the buffer and providing a frame of decoded speech (or error concealment data) upon request.

To illustrate the relationship between these two functional parts in a simple way, the pseudo code is structured in the form of a simulation model in which a main loop handles the reception and decoding functionalities:
The main loop models the time line: at each execution of this loop, the simulated wall clock time is increased by one clock tick. Furthermore, two other loops, the reception loop and the decoding loop, are implemented inside the main loop. The reception loop is executed as many times as needed to process the new packets available at the packet input at/before the current time. The decoding loop is executed as many times as needed to process all frames in the buffer scheduled for decoding at/before the current time.
It is straightforward to implement the contents of the reception loop in a function that is called each time a new RTP payload is received, to provide the reception functionality. Similarly, the operations in the decoding loop can be implemented in a function that is called each time the audio device requests a new frame of speech, to provide the decoding functionality. Table I.1 describes the variables used in the pseudo code. Note that in addition to the variables introduced in the table, the pseudo code also uses the constant FRAME_DURATION to indicate the frame duration as a number of RTP clock ticks (FRAME_DURATION = 160 for AMR, FRAME_DURATION = 320 for AMR-WB). Furthermore, constants THR1 and THR2 are used to control the fine tuning of the onset frame buffering time.
Table I.1: Variables used in the pseudo code

current_time (current simulation time as clock ticks at the RTP time stamp clock rate): Initialised to a random value, indicated by NOW in the pseudo code. The value is increased by one at each execution of the main loop to simulate the passing of time.

rx_time (reception time of the current/next RTP packet, as clock ticks at the RTP time stamp clock rate): Initialised to the same value as current_time. The value is updated each time a new packet is available at the packet input.

dec_time (decoding time of the next frame, as clock ticks at the RTP time stamp clock rate): Initialised by adding the desired initial buffering delay JBF_INITIAL_DELAY to the initial value of current_time. The value is updated after each decoded frame by the number of RTP clock ticks corresponding to one frame (160 ticks for the 8 kHz clock rate used for AMR, 320 ticks for the 16 kHz clock rate used for AMR-WB).

rtp_ts (RTP timestamp of the current/next RTP packet): The value is updated each time a new input packet is captured.

frame_ts (RTP timestamp of the current received frame): Set/updated when parsing a packet (possibly containing several frames).

next_ts (RTP timestamp of the frame to be decoded next): Used both to request the next frame in decoding order from the buffer and to detect frames that arrive late.

end_of_input (indication of input speech data status): Initialised to FALSE; set to TRUE when the end of the input packet file is encountered.

buffer_occupancy: Indicates the buffering status, needed for detecting the end of the simulation and for detecting buffer overflows.

loss_burst_len: Increased each time the decoder needs to invoke the error concealment operation. If the value exceeds the predetermined threshold JBF_LOSS_PERIOD_THR, the resynchronisation operation is initiated by setting resync_flag to 1. In case of normal decoding, the value is set to zero.

resync_flag (flag to indicate that a resynchronisation is needed): See the description of loss_burst_len above.

onset_flag (indication that we are currently in the buffering period before decoding the onset speech frame): Set to one in the reception loop when an onset frame (i.e. the first speech frame after a non-speech period) is received. The decoding loop sets the value to zero when a requested frame has been found in the buffer and decoded.

keep_frame_alignment (indication whether the decoding time of a speech onset frame must be aligned with the current frame structure): I.e. whether decoding must take place at time T + n * 20 ms, where T is the decoding time of the first frame of the session and n is an integer. Set to a non-zero value to force keeping the original frame alignment at speech onsets.
/* INITIALISATION */
Read the first input frame, initialise variables based on the received packet
/* NOTE: time is measured in speech samples at the RTP clock rate, 8 kHz for AMR, 16 kHz for AMR-WB */
rx_time = current_time = NOW
next_ts = rtp_ts
/* Set the desired initial buffering delay */
dec_time = current_time + JBF_INITIAL_DELAY
end_of_input = FALSE
buffer_occupancy = 0
loss_burst_len = 0
resync_flag = 0
onset_flag = 0
keep_frame_alignment = 1

/* MAIN LOOP */
WHILE end_of_input == FALSE OR buffer_occupancy > 0 {

    /* RECEPTION LOOP */
    WHILE end_of_input == FALSE AND rx_time <= current_time {
        /* Set RTP timestamp for the frame */
        frame_ts = rtp_ts
        /* Loop over all frames in the packet */
        WHILE more frames in this packet {
            /* Possible NO_DATA frames are discarded */
            IF frame_type != NO_DATA {
                IF speech onset detected {
                    Find bt_min and bt_max, i.e. the minimum and maximum predicted buffering times over the period of the JBF_HISTORY_LEN most recent frames
                    /* Set new buffering time */
                    buffer_delay = bt_max - bt_min
                    /* Set this as the next frame to be decoded */
                    next_ts = frame_ts
                    /* Set decoding time */
                    dec_time = current_time + buffer_delay
                    /* Apply frame alignment if selected */
                    IF keep_frame_alignment == 1 {
                        Move dec_time forward to the next frame boundary
                    }
                    /* Indicate for the decoder that we are buffering for a speech onset */
                    onset_flag = 1
                }
                /* Check if the decoder has set the re-synchronisation flag */
                ELSE IF resync_flag == 1 {
                    /* Continue decoding from the first frame arriving after a loss period */
                    next_ts = frame_ts
                    /* Clear the re-synchronisation flag */
                    resync_flag = 0
                }
                /* Check if the received frame is late by less than one frame slot */
                ELSE IF frame_ts + FRAME_DURATION == next_ts AND no frame with TS >= next_ts in the buffer {
                    /* Re-schedule this frame to be the next frame to be decoded */
                    next_ts = frame_ts
                }

                Compute predicted buffering time for the received frame and update the buffering time history

                /* Check frame arrival time */
                IF frame_ts < next_ts {
                    Discard the frame because it arrived late
                    Update RX log: TIME = rx_time; RTP_TS = frame_ts; RX_STATUS = late_loss
                } ELSE {
                    /* Check buffer occupancy */
                    IF buffer_occupancy == MAX_BUFFER_OCCUPANCY {
                        Discard the frame because the buffer is full
                        Update RX log: TIME = rx_time; RTP_TS = frame_ts; RX_STATUS = overflow
                    } ELSE {
                        Store the frame into the buffer
                        Update RX log: TIME = rx_time; RTP_TS = frame_ts; RX_STATUS = ok
                        buffer_occupancy++
                    }
                }
            }
            /* Update RTP timestamp for the next frame */
            frame_ts += FRAME_DURATION
        }
        Read the next input packet
        IF new packet available {
            Update variables rx_time and rtp_ts
        } ELSE {
            end_of_input = TRUE
        }
    }   /* end of RECEPTION LOOP */
    /* DECODING LOOP */
    WHILE dec_time <= current_time {
        /* Fine tune onset buffering time */
        IF onset_flag == 1 {
            first_ts = TS of the first frame in the buffer
            /* Early decoding of onset frame if the buffer is filling too fast */
            IF buffer_occupancy * FRAME_DURATION - THR1 >= buffer_delay {
                next_ts = first_ts
            }
            /* Postpone decoding of onset frame if the buffer is filling too slowly */
            ELSE IF buffer_occupancy * FRAME_DURATION + THR2 < buffer_delay AND next_ts == first_ts {
                next_ts -= FRAME_DURATION
            }
        }
        Request the frame having the RTP timestamp value next_ts from the buffer
        IF requested frame found {
            Decode speech or generate comfort noise (SID or SID_FIRST frame) normally
            Update DEC log: TIME = dec_time; RX_TIME = rcv_time; RTP_TS = next_ts; DEC_STATUS = ok
            buffer_occupancy--
            /* Clear lost burst counter */
            loss_burst_len = 0
            /* Clear speech onset flag */
            onset_flag = 0
        } ELSE {
            IF in speech state {
                /* Increase lost burst counter */
                loss_burst_len++
                /* Check the loss period length */
                IF loss_burst_len > JBF_LOSS_PERIOD_THR {
                    Find the oldest frame in the buffer
                    IF a frame having a time stamp value new_ts is found {
                        Decode the frame found in the buffer (i.e. reset the decoding to continue from the oldest frame found in the buffer)
                        Update DEC log: TIME = dec_time; RX_TIME = rcv_time; RTP_TS = new_ts; DEC_STATUS = ok
                        buffer_occupancy--
                        /* Set the time stamp */
                        next_ts = new_ts
                        /* Clear lost burst counter */
                        loss_burst_len = 0
                    } ELSE {
                        Invoke error concealment
                        Update DEC log: TIME = dec_time; RX_TIME = N/A; RTP_TS = next_ts; DEC_STATUS = error_concealment
                        /* Set the re-synchronisation flag to trigger the decoding to continue from the next arriving frame */
                        resync_flag = 1
                    }
                } ELSE {
                    Invoke error concealment
                    Update DEC log: TIME = dec_time; RX_TIME = N/A; RTP_TS = next_ts; DEC_STATUS = error_concealment
                }
            } ELSE {
                /* DTX */
                Continue comfort noise generation
                Update DEC log: TIME = dec_time; RX_TIME = N/A; RTP_TS = next_ts; DEC_STATUS = comfort_noise
            }
        }

        /* Update variables for decoding the next frame */
        dec_time += FRAME_DURATION
        next_ts += FRAME_DURATION
    }   /* end of DECODING LOOP */

    /* CLOCK/TIMER UPDATE */
    current_time++
}   /* end of MAIN LOOP */
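The main-loop structure of the pseudo code, one clock tick per iteration with a reception step and a decoding step, can be condensed into a runnable Python model. This sketch keeps only the core mechanics (initial buffering delay, late-frame discard, concealment of missing frames); onset adaptation, resynchronisation and DTX handling are omitted for brevity, and all names are illustrative rather than normative.

```python
FRAME = 160  # RTP ticks per frame (AMR, 8 kHz clock), mirroring FRAME_DURATION

def simulate_jbm(packets, initial_delay=320):
    """packets: list of (rx_time, rtp_ts) in RTP ticks, in arrival order.
    Returns a log of (dec_time, rtp_ts, status) entries."""
    buf = {}                          # rtp_ts -> frame present
    log = []
    now = 0
    i = 0
    next_ts = packets[0][1]
    dec_time = now + initial_delay    # desired initial buffering delay
    end_ts = max(ts for _, ts in packets)
    while next_ts <= end_ts:
        # Reception step: take in all packets that have arrived by `now`.
        while i < len(packets) and packets[i][0] <= now:
            rx_time, ts = packets[i]
            i += 1
            if ts < next_ts:
                log.append((now, ts, "late_loss"))   # arrived after its slot
            else:
                buf[ts] = True
        # Decoding step: decode every frame scheduled at/before `now`.
        while dec_time <= now and next_ts <= end_ts:
            status = "ok" if buf.pop(next_ts, None) else "concealed"
            log.append((dec_time, next_ts, status))
            dec_time += FRAME
            next_ts += FRAME
        now += 1                      # clock/timer update: one tick
    return log
```

With two on-time packets and one that arrives well after its scheduled decoding time, the third frame is concealed.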
I.2 Verification
This section provides a verification of a JBM implementation according to the pseudo code in Section I.1 against the minimum performance requirements specified in Section 8.2.3 of TS 26.114 [19]. The verification was performed using the implemented JBM algorithm with the AMR codec. On each channel the simulation was repeated 20 times, each time with a different random starting point on the channel. The results provided in the following subsections indicate
the observed worst-case results (i.e. the measured delay CDF closest to the delay requirement CDF and the highest jitter loss rate). The constants used in the pseudo code are set to the values given in Table I.2 for the verification.
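Selecting the worst-case run can be automated by comparing each run's empirical delay CDF against the requirement CDF and keeping the run with the smallest margin. The sketch below is illustrative: the requirement points are placeholders, not the TS 26.114 values.

```python
def cdf_at(delays, x):
    """Empirical CDF of `delays` evaluated at x."""
    return sum(1 for d in delays if d <= x) / len(delays)

def worst_case_run(runs, requirement):
    """requirement: list of (delay_ms, min_probability) points.
    Returns the index of the run with the smallest worst margin
    against the requirement CDF (i.e. the closest / worst run)."""
    def margin(delays):
        return min(cdf_at(delays, x) - p for x, p in requirement)
    return min(range(len(runs)), key=lambda i: margin(runs[i]))

requirement = [(100, 0.80), (150, 0.95)]    # placeholder requirement CDF
runs = [[60] * 90 + [140] * 10,             # comfortable margin
        [95] * 81 + [160] * 19]             # tight against the requirement
```

Applied to the 20 repetitions per channel described above, this picks the run whose measured delay CDF sits closest to the requirement.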
I.2.1 Delay performance
Figures I.1 to I.6 below show the delay performance of the implemented JBM and a comparison against the minimum performance requirement specified in Section 8.2.2.2.2 of TS 26.114 [19]. The solid blue curve denotes the delay CDF for the implemented JBM, and the black dash-dotted curve indicates the delay requirement CDF.
(Figures I.1 to I.6: delay CDF plots for the six test channels; x-axes in ms.)
I.2.2 Jitter loss performance
Table I.3 summarises the jitter loss rates of the implemented JBM for all test channels, computed as specified in Section 8.2.3.2.3 of TS 26.114.
Table I.3: The jitter loss for the tested JBM on the test channels

Channel        1        2        3        4        5        6
JBM loss rate  0.07 %   0.40 %   0.15 %   0.72 %   0.95 %   0.57 %
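A loss rate of this kind can be derived from the DEC log kept by the pseudo code: the share of requested frames that had to be concealed. The sketch below is a simplification (it does not separate channel-induced losses from jitter-induced ones, which the TS 26.114 metric requires), and the tuple layout (dec_time, rtp_ts, status) is the hypothetical one used in the earlier examples.

```python
def concealment_rate(dec_log):
    """Fraction of non-DTX decoding requests answered with concealment."""
    decoded = [s for _, _, s in dec_log if s != "comfort_noise"]
    concealed = sum(1 for s in decoded if s == "error_concealment")
    return concealed / len(decoded)

# Synthetic DEC log: 1000 frames, one concealment every 200 frames.
dec_log = [(i * 160, i * 160, "error_concealment" if i % 200 == 0 else "ok")
           for i in range(1, 1001)]
```

On the synthetic log this yields 0.5 %, in the same range as the channel results in Table I.3.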
J.1 Speech preparation
The processing steps required for generation of the speech samples are described below.
J.2 Pre-processing
The first step is concatenation where all available speech samples are merged into one long speech file. This file is then pre-processed according to the figure below.
MIRS filter
J.3 Noise pre-processing
Noise files are filtered by the MIRS filter. The noise files are then converted to a near-field perception using the SM filter.
16 kHz source noise file → MIRS filter → SM filter → 16 kHz filtered noise file
AL should be -15 for the car noise and -20 for the cafeteria noise.
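Adjusting a noise file to a target level can be sketched as an RMS-based gain computation. This is an illustration only: the dB reference (RMS of a full-scale 16-bit sine, dBov-style) and the use of plain lists are assumptions, and a conformant tool would use the ITU-T STL level meter instead.

```python
import math

FULL_SCALE_RMS = 32768 / math.sqrt(2.0)   # RMS of a full-scale 16-bit sine

def adjust_level(samples, target_db):
    """Scale `samples` so their RMS level hits target_db re full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    gain = FULL_SCALE_RMS * 10 ** (target_db / 20.0) / rms
    return [s * gain for s in samples]

# Example: a 1 s, 440 Hz tone at 16 kHz, scaled to -20 dB re full scale.
noise = [math.sin(2 * math.pi * 440 * n / 16000) * 1000 for n in range(16000)]
adjusted = adjust_level(noise, -20.0)
```

The same scaling step would be applied after the MIRS/SM filtering shown above, with the AL values from the text.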
J.4 Up- and down-sampling
Up- and down-sampling is needed because the sample rate of the original speech files is 48 kHz, the processing is done at 8/16 kHz sampling, and the listening is done at 16 kHz. The figure below describes the up- and down-sampling between 16 kHz and 8 kHz.
Clipping and 16→13-bit conversion (with rounding) | Processing (following figures) | 13→16-bit conversion (with rounding)
Figure J.4. Sample-rate conversion, rounding and scaling for narrow-band filtered conditions
Clipping and 16→14-bit conversion (with rounding) | Processing (following figures) | 14→16-bit conversion (with rounding)
Figure J.5. Sample-rate conversion, rounding and scaling for wideband filtered conditions
STL2000 syntax (narrow-band)
filter -down HQ2 infile outfile
scaldemo -dB -gain 0 -bits 13 -round -nopremask -blk 160 infile outfile
(Processing)
scaldemo -dB -gain 0 -bits 13 -round -nopremask -blk 160 infile outfile
filter -up HQ2 infile outfile 160
scaldemo -dB -gain 0 -bits 14 -round -nopremask -blk 320 infile outfile
J.5
J.6 MNRU conditions
For AMR-NB the format of the infile is 13 bits at 8 kHz and the MNRU levels are 5, 13, 21, 29 and 37 dBq. For AMR-WB the format of the infile is 14 bits at 16 kHz and the MNRU levels are 5, 13, 21, 29, 37 and 45 dBq. STL2000 syntax (narrow-band)
mnrudemo -Q x infile outfile 160 /* x = dBq level */
J.7 Codec and JBM processing
The reference conditions with fixed JBM and the test conditions with adaptive JBM are processed as described below.
The output from the encoder/RTP packetisation and the input to the JBM/decoder are in RTPdump format. The fixed JBM initial buffering delay is set such that the resulting end-to-end delay (including channel delay and buffering delay) is similar to the average end-to-end delay of the adaptive JBM in the same test condition. Command syntax for AMR/AMR-WB encoding & RTP packetization
amr_enc dtx -fpp 1 mode x if infile of outfile    /* x = 2 for 5.9 kbit/s mode, x = 7 for 12.2 kbit/s mode */
amrwb_enc dtx -fpp 1 mode 2 if infile of outfile  /* mode 2 = 12.65 kbit/s */
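The rule stated above, fixed-JBM initial buffering delay chosen so that the fixed condition's end-to-end delay matches the adaptive condition's average, can be sketched as a small helper. All names and the delay decomposition are illustrative assumptions, not part of the test plan.

```python
def fixed_initial_delay(adaptive_e2e_delays_ms, channel_delay_ms, codec_delay_ms):
    """Average adaptive end-to-end delay minus the non-buffering components;
    the remainder is the initial buffering delay to configure for the
    fixed JBM (floored at zero)."""
    avg = sum(adaptive_e2e_delays_ms) / len(adaptive_e2e_delays_ms)
    return max(0.0, avg - channel_delay_ms - codec_delay_ms)

# Example: adaptive runs averaged 200 ms end to end; 80 ms channel + 40 ms codec.
delay = fixed_initial_delay([180.0, 200.0, 220.0],
                            channel_delay_ms=80.0, codec_delay_ms=40.0)
```

The resulting value would then be passed to the fixed-JBM decoder configuration for the matching test condition.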
J.8 Post-processing
16 kHz processed file → P.56 level adjustment to -26 dBov → File separation and windowing (as necessary) → 16 kHz post-processed file
J.9 Test conditions
Table J.1. Test conditions
Cond.  Codec      JBM       Noise Type    Frame Loss Rate  Channel     AMR-Modes (fixed RTP delay)
1      Direct NB  -         Clean         -                -           -
2      Direct WB  -         Clean         -                -           -
3      NB         -         MNRU 5 dBq    -                -           -
4      NB         -         MNRU 13 dBq   -                -           -
5      NB         -         MNRU 21 dBq   -                -           -
6      NB         -         MNRU 29 dBq   -                -           -
7      NB         -         MNRU 37 dBq   -                -           -
8      WB         -         MNRU 5 dBq    -                -           -
9      WB         -         MNRU 13 dBq   -                -           -
10     WB         -         MNRU 21 dBq   -                -           -
11     WB         -         MNRU 29 dBq   -                -           -
12     WB         -         MNRU 37 dBq   -                -           -
13     WB         -         MNRU 45 dBq   -                -           -
14     AMR-NB     Fixed     Clean         -                Error free  5.9 kbit/s (150 ms)
15     AMR-NB     Fixed     Clean         -                Error free  12.2 kbit/s (150 ms)
16     AMR-NB     Fixed     Car           -                Error free  5.9 kbit/s (150 ms)
17     AMR-NB     Fixed     Car           -                Error free  12.2 kbit/s (150 ms)
18     AMR-NB     Fixed     Cafeteria     -                Error free  5.9 kbit/s (150 ms)
19     AMR-NB     Fixed     Cafeteria     -                Error free  12.2 kbit/s (150 ms)
20     AMR-WB     Fixed     Clean         -                Error free  12.65 kbit/s (150 ms)
21     AMR-WB     Fixed     Car           -                Error free  12.65 kbit/s (150 ms)
22     AMR-WB     Fixed     Cafeteria     -                Error free  12.65 kbit/s (150 ms)
23     AMR-NB     Fixed     Clean         0.01             Ch1         5.9 kbit/s (150 ms)
24     AMR-NB     Fixed     Clean         0.01             Ch2         5.9 kbit/s (150 ms)
25     AMR-NB     Fixed     Clean         0.01             Ch3         12.2 kbit/s (150 ms)
26     AMR-NB     Fixed     Clean         0.01             Ch4         12.2 kbit/s (150 ms)
27     AMR-NB     Fixed     Car           0.01             Ch5         5.9 kbit/s (150 ms)
28     AMR-NB     Fixed     Cafeteria     0.01             Ch6         5.9 kbit/s (150 ms)
29     AMR-NB     Fixed     Car           0.01             Ch7         12.2 kbit/s (150 ms)
30     AMR-NB     Fixed     Cafeteria     0.01             Ch8         12.2 kbit/s (150 ms)
31     AMR-WB     Fixed     Clean         0.01             Ch1         12.65 kbit/s (150 ms)
32     AMR-WB     Fixed     Clean         0.01             Ch2         12.65 kbit/s (150 ms)
33     AMR-WB     Fixed     Clean         0.01             Ch3         12.65 kbit/s (150 ms)
34     AMR-WB     Fixed     Clean         0.01             Ch4         12.65 kbit/s (150 ms)
35     AMR-WB     Fixed     Car           0.01             Ch5         12.65 kbit/s (150 ms)
36     AMR-WB     Fixed     Car           0.01             Ch6         12.65 kbit/s (150 ms)
37     AMR-WB     Fixed     Cafeteria     0.01             Ch7         12.65 kbit/s (150 ms)
38     AMR-WB     Fixed     Cafeteria     0.01             Ch8         12.65 kbit/s (150 ms)
39     AMR-NB     Adaptive  Clean         0.01             Ch1         5.9 kbit/s (150 ms)
40     AMR-NB     Adaptive  Clean         0.01             Ch2         5.9 kbit/s (150 ms)
41     AMR-NB     Adaptive  Clean         0.01             Ch3         12.2 kbit/s (150 ms)
42     AMR-NB     Adaptive  Clean         0.01             Ch4         12.2 kbit/s (150 ms)
43     AMR-NB     Adaptive  Car           0.01             Ch5         5.9 kbit/s (150 ms)
44     AMR-NB     Adaptive  Cafeteria     0.01             Ch6         5.9 kbit/s (150 ms)
45     AMR-NB     Adaptive  Car           0.01             Ch7         12.2 kbit/s (150 ms)
46     AMR-NB     Adaptive  Cafeteria     0.01             Ch8         12.2 kbit/s (150 ms)
47     AMR-WB     Adaptive  Clean         0.01             Ch1         12.65 kbit/s (150 ms)
48     AMR-WB     Adaptive  Clean         0.01             Ch2         12.65 kbit/s (150 ms)
49     AMR-WB     Adaptive  Clean         0.01             Ch3         12.65 kbit/s (150 ms)
50     AMR-WB     Adaptive  Clean         0.01             Ch4         12.65 kbit/s (150 ms)
51     AMR-WB     Adaptive  Car           0.01             Ch5         12.65 kbit/s (150 ms)
52     AMR-WB     Adaptive  Car           0.01             Ch6         12.65 kbit/s (150 ms)
53     AMR-WB     Adaptive  Cafeteria     0.01             Ch7         12.65 kbit/s (150 ms)
54     AMR-WB     Adaptive  Cafeteria     0.01             Ch8         12.65 kbit/s (150 ms)
55     Direct NB  -         Car           -                -           -
56     Direct NB  -         Cafeteria     -                -           -
57     Direct WB  -         Car           -                -           -
58     Direct WB  -         Cafeteria     -                -           -
The results from simulation 1 comprised 16 samples for the downlink and 16 samples for the uplink, with paired channel conditions PedB_3km, PedB_30km, VehA_30km and VehA_120km. The location of the reference user was fixed for all simulations. The results from simulation 2 comprised 22 samples, of which 20 are for the downlink and two for the uplink, representing a paired channel PedB_3km. The difference between the 20 samples lay in the network load (number of users) and the location of the reference user (geometry).
Low Traffic (LT): 40, 45, or 60 mobile users per cell
High Traffic (HT): 80 or 100 mobile users per cell
Low Mobility (LM, Lm): ITU channel model PedB_3km or PedA_3km
High Mobility (HM, Hm): ITU channel model VehA_30km, VehA_120km or PedB_30km
43 33 34 21 10 17.1 0 8 -136+35.22*log10(d), d in km 50% 8 per TR 25.896 v6.0.0 A.3.1.1 65 2 25% AWGN 37% PedB 3 kph 13% PedB 30 kph 13% VehA 30 kph 12% VehA 120 kph E-DCH: 40 UEs per cell HSDPA: 40/60/80/100 UEs per cell Case 1: PedB 3 kph Case 2: PedB 30 kph Case 3: VehA 30 kph Case 4: VehA 120 kph Case 1: One cell in active set, UE geometry = 3.3 dB Case 2: Soft handoff with 2 cells in active set, UE geometry = 3.0 dB, UE serving cell geometry = -0.7 dB

Ec/Io Admission Threshold: -18 dB
RSCP Admission Threshold: -115 dBm
Number of Node Bs: 19 Node Bs / 57 cells
Cell layout: 3-Cell Clover-Leaf
Inter-site Distance: 2500 m
Frequency: 1990 MHz
ROHC dynamics
RTCP
SIP
SID Frames
RTP layer aggregation
MAC-d PDU Size
Max. HSDPA Transmit Power (HS-SCCH + HS-PDSCH)
HS-SCCH Channel Model
Number Errors Impact HS-DSCH Decoding
Power Allocation
Downlink Over-the-air Delay Budget [ms] (MAC-d to MAC-d)
Iub delay modelled
HSDPA Scheduler Implementation
Mobility Model
E-DCH Scheduling
E-DCH TTI length
E-DCH max number of HARQ transmissions
E-DCH QoS
HS-DPCCH modelled for E-DCH simulation
Node B resources
RAB for HSDPA
EUL format
EUL scheduling
EUL error modelling
Annex L: Test Plan for the AMR NB/WB Conversation Test in UMTS over HSDPA/EUL
L.1 Introduction
This document contains the test plan of a conversation test for the selected speech codecs Adaptive Multi-Rate Narrow-Band (AMR-NB) and Adaptive Multi-Rate Wide-Band (AMR-WB) in packet switched networks with an HSDPA/HSUPA radio interface, where HSUPA is also referred to as EUL or E-DCH within the terminology of 3GPP TSG-RAN. All the laboratories participating in the conversation test will use the same test plan, while each laboratory uses a different test language. Even if the test rooms or the test equipment are not exactly the same in all the laboratories, the calibration procedures and the test equipment characteristics will guarantee the similarity of the test conditions. The details of the test plan are given in the following three sections: Section 2 gives the general information regarding the test. Section 3 details the test design and test methodology. Section 4 provides the procedure for the test arrangement and logistics.
L.2 General Information

L.2.1 Permanent Documents
ITU-T Rec. P.800: Methods for Subjective Determination of Transmission Quality
ITU-T Rec. P.805: Conversational Tests
L.2.2 Key Acronyms

AMR-NB: Adaptive Multi-Rate Narrowband Speech Codec
AMR-WB: Adaptive Multi-Rate Wideband Speech Codec
MOS: Mean Opinion Score
HSPA: High Speed Packet Access
HSDPA: High Speed Downlink Packet Access
HSUPA: High Speed Uplink Packet Access
L.2.3 Contacts

The following persons should be contacted for questions related to the test and test plan.

Responsibility          Contact         Affiliation      Mail Address                                          Phone/Fax/Email
Coordination            Jim McGowan     Alcatel-Lucent   67 Whippany Rd. Rm 2A-384, Whippany, NJ 07891, USA    Tel: +1 908 582 5667, Fax: +1-973-386-4555, mcgowan@lucent.com
3GPP TSG-SA4 SQ Chair   Paolo Usai      ETSI MCC         650 Route des Lucioles, 06921 Sophia Antipolis Cedex, France
Test Bed                Alan Sharpley   Dynastat         6850 Austin Center Blvd., Ste. 150, Austin, TX 78731
L.2.4 Participants

Each test laboratory has the responsibility to organize its conversation tests. The list of the participating test laboratories is the following:
Lab 1
France Telecom R&D/TECH/SSTP, Technopole Anticipa, 2 Av. P. Marzin, 22307 Lannion Cedex, France
Dynastat
English
Alan Sharpley,
Chinese
L.3 Test Methodology

L.3.1 Introduction
The method evaluates the effect of degradation on the quality of communications through the conversation-opinion tests recommended by ITU-T P.800. The conversation-opinion tests allow subjects to be in a more realistic situation in terms of the actual service conditions experienced by telephone customers. In addition, the conversation-opinion tests are suited to assess the effects of impairments that can cause difficulty while conversing. Subjects participate in the test in pairs; they are seated in separate sound-proof rooms and are asked to hold a conversation through the transmission chain simulated by a computer that generates the impairment of the
communication link considered typical for a packet switched network with an HSDPA/HSUPA air-interface. The simulated network configurations (including the terminal equipment) will be symmetrical in the two transmission paths, as shown in Figure L.1, but the link conditions in each direction can be asymmetrical (to be elucidated later).
Test Room A
Test Bed
Test Room B
L.3.2 Test Design

L.3.2.1 Description of the Test Bed
The test bed is intended to provide an emulated transmission system resembling UMTS with HSDPA/HSUPA, as shown in Figure L.2. The real situation to be tested is a process in which a bit-stream is encoded packet-wise by AMR and transmitted through the HSUPA and HSDPA air-interfaces, so that it reaches the receiver, where it is decoded packet-wise by the AMR decoder. The bit-stream encounters impairments while traversing the system. The impairment is simulated off-line by the simulator and played into the test bed during the test.
Figure L.2: AMR (A) → HSUPA → Core Network → HSDPA → AMR (B), with the reverse direction via HSUPA/HSDPA
Simulated transmission links are implemented in hardware through two computers, each responsible for one direction, as shown in Figure L.3. The Internet Protocol is implemented in both computers. Each AMR frame generated by the AMR encoder is wrapped in a unique RTP packet every 20 ms. At the receiver, the RTP packets are buffered and delayed according to the receive time simulated at the lower layers.
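The sender/receiver timing just described, one RTP packet per 20 ms AMR frame, delivered to the receiver at the receive time given by the simulated lower-layer trace, can be sketched as follows. The trace format (per-packet transport delay in ms, None for a lost packet) is an assumption for this example.

```python
def schedule_delivery(num_frames, trace_delay_ms):
    """trace_delay_ms[i]: simulated transport delay of packet i (None = lost).
    Returns (send_time_ms, receive_time_ms) pairs for delivered packets."""
    out = []
    for i in range(num_frames):
        send = i * 20                  # one RTP packet per 20 ms frame
        d = trace_delay_ms[i]
        if d is not None:              # lost packets never reach the receiver
            out.append((send, send + d))
    return out

# Example: four frames, the third packet lost on the simulated radio link.
deliveries = schedule_delivery(4, [50, 70, None, 55])
```

The receiving computer's jitter buffer then consumes packets in receive-time order, exactly as in the listening-only simulator of Annex H.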
Figure L.3: Computer B: AMR (decode) / JBM → headset receiver
L.3.2.2 Transmission System
The transmission system is configured as a mobile-to-mobile connection within an IMS, with an HSDPA downlink and an HSUPA uplink. The protocol stack of the radio interface is shown in Figure L.4. The simulation of the performance of the radio interface is based on a network layout of 19 cells and 57 sectors, and the output of the simulation is a sequence of RLC packet reception statuses. An RLC packet is transmitted from the mobile to the origination RNC, and from the origination RNC to the destination RNC via the core network, before reaching the receiving mobile. The recorded traces include the delay and the error event of the received RLC packets.
Figure L.4: Transmission path through a UMTS
The transmission of IP/UDP/RTP/AMR packets over the core network is not simulated in further detail beyond a static end-to-end delay.
L.3.2.3
AMR-NB will encode speech at 5.9 kbps and 12.2 kbps, and AMR-WB at 12.65 kbps. The bit-stream will be encapsulated using the IP/UDP/RTP protocols and sent to the air-interface emulator located in the origination computer. The output of the air-interface is the payload of the IP packets, which is then sent through an RJ-45 port of the origination computer and received by the destination computer, where the RTP packets are extracted and the AMR-NB/AMR-WB frames are buffered and decoded.
The RABs underlying the test are specified in TS 25.993, in the following sections:
7.5.3 RB for Conversational / unknown
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bitrate depending on UE category] on HS-DSCH
  / PS RAB + RB for interactive or background /
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bitrate depending on UE category] on HS-DSCH
  / PS RAB + RB for interactive or background /
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bitrate depending on UE category] on HS-DSCH
  / PS RAB +
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bit rate depending on UE category] on HS-DSCH
  SRBs for DCCH
7.5.4 RB for Conversational / Unknown
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bitrate depending on UE category] on HS-DSCH
  / PS RAB + RB for interactive or background /
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bitrate depending on UE category] on HS-DSCH
  / PS RAB +
    UL: [max bitrate depending on UE category and TTI] on E-DCH
    DL: [max bit rate depending on UE category] on HS-DSCH
  SRBs for DCCH
L.3.2.4 Test environment
An external sound card will be used for each computer of the test bed. To avoid echo problems, headsets instead of handsets will be used. The monaural supra-aural headsets (the other ear uncovered) are connected to the sound cards. In practice, the original settings, defined during the preliminary tests and producing a comfortable listening level, will not be modified. A foam ball protects the microphones in order to reduce the "pop" effect. The user should avoid placing the acoustic opening of the microphone in front of the mouth.

Each of the two subjects participating in the conversations is installed in a test room. They sit in an armchair in front of a table. The test rooms are acoustically insulated. All the test equipment is installed in a third room, connected to the test rooms. When needed, the background noise is generated in the appropriate test room through a set of 4 loudspeakers. The background noise level is adjusted and controlled by a sound level meter. The measurement microphone, connected to the sound level meter, is located at the equivalent of the centre of the subject's head. The noise level is A-weighted.

Before the beginning of a set of experiments, the end-to-end transmission level is checked subjectively, to ensure that there is no problem. The speech level is checked by the following procedure: an artificial mouth placed in front of the microphone of Headset A, in the LRGP position (see ITU-T Rec. P.64), generates in the artificial ear (according to ITU-T Rec. P.57) coupled to the earphone of Headset B the nominal level defined in section 4.3. The level is adjusted according to the bandwidth, to -15 dB Pa for NB and to -18 dB Pa for WB, when necessary, with the receiving volume control of the headset. A similar calibration is done by inverting Headsets A and B.
ETSI
160
At each test laboratory the test bed must be calibrated, so that the given value of fixed delay for the speech transmission is the same for all labs.
L.3.3
Test Conditions
Three codec rates will be tested: AMR-NB 5.9 kbps and 12.2 kbps, as well as AMR-WB 12.65 kbps. Two different categories of test conditions are defined and their combination makes the actual test conditions.
Network Condition
Table L.1: Definition of the radio network conditions

Low Traffic (LT): 40, 45, or 60 mobile users per cell
High Traffic (HT): 80 or 100 mobile users per cell
Low Mobility (Lm uplink, LM downlink): ITU channel model PedB-3km or PedA-3km
High Mobility (Hm uplink, HM downlink): ITU channel model VehA-30km, VehA-120km or PedB-30km

The uplinks are simulated as dedicated channels, hence the traffic conditions apply only to the downlinks. In a mobile-to-mobile connection, the order of the uplink and downlink plays no role. Therefore there are the following 8 possible combinations of channel conditions:
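The eight combinations can be enumerated mechanically. The following Python sketch assumes the label convention of Table L.1 (uplink mobility, traffic load, downlink mobility); the function name is our own:

```python
from itertools import product

def channel_conditions():
    """Enumerate the 8 channel-condition combinations described above.

    Label convention (assumed from Table L.1): uplink mobility (Lm/Hm),
    traffic load (LT/HT), downlink mobility (LM/HM).
    """
    uplink = ["Lm", "Hm"]     # low / high mobility on the uplink
    traffic = ["LT", "HT"]    # low / high traffic (applies to the downlink)
    downlink = ["LM", "HM"]   # low / high mobility on the downlink
    return [f"{u}.{t}.{d}" for u, t, d in product(uplink, traffic, downlink)]
```

With 2 x 2 x 2 factors the enumeration yields exactly the 8 conditions referenced by numbers [1] to [8] in the condition tables below.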
The noise condition refers to the characteristic background noise at the subjects' location; four classes of noise will be deployed:
Level: -30 dBPa, -35 dBPa and -35 dBPa for the recorded noises; Hoth noise is reproduced at 30 dBA with the spectrum defined in ITU-T Recommendation P.800, Annex A, clause A.1.1.2.2.1 (Room Noise), Table A.1 and the accompanying figure.
The production of background noise follows the guidelines of ETSI EG 202 396-1 (clause 6).
The condition labels follow the pattern x-y.z C, where:
x: AMR mode
y: network load
z: experiment
C: swap subjects
Cond. Label | Noise in Room A | Noise in Room B | Radio Network Condition | Description | Cond. Number
1-1.1  | Hoth      | Hoth      | A->B: [1], B->A: [1] | Lm.LT.LM / LM.LT.Lm | 1
1-1.2  | Car       | Car       | A->B: [6], B->A: [6] | Hm.LT.HM / HM.LT.Hm | 2
1-1.3a | Car       | Hoth      | A->B: [5], B->A: [2] | Hm.LT.LM / HM.LT.Lm | 3
1-1.3b | Hoth      | Car       | A->B: [2], B->A: [5] | Lm.LT.HM / LM.LT.Hm | 4
1-1.4  | Cafeteria | Cafeteria | A->B: [1], B->A: [1] | Lm.LT.LM / LM.LT.Lm | 5
1-1.5a | Cafeteria | Street    | A->B: [2], B->A: [5] | Lm.LT.HM / LM.LT.Hm | 6
1-1.5b | Street    | Cafeteria | A->B: [5], B->A: [2] | Hm.LT.LM / HM.LT.Hm | 7
1-1.6  | Street    | Street    | A->B: [6], B->A: [6] | Hm.LT.HM / HM.LT.Hm | 8
1-2.1  | Hoth      | Hoth      | A->B: [3], B->A: [3] | Lm.HT.LM / LM.HT.Lm | 9
1-2.2  | Car       | Car       | A->B: [8], B->A: [8] | Hm.HT.HM / HM.HT.Hm | 10
1-2.3a | Car       | Hoth      | A->B: [7], B->A: [4] | Hm.HT.LM / HM.HT.Hm | 11
1-2.3b | Hoth      | Car       | A->B: [4], B->A: [7] | Lm.HT.HM / LM.HT.Hm | 12
1-2.4  | Cafeteria | Cafeteria | A->B: [3], B->A: [3] | Lm.HT.LM / LM.HT.Lm | 13
1-2.5a | Cafeteria | Street    | A->B: [4], B->A: [7] | Lm.HT.HM / LM.HT.Hm | 14
1-2.5b | Street    | Cafeteria | A->B: [7], B->A: [4] | Hm.HT.LM / HM.HT.Lm | 15
1-2.6  | Street    | Street    | A->B: [8], B->A: [8] | Hm.HT.HM / HM.HT.Hm | 16
Cond. Label | Noise in Room A | Noise in Room B | Radio Network Condition | Description | Cond. Number
2-1.1  | Hoth      | Hoth      | A->B: [1], B->A: [1] | Lm.LT.LM / LM.LT.Lm | 1
2-1.2  | Car       | Car       | A->B: [6], B->A: [6] | Hm.LT.HM / HM.LT.Hm | 2
2-1.3a | Car       | Hoth      | A->B: [5], B->A: [2] | Hm.LT.LM / HM.LT.Lm | 3
2-1.3b | Hoth      | Car       | A->B: [2], B->A: [5] | Lm.LT.HM / LM.LT.Hm | 4
2-1.4  | Cafeteria | Cafeteria | A->B: [1], B->A: [1] | Lm.LT.LM / LM.LT.Lm | 5
2-1.5a | Cafeteria | Street    | A->B: [2], B->A: [5] | Lm.LT.HM / LM.LT.Hm | 6
2-1.5b | Street    | Cafeteria | A->B: [5], B->A: [2] | Hm.LT.LM / HM.LT.Hm | 7
2-1.6  | Street    | Street    | A->B: [6], B->A: [6] | Hm.LT.HM / HM.LT.Hm | 8
2-2.1  | Hoth      | Hoth      | A->B: [3], B->A: [3] | Lm.HT.LM / LM.HT.Lm | 9
2-2.2  | Car       | Car       | A->B: [8], B->A: [8] | Hm.HT.HM / HM.HT.Hm | 10
2-2.3a | Car       | Hoth      | A->B: [7], B->A: [4] | Hm.HT.LM / HM.HT.Hm | 11
2-2.3b | Hoth      | Car       | A->B: [4], B->A: [7] | Lm.HT.HM / LM.HT.Hm | 12
2-2.4  | Cafeteria | Cafeteria | A->B: [3], B->A: [3] | Lm.HT.LM / LM.HT.Lm | 13
2-2.5a | Cafeteria | Street    | A->B: [4], B->A: [7] | Lm.HT.HM / LM.HT.Hm | 14
2-2.5b | Street    | Cafeteria | A->B: [7], B->A: [4] | Hm.HT.LM / HM.HT.Lm | 15
2-2.6  | Street    | Street    | A->B: [8], B->A: [8] | Hm.HT.HM / HM.HT.Hm | 16
Cond. Label | Noise in Room A | Noise in Room B | Radio Network Condition | Description | Cond. Number
3-1.1  | Hoth      | Hoth      | A->B: [1], B->A: [1] | Lm.LT.LM / LM.LT.Lm | 1
3-1.2  | Car       | Car       | A->B: [6], B->A: [6] | Hm.LT.HM / HM.LT.Hm | 2
3-1.3a | Car       | Hoth      | A->B: [5], B->A: [2] | Hm.LT.LM / HM.LT.Lm | 3
3-1.3b | Hoth      | Car       | A->B: [2], B->A: [5] | Lm.LT.HM / LM.LT.Hm | 4
3-1.4  | Cafeteria | Cafeteria | A->B: [1], B->A: [1] | Lm.LT.LM / LM.LT.Lm | 5
3-1.5a | Cafeteria | Street    | A->B: [2], B->A: [5] | Lm.LT.HM / LM.LT.Hm | 6
3-1.5b | Street    | Cafeteria | A->B: [5], B->A: [2] | Hm.LT.LM / HM.LT.Hm | 7
3-1.6  | Street    | Street    | A->B: [6], B->A: [6] | Hm.LT.HM / HM.LT.Hm | 8
3-2.1  | Hoth      | Hoth      | A->B: [3], B->A: [3] | Lm.HT.LM / LM.HT.Lm | 9
3-2.2  | Car       | Car       | A->B: [8], B->A: [8] | Hm.HT.HM / HM.HT.Hm | 10
3-2.3a | Car       | Hoth      | A->B: [7], B->A: [4] | Hm.HT.LM / HM.HT.Hm | 11
3-2.3b | Hoth      | Car       | A->B: [4], B->A: [7] | Lm.HT.HM / LM.HT.Hm | 12
3-2.4  | Cafeteria | Cafeteria | A->B: [3], B->A: [3] | Lm.HT.LM / LM.HT.Lm | 13
3-2.5a | Cafeteria | Street    | A->B: [4], B->A: [7] | Lm.HT.HM / LM.HT.Hm | 14
3-2.5b | Street    | Cafeteria | A->B: [7], B->A: [4] | Hm.HT.LM / HM.HT.Lm | 15
3-2.6  | Street    | Street    | A->B: [8], B->A: [8] | Hm.HT.HM / HM.HT.Hm | 16
Preliminary training conditions are 1-1.1 and 1-1.2 (highlighted in red and blue, respectively, in the table).
Miscellaneous Conditions
Listening Level | 1 | 79 dBSPL or 76 dBSPL (-15 dBPa or -18 dBPa)
Listeners/Speakers | 32 | Naive Listeners / Native Speakers
Groups | 16 | 2 subjects/group
Rating Scales | 5 | see section 4.2
Languages | 3 | French, English, Chinese
Listening System | 2 | Monaural headset (flat response in the audio bandwidth of interest: 50 Hz - 7 kHz); the other ear is open
Microphone | 2 | Frequency range: 100 Hz - 10 kHz
L.4
Test Procedure
The procedure and logistics of the test across the test laboratories are given in the following:
L.4.1
Time Projection
#acoustic/radio conditions: 8 (2 subjects swapping)
#network load conditions: 2 (light, heavy)
#codecs = #experiments per lab: 3 (5.9 kbps, 12.2 kbps, 12.65 kbps)
#languages: 3 (English, French, Chinese)
#subjects per experiment: 32 (16 pairs)
Each lab tests only one language. Each experiment covers 16 test conditions. Each group has to perform 16 conversations of ca. 3 minutes each. A session consists of 4 consecutive conversations, corresponding to ca. 20 minutes of test time. The subject panels for the three experiments shall be independent, i.e. no subject will participate in more than one experiment. The order of presentation of the test conditions is provided in Appendix 2.
Practice and training per group: 30 minutes
Conversation plus setup and data collection: 5 minutes
Break between sessions: 10 minutes
Number of breaks per experiment: 3
Work hours per day: 8 hours
Work days per week: 5 days
This results in 3 groups per day, i.e. 6 working days per experiment and 18 working days per laboratory, plus 1 day for system setup. In total, one month per laboratory is estimated as the minimum. The project plan can be envisioned as follows:

Starting dates: May 15, 2007; June 19, 2007; July 28, 2007; August 28, 2007

The actual timing will be adapted to the specific situation of the individual labs. The entire test is expected to take 3+ months.
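The time projection above can be sanity-checked with a short script. The session structure (training, 16 conversations, 3 breaks) is taken from the figures listed; the function and its defaults are our own:

```python
def time_projection(groups=16, conversations=16, training_min=30,
                    conversation_min=5, breaks=3, break_min=10,
                    workday_hours=8, experiments=3):
    """Derive groups/day, days/experiment and days/lab from the plan."""
    # Minutes needed by one group: training + conversations + breaks.
    per_group_min = (training_min + conversations * conversation_min
                     + breaks * break_min)
    groups_per_day = (workday_hours * 60) // per_group_min
    days_per_experiment = -(-groups // groups_per_day)   # ceiling division
    return groups_per_day, days_per_experiment, days_per_experiment * experiments
```

With the stated figures one group takes 140 minutes, which reproduces the 3 groups per day, 6 days per experiment and 18 working days per laboratory given above.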
L.4.2
The following instructions shall be given to the subjects in each lab, in the respective native language, during the training phase prior to the tests. You are going to have a conversation with another user. The test situation simulates communication between two mobile phones. Most of the situations will correspond to silent environment conditions, but some others will simulate more specific situations, such as being in a car, in a railway station, or in an office environment where other people are talking in the background. After the completion of each conversation, you will have to give your opinion on the quality by answering the following questions, which will be displayed on the screen of the black box in front of you. Your judgment will be stored. You have 8 seconds to answer each question. After "pressing" the button on the screen, another question will be displayed. You continue the procedure until all 5 questions have been answered.
Question 1: How do you assess the sound quality of the other person's voice?
- No distortion at all, natural
- Minimal distortion
- Moderate distortion
- Considerable distortion
- Severe distortion

Question 2: How well did you understand what the other person was telling you?
- No loss of understanding
- Minimal loss of understanding
- Moderate loss of understanding
- Considerable loss of understanding
- Severe loss of understanding

Question 3: How would you assess your level of effort to converse back and forth during the conversation?
- No effort required
- Minimal effort required
- Moderate effort required
- Considerable effort required
- Severe effort required

Question 4: Did you detect any impairments (noises, cuts, etc.)? Yes or no? If yes, how annoying was it?
- No annoyance
- Minimal annoyance
- Moderate annoyance
- Considerable annoyance
- Severe annoyance

Question 5: What is your opinion of the connection you have just been using?
- Excellent quality
- Good quality
- Fair quality
- Poor quality
- Bad quality
From then on you will have a break approximately every 30 minutes. The test will last a total of approximately 60 minutes. Please do not discuss your opinions with other listeners participating in the experiment.
L.4.3
Test Materials
The scenarios used for the conversation test are those developed by ITU-T SG12. These scenarios have been elaborated to allow a conversation that is well balanced between both participants and lasts approximately 2.5 to 3 minutes, and to stimulate the discussion between persons who know each other, to facilitate the naturalness of the conversation. They are derived from typical situations of everyday life: railway inquiries, renting a car or an apartment, etc. Each condition should be given a different scenario. Each lab is responsible for developing the actual conversation materials to be used. The examples are extracted from ITU-T Rec. P.805 (2007), Appendices 4, 5 and 6. Following the examples and the spirit of this reference, the actual materials should be developed and adapted to the language being tested, the cultural specifics of the country of the lab, and the local situations, depending on where the test lab is located.
L.4.4
Deliverables
The information required from each test laboratory is a table containing the "Opinion Score (OS)" obtained from every subject for each conversation, as an ASCII file or a spreadsheet. No post-processing is required from the labs. The original data are provided by each lab using a template that includes the following information:
Rating
Conversation Partner ID
Time/Date
Comments
The raw-data deliverable spreadsheet will be provided to the test labs by the Global Analysis Lab prior to the beginning of the tests.
L.4.5
Data Analysis
Two statistical analyses should be conducted on the data obtained with these subjective scales. The first analysis is a MANOVA, which globally indicates the possible effects of the experimental factors (i.e., the different conditions). Then, a specific ANOVA should be run on each dependent variable to test whether a specific experimental factor has an effect on that subjective variable. In other words, these statistical analyses indicate whether the differences observed between the MOS obtained for the different conditions are significant, for any given dependent variable (ANOVA) or for the entirety of the dependent variables (MANOVA). Finally, Pearson's linear correlations should be computed between the results of all subjective variables, to find out the specific dependence relations.
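As an illustrative sketch of the per-variable analysis step, the one-way ANOVA F statistic and Pearson's correlation can be computed from scratch as follows. In practice a statistics package with MANOVA support would be used; the function names here are our own:

```python
import math

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over lists of per-condition scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def pearson_r(x, y):
    """Pearson's linear correlation between two score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A large F for a given subjective variable indicates that the condition means differ more than the within-condition spread would explain, which is exactly the significance question posed above.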
L.5
L.5.1
TR 26.935 provides information on the performance of the default speech codecs in packet-switched conversational multimedia applications. The transmission of IP/UDP/RTP/AMR packets over the UMTS air interface (DCHs) was simulated using the Conversational / Speech / UL:46 kbit/s / PS RAB from TS 34.108 v. 4.7.0. During TSG SA#27 in Tokyo [SP-050089], the new work item on Performance Characterization of VoIMS over HSDPA/EUL was approved. The goal of the work item is to test the codec performance when VoIP is supported by HS-DSCH in the DL and E-DCH in the UL.
L.5.2
System Overview
The goal of the test system is to enable MOS tests of mobile-to-mobile conversational voice services in a representative UMTS system supporting VoIP over HSDPA/E-DCH. The test system includes two independent links in opposite directions, used by the two parties of an active conversation, respectively. The two parties of the conversation are referred to as A and B. Thus, the entities of the test system always occur in pairs, and the configurations of the A-to-B and B-to-A links are identical, reflecting the symmetry of the conversational connection. The guiding principle of the test system design is the balance between fidelity to reality and feasibility of implementation. The UMTS system and the IP network, with the designated channel types and protocols, will be simulated by means of digital computers. It is therefore important that the design of the test system allows for verification and repetition, so that a correct implementation in software can be achieved with the highest probability. To this end, a modular design is adopted.
Considering the fact that HARQ and ROHC introduce delay jitter for the packets in both directions, it is necessary to implement them in two modules. Besides, since the speech lab and the IP/core network are both independent of the RAN in nature, it is reasonable to divide the entire test system into 4 separate entities: RN simulator, IP/Core network simulator, VoIP simulator and Test Environment. This division results in 6 interfaces in each direction, as shown in Figure L.5.1. At a high level, each entity has the following function:

Radio Network (RN) Simulator: This simulates the performance of the protocol layers RLC/MAC/PHY for the downlink and the uplink, to produce statistics for the air interfaces on the RLC packet stream. Note that the RN simulator defined here is a subset of the RAN defined in UMTS; it aims at capturing the RAN impacts that are essential to the VoIP performance characterisation.

IP/Core Simulator: This simulates the routing through a loaded IPv6 network, to capture the impairments of packet loss and delay. For the purpose of testing the conversational services, only two entry/exit pairs for the IP core network are needed: one entry/exit for RN(A) and the other for RN(B).

VoIP Simulator: This simulates the VoIP-specific functions between the sound cards and the RAN simulators, comprising the speech encoder/decoder, AMR/RTP/UDP/IP/PDCP packetizing/depacketizing, robust header compression/decompression for both party A and party B of a conversation, etc. Physically, the two ends of the VoIP are located in the SRNC and belong to the MAC-d entities of the two conversation parties, respectively.

Speech Lab: This performs the MOS tests on AMR/AMR-WB under the network conditions simulated by the VoIP, RN and IP/Core simulators. Each side of the conversation uses appropriate playback hardware. The requirements for the test material and the test subjects can be taken from TR 26.935.
Figure L.5.1: Architecture of the Test System (Test Rooms A and B, Sound Cards A and B, VoIP (A)/(B), RN (A)/(B) and IP/Core, connected through Interfaces 1, 2 and 3)
The division of the test system into relatively independent entities serves to clarify the concepts involved. The modular structure allows for off-line simulation of each identified entity independently. However, the designated conversational test requires the simulated radio carrier to be available in real time. Real-time simulation of the entire system is limited by hardware due to the complexity of the RN simulator. Therefore a combination of off-line simulation of the RN and on-line simulation of the VoIP is considered. This is justified by the fact that a continuous stream of RLC PDUs can be produced by the RN simulator regardless of the payload.
L.5.3
The radio bearers used for the simulation of the lower-layer delay and error performance are extracted from TS 25.993 in the following:
"
7.5.3 RB for Conversational / unknown (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH) / PS RAB
+ RB for interactive or background (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH) / PS RAB
+ RB for interactive or background (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH) / PS RAB
+ SRBs for DCCH (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH)
" The minimum UE classes supporting this combination are : support of HS-PDSCH, DL on HS-PDSCH: category 11 and support of E-DPDCH, UL on E-DPDCH category 1. This is supported in Release 6. 7.5.3.1 Uplink
Radio Bearer on E-DPCH: 7.5.3.1.1.1.1 for the conversational RB; 6.10.2.4.6.1.1.1.1.1 of [1] for the Interactive/Background RBs (MAC-e muxed)
TFCS / Physical Channel: 6.10.2.4.6.1.1.2.1 of [1]; E-TFCI table index = 0; E-DCH minimum set E-TFCI = 29 (10 ms TTI, TB size 374 bits) or 32 (2 ms TTI, TB size 368 bits)
NOTE: MAC-e multiplexing of scheduled and non-scheduled MAC-d flows is allowed.
7.5.3.1.1 Transport channel parameters
7.5.3.1.1.1 Transport channel parameters for E-DCH
7.5.3.1.1.1.1 MAC-d flow#1 parameters for conversational / Unknown UL: [max bit rate depending on UE category and TTI] on E-DCH / PS RAB
RLC:
- RAB/Signalling RB: RAB
- PDCP header size (bit): 0
- Logical channel type: DTCH
- RLC mode: UM
- Payload sizes (bit): 88, 104, 136, 152, 168, 184, 200, 216, 280, 288, 304, 336 (alt. 328)
- Max data rate (bps): depends on UE category and TTI
- UMD PDU header (bit): 8

MAC:
- MAC-e multiplexing: N/A
- MAC-d PDU size (bit): 96, 112, 144, 160, 176, 192, 208, 224, 288, 296, 312, 344 (alt. 336)
- Max MAC-e PDU content size (bit): non-scheduled (NOTE 1)
- MAC-e/es header fixed part (bit): 18

Layer 1:
- Transport channel type: E-DCH
- TTI: 10 ms (alt. 2 ms) (NOTE 2)
- Coding type: TC
- CRC (bit): 24

NOTE 1: The max MAC-e PDU content size depends on the non-scheduled grant given by the SRNC.
NOTE 2: The support of 2 ms TTI depends on the UE category.
7.5.3.2 Downlink
Radio Bearer on HS-PDSCH: 7.4.22.2.1.1.1 for the Conversational RB; 6.10.2.4.5.1.2.1.1.1 of [1] for the Interactive/Background RBs
Signalling Radio Bearer on DPCH / Signalling Radio Bearer on HS-PDSCH: 6.10.2.4.6.3.2.1.1.2 of [1]
TFCS / Physical Channel: 6.10.2.4.5.1.2.2.2 of [1]; the physical channel configuration shall use F-DPCH.
7.5.4
"
RB for Conversational / Unknown (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH) / PS RAB
+ RB for interactive or background (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH) / PS RAB
+ SRBs for DCCH (UL: [max bitrate depending on UE category and TTI] on E-DCH; DL: [max bitrate depending on UE category] on HS-DSCH)
" The minimum UE classes supporting this combination are: support of HS-PDSCH, DL on HS-PDSCH: category 11 and support of E-DPDCH, UL on E-DPDCH category 1. This is supported in Release 6. 7.5.4.1 Uplink
Radio Bearer on E-DPCH: 7.5.3.1.1.1.1 for the Conversational RB; 6.10.2.4.6.1.1.1.1.1 of [1] for the Interactive/Background RB (see 7.5.1.1.1.1.1)
TFCS / Physical Channel: 6.10.2.4.6.1.1.2.1 of [1]; E-TFCI table index = 0; E-DCH minimum set E-TFCI = 29 (10 ms TTI, TB size 374 bits) or 32 (2 ms TTI, TB size 368 bits)
NOTE: MAC-e multiplexing of scheduled and non-scheduled MAC-d flows is allowed.
7.5.4.2 Downlink
Radio Bearer on HS-PDSCH: 7.4.22.2.1.1.1 for the Conversational RB; 6.10.2.4.5.1.2.1.1.1 of [1] for the Interactive/Background RB
Signalling Radio Bearer on DPCH / Signalling Radio Bearer on HS-PDSCH: 6.10.2.4.6.3.2.1.1.2 of [1]
TFCS / Physical Channel: 6.10.2.4.5.1.2.2.2 of [1]; the physical channel configuration shall use F-DPCH.
L.5.4
Delay
The overall delay consists of the delay of the air interface as well as that of the networks. The predominant issue that distinguishes VoIP from voice service on a circuit-switched network is the variation of the delay with respect to a fixed delay value, which is referred to as jitter. In order to capture the impact of jitter on the performance of VoIP, a proper assumption about the overall delay budget is necessary. The fixed delay component is estimated using the following example of a delay budget for end-to-end VoIP calls in HSPA when the uplink uses 10 ms TTIs [19].
Uplink (EUL 10 ms TTI) | Delay
AMR encoder | 35 ms
UE L1/L2 processing | 5 ms
TTI alignment | 0-10 ms
Uu interleaving | 10 ms
UL re-TX | 0-80 ms
RNC/Iub/Node B | 10 ms
Iu + Gi | 5 ms
Sum min UL | 65 ms
Sum max UL | 155 ms

Downlink (HSDPA) | Delay
AMR decoder |
UE L1/L2 processing |
Uu interleaving |
DL Scheduling |
RNC/Iub/Node B |
Gi + Iu |
Sum min DL |
Sum max DL |
The different delay components are described below:

The AMR encoder and decoder delay components include: buffering time, due to the frame length (20 ms); look-ahead (5 ms); and processing time (10 ms and 5 ms for uplink and downlink, respectively).

The layer 1 and 2 processing time covers the following protocol layers: Packet Data Convergence Protocol (PDCP), Radio Link Control (RLC), Medium Access Control (MAC) and the Physical (PHY) layer.

The TTI alignment delay component is needed in the uplink, since the packet may need to be buffered to align the transmission to the frame structure of the radio interface. Note that it is possible to adjust the speech encoder framing period to the air interface framing period to get 0 ms TTI alignment delay. Note also that EUL may use 2 ms TTIs, which would reduce this value to 0-2 ms. For the downlink, the TTI alignment delay is included in the DL Scheduling delay and is therefore not specified as a separate delay component in this delay budget.

The Uu interleaving consists of the actual transmission over the air interface, 10 ms and 2 ms for uplink and downlink, respectively. The delay for the uplink can be reduced by using 2 ms TTIs.

HARQ re-transmissions add only to the jitter, not to the fixed delay component. For the uplink, since 10 ms TTIs are used in this example delay budget, the re-transmission time is estimated at 40 ms, and at most 2 re-transmissions are performed before the packet is dropped. Note that the allowed number of re-transmissions, and thus the delay jitter, will differ between implementations. For the downlink, the re-transmission time is included in the variable part of the DL Scheduling delay. In this case, it is assumed that the packet is dropped if it is delayed more than 100 ms in the scheduler. Note that this delay is the sum of the scheduling delay and the re-transmission delays. Note also that the scheduler is vendor specific, and thus the delay, especially its variable part, depends entirely on how different vendors choose to implement it.

The RNC/Iub/Node B delay number describes the RAN delays, i.e. Node B and RNC processing times and transmission delays between these nodes. The Core Network delay is included in the Iu+Gi delay component. The delay of the backbone network is not included in this example.

In summary, the end-to-end packet delay, divided into two parts, is estimated as follows:
- A fixed part, which is identical to the minimum delay, i.e. 102 ms + 30 ms, where the 30 ms accounts for the backbone core network delay.
- A variable part, which corresponds to the jitter, and is in the 0-185 ms range.
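The fixed and variable parts of such a budget are simple sums over the components. The sketch below encodes the example uplink column of the table and reproduces its Sum min/max rows; the dictionary layout is our own:

```python
# Example uplink delay budget from the table above; (min_ms, max_ms) per
# component. The downlink column would follow the same scheme.
UPLINK_BUDGET_MS = {
    "AMR encoder": (35, 35),
    "UE L1/L2 processing": (5, 5),
    "TTI alignment": (0, 10),
    "Uu interleaving": (10, 10),
    "UL re-TX": (0, 80),
    "RNC/Iub/Node B": (10, 10),
    "Iu + Gi": (5, 5),
}

def delay_range_ms(budget):
    """Sum the fixed (min) part and the worst case of a delay budget;
    the difference is the jitter contributed by that link."""
    fixed = sum(lo for lo, _ in budget.values())
    worst = sum(hi for _, hi in budget.values())
    return fixed, worst, worst - fixed
```

For the uplink this yields 65 ms fixed, 155 ms worst case and 90 ms of jitter, matching the Sum min UL and Sum max UL rows of the table.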
L.5.5
RN Simulator
High Speed Downlink Packet Access (HSDPA) is based on techniques such as adaptive modulation/coding and hybrid ARQ to achieve high throughput. The new channel, HS-DSCH, is terminated in the Node B and is applicable only to PS-domain RABs. MAC-d is retained in the S-RNC, while a new entity, MAC-hs, located in the Node B, is introduced to host the functionalities of hybrid ARQ, rate selection and HS-DSCH scheduling. E-DCH for the uplink adds to the DCH the features of fast rate scheduling, hybrid ARQ and adaptive coding. It is managed by a new entity, MAC-e, terminated in the Node B, while another new entity, MAC-es, is introduced in the S-RNC to manage the re-ordering of data from different MAC-d entities. The relation is shown in Figure L.5.2.
Figure L.5.2 (S-RNC: RLC-UM (DL), MAC-d (DL), RLC-UM (UL), MAC-d (UL), MAC-es)
2. Cellular Network: This consists of assumptions about the cell structure, the channel models deployed, the traffic load, the antenna, the locations of users, etc. The interaction between a reference user and the Node B is simulated here, for which the buffer configuration, the scheduler algorithm, the delay budget, the number of users, etc. are needed. This simulator comprises the functions of the Node B and the Iu interface, a part of the radio access network that is extensively simulated in 3GPP RAN working groups. However, the simulation work done for pure capacity has a different scope. The focus of the present work item is to test a single connection that is representative of the service provided by the network, and the final test method is a listening test instead of a statistical description. For this reason, the radio network simulator shall produce a sequence of coherent samples of error and delay events, which is a different objective from that of simulators designed to evaluate capacity or channel quality based on statistical evaluation. The setup, the parameters and the working assumptions need to be designed specifically for this purpose. The expected main result of the simulation is a sequence of error and delay events with the associated attributes necessary for further processing. Details of the simulation assumptions can be found in Appendix A.

3. Packet stream: The payload traffic of the reference user is mapped to the bearer by adding RLC/MAC headers and extracted from the radio bearer by stripping the RLC/MAC headers. The PDCP/IP/UDP/RTP/AMR packets at interfaces A11 and B11 are given to the transmission buffer of the RLC protocol working in UM. The RLC may segment the given bits to make RLC SDUs and add RLC headers (sequence number and length indicators). By assumption, one IP packet is placed into one RLC PDU, which is filled with padding bits.

4. To simplify the implementation and facilitate the typical continuous speech tests, the design of the simulation should target the steady state of the connection. This implies that network re-synchronization (although the terminal may engage in packet re-synchronization) and set-up can be disregarded during the simulation. Depending on the assumptions, issues of packaging, segmentation and reassembly can also be ignored in case the AMR/AMR-WB frame fits into the RLC SDU. The given time limit for the determination of packet loss during the simulation comes from the delay budget planning, which simulates the implementation of the queuing buffers.

5. Payload exchanged at the interfaces:
- A21, B21: PDCP packets with ROHC, received in sequence
- A31, B31: IP packets, delivered in sequence
- A32, B32: IP packets, received in sequence
- A22, B22: PDCP packets with ROHC, delivered in sequence
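The combination of the offline RN trace with the packet stream described above could be sketched as follows. The trace tuple layout and the jitter budget value are assumptions for illustration, relying on the one-IP-packet-per-RLC-PDU assumption stated in the text:

```python
def apply_rn_trace(frames, trace, jitter_budget_ms=185):
    """Apply an offline RN error/delay trace to a stream of speech frames.

    One IP packet per RLC PDU is assumed, so each trace entry
    (loss_indicator, delay_ms) maps to exactly one frame. A frame is
    lost when the trace marks it lost or its delay exceeds the jitter
    budget derived from the delay budget planning.
    """
    received = []
    for frame, (lost, delay_ms) in zip(frames, trace):
        if lost or delay_ms > jitter_budget_ms:
            received.append(None)              # frame lost
        else:
            received.append((frame, delay_ms))
    return received
```

Running the same trace against the same frame stream always reproduces the same loss/delay pattern, which is what makes the offline RN simulation combinable with the online VoIP simulation.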
L.5.6
Core Network
The network introduces time delay for the transmission. The payload exchanged at the interfaces is:
- A31, B31: IP packets received in sequence
- A32, B32: IP packets delivered out of sequence

The IP packets are uniquely identified with an RLC PDU when each AMR/AMR-WB speech frame is conveyed by a single RLC PDU. This assumption simplifies the implementation.
L.5.7
VoIP Client
This section discusses the actions of PDCP/AMR or PDCP/AMR-WB. The PDCP entity is assumed to map to two RLC UM entities, each used for one of the two directions of the conversation, as shown in Figure L.5.3. The payload exchanged at the interfaces is:
- A11, B11: speech frames received in order
- A21, B21: PDCP packets (RLC SDUs) delivered in order
- A22, B22: PDCP packets (RLC SDUs) received in order
- A12, B12: speech frames delivered in order within the given time limit

For the conversational tests, AMR will encode the speech at the designated rate in accordance with TS 26.101, to make the RTP/UDP/IP/PDCP payload. Following TS 26.236, the RTP payload format should follow the bandwidth-efficient mode defined in RFC 3267, and one speech frame shall be encapsulated in each RTP packet. Header compression according to RFC 3095 and TS 25.323 will be simulated as part of the PDCP protocol. For the VoIP test we are only interested in the normal operation of the PDCP, not the session set-up signalling. Lossless RLC PDU size change is assumed, which is equivalent to assuming that the RAB remains the same during the call. This assumption reduces the simulation complexity for the RN simulator.
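The per-packet overhead implied by this encapsulation can be estimated as below. The frame bit counts follow TS 26.101 and the bandwidth-efficient layout of RFC 3267 (4-bit CMR plus one 6-bit ToC entry preceding the speech bits, octet-aligned at the end); the 3-byte compressed header is only a typical ROHC figure, not a guaranteed one, and the function name is our own:

```python
import math

# Speech bits per 20 ms frame for two AMR modes (TS 26.101).
AMR_SPEECH_BITS = {"5.90": 118, "12.2": 244}

def voip_packet_bytes(mode="12.2", ipv6=True, rohc=False):
    """Approximate on-air VoIP packet size for one speech frame per packet."""
    # Bandwidth-efficient payload: CMR (4 bits) + ToC (6 bits) + speech bits.
    amr_payload = math.ceil((4 + 6 + AMR_SPEECH_BITS[mode]) / 8)
    ip = 40 if ipv6 else 20
    headers = 3 if rohc else ip + 8 + 12       # ROHC vs. IP + UDP + RTP
    return headers + amr_payload
```

For AMR 12.2 over IPv6 the uncompressed headers (60 bytes) dwarf the 32-byte speech payload, which illustrates why ROHC is simulated as part of the PDCP in this test system.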
Figure L.5.3: Protocol stacks of the VoIP client. TX (speaker) and RX (listener): AMR / RTP / UDP / IP / PDCP with ROHC (header compression), over RLC-UM.
L.5.8
Interfaces
The physical composition of the test system is depicted in Figure L.5.1. It shows that an end-to-end connection between A and B consists of the following chain of entities: Sound card (A), VoIP (A), RN (A) simulator, IP/Core simulator, RN (B) simulator, VoIP (B), Sound card (B). The figure, however, is not informative about the logical relations between the protocols, which are spread over all entities. Figure L.5.5 visualizes the logical relations among the components. It helps to clarify the scope of each component simulator.
Figure L.5.5: Logical relations between simulator entities and protocols. Colour code: VoIP (A), VoIP (B), RN (A), RN (B), IP/Core, Speech Lab.
For the convenience of verification, it is of great advantage to implement the system component-wise. Thus, the interfaces between the component simulators have to be specified. The physical interfaces are instances of 3 logical interfaces, respectively: Interface 1 ={A11,A12,B11,B12}: the interface between sound card and VoIP
Interface 2 = {A21, A22, B21, B22}: the interface between VoIP and RN Interface 3 ={A31, A32, B31, B32} : the interface between IP/Core and RN The interfaces determine the information to be exchanged between the adjacent entities in the simulator and are specified in the following.
L.5.8.1
Interface 1
This interface exchanges information regarding the operation of the protocol stack AMR/RTP/UDP/IP/PDCP/RLC and the operation of rate selection. One of the issues is the coherence of the actions when the off-line simulation method is used. Since each entity is simulated independently of the others and the output files of the simulation are used at a later time, the consistency between the channel conditions and the selection made by AMR at a given moment cannot be guaranteed unless careful measures are taken. One such measure to maintain coherence is to restrict the AMR/AMR-WB to a pre-selected single data rate for each test. This approach is justified by the fact that the enhanced uplink and downlink already provide sufficient control and adaptation mechanisms at the lower layers, so that the channel condition experienced at interface 1 is sufficiently stable and would hardly require rate switching. The original concept of AMR targets the balance between individual voice quality and overall capacity. But when we fix the number of supported users in our simulation in order to test the probe user's voice quality, the capacity-quality trade-off does not occur for the simulated cases. Hence, testing an individual codec mode from AMR/AMR-WB is sufficiently informative about the VoIP performance for the given simulation set-up.
L.5.8.2
Interface 2
The output file of the RN simulator at this interface consists of 3 columns of the following entries for a stream of RLC PDUs:
Table L.5.2: Data format of interface 2

Sequence Number (int) | Time Stamp (int) | Loss Indicator (binary) | Accumulated Es/Nt after HARQ (dB)
0 | .. | 1 | ..
1 | .. | 1 | ..
2 | .. | 0 | ..
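A reader of such a trace file might look like the following sketch; the column order and the ".." placeholder convention are assumptions about the file layout, and the function name is our own:

```python
def parse_interface2(lines):
    """Parse interface-2 trace lines into one record per RLC PDU.

    Assumed column order: sequence number, time stamp, loss indicator,
    accumulated Es/Nt after HARQ; '..' marks fields that the simulator
    did not fill in.
    """
    records = []
    for line in lines:
        seq, ts, loss, esnt = line.split()
        records.append({
            "seq": int(seq),
            "time": None if ts == ".." else int(ts),
            "lost": loss == "1",
            "es_nt_db": None if esnt == ".." else float(esnt),
        })
    return records
```

Parsing the trace into explicit records lets the VoIP simulator replay the RN behaviour deterministically, which is the point of the offline/online split described in clause L.5.2.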
L.5.8.3
Interface 3
The transportation of an IP packet depends on the nodes traversed by the datagram within the IP/core network. What really matters here is the delay and loss of a packet due to routing. This requires the IP/Core, based on a given topology [tbd] and traffic load [tbd], to generate a sequence of random events at A31 and B31, respectively, reflecting the relative delay and the loss of the packets fed into the network at A32 and B32, respectively. Alternatively, the delay and loss can be generated by an appropriate analytical model [tbd]. The file generated by the IP/Core at the interfaces A32 and B32 shall have the following format:
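Where the analytical-model alternative is taken, the event sequence could be generated, for example, with a two-state Gilbert-Elliott loss model plus exponentially distributed delay. Every parameter below is hypothetical, since the text leaves topology, load and model as [tbd]:

```python
import random

def ip_core_events(n_packets, p_enter_bad=0.01, p_leave_bad=0.3,
                   loss_in_bad=0.5, mean_delay_ms=10.0, seed=0):
    """Generate per-packet (lost, delay_ms) events for the IP/core leg.

    Two-state Gilbert-Elliott loss model: packets are only ever lost in
    the 'bad' state; delay is exponential around mean_delay_ms. All
    parameter values are illustrative placeholders.
    """
    rng = random.Random(seed)
    bad = False
    events = []
    for _ in range(n_packets):
        if bad and rng.random() < p_leave_bad:
            bad = False
        elif not bad and rng.random() < p_enter_bad:
            bad = True
        lost = bad and rng.random() < loss_in_bad
        delay_ms = rng.expovariate(1.0 / mean_delay_ms)
        events.append((lost, delay_ms))
    return events
```

Fixing the seed makes the generated sequence repeatable across runs, matching the verification and repetition requirement stated for the test system design.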
L.5.9
L.5.9.1
For the downlink, the over-the-air delay of a speech frame is defined as the latency between the time a MAC-d PDU carrying a speech frame enters the MAC-hs priority queue in the Node-B and the time the MAC-d PDU is delivered (after reordering by the MAC-hs) to the UE. Similarly, for the uplink, the over-the-air delay of a speech frame is defined as the latency until a MAC-d PDU carrying a speech frame enters the MAC-d of the Node-B. The delay of the network is the time consumed by a packet while staying within the network; it is counted as the time difference between the entry and the exit of the network. The delay value for each connection is measured as the sum of the over-the-air delays for the uplink and downlink plus the network delay and the processing delay at both ends, when the value is within the delay budget. A speech frame is declared to be lost if one of the following is true:
- the MAC-d PDU is discarded at the Node-B transmitter due to expiration of the MAC-hs discard timer;
- the MAC-d PDU is transmitted but not successfully received post-HARQ;
- the MAC-d PDU is successfully received, but only after the specified delay bound.
The MAC-hs discard timer and the MAC-hs T1 timer should be set appropriately for the given over-the-air delay budget.
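The delay accounting and the loss conditions above can be sketched directly (the function names and argument layout are illustrative, not taken from this annex):

```python
def end_to_end_delay_ms(uplink_ota, network, downlink_ota, processing):
    """Per-connection delay: uplink and downlink over-the-air delays plus
    the network delay and the processing delay at both ends (all in ms)."""
    return uplink_ota + downlink_ota + network + processing


def frame_is_lost(discarded, received_post_harq, delay_ms, delay_bound_ms):
    """A speech frame counts as lost if its MAC-d PDU was discarded at the
    Node-B, failed post-HARQ reception, or arrived after the delay bound."""
    if discarded or not received_post_harq:
        return True
    return delay_ms > delay_bound_ms
```

For example, a frame with 40 ms over-the-air delay in each direction, 10 ms network delay and 5 ms processing delay accumulates 95 ms end-to-end and would be declared lost against a 90 ms bound.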
L.5.9.2 Error-Delay Profiles
In [2], we received samples coming from two different simulation platforms:
Platform 1: data contained in R1-061028.zip
Platform 2: data contained in R1-061070.zip
Although both are generated following the network layout and configuration of [3], there are subtle differences beyond the schedulers and the trace lengths. The samples from Platform 1 comprise 16 samples for the downlink and 16 samples for the uplink, with paired channel conditions PedB_3km, PedB_30km, VehA_30km and VehA_120km. The location of the reference user is fixed for all simulations. The samples from Platform 2 comprise 22 samples, of which 20 are for the downlink and two for the uplink, representing a paired channel PedB_3km. The difference between the 20 samples lies in the network load (number of users) and the location of the reference user (geometry). To capture what is essential for our subjective tests, the samples in the two groups have the following attributes in common:
Table L.5.4: File attributes of the available data

Attribute Name   Details                                                  Number
Link Direction   Up-link, Down-link                                       2
Network Load     40, 45, 60, 80, 100                                      5
Channel Model    PedA-3km, PedB-3km, PedB-30km, VehA-30km, VehA-120km     5
Table L.5.5: Number of files and length of traces, grouped according to the network load

Network Load   Number of Samples   Length without Repetition
40             4                   4x60s
45             10                  2x(215+155+95+55) ms
60             4                   4x60s
80             4                   4x60s
100            14                  4x60s + 2x(100+155+95+215+55) ms
Table L.5.6: Definition of the radio network conditions

Radio Network Condition   Down Link (Low Traffic)   Uplink
Low Mobility Mobile       LM.LT                     Lm
High Mobility Mobile      HM.LT                     Hm

In specifics:
Low Traffic (LT): 40, 45 or 60 mobile users per cell
High Traffic (HT): 80 or 100 mobile users per cell
Low Mobility (LM, Lm): ITU Channel Model PedB_3km or PedA_3km
High Mobility (HM, Hm): ITU Channel Model VehA_30km, VehA_120km or PedB_30km
The uplinks are simulated as dedicated channels, hence the traffic conditions apply only to the downlinks. For a mobile-to-mobile connection, the order of the uplink and downlink plays no role. Therefore, we have the following 8 possible constructions of channel conditions:
Table L.5.7: Notation for the mobile-to-mobile radio network conditions

Number   Notation    Meaning
[1]      Lm.LT.LM    Lm + LT.LM
[2]      Lm.LT.HM    Lm + LT.HM
[3]      Lm.HT.LM    Lm + HT.LM
[4]      Lm.HT.HM    Lm + HT.HM
[5]      Hm.LT.LM    Hm + LT.LM
[6]      Hm.LT.HM    Hm + LT.HM
[7]      Hm.HT.LM    Hm + HT.LM
[8]      Hm.HT.HM    Hm + HT.HM
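The 8 conditions of Table L.5.7 are simply the Cartesian product of uplink mobility, traffic load and downlink mobility; as a small sketch (function name and string representation are illustrative):

```python
from itertools import product


def mobile_to_mobile_conditions():
    """Enumerate the 8 mobile-to-mobile radio network conditions of
    Table L.5.7: uplink mobility x traffic load x downlink mobility."""
    uplink_mobility = ["Lm", "Hm"]    # dedicated uplink: mobility only
    traffic = ["LT", "HT"]            # traffic load applies to the downlink
    downlink_mobility = ["LM", "HM"]
    return [f"{um}.{t}.{dm}"
            for um, t, dm in product(uplink_mobility, traffic, downlink_mobility)]
```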
X: AMR-Mode
Y: Network Load
z: Experiment
c: Swap subjects
The radio network conditions are identical for all the test cases with all three codecs under test. Hence, only the table for codec AMR 5.9 is shown as an example in the following.
Cond.No   Noise in Room A   Noise in Room B   Description           Comments
1-1.1     Hoth              Hoth              Lm.LT.LM / LM.LT.Lm   sym
1-1.2     Car               Car               Hm.LT.HM / HM.LT.Hm   sym
1-1.3a    Car               Hoth              Hm.LT.LM / HM.LT.Lm   asym
1-1.3b    Hoth              Car               Lm.LT.HM / LM.LT.Hm   asym
1-1.4     Cafeteria         Cafeteria         Lm.LT.LM / LM.LT.Lm   sym
1-1.5a    Cafeteria         Street            Lm.LT.HM / LM.LT.Hm   asym
1-1.5b    Street            Cafeteria         Hm.LT.LM / HM.LT.Hm   asym
1-1.6     Street            Street            Hm.LT.HM / HM.LT.Hm   sym
1-2.1     Hoth              Hoth              Lm.HT.LM / LM.HT.Lm   sym
1-2.2     Car               Car               Hm.HT.HM / HM.HT.Hm   sym
1-2.3a    Car               Hoth              Hm.HT.LM / HM.HT.Hm   asym
1-2.3b    Hoth              Car               Lm.HT.HM / LM.HT.Hm   asym
1-2.4     Cafeteria         Cafeteria         Lm.HT.LM / LM.HT.Lm   sym
1-2.5a    Cafeteria         Street            Lm.HT.HM / LM.HT.Hm   asym
1-2.5b    Street            Cafeteria         Hm.HT.LM / HM.HT.Lm   asym
1-2.6     Street            Street            Hm.HT.HM / HM.HT.Hm   sym
The designated tests comprise the following components:
- a VoIMS sender, comprising input capture (e.g. microphone), AMR encoder, RTP packetization and IP stack, operating in real time;
- a VoIMS receiver, comprising IP stack, RTP de-packetization, AMR decoder with appropriate jitter handling and an output device (e.g. headphone), operating in real time.
Error-delay profiles (including error mask and time of delivery in milliseconds) are generated using offline system simulations by RAN1. The data files, sorted according to the radio network conditions, are grouped into sets that represent the final test conditions. The data files belonging to the same set are concatenated to form a longer trace. Uplink and downlink traces are combined, with the addition of a fixed delay value, to simulate the delay and error trace of the mobile-to-mobile connection, and these error-delay profiles are used to inject delays and packet losses into the VoIMS traffic in an error insertion device running in real time. The design and arrangement of the tests are detailed in the test plan.
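The trace-combination step described above can be sketched as follows; the per-frame pairing of the two traces and the (delay_ms, lost) tuple layout are illustrative assumptions, not a format defined by this annex:

```python
def combine_traces(uplink, downlink, fixed_delay_ms):
    """Combine per-frame uplink and downlink (delay_ms, lost) traces into a
    mobile-to-mobile error-delay profile: the delays add (plus a fixed
    network delay), and a frame is lost if it is lost on either link."""
    combined = []
    for (ul_delay, ul_lost), (dl_delay, dl_lost) in zip(uplink, downlink):
        combined.append((ul_delay + dl_delay + fixed_delay_ms,
                         ul_lost or dl_lost))
    return combined
```

The resulting profile is what the real-time error insertion device would replay against the VoIMS traffic.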
L.5.A.1 Network Parameters
Parameter: UMTS value
UMTS BS Nominal TX Power [dBm]: 43
P-CPICH Tx Power [dBm]: 33
UMTS BS Overhead TX Power [dBm] (including paging, sync and P/S-CCPCH): 34
UMTS UE TX Power Class [dBm]: 21
UMTS UE Noise Figure [dB]: 10
BS Antenna Gain [dBi]: 17.1
MS Antenna Gain [dBi]: 0
Shadowing Standard Deviation [dB]: 8
Path Loss Model (COST 231): -136+35.22*log10(d), d in km
Shadow Site-to-site Correlation: 50%
Other Losses [dB]: 8
UMTS BS Antenna pattern: per TR 25.896 v6.0.0 A.3.1.1
Beamwidth [degrees]: 65
Propagation Channel Mixture for loading users: 25% AWGN, 37% PedA 3 kph, 13% PedA 30 kph, 13% VehA 30 kph, 12% VehA 120 kph
Propagation Channel cases: Case 1: PedA 3 kph; Case 2: VehA 30 kph; Case 3: VehA 120 kph
Ec/Io Admission Threshold: -18 dB
RSCP Admission Threshold: -115 dBm
Number of Node Bs: 19 Node Bs / 57 cells
Locations of the Reference UE: Geometrical centre of each sectored cell
Cell layout: 3-Cell Clover-Leaf
Inter-site Distance [m]: 2500
Frequency: 1990 MHz
L.5.A.2

Parameter: Value
Traffic: 100% VoIP, AMR 7.95
Voice activity: Markov process with 50% activity (transition probability = 0.01)
AMR payload bandwidth: 4 bits CMR, 6 bits TOC per aggregated speech frame, 7 bits padding for octet alignment (assuming no aggregation)
Overhead: RTP/UDP/IPv6 uncompressed header: 60 bytes
Overhead: RLC-UM: 2 bytes
ROHC: 1 byte R-0, 2 bytes UDP checksum (will be zero bytes with UDP-Lite)
ROHC Resynchronization: ignored
RTCP: Not modelled
SIP: Not modelled
SID Frames: Not transmitted
Effective Data Rate with no RTP layer aggregation: 10.8 kbps
MAC-d PDU Size: 216 bits (one speech frame per MAC-d PDU)
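The 10.8 kbps effective data rate is consistent with one 216-bit MAC-d PDU carried per 20 ms speech frame; a one-line check:

```python
# One speech frame every 20 ms, i.e. 50 frames per second,
# each carried in a 216-bit MAC-d PDU.
MAC_D_PDU_BITS = 216
FRAMES_PER_SECOND = 50

effective_rate_kbps = MAC_D_PDU_BITS * FRAMES_PER_SECOND / 1000  # = 10.8
print(effective_rate_kbps)
```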
L.5.A.3 Other Assumptions
Parameter: UMTS value
Time Modelled [s]: 180
Number of Simulation Runs: 9
UE Category: 5
Receiver Type: Rake, with Mobile Receive Diversity from 2 Antennas (2 Rx correlation = 0.5, mismatch 2 dB)
Associated DPCH Data Rate: 3.4 kbps, SF 256
Associated DPCH Activity Factor: 5%
HS-SCCH Channel Model Number: Depends on loading
Errors Impact HS-DSCH Decoding: Yes
Power Allocation: Fixed Offset
HSDPA Scheduler Implementation:
Mobility Model: Static location for UE
Downlink Over-the-air Delay Budget [ms]: 90
E-DCH Scheduling: Non-scheduled transmission
E-DCH TTI length: Both 10 ms and 2 ms TTI
E-DCH max number of HARQ transmissions: 2 Tx for 10 ms TTI; 6 Tx for 2 ms TTI
L.5.A.4 Simulation Methodology
The system simulation is dynamic and includes explicit modelling of fast fading, power control, CQI generation, scheduling of users, etc. Channels that connect different transmit/receive antenna pairs are generated at the UMTS slot rate (1500Hz). The instantaneous SINR seen at each receiver is computed at the slot rate. Virtual decoders map a sequence of slot rate SINRs to block error events at the TTI rate for each physical channel. The virtual decoders must generate the same statistical block error events as the true decoders operating on a bit by bit basis in a link level simulation for the same TTI rate for each physical channel under consideration.
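As an illustration of the virtual-decoder idea, the mapping from one TTI's slot-rate SINRs to a block error event could be sketched as below. The linear-domain effective-SINR average and the step-wise BLER lookup are assumed stand-ins for illustration; this annex does not mandate a particular mapping:

```python
import math
import random


def virtual_decoder(slot_sinrs_db, bler_curve, rng=random.Random(0)):
    """Map the slot-rate SINRs of one TTI to a block error event.

    bler_curve: (sinr_db_threshold, bler) points sorted by ascending SINR.
    Returns True when the virtual decoder declares a block error."""
    # Effective SINR: average the per-slot SINRs in the linear domain.
    linear = [10 ** (s / 10.0) for s in slot_sinrs_db]
    eff_db = 10.0 * math.log10(sum(linear) / len(linear))
    # Step-wise lookup: use the BLER of the first threshold the effective
    # SINR does not reach; above all thresholds, use the last (lowest) BLER.
    bler = bler_curve[-1][1]
    for sinr_db, b in bler_curve:
        if eff_db < sinr_db:
            bler = b
            break
    return rng.random() < bler  # True = block error this TTI
```

A calibrated implementation would replace the lookup with curves fitted so that the virtual decoder reproduces the block error statistics of the true decoder, as required above.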
Inner and outer loop power control loops are explicitly modelled for the associated DPCH. The OVSF code and transmit power resources consumed by the associated DPCH and HS-SCCH channels are modelled dynamically. Errors made in HS-SCCH decoding are taken into account in determining whether the corresponding HS-DSCH transmission is decoded correctly.
The system simulation attempts to sufficiently model the MAC-d PDU flow and performance from the Node-B to the UE. Thus, the system simulation is considered an over-the-air model and does not capture impairments beyond the Node-B-to-UE subsystem.
Bibliography
[1] 3GPP TS 25.322: "RLC Protocol".
[2] 3GPP TS 34.108: "Common test environments for User Equipment (UE) conformance testing".
[3] 3GPP TR 25.931: "UTRAN functions, examples on signaling procedures".
[4] 3GPP TS 26.236: "Performance characterization of the Enhanced aacPlus and Extended Adaptive Multi-Rate Wideband (AMR-WB+) audio codecs".
[5] 3GPP TS 25.323: "Packet Data Convergence Protocol".
[6] 3GPP TS 25.331: "Radio Resource Control Protocol".
[7] 3GPP TR 25.933: "IP transport in UTRAN".
[8] 3GPP TR 25.896: "Feasibility Study for the Enhanced Uplink for UTRA FDD".
[9] IETF RFC 3095: "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed".
[10] IETF RFC 3267: "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs".
[11] 3GPP TR 25.932: "Delay budget within the access stratum".
[12] 3GPP TS 22.105 v3.6.0: "Service and Capability".
Change history

Date      Meeting   Subject/Comment                                                Old     New
2004-06             Version 6.0.0 approved at 3GPP TSG SA#24                       2.0.0   6.0.0
2007-06             Version for Release 7                                          6.0.0   7.0.0
2007-09   SP-37     Characterisation of VoIMS over HSDPA/EUL                       7.0.0   7.1.0
2007-12   SP-38     Corrections to Characterization of VoIMS over HSDPA/EUL        7.1.0   7.2.0
2007-12   SP-38     Characterization of VoIMS over HSDPA/EUL Conversation Tests    7.1.0   7.2.0
2008-12   SP-42     Version for Release 8                                          7.2.0   8.0.0
2009-12   SP-46     Version for Release 9                                          8.0.0   9.0.0
History
Document history
V9.0.0   January 2010   Publication