Energy Efficient Transmit-Receive Hybrid Spatial Modulation for Large-Scale MIMO Systems

Ahmed Raafat, Merve Yüzgeçcioğlu, Student Member, IEEE, Adrian Agustin, Member, IEEE, Josep Vidal, Senior Member, IEEE, Eduard A. Jorswieck, Senior Member, IEEE and Yoann Corre, Member, IEEE

Abstract: We consider a point-to-point large-scale multiple-input multiple-output system operating in the millimeter wave (mmWave) band and an outdoor scenario. Novel transmit and receive spatial modulation schemes are proposed for uplink (UL) and downlink (DL) data transmission phases based on a novel energy efficient hybrid user terminal architecture. The analog circuitry of the proposed hybrid architecture is divided into two stages: phase shifters and analog switches. The phase shifting stage assures high gain and overcomes the severe path-loss caused by outdoor mmWave propagation. The analog switching stage smartly allocates the antennas to be used at the phase shifting stage and combats the spatial correlation. We provide the analysis of the spectral efficiency (SE) of the UL and DL systems. Next, we propose a reduced complexity algorithm that jointly optimizes the analog beamformer and combiner design of the UL and DL circuitry to maximize the energy efficiency (EE). Finally, we compare and evaluate the performance of the proposed algorithm in terms of the SE and EE assuming both stochastic and realistic channel models.

I. INTRODUCTION

With the advances in communications technology and the new wireless-based applications and services, it is forecasted that the number of connected devices will reach tens of billions by 2030 [1]. In addition, every device connected to the network will demand an average data rate increase of 1000x compared to today's networks.
A. Raafat, A. Agustin and J. Vidal are with the Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona 08034, Spain (e-mail: ahmed.raafat@upc.edu; adrian.agustin@upc.edu; josep.vidal@upc.edu). M. Yüzgeçcioğlu and E. A. Jorswieck are with the Department of Electrical Engineering and Information Technology, TU Dresden, 01062 Dresden, Germany (e-mail: merve.yuzgeccioglu@tu-dresden.de; eduard.jorswieck@tu-dresden.de). Y. Corre is with Siradel, 35043 Rennes Cedex, France (e-mail: ycorre@siradel.com). The research leading to these results has been funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 641985, the project 5G&B RUNNER-UPC (TEC2016-77148-C2-1-R (AEI/FEDER, UE)) and the Catalan Government (2017 SGR 578 - AGAUR).

The shortage of bandwidth in low frequency bands has led to the consideration of migrating to the higher frequency (mmWave) bands, which has attracted the research community as a way to deliver the expected large traffic demands [2], [3]. Shortcomings of the mmWave band are the severe path-loss and the sensitivity to blockage [4]. Large-scale multiple-input multiple-output (MIMO) technology is one of the promising candidate technologies to combat these challenges [5]. By increasing the number of antennas, better performance can be achieved in terms of a trade-off between antenna gain and degrees of freedom [6]. Linear precoding schemes can be employed to reduce the complexity of the system drastically [7]. Having accurate knowledge of the channel state information (CSI) in transmission is crucial in the design of the precoder that nulls out the unintended space-time-frequency dimensions. A larger number of antennas comes together with an increased complexity and bandwidth usage required to precisely acquire CSI.
A pilot-based channel estimation method using polynomial expansion proposed in [8] and a low complexity adaptive compressed sensing based algorithm proposed in [9] are two among the many low complexity methods proposed to estimate the CSI of large-scale MIMO systems. Fully digital architectures such as the block diagonalization algorithm in [10] have been proven to achieve high performance. However, implementation of a fully digital architecture at a terminal with a large number of antennas is not practical due to space limitations and energy consumption constraints. Therefore, hybrid architectures have raised interest as an attractive solution that allows designs with a reduced number of radio frequency (RF) chains by combining analog beamforming together with digital precoding [11], [12]. Various algorithms have been proposed that configure fully connected hybrid precoders at the transmitter side and an analog combiner at the receiver end with a small training and feedback overhead [13]. In [13], the authors have shown that the hybrid beamforming system achieves higher data rates than analog beamforming and closely approaches the performance of fully digital beamforming. However, the fully connected hybrid MIMO architecture proposed there comprises many analog devices (phase shifters, power splitters and combiners), which entail higher power consumption than fully digital MIMO designs [14], [15]. Abandoning the fully connected hybrid architecture results in degradation of the spectral efficiency (SE), even with the use of the most sophisticated algorithms in the design of the sub-connected hybrid precoders and combiners, as illustrated in Fig. 10 in [14] and Fig. 3 in [16]. In [14], the authors studied the SE-EE trade-off of sub-connected hybrid architectures that consist of phase shifters or switches. They showed that a better EE is attained with the proposed hybrid architecture at the cost of losing SE when compared with the fully connected architectures.
Spatial modulation (SM) techniques were initially introduced for sub-6 GHz systems with the aim of reducing the number of RF chains and minimizing the hardware cost and power consumption [17]. They are able to enhance SE by exploiting the spatial dimension. The concept can be applied at the transmitter (transmit SM (TSM)) [17] or at the receiver (receive SM (RSM)) [18]. In TSM, one antenna is active during the transmission and the remaining ones are silent and thus, only one RF chain is needed. Part of the input data bits is mapped into the index of the active transmit antenna and the other part is mapped into an M-ary modulation symbol to be transmitted from the active antenna. After that, the receiver applies a maximum likelihood (ML) detector to jointly detect the index of the active transmit antenna and the M-ary symbol, assuming that the CSI is available at the receiver. For the sake of improving the SE, generalized SM (GSM) techniques have been developed to enable activating a set of transmit antennas instead of one transmit antenna for the TSM systems [19]. However, TSM schemes suffer from low antenna gain because most of the transmit antennas are silent, which is an impairment at higher frequencies (28 GHz) where large beamforming gain is needed to combat the severe path-loss. Hybrid TSM (HTSM) schemes have been reported in [20], [21] to exploit the phase shifters in attaining high beamforming gain, where SM bits are mapped into a group of antennas, each connected to a phase shifter, instead of the single antenna of TSM. However, outdoor propagation of mmWave signals exhibits poor scattering, which entails spatially sparse and rank deficient channel matrices as explained in Sec. V-C and Fig. 10 in [22]. Thus, the performance of the HTSM schemes in [20], [21] can be highly degraded if the receiver cannot distinguish between correlated phase shifter groups. On the other hand, in RSM, a subset of receive antennas is active during a transmission.
In contrast to the previous case, the SM bits are devoted to indicating the active receive antennas whilst the other bits are mapped into an M-ary symbol [18]. In RSM, the transmitter applies zero forcing (ZF) precoding assuming CSI knowledge and the receiver applies the ML principle to jointly detect the index of the active receive antenna set and the M-ary symbol. In the literature, RSM systems adopt fully digital MIMO architectures and thus exhibit high power consumption, especially for systems operating in the mmWave band. Moreover, ZF precoders suffer from performance degradation in rank deficient MIMO channels. In [23], the authors developed an RSM scheme for indoor propagation of mmWave signals where they control the inter-antenna spacing to ensure an orthogonal MIMO channel. However, the proposal is not practical in outdoor scenarios as the large transmitter-receiver separation requires large inter-antenna spacing. In [24], the authors developed HTSM in the uplink (UL) and RSM in the downlink (DL) assuming outdoor propagation of mmWave signals. However, they do not consider a hybrid structure for the RSM in the DL and rely on a rigid hybrid architecture for the UL TSM. In [25], [26], the authors developed HTSM based on a partially connected hybrid transmitter and a fully digital receiver. The works in [24]-[26] consider that the phase shifting groups, the number of phase shifters per group and the set of antennas per phase shifting group are fixed, which is clearly suboptimal in terms of EE and SE. In this paper, we consider a point-to-point large-scale MIMO system (the extension to the multiple users scenario in the DL (broadcast channel) or in the UL (multiple access channel) is the topic of a forthcoming publication) that operates in a mmWave outdoor narrowband channel scenario and we tackle the drawbacks of the SM techniques (small antenna gain and MIMO channel rank deficiency). The fully connected hybrid MIMO depends on a large number of phase shifters.
The power consumption of the switch is much less than that of the phase shifter. Thus, we consider a user terminal (UT) architecture that consists of a partially connected phase shifters network and a switches network. Specifically, we propose an energy efficient UL and DL hybrid design that adopts TSM during the UL and RSM during the DL. In order to cope with the small antenna gain of SM schemes, we consider a hybrid architecture in both UL and DL to maintain high beamforming and combining gains, respectively, and combat the severe path-loss of the outdoor mmWave propagation. In order to ensure the full rank MIMO channel condition, we propose a novel and flexible architecture for the UT, whereby we optimize the number of uncorrelated phase shifter groups, the number of antennas per group and the set of antennas inside each group with the goal of maximizing the EE at the UT. The major novelty and contributions of this paper are as follows:

• We propose novel UT architectures consisting of a two-stage analog beamformer in the UL and combiner in the DL. We consider an analog phase shifter stage to attain high gain and combat the severe path-loss, and apply the analog switches stage to perform antenna selection and grouping to maximize the EE at the UT. We exploit the spatial modulation principles to transmit two streams (a spatially modulated stream and a conventionally modulated stream) using a single RF chain.
• Specifically, in the UL, we propose an HTSM scheme and present the analytical system model followed by two detection schemes. First, we apply an ML detector and prove closed form expressions of the SE using the mutual information. After that, we propose a reduced complexity detector with two combiners (optimal and equal ratio). In the DL, we propose an HRSM scheme with a reduced complexity detector that can be implemented using the energy efficient UT architecture proposed in Fig. 1 and then, we prove a closed form expression of its SE.
• We propose a reduced complexity and efficient optimization algorithm to jointly design the precoder for the UL transmission and the combiner for the DL transmission with the purpose of maximising the EE at the UT. Specifically, the proposed algorithm jointly optimises the number of uniform linear array (ULA) phase shifter groups, the set of selected antennas per group and the transmit powers for the spatial symbols both in UL and DL transmissions.
• We evaluate the system performance by adopting a theoretical channel model and a realistic ray-trace based channel model to validate the results in real-world-like scenarios. Moreover, we compare the proposed scheme with state of the art SM and hybrid precoding systems.

Fig. 1: Block diagram at the user terminal, with transmit and receive circuitry in red and grey respectively. Black elements (like antennas or phase shifters) are common. For a given number of active groups and phase shifters per group, the structure of the matrices $A_{SW}$ and $A_{PS}$ is detailed in Tables I and II for UL and DL, respectively. Although the proposed architecture consists of a single RF chain, we transmit two streams: spatially and conventionally modulated streams. [Diagram annotations: each phase shifter is connected to at most one antenna; at most one switch is closed. Legend: PA, power amplifier; LNA, low noise amplifier; AD, amplitude detector; Σ/S, adder/splitter; DAC, digital-to-analog converter; ADC, analog-to-digital converter.]

The rest of the paper is organized as follows. The system model is introduced in Sec. II with the assumptions and the adopted channel models. In Sec. III, the UL HTSM and the DL HRSM systems are designed and the analytical SE expressions are derived.
Low complexity optimization algorithms for the UL and DL systems are proposed in Sec. V. The system performance is evaluated both in stochastic and deterministic channel environments in Sec. VI. Finally, the paper is concluded in Sec. VII.

We adopt the following notations. $X^{(i)}$ denotes the $i$th column of $X$, $X^{(i:j)}$ the matrix containing the $i$th to the $j$th columns of $X$, $X^{(i,j)}$ the entry at the $i$th row and $j$th column of $X$, $\mathrm{Tr}\{X\}$ the trace operator, $\mathrm{diag}\{x\}$ the diagonal matrix with the elements of vector $x$ on the diagonal, $X^H$ the transpose-conjugate operation and $X^T$ the transpose operation.

II. SYSTEM AND CHANNEL MODELS

A. Transceiver architecture and system assumptions

The manufacturing cost and the battery lifetime of the UT are serious issues for the wireless 5G modem industry. Having in mind both aspects, in Fig. 1, we propose a novel energy efficient hybrid UT architecture for the UL and DL transmissions that comprises a low number of power hungry devices (a single RF chain and a single high resolution analog-to-digital converter (ADC) regardless of the number of antennas at the UT) and power efficient devices (RF amplitude detectors (AD) [27], 1-bit ADCs, switches and phase shifters). We consider a fully connected hybrid base station (BS) architecture [11] and assume a small number of RF chains that ensures the hybrid precoding/combining exactly implements a digital precoding/combining. This can be achieved if the number of RF chains is larger than or equal to the number of channel scattering clusters ($C$) [28] or at least twice the number of data streams [15]. The proposed hybrid architecture at the UT consists of two analog stages. The phase shifting stage provides high transmit beamforming gain and high receive combining gain during UL and DL transmissions, respectively. The phase shifting architecture consists of $N_g$ groups of linear antenna arrays where each group comprises $N_a$ phase shifters.
As we map the spatial bits into phase shifting groups, we consider an analog switches stage to obtain uncorrelated groups. This can be achieved by smartly mapping the antennas among the phase shifting groups. Specifically, each antenna can be connected to any phase shifter and hence $N_a N_g$ ($N_a^{UL} N_g^{UL}$ for the UL and $N_a^{DL} N_g^{DL}$ for the DL architectures) switches per antenna are required. The maximum number of groups is $N_g = C$ (around 7 as shown in the realistic urban scenario channel results in Sec. VI-B) and the maximum number of phase shifters per group is $N_a = N_U$. Within a specific linear antenna array group, each phase shifter is connected to a distinct antenna but different groups can share the same antenna. The number of active phase shifting groups $\{N_g^{UL}, N_g^{DL}\}$, the antennas inside the groups and the number of active phase shifters per group $\{N_a^{UL}, N_a^{DL}\}$ are determined to maximize the EE at the beginning of each coherence time by employing the low complexity optimization algorithm proposed in Sec. V. The DL and the UL power consumption of the proposed UT can be expressed as

\[
\begin{aligned}
P_c^{DL} &= N_a^{DL} N_g^{DL} (P_{SW} + P_{PS}) + N_g^{DL} P_{LNA} + P_{RF} + P_{ADC} + P_{BB},\\
P_c^{UL} &= N_a^{UL} N_g^{UL} (P_{SW} + P_{PS}) + N_g^{UL} P_{PA} + P_{RF} + P_{DAC} + P_{BB},
\end{aligned}
\tag{1}
\]

where $N_a^{UL} N_g^{UL}$ and $N_a^{DL} N_g^{DL}$ switches are on during the coherence time in the UL and DL transmissions, respectively, and the rest of the switches remain off. The power consumed by the UT devices at 28 GHz [14]-[29] can be modelled as

\[
\begin{aligned}
P_{PS} &= P_{LNA} = P_{ref}, & P_{ADC} &= P_{DAC} = \mathrm{FoM} \times f_s \times 2^{n},\\
P_{SW} &= 0.25\,P_{ref}, & P_{RF} &= 2P_{ref}, \qquad P_{BB} = 10P_{ref},\\
P_{PA} &= \left(\frac{1}{\eta} - 1\right) P_t.
\end{aligned}
\tag{2}
\]

Therein, $P_{LNA}$ refers to the power consumption of the low noise amplifier (LNA) and it is taken as the reference $P_{ref}$ in the hardware circuitry.
Furthermore, the power consumption of the remaining hardware elements such as the phase shifter (PS), power amplifier (PA), ADC, digital-to-analog converter (DAC), switch (SW), RF chain and baseband computation (BB) is defined by using the reference power consumption value $P_{ref}$; $f_s$ is the sampling frequency, which equals twice the bandwidth; $n$ refers to the number of ADC/DAC bits; FoM is the figure of merit that depends on the technology and takes a value of 34.4 fJ/conversion-step at $n = 12$ and $f_s = 600$ MS/s [30]; $\eta$ is the power amplifier efficiency, which takes the value 40% at 28 GHz [31]; and $P_t$ is the transmit power. The power consumption of the AD is negligible. Each UT antenna is connected to a splitter/adder block that is activated either during the DL or the UL phase. The splitter functionality is utilized to split the received signal from the designated antenna to the phase shifters through the switches to perform analog combining in the DL receiving phase. The adder block is utilized to add the signals coming from different phase shifters to the selected antennas through the switches to perform analog beamforming during the UL transmission phase. Only those antennas that are selected as active by the optimization algorithm (see Sec. V) contribute to the communication and the rest remain idle. The incoming data stream prior to the transmission comprises two parts. The former is modulated according to a conventional M-ary modulation scheme, the latter is mapped onto the indices of the active antenna groups. Under the TSM principle, during the UL phase, $N_g^{UL}$ groups of phase shifters transmit the same $M^{UL}$-ary modulation symbol from the activated antennas either with high power (UL spatial bit 1) or with low power (UL spatial bit 0), which results in transmitting $N_g^{UL}$ spatially modulated bits and $\log_2 M^{UL}$ conventionally modulated bits. The BS employs a ZF combiner to detect the UL spatial and modulation symbols.
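As a numerical illustration of the power model in Eq. (1) and Eq. (2), the following minimal sketch evaluates the UT circuit power for one configuration. The function name `ut_power_consumption`, the reference power of 20 mW and the transmit power of 100 mW are hypothetical choices made only for this example; the FoM, sampling rate, resolution and amplifier efficiency are the values quoted above.

```python
def ut_power_consumption(n_a, n_g, p_ref, p_t, link,
                         fom=34.4e-15, f_s=600e6, n_bits=12, eta=0.4):
    """Circuit power in watts at the user terminal, following Eq. (1)-(2).

    p_ref and p_t are free parameters of this sketch (hypothetical values
    are used in the usage lines below); link is "UL" or "DL".
    """
    p_ps = p_lna = p_ref                 # phase shifter and LNA
    p_sw = 0.25 * p_ref                  # analog switch
    p_rf = 2 * p_ref                     # RF chain
    p_bb = 10 * p_ref                    # baseband computation
    p_conv = fom * f_s * 2 ** n_bits     # ADC (DL) or DAC (UL)
    p_pa = (1 / eta - 1) * p_t           # power amplifier

    # Terms shared by the UL and DL expressions in Eq. (1)
    common = n_a * n_g * (p_sw + p_ps) + p_rf + p_conv + p_bb
    if link == "DL":
        return common + n_g * p_lna
    if link == "UL":
        return common + n_g * p_pa
    raise ValueError("link must be 'UL' or 'DL'")


# Hypothetical operating point: 2 groups of 4 phase shifters
p_dl = ut_power_consumption(n_a=4, n_g=2, p_ref=0.02, p_t=0.1, link="DL")
p_ul = ut_power_consumption(n_a=4, n_g=2, p_ref=0.02, p_t=0.1, link="UL")
print(f"DL: {p_dl:.4f} W, UL: {p_ul:.4f} W")
```

The switch and phase shifter terms scale with the product of active groups and phase shifters per group, which is the quantity the optimization in Sec. V adjusts.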
Similarly, in the DL phase and under the RSM principle, the BS applies a ZF precoder in such a way that $N_g^{DL}$ groups of phase shifters receive the same $M^{DL}$-ary modulation symbol from the activated antennas either with high or low power based on the DL spatial symbol, which results in receiving $N_g^{DL}$ spatially modulated bits and $\log_2 M^{DL}$ conventionally modulated bits. The duplexing protocol is assumed to be time-division duplex (TDD), where the CSI is needed only at the BS and channel reciprocity is assumed. The BS can acquire the CSI during the UL training phase by any method; for instance, the authors in [9] exploit the spatially sparse nature of the outdoor mmWave channel in developing low complexity adaptive compressed sensing based algorithms. Imperfect CSI at the BS with a ZF precoder can be accurately modelled as an increase in the noise power [32]. Thereafter, the BS runs the optimization algorithm detailed in Sec. V to determine the system parameters detailed in Sec. III and Sec. IV. The BS informs the UT about the results of the optimization algorithm during the DL training phase. Since the information required by the UT is limited, the DL training phase results in a low training overhead. Moreover, the BS applies the ZF precoder and the ZF combiner during DL and UL transmissions, respectively, using a hybrid architecture like the one proposed in [28].

B. Stochastic channel model

In the mmWave band, the number of scatterers is typically assumed to be small, as a result of the severe path-loss of the waves traveling at high frequencies. In order to take this effect into account in the system performance evaluation, we adopt a geometry-based channel model [22] whereby the UL channel matrix is given by

\[
H = \sqrt{\frac{N_{BS} N_U}{P_l\, C}} \sum_{i=1}^{C} g_i\, v_r(\theta_i)\, v_t^H(\phi_i).
\tag{3}
\]

Herein, $P_l$ is the path-loss of the channel $H \in \mathbb{C}^{N_U \times N_{BS}}$ between the BS and the UT, $g_i$ is the gain of the $i$-th path that follows a complex Gaussian distribution $\mathcal{CN}(0, 1)$, and $\theta_i \sim U[-\pi/6, \pi/6]$ and $\phi_i \sim U[-\pi/2, \pi/2]$ represent the azimuth angles of arrival at the BS and departure from the UT. By assuming ULAs, the transmit and receive array response vectors of the $i$-th path, $v_t(\phi_i)$ and $v_r(\theta_i)$, are generated as

\[
v(\varphi) = \frac{1}{\sqrt{N}} \left[ 1, e^{jkd \sin(\varphi)}, \ldots, e^{j(N-1)kd \sin(\varphi)} \right]^T,
\]

where $\varphi$ is the angle of the considered path, $N$ is the number of elements in the array, $k = 2\pi/\lambda$ where $\lambda$ is the signal wavelength and $d = \lambda/2$ is the inter-element spacing. The channel model in (3) can be decomposed as $H = A_r D A_t^H$, where $A_r \in \mathbb{C}^{N_U \times C}$ and $A_t \in \mathbb{C}^{N_{BS} \times C}$ comprise the array response vectors of all the paths, $A_r = [v_r(\theta_1), v_r(\theta_2), \ldots, v_r(\theta_C)]$ and $A_t = [v_t(\phi_1), v_t(\phi_2), \ldots, v_t(\phi_C)]$. The diagonal matrix $D \in \mathbb{C}^{C \times C}$ has the complex path gains and the path loss at the diagonal entries, $\sqrt{N_{BS} N_U / (P_l C)}\,[g_1, g_2, \ldots, g_C]$. In this study, we adopt the stochastic channel model given in (3) in order to evaluate the system performance. Although the considered channel model is widely used in the literature and provides a representative model for the mmWave channel environment, real world channels are highly dependent on the propagation scenario. Therefore, we validate the performance of the proposed system model in Sec. VI-B with the channels predicted from the ray-based Volcano technology by SIRADEL [33].

III. UPLINK HYBRID TRANSMIT SPATIAL MODULATION

We propose a novel two-stage analog precoding aided HTSM scheme assuming a ZF combiner at the BS and an energy efficient UT architecture. In order to combat the severe path-loss associated with the mmWave propagation, we apply a transmit beamformer stage that consists of linear arrays of phase shifters.
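As a brief aside on the stochastic channel of Eq. (3) in Sec. II-B, one realization can be drawn with a few lines of standard-library Python. This is only a sketch under the stated model: the function names `ula_response` and `geometric_channel` are hypothetical, the argument names `n_rx` and `n_tx` simply label the sizes of the arrays described by $v_r$ and $v_t$, and the path-loss value in the usage lines is an arbitrary placeholder.

```python
import cmath
import math
import random


def ula_response(angle, n):
    """Unit-norm ULA response vector for half-wavelength spacing (k*d = pi)."""
    return [cmath.exp(1j * math.pi * m * math.sin(angle)) / math.sqrt(n)
            for m in range(n)]


def geometric_channel(n_rx, n_tx, n_paths, path_loss, rng):
    """One draw of Eq. (3): sum of n_paths rank-one terms g_i v_r v_t^H."""
    scale = math.sqrt(n_rx * n_tx / (path_loss * n_paths))
    h = [[0j] * n_tx for _ in range(n_rx)]
    for _ in range(n_paths):
        # g_i ~ CN(0, 1): real and imaginary parts each have variance 1/2
        gain = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        v_r = ula_response(rng.uniform(-math.pi / 6, math.pi / 6), n_rx)
        v_t = ula_response(rng.uniform(-math.pi / 2, math.pi / 2), n_tx)
        for i in range(n_rx):
            for j in range(n_tx):
                h[i][j] += scale * gain * v_r[i] * v_t[j].conjugate()
    return h


# Placeholder sizes and path-loss, chosen only for illustration
H = geometric_channel(n_rx=4, n_tx=8, n_paths=3, path_loss=1.0, rng=random.Random(0))
print(len(H), len(H[0]))
```

With few paths the resulting matrix has rank at most equal to the number of paths, which is the rank deficiency that the switching stage is meant to cope with.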
As the application of the ZF combiner at the BS requires a full-rank MIMO channel, we use an analog switches stage at the UT to perform antenna selection and grouping that reduces the correlation among the UT antennas and ensures the rank condition. Antenna selection is needed even with an MMSE combiner, as illustrated in [34]. The design of the UL transmitter is done in the following steps. First, the phase shifters stage consists of $N_g^{UL}$ transmit analog beamformers that are employed to boost the transmit beamforming gain. Each analog beamformer contains $N_a^{UL}$ active phase shifters.

TABLE I: Elements of the uplink signal model

Symbol | Description
$x^{UL}$ | UL modulation symbol $\in \mathbb{C}^{1 \times 1}$ with $E[x^{UL} x^{ULH}] = 1$
$s_i^{UL}$ | UL spatial symbol $\in \mathbb{R}^{N_g^{UL} \times 1}$, $i = 1, \cdots, 2^{N_g^{UL}}$, mapped from $N_g^{UL}$ bits of the incoming data bits with $\Pr(s_i^{UL}) = 1/2^{N_g^{UL}}$
$t_i^{UL}$ | the mapped version of the spatial symbol $s_i^{UL}$ to high and low amplitudes $\{a_H^{UL} = 1 - a_0^{UL},\ a_L^{UL} = a_0^{UL}\}$, respectively, and $\{0 \le a_0^{UL} \le \tfrac{1}{2}\}$
$A_{PS}^{UL}$ | UL analog phase shifters matrix $\in \mathbb{C}^{N_a^{UL} N_g^{UL} \times N_g^{UL}}$, $A_{PS}^{UL} = \mathrm{blockdiag}\{f_1^{UL}, \cdots, f_{N_g^{UL}}^{UL}\}$
$f_l^{UL}$ | UL analog beamformer response vector $\in \mathbb{C}^{N_a^{UL} \times 1}$
$A_{SW}^{UL}$ | UL analog switches matrix $\in \mathbb{R}^{N_U \times N_a^{UL} N_g^{UL}}$, $A_{SW}^{UL(i,j)} \in \{0, 1\}$

Next, the switches stage is used to combat the spatial correlation among the UT antennas and as a result the received power at the BS is maximized. This is achieved by selecting the best number and the best set of the UT antennas to be connected to each transmit analog beamformer. Mapping the incoming UL bit stream to symbols is performed in two parts: the first $N_g^{UL}$ bits (spatially modulated bits) are mapped into transmit power levels of the $N_g^{UL}$ transmit analog beamformers such that the $i$th beamformer transmits high or low power if the $i$th spatial bit is 1 or 0, respectively. The remaining $\log_2 M^{UL}$ bits (modulation bits) are modulated using standard $M^{UL}$-ary modulation schemes.
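The two-part bit mapping just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `map_ul_bits` is hypothetical, M-PSK with a natural binary index-to-phase mapping is an assumption made for the example, and the amplitudes follow the definitions $a_H^{UL} = 1 - a_0^{UL}$ and $a_L^{UL} = a_0^{UL}$ of Table I.

```python
import cmath
import math


def map_ul_bits(bits, n_g, m, a0):
    """Split one UL block into per-group amplitudes and an M-PSK symbol.

    bits : list of 0/1 of length n_g + log2(m)
    a0   : low amplitude, 0 <= a0 <= 1/2 (the high amplitude is 1 - a0)
    """
    n_mod = int(math.log2(m))
    if len(bits) != n_g + n_mod:
        raise ValueError("expected n_g + log2(m) bits")
    if not 0 <= a0 <= 0.5:
        raise ValueError("a0 must lie in [0, 1/2]")

    # Spatial part: bit 1 -> high amplitude, bit 0 -> low amplitude
    amplitudes = [(1 - a0) if b else a0 for b in bits[:n_g]]

    # Modulation part: natural binary index into the M-PSK alphabet
    # (assumed mapping; any standard labelling could be used instead)
    mod_bits = bits[n_g:]
    index = int("".join(str(b) for b in mod_bits), 2) if mod_bits else 0
    symbol = cmath.exp(2j * math.pi * index / m)
    return amplitudes, symbol


# Three groups and QPSK: 3 spatial bits followed by 2 modulation bits
amplitudes, symbol = map_ul_bits([1, 0, 1, 0, 1], n_g=3, m=4, a0=0.2)
print(amplitudes, symbol)
```

Every group carries the same modulation symbol; only the per-group amplitude changes with the spatial bits, so a single RF chain suffices.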
As a result, the signal transmitted to the BS can be expressed as

\[
x_t^{UL} = \sqrt{\alpha^{UL} P_t}\, A_{SW}^{UL} A_{PS}^{UL}\, t_i^{UL} x^{UL}
\quad \text{with} \quad
t_i^{UL} = (1 - 2a_0^{UL})\, s_i^{UL} + a_0^{UL} 1_{N_g^{UL}},
\tag{4}
\]

where the details of the parameters are given in Table I. Moreover, $f_l^{UL}$ is the phase shifters response vector, so it has constant amplitude and satisfies $|f_l^{UL(i)}| = |f_l^{UL(j)}|\ \forall\, i, j$, $l = 1, \cdots, N_g^{UL}$. $A_{SW}^{UL}$ connects the phase shifters to the UT antennas in such a way that $A_{SW}^{UL(i,j)} = 1$ when the $j$th phase shifter, represented by the $j$th column of $A_{SW}^{UL}$, is connected to the $i$th UT antenna, represented by the $i$th row of $A_{SW}^{UL}$, and hence $\|A_{SW}^{UL(j)}\|_0 = 1$, $j = 1, \cdots, N_a^{UL} N_g^{UL}$, with $A_{SW}^{UL(i,j)} \in \{0, 1\}$, $i = 1, \cdots, N_U$, $j = 1, \cdots, N_a^{UL} N_g^{UL}$. The phase shifters inside a specific beamformer are connected to distinct UT antennas and thus $A_{SW}^{UL}$ satisfies $\|A_{SW}^{UL(i,(k-1)N_a^{UL}+1:kN_a^{UL})}\|_0 \in \{0, 1\}$, $i = 1, \cdots, N_U$, $k = 1, \cdots, N_g^{UL}$. Finally, $\alpha^{UL}$ is the coefficient that ensures the constant average transmit power and can be expressed as

\[
\alpha^{UL} = \frac{1}{E\left[ \|A_{SW}^{UL} A_{PS}^{UL} t_i^{UL} x^{UL}\|_2^2 \right]}
= \frac{1}{\mathrm{Tr}\left\{ A_{SW}^{UL} A_{PS}^{UL} R_{ss}^{UL} A_{PS}^{ULH} A_{SW}^{ULH} \right\}},
\quad \text{with} \quad
R_{ss}^{UL} = \frac{1}{2^{N_g^{UL}}} \sum_{i=1}^{2^{N_g^{UL}}} t_i^{UL} t_i^{ULH}.
\tag{5}
\]

As an illustrative example to link the matrix $A_{SW}^{UL}$ and the proposed architecture in Fig. 1, let us consider $N_U = 4$, $N_g^{UL} = 2$, $N_a^{UL} = 2$ and the UL switching matrix

\[
A_{SW}^{UL} =
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 1 & 0 & 1
\end{pmatrix}.
\tag{6}
\]

The first row of $A_{SW}^{UL}$ in Eq. (6) shows that the first antenna is connected to the first phase shifter of the first group. The third row of $A_{SW}^{UL}$ shows that the third antenna is connected to the first phase shifter of the second group. Similarly, the fourth row of $A_{SW}^{UL}$ describes that the fourth antenna combines the signals coming from the second phase shifter of the first group and the second phase shifter of the second group. Our target in Sec. V, Algorithm 2, is to jointly optimise $a_0^{UL}$, $A_{PS}^{UL}$ and $A_{SW}^{UL}$ to maximize the EE under SE requirements. The received symbol at the BS is

\[
r^{UL} = \sqrt{\alpha^{UL} P_t}\, H^{UL} A_{SW}^{UL} A_{PS}^{UL}\, t_i^{UL} x^{UL} + n^{UL}.
\tag{7}
\]

Therein, $r^{UL} \in \mathbb{C}^{N_{BS} \times 1}$ is the received signal vector at the BS and $n^{UL} \in \mathbb{C}^{N_{BS} \times 1}$ is the noise vector with independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian elements $\mathcal{CN}(0, \sigma^2)$. Furthermore, the effective UL channel matrix can be written as $H_e^{UL} = H^{UL} A_{SW}^{UL} A_{PS}^{UL} = \left[ H_1^{UL} f_1^{UL}, \cdots, H_{N_g^{UL}}^{UL} f_{N_g^{UL}}^{UL} \right] \in \mathbb{C}^{N_{BS} \times N_g^{UL}}$, where $H_k^{UL} = H^{UL} A_{SW}^{UL((k-1)N_a^{UL}+1:kN_a^{UL})} \in \mathbb{C}^{N_{BS} \times N_a^{UL}}$ is the effective sub-channel matrix of the $k$th beamforming group. In the UL reception, the BS applies a ZF combiner to enable the detection of the spatial and modulation symbols as follows

\[
y^{UL} = W r^{UL},
\tag{8}
\]

where the ZF combiner matrix is computed as $W = \left( H_e^{ULH} H_e^{UL} \right)^{-1} H_e^{ULH}$ and can be implemented using the hybrid architecture proposed in [28]. In Sec. V, we select the beamforming groups and the antennas per group to ensure a full rank effective channel $H_e^{UL}$. The post-processed signal $y^{UL} \in \mathbb{C}^{N_g^{UL} \times 1}$ comprises the spatial and modulation symbols. Hence, the $k$th entry of $y^{UL}$ takes the following values

\[
y_k^{UL} =
\begin{cases}
\sqrt{\alpha^{UL} P_t}\, a_H^{UL} x^{UL} + n'_k & \text{if } s_{ik}^{UL} = 1,\\
\sqrt{\alpha^{UL} P_t}\, a_L^{UL} x^{UL} + n'_k & \text{if } s_{ik}^{UL} = 0,
\end{cases}
\tag{9}
\]

where $n'_k \sim \mathcal{CN}(0, \sigma_k'^2)$ is the $k$th entry of the post-processed noise vector $n' = W n^{UL}$ with variance $\sigma_k'^2 = \left[ W W^H \right]^{(k,k)} \sigma^2$.

A. Uplink maximum likelihood detector

Since we assume that the number of RF chains at the BS is at least $C$, we can jointly detect the spatial and the modulation symbols using an ML approach as

\[
\left( \hat{s}^{UL}, \hat{x}^{UL} \right)
= \arg\max_{s_i^{UL},\, x_j^{UL}} f\left( y^{UL} \mid s_i^{UL}, x_j^{UL} \right)
= \arg\min_{s_i^{UL},\, x_j^{UL}} \left\| R_{n'n'}^{-1/2} \left( y^{UL} - \sqrt{\alpha^{UL} P_t}\, t_i^{UL} x_j^{UL} \right) \right\|_2^2,
\tag{10}
\]

where $R_{n'n'} = \sigma^2 W W^H$. Although the ML detector provides optimal performance, the exhaustive search in Eq. (10) is computationally complex if a large number of bits per spatial symbol is being transmitted.
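To make the cost of the exhaustive search in Eq. (10) concrete, the sketch below enumerates all $2^{N_g^{UL}} \times M$ candidate pairs on the post-ZF signal. It is a simplified illustration under two labelled assumptions that are not part of the paper: the post-processed noise is treated as white ($R_{n'n'} = \sigma^2 I$), so the whitening step drops out of the metric, and the modulation alphabet is M-PSK. The function name `ml_detect` and the argument `alpha_pt` (standing for the product $\alpha^{UL} P_t$) are hypothetical.

```python
import cmath
import itertools
import math


def ml_detect(y, n_g, m, a0, alpha_pt):
    """Joint exhaustive search over spatial patterns and M-PSK symbols.

    Assumes white post-ZF noise, so the metric of Eq. (10) reduces to the
    Euclidean distance ||y - sqrt(alpha*Pt) * t_i * x_j||^2.
    """
    gain = math.sqrt(alpha_pt)
    best = None
    # 2**n_g spatial patterns times m symbols are visited
    for s in itertools.product((0, 1), repeat=n_g):
        t = [(1 - 2 * a0) * b + a0 for b in s]      # mapping of Eq. (4)
        for j in range(m):
            x = cmath.exp(2j * math.pi * j / m)
            metric = sum(abs(yk - gain * tk * x) ** 2 for yk, tk in zip(y, t))
            if best is None or metric < best[0]:
                best = (metric, s, j)
    return best[1], best[2]


# Noise-free check with hypothetical parameters
n_g, m, a0, alpha_pt = 3, 8, 0.2, 2.0
s_true, j_true = (1, 0, 1), 3
x_true = cmath.exp(2j * math.pi * j_true / m)
y = [math.sqrt(alpha_pt) * ((1 - 2 * a0) * b + a0) * x_true for b in s_true]
print(ml_detect(y, n_g, m, a0, alpha_pt))
```

The number of metric evaluations doubles with every additional beamforming group, which motivates the separate detection of the spatial and modulation symbols discussed later in this section.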
Since our goal is to maximize the EE under an SE constraint, we derive in the sequel an expression for the SE of the ML detector assuming Gaussian $x^{UL}$, and propose a low-complexity detector assuming M-PSK $x^{UL}$. Finally, we show that the SE of the reduced complexity detector achieves a tight lower bound on the mutual information.

B. Spectral efficiency of the maximum likelihood detector with Gaussian $x^{UL}$

We use the mutual information as the metric to evaluate the proposed scheme. By applying the mutual information chain rule, the SE can be defined as

\[
\begin{aligned}
I\left( s^{UL}, x^{UL}; y^{UL} \right) &= I\left( s^{UL}; y^{UL} \right) + I\left( x^{UL}; y^{UL} \mid s^{UL} \right),\\
I\left( s^{UL}; y^{UL} \right) &= h\left( y^{UL} \right) - h\left( y^{UL} \mid s^{UL} \right) \triangleq I_S^{UL},\\
I\left( x^{UL}; y^{UL} \mid s^{UL} \right) &= h\left( y^{UL} \mid s^{UL} \right) - h\left( y^{UL} \mid s^{UL}, x^{UL} \right) \triangleq I_M^{UL}.
\end{aligned}
\tag{11}
\]

Assuming Gaussian-distributed $x^{UL}$, the differential entropies in Eq. (11) can be computed from

\[
h\left( y^{UL} \right) = - \int_{\mathbb{C}^{N_g^{UL}}} f\left( y^{UL} \right) \log_2 f\left( y^{UL} \right) dy^{UL},
\tag{12}
\]

where $f(y^{UL})$ is the probability density function (PDF) of the complex Gaussian mixture random vector

\[
f\left( y^{UL} \right) = \sum_{i=1}^{2^{N_g^{UL}}} \Pr\left( s_i^{UL} \right) f\left( y^{UL} \mid s_i^{UL} \right)
= \frac{1}{2^{N_g^{UL}}} \sum_{i=1}^{2^{N_g^{UL}}} \frac{1}{\pi^{N_g^{UL}} |\Sigma_i|}\, e^{-y^{ULH} \Sigma_i^{-1} y^{UL}},
\quad \text{with} \quad
\Sigma_i = \alpha^{UL} P_t\, t_i^{UL} t_i^{ULH} + R_{n'n'}.
\tag{13}
\]

The closed form expression of the differential entropy of a Gaussian mixture $h(y^{UL})$ is unknown [35]. Moreover, the numerical evaluation of $h(y^{UL})$ is computationally complex, especially in large-scale MIMO systems where the size of $y^{UL}$ can be large. In the sequel, we propose a novel and reduced complexity method to evaluate $h(y^{UL})$. In this method, we apply the conditional entropy chain rule on $h(y^{UL})$ and we prove a closed form expression for the conditional probability density function $f(y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL})$. Finally, we reduce the computational complexity by simplifying the integral in Eq. (12) to a sum of double integrals as follows

\[
h\left( y^{UL} \right) = h\left( y_1^{UL} \right) + \sum_{k=2}^{N_g^{UL}} h\left( y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL} \right)
\quad \text{with} \quad
f\left( y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL} \right) = \frac{1}{2^k} \sum_{i=1}^{2^k} f\left( y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL}, t_i^{UL}(1:k) \right).
\tag{14}
\]

Lemma 1. The conditional density function of the random variable $\left( y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL}, t_i^{UL}(1:k) \right)$ is distributed as a zero-mean complex Gaussian:

\[
f\left( y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL}, t_i^{UL}(1:k) \right) = \mathcal{CN}\left( 0, \sigma_K^2 + P_{i,K} \right).
\]

The proof of Lemma 1 and the values of $\sigma_K^2$ and $P_{i,K}$ are in Appendix A.

Therefore, we can determine the $k$th differential entropy $h(y_k^{UL} \mid y_{k-1}^{UL}, \cdots, y_1^{UL})$ by evaluating the double integral numerically regardless of the size of $y^{UL}$. Thus, we significantly reduce the computational complexity. As an illustrative example that highlights the proposed reduced complexity method of evaluating $h(y^{UL})$, let us consider $N_g^{UL} = 3$. The proposed 2-D integral method achieves the same exact values of $h(y^{UL})$ as the 16-D integral in Eq. (12). The differential entropy $h(y^{UL} \mid s^{UL})$ can be expressed in closed form as

\[
\begin{aligned}
h\left( y^{UL} \mid s^{UL} \right) &= \sum_{i=1}^{2^{N_g^{UL}}} \Pr\left( s_i^{UL} \right) h\left( y^{UL} \mid s^{UL} = s_i^{UL} \right)
= \frac{1}{2^{N_g^{UL}}} \sum_{i=1}^{2^{N_g^{UL}}} \log_2 \left( (\pi e)^{N_g^{UL}} \left| \alpha^{UL} P_t\, t_i^{UL} t_i^{ULH} + R_{n'n'} \right| \right)\\
&= \frac{1}{2^{N_g^{UL}}} \sum_{i=1}^{2^{N_g^{UL}}} \left[ N_g^{UL} \log_2 (\pi e) + \log_2 \left( 1 + \alpha^{UL} P_t\, t_i^{ULH} R_{n'n'}^{-1} t_i^{UL} \right) + \log_2 \left| R_{n'n'} \right| \right].
\end{aligned}
\tag{15}
\]

Based on the low computational complexity method of evaluating $h(y^{UL})$ and the closed form of $h(y^{UL} \mid s^{UL})$, we can evaluate the spatial rate $I_S^{UL}$ in Eq. (11) efficiently. Moreover, the differential entropy $h(y^{UL} \mid s^{UL}, x^{UL})$ can be expressed as

\[
h\left( y^{UL} \mid s^{UL}, x^{UL} \right) = h(n') = \log_2 \left( (\pi e)^{N_g^{UL}} \left| R_{n'n'} \right| \right).
\tag{16}
\]

According to Eq. (15) and Eq. (16), $I_M^{UL}$ in Eq. (11) can be expressed in closed form as

\[
I_M^{UL} = \frac{1}{2^{N_g^{UL}}} \sum_{i=1}^{2^{N_g^{UL}}} \log_2 \left( 1 + \alpha^{UL} P_t\, t_i^{ULH} R_{n'n'}^{-1} t_i^{UL} \right).
\tag{17}
\]

Finally, according to Eq. (11), the UL SE can be evaluated as $\mathrm{SE}^{UL} = I_S^{UL} + I_M^{UL}$.

C. Uplink reduced complexity detection with M-PSK $x^{UL}$

Let us propose a low complexity detection method whereby the spatial and the modulation symbols are detected separately. The size of the search space for the reduced complexity detector is $N_g^{UL} + M$, which is much smaller than that of the UL ML detector ($2^{N_g^{UL}} \times M$). In Fig.
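2 in [36], the authors showed that constant amplitude constellations ($M$-PSK) achieve the best performance with the reduced complexity SM detector. From Eq. (10), the ML spatial symbol detector is $\hat{\mathbf{t}}^{\mathrm{UL}} = \mathbf{y}^{\mathrm{UL}} / \left(x^{\mathrm{UL}} \sqrt{\alpha^{\mathrm{UL}} P_t}\right)$. However, $x^{\mathrm{UL}}$ is unknown, so we exploit the fact that $M$-PSK modulation symbols have constant amplitude and hence $|x^{\mathrm{UL}}| = 1$. Then, in order to detect the $k$th binary spatially modulated bit, we compare the absolute value of $y_k^{\mathrm{UL}}$ with a threshold $\gamma^{\mathrm{UL}}$ as follows
$$\hat{s}_{ik}^{\mathrm{UL}} = \begin{cases} 1 & \text{if } |y_k^{\mathrm{UL}}| > \gamma^{\mathrm{UL}} \\ 0 & \text{if } |y_k^{\mathrm{UL}}| < \gamma^{\mathrm{UL}} \end{cases} \quad \text{with} \quad \gamma^{\mathrm{UL}} = \frac{1}{2} \sqrt{\alpha^{\mathrm{UL}} P_t} \left( a_H^{\mathrm{UL}} + a_L^{\mathrm{UL}} \right). \tag{18}$$
According to Eq. (10), the optimal modulation symbol detector can be expressed as
$$\begin{aligned}
x^{\mathrm{UL}\star} &= \underset{x_j^{\mathrm{UL}} \in M\text{-PSK}}{\arg\max}\; \Re\left\{ x_j^{\mathrm{UL}H}\, \mathbf{t}_i^{\mathrm{UL}H} \mathbf{R}_{n'n'}^{-1} \mathbf{y}^{\mathrm{UL}} \right\} = \underset{x_j^{\mathrm{UL}} \in M\text{-PSK}}{\arg\min}\; \left| x_j^{\mathrm{UL}} - \mathbf{v}_i^{\mathrm{OC}H} \mathbf{y}^{\mathrm{UL}} \right|^2 \quad \text{with}\\
\mathbf{v}_i^{\mathrm{OC}} &= \mathbf{R}_{n'n'}^{-1} \mathbf{t}_i^{\mathrm{UL}} = \mathbf{H}_e^{\mathrm{UL}H} \mathbf{H}_e^{\mathrm{UL}} \mathbf{t}_i^{\mathrm{UL}},
\end{aligned}\tag{19}$$
where $\mathbf{v}_i^{\mathrm{OC}}$ is the optimal modulation symbol combiner. Since $\mathbf{t}_i^{\mathrm{UL}}$ is unknown at the UT, we propose to use the detected spatial symbol from Eq. (18). Therefore, the combined signal can be expressed as
$$y_c^{\mathrm{UL,OC}} = \sqrt{\alpha^{\mathrm{UL}} P_t}\; \hat{\mathbf{t}}_l^{\mathrm{UL}T} \mathbf{H}_e^{\mathrm{UL}H} \mathbf{H}_e^{\mathrm{UL}} \mathbf{t}_i^{\mathrm{UL}} x^{\mathrm{UL}} + \hat{\mathbf{t}}_l^{\mathrm{UL}T} \mathbf{H}_e^{\mathrm{UL}H} \mathbf{n}^{\mathrm{UL}}. \tag{20}$$
Eq. (20) shows that the SNR perceived for the spatial symbol detection affects the modulation symbol detection. Thus, we propose the use of an equal ratio combiner (ERC) whose combined signal is independent of the detected spatial symbol:
$$y_c^{\mathrm{UL,ERC}} = \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{y}^{\mathrm{UL}} = \sqrt{\alpha^{\mathrm{UL}} P_t}\; \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{t}_i^{\mathrm{UL}} x^{\mathrm{UL}} + \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{W} \mathbf{n}^{\mathrm{UL}}. \tag{21}$$
After that, we apply a minimum distance detector on the combined signal to detect the $M$-PSK symbol. In the sequel, we provide closed form expressions for the UL SE for both combiners.

D. Spectral efficiency of the reduced complexity detector with $M$-PSK $x^{\mathrm{UL}}$

As we detect the spatial and the modulation symbols independently, the SE of the proposed reduced complexity detection UL HTSM scheme can be expressed as
$$SE_r^{\mathrm{UL}} = I_{S,r}^{\mathrm{UL}} + I_{M,r}^{\mathrm{UL}}. \tag{22}$$
We transmit a binary spatial symbol with input $s_{ik}^{\mathrm{UL}} \in \{0, 1\}$, $k = 1, \cdots, N_g^{\mathrm{UL}}$, and output $\hat{s}_{ik}^{\mathrm{UL}} \in \{0, 1\}$ as in Eq. (18), and thus the wireless channel between $s_{ik}^{\mathrm{UL}}$ and $\hat{s}_{ik}^{\mathrm{UL}}$ can be characterized by the binary asymmetric channel (BAC) [37]. Hence, the UL spatial rate (defined as $I_{S,r}^{\mathrm{UL}}$) can be expressed as a contribution of $N_g^{\mathrm{UL}}$ parallel BACs as
$$I_{S,r}^{\mathrm{UL}} = I\left(\mathbf{s}^{\mathrm{UL}}, \hat{\mathbf{s}}^{\mathrm{UL}}\right) = \sum_{k=1}^{N_g^{\mathrm{UL}}} I\left(s_k^{\mathrm{UL}}, \hat{s}_k^{\mathrm{UL}}\right) = \sum_{k=1}^{N_g^{\mathrm{UL}}} \left[ H\left( \frac{P_{0k} + 1 - P_{1k}}{2} \right) - \frac{H(P_{0k}) + H(1 - P_{1k})}{2} \right]. \tag{23}$$
Herein, the entropy function is $H(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)$ and the probabilities of the false detection of the spatially modulated bits 1 and 0 are
$$P_{1k} = \Pr\left(\hat{s}_{ik}^{\mathrm{UL}} = 1 \mid s_{ik}^{\mathrm{UL}} = 0\right) = \Pr\left(|y_k^{\mathrm{UL}}| > \gamma^{\mathrm{UL}} \mid s_{ik}^{\mathrm{UL}} = 0\right) = Q_1\left( \sqrt{2 \alpha^{\mathrm{UL}} P_t}\, a_L^{\mathrm{UL}} / \sigma_k',\; \sqrt{2}\, \gamma^{\mathrm{UL}} / \sigma_k' \right), \tag{24}$$
$$P_{0k} = \Pr\left(\hat{s}_{ik}^{\mathrm{UL}} = 0 \mid s_{ik}^{\mathrm{UL}} = 1\right) = \Pr\left(|y_k^{\mathrm{UL}}| < \gamma^{\mathrm{UL}} \mid s_{ik}^{\mathrm{UL}} = 1\right) = 1 - Q_1\left( \sqrt{2 \alpha^{\mathrm{UL}} P_t}\, a_H^{\mathrm{UL}} / \sigma_k',\; \sqrt{2}\, \gamma^{\mathrm{UL}} / \sigma_k' \right), \tag{25}$$
where $|y_k^{\mathrm{UL}}|$ is Ricean distributed and $Q_1(\cdot)$ is the first order Marcum-Q-function. Each closed form is paired with the amplitude level that is actually transmitted under the stated condition ($a_L^{\mathrm{UL}}$ for a spatial bit 0, $a_H^{\mathrm{UL}}$ for a spatial bit 1); Eq. (23) is unchanged if the two labels are exchanged.

The combined signals in Eq. (20) and in Eq. (21) include one $M$-PSK symbol. Thus, the UL modulation rate with OC ($I_{M,r}^{\mathrm{UL,OC}}$) and with ERC ($I_{M,r}^{\mathrm{UL,ERC}}$) can be expressed by the MISO rate expression with the asymptotic $M$-PSK shaping loss approximation [38], [39]
$$\begin{aligned}
I_{M,r}^{\mathrm{UL,OC}} &= I\left(x^{\mathrm{UL}}; y_c^{\mathrm{UL,OC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}, \mathbf{s}^{\mathrm{UL}}\right) = \sum_{i=1}^{2^{N_g^{\mathrm{UL}}}} \Pr\left(\mathbf{s}_i^{\mathrm{UL}}\right) \sum_{l=1}^{2^{N_g^{\mathrm{UL}}}} \Pr\left(\hat{\mathbf{s}}_l^{\mathrm{UL}} \mid \mathbf{s}_i^{\mathrm{UL}}\right) \frac{1}{2} \log_2\left( \frac{4\pi}{e}\, \mathrm{SNR}_{i,l}^{\mathrm{UL}} \right), \quad \text{where}\\
\mathrm{SNR}_{i,l}^{\mathrm{UL}} &= \frac{\left| \hat{\mathbf{t}}_l^{\mathrm{UL}T} \mathbf{H}_e^{\mathrm{UL}H} \mathbf{H}_e^{\mathrm{UL}} \mathbf{t}_i^{\mathrm{UL}} \right|^2}{\left\| \hat{\mathbf{t}}_l^{\mathrm{UL}T} \mathbf{H}_e^{\mathrm{UL}H} \right\|_2^2}\, \frac{\alpha^{\mathrm{UL}} P_t}{\sigma^2},\\
\Pr\left(\hat{\mathbf{s}}_l^{\mathrm{UL}} \mid \mathbf{s}_i^{\mathrm{UL}}\right) &= \prod_{k=1}^{N_g^{\mathrm{UL}}} \Pr\left( |y_k^{\mathrm{UL}}| \underset{\hat{s}_{lk}^{\mathrm{UL}} = 0}{\overset{\hat{s}_{lk}^{\mathrm{UL}} = 1}{\gtrless}} \gamma^{\mathrm{UL}} \;\middle|\; s_{ik}^{\mathrm{UL}} \right) \quad \text{and} \quad \Pr\left(\mathbf{s}_i^{\mathrm{UL}}\right) = \frac{1}{2^{N_g^{\mathrm{UL}}}},
\end{aligned}\tag{26}$$
$$I_{M,r}^{\mathrm{UL,ERC}} = I\left(x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}}\right) = \sum_{i=1}^{2^{N_g^{\mathrm{UL}}}} \Pr\left(\mathbf{s}_i^{\mathrm{UL}}\right) \frac{1}{2} \log_2\left( \frac{4\pi}{e}\, \mathrm{SNR}_i^{\mathrm{UL}} \right), \quad \text{where} \quad \mathrm{SNR}_i^{\mathrm{UL}} = \frac{\left| \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{t}_i^{\mathrm{UL}} \right|^2}{\left\| \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{W} \right\|_2^2}\, \frac{\alpha^{\mathrm{UL}} P_t}{\sigma^2}. \tag{27}$$
According to Eq. (23), for higher spatial rates, the difference between the amplitude levels $\left(a_H^{\mathrm{UL}}, a_L^{\mathrm{UL}}\right)$ should be maximized. In contrast, the signal-to-noise-ratio (SNR) of the modulation symbol shown in Eq. (26) and in Eq. (27) increases with both $a_H^{\mathrm{UL}}$ and $a_L^{\mathrm{UL}}$. Thus, in Algorithm 2 in Sec. V, we optimize the values of $a_H^{\mathrm{UL}}$ and $a_L^{\mathrm{UL}}$ to maximize the sum of the spatial and modulation rates.

To conclude the evaluation of performance for the UL, let us compute the mutual information between the inputs $(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}})$ and the outputs $(\hat{\mathbf{s}}^{\mathrm{UL}}, y_c^{\mathrm{UL,ERC}})$ and show that the proposed low complexity detection scheme with ERC achieves a tight lower bound on the mutual information as follows
$$\begin{aligned}
I\left(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}}, y_c^{\mathrm{UL,ERC}}\right) &= I\left(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}}\right) + I\left(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}\right) \quad \text{with}\\
I\left(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}}\right) &= I\left(\mathbf{s}^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}}\right) + I\left(x^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}} \mid \mathbf{s}^{\mathrm{UL}}\right),\\
I\left(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}\right) &= I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}\right) + I\left(x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}, \mathbf{s}^{\mathrm{UL}}\right).
\end{aligned}\tag{28}$$
Note that, since $x^{\mathrm{UL}}$ belongs to a constant amplitude constellation ($M$-PSK), the received power levels do not depend on $x^{\mathrm{UL}}$. Thus, the detected spatial symbol $\hat{\mathbf{s}}^{\mathrm{UL}}$ and $x^{\mathrm{UL}}$ are independent, and they are also independent given $\mathbf{s}^{\mathrm{UL}}$; this implies $I\left(x^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}} \mid \mathbf{s}^{\mathrm{UL}}\right) = 0$. Eq. (21) shows that $y_c^{\mathrm{UL,ERC}}$ does not depend on $\hat{\mathbf{s}}^{\mathrm{UL}}$ and therefore $I\left(x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}, \mathbf{s}^{\mathrm{UL}}\right) = I\left(x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}}\right)$ and $I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \hat{\mathbf{s}}^{\mathrm{UL}}\right) = I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right)$. Finally, Eq. (28) can be simplified as follows
$$\begin{aligned}
I\left(\mathbf{s}^{\mathrm{UL}}, x^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}}, y_c^{\mathrm{UL,ERC}}\right) &= I\left(x^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}}\right) + I\left(\mathbf{s}^{\mathrm{UL}}; \hat{\mathbf{s}}^{\mathrm{UL}}\right) + I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right)\\
&= SE_r^{\mathrm{UL}} + I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right).
\end{aligned}\tag{29}$$
Therefore, the SE in Eq. (22) is a lower bound on the mutual information with a gap ratio
$$L_g = I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right) / SE_r^{\mathrm{UL}}. \tag{30}$$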
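The mutual information $I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right)$ can be bounded as
$$\begin{aligned}
I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right) &= h\left(y_c^{\mathrm{UL,ERC}}\right) - h\left(y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}}\right) \quad \text{with}\\
h\left(y_c^{\mathrm{UL,ERC}}\right) &\leq h\left(y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}} = \mathbf{1}_{N_g^{\mathrm{UL}}}\right) \overset{(a)}{=} \frac{1}{2} \log_2\left( 4\pi^3 e\, \max(P_i)\, \sigma_{c,\mathrm{ERC}}^2 \right) \quad \text{and}\\
h\left(y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}}\right) &= \frac{1}{2^{N_g^{\mathrm{UL}}}} \sum_{i=1}^{2^{N_g^{\mathrm{UL}}}} h\left(y_c^{\mathrm{UL,ERC}} \mid \mathbf{s}^{\mathrm{UL}} = \mathbf{s}_i^{\mathrm{UL}}\right) = \frac{1}{2^{N_g^{\mathrm{UL}}}} \sum_{i=1}^{2^{N_g^{\mathrm{UL}}}} \frac{1}{2} \log_2\left( 4\pi^3 e\, P_i\, \sigma_{c,\mathrm{ERC}}^2 \right),\\
I\left(\mathbf{s}^{\mathrm{UL}}; y_c^{\mathrm{UL,ERC}}\right) &\leq \frac{1}{2} \log_2\left( \frac{\max(P_i)}{\sqrt[2^{N_g^{\mathrm{UL}}}]{\prod_{i=1}^{2^{N_g^{\mathrm{UL}}}} P_i}} \right).
\end{aligned}\tag{31}$$
Therein, $P_i = \left| \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{t}_i^{\mathrm{UL}} \right|^2 \alpha^{\mathrm{UL}} P_t$, $\sigma_{c,\mathrm{ERC}}^2 = \left\| \mathbf{1}_{N_g^{\mathrm{UL}}}^T \mathbf{W} \right\|_2^2 \sigma^2$, $\max(\cdot)$ denotes the maximum operator and step $\overset{(a)}{=}$ follows from the shaping loss of the $M$-PSK symbols [38], [39]. As an illustrative example, let us consider $\sigma_{c,\mathrm{ERC}}^2 = 1$, $a_H^{\mathrm{UL}} = 1$ and $a_L^{\mathrm{UL}} = \frac{1}{2}$. Hence, Fig. 2 shows that the upper bound of the gap ratio is in the range $0.05\% \leq L_g \leq 0.4\%$ and decreases with the transmit power for different numbers of groups and over a wide range of received powers. Thus, the $SE_r^{\mathrm{UL}}$ of the reduced complexity detection scheme achieves a tight lower bound on the mutual information.

Fig. 2: Upper bound of the gap ratio of the $SE_r^{\mathrm{UL}}$ at different values of $N_g^{\mathrm{UL}}$.

IV. DL HYBRID RECEIVE SPATIAL MODULATION

In the DL transmission phase, we propose a novel two-stage analog combining aided HRSM scheme at the UT, assuming ZF precoding at the BS. First, a switching stage is used to select the UT antennas and connect them to the phase shifter arrays to ensure a full rank equivalent channel and thus enable the ZF precoding at the BS. Next, $N_g^{\mathrm{DL}}$ receive analog beamformers, each containing $N_a^{\mathrm{DL}}$ active phase shifters, are used to enhance the receive beamforming gain. We apply the ZF precoder at the BS on the effective channel (MIMO channel + RF combiner at the UT). Thus, smart design of the analog combiner at the UT boosts the received power at the UT and enhances the SE.

Similarly to what was done for the UL, the incoming DL bit stream is mapped into two streams. The first $N_g^{\mathrm{DL}}$ spatially modulated bits are mapped to the received power levels from the $N_g^{\mathrm{DL}}$ receive analog combiners such that the $i$th combiner receives high or low power if the $i$th spatial bit is 1 or 0, respectively. The remaining bits are mapped to an $M$-ary constellation. In this way, the BS transmits the following signal
$$\begin{aligned}
\mathbf{x}_t^{\mathrm{DL}} &= \sqrt{\alpha^{\mathrm{DL}} P_t}\; \mathbf{P} \mathbf{t}_i^{\mathrm{DL}} x^{\mathrm{DL}} \quad \text{with} \quad \mathbf{t}_i^{\mathrm{DL}} = \left(1 - 2 a_0^{\mathrm{DL}}\right) \mathbf{s}_i^{\mathrm{DL}} + a_0^{\mathrm{DL}} \mathbf{1}_{N_g^{\mathrm{DL}}} \quad \text{and}\\
t_{ik}^{\mathrm{DL}} &= \begin{cases} a_H^{\mathrm{DL}} = 1 - a_0^{\mathrm{DL}} & \text{if } s_{ik}^{\mathrm{DL}} = 1 \\ a_L^{\mathrm{DL}} = a_0^{\mathrm{DL}} & \text{if } s_{ik}^{\mathrm{DL}} = 0 \end{cases}
\end{aligned}\tag{32}$$
where $x^{\mathrm{DL}} \in \mathbb{C}^{1 \times 1}$ is the DL modulation symbol, $\{\mathbf{s}_i^{\mathrm{DL}} \in \mathbb{R}^{N_g^{\mathrm{DL}} \times 1},\ i = 1, \cdots, 2^{N_g^{\mathrm{DL}}}\}$ is the DL spatial symbol that includes $N_g^{\mathrm{DL}}$ data bits and $\{\mathbf{t}_i^{\mathrm{DL}} \in \mathbb{R}^{N_g^{\mathrm{DL}} \times 1},\ i = 1, \cdots, 2^{N_g^{\mathrm{DL}}}\}$ is the mapped version of the spatial symbol $\mathbf{s}_i^{\mathrm{DL}}$. Moreover, $\mathbf{P} \in \mathbb{C}^{N_{BS} \times N_g^{\mathrm{DL}}}$ is the ZF precoder and $\alpha^{\mathrm{DL}}$ is a normalization coefficient that fixes the average transmit power
$$\alpha^{\mathrm{DL}} = \frac{1}{\mathbb{E}\left\{ \left\| \mathbf{P} \mathbf{t}_i^{\mathrm{DL}} x^{\mathrm{DL}} \right\|_2^2 \right\}} = \frac{1}{\mathrm{Tr}\left\{ \mathbf{P} \mathbf{R}_{ss}^{\mathrm{DL}} \mathbf{P}^H \right\}} \quad \text{with} \quad \mathbf{R}_{ss}^{\mathrm{DL}} = \frac{1}{2^{N_g^{\mathrm{DL}}}} \sum_{i=1}^{2^{N_g^{\mathrm{DL}}}} \mathbf{t}_i^{\mathrm{DL}} \mathbf{t}_i^{\mathrm{DL}H}. \tag{33}$$
The received vector at the UT can be expressed as
$$\mathbf{r}^{\mathrm{DL}} = \sqrt{\alpha^{\mathrm{DL}} P_t}\; \mathbf{H} \mathbf{P} \mathbf{t}_i^{\mathrm{DL}} x^{\mathrm{DL}} + \mathbf{n}^{\mathrm{DL}} \tag{34}$$
where $\mathbf{H}$ is the DL channel matrix which, assuming channel reciprocity, is the transpose of the UL channel matrix, $\mathbf{H} = (\mathbf{H}^{\mathrm{UL}})^T$, and $\mathbf{n}^{\mathrm{DL}} \in \mathbb{C}^{N_U \times 1}$ is the noise vector with i.i.d. circularly symmetric complex Gaussian elements $\mathcal{CN}(0, \sigma^2)$. At the UT, we apply a two-stage analog combiner on the received vector $\mathbf{r}^{\mathrm{DL}}$ as follows
$$\mathbf{y}^{\mathrm{DL}} = \sqrt{\alpha^{\mathrm{DL}} P_t}\; \mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}H} \mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}H} \mathbf{H} \mathbf{P} \mathbf{t}_i^{\mathrm{DL}} x^{\mathrm{DL}} + \mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}H} \mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}H} \mathbf{n}^{\mathrm{DL}} \tag{35}$$
where the variables are defined in Table II.

TABLE II: Elements of the downlink signal model
Symbol | Description | Dimensions and structure
$\mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}}$ | DL analog phase shifters matrix | $\in \mathbb{C}^{N_a^{\mathrm{DL}} N_g^{\mathrm{DL}} \times N_g^{\mathrm{DL}}}$, $\mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}} = \mathrm{blockdiag}\{\mathbf{f}_1^{\mathrm{DL}}, \cdots, \mathbf{f}_{N_g^{\mathrm{DL}}}^{\mathrm{DL}}\}$
$\mathbf{f}_l^{\mathrm{DL}}$ | DL analog combining response vector | $\in \mathbb{C}^{N_a^{\mathrm{DL}} \times 1}$
$\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}}$ | DL analog switches matrix | $\in \mathbb{R}^{N_U \times N_a^{\mathrm{DL}} N_g^{\mathrm{DL}}}$, $A_{\mathrm{SW}}^{\mathrm{DL}(i,j)} \in \{0, 1\}$, $i = 1, \cdots, N_U$, $j = 1, \cdots, N_a^{\mathrm{DL}} N_g^{\mathrm{DL}}$

Moreover, $\mathbf{f}_l^{\mathrm{DL}}$ has constant amplitude and thus $|f_l^{\mathrm{DL}(i)}| = |f_l^{\mathrm{DL}(j)}| = 1\ \forall\, i, j$, $l = 1, \cdots, N_g^{\mathrm{DL}}$. $\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}}$ connects the phase shifters to the UT antennas such that $A_{\mathrm{SW}}^{\mathrm{DL}(i,j)} = 1$ means that the $j$th phase shifter, represented by the $j$th column of $\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}}$, is connected to the $i$th UT antenna, represented by the $i$th row of $\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}}$, so $\|\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}(j)}\|_0 = 1$, $j = 1, \cdots, N_a^{\mathrm{DL}} N_g^{\mathrm{DL}}$. Each phase shifter inside a given combiner is connected to a specific UT antenna and thus $\|\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}(i,(k-1)N_a^{\mathrm{DL}}+1:kN_a^{\mathrm{DL}})}\|_0 \in \{0, 1\}$, $i = 1, \cdots, N_U$, $k = 1, \cdots, N_g^{\mathrm{DL}}$. The received signal after the combining in Eq. (35) can be cast as
$$\begin{aligned}
\mathbf{y}^{\mathrm{DL}} &= \sqrt{\alpha^{\mathrm{DL}} P_t}\; \mathbf{H}_e^{\mathrm{DL}} \mathbf{P} \mathbf{t}_i^{\mathrm{DL}} x^{\mathrm{DL}} + \mathbf{n}'' \quad \text{with}\\
\mathbf{H}_e^{\mathrm{DL}} &= \mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}H} \mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}H} \mathbf{H} = \left[ \mathbf{f}_1^{\mathrm{DL}H} \mathbf{H}_1^{\mathrm{DL}}; \cdots; \mathbf{f}_{N_g^{\mathrm{DL}}}^{\mathrm{DL}H} \mathbf{H}_{N_g^{\mathrm{DL}}}^{\mathrm{DL}} \right],\\
\mathbf{n}'' &= \mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}H} \mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}H} \mathbf{n}^{\mathrm{DL}},\\
\mathbf{H}_k^{\mathrm{DL}} &= \mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}H\,((k-1)N_a^{\mathrm{DL}}+1:kN_a^{\mathrm{DL}})} \mathbf{H}.
\end{aligned}\tag{36}$$
Herein, $\mathbf{H}_e^{\mathrm{DL}} \in \mathbb{C}^{N_g^{\mathrm{DL}} \times N_{BS}}$ is the effective DL channel matrix. Moreover, the RF combiner satisfies $\|\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}} \mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}}\|_2 = 1$ and thus the entries of $\mathbf{n}''$ have an i.i.d. $\mathcal{CN}(0, \sigma^2)$ distribution. We design the precoder $\mathbf{P}$ at the BS to zero force the effective DL channel as
$$\mathbf{P} = \mathbf{H}_e^{\mathrm{DL}H} \left( \mathbf{H}_e^{\mathrm{DL}} \mathbf{H}_e^{\mathrm{DL}H} \right)^{-1} \tag{37}$$
where $\mathbf{P}$ can be implemented as a fully connected hybrid architecture with no performance penalty, according to [28]. The $k$th entry of $\mathbf{y}^{\mathrm{DL}}$ can be expressed as
$$y_k^{\mathrm{DL}} = \begin{cases} \sqrt{\alpha^{\mathrm{DL}} P_t}\; a_H^{\mathrm{DL}} x^{\mathrm{DL}} + n_k'' & \text{if } s_{ik}^{\mathrm{DL}} = 1 \\ \sqrt{\alpha^{\mathrm{DL}} P_t}\; a_L^{\mathrm{DL}} x^{\mathrm{DL}} + n_k'' & \text{if } s_{ik}^{\mathrm{DL}} = 0 \end{cases} \tag{38}$$
where $n_k''$ is the $k$th entry of $\mathbf{n}''$. Note that the ZF precoder in Eq. (37) could increase the received noise power in case of imperfect CSI at the BS, as discussed in [32].

A. Downlink detection

For the sake of improving the EE at the UT, we consider energy efficient UT circuitry as depicted in Fig. 1 and thus avoid applying ML detection. Instead, we propose using the reduced complexity detection method with ERC presented in Section III-C: we exploit the analog devices (AD and 1-bit ADCs) to detect the DL spatial symbol and the digital devices (RF chain and high resolution ADC) to detect the DL modulation symbol. The $k$th AD connected to the $k$th receive analog combiner in Fig. 1 measures the amplitude of the $k$th signal $|y_k^{\mathrm{DL}}|$ in the RF domain and next, we detect the $k$th DL spatial bit by comparing the measured amplitude to a threshold through the $k$th 1-bit ADC
$$\hat{s}_{ik}^{\mathrm{DL}} = \begin{cases} 1 & \text{if } |y_k^{\mathrm{DL}}| > \gamma^{\mathrm{DL}} \\ 0 & \text{if } |y_k^{\mathrm{DL}}| < \gamma^{\mathrm{DL}} \end{cases} \quad \text{with} \quad \gamma^{\mathrm{DL}} = \frac{1}{2} \sqrt{\alpha^{\mathrm{DL}} P_t} \left( a_H^{\mathrm{DL}} + a_L^{\mathrm{DL}} \right). \tag{39}$$
Similar to the UL reduced complexity modulation symbol detection, we combine all the signals $y_k^{\mathrm{DL}}$. Thereafter, the combined signal passes through the RF chain and the high resolution ADC to detect the DL modulation symbol. The combined signal can be expressed as
$$y_c^{\mathrm{DL,ERC}} = \mathbf{1}_{N_g^{\mathrm{DL}}}^T \mathbf{y}^{\mathrm{DL}} = \sqrt{\alpha^{\mathrm{DL}} P_t}\; \mathbf{1}_{N_g^{\mathrm{DL}}}^T \mathbf{t}_i^{\mathrm{DL}} x^{\mathrm{DL}} + \mathbf{1}_{N_g^{\mathrm{DL}}}^T \mathbf{n}''. \tag{40}$$

B. Downlink spectral efficiency

The reduced complexity detection methods in the DL, given by Eq. (39) and Eq. (40), and in the UL, given by Eq. (18) and Eq. (21), are similar. Therefore, the SE of the DL transmission $SE^{\mathrm{DL}} = I_S^{\mathrm{DL}} + I_M^{\mathrm{DL}}$ can be derived in a similar way according to Eq. (23) and Eq. (27)
$$I_S^{\mathrm{DL}} = N_g^{\mathrm{DL}} \left[ H\left( \frac{P_0 + 1 - P_1}{2} \right) - \frac{H(P_0) + H(1 - P_1)}{2} \right], \tag{41}$$
where the sum in Eq. (23) is not needed as the noise power is the same for all groups and
$$\begin{aligned}
P_1 &= \Pr\left(\hat{s}_{ik}^{\mathrm{DL}} = 1 \mid s_{ik}^{\mathrm{DL}} = 0\right) = Q_1\left( \sqrt{2 \alpha^{\mathrm{DL}} P_t}\, a_L^{\mathrm{DL}} / \sigma,\; \sqrt{2}\, \gamma^{\mathrm{DL}} / \sigma \right),\\
P_0 &= \Pr\left(\hat{s}_{ik}^{\mathrm{DL}} = 0 \mid s_{ik}^{\mathrm{DL}} = 1\right) = 1 - Q_1\left( \sqrt{2 \alpha^{\mathrm{DL}} P_t}\, a_H^{\mathrm{DL}} / \sigma,\; \sqrt{2}\, \gamma^{\mathrm{DL}} / \sigma \right),\\
I_M^{\mathrm{DL}} &= \frac{1}{2^{N_g^{\mathrm{DL}}}} \sum_{i=1}^{2^{N_g^{\mathrm{DL}}}} \frac{1}{2} \log_2\left( \frac{4\pi}{e}\, \mathrm{SNR}_i^{\mathrm{DL}} \right) \quad \text{with} \quad \mathrm{SNR}_i^{\mathrm{DL}} = \frac{\left| \mathbf{1}_{N_g^{\mathrm{DL}}}^T \mathbf{t}_i^{\mathrm{DL}} \right|^2}{N_g^{\mathrm{DL}}}\, \frac{\alpha^{\mathrm{DL}} P_t}{\sigma^2}.
\end{aligned}\tag{42}$$
Similar to the UL case, we select the amplitude levels $a_H^{\mathrm{DL}}$, $a_L^{\mathrm{DL}}$ to maximize the $SE^{\mathrm{DL}}$.

V. SYSTEM OPTIMIZATION

A. Low complexity uplink/downlink optimization algorithm

In HTSM/HRSM systems, we include an analog phase shifting stage to achieve high gain. On the other hand, employing many phase shifters increases the power consumption and could degrade the EE. Thus, we design a hybrid system that reaches the maximum EE such that its SE is equal to or larger than the SE achieved by GTSM/GRSM (same architecture as in Fig. 1 but without phase shifters [36]). In Algorithm 1, we evaluate the SE of the GTSM/GRSM systems for comparison with the proposed hybrid systems and as an input to Algorithm 2. We apply the QR decomposition [40] to sort the channel matrix rows. Specifically, the set $A$ in step 3 of Algorithm 1 includes the most uncorrelated UT antennas sorted in descending order according to the strength of the path between the UT antenna and the BS, and $N_{g,\max}$ represents the maximum number of groups, a value that is upper bounded by the number of channel clusters $C$. Next, we select the best sub-channel matrix that maximizes the SE (footnote 2).

Footnote 2: We cannot consider the full channel matrix when the matrix is rank deficient as the ZF precoding/combining does not exist.

Algorithm 1 UL and DL system parameters optimization of generalized SM
1: Input: $\mathbf{H}$ and $N_{g,\max}$
2: Output: $SE_{\mathrm{GTSM}}^{\mathrm{UL}}$ and $SE_{\mathrm{GRSM}}^{\mathrm{DL}}$
3: $[\mathbf{Q}\ \mathbf{R}\ A] = \mathrm{QR}\left(\mathbf{H}^H, 0\right)$ such that $\mathbf{H}^H(:, A) = \mathbf{Q}\mathbf{R}$
4: for $N_g = 1 : N_{g,\max}$
5:   $\mathbf{H}_e^{\mathrm{DL}} = [\mathbf{H}^{(A(1),:)}; \ldots; \mathbf{H}^{(A(N_g),:)}]$, $\mathbf{H}_e^{\mathrm{UL}} = \mathbf{H}_e^{\mathrm{DL}H}$
6:   $\mathcal{SE}_{\mathrm{GTSM}}^{\mathrm{UL}}(N_g) = \underset{a_0^{\mathrm{UL}}}{\text{maximize}}\ I_S^{\mathrm{UL}} + I_M^{\mathrm{UL}}$, s.t. $0 \leq a_0^{\mathrm{UL}} \leq \frac{1}{2}$
7:   $\mathcal{SE}_{\mathrm{GRSM}}^{\mathrm{DL}}(N_g) = \underset{a_0^{\mathrm{DL}}}{\text{maximize}}\ I_S^{\mathrm{DL}} + I_M^{\mathrm{DL}}$, s.t. $0 \leq a_0^{\mathrm{DL}} \leq \frac{1}{2}$
8: end for
9: return $SE_{\mathrm{GTSM}}^{\mathrm{UL}} = \max\left(\mathcal{SE}_{\mathrm{GTSM}}^{\mathrm{UL}}\right)$, $SE_{\mathrm{GRSM}}^{\mathrm{DL}} = \max\left(\mathcal{SE}_{\mathrm{GRSM}}^{\mathrm{DL}}\right)$

Algorithm 2 UL and DL system parameters optimization of the hybrid SM
1: Input: $\mathbf{H} = \mathbf{A}_R \mathbf{D} \mathbf{A}_t^H$, $SE_{\mathrm{GTSM}}^{\mathrm{UL}}$, $SE_{\mathrm{GRSM}}^{\mathrm{DL}}$ and $P_t$
2: Output: $\mathbf{A}_{\mathrm{SW}}^{\mathrm{UL}\star}$, $\mathbf{A}_{\mathrm{PS}}^{\mathrm{UL}\star}$, $\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}\star}$, $\mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}\star}$, $a_0^{\mathrm{UL}\star}$ and $a_0^{\mathrm{DL}\star}$
3: $\theta_{\max} = \theta_i : i = \arg\max_j |D_{j,j}|$, $j = 1, \cdots, L$
4: for $N_a = 1 : N_U$
5:   Generate all possible antenna array sub-channels $\mathbf{H}_i = \mathbf{A}_{\mathrm{SW},i}^H \mathbf{H}$, $i = 1, \cdots, K = \binom{N_U}{N_a}$, with $A_{\mathrm{SW},i}^{(l,m)} \in \{0, 1\}$, $\|\mathbf{A}_{\mathrm{SW},i}^{(k)}\|_0 = 1$, $k = 1, \cdots, N_a$, $\|\mathbf{A}_{\mathrm{SW},i}^{(l,:)}\|_0 \in \{0, 1\}$, $l = 1, \cdots, N_U$
6:   $\mathbf{f}_i = \sqrt{\frac{N_r}{N_a}}\, \mathbf{A}_{R,i}(:, \theta_{\max})$, $i = 1, \cdots, K$
7:   $\mathbf{H}_e = [\mathbf{f}_1^H \mathbf{H}_1; \ldots; \mathbf{f}_K^H \mathbf{H}_K]$
8:   $[\mathbf{Q}\ \mathbf{R}\ A] = \mathrm{QR}\left(\mathbf{H}_e^H, 0\right)$ such that $\mathbf{H}_e^H(:, A) = \mathbf{Q}\mathbf{R}$
9:   for $N_g = 1 : N_{g,\max}$
10:     $\mathbf{A}_{\mathrm{SW}}(N_a, N_g) = \left[ \mathbf{A}_{\mathrm{SW},A(1)}, \cdots, \mathbf{A}_{\mathrm{SW},A(N_g)} \right]$
11:     $\mathbf{A}_{\mathrm{PS}}(N_a, N_g) = \mathrm{blockdiag}\left\{ \mathbf{f}_{A(1)}, \cdots, \mathbf{f}_{A(N_g)} \right\}$
12:     $\mathbf{H}_e^{\mathrm{UL}} = [\mathbf{H}_{A(1)}^H \mathbf{f}_{A(1)}, \ldots, \mathbf{H}_{A(N_g)}^H \mathbf{f}_{A(N_g)}]$
13:     $\mathcal{SE}^{\mathrm{UL}}(N_a, N_g) = \underset{a_0^{\mathrm{UL}}}{\text{maximize}}\ I_S^{\mathrm{UL}} + I_M^{\mathrm{UL}}$ s.t. $0 \leq a_0^{\mathrm{UL}} \leq \frac{1}{2}$
14:     $\Omega^{\mathrm{UL}}(N_a, N_g) = a_0^{\mathrm{UL}}$
15:     $\mathbf{H}_e^{\mathrm{DL}} = [\mathbf{f}_{A(1)}^H \mathbf{H}_{A(1)}; \ldots; \mathbf{f}_{A(N_g)}^H \mathbf{H}_{A(N_g)}]$
16:     $\mathcal{SE}^{\mathrm{DL}}(N_a, N_g) = \underset{a_0^{\mathrm{DL}}}{\text{maximize}}\ I_S^{\mathrm{DL}} + I_M^{\mathrm{DL}}$ s.t. $0 \leq a_0^{\mathrm{DL}} \leq \frac{1}{2}$
17:     $\Omega^{\mathrm{DL}}(N_a, N_g) = a_0^{\mathrm{DL}}$
18:   end for
19: end for
20: Solve the optimization problem in Eq. (43) to obtain the operating points $\left(N_a^\star, N_g^\star\right)$ for the UL and the DL transmissions
21: return $\mathbf{A}_{\mathrm{SW}}^{\mathrm{UL}\star} = \mathbf{A}_{\mathrm{SW}}\left(N_a^{\mathrm{UL}\star}, N_g^{\mathrm{UL}\star}\right)$, $\mathbf{A}_{\mathrm{PS}}^{\mathrm{UL}\star} = \mathbf{A}_{\mathrm{PS}}\left(N_a^{\mathrm{UL}\star}, N_g^{\mathrm{UL}\star}\right)$, $a_0^{\mathrm{UL}\star} = \Omega^{\mathrm{UL}}\left(N_a^{\mathrm{UL}\star}, N_g^{\mathrm{UL}\star}\right)$, $\mathbf{A}_{\mathrm{SW}}^{\mathrm{DL}\star} = \mathbf{A}_{\mathrm{SW}}\left(N_a^{\mathrm{DL}\star}, N_g^{\mathrm{DL}\star}\right)$, $\mathbf{A}_{\mathrm{PS}}^{\mathrm{DL}\star} = \mathbf{A}_{\mathrm{PS}}\left(N_a^{\mathrm{DL}\star}, N_g^{\mathrm{DL}\star}\right)$, $a_0^{\mathrm{DL}\star} = \Omega^{\mathrm{DL}}\left(N_a^{\mathrm{DL}\star}, N_g^{\mathrm{DL}\star}\right)$

In Algorithm 2, we optimize the analog beamforming and combining matrices and the spatial amplitude levels in UL and DL at a given $P_t$. Specifically, for a given number of phase shifters ($N_a$) inside the group, we have $\binom{N_U}{N_a}$ possible ways of connecting the $N_a$ phase shifters to the $N_U$ antennas of the UT, and this leads to $\binom{N_U}{N_a}$ possible different groups of phase shifters. Each phase shifter group is designed to steer the beam in the direction of the strongest path. Next, we generate a large effective channel matrix that includes all of the possible antenna array groups. Thanks to the QR decomposition [40], we can sort the linearly independent groups in one step (footnote 3). Note that in Algorithm 2, steps 5 through 8 are common for the UL and DL.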
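As a rough illustration of how a column-pivoted QR factorization can order candidate groups so that the leading ones are linearly independent, the sketch below applies the greedy pivot rule (largest remaining residual norm first) with modified Gram-Schmidt in pure Python. It assumes that the three-output QR call in step 8 of Algorithm 2 behaves like a standard QR with column pivoting; the function name `pivot_order` and the toy matrix are hypothetical, and in practice a numerical library routine would be used instead.

```python
from typing import List, Sequence


def pivot_order(columns: Sequence[Sequence[complex]], tol: float = 1e-12) -> List[int]:
    """Return the indices of linearly independent columns in pivoted-QR order.

    Assumption: the pivot rule is the usual greedy one, i.e. at each step the
    column with the largest residual energy is selected. Columns that are
    (numerically) dependent on the already selected ones are left out.
    """
    residual = [[complex(v) for v in col] for col in columns]
    remaining = list(range(len(residual)))
    order: List[int] = []
    while remaining:
        # Pick the remaining column with the largest residual energy.
        best = max(remaining, key=lambda j: sum(abs(v) ** 2 for v in residual[j]))
        energy = sum(abs(v) ** 2 for v in residual[best])
        if energy <= tol:
            break  # everything left is spanned by the selected columns
        order.append(best)
        remaining.remove(best)
        q = [v / energy ** 0.5 for v in residual[best]]  # unit-norm direction
        # Remove the component along q from every remaining column.
        for j in remaining:
            proj = sum(qv.conjugate() * v for qv, v in zip(q, residual[j]))
            residual[j] = [v - proj * qv for v, qv in zip(residual[j], q)]
    return order


# Toy example: column 1 is the strongest, column 0 is parallel to it,
# column 2 is weak but independent.
toy_columns = [[1, 0, 0], [2, 0, 0], [0, 0.5, 0]]
print(pivot_order(toy_columns))
```

In the setting of Algorithm 2, each column would correspond to one candidate phase shifter group (one column of the Hermitian transpose of the stacked effective channel), and the first $N_g$ returned indices would play the role of $A(1), \cdots, A(N_g)$.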
Thereafter, we evaluate the UL and DL SE and the EE (defined as the SE divided by the UT hardware power consumption) with optimized amplitude levels for a number of groups ranging from one to $N_{g,\max}$. We repeat the procedure for every number of active antennas in a group ($N_a = 1 : N_U$) until we complete $N_U \times N_{g,\max}$ grids for the $SE^{\mathrm{UL}}$, $SE^{\mathrm{DL}}$, $EE^{\mathrm{UL}}$ and $EE^{\mathrm{DL}}$. Finally, the BS selects the UL and DL operating points in the grids that maximize the EE such that the SE is at least that of the systems without phase shifters (GTSM/GRSM) evaluated in Algorithm 1. This is formulated as:
$$\left(N_a^\star, N_g^\star\right) = \underset{N_a \in \{1, \cdots, N_U\},\; N_g \in \{1, \cdots, N_{g,\max}\}}{\arg\max}\; EE = \frac{SE(N_a, N_g, P_t)}{P_C(N_a, N_g, P_t)} \quad \text{subject to} \quad SE \geq t \tag{43}$$
where problem (43) is solved for the UL considering $SE = \mathcal{SE}^{\mathrm{UL}}$, $P_C = P_C^{\mathrm{UL}}$, $t = SE_{\mathrm{GTSM}}^{\mathrm{UL}}$ and for the DL considering $SE = \mathcal{SE}^{\mathrm{DL}}$, $P_C = P_C^{\mathrm{DL}}$ and $t = SE_{\mathrm{GRSM}}^{\mathrm{DL}}$. The optimization of the amplitude levels in steps 13 and 17 of Algorithm 2 leads to a nonlinear objective function in one unknown with one linear constraint, which can be solved efficiently using the bisection method.

Algorithm 2 maximizes the EE at a given transmit power, but the optimal transmit power still needs to be computed. This can be done in two steps: first, evaluate the minimum transmit power that ensures the SE constraint of problem (43); second, determine the transmit power within the evaluated feasible interval that maximizes the EE. In step 1, we apply the bisection method in Algorithm 3 with initial lower bound ($P_t = 0$) and upper bound ($P_t$ = maximum transmit power $P_{t,\max}$). At each iteration, we apply Algorithm 2 using the value of $P_t$ in the middle of the upper and lower bounds. The updated lower bound is $P_t$ if problem (43) is infeasible; otherwise, the updated upper bound is $P_t$. The bisection iterations stop when the gap between the bounds meets a specified accuracy. The output of step 1 is the minimum transmit power $P_{t,\min}$ that ensures the SE constraint of problem (43).
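The selection rule of problem (43) and the first bisection step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: `evaluate_grid` is a hypothetical callback standing in for Algorithm 2 that returns, for a given transmit power, the SE and the consumed power of every $(N_a, N_g)$ grid point; feasibility is assumed to be monotone in the transmit power, and the problem is assumed to be feasible at the maximum power.

```python
import math
from typing import Callable, Dict, Optional, Tuple

# (Na, Ng) -> (SE, consumed power); a hypothetical container for the grids.
Grid = Dict[Tuple[int, int], Tuple[float, float]]


def select_operating_point(grid: Grid, se_threshold: float) -> Optional[Tuple[int, int]]:
    """Grid point with the largest EE = SE / PC among those with SE >= threshold."""
    feasible = {point: se / pc for point, (se, pc) in grid.items() if se >= se_threshold}
    if not feasible:
        return None  # problem (43) is infeasible at this transmit power
    return max(feasible, key=feasible.get)


def minimum_feasible_power(evaluate_grid: Callable[[float], Grid],
                           se_threshold: float,
                           pt_max: float,
                           eps: float = 1e-6) -> float:
    """Step 1: bisection on the transmit power for the smallest feasible value."""
    lower, upper = 0.0, pt_max  # assumes feasibility at pt_max
    while upper - lower > eps:
        pt = 0.5 * (lower + upper)
        if select_operating_point(evaluate_grid(pt), se_threshold) is None:
            lower = pt  # infeasible: the minimum power lies above pt
        else:
            upper = pt  # feasible: the minimum power is at most pt
    return upper


def toy_grid(pt: float) -> Grid:
    """Made-up two-point grid, used only to exercise the functions above."""
    return {(1, 1): (math.log2(1 + pt), 1.0), (2, 1): (math.log2(1 + 2 * pt), 1.5)}


print(select_operating_point(toy_grid(2.0), se_threshold=1.0))
print(minimum_feasible_power(toy_grid, se_threshold=1.0, pt_max=4.0))
```

The second step would reuse the same loop with a different update rule (comparing the optimized EE at the midpoint against the EE at the lower bound), so in practice both steps could share one routine parameterized by the update condition.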
In step 2, we apply another bisection method as illustrated in Algorithm 3 with initial lower bound ($P_t = 0$) and upper bound ($P_t = P_{t,\min}$). At each iteration, we run Algorithm 2 at the $P_t$ in the middle of the bounds. The updated lower bound is $P_t$ if the optimized EE at $P_t$ is greater than or equal to the optimized EE at the lower bound; otherwise, the updated upper bound is $P_t$. The iterations stop when the gap between the bounds meets a given accuracy level. The output of step 2 is the optimal transmit power that maximizes the EE under the SE constraint.

Algorithm 3 Transmit power optimization using bisection method
Input: lower bound, upper bound and $\epsilon$
while |lower bound − upper bound| > $\epsilon$ do
  Apply Algorithm 2 at $P_t$ = (lower bound + upper bound)/2
  if the optimization condition is satisfied (step 1: problem (43) is infeasible; step 2: the EE resulting from solving problem (43) at $P_t$ = (lower bound + upper bound)/2 is greater than or equal to the EE at $P_t$ = lower bound) then
    lower bound = $P_t$
  else
    upper bound = $P_t$
  end if
end while

Fig. 3: EE of the proposed schemes evaluated by Algorithm 2 and by exhaustive search vs. received SNR at $N_{BS} = 128$, $N_U = 4$, $N_{g,\max} = C = 3$ (average over 1000 channel realizations).

Fig. 4: Optimum number of antennas per group and number of groups of the HTSM/HRSM schemes vs. received SNR at $N_{BS} = 128$, $N_U = 16$, $N_{g,\max} = C = 4$ (average over 1000 channel realizations).

Footnote 3: Note that we need linearly independent groups to compute the ZF precoding/combining matrices.

In an exhaustive search based design, line 8 in Algorithm 2 should be replaced with an exhaustive search selection of $N_g$ groups out of $\binom{N_U}{N_a}$ possible groups for all values of $N_g$.
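To get a feel for how large the exhaustive alternative is, the following sketch counts the grid points visited by the proposed algorithm and the group selections an exhaustive search would have to examine. It assumes that the exhaustive search enumerates, for every $N_a < N_U$ and every $N_g$ up to $N_{g,\max}$, all ways of choosing $N_g$ groups out of the $\binom{N_U}{N_a}$ candidates, plus the single configuration with $N_a = N_U$; the function names are illustrative only.

```python
from math import comb


def proposed_grid_points(n_u: int, n_g_max: int) -> int:
    """One grid point per (Na, Ng) pair."""
    return n_u * n_g_max


def exhaustive_grid_points(n_u: int, n_g_max: int) -> int:
    """Count every selection of Ng groups among the candidates, for all Na and Ng."""
    total = 1  # Na = NU: a single group that uses all antennas
    for n_a in range(1, n_u):
        candidates = comb(n_u, n_a)  # possible groups of Na antennas
        for n_g in range(1, n_g_max + 1):
            # comb() returns 0 when n_g exceeds the number of candidates
            total += comb(candidates, n_g)
    return total


print(proposed_grid_points(16, 4))             # 64
print(f"{exhaustive_grid_points(16, 4):.1e}")  # 2.9e+15
```

For $N_U = 16$ and $N_{g,\max} = 4$, the count is dominated by the four-group selections with $N_a$ between 6 and 10, where the number of candidate groups is largest.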
The number of grid points of the proposed algorithm $N_{g,p}$ and of the exhaustive search $N_{g,es}$ can be expressed as
$$N_{g,p} = N_U \times N_{g,\max}, \qquad N_{g,es} = 1 + \sum_{N_a=1}^{N_U - 1} \sum_{N_g=1}^{N_{g,\max}} \binom{\binom{N_U}{N_a}}{N_g}, \quad \forall\, \binom{N_U}{N_a} \geq N_g. \tag{44}$$
As an illustrative example, consider $N_U = 16$ and $N_{g,\max} = C = 4$. The corresponding numbers of grid points are $N_{g,p} = 64$ and $N_{g,es} = 2.9 \times 10^{15}$ and thus, the proposed algorithm significantly reduces the computational complexity.

VI. SIMULATION RESULTS

In this section, we evaluate the performance of the proposed HTSM/HRSM schemes compared to the GTSM/GRSM schemes in terms of SE and EE. We show the achievable SE, EE and SE-EE trade-off, and illustrate the optimal numbers of groups and phase shifters per group for the UL and the DL systems. In order to validate the efficiency of the proposed algorithm, we compare the EE obtained from the reduced complexity algorithm with the one obtained from the exhaustive search. We evaluate the system performance in both stochastic and deterministic channel environments.

A. Performance evaluation in stochastic channel

In the stochastic simulation environment, we consider $\sigma^2 = -84$ dBm and $P_l = 90$ dB. Fig. 3 shows the EE comparison of the UL HTSM and the DL HRSM schemes when we apply the proposed fast Algorithm 2 compared to the exhaustive search. Thanks to the QR decomposition in Algorithm 2, we obtain the same performance as the exhaustive search with a significant reduction in the computational complexity, as quantified in Eq. (44).

Fig. 4 shows the proposed system behavior in terms of the optimized number of groups and antennas per group of the HTSM/HRSM designs. The number of groups and antennas per group are obtained from Algorithm 2 and are designed to ensure a full rank effective channel matrix and enable the ZF combining and precoding in the UL and DL, respectively. As we maximize the EE, we keep the total number of phase shifters small.
Therefore, an increase in the number of groups is necessarily associated with a decrease in the number of phase shifters per group. At low SNR, we need high beamforming/combining gains. Hence, the number of phase shifters per group is high and thus, the number of groups is small. Increasing the SNR reduces the required beamforming/combining gains. As a result, the number of phase shifters per group decreases and the number of groups increases to attain high spatial multiplexing gain.

Fig. 5a shows the SE of the proposed UL HTSM and DL HRSM designs compared to the UL GTSM and the DL GRSM schemes. In the low SNR regime (a common assumption associated with outdoor mmWave propagation), the proposed hybrid designs achieve superior SE as the phase shifter stages in the HTSM and HRSM schemes provide high beamforming and combining gains, respectively, and combat the severe path-loss. At high SNR, each group may contain one or two phase shifters as shown in Fig. 4. Since the small number of antennas in an array is not sufficient to provide high beamforming gains, the GTSM approaches the SE of the HTSM. On the other hand, the HRSM still outperforms GRSM at high SNR even with the small number of phase shifters per group. The SE of the HRSM is higher than that of the HTSM as the ZF combiner in the HTSM system could amplify the UL noise power. In contrast, the RF combiner of the HRSM does not affect the DL noise power.

Fig. 5b shows the SE of the UL HTSM scheme when we apply Algorithm 2 with Gaussian input distribution and optimum detector, as in Eq. (11) and Eq. (17), and with $M$-PSK modulated input and reduced complexity detector, as in Eq. (23) and Eq. (27). The reduced complexity scheme approaches the optimal performance, specifically at low SNR.

Fig. 5: SE of the proposed HTSM-HRSM compared to GTSM-GRSM schemes and SE of HTSM with Gaussian modulation symbol and optimal detector compared to the scheme with $M$-PSK modulation symbol and reduced complexity detector at $N_{BS} = 128$, $N_U = 16$, $N_{g,\max} = C = 4$ (average over 1000 channel realizations). Panels: (a) HTSM-HRSM compared to GTSM-GRSM; (b) Optimal and reduced complexity detectors.

Fig. 6: UL EE and DL EE-SE trade-off of the proposed HTSM and HRSM schemes compared to GTSM and GRSM methods at $N_{BS} = 128$, $N_U = 16$, $N_{g,\max} = C = 4$ (average over 1000 channel realizations). Panels: (a) UL EE; (b) DL EE-SE trade-off.

Fig. 7: UL and DL energy efficiency at the UT of the proposed scheme compared to hybrid SM in [21] and hybrid MIMO in [16] assuming single RF chain, $N_{BS} = 128$, $N_U = 16$, $N_{g,\max} = C = 6$ (average over 1000 channel realizations). Panels: (a) UL EE; (b) DL EE.

Fig. 6a shows the UL EE of the proposed HTSM scheme when the transmit power is optimized, or the maximum available transmit power is used. We compare it to the UL EE of the GTSM system. The proposed scheme outperforms the GTSM system, especially at low SNR when the beamforming gain is needed. Optimizing the transmit power slightly improves the EE due to the SE constraint shown in Eq. (43). Fig. 6b represents the DL EE-SE trade-off of the proposed HRSM design compared to the GRSM scheme. At low SNR, the proposed hybrid design achieves superior SE and EE due to the high gain of the phase shifter stage. In the high SNR regime, the generalized system achieves slightly higher EE as the number of groups tends to be one and the SE gap between the two systems shrinks.

Fig. 7 shows the UL and DL EE at the UT of the proposed scheme compared to hybrid SM in [21] and hybrid MIMO in [16] assuming a single RF chain at the UT.
When transmitting $M$-PSK modulation symbols and applying the reduced complexity detector, the proposed hybrid SM with the optimized grouping explained in Algorithm 2 attains higher EE than the hybrid SM with uniform grouping proposed in [21] in both UL and DL. Moreover, in DL, it outperforms the hybrid MIMO in [16]. Considering $M$-QAM symbols and applying the ML detector, the proposed UL HTSM achieves higher EE than the hybrid MIMO in [16].

Fig. 8: Top view of the realistic user distribution (red dots) served by three-sector mMIMO BSs inside a mmWave small-cell in the Manhattan area of New York City, where the farthest user is at 220 meters from the small-cell.

B. Performance evaluation for a ray-trace model

With the aim of evaluating the performance of the proposed system in a typical small-cell scenario at 28 GHz, we consider a realistic user distribution and generate deterministic channels per user, and then compare the system performance with the stochastic and deterministic channels. Several outdoor small-cell mmWave channel samples have been predicted from the ray-based propagation model VolcanoUrban [33]. Those samples are the result of physical interactions between the electromagnetic wave and the real representation of a dense urban environment, more precisely, a district in Manhattan, New York. A small-cell is positioned at 8 meters above the ground, at a typical location for a lamppost. Three sectors are installed at the small-cell. Each sector feeds a linear antenna array with boresight direction oriented towards azimuth 0°, 120° and 240°, as depicted in Fig. 8. Each linear antenna array is formed of 128 vertically-polarized antenna elements, which are uniformly distributed in the horizontal plane, at frequency 28 GHz, and with half-wavelength separation. All antenna elements have the same radiation pattern with 60° half-power horizontal beamwidth.
The users are assumed to be pedestrians distributed on the surrounding pavements within a maximum range of 220 meters from the small-cell. The user equipment is located at 1.5 meters above the ground. Its antenna is a uniform linear array with 16 vertically-polarized isotropic elements positioned in the horizontal plane. The channel samples are produced from 142 different user positions. Users are positioned either in a wide or a narrow street, or even in a small square. A few of them are in a non line-of-sight (NLoS) situation. Finally, a total of 180 channel samples are created: 50, 69 and 61 for sectors 0, 1 and 2, respectively, with 37 NLoS samples.

Fig. 9 depicts the SE of the proposed UL HTSM and DL HRSM designs evaluated on the stochastic channel model with $C = 2$ and $C = 6$ scatterers and on the deterministic channel model for the scenario in Fig. 8, assuming the same path-loss for the two models. The noise level is $\sigma^2 = -84$ dBm, the transmit power is $P_t = 20$ dBm, the carrier frequency is $f_c = 28$ GHz and the bandwidth is $BW = 10$ MHz. For this simulation setup, 76% of the users have a delay spread smaller than the symbol time, so that we can consider a non-frequency selective channel. Sector 0 has the lowest scattering environment due to the LoS users and the vegetation. Sector 1 has more NLoS users and thus, the users achieve high SE. Sector 2 users are farther away than the users in the other sectors and thus, its users have greater path-loss and lower SE. From this experiment, we show that the proposed design not only attains high performance with the theoretical stochastic channel model in Eq. (3) but also achieves similar performance with the realistic channel model. Moreover, the performance evaluation of the 28 GHz channels at $BW = 10$ MHz based on the stochastic channel model gives a realistic assessment if the number of clusters is between $C = 2$ and $C = 6$.

VII. CONCLUSION

In this paper, we proposed a novel and energy efficient hybrid transceiver architecture based on a two-stage analog beamformer in the UL and combiner in the DL. The analog switching stage smartly allocates the UT antennas to the phase shifter groups to minimize the spatial correlation. Moreover, the analog phase shifting stage maximizes the beamforming/combining gains to combat the path-loss. We proposed a novel and computationally efficient optimization algorithm to design the analog stages. The proposed design achieves the same performance as the exhaustive search method but with much lower computational complexity. The flexibility of the architecture allows optimizing the hybrid transceiver in any SNR regime. In the low SNR regime, we activate only one group of phase shifters and maximize the number of phase shifters inside the group to attain a high post-processing SNR. In the high SNR regime, the number of groups increases and, as a result, the spatial rate increases. Moreover, the number of phase shifters per group decreases, as optimizing the EE implies reducing the total number of phase shifters. We validated the performance of the proposed design on a realistic deployment in the Manhattan area of New York City. The performance evaluation for a mmWave small-cell at 28 GHz shows that the stochastic channel model provides results close to those obtained with the deterministic channel if the number of clusters is chosen to emulate the real-world scenario.
Fig. 9: SE of the proposed UL and DL hybrid design evaluated on stochastic (in blue, for $C = 2$, $C = 6$ and average over 100 realizations) and deterministic channel samples (in dots) assuming the same path-loss for the two models, $\sigma^2 = -84$ dBm, $P_t = 20$ dBm, $f_c = 28$ GHz and $BW = 10$ MHz. Panels: (a) UL SE of sector 0; (b) UL SE of sector 1; (c) UL SE of sector 2; (d) DL SE of sector 0; (e) DL SE of sector 1; (f) DL SE of sector 2.
APPENDIX A
Proof of Lemma 1

\begin{align*}
h\left(y_k^{UL} \,\middle|\, y_{k-1}^{UL}, \cdots, y_1^{UL}, \mathbf{t}_i^{UL(1:k)}\right)
&\overset{(a)}{=} h\left(y_1^{UL}, \cdots, y_k^{UL} \,\middle|\, \mathbf{t}_i^{UL(1:k)}\right) - h\left(y_1^{UL}, \cdots, y_{k-1}^{UL} \,\middle|\, \mathbf{t}_i^{UL(1:k)}\right) \\
&\overset{(b)}{=} \log_2\left( \pi e \, \frac{\left| \alpha^{UL} P_t \, \mathbf{t}_i^{UL(1:k)} \mathbf{t}_i^{UL(1:k)H} + \mathbf{R}_{n'n'}^{(1:k,1:k)} \right|}{\left| \alpha^{UL} P_t \, \mathbf{t}_i^{UL(1:k-1)} \mathbf{t}_i^{UL(1:k-1)H} + \mathbf{R}_{n'n'}^{(1:k-1,1:k-1)} \right|} \right) \\
&\overset{(c)}{=} \log_2\left( \pi e \, \frac{\left| \mathbf{R}_{n'n'}^{(1:k,1:k)} \right| \left( 1 + \alpha^{UL} P_t \, \mathbf{t}_i^{UL(1:k)T} \mathbf{R}_{n'n'}^{(1:k,1:k)-1} \mathbf{t}_i^{UL(1:k)} \right)}{\left| \mathbf{R}_{n'n'}^{(1:k-1,1:k-1)} \right| \left( 1 + \alpha^{UL} P_t \, \mathbf{t}_i^{UL(1:k-1)T} \mathbf{R}_{n'n'}^{(1:k-1,1:k-1)-1} \mathbf{t}_i^{UL(1:k-1)} \right)} \right) \\
&\overset{(d)}{=} \log_2\left( \pi e \left( \sigma_K^2 + P_{i,K} \right) \right),
\end{align*}

where

\[
\sigma_K^2 = \frac{\left| \mathbf{R}_{n'n'}^{(1:k,1:k)} \right|}{\left| \mathbf{R}_{n'n'}^{(1:k-1,1:k-1)} \right|}
\]

and

\[
P_{i,K} = \sigma_K^2 \, \alpha^{UL} P_t \, \frac{\mathbf{t}_i^{UL(1:k)T} \mathbf{R}_{n'n'}^{(1:k,1:k)-1} \mathbf{t}_i^{UL(1:k)} - \mathbf{t}_i^{UL(1:k-1)T} \mathbf{R}_{n'n'}^{(1:k-1,1:k-1)-1} \mathbf{t}_i^{UL(1:k-1)}}{1 + \alpha^{UL} P_t \, \mathbf{t}_i^{UL(1:k-1)T} \mathbf{R}_{n'n'}^{(1:k-1,1:k-1)-1} \mathbf{t}_i^{UL(1:k-1)}}.
\]

Step (a) follows from the chain rule of entropy, step (b) follows from the Gaussian distributions, step (c) follows from applying the identity $|\mathbf{A} + \mathbf{t}\mathbf{t}^H| = |\mathbf{A}| \left( 1 + \mathbf{t}^H \mathbf{A}^{-1} \mathbf{t} \right)$, and step (d) follows from separating the noise and signal powers. Therefore,

\[
f\left(y_k^{UL} \,\middle|\, y_{k-1}^{UL}, \cdots, y_1^{UL}, \mathbf{t}_i^{UL(1:k)}\right) = \mathcal{CN}\left( 0, \sigma_K^2 + P_{i,K} \right).
\]
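The variance decomposition in steps (c) and (d) of the proof of Lemma 1 (Appendix A) can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function name `conditional_variance_terms`, the random test covariance `R`, the vector `t`, and the scalar `a` (standing in for the product of α^UL and Pt) are all assumptions. It uses the Hermitian transpose throughout, which coincides with the plain transpose when `t` is real-valued.

```python
import numpy as np

def conditional_variance_terms(R, t, a):
    """Compare the determinant ratio of step (b) with sigma_K^2 + P_{i,K}.

    R : (k, k) Hermitian positive definite noise covariance, k >= 2 (assumed)
    t : (k,) signal vector (assumed)
    a : positive scalar standing in for alpha^UL * P_t (assumed)
    """
    k = len(t)
    R_km1, t_km1 = R[:k - 1, :k - 1], t[:k - 1]

    # Step (b): ratio of the k-dim and (k-1)-dim covariance determinants.
    cov_k = a * np.outer(t, t.conj()) + R
    cov_km1 = a * np.outer(t_km1, t_km1.conj()) + R_km1
    direct = (np.linalg.det(cov_k) / np.linalg.det(cov_km1)).real

    # Step (d): noise part sigma_K^2 and signal part P_{i,K}.
    sigma2 = (np.linalg.det(R) / np.linalg.det(R_km1)).real
    q_k = (t.conj() @ np.linalg.solve(R, t)).real            # t^H R^{-1} t
    q_km1 = (t_km1.conj() @ np.linalg.solve(R_km1, t_km1)).real
    P = sigma2 * a * (q_k - q_km1) / (1.0 + a * q_km1)
    return direct, sigma2, P

# Hypothetical test data: a random Hermitian positive definite covariance.
rng = np.random.default_rng(0)
k = 4
A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
R = A @ A.conj().T + k * np.eye(k)
t = rng.standard_normal(k) + 1j * rng.standard_normal(k)
a = 2.5

direct, sigma2, P = conditional_variance_terms(R, t, a)
entropy_bits = np.log2(np.pi * np.e * (sigma2 + P))  # conditional entropy in bits
print(np.isclose(direct, sigma2 + P))
```

The check relies only on the matrix determinant identity quoted in step (c), so it should hold for any positive definite `R` and any `a > 0`, not just for the random sample drawn here.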