Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture

Published: 01 May 2004

Abstract

Advances in fiber-optic and VLSI technology are making feasible interconnection networks that allow multiple simultaneous broadcasts. This paper presents the multiprocessor architecture of the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) and examines the performance of representative algorithms for matrix operations, merging, and sorting under the message-passing and distributed-shared-memory (DSM) paradigms. It shows that simple enhancements to the network interface and to the cache and directory controllers can yield O(1) communication time for the matrix-vector multiplication algorithm using DSM. The SOME-Bus is a low-latency, high-bandwidth, fiber-optic interconnection network that directly links arbitrary pairs of processor nodes without contention and can efficiently interconnect over 100 nodes. It contains a dedicated channel for the data output of each node, which eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Each of the P nodes has an array of receivers, with one receiver dedicated to each node output channel. No node is ever blocked from transmitting by another transmitter or by contention for shared switching logic. The entire P-receiver array can be integrated on a single chip at comparatively minor cost, resulting in O(P) complexity. Because it supports multiple simultaneous broadcasts of messages, the SOME-Bus offers considerably more functionality than a crossbar and allows cache consistency protocols to complete much faster.
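
To make the one-channel-per-node idea concrete, the following is a minimal toy model in Python, not the paper's implementation. It assumes a P×P matrix stored one row per node and a vector stored one element per node, and it models a single communication step in which every node broadcasts on its own channel while every receiver array reads all P channels. The function names `simultaneous_broadcast` and `matvec_one_step` are hypothetical, and the sketch uses explicit message passing rather than the DSM cache and directory enhancements the abstract refers to.

```python
from typing import List, Tuple


def simultaneous_broadcast(outgoing: List[float]) -> List[List[float]]:
    """Model one communication step on a broadcast bus with P channels.

    Assumption: node i writes outgoing[i] on its own dedicated channel i,
    and each node's receiver array reads all P channels in the same step.
    """
    p = len(outgoing)
    # received[j][i] is the value node j picked up from channel i.
    return [list(outgoing) for _ in range(p)]


def matvec_one_step(a: List[List[float]], x: List[float]) -> Tuple[List[float], int]:
    """Compute y = A x with node j owning row a[j] and element x[j].

    Returns the result vector and the number of communication steps used
    in this toy model.
    """
    p = len(x)
    if len(a) != p or any(len(row) != p for row in a):
        raise ValueError("expected a P x P matrix and a length-P vector")

    # All P nodes broadcast their element of x at once: one step in total.
    received = simultaneous_broadcast(x)
    comm_steps = 1

    # Each node j now holds the full vector and computes its own y[j] locally.
    y = [sum(a[j][i] * received[j][i] for i in range(p)) for j in range(p)]
    return y, comm_steps


if __name__ == "__main__":
    y, steps = matvec_one_step([[1, 2], [3, 4]], [5, 6])
    print(y, steps)  # prints [17, 39] 1
```

In this model the number of communication steps does not grow with P, which is the property the abstract describes as O(1) communication time; the per-node computation is still O(P).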

Published In

Future Generation Computer Systems, Volume 20, Issue 4
Special issue: Advanced services for clusters and internet computing
May 2004
192 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. broadcast architectures
  2. multiprocessors
  3. numerical algorithms

Qualifiers

  • Article
