DOI: 10.1145/1851476.1851534
research-article

Providing a cloud network infrastructure on a supercomputer

Published: 21 June 2010

Abstract

Supercomputers and clouds both strive to make a large number of computing cores available for computation. More recently, similar objectives such as low power, manageability at scale, and low cost of ownership have been driving hardware and software toward convergence. Challenges remain, however; one is that current cloud infrastructure does not yield the performance sought by many scientific applications. One source of the performance loss is virtualization, and virtualization of the network in particular. This paper introduces and analyzes a hybrid supercomputer software infrastructure that allows direct access to the communication hardware for the components that need it, while providing the standard elastic cloud infrastructure for other components.

Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN: 9781605589428
DOI: 10.1145/1851476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high performance cloud computing
  2. high performance computing
  3. supercomputer network models
  4. supercomputing infrastructure as a service
  5. user-level networking

Qualifiers

  • Research-article

Conference

HPDC '10
Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Mar 2025

Cited By

  • (2024) RR-Compound: RDMA-Fused gRPC for Low Latency, High Throughput, and Easy Interface. IEEE Transactions on Parallel and Distributed Systems, 35:8 (1488-1505). DOI: 10.1109/TPDS.2024.3404394. Online publication date: Aug-2024
  • (2021) Deterministic latency networks: the enabler of edge data center synchronous operation [Invited]. Journal of Optical Communications and Networking, 13:9 (D115). DOI: 10.1364/JOCN.425794. Online publication date: 19-Aug-2021
  • (2019) HPC as a Service: A naïve model. 2019 8th International Conference on Information and Communication Technologies (ICICT) (174-179). DOI: 10.1109/ICICT47744.2019.9001912. Online publication date: Nov-2019
  • (2019) Towards Adaptive Replication for Hot/Cold Blocks in HDFS using MemCached. 2019 2nd International Conference on Data Intelligence and Security (ICDIS) (188-194). DOI: 10.1109/ICDIS.2019.00035. Online publication date: Jun-2019
  • (2019) Improving the Performance of MongoDB with RDMA. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (1004-1010). DOI: 10.1109/HPCC/SmartCity/DSS.2019.00144. Online publication date: Aug-2019
  • (2018) Standards of the interoperability in a high-performance environment. Program Systems: Theory and Applications (Программные системы: теория и приложения), 9:4 (383-397). DOI: 10.25209/2079-3316-2018-9-4-383-397. Online publication date: 2018
  • (2018) Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI. OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence (114-129). DOI: 10.1007/978-3-319-73814-7_8. Online publication date: 10-Jan-2018
  • (2017) High-Performance Key-Value Store On OpenSHMEM. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (559-568). DOI: 10.1109/CCGRID.2017.49. Online publication date: 14-May-2017
  • (2016) Improving the performance of HDFS by reducing I/O using adaptable I/O system. 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) (3139-3144). DOI: 10.1109/ICEEOT.2016.7755280. Online publication date: Mar-2016
  • (2016) SHMemCache: Enabling Memcached on the OpenSHMEM Global Address Model. OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments (131-145). DOI: 10.1007/978-3-319-50995-2_9. Online publication date: 15-Dec-2016
