research-article

Big data and extreme-scale computing

Published: 01 July 2018

Abstract

Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery introduced by the ongoing revolution in high-end data analysis (HDA) might be integrated with the established, simulation-centric paradigm of the high-performance computing (HPC) community. Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the methods for analyzing and using that data are radically reshaping the landscape of scientific computing. The most critical problems involve the logistics of wide-area, multistage workflows that will move back and forth across the computing continuum, between the multitude of distributed sensors, instruments, and other devices at the network's edge and the centralized resources of commercial clouds and HPC centers. We suggest that the prospects for the future integration of technological infrastructures and research ecosystems need to be considered at three different levels. First, we discuss the convergence of research applications and workflows that establish a research paradigm combining both HPC and HDA, where ongoing progress is already motivating efforts at the other two levels. Second, we offer an account of some of the problems involved in creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network in a scalable manner to meet the highly diverse requirements for processing, communication, and buffering/storage of massive data workflows of many different scientific domains. Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics. We close by offering some conclusions and recommendations for future investment and policy review.

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Big data
  2. extreme-scale computing
  3. future software
  4. high-end data analysis
  5. traditional HPC

Cited By

  • (2024) Practicable live container migrations in high performance computing clouds. Journal of Systems Architecture: the EUROMICRO Journal 152:C. DOI: 10.1016/j.sysarc.2024.103157. Online publication date: 1-Jul-2024
  • (2023) INDIANA—In-Network Distributed Infrastructure for Advanced Network Applications. International Journal of High Performance Computing Applications 37:3-4 (442-461). DOI: 10.1177/10943420231179662. Online publication date: 1-Jul-2023
  • (2023) End-to-End Workflows for Climate Science: Integrating HPC Simulations, Big Data Processing, and Machine Learning. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2042-2052). DOI: 10.1145/3624062.3624283. Online publication date: 12-Nov-2023
  • (2023) Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA. Euro-Par 2023: Parallel Processing (323-338). DOI: 10.1007/978-3-031-39698-4_22. Online publication date: 28-Aug-2023
  • (2022) Reshaping geostatistical modeling and prediction for extreme-scale environmental applications. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.5555/3571885.3571888. Online publication date: 13-Nov-2022
  • (2022) Task-parallel in situ temporal compression of large-scale computational fluid dynamics data. International Journal of High Performance Computing Applications 36:3 (388-418). DOI: 10.1177/10943420221085000. Online publication date: 1-May-2022
  • (2022) A Study of Industrial Convergence in the Context of Digital Economy Based on Scientific Computing Visualization. Mobile Information Systems 2022. DOI: 10.1155/2022/4025875. Online publication date: 1-Jan-2022
  • (2022) Pipeline risk big data intelligent decision-making system based on machine learning and situation awareness. Neural Computing and Applications 34:18 (15221-15239). DOI: 10.1007/s00521-021-06738-5. Online publication date: 1-Sep-2022
  • (2021) The Choice of Multimodal Transport Mode of Agricultural By-Product Logistics in Land-Sea New Corridor in Western China Based on Big Data. Wireless Communications & Mobile Computing 2021. DOI: 10.1155/2021/1880689. Online publication date: 1-Jan-2021
  • (2021) Clothing Material Design Concept Based on Big Data and Information Technology. 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture (1244-1247). DOI: 10.1145/3495018.3495374. Online publication date: 23-Oct-2021