DOI: 10.1145/2949550.2949556
BIC-LSU: Big Data Research Integration with Cyberinfrastructure for LSU

Published: 17 July 2016

Abstract

In recent years, big data analysis has been widely applied in many research fields, including biology, physics, transportation, and material science. Even though the demands for big data migration and analysis on campus IT infrastructures are increasing dramatically, several technical challenges remain. First, frequent big data transfers between the storage systems of different research groups impose a heavy burden on a regular campus network. Second, current campus IT infrastructure is not designed to fully utilize its hardware capacity for big data migration and analysis. Last but not least, running big data applications on large-scale high-performance computing facilities is not straightforward, especially for researchers and engineers in non-IT disciplines.
We develop a campus cyberinfrastructure for big data migration and analysis, called BIC-LSU, which consists of a task-aware Clos OpenFlow network, high-performance cache storage servers, customized high-performance transfer applications, a lightweight control framework for existing big data storage and job scheduling systems, and a comprehensive social-networking-enabled web portal. BIC-LSU achieves 40 Gb/s disk-to-disk big data transmission, maintains short average transfer task completion times, converges control over commonly deployed storage and job scheduling systems, and simplifies big data analysis through a universal, user-friendly interface. The BIC-LSU software has minimal dependencies and high extensibility, so other research institutions can easily customize and deploy it as an augmented service on their existing IT infrastructure.
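The task-aware network described above is built on a Clos fabric. As a rough illustration of that topology (the function names and parameters below are hypothetical, not taken from the paper), this sketch enumerates the switch-to-switch links of a 3-stage Clos(m, n, r) fabric and checks Clos's classic rearrangeably non-blocking condition, m >= n:

```python
# Hypothetical sketch of a 3-stage Clos(m, n, r) fabric: r ingress and r egress
# switches with n host-facing ports each, and m middle-stage switches. Every
# ingress switch links to every middle switch, and every middle switch links to
# every egress switch.

def clos_links(m, n, r):
    """Return all inter-switch links of a 3-stage Clos fabric."""
    ingress_to_middle = [(f"in{i}", f"mid{j}") for i in range(r) for j in range(m)]
    middle_to_egress = [(f"mid{j}", f"out{k}") for j in range(m) for k in range(r)]
    return ingress_to_middle + middle_to_egress

def rearrangeably_nonblocking(m, n):
    # Clos (1953): a 3-stage fabric is rearrangeably non-blocking iff m >= n,
    # i.e. at least as many middle switches as host ports per edge switch.
    return m >= n

links = clos_links(m=4, n=4, r=3)
print(len(links))                       # 2 * r * m = 24 links
print(rearrangeably_nonblocking(4, 4))  # True
```

Under this condition, any admissible traffic pattern can be routed without blocking by rearranging existing paths, which is what makes a Clos fabric attractive as a substrate for centralized, task-aware scheduling of the kind an OpenFlow controller provides.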


Cited By

  • (2017) We Have an HPC System. Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact, pp. 1-8. DOI: 10.1145/3093338.3093341. Online publication date: 9 July 2017.
  • (2017) Minimal Coflow Routing and Scheduling in OpenFlow-Based Cloud Storage Area Networks. 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), pp. 222-229. DOI: 10.1109/CLOUD.2017.36. Online publication date: June 2017.

Published In

XSEDE16: Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale
July 2016
405 pages
ISBN:9781450347556
DOI:10.1145/2949550
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Big data
  2. science gateway
  3. software-defined networking
  4. solid-state drive storage server
  5. task-aware network scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE16

Acceptance Rates

Overall Acceptance Rate 129 of 190 submissions, 68%

