Abstract
Service-oriented architecture (SOA), workflow, the Semantic Web, and Grid computing are key enabling information technologies in the development of increasingly sophisticated e-Science infrastructures and application platforms. While the emergence of Cloud computing as a new computing paradigm has provided new directions and opportunities for e-Science infrastructure development, it also presents some challenges. Researchers increasingly find it difficult to handle “big data” using traditional data processing techniques. Such challenges demonstrate the need for a comprehensive analysis of how the above-mentioned informatics techniques can be used to develop appropriate e-Science infrastructure and platforms in the context of Cloud computing. This survey paper describes recent research advances in applying informatics techniques to facilitate scientific research, particularly from the Cloud computing perspective. Our particular contributions include identifying associated research challenges and opportunities, presenting lessons learned, and describing our future vision for applying Cloud computing to e-Science. We believe our research findings can help indicate the future trend of e-Science and can inform funding and research directions on how to employ computing technologies more appropriately in scientific research. We also point out open research issues in the hope of sparking new development and innovation in the e-Science field.
Notes
The Diamond Light Source—http://www.diamond.ac.uk.
MashMyData http://www.mashmydata.org.
OPeNDAP, http://www.opendap.org.
Web Processing Service, http://www.opengeospatial.org/standards/wps.
At the time of writing, a paper addressing this multi-step delegation problem is in preparation.
OAuth, http://oauth.net/.
W3C—http://www.w3.org/.
Turtle—Terse RDF Triple Language—http://www.w3.org/TeamSubmission/turtle/.
RDF Test Cases (N-Triples)—http://www.w3.org/TR/rdf-testcases/#ntriples.
OWL 2 Web Ontology Language—http://www.w3.org/TR/owl2-overview/.
RDF Schema—http://www.w3.org/TR/rdf-schema/.
SPARQL Query Language for RDF—http://www.w3.org/TR/rdf-sparql-query/.
A web server returns a representation of a resource based on the HTTP-Accept header of a client request.
Advanced Climate Research Infrastructure for Data (ACRID)—http://www.cru.uea.ac.uk/cru/projects/acrid/.
The Digital Object Identifier (DOI) System—http://www.doi.org/.
Open Archives Initiative Object Reuse and Exchange (OAI-ORE)—http://www.openarchives.org/ore/.
Open Archives Initiative Object Reuse and Exchange http://www.openarchives.org/ore/.
W3C Provenance Working Group http://www.w3.org/2011/prov/wiki/Main_Page. Accessed 18 Dec 2011.
Semantic Publishing and Referencing Ontologies (SPAR) http://purl.org/spar/page. Accessed 18 Dec 2011.
PROV-O: The PROV Ontology http://www.w3.org/TR/prov-o/.
Open Annotation Collaboration http://www.openannotation.org/.
myGrid project—http://www.mygrid.org.uk.
Persistent Uniform Resource Locators http://purl.oclc.org/docs/index.html.
Digital Object Identifier http://www.doi.org/.
The Friend of a Friend (FOAF) vocabulary—http://xmlns.com/foaf/spec/.
Semantically Interlinked Online Communities (SIOC)—http://sioc-project.org/ontology.
Simple Knowledge Organization System Reference (SKOS)—http://www.w3.org/TR/swbp-skos-core-spec.
The Gene Ontology Project http://www.geneontology.org/.
Geography Markup Language http://www.opengeospatial.org/standards/gml.
JISC Biophysical Repositories in the Lab project (BRIL), http://www.jisc.ac.uk/whatwedo/programmes/inf11/digpres/bril.
Eduserv Managed Hosting and Cloud, http://www.eduserv.org.uk/hosting.
UK National Grid Service, http://www.ngs.ac.uk.
CDMI standard, SNIA, http://www.snia.org/cdmi.
DuraCloud, DuraSpace, http://duracloud.org.
A. Kumbhare, Y. Simmhan, V. Prasanna, Designing a secure storage repository for sharing scientific datasets using public clouds, http://ceng.usc.edu/~simmhan/pubs/kumbhare-datacloud-2011.pdf.
Personal data in the Cloud: a global survey of consumer attitudes. Fujitsu Research Institute: http://www.fujitsu.com/downloads/SOL/fai/reports/fujitsupersonaldata-in-the-cloud.pdf.
O. Qing Zhang, M. Kirchberg, R. K. L. Ko, B. S. Lee, How to track your data: the case for cloud computing provenance, HP Laboratories HPL-2012-11, http://www.hpl.hp.com/techreports/2012/HPL-2012-11.pdf.
MPEG-21 standard, http://mpeg.chiariglione.org/standards/mpeg-21/mpeg-21.htm.
EU FP7 project Contrail, http://contrail-project.eu/.
EU-FP7 Project VENUS-C, http://www.venus-c.eu.
Cloud Foundry, Open Source PaaS, http://www.cloudfoundry.com.
European Commission e-Infrastructure, European Grid Initiative, http://www.egi.eu.
Acknowledgments
We thank the anonymous reviewers for their constructive and insightful suggestions. Professor Michael Wilson of STFC suddenly passed away during the preparation of this paper. He was closely involved with its drafting, and we are indebted to his ideas and insights.
Cite this article
Yang, X., Wallom, D., Waddington, S. et al. Cloud computing in e-Science: research challenges and opportunities. J Supercomput 70, 408–464 (2014). https://doi.org/10.1007/s11227-014-1251-5
DOI: https://doi.org/10.1007/s11227-014-1251-5