Abstract
Ocean observation plays an essential role in ocean exploration. Ocean science is entering the big data era, driven by the exponential growth of information technology and by advances in ocean observatories. Ocean observatories are collections of platforms capable of carrying sensors to sample the ocean over appropriate spatiotemporal scales. Data collected by these platforms help answer a range of fundamental and applied research questions. Many countries are spending considerable resources on ocean observing programs for various purposes. Given their huge volume, diverse types, sustained measurement, and potential uses, ocean observing data are a typical kind of big data, namely marine big data. The traditional data-centric infrastructure is insufficient to deal with the new challenges arising in ocean science, and a new distributed, large-scale infrastructure backbone is urgently required. This paper discusses possible strategies for addressing marine big data challenges in the phases of data storage, data computing, and data analysis. Applications in physics, chemistry, geology, and biology illustrate the significant uses of marine big data. Finally, we highlight some challenges and key issues in marine big data.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61572448, 61379127, 61673357, and by the Shandong Provincial Natural Science Foundation, China under Grant No. ZR2014JL043.
About this article
Cite this article
Liu, Y., Qiu, M., Liu, C. et al. Big data challenges in ocean observation: a survey. Pers Ubiquit Comput 21, 55–65 (2017). https://doi.org/10.1007/s00779-016-0980-2
DOI: https://doi.org/10.1007/s00779-016-0980-2